[gt-user] questions about SEG, lost SGE jobs, and general stability
ppk at ats.ucla.edu
Mon Feb 6 16:35:50 CST 2012
This link is better. We don't run ARCo any more. If I can
find my old script I will post it.
On 02/06/2012 02:27 PM, Prakashan Korambath wrote:
> I think you can find a solution in the old discussion.
> On 02/06/2012 02:22 PM, Brian O'Connor wrote:
>> Hi Prakashan,
>> Yep, that's the configuration we have here too, accounting=true and
>> joblog=true. It only kinda works with joblog=false :-) There should
>> be a note somewhere in the SGE SEG setup details on the GRAM site
>> since this was something that I (and I'm sure other folks) stumbled on.
>> I just found out from our admin that they do load the reporting log
>> into a DB and Joseph indicated this would prevent SEG from running
>> correctly. Does anyone have a workaround for this? Would it work to
>> have another daemon periodically copy the reporting file to another
>> location, appending to the file? Does anyone know the exact nature of
>> why having SGE load its reporting log into a DB would break SEG? Is
>> it related to file handle closure or does it actually change the
>> contents of what gets logged to the file?
>> Thanks again for everyone's help!
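
For reference, the "another daemon periodically copies the reporting file to another location, appending" idea asked about above could look roughly like the sketch below. This is only an illustration of the copy-and-append step: the function names, the paths, and the 15 second interval are made up here, and nobody in this thread has confirmed that SEG would behave correctly when pointed at such a copy. It also says nothing about why loading the reporting log into a DB interferes with SEG.

```python
import os
import time


def append_new_bytes(src, dst, offset):
    """Append everything in src past `offset` to dst and return the new offset.

    Assumption for this sketch: if src is now shorter than the remembered
    offset, it was truncated or replaced, so reading restarts from byte 0.
    """
    size = os.path.getsize(src)
    if size < offset:
        offset = 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)
        while True:
            chunk = fin.read(65536)
            if not chunk:
                break
            fout.write(chunk)
            offset += len(chunk)
    return offset


def follow(src, dst, interval=15.0):
    """Poll src forever, appending newly written data to dst (hypothetical loop)."""
    offset = 0
    while True:
        if os.path.exists(src):
            offset = append_new_bytes(src, dst, offset)
        time.sleep(interval)
```

The copy only ever grows, so a reader following it would never see the file shrink or disappear, even if the original is emptied after being loaded into the DB. Whether that is enough for SEG is an open question.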
>> On Mon, Feb 6, 2012 at 5:01 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
>>> Quick question on the SGE configuration:
>>> The qconf -sconf command on our cluster shows:
>>> reporting_params accounting=true reporting=true \
>>> flush_time=00:00:15 joblog=true
>>> If reporting=true then there should be a file called reporting in
>>> the $SGE_ROOT/$SGE_CELL/common directory.
>>> I don't know about the reporting.0 file though.
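
A quick way to check these settings from a script is to parse the reporting_params line out of the qconf -sconf output. The sketch below is an assumption-laden helper, not part of SGE or GRAM: the function names are invented, and it only handles the layout quoted above, a reporting_params line with key=value pairs and an optional backslash continuation.

```python
import os


def parse_reporting_params(conf_text):
    """Return the reporting_params key=value pairs from `qconf -sconf` text as a dict."""
    joined = []
    pending = ""
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.endswith("\\"):
            # Continuation line: drop the backslash and keep accumulating.
            pending += stripped[:-1] + " "
            continue
        joined.append(pending + stripped)
        pending = ""
    if pending:
        joined.append(pending)

    for line in joined:
        parts = line.split()
        if parts and parts[0] == "reporting_params":
            return dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    return {}


def reporting_file_path(sge_root, sge_cell):
    """Path where the reporting file is expected when reporting=true."""
    return os.path.join(sge_root, sge_cell, "common", "reporting")
```

With the configuration quoted above, parse_reporting_params returns "true" for accounting, reporting and joblog, and "00:00:15" for flush_time. The text to parse can be captured by running qconf -sconf and reading its standard output.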
>>>> My second question is related to SEG. I know SEG is the better way to
>>>> go. I'm currently using polling, though, since our sysadmin and I
>>>> weren't able to get SEG to work with SGE. When deployed, we saw it
>>>> parse through the SGE log and, once it hit the end, it started
>>>> looking for a reporting.0 file rather than just wait for more data to
>>>> be written to the log file. There is no reporting.0 file so the event
>>>> generator just sat there and kept looking for it rather than watching
>>>> the real log file. So question (2): has anyone seen problems with the
>>>> SEG module for SGE looking for a non-existent reporting file and,
>>>> therefore, missing new job events? Is there any way to explicitly
>>>> tell SEG to read only one reporting file and not to try looking for
>>>> rotated log files?