[gt-user] questions about SEG, lost SGE jobs, and general stability
Prakashan Korambath
ppk at ats.ucla.edu
Mon Feb 6 16:27:05 CST 2012
I think you can find a solution in this older discussion:
http://lists.globus.org/pipermail/gt-user/2008-November/000083.html
Prakashan
On 02/06/2012 02:22 PM, Brian O'Connor wrote:
> Hi Prakashan,
>
> Yep, that's the configuration we have here too, accounting=true and
> joblog=true. It only kinda works with joblog=false :-) There should
> be a note somewhere in the SGE SEG setup details on the GRAM site
> since this was something that I (and I'm sure other folks) stumbled
> on.
>
> I just found out from our admin that they do load the reporting log
> into a DB and Joseph indicated this would prevent SEG from running
> correctly. Does anyone have a workaround for this? Would it work to
> have another daemon periodically copy the reporting file to another
> location, appending to the file? Does anyone know exactly why
> having SGE load its reporting log into a DB would break SEG? Is
> it related to file handle closure or does it actually change the
> contents of what gets logged to the file?
>
> Thanks again for everyone's help!
>
> --Brian
>
>
> On Mon, Feb 6, 2012 at 5:01 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
>> Quick question on the SGE configuration:
>>
>> qconf -sconf command on our cluster shows
>>
>> reporting_params accounting=true reporting=true \
>>     flush_time=00:00:15 joblog=true sharelog=00:00:00
>>
>> If reporting=true, then there should be a file called reporting in
>> the $SGE_ROOT/$SGE_CELL/common directory.
>>
>> I don't know about the reporting.0 file though.
>>
>> Prakashan
>>
>>
>>
>>
>>>
>>> My second question is related to SEG. I know SEG is the better way to
>>> go. I'm currently using polling, though, since our sysadmin and I
>>> weren't able to get SEG to work with SGE. When deployed, we saw it
>>> parse through the SGE log and, once it hit the end, it started
>>> looking for a reporting.0 file rather than just waiting for more data to
>>> be written to the log file. There is no reporting.0 file, so the event
>>> generator just sat there and kept looking for it rather than watching
>>> the real log file. So, question (2): has anyone seen problems with the
>>> SEG module for SGE looking for a non-existent reporting file and,
>>> therefore, missing new job events? Is there any way to explicitly
>>> tell SEG to read only one reporting file and not to try looking for
>>> rotated log files?
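
The copy-and-append workaround floated earlier in the thread (a separate
daemon that periodically copies new reporting data to another file) could
be sketched roughly as below. This is only an illustration of the idea, not
something confirmed to work with SEG: nothing in this thread establishes why
loading the reporting log into a DB breaks SEG, so reading from a copy may or
may not help. The function name, the offset bookkeeping, and the
whole-lines-only rule are all assumptions made for the sketch.

```python
import os


def append_new_bytes(src_path, dst_path, offset):
    """Append data written to src_path since `offset` onto dst_path.

    Returns the new offset to pass in on the next call.
    Assumption: if the source is now shorter than `offset`, it was
    truncated or replaced, so copying restarts from the beginning.
    """
    size = os.path.getsize(src_path)
    if size < offset:
        offset = 0  # source shrank: treat it as a fresh file
    if size == offset:
        return offset  # nothing new to copy

    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)
        chunk = src.read(size - offset)
        # Copy only complete lines so a reader of the copy never
        # sees a partially written record.
        end = chunk.rfind(b"\n") + 1
        dst.write(chunk[:end])
    return offset + end
```

A small wrapper would call this in a loop, sleeping between calls and
carrying the returned offset forward, with src_path pointing at
$SGE_ROOT/$SGE_CELL/common/reporting and dst_path at whatever (hypothetical)
location SEG is then configured to read.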