[gt-user] questions about SEG, lost SGE jobs, and general stability

Prakashan Korambath ppk at ats.ucla.edu
Mon Feb 6 16:27:05 CST 2012


I think you can find a solution in this old discussion:

http://lists.globus.org/pipermail/gt-user/2008-November/000083.html


Prakashan


On 02/06/2012 02:22 PM, Brian O'Connor wrote:
> Hi Prakashan,
>
> Yep, that's the configuration we have here too: accounting=true and
> joblog=true. It only kinda works with joblog=false :-) There should
> be a note somewhere in the SGE SEG setup details on the GRAM site,
> since this is something that I (and I'm sure other folks) stumbled
> on.
>
> I just found out from our admin that they do load the reporting log
> into a DB, and Joseph indicated this would prevent SEG from running
> correctly. Does anyone have a workaround for this? Would it work to
> have another daemon periodically copy the reporting file to another
> location, appending to the file? Does anyone know exactly why having
> SGE load its reporting log into a DB would break SEG? Is it related
> to file handle closure, or does it actually change the contents of
> what gets logged to the file?
>
> Thanks again for everyone's help!
>
> --Brian
>
>
> On Mon, Feb 6, 2012 at 5:01 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
>> Quick question on the SGE configuration:
>>
>> qconf -sconf command on our cluster shows
>>
>> reporting_params             accounting=true reporting=true \
>>                              flush_time=00:00:15 joblog=true \
>>                              sharelog=00:00:00
>>
>> If reporting=true, then there should be a file called reporting in
>> the $SGE_ROOT/$SGE_CELL/common directory.
>>
>> I don't know about the reporting.0 file though.
>>
>> Prakashan
>>
>>
>>
>>
>>>
>>> My second question is related to SEG. I know SEG is the better way to
>>> go. I'm currently using polling, though, since our sysadmin and I
>>> weren't able to get SEG to work with SGE. When deployed, we saw it
>>> parse through the SGE log and, once it hit the end, start looking
>>> for a reporting.0 file rather than just waiting for more data to be
>>> written to the log file. There is no reporting.0 file, so the event
>>> generator just sat there and kept looking for it rather than watching
>>> the real log file. So question (2): has anyone seen problems with the
>>> SEG module for SGE looking for a non-existent reporting file and,
>>> therefore, missing new job events? Is there any way to explicitly
>>> tell SEG to read only one reporting file and not to try looking for
>>> rotated log files?

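The workaround Brian asks about (a separate daemon that periodically copies
the reporting file to another location, appending to it) could be sketched
roughly as below. This is only an illustration under stated assumptions, not
a tested fix: the function names, the paths in the usage note, and the
15-second default interval (chosen to echo flush_time=00:00:15 above) are all
hypothetical, and nothing in this thread confirms that SEG would behave
correctly when pointed at such a copy.

```python
import os
import time


def append_new_bytes(src_path, dst_path, offset):
    """Append complete lines of src_path past `offset` to dst_path.

    Returns the new offset into src_path. Assumption: if the source is
    shorter than `offset`, it was truncated or replaced, so we start
    again from byte 0.
    """
    size = os.path.getsize(src_path)
    if size < offset:
        offset = 0  # source shrank: assume it was truncated or rotated
    if size == offset:
        return offset  # nothing new to copy
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)
        chunk = src.read(size - offset)
        # Forward only whole lines, so a reader of the copy never sees
        # a partially written record.
        end = chunk.rfind(b"\n") + 1
        dst.write(chunk[:end])
    return offset + end


def mirror_forever(src_path, dst_path, interval=15):
    """Poll src_path every `interval` seconds and append new lines to dst_path."""
    offset = 0
    while True:
        if os.path.exists(src_path):
            offset = append_new_bytes(src_path, dst_path, offset)
        time.sleep(interval)
```

With the hypothetical paths filled in, the daemon would be started with
something like mirror_forever("/path/to/common/reporting",
"/path/to/seg-copy/reporting"). Because the copy is only ever appended to, it
keeps growing even if the original is emptied after being loaded into the DB.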
