[gt-user] questions about SEG, lost SGE jobs, and general stability

Prakashan Korambath ppk at ats.ucla.edu
Mon Feb 6 16:35:50 CST 2012


This link is better.  We don't run ARCo any more.  If I can 
find my old script I will post it.

http://lists.globus.org/pipermail/gt-user/2008-November/000061.html

Prakashan


On 02/06/2012 02:27 PM, Prakashan Korambath wrote:
> I think you can find a solution in this old discussion.
>
> http://lists.globus.org/pipermail/gt-user/2008-November/000083.html
>
>
> Prakashan
>
>
> On 02/06/2012 02:22 PM, Brian O'Connor wrote:
>> Hi Prakashan,
>>
>> Yep, that's the configuration we have here too, accounting=true and
>> joblog=true.  It only kinda works with joblog=false :-) There should
>> be a note somewhere in the SGE SEG setup details on the GRAM site
>> since this was something that I (and I'm sure other folks) stumbled
>> on.
>>
>> I just found out from our admin that they do load the reporting log
>> into a DB and Joseph indicated this would prevent SEG from running
>> correctly.  Does anyone have a workaround for this?  Would it work to
>> have another daemon periodically copy the reporting file to another
>> location, appending to the file?  Does anyone know exactly why having
>> SGE load its reporting log into a DB would break SEG?  Is it related
>> to file handle closure, or does it actually change the contents of
>> what gets logged to the file?
>>
>> Thanks again for everyone's help!
>>
>> --Brian
>>
>>
>> On Mon, Feb 6, 2012 at 5:01 PM, Prakashan Korambath <ppk at ats.ucla.edu> wrote:
>>> Quick question on the SGE configuration:
>>>
>>> The qconf -sconf command on our cluster shows:
>>>
>>> reporting_params             accounting=true reporting=true \
>>>                               flush_time=00:00:15 joblog=true sharelog=00:00:00
>>>
>>> If reporting=true then there should be a file called reporting in
>>> the $SGE_ROOT/$SGE_CELL/common directory.
>>>
>>> I don't know about the reporting.0 file though.
>>>
>>> Prakashan
>>>
>>>
>>>
>>>
>>>>
>>>> My second question is related to SEG.  I know SEG is the better way to
>>>> go.  I'm currently using polling, though, since our sysadmin and I
>>>> weren't able to get SEG to work with SGE.  When deployed, we saw it
>>>> parse through the SGE log and, once it hit the end, it started
>>>> looking for a reporting.0 file rather than just waiting for more data
>>>> to be written to the log file.  There is no reporting.0 file, so the
>>>> event generator just sat there and kept looking for it rather than
>>>> watching the real log file.  So question (2): has anyone seen problems
>>>> with the SEG module for SGE looking for a non-existent reporting file
>>>> and, therefore, missing new job events?  Is there any way to
>>>> explicitly tell SEG to read only one reporting file and not to try
>>>> looking for rotated log files?
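
For anyone checking the settings quoted above, here is a minimal Python
sketch that pulls reporting_params out of "qconf -sconf" output and builds
the path where the reporting file is expected.  The function names are
made up for illustration and are not part of SGE or Globus; the sample
text is the output quoted in this thread, and the fallback cell name
"default" is an assumption.

```python
import os
import re

# Sample text as quoted in this thread (raw string keeps the backslash).
SAMPLE = r"""reporting_params             accounting=true reporting=true \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00
"""


def parse_reporting_params(qconf_output):
    """Return the reporting_params settings from `qconf -sconf` text as a dict."""
    # Join continuation lines that end in a backslash.
    joined = re.sub(r"\\\s*\n\s*", " ", qconf_output)
    for line in joined.splitlines():
        fields = line.split()
        if fields and fields[0] == "reporting_params":
            return dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    return {}


def reporting_file_path(sge_root, sge_cell):
    """Build $SGE_ROOT/$SGE_CELL/common/reporting (hypothetical helper)."""
    return os.path.join(sge_root, sge_cell, "common", "reporting")


if __name__ == "__main__":
    params = parse_reporting_params(SAMPLE)
    print("reporting enabled:", params.get("reporting") == "true")
    # Assumption: fall back to cell name "default" when SGE_CELL is unset.
    path = reporting_file_path(os.environ.get("SGE_ROOT", ""),
                               os.environ.get("SGE_CELL", "default"))
    print("expected reporting file:", path, "exists:", os.path.exists(path))
```

On a real cluster the text would come from running qconf -sconf rather
than from the SAMPLE constant.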

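The workaround asked about in the thread (a second daemon that
periodically copies the reporting file to another location, appending to
the copy) could be sketched roughly as below.  This is only an
illustration of the idea in Python: the thread does not establish that it
fixes the SEG problem, the function name is hypothetical, and the handling
of a shrunken or missing source file assumes the DB loader truncates or
removes the file, which I have not verified.

```python
import os


def append_new_data(src, dst, offset):
    """Append complete lines of src from offset onward to dst.

    Returns the offset to pass in on the next call.
    """
    try:
        size = os.path.getsize(src)
    except FileNotFoundError:
        # Source temporarily absent (for example, moved away by a loader).
        return 0
    if size < offset:
        # Source is shorter than what was already copied: assume it was
        # replaced or truncated and start again from its beginning.
        offset = 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)
        data = fin.read()
        # Hold back a trailing partial line until it has been completed.
        data = data[:data.rfind(b"\n") + 1]
        fout.write(data)
    return offset + len(data)
```

A caller would keep the returned offset and call the function again every
few seconds (for instance once per flush_time interval), then point the
reader at the copy instead of the original file.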
