[gt-user] questions about SEG, lost SGE jobs, and general stability

Prakashan Korambath ppk at ats.ucla.edu
Mon Feb 6 16:01:24 CST 2012


Quick question on the SGE configuration:

The qconf -sconf command on our cluster shows:

reporting_params             accounting=true reporting=true \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00

If reporting=true, then there should be a file called reporting in the
$SGE_ROOT/$SGE_CELL/common directory.
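
A minimal sketch of how one might check both things at once. The helper
name reporting_enabled is made up for illustration, and falling back to a
cell named "default" when SGE_CELL is unset is an assumption about the
installation. The check only runs if qconf is on the PATH and SGE_ROOT
is set.

```shell
#!/bin/sh
# Hypothetical helper: reads `qconf -sconf` output on stdin and succeeds
# only if the reporting_params entry contains reporting=true.
reporting_enabled() {
    # Join backslash-continued lines so the whole entry is on one line,
    # then inspect only the reporting_params entry.
    awk '{ if (sub(/\\$/, "")) printf "%s", $0; else print }' |
        grep '^reporting_params' |
        grep -q '[[:space:]]reporting=true'
}

# Run the check only where an SGE environment appears to be available.
if [ -n "${SGE_ROOT:-}" ] && command -v qconf >/dev/null 2>&1; then
    # Assumption: the cell is named "default" when SGE_CELL is unset.
    file="$SGE_ROOT/${SGE_CELL:-default}/common/reporting"
    if qconf -sconf | reporting_enabled; then
        if [ -f "$file" ]; then
            echo "reporting file present: $file"
        else
            echo "reporting=true but $file is missing"
        fi
    else
        echo "reporting=true not found in reporting_params"
    fi
fi
```

If the file is reported missing even though reporting=true, that would
be worth sorting out before looking at SEG itself.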

I don't know about the reporting.0 file, though.

Prakashan



>
> My second question is related to SEG.  I know SEG is the better way to
> go.  I'm currently using polling, though, since our sysadmin and I
> weren't able to get SEG to work with SGE.  When deployed, we saw it
> parse through the SGE log and, once it hit the end, start looking
> for a reporting.0 file rather than just waiting for more data to
> be written to the log file.  There is no reporting.0 file, so the event
> generator just sat there and kept looking for it rather than watching
> the real log file.  So question (2): has anyone seen problems with the
> SEG module for SGE looking for a non-existent reporting file and,
> therefore, missing new job events?  Is there any way to explicitly
> tell SEG to read only one reporting file and not to try looking for
> rotated log files?

