[gt-user] questions about SEG, lost SGE jobs, and general stability
Prakashan Korambath
ppk at ats.ucla.edu
Mon Feb 6 16:01:24 CST 2012
Quick question on the SGE configuration:
The qconf -sconf command on our cluster shows:

    reporting_params   accounting=true reporting=true \
                       flush_time=00:00:15 joblog=true sharelog=00:00:00

If reporting=true, there should be a file called reporting in the
$SGE_ROOT/$SGE_CELL/common directory.
I don't know about the reporting.0 file, though.
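
As a rough way to check this on another cluster, here is a minimal Python
sketch that pulls reporting_params out of saved qconf -sconf output and
builds the path where the reporting file would be expected. The function
names and the sample text are my own illustration (the sample just mirrors
the output above); nothing here is part of SGE or the SEG module.

```python
import os

# Sample text mirroring the qconf -sconf output quoted above (illustrative).
SAMPLE_SCONF = """\
reporting_params   accounting=true reporting=true \\
                   flush_time=00:00:15 joblog=true sharelog=00:00:00
"""


def parse_reporting_params(sconf_text):
    """Return the reporting_params settings as a dict of strings.

    Hypothetical helper: assumes a trailing backslash marks a continued
    line, as in the output shown above.
    """
    joined = []
    pending = ""
    for raw in sconf_text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Continued line: drop the backslash and keep accumulating.
            pending += line[:-1].strip() + " "
            continue
        joined.append(pending + line)
        pending = ""
    if pending:
        joined.append(pending.strip())

    for line in joined:
        fields = line.split()
        if fields and fields[0] == "reporting_params":
            return dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    return {}


def reporting_file_path(sge_root, sge_cell):
    """Path where the reporting file is expected when reporting=true."""
    return os.path.join(sge_root, sge_cell, "common", "reporting")
```

In practice the text would come from running qconf -sconf, and the two
path components from the SGE_ROOT and SGE_CELL environment variables. If
the parsed "reporting" value is "true" but os.path.exists() on the
resulting path is false, that would be worth looking into first.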
Prakashan
>
> My second question is related to SEG. I know SEG is the better way to
> go. I'm currently using polling, though, since our sysadmin and I
> weren't able to get SEG to work with SGE. When deployed, we saw it
> parse through the SGE log and, once it hit the end, it started
> looking for a reporting.0 file rather than just waiting for more data to
> be written to the log file. There is no reporting.0 file so the event
> generator just sat there and kept looking for it rather than watching
> the real log file. So question (2): has anyone seen problems with the
> SEG module for SGE looking for a non-existent reporting file and,
> therefore, missing new job events? Is there any way to explicitly
> tell SEG to read only one reporting file and not to try looking for
> rotated log files?