[gram-dev] Subject: PBS SEG not working properly
Charles Bacon
bacon at mcs.anl.gov
Mon Aug 11 10:37:48 CDT 2008
You can run the scheduler event generator by hand. Look at "ps" when
the container is up, and you'll get the right set of arguments. Try
running the SEG while you submit a job and see if any events are
getting into the system.
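To pull the SEG's command line out of "ps" while the container is up, a small filter over `ps -eo args` output is enough. This is a minimal Python sketch, not part of the toolkit: it assumes the SEG process shows up with `globus-scheduler-event-generator` somewhere in its command line (check your own "ps" output and adjust `SEG_MARKER`), and the helper name `find_seg_commands` is hypothetical.

```python
from typing import List

# Assumption: the SEG process has this string in its command line.
# Adjust it to whatever your own `ps` output shows.
SEG_MARKER = "globus-scheduler-event-generator"


def find_seg_commands(ps_output: str, marker: str = SEG_MARKER) -> List[List[str]]:
    """Return the argv of every `ps -eo args` line that mentions `marker`."""
    commands = []
    for line in ps_output.splitlines():
        # Skip the grep you may have piped ps through.
        if marker in line and "grep" not in line:
            # ps prints arguments space-separated without shell quoting,
            # so a plain split is the best we can do; any argument that
            # itself contains spaces will come back split.
            commands.append(line.split())
    return commands
```

Feed it the text of `ps -eo args`, print each result with `" ".join(argv)`, and run that command by hand in a terminal while you submit a job to see whether any events come out.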
If that works, the problem could be a firewall keeping the client from
getting notifications. You can check that by using globusrun-ws -batch
instead, then querying the job state rather than waiting for
notifications. The other thing that could be wrong is that the JobIDs
being tracked by GRAM might not correspond to the JobIDs showing up in
the scheduler logs. For example, if GRAM thinks the job is called
"12345.0" and the PBS logs are talking about a job named
"12345.0.compute", it will not realize they are the same job. If that's
the problem, you can fix it in pbs.pm.
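The JobID mismatch comes down to a matching rule. The real fix belongs in pbs.pm (Perl); this Python sketch only illustrates one possible rule, and both the function name `same_pbs_job` and the rule itself (an exact match, or one ID being a dot-delimited prefix of the other) are assumptions, not what pbs.pm actually does.

```python
def same_pbs_job(gram_id: str, log_id: str) -> bool:
    """Return True if two PBS job IDs plausibly name the same job.

    Hypothetical rule: the IDs are equal, or the shorter one is a prefix
    of the longer one followed by a "." (e.g. "12345.0" vs
    "12345.0.compute").
    """
    if gram_id == log_id:
        return True
    shorter, longer = sorted((gram_id, log_id), key=len)
    # Requiring the "." keeps "1234" from matching "12345.0".
    return longer.startswith(shorter + ".")
```

Requiring the dot boundary avoids matching unrelated sequence numbers, but the rule could still conflate jobs if several PBS servers with overlapping sequence numbers write to the same log.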
Charles
On Aug 11, 2008, at 9:11 AM, Andrew Howard wrote:
> Stu,
> Thanks for the reply. I've double-checked that the path and
> permissions are correct. What confuses me is that when I watch the
> container log while I submit a job, it sees that the job is submitted
> (i.e., the Globus container log gives me the PBS job ID). It just never
> seems to tell the client that the job was submitted.
>
>
>
> On Fri, Aug 8, 2008 at 10:26 AM, Stuart Martin <smartin at mcs.anl.gov> wrote:
>> This email bounced due to majordomo finding u-n-s-u-b-m-i-t-t-e-d in the
>> message body (/\buns\w*b/i at line 5), editing and resending...
>>
>> Andrew: take a look at the pbs section here:
>> http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/admin-index.html#s-wsgram-Interface_Config_Fragscheduler_specific_config
>>
>> Can you confirm that the path and permissions are correct? The account
>> the container is running under must be able to read the PBS log file.
>>
>> -Stu
>>
>>>>>
>> Hi,
>> I've been struggling with getting Globus-WS working with PBS. It
>> worked at one point, but now it seems the PBS SEG isn't working
>> properly, even after I've configured it. It keeps giving me "Current
>> job state: Un$ubmitted"
>>
>> I ran $GLOBUS_LOCATION/setup/globus/setup-seg-pbs.pl and it produced
>> no errors. Then I ran the test at
>> $GLOBUS_LOCATION/test/globus_scheduler_event_generator_pbs_test/TESTS.pl
>> and got this output:
>> and got this output:
>>
>> [root@tg-steele globus_scheduler_event_generator_pbs_test]# ./TESTS.pl
>> Warning: Do not start a service container while this test script is running.
>> test-pbs-seg....ok
>> All tests successful.
>> Files=1, Tests=1, 10 wallclock secs ( 0.05 cusr + 0.06 csys = 0.11 CPU)
>>
>> Seeing that the test passed, I submitted a job to the server, but it
>> still returns "Current job state: Un$ubmitted":
>> [ahoward@tg-steele globus_test]$ globusrun-ws -submit -F https://tg-steele.purdue.teragrid.org -Ft PBS -f hostname_ws.rsl
>> Submitting job...Done.
>> Job ID: uuid:4af67660-64b3-11dd-86dd-001ec9aa7d43
>> Termination time: 08/08/2008 19:01 GMT
>> Current job state: Un$ubmitted
>>
>> However, if I look in $GLOBUS_LOCATION/var/container.log, I can
>> see that the job was successfully submitted to PBS:
>> 2008-08-07 15:01:51,426 INFO exec.StateMachine [RunQueueThread_11,logJobAccepted:3424] Job 4b298a00-64b3-11dd-a07c-da8d50e1996e accepted for local user 'ahoward'
>> 2008-08-07 15:01:52,056 INFO exec.StateMachine [RunQueueThread_15,logJobSubmitted:3436] Job 4b298a00-64b3-11dd-a07c-da8d50e1996e submitted with local job ID '150799.steele-adm.rcac.purdue.edu'
>>
>> FWIW, if I try running the SEG test script again as myself, it fails:
>> [ahoward@tg-steele globus_scheduler_event_generator_pbs_test]$ ./TESTS.pl
>> Warning: Do not start a service container while this test script is running.
>> test-pbs-seg....ok
>> 1/1 skipped: PBS SEG not configured
>> All tests successful, 1 subtest skipped.
>> Files=1, Tests=1, 0 wallclock secs ( 0.03 cusr + 0.00 csys = 0.03 CPU)
>>
>>
>> Any suggestions? This has me completely stumped at the moment.
>>
>> Thanks in advance!
>>
>> --
>> Andrew Howard
>> Rosen Center for Advanced Computing
>> Purdue University
>>
>> <<<
>>
>>
>
>
>
> --
> Andrew Howard
> Rosen Center for Advanced Computing
> Purdue University
>
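Stu's point that the container account must be able to read the PBS log can be checked directly. This is a minimal Python sketch under stated assumptions: it assumes PBS writes one server log per day named YYYYMMDD, and the directory `/var/spool/torque/server_logs` and both helper names are hypothetical; use whatever log directory your SEG configuration actually points at.

```python
import os
from datetime import date

# Hypothetical default; replace with the log directory your SEG is configured to read.
DEFAULT_LOG_DIR = "/var/spool/torque/server_logs"


def pbs_log_path(day: date, log_dir: str = DEFAULT_LOG_DIR) -> str:
    """Build the path of the (assumed YYYYMMDD-named) PBS server log for `day`."""
    return os.path.join(log_dir, day.strftime("%Y%m%d"))


def readable_by_current_user(path: str) -> bool:
    """True if `path` is a regular file the calling account may open for reading."""
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

Run it as the account the container runs under, not as root: root passes the `os.access` check regardless of the file's permission bits, so a root run proves nothing about the container account.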