[gram-user] job cannot be submitted by globusrun-ws
Peter G Lane
lane at mcs.anl.gov
Mon May 8 18:08:21 CDT 2006
You haven't provided any of the information that I asked for except for
the GT version. Are my instructions not clear enough? Please tell me if
I'm not making any sense. Anyway, I already know your fork SEG isn't
working. Could you try the steps from my last email? Thanks.
Peter
On Mon, 2006-05-08 at 14:32 -0700, wenwen LI wrote:
> I'm using GT4.0.1. When I ran globus-start-container today, it gave no
> errors or warnings and everything looked fine, but when I run
> "globusrun-ws -submit -c /bin/true", the job only reports
> "Current Job State : unsubmitted".
> When I run TESTS.pl with the web service container shut down, I get
> these errors:
> ==================================================================
> [globus at srb globus_scheduler_event_generator_fork_test]$ ./TESTS.pl
> Warning: Do not start a service container while this test script is running.
> test-fork-seg....NOK 1# Test 1 got: 'Unable to run SEG with fork module: is it installed?' (test-fork-seg.pl at line 23)
> # Expected: '0'
> # test-fork-seg.pl line 23 is: skip($skip_all ? "Fork SEG not configured" : 0, &run_test, 0);
> test-fork-seg....FAILED test 1
> Failed 1/1 tests, 0.00% okay
> Failed Test Stat Wstat Total Fail Failed List of Failed
> -------------------------------------------------------------------------------
> test-fork-seg.pl 1 1 100.00% 1
> Failed 1/1 test scripts, 0.00% okay. 1/1 subtests failed, 0.00% okay.
> ===============================================================
>
> Thank you very much!
>
>
> Wenwen
>
>
>
> Peter G Lane <lane at mcs.anl.gov> wrote:
> What version of the Globus Toolkit are you using? I can't figure out why
> it's trying to recover the same job twice. The recover method is
> synchronized and also checks a flag to make sure it doesn't run twice.
> This should be impossible. Do you have two deployments of the GRAM
> services by any chance?
>
> For now, you can just delete your ~/.globus/persisted/ directory to
> clean up all the job persistence data. Then we can address the original
> problem. Can you turn on full GRAM debug logging in
> container-log4j.properties (just uncomment the appropriate line) and
> start your container (don't submit any jobs)? You should see some lines
> that list the command-line arguments for running the Fork SEG. If not,
> send me the container log. If you do, reconstruct the command line from
> those logging statements and run it by hand. If you don't see any
> output, adjust the timestamp (it's in seconds since the epoch) so that
> it represents an earlier time and try again. You should eventually see
> something like the following (the command should "hang"):
>
> logan% $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s fork -t 1145994457
> 001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;2;0
> 001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;8;0
> 001;1145994584;a4ceee44-d494-11da-8600-000d61215ff0:6709;2;0
> 001;1145994584;a4ceee44-d494-11da-8600-000d61215ff0:6709;8;0
> 001;1146023492;f36ab716-d4d7-11da-9124-000d61215ff0:11605;2;0
> 001;1146023492;f36ab716-d4d7-11da-9124-000d61215ff0:11605;8;0
>
> Peter
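Peter's procedure (pick an earlier start time, run the fork SEG by hand, read what it prints) can be scripted. The sketch below is a minimal Python illustration under stated assumptions, not part of the Globus Toolkit: it builds the `$GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s fork -t <timestamp>` command line shown above with the timestamp moved back by a chosen number of hours, and splits each line the SEG prints into fields. The field names (`timestamp`, `job_id`, `state`, `exit_code`), the helper names, and the mapping of state codes to names (taken from the GRAM protocol job-state constants) are assumptions, not something stated in this thread; `/usr/local/globus` is a hypothetical fallback path.

```python
import os
import time
from typing import NamedTuple, Optional

# Assumed GRAM protocol job-state codes (PENDING=1, ACTIVE=2, FAILED=4,
# DONE=8); the sample output above only uses 2 and 8.
JOB_STATES = {1: "Pending", 2: "Active", 4: "Failed", 8: "Done"}


class SegEvent(NamedTuple):
    timestamp: int  # seconds since the epoch
    job_id: str     # "<uuid>:<pid>" in the sample output
    state: int      # numeric job-state code
    exit_code: int


def seg_command(globus_location: str, hours_back: float = 1.0,
                now: Optional[float] = None) -> list:
    """Build the fork SEG command line, starting `hours_back` hours in the past."""
    if now is None:
        now = time.time()
    start = int(now - hours_back * 3600)
    exe = os.path.join(globus_location, "libexec",
                       "globus-scheduler-event-generator")
    return [exe, "-s", "fork", "-t", str(start)]


def parse_seg_line(line: str) -> Optional[SegEvent]:
    """Split a '001;timestamp;job_id;state;exit_code' line; None if it is not a 5-field 001 record."""
    fields = line.strip().split(";")
    if len(fields) != 5 or fields[0] != "001":
        return None
    _, ts, job_id, state, exit_code = fields
    return SegEvent(int(ts), job_id, int(state), int(exit_code))


if __name__ == "__main__":
    # "/usr/local/globus" is a hypothetical fallback, used only if
    # GLOBUS_LOCATION is unset.
    location = os.environ.get("GLOBUS_LOCATION", "/usr/local/globus")
    print(" ".join(seg_command(location, hours_back=24)))
    event = parse_seg_line(
        "001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;8;0")
    print(event.job_id, JOB_STATES.get(event.state, "Unknown"), event.exit_code)
```

Run the printed command by hand; if it stays silent, rebuild it with a larger `hours_back` so the start time falls before your most recent fork jobs. Because the job ID uses `:` and the fields use `;`, splitting on `;` keeps the `<uuid>:<pid>` job ID intact.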
>
> On Fri, 2006-04-28 at 10:56 -0700, wenwen LI wrote:
> > Here is the result:
> >
> > ---------------------------------------------------------------
> > total 40
> > -rw-rw-r-- 1 globus globus 6925 Apr 27 16:22 111fb780-d590-11da-b53b-00093d1067b1.xml
> > -rw-rw-r-- 1 globus globus 6925 Apr 27 16:22 17889586-d579-11da-830a-00093d1067b1.xml
> > -rw-rw-r-- 1 globus globus 6926 Apr 27 16:22 362de86c-d57c-11da-82c7-00093d1067b1.xml
> > -rw-rw-r-- 1 globus globus 6925 Apr 27 16:22 438ae57e-d580-11da-babc-00093d1067b1.xml
> > -rw-rw-r-- 1 globus globus 6925 Apr 27 16:22 b6f4ab42-d59b-11da-a021-00093d1067b1.xml
> > -rw-rw-r-- 1 globus globus 0 Apr 21 00:09 xph27814.tmp
> >
> > ---------------------------------------------------------------
> > I have attached the 111fb780-d590-11da-b53b-00093d1067b1.xml file to this mail.
> > Thank you very much!
> > Thank you very much!
> >
> >
> > Peter G Lane wrote:
> > On Thu, 2006-04-27 at 12:17 -0700, wenwen LI wrote:
> > > Here is the result:
> > > [root at srb var]# ls -l
> > > total 24
> > > -rw-r--r-- 1 globus globus 4831 Apr 12 15:46 container.log
> > > -rw-rw-rw- 1 globus globus 1346 Apr 26 23:12 globus-fork.log
> > > -rw-rw-r-- 1 globus globus 46 Apr 27 04:02 globus-jsm-fork.stamp
> > > -rw-rw-r-- 1 globus globus 46 Apr 27 04:02 globus-jsm-multi.stamp
> > > drwxrwxr-x 3 globus globus 4096 Mar 30 17:19 lib
> > > I think it has the right permissions.
> > > But today, when I started the web service container as user 'globus',
> > > it printed errors that never appeared before:
> > >
> > > ---------------------------------------------------------------
> > > [globus at srb postgre]$ globus-start-container
> > > 2006-04-27 16:22:07,937 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID 438ae57e-d580-11da-babc-00093d1067b1.
> > > 2006-04-27 16:22:07,944 INFO exec.RunQueue [Thread-3,:54] Starting state machine with 16 run queues.
> > > 2006-04-27 16:22:09,027 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID 111fb780-d590-11da-b53b-00093d1067b1.
> > > 2006-04-27 16:22:12,918 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 438ae57e-d580-11da-babc-00093d1067b1.
> > > 2006-04-27 16:22:12,919 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 111fb780-d590-11da-b53b-00093d1067b1.
> > > 2006-04-27 16:22:12,958 ERROR utils.JobStateMonitorSubscriptionManager [Thread-23,subscribe:179] unable to monitor job for state changes
> > > org.globus.exec.monitoring.AlreadyRegisteredException
> >
> > I don't understand how, but it looks like a job is being recovered twice
> > (111fb780-d590-11da-b53b-00093d1067b1). What version of the toolkit are
> > you using? Would it be possible for you to find the file in the
> > container owner's ~/.globus/persisted/-/ManagedExecutableJobResourceStateType/
> > directory named 111fb780-d590-11da-b53b-00093d1067b1.xml and attach it
> > to your response? I'm wondering if the persistence data got corrupted.
> > After that, if you delete that directory, you won't have all these
> > jobs being recovered.
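The double recovery Peter describes can be confirmed mechanically by counting the "Recovered resource with ID" lines in the container output, and the cleanup he suggests can be done conservatively by renaming the persistence directory instead of deleting it. The Python sketch below illustrates both; the helper names are hypothetical, the regex assumes the log line wording quoted in this thread, and the choice to rename rather than delete is an assumption made so the XML files stay available to attach or inspect.

```python
import re
import time
from collections import Counter
from pathlib import Path

# Matches the INFO lines ManagedExecutableJobHome logs for each recovered job
# (wording taken from the container output quoted in this thread).
RECOVERED = re.compile(r"Recovered resource with ID ([0-9a-f-]{36})")


def duplicate_recoveries(log_text: str) -> dict:
    """Return {job_id: count} for every job ID recovered more than once."""
    counts = Counter(RECOVERED.findall(log_text))
    return {job_id: n for job_id, n in counts.items() if n > 1}


def persisted_job_ids(state_dir) -> list:
    """Job IDs that still have a persisted <job_id>.xml file in `state_dir`."""
    return sorted(p.stem for p in Path(state_dir).expanduser().glob("*.xml"))


def move_aside(persist_root) -> Path:
    """Rename the persistence directory with a timestamp suffix instead of deleting it."""
    src = Path(persist_root).expanduser()
    dest = src.with_name(src.name + ".bak-" + time.strftime("%Y%m%d%H%M%S"))
    src.rename(dest)
    return dest
```

Fed the startup output quoted above (with the mail line wrapping undone), `duplicate_recoveries` would flag both 438ae57e-d580-11da-babc-00093d1067b1 and 111fb780-d590-11da-b53b-00093d1067b1, each recovered once by Thread-3 and again by Thread-6. Stop the container before calling `move_aside("~/.globus/persisted")`, as the container owner.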
> >
> > Peter
> >
> > >     at org.globus.exec.monitoring.JobStateMonitor.registerJobID(JobStateMonitor.java:227)
> > >     at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.subscribe(JobStateMonitorSubscriptionManager.java:171)
> > >     at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.run(JobStateMonitorSubscriptionManager.java:136)
> > > 2006-04-27 16:22:12,963 ERROR utils.JobStateMonitorSubscriptionManager [Thread-23,subscribe:179] unable to monitor job for state changes
> > > org.globus.exec.monitoring.AlreadyRegisteredException
> > >     at org.globus.exec.monitoring.JobStateMonitor.registerJobID(JobStateMonitor.java:227)
> > >     at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.subscribe(JobStateMonitorSubscriptionManager.java:171)
> > >     at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.run(JobStateMonitorSubscriptionManager.java:136)
> > > 2006-04-27 16:22:12,963 WARN factory.ManagedJobFactoryResource [Thread-3,run:164] Recovery exception
> > > org.globus.wsrf.NoSuchResourceException
> > >     at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
> > >     at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
> > >     at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
> > >     at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
> > > 2006-04-27 16:22:13,084 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 9a8bbb5c-d56a-11da-bb0b-00093d1067b1.
> > > 2006-04-27 16:22:13,206 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 17889586-d579-11da-830a-00093d1067b1.
> > > 2006-04-27 16:22:13,324 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 362de86c-d57c-11da-82c7-00093d1067b1.
> > > 2006-04-27 16:22:13,438 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID b6f4ab42-d59b-11da-a021-00093d1067b1.
> > >
> > > ---------------------------------------------------------------
> > >
> > > What's wrong with it?
> > >
> > > Peter G Lane wrote:
> > > On Wed, 2006-04-26 at 19:17 -0700, wenwen LI wrote:
> > > > Hi,
> > > > This is the information for globus-fork.conf:
> > > >
> > > > ---------------------------------------------------------------
> > > > [root at srb etc]# ls -l globus-fork.conf
> > > > -rw-rw-rw- 1 globus globus 47 Mar 30 17:20 globus-fork.conf
> > >
> > > I wasn't clear enough. I want you to look *in* globus-fork.conf. It is a
> > > configuration file that contains a path to the fork SEG log file. It is
> > > the fork SEG log file that I want you to check for permissions. The
> > > globus-fork.conf file only needs to be -rw for the owner.
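Peter's two-step check (read the log path out of globus-fork.conf, then inspect the permissions of the file it names) can be sketched in Python as follows. It assumes the path appears on a `log_path=` line; that key name is an assumption, so open your own globus-fork.conf and adjust `key` if it differs. "World readable and writable" is tested with the `S_IROTH` and `S_IWOTH` bits from the standard `stat` module; the helper names are hypothetical.

```python
import os
import stat


def read_log_path(conf_text, key="log_path"):
    """Return the value of `key=...` from globus-fork.conf text, or None.

    The key name 'log_path' is an assumption; adjust it to match your file.
    """
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None


def world_rw_problems(mode):
    """Given an st_mode, list what is missing for world read/write access."""
    problems = []
    if not mode & stat.S_IROTH:
        problems.append("not world-readable")
    if not mode & stat.S_IWOTH:
        problems.append("not world-writable")
    return problems


def check_fork_seg_log(globus_location):
    """Return a list of problems with the fork SEG log (empty if none found)."""
    conf = os.path.join(globus_location, "etc", "globus-fork.conf")
    if not os.path.exists(conf):
        return [f"{conf} does not exist"]
    with open(conf) as f:
        log_path = read_log_path(f.read())
    if log_path is None:
        return [f"no log_path entry found in {conf}"]
    if not os.path.exists(log_path):
        return [f"{log_path} does not exist"]
    return [f"{log_path}: {p}" for p in world_rw_problems(os.stat(log_path).st_mode)]


if __name__ == "__main__":
    location = os.environ.get("GLOBUS_LOCATION")
    if location is None:
        print("GLOBUS_LOCATION is not set")
    else:
        issues = check_fork_seg_log(location)
        print("\n".join(issues) or "fork SEG log permissions look OK")
```

If globus-fork.conf does point at the $GLOBUS_LOCATION/var/globus-fork.log shown in the `ls -l` listing in this thread, that file's `-rw-rw-rw-` mode would pass this check, which is consistent with Peter's later remark that the problem lies elsewhere in the fork SEG setup.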
> > >
> > > Peter
> > >
> > > >
> > > > ---------------------------------------------------------------
> > > > After that I ran TESTS.pl as user 'wenwen', but still got this:
> > > > ---------------------------------------------------------------
> > > > [wenwen at srb globus_scheduler_event_generator_fork_test]$ ./TESTS.pl
> > > > Warning: Do not start a service container while this test script is running.
> > > > test-fork-seg....ok
> > > > 1/1 skipped: Fork SEG not configured
> > > > All tests successful, 1 subtest skipped.
> > > > Files=1, Tests=1, 0 wallclock secs ( 0.01 cusr + 0.02 csys = 0.03 CPU)
> > > > ---------------------------------------------------------------
> > > > Then I submitted a job. It says "job status: unsubmitted", and then
> > > > nothing else is printed after that line.
> > > > I think the web service container cannot receive my job request.
> > > > What's wrong with it?
> > > > Thanks in advance!
> > > >
> > > > Wenwen
> > > >
> > > >
> > > > Peter G Lane wrote:
> > > > On Wed, 2006-04-26 at 15:50 -0700, wenwen LI wrote:
> > > > > Hi everyone,
> > > > >
> > > > > I started PostgreSQL as user 'postgre' successfully. Then I ran the
> > > > > web service container as user 'globus', and it started fine. But when
> > > > > I run:
> > > > > globusrun-ws -submit -c /bin/true
> > > > > as user 'wenwen', I get these results:
> > > > > Submitting job...Done.
> > > > > ---------------------------------------------------------------
> > > > > Job ID: uuid:362de86c-d57c-11da-82c7-00093d1067b1
> > > > > Termination time: 04/27/2006 23:27 GMT
> > > > > (after waiting 2 minutes, I got)
> > > > > Current job state: Unsubmitted
> > > > > (After that, nothing more is printed, either in this window or in
> > > > > the web service container window.)
> > > > > ---------------------------------------------------------------
> > > > > Then I ran TESTS.pl as user 'wenwen':
> > > > > ---------------------------------------------------------------
> > > > > [wenwen at srb globus_scheduler_event_generator_test]$ ./TESTS.pl
> > > > > seg-api-test............ok
> > > > > seg-module-load-test....ok
> > > > > seg-timestamp-test......ok
> > > > > All tests successful.
> > > > > Files=3, Tests=6, 1 wallclock secs ( 0.09 cusr + 0.05 csys = 0.14 CPU)
> > > > > [wenwen at srb globus_scheduler_event_generator_test]$ cd $GLOBUS_LOCATION/test/globus_scheduler_event_generator_fork_test
> > > > > [wenwen at srb globus_scheduler_event_generator_fork_test]$ ./TESTS.pl
> > > > > Warning: Do not start a service container while this test script is running.
> > > > > test-fork-seg....ok
> > > > > 1/1 skipped: Fork SEG not configured
> > > >
> > > > Check $GLOBUS_LOCATION/etc/globus-fork.conf for a valid path. Check the
> > > > file pointed to by that path for proper permissions. It should be world
> > > > readable and writable.
> > > >
> > > > Peter
> > > >
> > > > > All tests successful, 1 subtest skipped.
> > > > > Files=1, Tests=1, 0 wallclock secs ( 0.04 cusr + 0.00 csys = 0.04 CPU)
> > > > > ---------------------------------------------------------------
> > > > > But when I restart the web service container, it prints the following:
> > > > > ---------------------------------------------------------------
> > > > > globus-start-container
> > > > > 2006-04-26 19:52:02,266 WARN factory.ManagedJobFactoryResource [Thread-3,run:164] Recovery exception
> > > > > org.globus.wsrf.NoSuchResourceException
> > > > >     at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
> > > > >     at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
> > > > >     at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
> > > > >     at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
> > > > > 2006-04-26 19:52:05,222 INFO exec.RunQueue [Thread-6,:54] Starting state machine with 16 run queues.
> > > > > 2006-04-26 19:52:07,289 WARN factory.ManagedJobFactoryResource [Thread-6,run:164] Recovery exception
> > > > > org.globus.wsrf.NoSuchResourceException
> > > > >     at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
> > > > >     at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
> > > > >     at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
> > > > >     at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
> > > > > Starting SOAP server at: https://129.174.124.107:8443/wsrf/services/
> > > > > With the following services:
> > > > > [1]: https://129.174.124.107:8443/wsrf/services/TriggerFactoryService
> > > > > [2]: https://129.174.124.107:8443/wsrf/services/DelegationTestService
> > > > > [3]: https://129.174.124.107:8443/wsrf/services/SecureCounterService
> > > > > [4]: https://129.174.124.107:8443/wsrf/services/IndexServiceEntry
> > > > > [5]: https://129.174.124.107:8443/wsrf/services/DelegationService
> > > > > [6]: https://129.174.124.107:8443/wsrf/services/InMemoryServiceGroupFactory
> > > > > [7]: https://129.174.124.107:8443/wsrf/services/mds/test/execsource/IndexService
> > > > > [8]: https://129.174.124.107:8443/wsrf/services/mds/test/subsource/IndexSe
> > > > > ......
> > > > > [51]
> > > > > ---------------------------------------------------------------
> > > > > Can anybody help?
> > > > > Thank you very much!
> > > > >
> > > > >
> > > > > Wenwen