[gram-user] job cannot be submitted by globusws-run

Peter G Lane lane at mcs.anl.gov
Tue May 2 10:58:30 CDT 2006


What version of the Globus Toolkit are you using? I can't figure out why
it's trying to recover the same job twice. The recover method is
synchronized and also checks a flag to make sure it doesn't run twice.
This should be impossible. Do you have two deployments of the GRAM
services by any chance?
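
One way to confirm the double recovery is to count how often each resource ID appears in "Recovered resource with ID" messages. The following is only a minimal sketch: the function name is made up for illustration, and the sample text is copied from the container log quoted further down in this thread. In that log both 438ae57e-... and 111fb780-... are recovered once by Thread-3 and once by Thread-6.

```python
import re
from collections import Counter

# Matches the "Recovered resource with ID <uuid>." log message. The \s+
# tolerates messages that a mail client wrapped onto a second line.
RECOVERED = re.compile(
    r"Recovered resource with ID\s+"
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def recovered_more_than_once(log_text):
    """Return {resource_id: count} for every ID recovered more than once.

    Hypothetical helper for illustration; it is not part of the toolkit.
    """
    counts = Counter(RECOVERED.findall(log_text))
    return {rid: n for rid, n in counts.items() if n > 1}


# Sample lines taken from the container log quoted later in this thread.
sample = """
2006-04-27 16:22:07,937 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID 438ae57e-d580-11da-babc-00093d1067b1.
2006-04-27 16:22:09,027 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID 111fb780-d590-11da-b53b-00093d1067b1.
2006-04-27 16:22:12,918 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 438ae57e-d580-11da-babc-00093d1067b1.
2006-04-27 16:22:12,919 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 111fb780-d590-11da-b53b-00093d1067b1.
2006-04-27 16:22:13,084 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 9a8bbb5c-d56a-11da-bb0b-00093d1067b1.
"""

for rid, n in recovered_more_than_once(sample).items():
    print(f"{rid} recovered {n} times")
```

To use it on a real log, read the container log file into a string and pass it to the function; two different thread names recovering the same IDs would be consistent with two recovery passes running.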

I guess for now you can just delete your ~/.globus/persisted/ directory
to clean up all the job persistence data. Then we can address the
original problem. Can you turn on full GRAM debug logging in
container-log4j.properties (just uncomment the appropriate line) and
then start your container without submitting any jobs? You should see
some lines that list the command-line arguments for running the Fork
SEG. If not, send me the container log. If you do, reconstruct the
command line from those logging statements and run it by hand. If you
don't see any output, adjust the timestamp (the -t argument, in seconds
since the epoch) so that it represents an earlier time and try again.
You should eventually see something like the following (the command
should "hang", since it keeps running and waits for new events):

logan% $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s fork -t 1145994457
001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;2;0
001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;8;0
001;1145994584;a4ceee44-d494-11da-8600-000d61215ff0:6709;2;0
001;1145994584;a4ceee44-d494-11da-8600-000d61215ff0:6709;8;0
001;1146023492;f36ab716-d4d7-11da-9124-000d61215ff0:11605;2;0
001;1146023492;f36ab716-d4d7-11da-9124-000d61215ff0:11605;8;0
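
To make the output above easier to read, and to pick an earlier -t value, something like the following sketch may help. It assumes each line has the form type;timestamp;job-id;state;exit-code and that the state numbers follow the usual GRAM job states (2 = Active, 8 = Done); both are assumptions inferred from the sample output, so check them against your toolkit version. The function names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed mapping of numeric state codes to GRAM job state names.
# Verify against your own installation before relying on it.
STATE_NAMES = {1: "Pending", 2: "Active", 4: "Failed", 8: "Done"}


def parse_seg_line(line):
    """Split one SEG output line into its five semicolon-separated fields."""
    msg_type, stamp, job_id, state, exit_code = line.strip().split(";")
    return {
        "type": msg_type,
        # The timestamp is in seconds since the epoch; shown here in UTC.
        "time": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        "job_id": job_id,
        "state": STATE_NAMES.get(int(state), state),
        "exit_code": int(exit_code),
    }


def earlier_timestamp(stamp, hours):
    """Return a timestamp the given number of hours before stamp."""
    return stamp - hours * 3600


event = parse_seg_line(
    "001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;2;0"
)
print(event["time"].isoformat(), event["job_id"], event["state"])

# A value one day earlier, suitable for retrying with "-t".
print(earlier_timestamp(1145994457, 24))
```

If the SEG prints nothing for a given -t value, stepping the timestamp back a day at a time in this way is one simple way to find the first logged event.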

Peter

On Fri, 2006-04-28 at 10:56 -0700, wenwen LI wrote:
> Here is the result:
> --------------------------------------------------------------------------------------------------------------------
> total 40
> -rw-rw-r--    1 globus   globus       6925 Apr 27 16:22 111fb780-d590-11da-b53b-00093d1067b1.xml
> -rw-rw-r--    1 globus   globus       6925 Apr 27 16:22 17889586-d579-11da-830a-00093d1067b1.xml
> -rw-rw-r--    1 globus   globus       6926 Apr 27 16:22 362de86c-d57c-11da-82c7-00093d1067b1.xml
> -rw-rw-r--    1 globus   globus       6925 Apr 27 16:22 438ae57e-d580-11da-babc-00093d1067b1.xml
> -rw-rw-r--    1 globus   globus       6925 Apr 27 16:22 b6f4ab42-d59b-11da-a021-00093d1067b1.xml
> -rw-rw-r--    1 globus   globus          0 Apr 21 00:09 xph27814.tmp
> 
> -------------------------------------------------------------------------------------------------------------------
> And I have attached the 111fb780-d590-11da-b53b-00093d1067b1.xml  file
> in the mail.
> Thank you very much!
> 
> 
> Peter G Lane <lane at mcs.anl.gov> wrote:
>         On Thu, 2006-04-27 at 12:17 -0700, wenwen LI wrote:
>         > Here is the result:
>         > [root at srb var]# ls -l
>         > total 24
>         > -rw-r--r-- 1 globus globus 4831 Apr 12 15:46 container.log
>         > -rw-rw-rw- 1 globus globus 1346 Apr 26 23:12 globus-fork.log
>         > -rw-rw-r-- 1 globus globus 46 Apr 27 04:02 globus-jsm-fork.stamp
>         > -rw-rw-r-- 1 globus globus 46 Apr 27 04:02 globus-jsm-multi.stamp
>         > drwxrwxr-x 3 globus globus 4096 Mar 30 17:19 lib
>         > I think it has the right permissions.
>         > But today, when I start the web service container as user 'globus',
>         > I get errors that never appeared before:
>         > -------------------------------------------------------------------------------------------------------------------
>         > [globus at srb postgre]$ globus-start-container
>         > 2006-04-27 16:22:07,937 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID 438ae57e-d580-11da-babc-00093d1067b1.
>         > 2006-04-27 16:22:07,944 INFO exec.RunQueue [Thread-3,:54] Starting state machine with 16 run queues.
>         > 2006-04-27 16:22:09,027 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID 111fb780-d590-11da-b53b-00093d1067b1.
>         > 2006-04-27 16:22:12,918 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 438ae57e-d580-11da-babc-00093d1067b1.
>         > 2006-04-27 16:22:12,919 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 111fb780-d590-11da-b53b-00093d1067b1.
>         > 2006-04-27 16:22:12,958 ERROR utils.JobStateMonitorSubscriptionManager [Thread-23,subscribe:179] unable to monitor job for state changes
>         > org.globus.exec.monitoring.AlreadyRegisteredException
>         
>         I don't understand how, but it looks like a job is being recovered
>         twice (111fb780-d590-11da-b53b-00093d1067b1). What version of the
>         toolkit are you using? Would it be possible for you to find the
>         file in the container owner's
>         ~/.globus/persisted/-/ManagedExecutableJobResourceStateType/
>         directory named 111fb780-d590-11da-b53b-00093d1067b1.xml and attach
>         it to your response? I'm wondering if the persistence data got
>         corrupted. After that, if you delete that directory then you won't
>         have all these jobs being recovered.
>         
>         Peter
>         
>         > at org.globus.exec.monitoring.JobStateMonitor.registerJobID(JobStateMonitor.java:227)
>         > at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.subscribe(JobStateMonitorSubscriptionManager.java:171)
>         > at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.run(JobStateMonitorSubscriptionManager.java:136)
>         > 2006-04-27 16:22:12,963 ERROR utils.JobStateMonitorSubscriptionManager [Thread-23,subscribe:179] unable to monitor job for state changes
>         > org.globus.exec.monitoring.AlreadyRegisteredException
>         > at org.globus.exec.monitoring.JobStateMonitor.registerJobID(JobStateMonitor.java:227)
>         > at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.subscribe(JobStateMonitorSubscriptionManager.java:171)
>         > at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.run(JobStateMonitorSubscriptionManager.java:136)
>         > 2006-04-27 16:22:12,963 WARN factory.ManagedJobFactoryResource [Thread-3,run:164] Recovery exception
>         > org.globus.wsrf.NoSuchResourceException
>         > at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
>         > at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
>         > at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
>         > at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
>         > 2006-04-27 16:22:13,084 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 9a8bbb5c-d56a-11da-bb0b-00093d1067b1.
>         > 2006-04-27 16:22:13,206 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 17889586-d579-11da-830a-00093d1067b1.
>         > 2006-04-27 16:22:13,324 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 362de86c-d57c-11da-82c7-00093d1067b1.
>         > 2006-04-27 16:22:13,438 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID b6f4ab42-d59b-11da-a021-00093d1067b1.
>         >
>         > -------------------------------------------------------------------------------------------------------------------
>         > 
>         > What's wrong with it? 
>         > 
>         > Peter G Lane wrote:
>         > On Wed, 2006-04-26 at 19:17 -0700, wenwen LI wrote:
>         > > Hi,
>         > > This is the information for globus-fork.conf:
>         > > ----------------------------------------------------------------------------------------------------------------------------
>         > > [root at srb etc]# ls -l globus-fork.conf
>         > > -rw-rw-rw- 1 globus globus 47 Mar 30 17:20 globus-fork.conf
>         > 
>         > I wasn't clear enough. I want you to look *in* globus-fork.conf.
>         > It is a configuration file that contains a path to the fork SEG
>         > log file. It is the fork SEG log file that I want you to check
>         > for permissions. The globus-fork.conf file only needs to be -rw
>         > for the owner.
>         > 
>         > Peter
>         > 
>         > > ---------------------------------------------------------------------------------------------------------------------------
>         > > After that I ran TESTS.pl as user 'wenwen', but I still get this:
>         > > ----------------------------------------------------------------------------------------------------------------------------
>         > > [wenwen at srb globus_scheduler_event_generator_fork_test]$ ./TESTS.pl
>         > > Warning: Do not start a service container while this test script is running.
>         > > test-fork-seg....ok
>         > > 1/1 skipped: Fork SEG not configured
>         > > All tests successful, 1 subtest skipped.
>         > > Files=1, Tests=1, 0 wallclock secs ( 0.01 cusr + 0.02 csys = 0.03 CPU)
>         > > ----------------------------------------------------------------------------------------------------------------------------
>         > > Then I submitted a job. It says "job status:unsubmitted", and then
>         > > nothing more appears after that line.
>         > > I think the web service container never receives my job request.
>         > > What's wrong with it?
>         > > Thanks in advance!
>         > > 
>         > > Wenwen
>         > > 
>         > > 
>         > > Peter G Lane wrote:
>         > > On Wed, 2006-04-26 at 15:50 -0700, wenwen LI wrote:
>         > > > Hi,everyone:
>         > > > 
>         > > > I started POSTGRESQL as user 'postgre' successfully. Then I started
>         > > > the web service container as user 'globus', and it started fine.
>         > > > But when I run:
>         > > > globusrun-ws -submit -c /bin/true
>         > > > as user 'wenwen', I get these results:
>         > > > Submitting job...Done.
>         > > > -------------------------------------------------------------------------------------------------------
>         > > > Job ID: uuid:362de86c-d57c-11da-82c7-00093d1067b1
>         > > > Termination time: 04/27/2006 23:27 GMT
>         > > > (after waiting 2 minutes, I got)
>         > > > Current job state: Unsubmitted
>         > > > (Then nothing more comes out in this window, and nothing comes
>         > > > out in the web service container window either)
>         > > > -------------------------------------------------------------------------------------------------------
>         > > > Then I run TESTS.pl by user 'wenwen' like this:
>         > > > ------------------------------------------------------------------------------------------------------
>         > > > [wenwen at srb globus_scheduler_event_generator_test]$ ./TESTS.pl
>         > > > seg-api-test............ok
>         > > > seg-module-load-test....ok
>         > > > seg-timestamp-test......ok
>         > > > All tests successful.
>         > > > Files=3, Tests=6, 1 wallclock secs ( 0.09 cusr + 0.05 csys = 0.14 CPU)
>         > > > [wenwen at srb globus_scheduler_event_generator_test]$ cd $GLOBUS_LOCATION/test/globus_scheduler_event_generator_fork_test
>         > > > [wenwen at srb globus_scheduler_event_generator_fork_test]$ ./TESTS.pl
>         > > > Warning: Do not start a service container while this test script is running.
>         > > > test-fork-seg....ok
>         > > > 1/1 skipped: Fork SEG not configured
>         > > 
>         > > Check $GLOBUS_LOCATION/etc/globus-fork.conf for a valid path.
>         > > Check the file pointed to by that path for proper permissions.
>         > > It should be world readable and writable.
>         > > 
>         > > Peter
>         > > 
>         > > > All tests successful, 1 subtest skipped.
>         > > > Files=1, Tests=1, 0 wallclock secs ( 0.04 cusr + 0.00 csys = 0.04 CPU)
>         > > > ------------------------------------------------------------------------------------------------------
>         > > > But when I restart the web service container, it gives this output:
>         > > > ------------------------------------------------------------------------------------------------------
>         > > > globus-start-container
>         > > > 2006-04-26 19:52:02,266 WARN factory.ManagedJobFactoryResource [Thread-3,run:164] Recovery exception
>         > > > org.globus.wsrf.NoSuchResourceException
>         > > > at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
>         > > > at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
>         > > > at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
>         > > > at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
>         > > > 2006-04-26 19:52:05,222 INFO exec.RunQueue [Thread-6,:54] Starting state machine with 16 run queues.
>         > > > 2006-04-26 19:52:07,289 WARN factory.ManagedJobFactoryResource [Thread-6,run:164] Recovery exception
>         > > > org.globus.wsrf.NoSuchResourceException
>         > > > at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
>         > > > at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
>         > > > at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
>         > > > at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
>         > > > Starting SOAP server at: https://129.174.124.107:8443/wsrf/services/
>         > > > With the following services:
>         > > > [1]: https://129.174.124.107:8443/wsrf/services/TriggerFactoryService
>         > > > [2]: https://129.174.124.107:8443/wsrf/services/DelegationTestService
>         > > > [3]: https://129.174.124.107:8443/wsrf/services/SecureCounterService
>         > > > [4]: https://129.174.124.107:8443/wsrf/services/IndexServiceEntry
>         > > > [5]: https://129.174.124.107:8443/wsrf/services/DelegationService
>         > > > [6]: https://129.174.124.107:8443/wsrf/services/InMemoryServiceGroupFactory
>         > > > [7]: https://129.174.124.107:8443/wsrf/services/mds/test/execsource/IndexService
>         > > > [8]: https://129.174.124.107:8443/wsrf/services/mds/test/subsource/IndexSe
>         > > > ......
>         > > > [51]
>         > > > ------------------------------------------------------------------------------------------------------
>         > > > Can anybody help?
>         > > > Thank you very much!
>         > > > 
>         > > > 
>         > > > Wenwen