[gram-user] job cannot be submitted by globusrun-ws

Peter G Lane lane at mcs.anl.gov
Tue May 9 15:17:54 CDT 2006


On Tue, 2006-05-09 at 12:19 -0700, wenwen LI wrote:
> Thanks very much! It works now!
> I followed your instructions: I uncommented the debug line, started
> globus-scheduler-event-generator -s..., and then started
> globus-start-container.
> Then, as user 'wenwen', I submitted a job, and its status came out as active!
> Thanks a lot!
> But can you tell me why this happens? Why can I submit a job before I
> run globus-scheduler-event-generator?

You shouldn't have to start the SEG manually. GRAM is supposed to start
it for you (that's where those log statements come from). In fact, GRAM
is using its own SEG daemon instead of the one you started. I'm just
not sure why starting one manually would make it work. Does GRAM work
now if you don't start the SEG manually?

Peter

> 
> 
> Peter G Lane <lane at mcs.anl.gov> wrote: 
>         What version of the Globus Toolkit are you using? I can't figure out
>         why it's trying to recover the same job twice. The recover method is
>         synchronized and also checks a flag to make sure it doesn't run twice.
>         This should be impossible. Do you have two deployments of the GRAM
>         services by any chance?
>         
>         I guess for now you can just delete your ~/.globus/persisted/ directory
>         to clean up all the job persistence data. Then we can address the
>         original problem. Can you turn on full GRAM debug logging in
>         container-log4j.properties (just uncomment the appropriate line) and
>         just start your container (don't submit any jobs). You should see some
>         lines that list the command-line arguments for running the Fork SEG. If
>         not, send me the container log. If you do, reconstruct the command-line
>         from those logging statements and run it by hand. If you don't see any
>         output, adjust the timestamp (it's in seconds since the epoch) so that
>         it represents an earlier time and try again. You should eventually see
>         something like the following (the command should "hang"):
>         
>         logan% $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s fork -t 1145994457
>         001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;2;0
>         001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;8;0
>         001;1145994584;a4ceee44-d494-11da-8600-000d61215ff0:6709;2;0
>         001;1145994584;a4ceee44-d494-11da-8600-000d61215ff0:6709;8;0
>         001;1146023492;f36ab716-d4d7-11da-9124-000d61215ff0:11605;2;0
>         001;1146023492;f36ab716-d4d7-11da-9124-000d61215ff0:11605;8;0
>         
>         Peter
>         
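To make the two steps quoted above concrete, here is a rough Python sketch (not part of the toolkit). It rebuilds the SEG command line with an earlier start time and splits one SEG output line into fields. The field meanings (message type, timestamp, job ID, state, exit code) and the state names for codes 1, 2, 4 and 8 are assumptions taken from the usual GRAM job state constants, not something stated in this thread, and the helper names are made up for illustration.

```python
import os
import time

# Assumed mapping of GRAM job state codes to names (not stated in this thread).
STATE_NAMES = {1: "Pending", 2: "Active", 4: "Failed", 8: "Done"}


def seg_command(start_epoch, scheduler="fork", globus_location=None):
    """Build the SEG command line quoted above for a given start timestamp."""
    root = globus_location or os.environ.get("GLOBUS_LOCATION", "$GLOBUS_LOCATION")
    return [
        os.path.join(root, "libexec", "globus-scheduler-event-generator"),
        "-s", scheduler,
        "-t", str(int(start_epoch)),  # seconds since the epoch
    ]


def parse_seg_event(line):
    """Split one SEG output line into its five semicolon-separated fields."""
    msg_type, stamp, job_id, state, exit_code = line.strip().split(";")
    return {
        "type": msg_type,
        "timestamp": int(stamp),
        "job_id": job_id,
        "state": STATE_NAMES.get(int(state), "Unknown(%s)" % state),
        "exit_code": int(exit_code),
    }


if __name__ == "__main__":
    # Ask for events starting 24 hours ago instead of "now".
    print(" ".join(seg_command(time.time() - 24 * 3600)))
    print(parse_seg_event(
        "001;1145994457;58d45b32-d494-11da-8c01-000d61215ff0:6616;2;0"))
```

If the assumed mapping holds, each job in the sample output first goes Active (2) and then Done (8), which is what a healthy Fork SEG should report for a short job.
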
>         On Fri, 2006-04-28 at 10:56 -0700, wenwen LI wrote:
>         > Here is the result:
>         > --------------------------------------------------------------------------------------------------------------------
>         > total 40
>         > -rw-rw-r-- 1 globus globus 6925 Apr 27 16:22 111fb780-d590-11da-b53b-00093d1067b1.xml
>         > -rw-rw-r-- 1 globus globus 6925 Apr 27 16:22 17889586-d579-11da-830a-00093d1067b1.xml
>         > -rw-rw-r-- 1 globus globus 6926 Apr 27 16:22 362de86c-d57c-11da-82c7-00093d1067b1.xml
>         > -rw-rw-r-- 1 globus globus 6925 Apr 27 16:22 438ae57e-d580-11da-babc-00093d1067b1.xml
>         > -rw-rw-r-- 1 globus globus 6925 Apr 27 16:22 b6f4ab42-d59b-11da-a021-00093d1067b1.xml
>         > -rw-rw-r-- 1 globus globus 0 Apr 21 00:09 xph27814.tmp
>         > -------------------------------------------------------------------------------------------------------------------
>         > And I have attached the 111fb780-d590-11da-b53b-00093d1067b1.xml file
>         > in the mail.
>         > Thank you very much!
>         > 
>         > 
>         > Peter G Lane wrote:
>         > On Thu, 2006-04-27 at 12:17 -0700, wenwen LI wrote:
>         > > Here is the result:
>         > > [root@srb var]# ls -l
>         > > total 24
>         > > -rw-r--r-- 1 globus globus 4831 Apr 12 15:46 container.log
>         > > -rw-rw-rw- 1 globus globus 1346 Apr 26 23:12 globus-fork.log
>         > > -rw-rw-r-- 1 globus globus 46 Apr 27 04:02 globus-jsm-fork.stamp
>         > > -rw-rw-r-- 1 globus globus 46 Apr 27 04:02 globus-jsm-multi.stamp
>         > > drwxrwxr-x 3 globus globus 4096 Mar 30 17:19 lib
>         > > I think it has the right permissions.
>         > > But today, when I started the web service container as user 'globus',
>         > > it gave errors that I had never seen before:
>         > > -------------------------------------------------------------------------------------------------------------------
>         > > [globus@srb postgre]$ globus-start-container
>         > > 2006-04-27 16:22:07,937 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID 438ae57e-d580-11da-babc-00093d1067b1.
>         > > 2006-04-27 16:22:07,944 INFO exec.RunQueue [Thread-3,:54] Starting state machine with 16 run queues.
>         > > 2006-04-27 16:22:09,027 INFO exec.ManagedExecutableJobHome [Thread-3,recover:163] Recovered resource with ID 111fb780-d590-11da-b53b-00093d1067b1.
>         > > 2006-04-27 16:22:12,918 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 438ae57e-d580-11da-babc-00093d1067b1.
>         > > 2006-04-27 16:22:12,919 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 111fb780-d590-11da-b53b-00093d1067b1.
>         > > 2006-04-27 16:22:12,958 ERROR utils.JobStateMonitorSubscriptionManager [Thread-23,subscribe:179] unable to monitor job for state changes
>         > > org.globus.exec.monitoring.AlreadyRegisteredException
>         > 
>         > I don't understand how, but it looks like a job is being recovered
>         > twice (111fb780-d590-11da-b53b-00093d1067b1). What version of the
>         > toolkit are you using? Would it be possible for you to find the file
>         > in the container owner's
>         > ~/.globus/persisted/-/ManagedExecutableJobResourceStateType/
>         > directory named 111fb780-d590-11da-b53b-00093d1067b1.xml and attach it
>         > to your response? I'm wondering if the persistence data got corrupted.
>         > After that, if you delete that directory then you won't have all these
>         > jobs being recovered.
>         > 
>         > Peter
>         > 
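A quick way to confirm which jobs are being recovered more than once is to count the "Recovered resource with ID" lines in the container log. The following is only a minimal sketch: the function name is made up, and it assumes each log statement sits on one line in the real log file (unlike the wrapped copy quoted in this mail).

```python
import re
from collections import Counter

# Matches the job UUID in "Recovered resource with ID <uuid>." log lines.
RECOVERED = re.compile(
    r"Recovered resource with ID\s+"
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\."
)


def duplicate_recoveries(log_text):
    """Return {job_id: count} for every job recovered more than once."""
    counts = Counter(RECOVERED.findall(log_text))
    return {job_id: n for job_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical location: pass the text of your own container log here.
    with open("container.log") as log:
        for job_id, n in sorted(duplicate_recoveries(log.read()).items()):
            print("%s recovered %d times" % (job_id, n))
```
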
>         > >     at org.globus.exec.monitoring.JobStateMonitor.registerJobID(JobStateMonitor.java:227)
>         > >     at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.subscribe(JobStateMonitorSubscriptionManager.java:171)
>         > >     at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.run(JobStateMonitorSubscriptionManager.java:136)
>         > > 2006-04-27 16:22:12,963 ERROR utils.JobStateMonitorSubscriptionManager [Thread-23,subscribe:179] unable to monitor job for state changes
>         > > org.globus.exec.monitoring.AlreadyRegisteredException
>         > >     at org.globus.exec.monitoring.JobStateMonitor.registerJobID(JobStateMonitor.java:227)
>         > >     at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.subscribe(JobStateMonitorSubscriptionManager.java:171)
>         > >     at org.globus.exec.service.exec.utils.JobStateMonitorSubscriptionManager.run(JobStateMonitorSubscriptionManager.java:136)
>         > > 2006-04-27 16:22:12,963 WARN factory.ManagedJobFactoryResource [Thread-3,run:164] Recovery exception
>         > > org.globus.wsrf.NoSuchResourceException
>         > >     at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
>         > >     at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
>         > >     at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
>         > >     at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
>         > > 2006-04-27 16:22:13,084 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 9a8bbb5c-d56a-11da-bb0b-00093d1067b1.
>         > > 2006-04-27 16:22:13,206 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 17889586-d579-11da-830a-00093d1067b1.
>         > > 2006-04-27 16:22:13,324 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID 362de86c-d57c-11da-82c7-00093d1067b1.
>         > > 2006-04-27 16:22:13,438 INFO exec.ManagedExecutableJobHome [Thread-6,recover:163] Recovered resource with ID b6f4ab42-d59b-11da-a021-00093d1067b1.
>         > > -------------------------------------------------------------------------------------------------------------------
>         > > 
>         > > What's wrong with it? 
>         > > 
>         > > Peter G Lane wrote:
>         > > On Wed, 2006-04-26 at 19:17 -0700, wenwen LI wrote:
>         > > > Hi,
>         > > > This is the information for globus-fork.conf:
>         > > > ----------------------------------------------------------------------------------------------------------------------------
>         > > > [root@srb etc]# ls -l globus-fork.conf
>         > > > -rw-rw-rw- 1 globus globus 47 Mar 30 17:20 globus-fork.conf
>         > > 
>         > > I wasn't clear enough. I want you to look *in* globus-fork.conf. It is a
>         > > configuration file that contains a path to the fork SEG log file. It is
>         > > the fork SEG log file that I want you to check for permissions. The
>         > > globus-fork.conf file only needs to be -rw for the owner.
>         > > 
>         > > Peter
>         > > 
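The check described above (read the log path out of globus-fork.conf, then inspect the permissions of the file it points to) could be scripted roughly like this. It is a sketch under assumptions: the file is treated as plain key=value lines, the key name log_path is my guess rather than something shown in this thread, and the helper names are made up.

```python
import os
import stat


def read_log_path(conf_path, key="log_path"):
    """Return the value of `key` from a simple key=value file, or None."""
    with open(conf_path) as conf:
        for raw in conf:
            line = raw.strip()
            if line.startswith("#") or "=" not in line:
                continue
            name, value = line.split("=", 1)
            if name.strip() == key:  # key name is an assumption
                return value.strip()
    return None


def world_rw(path):
    """True if `path` is both readable and writable by 'other' users."""
    needed = stat.S_IROTH | stat.S_IWOTH
    return (os.stat(path).st_mode & needed) == needed


if __name__ == "__main__":
    conf = os.path.join(os.environ["GLOBUS_LOCATION"], "etc", "globus-fork.conf")
    log_path = read_log_path(conf)
    if log_path is None:
        print("no log path found in", conf)
    else:
        print(log_path, "world rw:", world_rw(log_path))
```

In the listing from the var directory, globus-fork.log shows -rw-rw-rw-, which is the mode this check would accept.
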
>         > > > ---------------------------------------------------------------------------------------------------------------------------
>         > > > After that I ran TESTS.pl as user 'wenwen', but I still get this:
>         > > > ----------------------------------------------------------------------------------------------------------------------------
>         > > > [wenwen@srb globus_scheduler_event_generator_fork_test]$ ./TESTS.pl
>         > > > Warning: Do not start a service container while this test script is running.
>         > > > test-fork-seg....ok
>         > > > 1/1 skipped: Fork SEG not configured
>         > > > All tests successful, 1 subtest skipped.
>         > > > Files=1, Tests=1, 0 wallclock secs ( 0.01 cusr + 0.02 csys = 0.03 CPU)
>         > > > ----------------------------------------------------------------------------------------------------------------------------
>         > > > Then I submitted a job. It says "job status:unsubmitted", and then
>         > > > nothing more comes out under that line.
>         > > > I think the web service server cannot receive my job request.
>         > > > What's wrong with it?
>         > > > Thanks in advance!
>         > > > 
>         > > > Wenwen
>         > > > 
>         > > > 
>         > > > Peter G Lane wrote:
>         > > > On Wed, 2006-04-26 at 15:50 -0700, wenwen LI wrote:
>         > > > > Hi, everyone:
>         > > > > 
>         > > > > I started PostgreSQL as user 'postgre' successfully. Then I ran the web
>         > > > > service container as user 'globus', and it started well. But when I ran
>         > > > > globusrun-ws -submit -c /bin/true
>         > > > > as user 'wenwen', I got these results:
>         > > > > Submitting job...Done.
>         > > > > -------------------------------------------------------------------------------------------------------
>         > > > > Job ID: uuid:362de86c-d57c-11da-82c7-00093d1067b1
>         > > > > Termination time: 04/27/2006 23:27 GMT
>         > > > > (after waiting 2 minutes, I got)
>         > > > > Current job state: Unsubmitted
>         > > > > (Then nothing more comes out in this window, and nothing comes out in
>         > > > > the web service container window, either)
>         > > > > -------------------------------------------------------------------------------------------------------
>         > > > > Then I ran TESTS.pl as user 'wenwen' like this:
>         > > > > ------------------------------------------------------------------------------------------------------
>         > > > > [wenwen@srb globus_scheduler_event_generator_test]$ ./TESTS.pl
>         > > > > seg-api-test............ok
>         > > > > seg-module-load-test....ok
>         > > > > seg-timestamp-test......ok
>         > > > > All tests successful.
>         > > > > Files=3, Tests=6, 1 wallclock secs ( 0.09 cusr + 0.05 csys = 0.14 CPU)
>         > > > > [wenwen@srb globus_scheduler_event_generator_test]$ cd $GLOBUS_LOCATION/test/globus_scheduler_event_generator_fork_test
>         > > > > [wenwen@srb globus_scheduler_event_generator_fork_test]$ ./TESTS.pl
>         > > > > Warning: Do not start a service container while this test script is running.
>         > > > > test-fork-seg....ok
>         > > > > 1/1 skipped: Fork SEG not configured
>         > > > 
>         > > > Check $GLOBUS_LOCATION/etc/globus-fork.conf for a valid path. Check the
>         > > > file pointed to by that path for proper permissions. It should be world
>         > > > readable and writable.
>         > > > 
>         > > > Peter
>         > > > 
>         > > > > All tests successful, 1 subtest skipped.
>         > > > > Files=1, Tests=1, 0 wallclock secs ( 0.04 cusr + 0.00 csys = 0.04 CPU)
>         > > > > ------------------------------------------------------------------------------------------------------
>         > > > > But when I restart the web service container,
>         > > > > it gives this output:
>         > > > > ------------------------------------------------------------------------------------------------------
>         > > > > globus-start-container
>         > > > > 2006-04-26 19:52:02,266 WARN factory.ManagedJobFactoryResource [Thread-3,run:164] Recovery exception
>         > > > > org.globus.wsrf.NoSuchResourceException
>         > > > >     at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
>         > > > >     at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
>         > > > >     at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
>         > > > >     at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
>         > > > > 2006-04-26 19:52:05,222 INFO exec.RunQueue [Thread-6,:54] Starting state machine with 16 run queues.
>         > > > > 2006-04-26 19:52:07,289 WARN factory.ManagedJobFactoryResource [Thread-6,run:164] Recovery exception
>         > > > > org.globus.wsrf.NoSuchResourceException
>         > > > >     at org.globus.wsrf.impl.ResourceHomeImpl.get(ResourceHomeImpl.java:285)
>         > > > >     at org.globus.wsrf.impl.ResourceHomeImpl.find(ResourceHomeImpl.java:262)
>         > > > >     at org.globus.exec.service.exec.ManagedExecutableJobHome.recover(ManagedExecutableJobHome.java:160)
>         > > > >     at org.globus.exec.service.factory.ManagedJobFactoryResource$1RecoveryThread.run(ManagedJobFactoryResource.java:161)
>         > > > > Starting SOAP server at: https://129.174.124.107:8443/wsrf/services/
>         > > > > With the following services:
>         > > > > [1]: https://129.174.124.107:8443/wsrf/services/TriggerFactoryService
>         > > > > [2]: https://129.174.124.107:8443/wsrf/services/DelegationTestService
>         > > > > [3]: https://129.174.124.107:8443/wsrf/services/SecureCounterService
>         > > > > [4]: https://129.174.124.107:8443/wsrf/services/IndexServiceEntry
>         > > > > [5]: https://129.174.124.107:8443/wsrf/services/DelegationService
>         > > > > [6]: https://129.174.124.107:8443/wsrf/services/InMemoryServiceGroupFactory
>         > > > > [7]: https://129.174.124.107:8443/wsrf/services/mds/test/execsource/IndexService
>         > > > > [8]: https://129.174.124.107:8443/wsrf/services/mds/test/subsource/IndexSe
>         > > > > ......
>         > > > > [51]
>         > > > > ------------------------------------------------------------------------------------------------------
>         > > > > Can anybody help??
>         > > > > Thank you very much!
>         > > > > 
>         > > > > 
>         > > > > Wenwen
>         > > > > 

