[gram-user] globusrun-ws: Job failed: The executable could not be started.

Martin Feller feller at mcs.anl.gov
Wed Apr 15 15:59:33 CDT 2009


Enable debug logging of the MessageLoggingHandler (uncomment
log4j.category.org.globus.wsrf.handlers.MessageLoggingHandler=DEBUG
in $GLOBUS_LOCATION/container-log4j.properties), restart the GT server,
submit one job, and send the logfile.
Don't turn on all debug logging; enable only the MessageLoggingHandler.
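A minimal Python sketch of the change described above, assuming the MessageLoggingHandler line already exists in container-log4j.properties and may be commented out with a leading `#`. `enable_message_logging` is a hypothetical helper; it rewrites only that one category, so every other logger keeps its current level.

```python
import re

# The one logger category to enable; nothing else in the file is touched.
CATEGORY = "log4j.category.org.globus.wsrf.handlers.MessageLoggingHandler"

# Matches the category line whether or not it is commented out with '#'.
_LINE_RE = re.compile(
    r"^[ \t]*#*[ \t]*" + re.escape(CATEGORY) + r"[ \t]*=.*$", re.MULTILINE
)


def enable_message_logging(properties_text: str) -> str:
    """Return properties_text with the MessageLoggingHandler category set to DEBUG."""
    replacement = CATEGORY + "=DEBUG"
    new_text, count = _LINE_RE.subn(lambda _m: replacement, properties_text)
    if count == 0:
        # Line not present at all: append it rather than guess where it belongs.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += replacement + "\n"
    return new_text
```

To use it, read $GLOBUS_LOCATION/container-log4j.properties, pass the text through the function, write the result back, and restart the container.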

-Martin

Neha Sharma wrote:
> Hi Martin
> 
> W.r.t.:
>> I assume you did not change Java code for the new job manager, but
>> created a SEG and a Perl module for it, right?
> 
> The answer is yes.
> 
> I have been trying several things on my end: adding debug statements
> and tracking an incoming job through its various stages until it gets
> submitted to the job manager.
> 
> The problem seems to be with the delegation service.
> 
> ++++++++++++++
> On the client side
> ++++++++++++++
> -bash-3.00$ globusrun-ws -submit -Jf neha.epr.fg -F fgintosg1.fnal.gov:9443 -Ft Cemon -c /bin/true
> Submitting job...Done.
> Job ID: uuid:af6b48ce-29fb-11de-9f72-001422086c92
> Termination time: 04/16/2009 20:26 GMT
> Current job state: Failed
> Destroying job...Done.
> globusrun-ws: Job failed: Failed when creating the Perl job description; nested exception is:
>     org.globus.delegation.DelegationException: Error getting delegation resource [Caused by: org.globus.wsrf.NoSuchResourceException]
> 
> -bash-3.00$
> 
> ++++++++++
> neha.epr.fg
> ++++++++++
> -bash-3.00$ cat neha.epr.fg
> <DelegatedEPR xsi:type="ns1:EndpointReferenceType"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:ns1="http://schemas.xmlsoap.org/ws/2004/03/addressing">
>  <ns1:Address xsi:type="ns1:AttributedURI">https://131.225.81.96:9443/wsrf/services/DelegationService</ns1:Address>
>  <ns1:ReferenceProperties xsi:type="ns1:ReferencePropertiesType">
>   <ns1:DelegationKey xmlns:ns1="http://www.globus.org/08/2004/delegationService">83aca030-2865-11de-8f8e-87fe9d875190</ns1:DelegationKey>
>  </ns1:ReferenceProperties>
>  <ns1:ReferenceParameters xsi:type="ns1:ReferenceParametersType"/>
> </DelegatedEPR>
> -bash-3.00$
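The EPR file passed with `-Jf` is small enough to inspect programmatically. The sketch below is a hypothetical helper that uses only the standard library's ElementTree and the two namespaces that appear in neha.epr.fg; it returns the DelegationService address and the DelegationKey, which should be the same key the container reports in its "Pulled out DelegationKey" log line.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in neha.epr.fg (note that the prefix ns1 is
# rebound to the delegation namespace on the DelegationKey element).
WSA_NS = "http://schemas.xmlsoap.org/ws/2004/03/addressing"
DELEGATION_NS = "http://www.globus.org/08/2004/delegationService"


def read_delegated_epr(xml_text: str) -> tuple:
    """Return (DelegationService address, DelegationKey) from a delegated EPR."""
    root = ET.fromstring(xml_text)
    address = root.find(f"{{{WSA_NS}}}Address")
    key = root.find(
        f"{{{WSA_NS}}}ReferenceProperties/{{{DELEGATION_NS}}}DelegationKey"
    )
    if address is None or key is None or not address.text or not key.text:
        raise ValueError("EPR has no Address or no DelegationKey")
    return address.text.strip(), key.text.strip()
```

Usage: `read_delegated_epr(open("neha.epr.fg").read())`.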
> 
> 
> And this is the relevant part of the log:
> ++++++++++++++++++++++++++++++++++++++++
> 
> CONTAINER-REAL LOG
> ++++++++++++++++++++++++++++++++++++++++
> 
> 2009-04-15 15:26:23,064 DEBUG utils.DelegatedCredential [RunQueueThread_3,getDelegatedCredential:154] new DelegatedCredential()
> 2009-04-15 15:26:23,064 DEBUG utils.DelegatedCredential [RunQueueThread_3,<init>:214] Entering DelegatedCredential()
> 2009-04-15 15:26:23,065 DEBUG utils.DelegatedCredential [RunQueueThread_3,getDelegationKey:611] Pulled out DelegationKey: 83aca030-2865-11de-8f8e-87fe9d875190
> 2009-04-15 15:26:23,065 DEBUG utils.DelegatedCredential [RunQueueThread_3,<init>:230] userProxFile: /grid/home/fnalgrid/.globus/gram_job_proxy_83aca030-2865-11de-8f8e-87fe9d875190
> 2009-04-15 15:26:23,065 DEBUG utils.DelegatedCredential [RunQueueThread_3,<init>:236] saving this instance to instance table
> 2009-04-15 15:26:23,065 DEBUG utils.DelegatedCredential [RunQueueThread_3,<init>:253] Leaving DelegatedCredential()
> 2009-04-15 15:26:23,066 DEBUG DelegatedCredential.performance [RunQueueThread_3,stop:71] [new DelegatedCredential()][RunQueueThread_3][2]
> 2009-04-15 15:26:23,066 DEBUG utils.DelegatedCredential [RunQueueThread_3,getDelegatedCredential:165] delegated credential endpoint (AFTER): Address: https://131.225.81.96:9443/wsrf/services/DelegationService
> Reference property[0]:
> <ns1:DelegationKey xmlns:ns1="http://www.globus.org/08/2004/delegationService">83aca030-2865-11de-8f8e-87fe9d875190</ns1:DelegationKey>
> 
> 2009-04-15 15:26:23,067 ERROR delegation.DelegationUtil [RunQueueThread_3,getDelegationResource:253] Error getting delegation resource
> org.globus.wsrf.NoSuchResourceException
>     at org.globus.delegation.service.DelegationResource.load(DelegationResource.java:405)
>     at org.globus.delegation.service.DelegationHome.find(DelegationHome.java:53)
>     at org.globus.delegation.DelegationUtil.getDelegationResource(DelegationUtil.java:251)
>     at org.globus.delegation.DelegationUtil.registerDelegationListener(DelegationUtil.java:166)
>     at org.globus.exec.service.utils.DelegatedCredential.getDelegatedCredential(DelegatedCredential.java:175)
>     at org.globus.exec.service.job.ManagedJobResourceImpl.getJobCredential(ManagedJobResourceImpl.java:421)
>     at org.globus.exec.service.exec.ManagedExecutableJobResource.getUserProxyFileName(ManagedExecutableJobResource.java:397)
>     at org.globus.exec.service.exec.ManagedExecutableJobResource.initJobEnvironment(ManagedExecutableJobResource.java:457)
>     at org.globus.exec.service.exec.ManagedExecutableJobResource.initPerlJobDescription(ManagedExecutableJobResource.java:364)
>     at org.globus.exec.service.exec.StateMachine.processNoneState(StateMachine.java:550)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:332)
>     at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
> 2009-04-15 15:26:23,068 DEBUG utils.DelegatedCredential [RunQueueThread_3,credentialDeleted:499] entering credentialDeleted()
> 2009-04-15 15:26:23,068 DEBUG utils.DelegatedCredential [RunQueueThread_3,credentialDeleted:519] removed DelegatedCredential object
> 2009-04-15 15:26:23,069 DEBUG utils.DelegatedCredential [RunQueueThread_3,removeUserProxy:389] Removing job user proxy.
> 2009-04-15 15:26:23,069 DEBUG utils.GlobusShToolsProperties [RunQueueThread_3,getInstance:43] entering getInstance()
> 2009-04-15 15:26:23,069 DEBUG utils.GlobusShToolsProperties [RunQueueThread_3,getInstance:75] leaving getInstance()
> 2009-04-15 15:26:23,069 DEBUG utils.AuthorizationHelper [RunQueueThread_3,isAuthorizationGridmap:80] Entering/Exiting isAuthorizationGridmap()
> 2009-04-15 15:26:23,070 DEBUG utils.DelegatedCredential [RunQueueThread_3,removeUserProxy:450] Executing command: /usr/bin/sudo -u fnalgrid -S /usr/local/vdt-1.10.1/globus/libexec/globus-gram-local-proxy-tool /usr/local/vdt-1.10.1/globus -delete /grid/home/fnalgrid/.globus/gram_job_proxy_83aca030-2865-11de-8f8e-87fe9d875190
> 2009-04-15 15:26:23,208 DEBUG utils.DelegatedCredential [RunQueueThread_3,credentialDeleted:533] leaving credentialDeleted()
> 2009-04-15 15:26:23,208 DEBUG utils.FaultUtils [RunQueueThread_3,makeFault:460] Fault Class: class org.globus.exec.generated.FaultType
> 2009-04-15 15:26:23,208 DEBUG utils.FaultUtils [RunQueueThread_3,makeFault:461] Resource Key: {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=afba3600-29fb-11de-80f0-aa5e60e80f64
> 
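To check a longer container log for the same pattern, a rough text scan is enough. This sketch assumes the log4j message text shown in the excerpt ("Pulled out DelegationKey:" followed by the key, and the NoSuchResourceException class name) and is meant to be run on one job's slice of the log; `check_delegation_lookup` is a hypothetical name. The regex tolerates the key being wrapped onto the next line.

```python
import re

# 'Pulled out DelegationKey:' followed (possibly after a line wrap) by a UUID.
_KEY_RE = re.compile(r"Pulled out DelegationKey:\s*([0-9a-fA-F-]{36})")


def check_delegation_lookup(log_text: str, epr_key: str) -> dict:
    """Summarize the delegation lookup recorded in one job's log excerpt."""
    seen = set(_KEY_RE.findall(log_text))
    return {
        # Did the container look up the key that the client's EPR carries?
        "epr_key_looked_up": epr_key in seen,
        # Any other keys in the excerpt would suggest a mismatched EPR.
        "other_keys": sorted(seen - {epr_key}),
        # Was the delegation resource missing when the container looked for it?
        "no_such_resource": "org.globus.wsrf.NoSuchResourceException" in log_text,
    }
```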
> 
> 
> The stack trace above indicates that the delegation resource for key
> 83aca030-2865-11de-8f8e-87fe9d875190 could not be found
> (NoSuchResourceException) when the container tried to retrieve the job
> credential.
> 
> How can we make sure on our end that the delegation service is
> functioning correctly? And is there any option that should be passed to
> the globusrun-ws command that we may be missing?
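One cheap check bearing on that question: the DelegationKey and the job ID are both version-1 (time-based) UUIDs, so each carries the time it was generated. The sketch below decodes that time with Python's standard `uuid` module; `uuid1_time` is a hypothetical helper. It diagnoses nothing by itself. It only shows whether the EPR in neha.epr.fg is much older than the job, which is worth comparing against the lifetime the delegated credential was created with (an assumption to verify on the submitting side).

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-ns intervals between 1582-10-15 (the UUID epoch) and 1970-01-01.
_UUID_EPOCH_OFFSET = 0x01B21DD213814000
_UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid1_time(value: str) -> datetime:
    """Return the UTC time embedded in a version-1 (time-based) UUID."""
    if value.startswith("uuid:"):  # globusrun-ws prints job IDs as uuid:<hex>
        value = value[len("uuid:"):]
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError(f"{value} is not a version-1 UUID")
    # u.time is the 60-bit count of 100-ns intervals since the UUID epoch;
    # integer division keeps microsecond precision without float rounding.
    return _UNIX_EPOCH + timedelta(microseconds=(u.time - _UUID_EPOCH_OFFSET) // 10)
```

For the IDs in this thread, the DelegationKey decodes to 2009-04-13 at about 19:58 UTC and the job ID to 2009-04-15 at about 20:26 UTC (which agrees with the 15:26 CDT timestamps in the container log), roughly two days apart.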
> 
> This is a very high-priority task, and I would really appreciate a
> timely response.
> 
> Thanks
> -Neha
> 
> On Apr 6, 2009, at 2:55 PM, Martin Feller wrote:
> 
>> I assume you did not change Java code for the new job manager, but
>> created a SEG and a Perl module for it, right?
>>
>> I'd start like this:
>>
>> Start the container with debug logging enabled and verify that the
>> executable is really what you expect it to be. Look for the following
>> in the container logfile, and verify that "executable => [...]"
>> contains the right executable:
>>
>> ------------------------------------------------------
>> PROCESSING INTERNAL STATE:  -- Submit --
>> ------------------------------------------------------
>> 2009-04-06 14:39:34,804 DEBUG exec.StateMachine
>> [RunQueueThread_2,runScript:2885] Perl Job Description: $description = {
>>    directory => [ '/opt/martin' ],
>>    condoros => [ 'LINUX' ],
>>    xmlextensions => [ '1' ],
>>    useforkstarter => [ '1' ],
>>    condorarch => [ 'INTEL' ],
>>    stderr => [ '/dev/null' ],
>>    environment => [...],
>>    executable => [ '/bin/date' ],
>>    factoryendpoint => [...],
>>    stdin => [ '/dev/null' ],
>>    expandglobushome => [ '1' ],
>>    jobdir => [ '/opt/martin/.globus/a7eb9100-22e2-11de-b8eb-cff0ce07cd2b' ],
>>    jobtype => [ 'multiple' ],
>>    stdout => [ '/dev/null' ],
>>    expandglobuslocation => [ '1' ],
>>    count => [ '1' ],
>>    useforkstarter => [ '1' ],
>> };
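This check can also be scripted. The sketch assumes the dump format shown above (`name => [ 'value' ],` entries inside a `$description = {` block closed by `};`) and pulls out each single-valued field, so `executable` can be compared with the `-c` argument given to globusrun-ws. `job_description_fields` is a hypothetical helper, not part of GRAM; entries elided as `[...]` in the log are skipped.

```python
import re

# Matches entries such as:  executable => [ '/bin/date' ],
# \s* lets a value that was wrapped onto the next line still match.
_FIELD_RE = re.compile(r"^\s*(\w+)\s*=>\s*\[\s*'([^']*)'\s*\]", re.MULTILINE)


def job_description_fields(log_excerpt: str) -> dict:
    """Collect single-valued fields from a 'Perl Job Description' log dump."""
    start = log_excerpt.find("Perl Job Description:")
    if start == -1:
        raise ValueError("no 'Perl Job Description:' block found")
    end = log_excerpt.find("};", start)
    block = log_excerpt[start:] if end == -1 else log_excerpt[start:end]
    # Later duplicates (e.g. a repeated useforkstarter) overwrite earlier ones.
    return {name: value for name, value in _FIELD_RE.findall(block)}
```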
>>
>>
>> If things look OK, check what happens to $description->executable in
>> $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/cemon.pm and how it is
>> used. If necessary, add logging statements, e.g.
>> system("echo whatever >> /tmp/myLog.txt");
>>
>> -Martin
>>
>>
>> Neha Sharma wrote:
>>> Can you at least point me in the right direction as to where to look
>>> for the cause? container-real.log does not show anything more
>>> descriptive than the error itself.
>>>
>>> thanks
>>> -neha
>>> On Apr 6, 2009, at 11:30 AM, Neha Sharma wrote:
>>>
>>>> Hi
>>>>
>>>> Yes, it does work with jobmanager Fork and jobmanager Condor
>>>>
>>>> Cemon is basically the Condor job manager, modified to perform
>>>> matchmaking between an incoming job and the various available resources.
>>>>
>>>> -Neha
>>>> On Apr 6, 2009, at 11:13 AM, Martin Feller wrote:
>>>>
>>>>> Does it work with Fork as local resource manager (-Ft Fork)?
>>>>> Just curious: what is Cemon?
>>>>>
>>>>> -Martin
>>>>>
>>>>>
>>>>> Neha Sharma wrote:
>>>>>> Hi
>>>>>>
>>>>>> I am not able to figure out what could be causing this error. I am
>>>>>> wondering whether anyone on this list has seen it before.
>>>>>>
>>>>>> globusrun-ws: Job failed: The executable could not be started.
>>>>>>
>>>>>>
>>>>>> The command that I run is:
>>>>>> +++++++++++++++++++++++
>>>>>> globusrun-ws -dbg -submit -Jf neha.epr.fg -F fermigridosg1.fnal.gov:9443 -Ft Cemon -streaming -se n.err -so n.out -c /bin/true
>>>>>>
>>>>>> The executable exists on the ws container node.
>>>>>>
>>>>>> Running the container in full debug mode does not show anything
>>>>>> besides the same error as above.
>>>>>>
>>>>>> === REQUEST MESSAGE (length 4834) (time 1239033124.965094000) ===
>>>>>> <soapenv:Envelope
>>>>>>     xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
>>>>>>     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>>>>>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>>>>     xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
>>>>>>  <soapenv:Header>
>>>>>>   <wsa:MessageID soapenv:mustUnderstand="0">uuid:e03764a0-22c2-11de-8cea-bf45018d1031</wsa:MessageID>
>>>>>>   <wsa:To soapenv:mustUnderstand="0">https://fnpcsrv1.fnal.gov:39240/wsrf/services/NotificationConsumerService</wsa:To>
>>>>>>   <wsa:Action soapenv:mustUnderstand="0">http://docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification/Notify</wsa:Action>
>>>>>>   <wsa:From soapenv:mustUnderstand="0"><wsa:Address>http://schemas.xmlsoap.org/ws/2004/03/addressing/role/anonymous</wsa:Address></wsa:From>
>>>>>>   <ns06:ResourceID ns04:type="ns05:string"
>>>>>>       xmlns:ns04="http://www.w3.org/2001/XMLSchema-instance"
>>>>>>       xmlns:ns05="http://www.w3.org/2001/XMLSchema"
>>>>>>       xmlns:ns06="http://www.globus.org/docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification-1.2-draft-01.wsdl"
>>>>>>       soapenv:mustUnderstand="0">dc10e2c0-22c2-11de-8ed9-001422086c92</ns06:ResourceID>
>>>>>>  </soapenv:Header>
>>>>>>  <soapenv:Body>
>>>>>>   <Notify xmlns="http://docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification-1.2-draft-01.xsd">
>>>>>>    <NotificationMessage>
>>>>>>     <Topic Dialect="http://docs.oasis-open.org/wsn/2004/06/TopicExpression/Simple"
>>>>>>         xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job/types">ns1:state</Topic>
>>>>>>     <ProducerReference>
>>>>>>      <wsa:Address>https://131.225.107.165:9443/wsrf/services/ManagedJobFactoryService</wsa:Address>
>>>>>>      <wsa:ReferenceProperties><ns2:ResourceID xmlns:ns2="http://www.globus.org/namespaces/2004/10/gram/job">dcbd5910-22c2-11de-8cea-bf45018d1031</ns2:ResourceID></wsa:ReferenceProperties>
>>>>>>      <wsa:ReferenceParameters/>
>>>>>>     </ProducerReference>
>>>>>>     <Message xsi:type="ns3:StateChangeNotificationMessageWrapperType"
>>>>>>         xmlns:ns3="http://www.globus.org/namespaces/2004/10/gram/job">
>>>>>>      <ns3:stateChangeNotificationMessage>
>>>>>>       <ns4:state xmlns:ns4="http://www.globus.org/namespaces/2004/10/gram/job/types">Failed</ns4:state>
>>>>>>       <ns5:fault xmlns:ns5="http://www.globus.org/namespaces/2004/10/gram/job/faults">
>>>>>>        <ns5:executionFailedFault>
>>>>>>         <ns6:Timestamp xmlns:ns6="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">2009-04-06T15:52:00.310Z</ns6:Timestamp>
>>>>>>         <ns7:Originator xmlns:ns7="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">
>>>>>>          <wsa:Address>https://131.225.107.165:9443/wsrf/services/ManagedJobFactoryService</wsa:Address>
>>>>>>          <wsa:ReferenceProperties><ns3:ResourceID>dcbd5910-22c2-11de-8cea-bf45018d1031</ns3:ResourceID></wsa:ReferenceProperties>
>>>>>>          <wsa:ReferenceParameters/>
>>>>>>         </ns7:Originator>
>>>>>>         <ns8:Description xmlns:ns8="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">The executable could not be started.</ns8:Description>
>>>>>>         <ns9:FaultCause xmlns:ns9="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">
>>>>>>          <ns9:Timestamp>2009-04-06T15:52:00.310Z</ns9:Timestamp>
>>>>>>          <ns9:ErrorCode dialect="http://www.globus.org/fault/stacktrace">
>>>>>>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>>>>  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>>>  at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
>>>>>>  at java.lang.Class.newInstance0(Class.java:350)
>>>>>>  at java.lang.Class.newInstance(Class.java:303)
>>>>>>  at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:485)
>>>>>>  at org.globus.exec.utils.FaultUtils.createExecutionFailedFault(FaultUtils.java:396)
>>>>>>  at org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3120)
>>>>>>  at org.globus.exec.service.exec.StateMachine.processSubmitState(StateMachine.java:1172)
>>>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>  at java.lang.reflect.Method.invoke(Method.java:585)
>>>>>>  at org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:329)
>>>>>>  at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
>>>>>> </ns9:ErrorCode>
>>>>>>          <ns9:Description>org.globus.exec.generated.ExecutionFailedFaultType</ns9:Description>
>>>>>>         </ns9:FaultCause>
>>>>>>         <ns5:stateWhenFailureOccurred>Unsubmitted</ns5:stateWhenFailureOccurred>
>>>>>>         <ns5:command>submit</ns5:command>
>>>>>>         <ns5:gt2ErrorCode>17</ns5:gt2ErrorCode>
>>>>>>         <ns5:attribute>stdin</ns5:attribute>
>>>>>>        </ns5:executionFailedFault>
>>>>>>       </ns5:fault>
>>>>>>       <ns10:exitCode xmlns:ns10="http://www.globus.org/namespaces/2004/10/gram/job/types">0</ns10:exitCode>
>>>>>>       <ns11:holding xmlns:ns11="http://www.globus.org/namespaces/2004/10/gram/job/types">false</ns11:holding>
>>>>>>      </ns3:stateChangeNotificationMessage>
>>>>>>     </Message>
>>>>>>    </NotificationMessage>
>>>>>>   </Notify>
>>>>>>  </soapenv:Body>
>>>>>> </soapenv:Envelope>
>>>>>>
>>>>>> ----------------------------------------------
>>>>>> Current job state: Failed
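The fields that carry the diagnosis in a notification like this are easy to miss in the raw envelope. A minimal sketch that pulls them out, matching on local element names so the generated prefixes (`ns5`, `ns8`, ...) do not matter; `fault_summary` is a hypothetical helper. For the notification above it would report state `Failed`, `gt2ErrorCode` 17, `attribute` `stdin`, `stateWhenFailureOccurred` `Unsubmitted`, `command` `submit`, and the description "The executable could not be started."

```python
import xml.etree.ElementTree as ET

# Local names of the elements that carry the diagnosis in a GRAM job
# state-change notification.
_WANTED = ("state", "stateWhenFailureOccurred", "command", "gt2ErrorCode", "attribute")


def _local(tag: str) -> str:
    """'{namespace}name' -> 'name'."""
    return tag.rsplit("}", 1)[-1]


def fault_summary(notify_xml: str) -> dict:
    """Extract the job state, fault fields, and first fault Description."""
    root = ET.fromstring(notify_xml)
    summary = {}
    for elem in root.iter():  # document order
        name = _local(elem.tag)
        if name in _WANTED and name not in summary:
            summary[name] = (elem.text or "").strip()
        elif name == "Description" and "description" not in summary:
            # The first Description is the human-readable message; the later
            # FaultCause Description only names the fault class.
            summary["description"] = " ".join((elem.text or "").split())
    return summary
```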
>>>>>>
>>>>>> The sudoers file is also correct
>>>>>> ++++++++++++++++++++++++
>>>>>> # cat /etc/sudoers
>>>>>> Runas_Alias GLOBUSUSERS = ALL, !root
>>>>>>
>>>>>> globus ALL=(GLOBUSUSERS) \
>>>>>>      NOPASSWD: \
>>>>>>      /usr/local/vdt-1.10.1/globus/libexec/globus-job-manager-script.pl *
>>>>>>
>>>>>> globus ALL=(GLOBUSUSERS) \
>>>>>>      NOPASSWD: \
>>>>>>      /usr/local/vdt-1.10.1/globus/libexec/globus-gram-local-proxy-tool *
>>>>>>
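Whether both rules are present can be spot-checked with a rough text scan. This is not a sudoers parser (`visudo -c` remains the authoritative syntax check); the sketch only joins backslash continuations and looks for `globus` rules that grant NOPASSWD for each of the two helper paths listed above. The function name and the hard-coded VDT paths are assumptions taken from this thread.

```python
# The two GRAM helper commands named in the sudoers excerpt above.
REQUIRED_COMMANDS = (
    "/usr/local/vdt-1.10.1/globus/libexec/globus-job-manager-script.pl",
    "/usr/local/vdt-1.10.1/globus/libexec/globus-gram-local-proxy-tool",
)


def missing_nopasswd_commands(sudoers_text: str, user: str = "globus") -> list:
    """Return the required commands that no NOPASSWD rule for `user` mentions."""
    # Join backslash-newline continuations so each rule sits on one line.
    joined = sudoers_text.replace("\\\n", " ")
    rules = [
        line.strip()
        for line in joined.splitlines()
        if line.strip().startswith(user + " ") and "NOPASSWD" in line
    ]
    return [cmd for cmd in REQUIRED_COMMANDS if not any(cmd in rule for rule in rules)]
```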
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> -Neha
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 


