[gram-user] globusrun-ws: Job failed: The executable could not be started.
Martin Feller
feller at mcs.anl.gov
Mon Apr 6 14:55:11 CDT 2009
I assume you did not change Java code for the new job manager, but created
a SEG, and a perl module for it, right?
I'd start like this:
Start the container with debug logging enabled and verify that the executable
is really what you expect it to be.
Look for the following in the container logfile, and verify that
"executable => [...]" contains the right executable:
------------------------------------------------------
PROCESSING INTERNAL STATE: -- Submit --
------------------------------------------------------
2009-04-06 14:39:34,804 DEBUG exec.StateMachine [RunQueueThread_2,runScript:2885] Perl Job Description: $description = {
directory => [ '/opt/martin' ],
condoros => [ 'LINUX' ],
xmlextensions => [ '1' ],
useforkstarter => [ '1' ],
condorarch => [ 'INTEL' ],
stderr => [ '/dev/null' ],
environment => [...],
executable => [ '/bin/date' ],
factoryendpoint => [...],
stdin => [ '/dev/null' ],
expandglobushome => [ '1' ],
jobdir => [ '/opt/martin/.globus/a7eb9100-22e2-11de-b8eb-cff0ce07cd2b' ],
jobtype => [ 'multiple' ],
stdout => [ '/dev/null' ],
expandglobuslocation => [ '1' ],
count => [ '1' ],
useforkstarter => [ '1' ],
};
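If the logfile is large, something like the following can pull out just the
executable lines. This is only a sketch: show_executables is a made-up helper
name, and it assumes each description block ends with a line starting with
"};" as in the excerpt above.

```shell
# Print the "executable => ..." line from every Perl Job Description block
# found in the given container logfile.
show_executables() {
    logfile="$1"
    # Keep only the lines between "Perl Job Description" and the closing "};",
    # then filter for the executable entry.
    sed -n '/Perl Job Description/,/^[[:space:]]*};/p' "$logfile" \
        | grep 'executable =>'
}
```

Call it with the path to your container logfile as the only argument.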
If things look ok, check in $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/cemon.pm
what happens to $description->executable and how it's used.
If needed, add logging statements, e.g. system("echo whatever > /tmp/myLog.txt");
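Note that ">" truncates the file on every call, so only the last message
survives. A shell helper along these lines appends with a timestamp instead.
It is a sketch only: log_debug and the DEBUG_LOG variable are names invented
here, and /tmp/myLog.txt is just the example path from above.

```shell
# Append a timestamped message to a debug log.
# DEBUG_LOG overrides the default path; both names are illustrative.
log_debug() {
    logfile="${DEBUG_LOG:-/tmp/myLog.txt}"
    # ">>" appends, so earlier messages are kept.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$logfile"
}
```

The same idea works directly inside system() by using ">>" in place of ">".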
-Martin
Neha Sharma wrote:
> Can you at least point me in the right direction wrt where to look for
> the cause...container-real.log does not show anything more descriptive
> than the error itself
>
> thanks
> -neha
> On Apr 6, 2009, at 11:30 AM, Neha Sharma wrote:
>
>> Hi
>>
>> Yes, it does work with jobmanager Fork and jobmanager Condor
>>
>> Cemon is basically jobmanager condor modified to perform matchmaking
>> between an incoming job and various available resources.
>>
>> -Neha
>> On Apr 6, 2009, at 11:13 AM, Martin Feller wrote:
>>
>>> Does it work with Fork as local resource manager (-Ft Fork)?
>>> Just curious: what is Cemon?
>>>
>>> -Martin
>>>
>>>
>>> Neha Sharma wrote:
>>>> Hi
>>>>
>>>> I am not able to figure out what could be the cause of this error. I am
>>>> wondering if anyone on this list has seen this before..
>>>>
>>>> globusrun-ws: Job failed: The executable could not be started.
>>>>
>>>>
>>>> The command that I run is:
>>>> +++++++++++++++++++++++
>>>> globusrun-ws -dbg -submit -Jf neha.epr.fg -F
>>>> fermigridosg1.fnal.gov:9443
>>>> -Ft Cemon -streaming -se n.err -so n.out -c /bin/true
>>>>
>>>> The executable exists on the ws container node.
>>>>
>>>> Running container in full debug mode does not show anything besides the
>>>> same error as above
>>>>
>>>> === REQUEST MESSAGE (length 4834) (time 1239033124.965094000) ===
>>>> <soapenv:Envelope
>>>> xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
>>>> xmlns:xsd="http://www.w3.org/2001/XMLSchema"
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>>>> xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"><soapenv:Header><wsa:MessageID
>>>>
>>>> soapenv:mustUnderstand="0">uuid:e03764a0-22c2-11de-8cea-bf45018d1031</wsa:MessageID><wsa:To
>>>>
>>>> soapenv:mustUnderstand="0">https://fnpcsrv1.fnal.gov:39240/wsrf/services/NotificationConsumerService</wsa:To><wsa:Action
>>>>
>>>> soapenv:mustUnderstand="0">http://docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification/Notify</wsa:Action><wsa:From
>>>>
>>>> soapenv:mustUnderstand="0"><wsa:Address>http://schemas.xmlsoap.org/ws/2004/03/addressing/role/anonymous</wsa:Address></wsa:From><ns06:ResourceID
>>>>
>>>> ns04:type="ns05:string"
>>>> xmlns:ns04="http://www.w3.org/2001/XMLSchema-instance"
>>>> xmlns:ns05="http://www.w3.org/2001/XMLSchema"
>>>> xmlns:ns06="http://www.globus.org/docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification-1.2-draft-01.wsdl"
>>>>
>>>> soapenv:mustUnderstand="0">dc10e2c0-22c2-11de-8ed9-001422086c92</ns06:ResourceID></soapenv:Header><soapenv:Body><Notify
>>>>
>>>> xmlns="http://docs.oasis-open.org/wsn/2004/06/wsn-WS-BaseNotification-1.2-draft-01.xsd"><NotificationMessage><Topic
>>>>
>>>> Dialect="http://docs.oasis-open.org/wsn/2004/06/TopicExpression/Simple"
>>>> xmlns:ns1="http://www.globus.org/namespaces/2004/10/gram/job/types">ns1:state</Topic><ProducerReference><wsa:Address>https://131.225.107.165:9443/wsrf/services/ManagedJobFactoryService</wsa:Address><wsa:ReferenceProperties><ns2:ResourceID
>>>>
>>>> xmlns:ns2="http://www.globus.org/namespaces/2004/10/gram/job">dcbd5910-22c2-11de-8cea-bf45018d1031</ns2:ResourceID></wsa:ReferenceProperties><wsa:ReferenceParameters/></ProducerReference><Message
>>>>
>>>> xsi:type="ns3:StateChangeNotificationMessageWrapperType"
>>>> xmlns:ns3="http://www.globus.org/namespaces/2004/10/gram/job"><ns3:stateChangeNotificationMessage><ns4:state
>>>>
>>>> xmlns:ns4="http://www.globus.org/namespaces/2004/10/gram/job/types">Failed</ns4:state><ns5:fault
>>>>
>>>> xmlns:ns5="http://www.globus.org/namespaces/2004/10/gram/job/faults"><ns5:executionFailedFault><ns6:Timestamp
>>>>
>>>> xmlns:ns6="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">2009-04-06T15:52:00.310Z</ns6:Timestamp><ns7:Originator
>>>>
>>>> xmlns:ns7="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd"><wsa:Address>https://131.225.107.165:9443/wsrf/services/ManagedJobFactoryService</wsa:Address><wsa:ReferenceProperties><ns3:ResourceID>dcbd5910-22c2-11de-8cea-bf45018d1031</ns3:ResourceID></wsa:ReferenceProperties><wsa:ReferenceParameters/></ns7:Originator><ns8:Description
>>>>
>>>> xmlns:ns8="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd">The
>>>>
>>>> executable could not be started.</ns8:Description><ns9:FaultCause
>>>> xmlns:ns9="http://docs.oasis-open.org/wsrf/2004/06/wsrf-WS-BaseFaults-1.2-draft-01.xsd"><ns9:Timestamp>2009-04-06T15:52:00.310Z</ns9:Timestamp><ns9:ErrorCode
>>>>
>>>> dialect="http://www.globus.org/fault/stacktrace">
>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>>> Method)
>>>> at
>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>>
>>>>
>>>> at
>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>
>>>>
>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
>>>> at java.lang.Class.newInstance0(Class.java:350)
>>>> at java.lang.Class.newInstance(Class.java:303)
>>>> at org.globus.exec.utils.FaultUtils.makeFault(FaultUtils.java:485)
>>>> at
>>>> org.globus.exec.utils.FaultUtils.createExecutionFailedFault(FaultUtils.java:396)
>>>>
>>>>
>>>> at
>>>> org.globus.exec.service.exec.StateMachine.createFaultFromErrorCode(StateMachine.java:3120)
>>>>
>>>>
>>>> at
>>>> org.globus.exec.service.exec.StateMachine.processSubmitState(StateMachine.java:1172)
>>>>
>>>>
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>
>>>>
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>
>>>>
>>>> at java.lang.reflect.Method.invoke(Method.java:585)
>>>> at
>>>> org.globus.exec.service.exec.StateMachine.processState(StateMachine.java:329)
>>>>
>>>>
>>>> at org.globus.exec.service.exec.RunThread.run(RunThread.java:85)
>>>> </ns9:ErrorCode><ns9:Description>org.globus.exec.generated.ExecutionFailedFaultType</ns9:Description></ns9:FaultCause><ns5:stateWhenFailureOccurred>Unsubmitted</ns5:stateWhenFailureOccurred><ns5:command>submit</ns5:command><ns5:gt2ErrorCode>17</ns5:gt2ErrorCode><ns5:attribute>stdin</ns5:attribute></ns5:executionFailedFault></ns5:fault><ns10:exitCode
>>>>
>>>> xmlns:ns10="http://www.globus.org/namespaces/2004/10/gram/job/types">0</ns10:exitCode><ns11:holding
>>>>
>>>> xmlns:ns11="http://www.globus.org/namespaces/2004/10/gram/job/types">false</ns11:holding></ns3:stateChangeNotificationMessage></Message></NotificationMessage></Notify></soapenv:Body></soapenv:Envelope>
>>>>
>>>>
>>>> ----------------------------------------------
>>>> Current job state: Failed
>>>>
>>>> The sudoers file is also correct
>>>> ++++++++++++++++++++++++
>>>> # cat /etc/sudoers
>>>> Runas_Alias GLOBUSUSERS = ALL, !root
>>>>
>>>> globus ALL=(GLOBUSUSERS) \
>>>> NOPASSWD: \
>>>>
>>>> /usr/local/vdt-1.10.1/globus/libexec/globus-job-manager-script.pl *
>>>>
>>>> globus ALL=(GLOBUSUSERS) \
>>>> NOPASSWD: \
>>>>
>>>> /usr/local/vdt-1.10.1/globus/libexec/globus-gram-local-proxy-tool *
>>>>
>>>>
>>>>
>>>> Thanks
>>>> -Neha
>>>>
>>>>
>>>
>>
>