[gram-user] globusrun-ws: Job failed: The executable could not be started.
Steven Timm
timm at fnal.gov
Thu Apr 16 22:42:05 CDT 2009
On Thu, 16 Apr 2009, Neha Sharma wrote:
> Hi Martin
>
> We had tried running the same command manually earlier.
>
> I must mention that at this stage, the behavior has become quite
> unpredictable. It's really strange, but it does seem to work sometimes and
> fail the next time I try.
>
> I just tried and it seems to have worked. This jdl file does NOT have any
> value set for the user proxy. In fact, the x509userproxy attribute is not
> present at all in this jdl file.
>
> -bash-3.2$ condor_submit /tmp/condor.fnalgrid.0.0345397166247531239932873.jdl
> Submitting job(s).
> Logging submit event(s).
> 1 job(s) submitted to cluster 121.
> -bash-3.2$
Unfortunately, the job in question went on hold immediately
with holdreason "x509userproxy is not set in the job ad",
so although the condor_submit itself succeeds, the result is of limited value.
Steve Timm
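
To illustrate why a successful condor_submit says little here, below is a minimal sketch of the cluster-id extraction that the quoted cemon.pm code performs on the submit output, redone in shell. It is only an approximation under stated assumptions: the function name extract_cluster_id is hypothetical, the sample text is copied from the condor_submit session quoted above, and awk's field splitting is used in place of the Perl split calls.

```shell
#!/bin/sh
# Hypothetical helper (not part of cemon.pm): reads condor_submit output on
# stdin and prints the cluster id, or nothing if no matching line is found.
extract_cluster_id() {
    awk '/submitted to cluster/ {
        id = $6               # sixth whitespace-separated field, e.g. "121."
        sub(/\..*$/, "", id)  # keep only the part before the first dot
        print id
        exit
    }'
}

# Sample output copied from the session quoted in this thread.
sample_output='Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 121.'

cluster_id=$(printf '%s\n' "$sample_output" | extract_cluster_id)
echo "cluster id: ${cluster_id:-<none>}"
```

A non-empty cluster id only shows that the submit output contained the expected line. It does not show that the job is runnable, so the hold state still has to be checked separately, for example with condor_q -hold.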
>
>
> I am pretty sure that if I keep submitting, I will hit the failure case
> without changing anything.
>
> When I do, I will send you that output as well.
>
> -Neha
> On Apr 16, 2009, at 2:38 PM, Martin Feller wrote:
>
>> I see a few ways to run into this by looking at the end of the submit
>> subroutine of cemon.pm. You need to explore what actually happens by adding
>> debug statements (e.g. prints to stderr) and submitting it the non-WS way
>> I mentioned in an e-mail earlier.
>>
>> My comments inline.
>>
>>
>> open(SCRIPT_RESPONSE, "$condor_submit $script_filename $log|") ||
>> return Globus::GRAM::Error::JOB_EXECUTION_FAILED;
>>
>> # For me this resulted in execution of
>> #   /condor/bin/condor_submit \
>> #     /opt/martin/.globus/.gass_cache/local/md5/37/a6259cc0c1dae299a7866489dff0bd/md5/35/dafb502a36d53fd23b291a877239e6/data \
>> #     2> /dev/null|
>> # which, for me, made the if(@response_text) check below fail
>> # ==> return Globus::GRAM::Error::JOB_EXECUTION_FAILED;
>> # Also: Execute the command manually (without redirecting stderr to
>> # /dev/null, of course) and see what you get.
>> # Also: It might be worth checking the condor submission script you are
>> # generating (.../dafb502a36d53fd23b291a877239e6/data in my example).
>>
>> @response_text = <SCRIPT_RESPONSE>;
>> close SCRIPT_RESPONSE;
>>
>> if(@response_text){
>>     $response_line = (grep(/submitted to cluster/, @response_text))[0];
>>     $job_id = (split(/\./, (split(/\s+/, $response_line))[5]))[0];
>>
>>     if($job_id ne ""){
>>         #Neha Sharma 01/11 - Commenting below line as need to write this
>>         # into a file unique to each user
>>         #system("echo JID: $job_id >> /tmp/logger");
>>         $status = Globus::GRAM::JobState::PENDING;
>>         $job_id = join(',', map { sprintf("%03d.%03d.%03d", $job_id, $_, 0) } (0..($description->count()-1)));
>>         return {JOB_STATE => Globus::GRAM::JobState::PENDING,
>>                 JOB_ID => $job_id};
>>     }
>>
>>     # if $job_id is ""
>>     # ==> return Globus::GRAM::Error::JOB_EXECUTION_FAILED;
>> }
>> return Globus::GRAM::Error::JOB_EXECUTION_FAILED;
>>
>> -Martin
>>
>>
>> Neha Sharma wrote:
>>>
>>> -bash-3.2$ ${GLOBUS_LOCATION}/libexec/globus-job-manager-script.pl -m fork -f /tmp/job.raw -c submit
>>> GRAM_SCRIPT_JOB_ID:3572
>>> GRAM_SCRIPT_JOB_STATE:2
>>> -bash-3.2$ ${GLOBUS_LOCATION}/libexec/globus-job-manager-script.pl -m cemon -f /tmp/job.raw -c submit
>>> GRAM_SCRIPT_ERROR:17
>>> -bash-3.2$
>>>
>>> Thanks
>>>
>>> -Neha
>>>
>>
>>
>
--
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm at fnal.gov http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.