[gram-user] globusrun-ws: Job failed: The executable could not be started.

Steven Timm timm at fnal.gov
Thu Apr 16 22:42:05 CDT 2009


On Thu, 16 Apr 2009, Neha Sharma wrote:

> Hi Martin
>
> We had tried running the same command manually earlier.
>
> I must mention that at this stage, the behavior has become quite
> unpredictable. It's really strange, but it does seem to work sometimes and
> fail the next time I try.
>
> I just tried and it seems to have worked. This jdl file does NOT have any
> value set for the user proxy. In fact, the x509userproxy attribute is not
> present at all in this jdl file.
>
> -bash-3.2$ condor_submit /tmp/condor.fnalgrid.0.0345397166247531239932873.jdl
> Submitting job(s).
> Logging submit event(s).
> 1 job(s) submitted to cluster 121.
> -bash-3.2$

Unfortunately, the job in question was held immediately
with holdreason "x509userproxy is not set in the job  ad",
so although the condor_submit works, it is of limited value.
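
Since the hold reason points at a missing x509userproxy attribute, one way
to catch this before submission is to scan the generated submit file for
that command. The following is only a minimal Perl sketch, not code from
cemon.pm: the helper name submit_file_sets is hypothetical, and it assumes
the usual "name = value" layout of a submit description file with "#"
comment lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of cemon.pm): returns 1 if the submit
# description text assigns a non-empty value to the named command.
# The name is matched case-insensitively; comment lines are skipped.
sub submit_file_sets {
    my ($text, $name) = @_;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/;                       # skip comments
        return 1 if $line =~ /^\s*\Q$name\E\s*=\s*\S/i; # name = value
    }
    return 0;
}

# Example: warn on stderr when the attribute is missing.
my $jdl = "universe = grid\nqueue\n";
print STDERR "x509userproxy is not set in the submit file\n"
    unless submit_file_sets($jdl, 'x509userproxy');
```

A check like this would only flag the submit file; it does not explain why
the attribute is missing from it in the first place.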

Steve Timm

>
>
> I am pretty sure that if I keep submitting, I will hit the failure case
> without changing anything.
>
> When I do, I will send you that output as well.
>
> -Neha
> On Apr 16, 2009, at 2:38 PM, Martin Feller wrote:
>
>> I see a few ways to run into this by looking at the end of the submit
>> subroutine of cemon.pm. You need to explore what actually happens by adding
>> debug statements (e.g. prints to stderr) and submitting it the non-WS way
>> I mentioned in an earlier e-mail.
>> 
>> My comments inline.
>> 
>>
>> 	open(SCRIPT_RESPONSE, "$condor_submit $script_filename $log|") ||
>> 	    return Globus::GRAM::Error::JOB_EXECUTION_FAILED;
>>
>>       # For me this resulted in execution of
>>       #   /condor/bin/condor_submit \
>>       #     /opt/martin/.globus/.gass_cache/local/md5/37/a6259cc0c1dae299a7866489dff0bd/md5/35/dafb502a36d53fd23b291a877239e6/data \
>>       #     2> /dev/null|
>>       # which, for me, made the if(@response_text) check below fail
>>       # ==> return Globus::GRAM::Error::JOB_EXECUTION_FAILED;
>>       # Also: Execute the command manually (without redirecting stderr to
>>       # /dev/null, of course) and see what you get.
>>       # Also: It might be worth checking the condor submission script you
>>       # are generating (.../dafb502a36d53fd23b291a877239e6/data in my example).
>>
>> 	@response_text = <SCRIPT_RESPONSE>;
>> 	close SCRIPT_RESPONSE;
>>
>> 	if(@response_text){
>> 		$response_line = (grep(/submitted to cluster/, @response_text))[0];
>> 		$job_id = (split(/\./, (split(/\s+/, $response_line))[5]))[0];
>>
>> 		if($job_id ne ""){
>> 			# Neha Sharma 01/11 - Commenting below line as need to write this
>> 			# into a file unique to each user
>> 			#system("echo JID: $job_id >> /tmp/logger");
>> 			$status = Globus::GRAM::JobState::PENDING;
>> 			$job_id = join(',', map { sprintf("%03d.%03d.%03d", $job_id, $_, 0) } (0..($description->count()-1)));
>> 			return {JOB_STATE => Globus::GRAM::JobState::PENDING,
>> 				JOB_ID    => $job_id};
>> 		}
>>
>>               # if $job_id is ""
>>               # ==> return Globus::GRAM::Error::JOB_EXECUTION_FAILED;
>> 	}
>> 	return Globus::GRAM::Error::JOB_EXECUTION_FAILED; 
>> 
>> -Martin
>> 
>> 
>> Neha Sharma wrote:
>>> 
>>> -bash-3.2$ ${GLOBUS_LOCATION}/libexec/globus-job-manager-script.pl -m fork -f /tmp/job.raw -c submit
>>> GRAM_SCRIPT_JOB_ID:3572
>>> GRAM_SCRIPT_JOB_STATE:2
>>> -bash-3.2$ ${GLOBUS_LOCATION}/libexec/globus-job-manager-script.pl -m cemon -f /tmp/job.raw -c submit
>>> GRAM_SCRIPT_ERROR:17
>>> -bash-3.2$
>>> 
>>> Thanks
>>> 
>>> -Neha
>>> 
>> 
>> 
>
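
Martin's two suggestions quoted above (run the submit command without
sending stderr to /dev/null, and add debug prints) can be sketched in Perl
roughly as follows. This is an illustration under assumptions, not the
cemon.pm code: run_submit and parse_cluster_id are hypothetical helper
names, and the regular expression assumes condor_submit prints a line of
the form "N job(s) submitted to cluster M." as in the output quoted earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command through the shell with stderr merged
# into stdout, print everything to STDERR for debugging, and return the
# exit code followed by the captured output lines.
sub run_submit {
    my ($command) = @_;
    my $pid = open(my $fh, '-|', "$command 2>&1");
    if (!$pid) {
        print STDERR "cannot start '$command': $!\n";
        return;
    }
    my @output = <$fh>;
    close($fh);                    # closing a pipe sets $?
    my $exit_code = $? >> 8;
    print STDERR "exit code $exit_code, output:\n", @output;
    return ($exit_code, @output);
}

# Hypothetical helper: return the cluster number from the first
# "submitted to cluster N." line, or undef if no line matches.
sub parse_cluster_id {
    for my $line (@_) {
        return $1 if $line =~ /submitted to cluster (\d+)\./;
    }
    return;
}

# Example use (the submit file path is a placeholder):
# my ($exit_code, @output) = run_submit("condor_submit /path/to/job.jdl");
# my $cluster = parse_cluster_id(@output);
```

Matching the cluster number with a capture group avoids depending on the
word position that the split(...)[5] expression relies on, and keeping
stderr in the captured output shows why a submit produced no usable
response.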

-- 
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm at fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

