[gt-user] GRAM Job submission failed because the job manager detected an invalid script response
Joseph Bester
bester at mcs.anl.gov
Tue Jul 27 18:23:20 CDT 2010
On Jul 27, 2010, at 9:13 AM, Carlos Borrego Iglesias wrote:
> Hello,
> I am trying to define a new globus-job-manager called jobmanager-mmaa:
>
> When I run globus-job-run, I get the following message:
>
> #globus-job-run myce.pic.es/jobmanager-mmaa /bin/hostname
> GRAM Job submission failed because the job manager detected an invalid script response (error code 24)
>
> In the gatekeeper log I see no errors:
>
Does this mean you are creating a new LRM script?
If so, the first step of debugging is to make sure it runs with
perl -I$GLOBUS_LOCATION/lib/perl $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/mmaa.pm
(assuming mmaa.pm is your module name)
After that, if you are using 5.0.2, you can add the RSL relation save_job_description = yes to a job submission. This leaves a Perl file in your home directory that you can run with the job manager script. The file is called gram_$unique.pl, where $unique is a string of characters unique to each job. Pass that file to the script to see what's going on:
$GLOBUS_LOCATION/libexec/globus-job-manager-script.pl -m mmaa -f ~/gram_UNIQUE.pl -c submit
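The steps above could be wrapped in a few small shell helpers. This is only a sketch: the function names are made up for illustration, it assumes GLOBUS_LOCATION is set in the environment, and it assumes the saved gram_*.pl files land directly in the home directory as described. It only prints the RSL and the replay command, so nothing is submitted or executed until you run the printed line yourself.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names are part of the Globus Toolkit.

# Build an RSL string that runs one executable and asks GRAM to keep the
# generated job description (the save_job_description relation).
rsl_with_saved_description() {
    printf '&(executable=%s)(save_job_description=yes)\n' "$1"
}

# Print the most recently modified gram_*.pl file in a directory
# (default: $HOME). Prints nothing when no such file exists.
latest_job_description() {
    dir=${1:-$HOME}
    # ls -t lists newest first; the error for "no match" is discarded.
    ls -t "$dir"/gram_*.pl 2>/dev/null | head -n 1
}

# Print the command that replays a saved job description.
# $1 = LRM module name (for example mmaa), $2 = path to the gram_*.pl file
replay_command() {
    printf '%s/libexec/globus-job-manager-script.pl -m %s -f %s -c submit\n' \
        "$GLOBUS_LOCATION" "$1" "$2"
}

# Example use (commented out so that sourcing this file has no side effects):
#   rsl_with_saved_description /bin/hostname   # pass the result to your client,
#                                              # e.g. as the RSL argument of globusrun -r
#   replay_command mmaa "$(latest_job_description)"
```

Printing the command rather than running it keeps the helper safe to use on a production CE, and lets you check the module name and file path before the submit step is replayed.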
> Successfull mapping done
> Mapping service "LCMAPS" returned local user "atlas007"
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 0: GRID_SECURITY_HTTP_BODY_FD=9
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 5: Requested service: jobmanager-mmaa
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 5: Authorized as local user: atlas007
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 5: Authorized as local uid: 31057
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 5: and local gid: 1307
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 5: "/DC=es/DC=irisgrid/O=ifae/CN=carlos.borrego" mapped to atlas007 (31057/1307)
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 0: executing /opt/globus//libexec/globus-job-manager
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 0: GATEKEEPER_JM_ID 2010-07-27.15:08:22.0000026517.0000000000 for /DC=es/DC=irisgrid/O=ifae/CN=carlos.borrego on 193.109.175.133
> JMA 2010/07/27 15:08:22 GATEKEEPER_JM_ID 2010-07-27.15:08:22.0000026517.0000000000 has EDG_WL_JOBID ''
> GATEKEEPER_DGAS_FD=8 (/opt/edg/var/gatekeeper/jobs/2010-07-27.15:08:22.0000014805.0000000014)
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=12
> TIME: Tue Jul 27 15:08:22 2010
> PID: 26517 -- Notice: 0: Child 26519 started
> JMA 2010/07/27 15:08:22 GATEKEEPER_JM_ID 2010-07-27.15:08:22.0000026517.0000000000 JM exiting
>
> In the GRAM job manager log file of the user I am mapped to, I get:
>
> [atlas082@myce ~]$ tail gram_job_mgr_22248.log
> 7/27 14:33:07 JMI: completed script validation: job manager type is mmaa.
> 7/27 14:33:07 JMI: cmd = cache_cleanup
> 7/27 14:33:07 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_EARLY_FAILED_CACHE_CLEAN_UP
> 7/27 14:33:07 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_EARLY_FAILED_RESPONSE
> 7/27 14:33:07 JM: before sending to client: rc=0 (Success)
> 7/27 14:33:07 Job Manager State Machine (exiting): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_DONE
> 7/27 14:33:07 JM: in globus_gram_job_manager_reporting_file_remove()
> 7/27 14:33:07 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_DONE
> 7/27 14:33:07 JM: in globus_gram_job_manager_reporting_file_remove()
> 7/27 14:33:07 JM: exiting globus_gram_job_manager.
>
> Any ideas where I can get more debug info?
> Thanks so much in advance
> Carlos
>
> --
> =============================
> Carlos Borrego Iglesias
> cborrego at ifae.es
> IFAE Institut de Física d'Altes Energies
> Campus UAB Edifici Cn. Facultat Ciències
> E-08193 Bellaterra
> tel: +34 93 581 2822
> =============================