[gridway-user] Job remains pending

Tino Vazquez tinova79 at gmail.com
Fri Nov 30 06:23:54 CST 2007


Hi rasyid,

The T in N(U/F/T) is the total number of nodes, and your SGE seems to be
reporting 0 for it. The 4 free slots you see probably come from one SGE
queue that has that number of slots. You need to configure SGE to report the
total number of nodes correctly back to Globus, because GridWay reads that
information through MDS.
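
As an illustration only (this is not part of GridWay), the check described
above can be sketched in Python. The sketch assumes the whitespace-separated
gwhost column layout shown in the quoted output below, with the wrapped lines
joined; the names NodeCounts, parse_gwhost and inconsistent are hypothetical
helpers, and treating "total smaller than used + free" as inconsistent is an
assumption about what the columns mean.

```python
from typing import List, NamedTuple


class NodeCounts(NamedTuple):
    hostname: str
    used: int
    free: int
    total: int


def parse_gwhost(text: str) -> List[NodeCounts]:
    """Extract the N(U/F/T) column from gwhost-style output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric host ID and have 11 columns;
        # this skips the header line and anything that does not match.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        used, free, total = (int(n) for n in fields[8].split("/"))
        rows.append(NodeCounts(fields[10], used, free, total))
    return rows


def inconsistent(rows: List[NodeCounts]) -> List[NodeCounts]:
    """Hosts whose reported total is zero or smaller than used + free."""
    return [r for r in rows if r.total == 0 or r.total < r.used + r.free]


# Sample copied from this thread, with the mail-wrapped lines joined.
SAMPLE = """\
HID PRIO OS ARCH MHZ %CPU MEM(F/T) DISK(F/T) N(U/F/T) LRMS HOSTNAME
0 1 Linux2.6.17-10- x86 3067 60 7/440 11482/20560 0/4/0 SGE riset-c-3208-203.riset.c
1 1 Linux2.6.17-10- x86 3067 129 7/440 11482/20560 0/4/0 SGE riset-c-3208-202.riset.c
"""

for row in inconsistent(parse_gwhost(SAMPLE)):
    print(f"{row.hostname}: used={row.used} free={row.free} total={row.total}")
```

With the sample from this thread, both hosts are flagged because they report
a total of 0 nodes while showing 4 as free.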

This post may clarify things a bit:

http://www.globus.org/mail_archive/gridway-user/2007/11/msg00018.html

If it doesn't help, please don't hesitate to come back to us.

Regards,

-Tino

On Nov 30, 2007 5:11 AM, rasyid mujahid <rasyidmujahid at gmail.com> wrote:

> This is the output of gwhost:
>
> $gwhost
> HID PRIO OS              ARCH  MHZ %CPU MEM(F/T)   DISK(F/T) N(U/F/T) LRMS HOSTNAME
> 0   1    Linux2.6.17-10- x86  3067   60    7/440 11482/20560    0/4/0 SGE  riset-c-3208-203.riset.c
> 1   1    Linux2.6.17-10- x86  3067  129    7/440 11482/20560    0/4/0 SGE  riset-c-3208-202.riset.c
>
> Is it normal that N(U/F/T) = 0/4/0 for SGE? Does it mean there are no free
> slots?
>
> $gwhost -m 0
> HID QNAME RANK PRIO SLOTS HOSTNAME
> 0   all.q 0    1    0     riset-c-3208-203.riset.c
> 1   all.q 0    1    0     riset-c-3208-202.riset.c
>
> Oh, it seems no slots are allocated for the job. How can I solve this
> problem?
>
> Thanks.
>
>
> On Nov 29, 2007 5:25 PM, Tino Vazquez <tinova79 at gmail.com > wrote:
>
> > Hello rasyid,
> >
> > Could you send the output of gwhost? It may be that there are no free
> > slots for the job to run. Also, could you send the output of
> >
> > gwhost -m <job_id>
> >
> > where job_id is the JID of a job in the pending state?
> >
> > Regards,
> >
> > -Tino
> >
> >
> > On Nov 29, 2007 2:12 AM, rasyid mujahid <rasyidmujahid at gmail.com >
> > wrote:
> >
> > > Hello,
> > > I have a problem when submitting a job: it always stays in the pending
> > > state. I hope someone can help me solve it.
> > > This is the job file:
> > >
> > > EXECUTABLE  = /bin/ls
> > > ARGUMENTS   =     -la
> > > STDIN_FILE  =   /dev/null
> > > STDOUT_FILE =   ls.out.${JOB_ID}
> > > STDERR_FILE =   ls.err.${JOB_ID}
> > >
> > >
> > > Below is the output of gwps and the log files:
> > >
> > > $gwps
> > > USER         JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST
> > > gwman        4   pend ---- 01:58:08 --:--:-- 0:00:00 0:00:00 --   ls.jt           --
> > >
> > > $cat gwd.log
> > > Wed Nov 28 07:13:26 2007 [GW][I]: ---------------------------------------------------
> > > Wed Nov 28 07:13:26 2007 [GW][I]:                    gwd.conf values
> > > Wed Nov 28 07:13:26 2007 [GW][I]: ---------------------------------------------------
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Core configuration attributes
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     GWD_PORT                 : 6725
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     MAX_NUMBER_OF_CLIENTS    : 25
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     NUMBER_OF_ARRAYS         : 200
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     NUMBER_OF_JOBS           : 5000
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     NUMBER_OF_HOSTS          : 100
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     NUMBER_OF_USERS          : 30
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     SCHEDULING_INTERVAL      : 30
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     DISCOVERY_INTERVAL       : 900
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     MONITORING_INTERVAL      : 300
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     POLL_INTERVAL            : 180
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     MAX_ACTIVE_IM_QUERIES    : 10
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Information Manager MADs
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     MAD(0)  name  : mds4
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         executable: gw_im_mad_mds4_thr
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         argument  : -l etc/hosts.list
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         TM        :
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         EM        : gridftp
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Transfer Manager MADs
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     MAD(0)  name  : gridftp
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         executable: gw_tm_mad_ftp
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         argument  :
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Execution Manager MADs
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     MAD(0)  name  : ws
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         executable: gw_em_mad_ws
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         argument  :
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         rsl mode  : rsl
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Dispatch Manager Scheduler
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         name      : builtin
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         executable: gw_sched
> > > Wed Nov 28 07:13:26 2007 [GW][I]:         argument  :
> > > Wed Nov 28 07:13:26 2007 [GW][I]: ---------------------------------------------------
> > > Wed Nov 28 07:13:26 2007 [GW][I]:             sched.conf built-in policies
> > > Wed Nov 28 07:13:26 2007 [GW][I]: ---------------------------------------------------
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Scheduler configuration attributes
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     DISABLE                  : no
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     DISPATCH_CHUNK           : 15
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     MAX_RUNNING_USER         : 30
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     MAX_RUNNING_RESOURCE     : 10
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Job Fixed Priority Policy
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     FP_WEIGHT                : 1.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     Fixed Priority Values (users)
> > > Wed Nov 28 07:13:26 2007 [GW][I]:       DEFAULT                : 0
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Job Share Policy
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     SH_WEIGHT (share)        : 1.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     SH_WINDOW_SIZE           : 1.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     SH_WINDOW_DEPTH          : 5
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     User Shares
> > > Wed Nov 28 07:13:26 2007 [GW][I]:       DEFAULT                : 5
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Job Waiting time Policy
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     WT_WEIGHT                : 0.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Job Deadline Policy
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     DL_WEIGHT (deadline)     : 1.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     DL_HALF                  : 0
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Resource Fixed Priority Policy
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     RP_WEIGHT                : 1.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     Fixed Priority Values (information managers)
> > > Wed Nov 28 07:13:26 2007 [GW][I]:       DEFAULT                : 1
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Resource Failure Rate Policy
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     RA_WEIGHT                : 1.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Resource Failure Rank Policy
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     FR_MAX_BANNED            : 3600
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     FR_BANNED_C              : 650.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:   Resource Usage Policy
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     UG_WEIGHT                : 1.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     UG_HISTORY_WINDOW        : 3.00
> > > Wed Nov 28 07:13:26 2007 [GW][I]:     UG_HISTORY_RATIO         : 0.25
> > > Wed Nov 28 07:13:26 2007 [GW][I]: ---------------------------------------------------
> > > Wed Nov 28 07:13:26 2007 [DM][I]: Job pool initialized.
> > > Wed Nov 28 07:13:26 2007 [DM][I]: Array pool initialized.
> > > Wed Nov 28 07:13:26 2007 [IM][I]: Host pool initialized.
> > > Wed Nov 28 07:13:26 2007 [UM][I]: User pool initiated.
> > > Wed Nov 28 07:13:26 2007 [GW][I]: Loading Information Manager MADs.
> > > Wed Nov 28 07:13:27 2007 [IM][I]:       MAD mds4 loaded (exec: gw_im_mad_mds4_thr, arg: -l etc/hosts.list).
> > > Wed Nov 28 07:13:27 2007 [GW][I]: Loading the scheduler.
> > > Wed Nov 28 07:13:27 2007 [DM][I]:       Scheduler builtin loaded (exec: gw_sched, arg: ).
> > > ...
> > > Wed Nov 28 07:14:27 2007 [UM][I]: User gwman registered.
> > > ...
> > > Thu Nov 29 01:58:08 2007 [DM][I]: New job 4 allocated and initialized.
> > >
> > > $cat 4/job.log
> > > Thu Nov 29 01:58:08 2007 [DM][I]: ----------- Job configuration file (ls.jt) values -----------
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       EXECUTABLE             : /bin/ls
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       ARGUMENTS              : -la
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       INPUT_FILES   (Total 0):
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       OUTPUT_FILES  (Total 0):
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       RESTART_FILES (Total 0):
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       STDIN_FILE             : /dev/null
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       STDOUT_FILE            : ls.out.${JOB_ID}
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       STDERR_FILE            : ls.err.${JOB_ID}
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       REQUIREMENTS           :
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       RANK                   :
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       RESCHEDULING_INTERVAL  : 0
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       RESCHEDULING_THRESHOLD : 300
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       SUSPENSION_TIMEOUT     : 900
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       CPULOAD_THRESHOLD      : 50
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       RESCHEDULE_ON_FAILURE  : yes
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       NUMBER_OF_RETRIES      : 3
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       CHECKPOINT_INTERVAL    : 0
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       CHECKPOINT_URL         :
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       WRAPPER                : /home/gwadmin/gw/libexec/gw_wrapper.sh
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       MONITOR                :
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       PRE_WRAPPER            :
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       PRE_WRAPPER_ARGUMENTS  :
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       TYPE                   : single
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       NP                     : 1
> > > Thu Nov 29 01:58:08 2007 [DM][I]:       DEADLINE               : 0:00:00 0
> > > Thu Nov 29 01:58:08 2007 [DM][I]: ----------------------------------------------------------
> > > Thu Nov 29 01:58:08 2007 [DM][I]: New state is PENDING.
> > >
> > > Strangely, though, I can submit manually using gw_em_mad_ws:
> > >
> > > $ gw_em_mad_ws
> > > INIT - - - -
> > > INIT - SUCCESS -
> > > SUBMIT 5 152.118.26.202 job.rsl
> > > SUBMIT 5 SUCCESS https://riset-c-3208-202.riset.cs.ui.ac.id:8443/wsrf/services/ManagedExecutableJobService?0ac03c50-9e13-11dc-8646-b69b9c478baf
> > >
> > > CALLBACK 5 SUCCESS CLEANUP
> > > CALLBACK 5 SUCCESS ACTIVE
> > > CALLBACK 5 SUCCESS DONE:0
> > >
> > >
> > > Thanks.
> > > --
> > > Rasyid
> > >
> > >
> >
> >
>
>
>
>
> --
> Rasyid
>



-- 
+-----------------------------------------------------------+
Tino Vázquez
Grid Technology Engineer/Researcher
Dpto. Arquitectura de Computadores y Automatica
Facultad de Informatica
Universidad Complutense 28040 Madrid
Phone : +34 91 394 75 74
http://asds.dacya.ucm.es/
+-----------------------------------------------------------+

GridWay, The Way to Grid! http://www.gridway.org