[gridway-user] Gridway cannot stage files back from resource.
Patrick Armstrong
patricka at uvic.ca
Fri Aug 11 13:04:31 CDT 2006
Hmm, I do use Torque, but I don't think that's the problem. RSA key
authentication is enabled on my cluster, and I also use a shared NFS
filesystem for /home. I really don't think this is the cause, though,
because GT4 jobs submitted from my metascheduler machine work fine:
[patricka@ugdev08 ~]$ globusrun-ws -submit -s -q -F gridsn.phys.uvic.ca -Ft PBS -c /bin/uname -a
Linux wn1 2.6.9-5.0.5.EL #1 Tue Apr 19 14:33:20 CDT 2005 i686 athlon i386 GNU/Linux
[patricka@ugdev08 ~]$
wn1 is one of the PBS nodes on the cluster.
Also, if I reconfigure my static resource file to use the Fork LRMS,
then the jobs still fail in exactly the same manner.
Here's an example of a session demonstrating that scp works properly
between the nodes:
[root@gridsn ~]# cat /etc/grid-security/grid-mapfile | grep Patrick
"/C=CA/O=Grid/OU=phys.uvic.ca/CN=Patrick Armstrong" dev03
[root@gridsn ~]# su - dev03
[dev03@gridsn ~]$ scp wn1:/etc/motd /tmp
motd 100% 0 0.0KB/s 00:00
[dev03@gridsn ~]$ scp wn2:/etc/motd /tmp
motd 100% 0 0.0KB/s 00:00
[dev03@gridsn ~]$ ssh wn1
Last login: Mon Jul 10 12:09:27 2006 from gridsn.phys.uvic.ca
[dev03@wn1 ~]$ scp gridsn:/etc/motd /tmp
motd 100% 0 0.0KB/s 00:00
[dev03@wn1 ~]$ logout
Connection to wn1 closed.
[dev03@gridsn ~]$ ssh wn2
Last login: Wed Jul 12 11:32:38 2006 from gridsn.phys.uvic.ca
[dev03@wn2 ~]$ scp gridsn:/etc/motd /tmp
motd 100% 0 0.0KB/s 00:00
[dev03@wn2 ~]$ logout
Connection to wn2 closed.
[dev03@gridsn ~]$ logout
[root@gridsn ~]#
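The manual checks above could be repeated for more nodes with a small
script. This is only a sketch: check_scp is a hypothetical helper name,
and it assumes key-based login is already set up so that scp can run
non-interactively with BatchMode=yes.

```shell
#!/bin/sh
# check_scp HOST: try to copy /etc/motd from HOST into a scratch
# directory without prompting, and report whether the copy succeeded.
# (Hypothetical helper, not part of GridWay or Torque.)
check_scp() {
    dest=$(mktemp -d) || return 1
    # BatchMode=yes makes ssh/scp fail instead of asking for a password.
    if scp -o BatchMode=yes "$1:/etc/motd" "$dest/" >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: FAILED"
    fi
    rm -rf "$dest"
}

# Example use, run as the mapped user on each machine in turn:
#   for host in gridsn wn1 wn2; do check_scp "$host"; done
```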
However, I couldn't get Torque's native file stage-out / stage-in to
work on my cluster. I tried this:
[dev03@gridsn ~]$ date > input.txt
[dev03@gridsn ~]$ qsub -l nodes=1,walltime=100 -W stagein=input.txt@wn1:/home/dev03/input_out.txt
but it just hung until I killed it with ^C. Is this really necessary for
Gridway? And why would Fork jobs be failing in the same way?
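
A possible reason for the hang, stated as an assumption rather than a
diagnosis: when no script file is named, qsub reads the job script from
standard input, so the command may simply have been waiting for input.
A minimal sketch that pipes a one-line script to qsub and builds the
stagein value in Torque's local_file@hostname:remote_file form follows.
stagein_arg and submit_stagein_test are hypothetical helper names, and
the paths in the usage comment are only examples.

```shell
#!/bin/sh
# stagein_arg LOCAL HOST REMOTE: print the value for qsub -W in the form
# stagein=LOCAL@HOST:REMOTE, where REMOTE on HOST is the source file and
# LOCAL is the name it gets on the execution host.
stagein_arg() {
    printf 'stagein=%s@%s:%s' "$1" "$2" "$3"
}

# submit_stagein_test LOCAL HOST REMOTE: submit a job whose script is a
# single "cat LOCAL" line supplied on stdin, so qsub does not block
# waiting for a script to be typed at the terminal.
submit_stagein_test() {
    echo "cat $1" |
        qsub -l nodes=1,walltime=100 -W "$(stagein_arg "$1" "$2" "$3")"
}

# Example use (paths are illustrative only):
#   submit_stagein_test input_out.txt gridsn /home/dev03/input.txt
```

If the staged file's contents show up in the job's output file, stage-in
is working for that user and node.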
Thanks,
--Patrick Armstrong
Krzysztof Wilk wrote:
> Hi Patrick,
>
> I do not know which PBS variant you use. I will refer to Torque
> (similar to OpenPBS). The PBS Pro interface should also be similar.
>
> I do not know whether GridWay makes use of Torque's native file
> stage-out / stage-in mechanism, but I suggest you check whether it
> works on your cluster.
>
> First, make sure that you have RSA key authentication enabled:
> <http://www.clusterresources.com/wiki/doku.php?id=torque:6.1_scp_rcp_setup>.
>
>
> Then, check whether file stage-out and stage-in work for you:
> <http://www.clusterresources.com/wiki/doku.php?id=torque:6.3_file_stage-in_stage-out>.
>
>
> Hope this helps,
>
> Best,
> Chris