From Engineering Grid Wiki
There is currently one instance of MPICH2 installed on the Cloud, and it is tied to the default GNU compilers.
It is rooted at /cluster/cloud/mpich2/, which contains the mpich2-1.5 and gnu-mpd builds referenced below.
Setting up for MPICH2 on the Cloud
You need to prepare your account. Run:

touch ~/.mpd.conf
chmod 0700 ~/.mpd.conf

to initialize an MPD control file.
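Note that stock MPICH2 documentation has the MPD control file contain a secret word; if the daemons fail to start complaining about MPD_SECRETWORD, a line like the following may be needed. This is a hedged sketch — the value is a placeholder, and the site's wrapper scripts may already handle it for you:

```shell
# Hypothetical addition: give MPD the secret word it uses for
# authentication. "changeme" is a placeholder value, not a site default.
echo 'MPD_SECRETWORD=changeme' >> ~/.mpd.conf
```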
You'll also want SSH keys. We're going to do this without a passphrase, since SGE can't pass along your password.
ssh-keygen -t dsa
Press Enter at all the prompts to accept the defaults and a blank passphrase, then:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Your account can now SSH among the Cloud nodes using an SSH key rather than a password.
Compiling an MPICH2 program
To compile a program written for MPICH2, use:
/cluster/cloud/mpich2/mpich2-1.5/bin/mpicc
/cluster/cloud/mpich2/mpich2-1.5/bin/mpic++
/cluster/cloud/mpich2/mpich2-1.5/bin/mpif77
/cluster/cloud/mpich2/mpich2-1.5/bin/mpif90

/cluster/cloud/mpich2/gnu-mpd/bin/mpicc
/cluster/cloud/mpich2/gnu-mpd/bin/mpic++
/cluster/cloud/mpich2/gnu-mpd/bin/mpif77
/cluster/cloud/mpich2/gnu-mpd/bin/mpif90
as appropriate for the language you are compiling in, or consult the package's Makefile, if compiling prepackaged software.
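Rather than typing the full wrapper path each time, you can put the chosen build's bin directory first on your PATH. A sketch, using the gnu-mpd build; MPICH2_ROOT is just a convenience variable, as in the submission script in the next section:

```shell
# Sketch: select one of the installed builds by prepending its bin
# directory to PATH; MPICH2_ROOT is a convenience variable only.
export MPICH2_ROOT=/cluster/cloud/mpich2/gnu-mpd
export PATH=$MPICH2_ROOT/bin:$PATH
# After this, plain "mpicc", "mpic++", etc. resolve to that build
# (on the cluster; the lookup fails harmlessly elsewhere).
command -v mpicc || true
```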
Running an MPICH2 Job
A simple queue submission script for MPICH2 looks like this:
#!/bin/sh
#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -o mpihello.out
#$ -pe smp 4
export MPICH2_ROOT=/cluster/cloud/mpich2/gnu-mpd    # or .../mpich2/mpich2-1.5
export PATH=$MPICH2_ROOT/bin:$PATH
export MPD_CON_EXT="sge_$JOB_ID.$TASK_ID"
# The order of arguments is important. First global, then local options.
mpiexec -machinefile $TMPDIR/machines -n $NSLOTS /home/research/mark/mpihello/mpihello
exit 0
#$ -pe smp 4
is important. "smp" is the Grid Engine parallel environment that your job will run in. The number afterward is the number of cores you wish to run on.
#$ -S /bin/bash
is also important: if you write your submission script in a shell other than your default login shell, you must tell SGE which shell to run the job with.
By default, the parallel environment uses a "Fill Up" algorithm, which will fill up a single node with your MPI job before splitting across machines.
This script (in this example saved as 'mpisub.sh') is submitted with:

qsub mpisub.sh
After some time, I get the output below (note that the script above combined the error and output streams with "#$ -j y"):
The error-stream portion of the output consists of notifications from the parallel environment; note that it successfully found all the nodes. Errors starting the MPD daemons would show up here:
-catch_rsh /cluster/cloud/sge/SEAS-CLOUD/spool/cloud002/active_jobs/129.1/pe_hostfile /cluster/cloud/mpich2/gnu-mpd
cloud002:2
cloud004:2
startmpich2.sh: check for local mpd daemon (1 of 10)
/cluster/cloud/sge/bin/lx24-amd64/qrsh -inherit -V cloud002 /cluster/cloud/mpich2/gnu-mpd/bin/mpd
Warning: No xauth data; using fake authentication data for X11 forwarding.
startmpich2.sh: check for local mpd daemon (2 of 10)
startmpich2.sh: check for mpd daemons (1 of 10)
/cluster/cloud/sge/bin/lx24-amd64/qrsh -inherit -V cloud004 /cluster/cloud/mpich2/gnu-mpd/bin/mpd -h cloud002 -p 40931 -n
startmpich2.sh: check for mpd daemons (2 of 10)
startmpich2.sh: got all 2 of 2 nodes
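The machines file passed to mpiexec is derived from SGE's pe_hostfile, whose lines pair a hostname with a slot count. The conversion can be sketched with awk; the hostnames and slot counts mirror the example job above, the queue name "all.q" is an assumed placeholder, and the real startmpich2.sh helper may do more:

```shell
# Sketch: convert an SGE pe_hostfile ("host slots queue processors")
# into MPD's "host:slots" machinefile format. Hosts and slot counts
# are from the example job; the queue name is a placeholder.
printf 'cloud002 2 all.q@cloud002 UNDEFINED\ncloud004 2 all.q@cloud004 UNDEFINED\n' > pe_hostfile
awk '{print $1 ":" $2}' pe_hostfile > machines
cat machines
```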
The output of the job, showing MPI communications working:
Hello World from Node 0.
Hello World from Node 1.
Hello World from Node 2.
Hello World from Node 3.
The MPI Hello World program used in this example can be found here: http://gridengine.sunsource.net/howto/mpich2-integration/mpihello.tgz