
Cloud MPICH2

From Engineering Grid Wiki

There is currently one instance of MPICH2 installed on the Cloud, and it is tied to the default GNU compilers.

It is rooted at /cluster/cloud/mpich2/gnu-mpd.


Setting up for MPICH2 on the Cloud

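Before running MPICH2 jobs, prepare your account by creating an MPD configuration file that only you can access:

touch ~/.mpd.conf
chmod 0700 ~/.mpd.conf

This initializes the control file used by the MPD process manager.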
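The same preparation can be wrapped in a shell function that is safe to run more than once. This is a sketch: `init_mpd_conf` is a hypothetical helper, and the `MPD_SECRETWORD=` line it adds is an assumption based on MPICH2's mpd documentation, which has mpd read a secret word from this file. Check the documentation for the installed MPICH2 version before relying on it.

```shell
# Hypothetical helper: prepare ~/.mpd.conf for the MPD process manager.
init_mpd_conf() {
    local conf="$HOME/.mpd.conf"
    # Create the file if it does not exist yet (same effect as `touch`).
    touch "$conf"
    # Restrict access to the owner; mpd rejects a config file others can access.
    chmod 0700 "$conf"
    # ASSUMPTION: mpd expects a secret word in this file under the key
    # MPD_SECRETWORD. Add a random one only if no such line exists yet.
    if ! grep -q '^MPD_SECRETWORD=' "$conf"; then
        local secret
        secret=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
        printf 'MPD_SECRETWORD=%s\n' "$secret" >> "$conf"
    fi
}
```

Run `init_mpd_conf` once from a login shell; repeated calls keep the existing secret word.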

You will also need SSH keys. Create them without a passphrase, since SGE cannot enter a passphrase or password for you:

ssh-keygen -t dsa

Press Enter at each prompt, leaving the passphrase blank. Then append the new public key to your list of authorized keys and make that file private:

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Your account can now SSH between Cloud nodes using the key instead of a password.
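The key-authorization step can also be scripted so that running it twice does not add duplicate lines. A minimal sketch; `authorize_pubkey` is a hypothetical helper name. The permission changes reflect sshd's default StrictModes check, which ignores an authorized_keys file or ~/.ssh directory that other users can write. To generate the key itself without prompts, `ssh-keygen -t dsa -N "" -f ~/.ssh/id_dsa` sets an empty passphrase directly.

```shell
# Hypothetical helper: add a public key to ~/.ssh/authorized_keys once.
authorize_pubkey() {
    local pub="${1:-$HOME/.ssh/id_dsa.pub}"
    local ak="$HOME/.ssh/authorized_keys"
    # sshd (StrictModes) wants ~/.ssh and authorized_keys private to the owner.
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    touch "$ak"
    chmod 600 "$ak"
    # Append only if this exact key line is not already present.
    if ! grep -qxF -f "$pub" "$ak"; then
        cat "$pub" >> "$ak"
    fi
}
```

With no argument it uses ~/.ssh/id_dsa.pub, the file created by the `ssh-keygen` command above.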

Compiling an MPICH2 program

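To compile a program written for MPICH2, use the compiler wrappers in the bin directory of the MPICH2 installation:

mpicc     (C)
mpicxx    (C++)
mpif77    (Fortran 77)
mpif90    (Fortran 90)

as appropriate for the language you are compiling in. If you are compiling prepackaged software, consult the package's Makefile instead.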
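If you build several kinds of sources from a script, a small lookup from file extension to MPI compiler wrapper keeps that choice in one place. This is a hypothetical helper (`mpi_wrapper_for`), and the extension list is an assumption to adjust for your project.

```shell
# Hypothetical helper: print the MPICH2 compiler wrapper for a source file.
mpi_wrapper_for() {
    case "$1" in
        *.c)                  echo mpicc ;;
        *.cc|*.cpp|*.cxx|*.C) echo mpicxx ;;
        *.f|*.for|*.f77)      echo mpif77 ;;
        *.f90|*.f95)          echo mpif90 ;;
        *)
            # Unknown extension: report it and signal failure.
            echo "unknown source type: $1" >&2
            return 1
            ;;
    esac
}
```

For example, `$(mpi_wrapper_for mpihello.c) -o mpihello mpihello.c` runs `mpicc -o mpihello mpihello.c`, assuming the MPICH2 bin directory is on your PATH.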

Running an MPICH2 Job

A simple SGE submission script for an MPICH2 job looks like this:

#$ -cwd
#$ -S /bin/bash
#$ -j y
#$ -o mpihello.out
#$ -pe smp 4
export MPICH2_ROOT=/cluster/cloud/mpich2/gnu-mpd
# (alternatively .../mpich2/mpich2-1.5)
export PATH=$MPICH2_ROOT/bin:$PATH
# Give each job its own MPD console name; SGE sets JOB_ID and SGE_TASK_ID.
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
# The order of arguments is important. First global, then local options.
mpiexec -machinefile $TMPDIR/machines -n $NSLOTS /home/research/mark/mpihello/mpihello
exit 0
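Because a wrong MPICH2_ROOT only shows up later as a failed mpd or mpiexec start, the job script can check the path before using it. A sketch; `use_mpich2` is a hypothetical function you would define near the top of the submission script in place of the MPICH2_ROOT and PATH export lines.

```shell
# Hypothetical helper: select an MPICH2 installation, failing early if
# its mpiexec is missing.
use_mpich2() {
    local root="$1"
    if [ ! -x "$root/bin/mpiexec" ]; then
        echo "no executable mpiexec under $root/bin" >&2
        return 1
    fi
    # Same effect as the two export lines in the script above.
    MPICH2_ROOT=$root
    PATH=$root/bin:$PATH
    export MPICH2_ROOT PATH
}
```

In the job script this would be used as `use_mpich2 /cluster/cloud/mpich2/gnu-mpd || exit 1`, so a bad path ends the job with a clear message in mpihello.out.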

The line:

#$ -pe smp 4

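is important. "smp" is the Grid Engine parallel environment your job will run in, and the number after it is the number of slots (cores) to request. Inside the job, SGE sets $NSLOTS to the number of slots granted, which is why the mpiexec line uses -n $NSLOTS.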
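To see how the requested slots were actually placed, a job can read the file named by SGE's $PE_HOSTFILE variable, which lists one host per line followed by its slot count, queue, and processor range. A sketch with a hypothetical helper name; the sample hosts in the usage are illustrative only.

```shell
# Hypothetical helper: summarize the slots SGE granted on each host.
# Reads $PE_HOSTFILE (or a file given as $1); each line has the form
#   host slots queue processor-range
pe_slot_summary() {
    awk '{ printf "%s %d\n", $1, $2; total += $2 }
         END { printf "total %d\n", total }' "${1:-$PE_HOSTFILE}"
}
```

Calling `pe_slot_summary` inside the job prints each host with its slot count and a final total, which should equal $NSLOTS.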

#$ -S /bin/bash

is also important. If your submission script is written for a shell other than your default login shell, you must tell SGE which shell to run the job with.

By default, the parallel environment uses a "fill up" allocation rule: it places as many of your MPI processes as possible on a single node before spreading the job across additional machines.
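The fill-up rule (allocation_rule $fill_up in a PE's configuration, which `qconf -sp smp` displays) can be illustrated with a simplified model. This is not SGE's scheduler code: `fill_up_alloc` is a hypothetical function that takes hosts in the order given, whereas SGE chooses the host order itself according to its queue-sorting policy.

```shell
# Hypothetical, simplified model of a "fill up" allocation.
# Usage: fill_up_alloc SLOTS HOST:FREE [HOST:FREE ...]
# Hosts are taken in the order given; each is filled as far as possible
# before the next one is used. Prints "host slots" per host used.
fill_up_alloc() {
    local need="$1"
    shift
    local spec host free take
    for spec in "$@"; do
        if [ "$need" -le 0 ]; then
            break
        fi
        host=${spec%%:*}
        free=${spec#*:}
        take=$free
        if [ "$take" -gt "$need" ]; then
            take=$need
        fi
        if [ "$take" -gt 0 ]; then
            echo "$host $take"
            need=$((need - take))
        fi
    done
    # Fail if the listed hosts did not have enough free slots.
    [ "$need" -eq 0 ]
}
```

With illustrative free-slot counts, `fill_up_alloc 4 cloud002:3 cloud004:8` prints `cloud002 3` and then `cloud004 1`: the first node is filled before the second is used.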

Save the script (in this example as 'mpisub.sh') and submit it with:

qsub mpisub.sh
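When submitting from a script, it helps to capture the job number that qsub reports. The sketch below assumes qsub's usual confirmation message, `Your job N ("name") has been submitted`; `qsub_job_id` is a hypothetical helper.

```shell
# Hypothetical helper: read qsub's confirmation message on stdin and
# print only the job number (prints nothing if the message does not match).
qsub_job_id() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*has been submitted.*$/\1/p'
}
```

For example, `job=$(qsub mpisub.sh | qsub_job_id)` followed by `qstat -j "$job"` shows the job's details while it is queued or running.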

After the job finishes, mpihello.out holds both output streams, because the script merges standard error into standard output (#$ -j y).

The first part comes from the job's error stream and consists of notifications from the parallel environment; note that it found all the nodes. Errors starting the MPD daemons would show up here:

-catch_rsh /cluster/cloud/sge/SEAS-CLOUD/spool/cloud002/active_jobs/129.1/pe_hostfile  /cluster/cloud/mpich2/gnu-mpd
startmpich2.sh: check for local mpd daemon (1 of 10)
/cluster/cloud/sge/bin/lx24-amd64/qrsh -inherit -V cloud002 /cluster/cloud/mpich2/gnu-mpd/bin/mpd
Warning: No xauth data; using fake authentication data for X11 forwarding.
startmpich2.sh: check for local mpd daemon (2 of 10)
startmpich2.sh: check for mpd daemons (1 of 10) 
/cluster/cloud/sge/bin/lx24-amd64/qrsh -inherit -V cloud004 /cluster/cloud/mpich2/gnu-mpd/bin/mpd -h cloud002 -p 40931 -n
startmpich2.sh: check for mpd daemons (2 of 10)
startmpich2.sh: got all 2 of 2 nodes
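Since these notifications end with a line such as `startmpich2.sh: got all 2 of 2 nodes` when every MPD daemon came up, that line can be checked automatically. A sketch; `mpd_ring_ok` is a hypothetical helper that relies only on the message format in the sample output above.

```shell
# Hypothetical helper: succeed if the job log reports that all MPD daemons
# started, i.e. a line "startmpich2.sh: got all N of N nodes".
mpd_ring_ok() {
    awk '
        $1 == "startmpich2.sh:" && $2 == "got" && $3 == "all" && $5 == "of" {
            if ($4 == $6) ok = 1
        }
        END { exit (ok ? 0 : 1) }
    ' "$1"
}
```

For example, `mpd_ring_ok mpihello.out || echo "MPD startup problem"` flags a job whose daemons did not all start.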

The job's own output, showing all four MPI processes running:

Hello World from Node 0.
Hello World from Node 1.
Hello World from Node 2.
Hello World from Node 3.
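MPI numbers the processes in a job from 0 to N-1, so with N slots a complete run of this program prints one line per rank. A sketch that checks this; `hello_ranks_ok` is a hypothetical helper, and the exact line text is taken from the sample output above.

```shell
# Hypothetical helper: succeed if the output file contains the line
# "Hello World from Node R." for every rank R from 0 to N-1.
# Usage: hello_ranks_ok OUTPUT_FILE N
hello_ranks_ok() {
    local file="$1" n="$2" rank=0
    while [ "$rank" -lt "$n" ]; do
        # -x: whole-line match, so merged stderr lines do not interfere.
        grep -qxF "Hello World from Node $rank." "$file" || return 1
        rank=$((rank + 1))
    done
    return 0
}
```

For the run shown above, `hello_ranks_ok mpihello.out 4` succeeds.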

The MPI Hello World program used in this example can be found here: http://gridengine.sunsource.net/howto/mpich2-integration/mpihello.tgz

This page was last modified on 17 August 2015, at 09:42.