
Engineering Cluster Template

From Engineering Grid Wiki


In general, our clusters have 4 main components:

  * Head Node
  * Cluster Control and Applications
  * Storage
  * Cluster Nodes

In many cases, the Head Node, Cluster Control, and Storage are combined on the same machine.

Head Node

The "Head Node" refers to the main public access point of the cluster - the machine users log into to submit jobs, work with data, etc. Many times it is the same hardware as the cluster nodes, in some cases (when it acts as a storage server) it is a unique machine.

Often this machine will run any public-facing webservers that serve out pages referencing data on the cluster storage.

The majority of the time, it will also serve as the cluster control and application server, and will provide direct-attached storage to the cluster nodes.

As it is usually the only public-facing portion of the cluster, it often provides access to the Internet and other networking functions.

Cluster Control and Applications

All of the cluster management and research applications on EIT-managed clusters are installed on an NFS share (either from the storage server or head node) mounted as /cluster/clustername, where clustername is the generally accepted name for that cluster. This is a common mountpoint (usually managed by automount) on every cluster member.
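
As an example, a minimal autofs setup on a cluster member might look like the following; the export path and the "headnode" hostname are placeholders, not the actual values for any given cluster:

# /etc/auto.master - hand the /cluster directory over to autofs
/cluster  /etc/auto.cluster  --timeout=300

# /etc/auto.cluster - mount the NFS share as /cluster/clustername
clustername  -rw,hard,intr  headnode:/export/cluster/clustername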

Cluster Control

All of the EIT-managed computing clusters use Grid Engine as the cluster management mechanism. This is installed under /cluster/clustername/sge, using the default SGE installation method[1] of having all local spool and log directories on the NFS mount. The SGE Cell name is usually the clustername.
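
With that layout, users and node boot scripts typically just source the Grid Engine settings file to pick up SGE_ROOT, SGE_CELL, and the command paths. A sketch, assuming the default locations described above:

# Load the Grid Engine environment for this cluster
# (sets SGE_ROOT=/cluster/clustername/sge and SGE_CELL=clustername)
. /cluster/clustername/sge/clustername/common/settings.sh
qstat -f    # quick sanity check that the qmaster and exec hosts are visible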

Cluster Communications

Parallel operations can take place over a variety of interconnects - Ethernet, Myrinet, InfiniBand, and others. Most Engineering clusters communicate over Gigabit Ethernet.

Applications

Applications vary by cluster. Specific details on any special application will be found on each cluster's page.

Parallel Operations

Most parallel operations on EIT-managed clusters are enabled through MPICH2. We are moving to a modification of the "daemon-based smpd" MPICH2 integration method listed here. With MPICH2 1.0.8 or later, an additional environment flag causes smpd to forgo the ".smpd" file in the user's home directory, which otherwise often causes job failures.

A base parallel job submission script:

#!/bin/sh
#
#$ -q all.q
#$ -N testpsmp
#$ -pe mpich2_psmp 8
#$ -cwd
#$ -v SMPD_OPTION_NO_DYNAMIC_HOSTS=1
# This line derives the MPI communications port from the job ID; do not change.
smpdport=$(($JOB_ID % 5000 + 20000))
/cluster/clustername/mpi/bin/mpiexec -n $NSLOTS -machinefile $TMPDIR/machines -port $smpdport -phrase $smpdport /cluster/clustername/mpi/share/examples_logging/cpilog
exit 0

The SMPD_OPTION_NO_DYNAMIC_HOSTS flag passed through qsub instructs smpd not to use the dynamic hosts file; the SMPD passphrase and port are instead specified on the mpiexec command line.
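
Submitting and checking on the job is then ordinary Grid Engine usage; the script filename below is a placeholder:

qsub testpsmp.sh        # prints the assigned job ID
qstat -u $USER          # job state: qw = waiting, r = running
cat testpsmp.o<jobid>   # job output lands in the submission directory (-cwd)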

Storage

Cluster storage on Engineering School clusters is in general provided by direct-attached storage of varying types attached to the head node.

This storage is shared out to the nodes via NFS, usually managed by an automount pointing to /research-projects/clustername (also available as /project/clustername via a symlink kept for historical reasons).

Depending on needs, the head node shares this out via NFS and CIFS[2].
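
A sketch of a corresponding /etc/exports entry on the head node (the export path and private network shown are assumptions):

# /etc/exports - share project storage with the private cluster network
/export/research-projects/clustername  10.0.0.0/24(rw,sync,no_root_squash)

Running 'exportfs -ra' after editing reloads the export table; the CIFS side is a matching share definition in smb.conf.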

Cluster Nodes

Cluster nodes are set up to be identical, usually with a scripted Kickstart installation or another cloning method (Advanced Clustering's Cloner utility, Partimage, or any number of other applications).

Node addresses are managed with static DHCP reservations, and storage access via NFS+automount. Only default distribution software is installed locally on each node; unless there's a very special need, all cluster applications are installed under the /cluster/clustername mount.

With hard drive sizes being what they are, there is usually a large partition set aside for temporary data (it's quicker to do temp storage locally than over NFS). This is usually accessed via a /cluster-tmp link to the actual storage partition. It's important that this be a separate partition from the other system partitions, so a runaway job can't fill the root or system filesystems. In general, it's the user's responsibility to move any data that should be kept to main cluster storage at the end of the job.
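
On each node this can be as simple as a dedicated partition plus a symlink; the device name and mount point below are assumptions:

# /etc/fstab - scratch partition kept separate from the system partitions
/dev/sda3   /scratch   ext3   defaults   0 2

# convenience link used by jobs for local temporary data
ln -s /scratch /cluster-tmp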

Network

IP address space in the private cluster network is up to the administrators; any of the RFC 1918 private address spaces (aside from 172.16.0.0/12, which we use here internally at Engineering) will do. Most of the Engineering clusters use networks within 10.0.0.0/8, which may or may not be proper practice.

It's important to note that internal services such as DNS, DHCP, or TFTP should only listen on the internal cluster interfaces so as not to interfere with the rest of the Engineering network.

NAT

Usually done on the head node, network address translation allows cluster nodes to access the Internet for updates and installation, and lets users download external data as part of a job. This is done through iptables; the cluster nodes have the head node as their default gateway.
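
A minimal sketch, assuming eth0 is the head node's public interface and eth1 the private cluster interface:

# let the head node forward packets
echo 1 > /proc/sys/net/ipv4/ip_forward

# masquerade cluster traffic leaving the public interface
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# forward traffic from the nodes out, and only replies back in
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT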

DNS

DNS is important for all facets of cluster communication, be it cluster applications or storage. Using DNS eliminates the requirement to manage /etc/hosts or the like locally on the nodes.

The default 'caching-nameserver' RPM provided with RHEL/CentOS is a good place to start, as it provides a caching nameserver for the cluster (reducing external traffic). Add to that definitions for the internal cluster nodes and resources.
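
Restricting the server to the internal side (per the note above) might look like this in named.conf; the 10.0.0.1 head node address and 10.0.0.0/24 network are assumptions:

options {
    // answer only on loopback and the private cluster interface
    listen-on { 127.0.0.1; 10.0.0.1; };
    allow-query { localhost; 10.0.0.0/24; };
};

The internal node names then go into an ordinary zone file referenced from the same configuration.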

DHCP

The head node also can run DHCP, providing the cluster nodes with their network settings. This aids in installation and management of the nodes: each node will get an IP address based on its MAC address when configured correctly, which allows you to push out a default image to all nodes without worrying about setting the hostname or IP address.
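
A dhcpd.conf fragment along these lines (addresses, MAC, and filenames are placeholders) covers both the static reservations and the PXE options used for installs:

subnet 10.0.0.0 netmask 255.255.255.0 {
    option routers 10.0.0.1;             # head node is the gateway (NAT)
    option domain-name-servers 10.0.0.1; # head node runs DNS
    next-server 10.0.0.1;                # TFTP server for PXE boot
    filename "pxelinux.0";
}

host node01 {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 10.0.0.11;
    option host-name "node01";
}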

TFTP

TFTP is used to PXE-boot nodes, most often for installation, but some clusters (not here at Engineering) will boot their nodes off of NFS as well.
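
A pxelinux configuration for a Kickstart install might look like the following; the kernel, initrd, and Kickstart locations are assumptions:

# /tftpboot/pxelinux.cfg/default
default ks
prompt 0

label ks
    kernel vmlinuz
    append initrd=initrd.img ks=nfs:10.0.0.1:/export/kickstart/node.cfg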

References

  1. A discussion of how SGE can reduce NFS usage can be found at http://gridengine.sunsource.net/howto/nfsreduce.html
  2. Using Samba (www.samba.org), authenticating to our AD domains

