Using the Axel Cluster

Cluster Resources

hardware resources

The following table lists the available hardware resources in the current Axel system. We currently have 25 nodes, 160 CPU cores and over 10TB of disk space. On top of these general-purpose resources, we have 24 GPUs, including the newest nVidia Fermi architecture, and 16 FPGA accelerators, including the Xilinx Virtex-6 architecture. All nodes are supported by UPS (Uninterruptible Power Supply) and connected through a dedicated Gigabit Ethernet switch.

Node                   | CPU                                              | RAM                       | GPU                | FPGA                   | Access
cccad1                 | Dual Intel Xeon X5650 (6-core per CPU) @ 2.67GHz | 32GB DDR3 @ 1333MHz (ECC) | nVidia GTX480      | Alpha-Data ADM-XRC-5T2 | SSH only
axel01, axel02         | Dual Intel Xeon E5420 Quad-Core @ 2.50GHz        | 16GB DDR2 @ 1066MHz       | nVidia Tesla C1060 | Alpha-Data ADM-XRC-5T2 | SSH and PBS
axel03                 | AMD Phenom X4 9650 Quad-Core @ 2.30GHz           | 8GB DDR2 @ 1066MHz        | nVidia GTX480      | Alpha-Data ADM-XRC-5T2 | PBS only
axel04 ~ axel12        | AMD Phenom X4 9650 Quad-Core @ 2.30GHz           | 8GB DDR2 @ 1066MHz        | nVidia Tesla C1060 | Alpha-Data ADM-XRC-5T2 | PBS only
axel13 ~ axel16        | AMD Phenom X4 9650 Quad-Core @ 2.30GHz           | 8GB DDR2 @ 1066MHz        | nVidia Tesla C1060 | N/A                    | PBS only
axel17, axel18, axel20 | Intel Core i7 950 @ 3.7GHz                       | 6GB DDR3 @ 1600MHz        | nVidia GTX580      | Alpha-Data ADM-XRC-6T1 | PBS only
axel19                 | Intel Core i7 950 @ 3.7GHz                       | 6GB DDR3 @ 1600MHz        | nVidia GTX480      | Alpha-Data ADM-XRC-6T1 | PBS only
axel21 ~ axel24        | Intel Core i7 950 @ 3.7GHz                       | 6GB DDR3 @ 1600MHz        | nVidia GTX580      | N/A                    | PBS only

cccad1 should be used as the general gateway to all the Axel cluster resources, while axel01 and axel02 are used for FPGA/GPU debugging. The remaining nodes must be accessed through the Torque/Maui cluster management system for production applications.
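
For example, a session typically starts by logging in to the gateway over SSH; a minimal sketch, assuming the node is reachable as cccad1.doc.ic.ac.uk from the college network and using the placeholder login name user_xyz:

    ssh user_xyz@cccad1.doc.ic.ac.uk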

storage resources

On all nodes, your CSG Linux account is mounted at /homes/<username>, just as on normal CSG-managed machines. The /vol/cc and /vol/bitbucket directories are also mounted in their normal locations.

CC group members are assigned extra storage on our group NAS (network-attached storage). There is 50GB of space available for each member, accessible at /mnt/ccnas/<username>. There is no RAID or backup service for the NAS, so please use it for large temporary files and keep your valuable research data in your personal account (which is mirrored and backed up every hour).

Each node in the Axel cluster also provides large local storage for all users at /data. You can treat it as a 450GB temporary directory for your application. There is no simple way to synchronise the contents of the local storage between different nodes.
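
The sketch below illustrates one way a job might stage files through /data; the /data/user_xyz directory and the file names are only hypothetical examples:

    # illustrative sketch only -- /data/user_xyz and the file names are hypothetical
    mkdir -p /data/user_xyz
    cp /homes/user_xyz/projects/input.dat /data/user_xyz/
    # ... run the application against the copy in /data/user_xyz ...
    cp /data/user_xyz/output.dat /homes/user_xyz/projects/
    rm -rf /data/user_xyz   # remove the temporary copy when finished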

software resources

All nodes in Axel run CentOS (binary compatible with the Red Hat Enterprise Linux distribution) 64-bit (x86_64) Linux. OpenMPI is the default MPI implementation for distributed applications.
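
As a minimal sketch of building and launching a distributed application with OpenMPI (assuming the mpicc wrapper and the mpirun launcher are on your PATH; hello_mpi.c is only a placeholder name):

    mpicc -O2 -o hello_mpi hello_mpi.c   # compile with the OpenMPI compiler wrapper
    mpirun -np 4 ./hello_mpi             # launch four MPI processes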

On the head node, cccad1, the following software is installed in the /opt directory. For all software requiring a floating license, please point your LM_LICENSE_FILE environment variable to 27000@chicken (see the example after the list).

Altera Quartus 10.0
AutoESL AutoPilot 2010.a.4
nVidia CUDA SDK 4.0
Intel Compiler 11.1 (including IPP, MKL, CMKL)
Matlab R2010a
ModelTech ModelSim SE 6.6c
Synopsys 2010 suite (including DC, Formality, HSpice, PrimeTime, VCS)
Xilinx ISE System Edition 13.1
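
For example, in a bash session the license server can be set as follows before launching any of these tools (csh users would use setenv instead):

    export LM_LICENSE_FILE=27000@chicken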

Accessing the Compute Nodes

A centralised resource management scheme is currently enforced using the Torque/Maui CMS (Cluster Management System) tool. Ordinary users are no longer able to log in to the compute nodes (currently axel03~axel24) through SSH, and all jobs must be submitted through the head node (currently cccad1). The procedure to run processes remotely on the compute nodes is detailed below.

  1. You must put all your programs and data in your home directory under /homes, which is the same as on any other CSG-managed Linux machine. Alternatively, you can place them in our group NAS system /mnt/ccnas/your_login_name, which is only available within the Axel cluster.

  2. The login name user_xyz is used in the following examples. Please replace it with your own login name.

  3. Assume you have the C source code as /homes/user_xyz/projects/hello.c and you compile it to generate the Linux executable as /homes/user_xyz/projects/hello, for example as shown below.
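
    A typical compilation step might look like the following; a minimal sketch assuming gcc, which is normally available on a CentOS installation:

    gcc -O2 -o hello hello.c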

  4. You MUST create a shell script, such as /homes/user_xyz/projects/hello.sh shown below, to wrap the binary executable for the Torque CMS. The first part ensures the correct working directory for program execution and data file access.

    #!/bin/bash
    
    # When running under Torque/PBS, change to the directory from
    # which the job was submitted so relative paths resolve correctly.
    if [ -n "$PBS_O_WORKDIR" ]; then
        cd "$PBS_O_WORKDIR"
    fi
    
    # Run the binary executable.
    ./hello
    

  5. Set the executable permission on the above shell script.

    chmod 755 hello.sh
    

  6. Submit the job to the CMS. The following command submits the hello.sh script to a specific compute node, axel24. The task will be scheduled in the default queue of the CMS. The system will display the job ID (34 in this example). If you skip the "-l host=axel24" option, the CMS will automatically assign your job to a free node, as in the second example below.

    user_xyz@cccad1:~$ qsub -l host=axel24 ./hello.sh
    34.cccad1.doc.ic.ac.uk
    

    You MUST log in to the head node (cccad1) and submit the job from there.

    Unless an absolute path is specified in the above script or the PATH environment variable is set accordingly, you MUST run the "qsub" command from the same directory as the "hello" binary executable.
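
    For instance, the same script can be submitted without naming a node, in which case the scheduler picks a free node itself; the returned job ID below is only illustrative:

    user_xyz@cccad1:~$ qsub ./hello.sh
    35.cccad1.doc.ic.ac.uk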

  7. You can monitor the status of your job. In this example, job 34 has completed (Status=C). It was scheduled through the "batch" queue. Your job may also be running (Status=R) or waiting in the queue (Status=Q).

    user_xyz@cccad1:~$ qstat
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    34.cccad1                 hello.sh         user_xyz        00:00:00 C batch
    
  8. After the job is completed, its outputs are stored in the same directory from which you submitted it (invoked the "qsub" command). In this example, we will have hello.sh.e34 and hello.sh.o34 in the directory /homes/user_xyz/projects/. They contain the standard error and standard output of the job respectively.
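
    For example, the captured standard output can be inspected directly from the head node:

    user_xyz@cccad1:~$ cat /homes/user_xyz/projects/hello.sh.o34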

  9. You can cancel a submitted job with the qdel command (before it is completed).

    qdel 34
    
  10. For the user cc, since CSG provides no NFS /homes directory for this account, you have to run everything from within /mnt/ccnas/cc.

    For advanced usage of the CMS software, please refer to the original Torque/Maui documentation.