This is a template for programming a distributed application on the AXEL multi-node cluster. This example adds two vectors, A and B, repeatedly and stores the result in vector A.
Parallelism is achieved by segmenting the vectors and distributing the segments to the nodes. The workload within a node is then further distributed among the processing elements (PEs) of that node. In this example, both the GPU and the FPGA are used.
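The two-level segmentation described above can be sketched in a few lines of Python. This is only an illustration of the partitioning idea, not the template's actual code: the function names (split_range, add_on_node, distributed_add) are hypothetical, the even split between the GPU and the FPGA is an assumption, and everything runs sequentially in one process rather than over MPI.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks.

    Chunk sizes differ by at most one element.
    """
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def add_on_node(a, b, lo, hi, pes=("gpu", "fpga")):
    """Second level: divide one node's segment [lo, hi) among its PEs.

    The PE names are labels only; each sub-segment is added on the CPU here.
    An even split between PEs is an assumption of this sketch.
    """
    for _pe, (s, e) in zip(pes, split_range(hi - lo, len(pes))):
        for i in range(lo + s, lo + e):
            a[i] += b[i]


def distributed_add(a, b, num_nodes=4, iterations=1):
    """First level: divide the vectors among nodes, then repeat A = A + B."""
    for _ in range(iterations):
        for lo, hi in split_range(len(a), num_nodes):
            add_on_node(a, b, lo, hi)
    return a
```

For example, distributed_add(list(range(10)), [1] * 10, num_nodes=4, iterations=3) adds B to A three times, with each of the four simulated nodes handling two or three elements.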
To run the distributed application, enter the following command:
qsub cfg/myapp.sh
The 'qsub' command assumes that a PBS/Torque cluster management system is up and running. To run the application without PBS/Torque, use the MPI runtime directly, as shown below:
mpirun -bynode \
  -host axel05 -np 1 ./myapp_m0 cfg/myapp.xml : \
  -host axel06 -np 1 ./myapp_m0 cfg/myapp.xml : \
  -host axel07 -np 1 ./myapp_m0 cfg/myapp.xml : \
  -host axel08 -np 1 ./myapp_m0 cfg/myapp.xml
The node assignment is random.
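When the list of hosts changes, the mpirun command above can be generated rather than typed by hand. The helper below is a minimal sketch: build_mpirun is a hypothetical name, and it assumes every host runs one process of the same binary with the same configuration file, as in the command shown above.

```python
def build_mpirun(hosts, binary="./myapp_m0", config="cfg/myapp.xml"):
    """Return the mpirun argument list with one single-process segment per host."""
    args = ["mpirun", "-bynode"]
    for i, host in enumerate(hosts):
        if i > 0:
            args.append(":")  # separates the per-host segments
        args += ["-host", host, "-np", "1", binary, config]
    return args


if __name__ == "__main__":
    hosts = ["axel05", "axel06", "axel07", "axel08"]
    print(" ".join(build_mpirun(hosts)))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.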