
Research: Maximum Performance Computing

Acknowledgment

We thank Maxeler Technologies for their generous donations. Much support for our research has been received through the Maxeler University Program (MAX-UP).

Resources

We currently host 4 systems with a total of 10 MAX3 dataflow compute engines for various research projects.

System       Processor                                        Memory                       Accelerator
maxstation1  Intel Core i7-870 @ 2.93 GHz                     16 GB DDR3 @ 1600 MHz        MAX3 (V6-SX475T) + 10G interface
maxstation2  Intel Core i7-870 @ 2.93 GHz                     16 GB DDR3 @ 1600 MHz        MAX3 (V6-SX475T) + 10G interface
maxnode1     Dual Intel Xeon X5650 (6 cores per CPU) @ 2.67 GHz  96 GB DDR3 @ 1600 MHz (ECC)  Quad MAX3 (V6-SX475T)
maxnode2     Dual Intel Xeon X5650 (6 cores per CPU) @ 2.67 GHz  96 GB DDR3 @ 1600 MHz (ECC)  Quad MAX3 (V6-SX475T)

Each MaxWorkstation is equipped with a MAX3 dataflow compute engine and a 10G interface card, which provides high-bandwidth, low-latency communication channels. Each MaxNode is equipped with four MAX3 dataflow compute engines, connected together by a MaxRing.

Projects

Mixed Precision Monte Carlo Methodology for Reconfigurable Accelerator Systems

This work introduces a novel mixed precision methodology applicable to any Monte Carlo (MC) simulation. It involves the use of datapaths with reduced precision, and the resulting errors are corrected by auxiliary sampling. An analytical model is developed for a reconfigurable accelerator system with a field-programmable gate array (FPGA) and a general purpose processor (GPP). Optimisation based on mixed integer geometric programming is employed for determining the optimal reduced precision and optimal resource allocation among the MC datapaths and correction datapaths. Experiments show that the proposed mixed precision methodology requires up to 11% additional evaluations while less than 4% of all the evaluations are computed in the reference precision; the resulting designs are up to 7.1 times faster and 3.1 times more energy efficient than baseline double precision FPGA designs, and up to 163 times faster and 170 times more energy efficient than quad-core software designs optimised with the Intel compiler and Math Kernel Library. Our methodology also produces designs for pricing Asian options which are 4.6 times faster and 5.5 times more energy efficient than NVIDIA Tesla C2070 GPU implementations.
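The correction step can be illustrated with a small software sketch (NumPy, with float32 standing in for a reduced-precision FPGA datapath): the bulk of the samples are evaluated in the reduced format, and a much smaller auxiliary set is evaluated in both formats to estimate and remove the bias of the reduced-precision evaluation. The integrand and sample counts below are hypothetical; the published methodology additionally chooses the precisions and resource allocation via mixed integer geometric programming.

```python
import numpy as np

rng = np.random.default_rng(0)

def payoff(x, dtype):
    # hypothetical integrand: estimate E[max(exp(X) - 1, 0)] for X ~ N(0, 1)
    x = x.astype(dtype)
    return np.maximum(np.exp(x) - dtype(1.0), dtype(0.0))

N_low, N_aux = 1_000_000, 40_000      # bulk vs auxiliary sample counts
x_low = rng.standard_normal(N_low)
x_aux = rng.standard_normal(N_aux)

# bulk estimate in reduced precision (float32 stands in for a narrow datapath)
est_low = payoff(x_low, np.float32).mean(dtype=np.float64)

# auxiliary sampling: estimate the bias of the reduced-precision evaluation
bias = (payoff(x_aux, np.float64) - payoff(x_aux, np.float32)).mean()

est = est_low + bias                  # bias-corrected estimate
```

Here only 4% of the evaluations (the auxiliary set) run in the reference precision, mirroring the paper's ratio of full-precision correction work to reduced-precision bulk work.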


Multi-level Customisation Framework for Curve Based Monte Carlo Financial Simulations

One of the main challenges when accelerating financial applications using reconfigurable hardware is the management of design complexity. This work proposes a multi-level customisation framework for automatic generation of complex yet highly efficient curve based financial Monte Carlo simulators on reconfigurable hardware. By identifying multiple levels of functional specialisations and the optimal data format for the Monte Carlo simulation, we allow different levels of programmability in our framework to retain good performance and support multiple applications. Designs targeting a Virtex-6 SX475T FPGA generated by our framework are about 40 times faster than single-core software implementations on an i7-870 quad-core CPU at 2.93 GHz; they are over 10 times faster and 20 times more energy efficient than 4-core implementations on the same i7-870 quad-core CPU, and are over three times more energy efficient and 36% faster than a highly optimised implementation on an NVIDIA Tesla C2070 GPU at 1.15 GHz. In addition, our framework is platform independent and can be extended to support CPU and GPU applications.
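The idea of multiple customisation levels can be sketched in software: a generic Monte Carlo kernel is specialised by plugging in a curve model, a payoff and a data format, each a separate level of programmability. All names and the geometric Brownian motion example below are illustrative only; the actual framework generates specialised FPGA datapaths, not Python functions.

```python
import numpy as np

def make_simulator(curve, payoff, dtype=np.float32):
    """Bind a curve model, a payoff and a data format into one MC kernel.

    Each argument is one customisation level: curve and payoff are
    functional specialisations, dtype fixes the arithmetic format."""
    def simulate(n_paths, n_steps, seed=0):
        rng = np.random.default_rng(seed)
        dt = dtype(1.0 / n_steps)
        x = np.zeros(n_paths, dtype=dtype)          # log-price paths
        for _ in range(n_steps):
            z = rng.standard_normal(n_paths).astype(dtype)
            x = curve(x, dt, z)
        return payoff(x).mean(dtype=np.float64)
    return simulate

# example specialisation: log-price under geometric Brownian motion,
# priced as an (undiscounted) call with S0 = K = 1
mu, sigma = 0.05, 0.2
gbm = lambda x, dt, z: x + np.float32(mu - 0.5 * sigma**2) * dt \
                         + np.float32(sigma) * np.sqrt(dt) * z
call = lambda x: np.maximum(np.exp(x) - 1.0, 0.0)

price = make_simulator(gbm, call)(n_paths=200_000, n_steps=64)
```

Swapping `gbm` or `call` for another curve or payoff reuses the same kernel, which is the point of separating the customisation levels.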


Optimising Performance of Quadrature Methods with Reduced Precision

This work presents a generic precision optimisation methodology for quadrature computation targeting reconfigurable hardware to maximise performance at a given error tolerance level. The proposed methodology optimises performance by considering integration grid density versus mantissa size of floating-point operators. The optimisation provides the number of integration points and mantissa size with maximised throughput while meeting given error tolerance requirement. Three case studies show that the proposed reduced precision designs on a Virtex-6 SX475T FPGA are up to 6 times faster than comparable FPGA designs with double precision arithmetic. They are up to 15.1 times faster and 234.9 times more energy efficient than an i7-870 quad-core CPU, and are 1.2 times faster and 42.2 times more energy efficient than a Tesla C2070 GPU.
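The grid-density-versus-mantissa trade-off can be demonstrated with a toy trapezoidal rule in which the integrand values are rounded to a given number of mantissa bits (a crude software emulation of a narrow floating-point operator, not the actual toolflow):

```python
import numpy as np

def round_mantissa(x, bits):
    # crude emulation of reduced precision: keep `bits` mantissa bits
    m, e = np.frexp(x)
    scale = 2.0 ** bits
    return np.ldexp(np.round(m * scale) / scale, e)

def trapezoid(f, a, b, n, bits):
    # composite trapezoidal rule with reduced-precision integrand values
    x = np.linspace(a, b, n + 1)
    y = round_mantissa(f(x), bits)
    return (b - a) / n * (y.sum() - 0.5 * (y[0] + y[-1]))

exact = np.expm1(1.0)                 # reference: integral of e^x over [0,1]
for n, bits in [(64, 53), (4096, 53), (64, 12), (4096, 12)]:
    err = abs(trapezoid(np.exp, 0.0, 1.0, n, bits) - exact)
    print(f"n={n:5d}  mantissa={bits:2d}  error={err:.2e}")
```

Increasing n shrinks the 53-bit error quadratically, while the 12-bit error bottoms out near the rounding level of the integrand values, so mantissa width and point count must be optimised jointly — which is what the proposed methodology automates.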


A Mixed Precision Methodology for Mathematical Optimisation

This project introduces a novel mixed precision methodology for mathematical optimisation. It involves the use of reduced precision FPGA optimisers for searching potential regions containing the global optimum, and double precision optimisers on a general purpose processor (GPP) for verifying the results. An empirical method is proposed to determine parameters of the mixed precision methodology running on a reconfigurable accelerator consisting of FPGA and GPP. The effectiveness of our approach is evaluated using a set of optimisation benchmarks. Using our mixed precision methodology and a modern reconfigurable accelerator, we can locate the global optima 1.7 to 6 times faster than quad-core optimisers. The mixed precision optimisers search up to 40.3 times more starting vectors per unit time than quad-core optimisers, and only 0.7% to 2.7% of these searches are refined using GPP double precision optimisers. The proposed methodology also allows us to accelerate problems with more complicated functions or to solve problems involving higher dimensions.
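A minimal sketch of the two-stage scheme, with float32 standing in for the reduced-precision FPGA optimisers and a made-up one-dimensional multimodal benchmark (all names and parameters here are illustrative): many cheap low-precision local searches are launched from random starting vectors, and only the few most promising candidates are refined in double precision.

```python
import numpy as np

def f(x):                             # hypothetical multimodal benchmark
    return np.sin(3.0 * x) + (x / 3.0) ** 2

def grad_descent(x, dtype, steps, lr=0.05, h=1e-3):
    # fixed-step gradient descent using a central finite difference
    x = dtype(x)
    for _ in range(steps):
        g = (f(x + dtype(h)) - f(x - dtype(h))) / dtype(2 * h)
        x = x - dtype(lr) * g
    return x

rng = np.random.default_rng(1)
starts = rng.uniform(-6.0, 6.0, size=256)   # many starting vectors

# stage 1: cheap reduced-precision searches (float32 stands in for the FPGA)
coarse = np.array([grad_descent(s, np.float32, steps=50) for s in starts])
vals = f(coarse)

# stage 2: refine only the most promising candidates in double precision
best = coarse[np.argsort(vals)[:8]]
refined = [grad_descent(b, np.float64, steps=200) for b in best]
x_star = min(refined, key=f)
```

Only 8 of the 256 searches are re-run at full precision, mirroring the paper's observation that a small fraction of low-precision searches needs double precision refinement.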

Publications

  1. G.C.T. Chow, A.H.T. Tse, Q. Jin, W. Luk, P.H.W. Leong and D.B. Thomas. "A Mixed Precision Monte Carlo Methodology for Reconfigurable Accelerator Systems". In Proc. ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA), pp. 57-66, 2012.
  2. Q. Jin, D. Dong, A.H.T. Tse, G.C.T. Chow, D.B. Thomas, W. Luk and S. Weston. "Multi-level Customisation Framework for Curve Based Monte Carlo Financial Simulations". In Proc. International Symposium on Applied Reconfigurable Computing (ARC), pp. 187-201, 2012.
  3. A.H.T. Tse, G.C.T. Chow, Q. Jin, D.B. Thomas and W. Luk. "Optimising Performance of Quadrature Methods with Reduced Precision". In Proc. International Symposium on Applied Reconfigurable Computing (ARC), pp. 251-263, 2012.
  4. G.C.T. Chow, W. Luk and P.H.W. Leong. "A Mixed Precision Methodology for Mathematical Optimisation". In Proc. IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2012.