Several projects are exploring techniques, such as mixed precision and multi-level customisation, for maximising speed and energy efficiency of various applications in high-performance computing.
We pioneered reconfigurable acceleration for computational finance; since our first paper in 2005, we have studied many techniques and tools for financial applications, including financial modelling and algorithmic trading.
Various reconfigurable solutions have been developed to speed up a variety of media processing and graphics operations, including motion estimation, trace transform, and radiosity-based scene generation.
This research proposes an analytical model that relates FPGA architectural parameters to the logic size and depth of an FPGA implementation. The model can be used in FPGA architectural investigations to complement experimental approaches.
This research explores aspect-oriented techniques for systematic mapping of high-level programs into FPGA designs; the functional code is decoupled from non-functional requirements for enhanced modularity.
Axel is a high-performance heterogeneous cluster. Each of its nodes includes multicore processors, graphics processing units, and field-programmable gate arrays. It enables research on next-generation systems such as data centres with heterogeneous accelerators.
This research explores three strategic directions for advanced digital systems: customisable heterogeneous architectures, including design space exploration of devices and systems, advanced development methods and tools, and prototyping platforms and design portability enhancement; self-adapting design, including architecture innovations, adaptation policies and optimisation strategies, and design and verification flow; security-aware systems, including architecture enhancements, compilation and test generation environments, and experimental facilities and demonstration flow.
We have studied techniques, tools and applications for custom instruction processors, including their optimisation and verification.
The FPGA Cube is designed to include 8 boards, each containing 64 FPGA devices. It provides a simple interface and streamlined processing power. With high-bandwidth systolic inter-FPGA communication and a flexible programming scheme, it is a low-power, high-density and scalable supercomputing machine suitable for various large-scale parallel applications.
A systematic approach to comparing the graphics processor (GPU) with reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, such as 2D convolution and colour correction, characterised by their arithmetic complexity, memory access requirements, and data dependence, and to two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx Virtex-4 field-programmable gate array (FPGA).
The GroundHog benchmark suite is designed for evaluating the power consumption of reconfigurable technology for applications targeting the mobile computing domain. It includes seven designs: one design targets fine-grained FPGA fabrics, allowing for quick state-of-the-art evaluation, and six designs are specified at a high level, allowing them to target a range of existing and future reconfigurable technologies.
The hArtes toolchain is composed of tools which can be linked together to form a coherent workflow. It includes novel algorithms for design space exploration, which aim to automate design partitioning, task transformation, choice of data representation, and metric evaluation for both hardware and software components. A system synthesis tool produces heterogeneous implementations that best exploit the capability of each type of processing element.
This toolchain targets multiprocessor heterogeneous systems comprising different types of processing elements, such as general-purpose processors (GPPs), digital signal processors (DSPs), and field-programmable gate arrays (FPGAs), starting from a high-level C program. The core tools include a task transformation engine, a mapping selector, a data representation optimiser, and a hardware synthesiser.
Optimisations involving both high-precision computation and low-precision computation are described; they are applied to Monte Carlo simulation, function comparison, and global optimisation.
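As an illustration of how high-precision and low-precision computation can be combined in a Monte Carlo simulation, the sketch below evaluates most samples on a reduced-precision datapath and uses a smaller number of high-precision samples to estimate and remove the resulting bias. This is one common pattern and not necessarily the scheme used in the project; the integrand, the 8-bit fixed-point quantisation, and all function names are assumptions chosen for the example.

```python
import math
import random


def quantise(x, frac_bits):
    """Round x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def f_high(x):
    """Reference integrand evaluated in double precision (illustrative choice)."""
    return math.exp(-x * x)


def f_low(x, frac_bits=8):
    """Same integrand on a hypothetical reduced-precision datapath:
    both the input and the result are quantised."""
    xq = quantise(x, frac_bits)
    return quantise(math.exp(-xq * xq), frac_bits)


def mixed_precision_estimate(n_low, n_high, seed=0):
    """Estimate the integral of f over [0, 1].

    Bulk of the work: n_low cheap low-precision samples.
    Correction: the mean of (f_high - f_low) over n_high samples,
    which compensates for the quantisation bias of the cheap path.
    """
    rng = random.Random(seed)
    low_mean = sum(f_low(rng.random()) for _ in range(n_low)) / n_low
    corr_sum = 0.0
    for _ in range(n_high):
        x = rng.random()
        corr_sum += f_high(x) - f_low(x)
    return low_mean + corr_sum / n_high


if __name__ == "__main__":
    estimate = mixed_precision_estimate(n_low=20000, n_high=2000)
    exact = math.sqrt(math.pi) / 2 * math.erf(1.0)
    print(f"estimate={estimate:.5f} exact={exact:.5f}")
```

The correction term has a much smaller variance than the integrand itself, so far fewer high-precision samples are needed than low-precision ones.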
NeMo is a spiking neural network simulator aimed at real-time simulation of hundreds of thousands of realistically connected spiking neurons. It targets many-core processors, particularly graphics processing units (GPUs).
These projects address network flow analysis and firewall processors, hardware-accelerated cryptographic designs, a hardware platform for a key search engine, power attack resistance, and security-aware caches.
The online linear regression module is a custom sampling module based on the pfmon tool. It runs inside pfmon, capturing the variation of the sample values with a series of straight lines.
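The idea of capturing a stream of samples with a series of straight lines can be sketched as a streaming least-squares fit that starts a new segment whenever an incoming sample deviates too far from the current line. The class below is a minimal illustration under that assumption; it does not use or reproduce the pfmon interface, and the class name, the tolerance parameter, and the segment format are hypothetical.

```python
class OnlineLineFitter:
    """Approximate a stream of (x, y) samples by straight-line segments.

    Only running sums are stored, so memory use is constant per segment.
    """

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.segments = []  # (start_x, end_x, slope, intercept)
        self._reset()

    def _reset(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0
        self.start_x = self.end_x = None

    def _fit(self):
        """Least-squares slope and intercept of the current segment."""
        denom = self.n * self.sxx - self.sx * self.sx
        if self.n < 2 or denom == 0:
            return 0.0, (self.sy / self.n if self.n else 0.0)
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept

    def add(self, x, y):
        """Consume one sample; close the segment if the sample does not fit."""
        if self.n >= 2:
            slope, intercept = self._fit()
            if abs(slope * x + intercept - y) > self.tolerance:
                self.segments.append((self.start_x, self.end_x, slope, intercept))
                self._reset()
        if self.n == 0:
            self.start_x = x
        self.end_x = x
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def finish(self):
        """Flush the last open segment and return all segments."""
        if self.n:
            slope, intercept = self._fit()
            self.segments.append((self.start_x, self.end_x, slope, intercept))
            self._reset()
        return self.segments


if __name__ == "__main__":
    fitter = OnlineLineFitter(tolerance=0.5)
    for x in range(20):
        y = 2 * x if x < 10 else 45 - 3 * x  # synthetic two-phase signal
        fitter.add(x, y)
    for segment in fitter.finish():
        print(segment)
```

A tighter tolerance yields more, shorter segments; a looser one compresses the sample stream further at the cost of accuracy.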
This utility supports executing a C program on an or32 platform and performing simple text I/O via the Opencores UART commonly used with OpenRISC designs. The advantage of this approach is that many standard C functions, including printf (to the UART), malloc/free, string functions and math functions, are available.
3S is an efficient, small and flexible program analysis framework. It can be used to identify hotspots, control and data dependencies, potential parallelism and memory leaks, and to confirm test case quality, for programs written in any compilable language.
This project provides the source code for the paper "FPGA-Optimised Uniform Random Number Generators using LUTs and Shift Registers", which was presented at FPL 2010. The aim was to design uniform random number generators for FPGAs that offer high quality, long periods, customisability, low resource usage, state serialisation, and a simple description.
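For readers unfamiliar with shift-register-based generators, the sketch below shows a much simpler relative: a textbook 16-bit Fibonacci linear feedback shift register (taps 16, 14, 13, 11). It is not the generator from the paper and its quality and period are far lower; it only illustrates how a shift register with XOR feedback produces a long bit sequence, and how the whole state can be saved and restored as a single word. All function names are illustrative.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR by one bit (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_words(seed, count):
    """Generate `count` 16-bit words; also return the final state.

    The returned state can be stored and passed back in as `seed`
    to resume the sequence exactly where it stopped.
    """
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    words = []
    for _ in range(count):
        word = 0
        for _ in range(16):
            state = lfsr16_step(state)
            word = (word << 1) | (state & 1)
        words.append(word)
    return words, state


def lfsr16_period(seed):
    """Count the steps until the register returns to `seed`."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps


if __name__ == "__main__":
    words, saved_state = lfsr16_words(0xACE1, 4)
    print([hex(w) for w in words], hex(saved_state))
    print("period:", lfsr16_period(0xACE1))
```

In hardware, each step corresponds to one clock cycle of a shift register with a few XOR gates, which is why this family of generators maps cheaply onto FPGA resources.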
VPH is a modified version of the VPR tool for exploring hybrid FPGA architectures. It covers embedded blocks (EBs), memories and multipliers, and supports carry chains. It allows user constraints for individual embedded blocks: the positions, area and delay of EBs can be specified in the user constraints. It also supports extra-wide channels around the embedded blocks, and allows the switching of wire direction inside embedded blocks to be enabled or disabled.
The updated power models support the new data structures in VPR 5.0, the new architecture file format, and can estimate power on both uni-directional and bi-directional routing structures. Scripts, benchmarks, architecture files, and the source code along with some other tools are included.