
Research: Custom Computing


Maximum Performance Computing

Several projects are exploring techniques, such as mixed precision and multi-level customisation, for maximising speed and energy efficiency of various applications in high-performance computing.

Accelerating computational finance

We pioneer reconfigurable acceleration for computational finance; since the first paper in 2005, we have studied many techniques and tools for financial applications including financial modelling and algorithmic trading.

Accelerating media processing and graphics

Various reconfigurable solutions have been developed to speed up a variety of media processing and graphics operations, including motion estimation, trace transform, and radiosity-based scene generation.

Analytical modelling for FPGA architecture

This research proposes an analytical model that relates FPGA architectural parameters to the logic size and depth of an FPGA implementation. The model can be used in FPGA architectural investigations to complement experimental approaches.

Aspect-oriented design

This research explores aspect-oriented techniques for systematic mapping of high-level programs into FPGA designs; the functional code is decoupled from non-functional requirements for enhanced modularity.

Axel Cluster

Axel is a high-performance heterogeneous cluster. Each of its nodes includes multicore processors, graphics processing units, and field-programmable gate arrays. It enables research on next-generation systems such as data centres with heterogeneous accelerators.

Custom computing for advanced digital systems (Platform Grant)

This research explores three strategic directions for advanced digital systems: customisable heterogeneous architectures, including design space exploration of devices and systems, advanced development methods and tools, and prototyping platforms and design portability enhancement; self-adapting design, including architecture innovations, adaptation policies and optimisation strategies, and design and verification flow; security-aware systems, including architecture enhancements, compilation and test generation environments, and experimental facilities and demonstration flow.

Custom instruction processors

We have studied techniques, tools and applications for custom instruction processors, including their optimisation and verification.


FPGA Cube

The FPGA Cube is designed to include 8 boards, each containing 64 FPGA devices. It provides a simple interface and streamlined processing power. With high-bandwidth systolic inter-FPGA communication and a flexible programming scheme, we created a low-power, high-density and scalable supercomputing machine suitable for various large-scale parallel applications.

FPGA/GPU performance comparison

A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, such as 2D convolution and colour correction, characterized by their arithmetic complexity, memory access requirements, and data dependence, and to two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx Virtex-4 field-programmable gate array (FPGA).

GroundHog Benchmarks

The GroundHog benchmark suite is designed for evaluating the power consumption of reconfigurable technology for applications targeting the mobile computing domain. It includes seven designs: one design targets fine-grained FPGA fabrics, allowing for quick state-of-the-art evaluation, and six designs are specified at a high level, allowing them to target a range of existing and future reconfigurable technologies.

hArtes: a toolchain for embedded systems

The hArtes toolchain is composed of tools which can be linked together to form a coherent workflow. It includes novel algorithms for design space exploration, which aim to automate design partitioning, task transformation, choice of data representation, and metric evaluation for both hardware and software components. A system synthesis tool produces heterogeneous implementations that best exploit the capability of each type of processing element.

Harmonic: a high-level compilation toolchain

Harmonic is a toolchain that maps a high-level C program onto multiprocessor heterogeneous systems comprising different types of processing elements, such as general-purpose processors (GPPs), digital signal processors (DSPs), and field-programmable gate arrays (FPGAs). The core tools include a task transformation engine, a mapping selector, a data representation optimiser, and a hardware synthesiser.

Mixed Precision Optimisation

Optimisations involving both high precision computation and low precision computation are described; they are applied to Monte Carlo simulation, function comparison, and global optimisation.
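
As a rough illustration of how high and low precision can be combined in Monte Carlo simulation, the sketch below runs most samples in a truncated fixed-point format and uses a small auxiliary run, evaluated in both precisions, to estimate and remove the resulting bias. It is a minimal sketch under stated assumptions and not necessarily the method used in these projects: the integrand (exp of a standard normal draw), the function names, and the choice of 4 fractional bits are all hypothetical.

```python
import math
import random


def truncate(x, frac_bits):
    """Truncate x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return math.floor(x * scale) / scale


def payoff(z):
    """Hypothetical integrand in full (double) precision."""
    return math.exp(z)


def payoff_low(z, frac_bits):
    """Same integrand with truncated input and truncated output."""
    return truncate(math.exp(truncate(z, frac_bits)), frac_bits)


def mixed_precision_mean(n_low, n_aux, frac_bits=4, seed=0):
    """Return (low-precision mean, bias-corrected mean)."""
    rng = random.Random(seed)

    # Bulk of the work: cheap low-precision samples.
    low_sum = 0.0
    for _ in range(n_low):
        low_sum += payoff_low(rng.gauss(0.0, 1.0), frac_bits)
    low_mean = low_sum / n_low

    # Small auxiliary run: evaluate both precisions on the same draws
    # to estimate the average error introduced by truncation.
    err_sum = 0.0
    for _ in range(n_aux):
        z = rng.gauss(0.0, 1.0)
        err_sum += payoff(z) - payoff_low(z, frac_bits)
    correction = err_sum / n_aux

    return low_mean, low_mean + correction


if __name__ == "__main__":
    low, corrected = mixed_precision_mean(n_low=200_000, n_aux=20_000)
    print("low precision only:", low)
    print("with correction:   ", corrected)
    print("reference value:   ", math.exp(0.5))
```

The design choice here is that the expensive high-precision arithmetic is only needed for the auxiliary run, which can be much smaller than the main run because the per-sample error varies far less than the integrand itself.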

Nemo Neural Simulator

NeMo is a spiking neural network simulator aimed at real-time simulation of hundreds of thousands of realistically connected spiking neurons. It targets many-core processors, particularly graphics processing units (GPUs).

Network security and cryptography

These projects address network flow analysis and firewall processors, hardware-accelerated cryptographic designs, a hardware platform for a key search engine, power attack resistance, and security-aware caches.

Online Linear Regression Module

The online linear regression module is a custom sampling module based on the pfmon tool. It works inside pfmon, capturing the variation of the sample values with a series of straight lines.
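
The idea of summarising a stream of samples with a series of straight lines can be sketched as follows. This is a standalone illustration, not the module's code and not tied to pfmon: the class and function names are hypothetical, and the rule for starting a new line (a sample deviating from the current fit by more than a tolerance) is an assumption.

```python
class OnlineLine:
    """Least-squares line maintained from running sums (no samples stored)."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def coeffs(self):
        """Return (slope, intercept); a flat line if the fit is undetermined."""
        denom = self.n * self.sxx - self.sx * self.sx
        if self.n < 2 or denom == 0:
            return 0.0, (self.sy / self.n if self.n else 0.0)
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        return slope, (self.sy - slope * self.sx) / self.n

    def predict(self, x):
        slope, intercept = self.coeffs()
        return slope * x + intercept


def fit_segments(samples, tol):
    """Split (x, y) samples into straight-line segments.

    A new segment starts when a sample lies more than tol away from the
    current line's prediction (this rule is an assumption of the sketch).
    Returns a list of (x_start, slope, intercept) tuples.
    """
    segments = []
    line, start = OnlineLine(), None
    for x, y in samples:
        if line.n >= 2 and abs(line.predict(x) - y) > tol:
            segments.append((start, *line.coeffs()))
            line, start = OnlineLine(), None
        if start is None:
            start = x
        line.add(x, y)
    if line.n:
        segments.append((start, *line.coeffs()))
    return segments


if __name__ == "__main__":
    # Synthetic trace: a rising phase followed by a falling phase.
    trace = [(x, 2 * x) for x in range(10)]
    trace += [(x, 100 - 3 * x) for x in range(10, 20)]
    for seg in fit_segments(trace, tol=0.5):
        print(seg)
```

Because each line is kept as five running sums, the memory cost per segment is constant regardless of how many samples it covers.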

OpenRISC platform utility

This utility supports executing a C program on an or32 platform and performing simple text I/O via the Opencores UART commonly used with OpenRISC designs. The advantage of this approach is that many standard C functions, including printf (to the UART), malloc/free, string functions and math functions, are available.

3S program instrumentation and characterisation framework

3S is an efficient, small and flexible program analysis framework. 3S can be used to identify hotspots, control and data dependencies, parallelism potentials, memory leaks and to confirm test case quality for programs written in any compilable language.

Uniform random number generator for FPGAs and GPUs

This project provides the source code for the paper "FPGA-Optimised Uniform Random Number Generators using LUTs and Shift Registers", which was presented at FPL 2010. The aim was to design uniform random number generators for FPGAs that offer high quality, a long period, customisability, low resource usage, state serialisation, and a simple description.
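
For readers unfamiliar with generators built only from XORs and shifts, the sketch below shows Marsaglia's 32-bit xorshift generator in software. It is a stand-in chosen for brevity and is not the LUT and shift-register generator from the paper; the class name and methods are hypothetical. It illustrates two of the listed properties on a small scale: a simple description, and a state that can be read out and loaded back to resume the stream.

```python
MASK32 = 0xFFFFFFFF


class XorShift32:
    """Marsaglia's xorshift32 generator (shifts 13, 17, 5; period 2**32 - 1)."""

    def __init__(self, seed):
        seed &= MASK32
        if seed == 0:
            raise ValueError("seed must be non-zero in its low 32 bits")
        self.state = seed

    def next_u32(self):
        """Advance the state and return it as a 32-bit unsigned integer."""
        x = self.state
        x ^= (x << 13) & MASK32
        x ^= x >> 17
        x ^= (x << 5) & MASK32
        self.state = x
        return x

    def next_float(self):
        """Return a float in [0, 1) derived from the next 32-bit output."""
        return self.next_u32() / 2**32


if __name__ == "__main__":
    gen = XorShift32(seed=1)
    print([gen.next_u32() for _ in range(3)])

    # Save the state, draw some values, then resume from the saved state.
    saved = gen.state
    first = [gen.next_u32() for _ in range(3)]
    resumed = XorShift32(saved)
    print(first == [resumed.next_u32() for _ in range(3)])
```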

VPH Hybrid FPGA Tool

VPH is a modified version of the VPR tool for exploring hybrid FPGA architectures. It covers embedded blocks (EBs), memories and multipliers, and supports carry chains. It allows user constraints for individual embedded blocks: the positions, area and delay of EBs can be specified in the user constraints. It supports extra wide channels around the embedded blocks, and allows enabling or disabling of switching wire direction inside embedded blocks.

VPR 5.0 with power estimation

The updated power models support the new data structures in VPR 5.0, the new architecture file format, and can estimate power on both uni-directional and bi-directional routing structures. Scripts, benchmarks, architecture files, and the source code along with some other tools are included.