# THE COARSE-GRAINED / FINE-GRAINED LOGIC INTERFACE IN FPGAS WITH EMBEDDED FLOATING-POINT ARITHMETIC UNITS

Chi Wai Yu<sup>1</sup>, Julien Lamoureux<sup>2</sup>, Steven J.E. Wilton<sup>2</sup>, Philip Leong<sup>3</sup>, Wayne Luk<sup>1</sup>

<sup>1</sup>Dept of Computing Imperial College London, London {cyu,wl}@doc.ic.ac.uk <sup>2</sup>Dept of Electrical and Computer Engineering, University of British Columbia, Vancouver, B.C., Canada {julienl, stevew}@ece.ubc.ca <sup>3</sup>Dept of Computer Science and Engineering, Chinese University of Hong Kong phwl@cse.cuhk.edu.hk

# Outline

- 1. Motivation
- 2. Background
- 3. Contributions
- 4. Architecture assumptions
- 5. Interface parameters
- 6. Exploration tool
- 7. Benchmarks
- 8. Results
- 9. Conclusion

# 1. Motivation

- The interface between coarse-grained blocks and fine-grained logic is important in an FPGA
  - Not flexible enough: reduces the usefulness of the coarse-grained block
  - Too flexible: results in unnecessary overhead for applications that do not use the embedded component



# 2. Background

- Adding coarse-grained blocks to the fine-grained FPGA fabric improves area and delay
- A novel domain-specific hybrid FPGA architecture that targets floating-point applications is 18 times more area-efficient
- Few studies have examined the interface between the coarse-grained blocks and the fine-grained fabric

# 3. Contributions

- A set of parameters that describes the interface between coarse-grained and fine-grained programmable logic in FPGAs
- An empirical framework to model the impact of coarse-grained architectural parameters on performance, density, and power consumption
- An empirical study that examines the set of parameters

# 4. Architecture assumptions – Hybrid FPGA



# Architecture assumptions – Coarse/Fine-Grained

- Fine-grained: configurable logic block (CLB)
- Coarse-grained: 64-bit floating-point block (182 tiles, 9.2 ns)



# 5. Interface parameters – 1 EB Position

Figure: four placements of the embedded blocks (EBs) relative to the CLBs.

(a) Type 1: EBs are on the top and bottom of CLBs

(b) Type 2: All EBs are on the top of CLBs

(c) Type 3: EBs are in the middle of CLBs

(d) Type 4: EBs are surrounded by a sea of CLBs

# Interface parameters – 2 Pin Location
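The pin-location parameter is evaluated in the results with EB pins placed on 1, 2, 3, or 4 sides. A minimal sketch of splitting a pin count as evenly as possible over the chosen number of sides is shown below. The function name `distribute_pins` and the pin count of 192 are hypothetical, chosen only for illustration; the slides do not state the actual pin count or which sides are used.

```python
def distribute_pins(n_pins, n_sides):
    """Split n_pins over n_sides as evenly as possible.

    Returns a list with one pin count per side. Any remainder is
    given to the first sides, one extra pin each.
    """
    if not 1 <= n_sides <= 4:
        raise ValueError("an EB has between 1 and 4 usable sides")
    if n_pins < 0:
        raise ValueError("pin count must be non-negative")
    base, extra = divmod(n_pins, n_sides)
    return [base + 1 if side < extra else base for side in range(n_sides)]


if __name__ == "__main__":
    HYPOTHETICAL_PIN_COUNT = 192  # illustrative value, not from the slides
    for sides in range(1, 5):
        print(sides, distribute_pins(HYPOTHETICAL_PIN_COUNT, sides))
```

With one side every pin competes for the same routing channel, while four sides spread the same pins over four channels.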







# 6. Exploration Tool – VPH: Versatile Place and Route for Hybrid FPGAs



# 7. Benchmarks

- bgm: Monte Carlo simulations of interest rate model derivatives
- dscg: digital sine cosine generator
- bfly: basic component of the Fast Fourier Transform
- ode: ordinary differential equation
- mm3: 3x3 matrix multiplication circuit
- fir4: 4-tap finite impulse response filter

| Benchmarks  | bgm  | dscg | bfly | ode | mm3 | fir4 |
|-------------|------|------|------|-----|-----|------|
| No. of CLBs | 6433 | 647  | 790  | 336 | 773 | 180  |
| No. of FPUs | 7    | 8    | 8    | 8   | 8   | 8    |

Number of FPUs and CLBs used in each benchmark
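As a rough sense of scale, the sketch below estimates how much of each benchmark's used area is taken by FPUs. It assumes that one CLB occupies one tile, so the 182-tile floating-point block size quoted earlier can be added to the CLB counts directly. That equivalence and the helper name `fpu_area_share` are illustrative assumptions, not taken from the slides.

```python
# Benchmark resource counts copied from the table: (CLBs, FPUs).
BENCHMARKS = {
    "bgm": (6433, 7),
    "dscg": (647, 8),
    "bfly": (790, 8),
    "ode": (336, 8),
    "mm3": (773, 8),
    "fir4": (180, 8),
}

# Assumption: one CLB occupies one tile, so the 182-tile FPU size
# can be summed with the CLB count.
TILES_PER_FPU = 182


def fpu_area_share(n_clbs, n_fpus, tiles_per_fpu=TILES_PER_FPU):
    """Return the fraction of used tiles that belong to FPUs."""
    fpu_tiles = n_fpus * tiles_per_fpu
    return fpu_tiles / (fpu_tiles + n_clbs)


if __name__ == "__main__":
    for name, (clbs, fpus) in BENCHMARKS.items():
        share = fpu_area_share(clbs, fpus)
        print(f"{name}: {share:.0%} of used tiles are FPU tiles")
```

Under this assumption the FPU share varies widely between the CLB-heavy bgm circuit and the small fir4 circuit, which is one reason the interface has to serve both kinds of design.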

# 8. Results – 1 EB Position



# Results – 2 Pin Location – Delay











# Results – 2 Pin Location – Channel width

| Circuit | 1 side   | 2 sides   | 3 sides   | 4 sides   |
|---------|----------|-----------|-----------|-----------|
| bgm     | 46 (0%)  | 35 (-22%) | 32 (-29%) | 25 (-44%) |
| dscg    | 43 (0%)  | 33 (-23%) | 32 (-26%) | 32 (-26%) |
| bfly    | 44 (0%)  | 35 (-20%) | 33 (-25%) | 32 (-27%) |
| ode     | 46 (0%)  | 38 (-17%) | 37 (-20%) | 38 (-17%) |
| mm3     | 42 (0%)  | 40 (-5%)  | 26 (-38%) | 24 (-43%) |
| fir4    | 43 (0%)  | 33 (-23%) | 32 (-26%) | 30 (-30%) |

Columns give the number of EB sides carrying I/O pins. Each entry is the minimum channel width, with the deviation from the 1-side case in parentheses.









Minimum channel width for different I/O configurations
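The percentages in the table are deviations from the 1-side width, which can be recomputed with a short sketch like the one below. The function names are hypothetical. The 1-side entry for ode is taken as 46, the value consistent with the percentages listed for that row, and recomputed values can differ from the rounded slide figures by a point or two.

```python
# Minimum channel widths from the table, ordered as 1, 2, 3 and 4 sides.
# Assumption: the 1-side entry for ode is 46, the value consistent
# with the percentages listed for that row.
MIN_CHANNEL_WIDTH = {
    "bgm": [46, 35, 32, 25],
    "dscg": [43, 33, 32, 32],
    "bfly": [44, 35, 33, 32],
    "ode": [46, 38, 37, 38],
    "mm3": [42, 40, 26, 24],
    "fir4": [43, 33, 32, 30],
}


def deviation_from_one_side(widths):
    """Percentage change of each width relative to the 1-side width."""
    baseline = widths[0]
    return [100.0 * (w - baseline) / baseline for w in widths]


def mean_deviation(table, column):
    """Average percentage change across circuits for one column index."""
    values = [deviation_from_one_side(w)[column] for w in table.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, widths in MIN_CHANNEL_WIDTH.items():
        devs = ", ".join(f"{d:+.0f}%" for d in deviation_from_one_side(widths))
        print(f"{name}: {devs}")
    print(f"mean change with 4 sides: {mean_deviation(MIN_CHANNEL_WIDTH, 3):.1f}%")
```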


# Results – 3 Interconnect Flexibility



Channel width = w

# Results – 4 Shape



# Results – Discussion

- **Critical path**: routed efficiently once the circuit is routable



# 9. Conclusion

- Studied the interconnect between coarse-grained floating-point blocks and the fine-grained fabric
- Best interface parameters:
  - EBs close to each other in the middle of the chip
  - Pins distributed evenly around the EB
  - Width of the channels has little impact
  - A square EB is most efficient
- Future work: apply the approach to FPGAs with other types of embedded blocks