Title: Transforming Reconfigurable Systems: A Festschrift Celebrating the 60th Birthday of Professor Peter
File Size: 6.3 MB
Total Pages: 252
Table of Contents
List of Contributors
Table of Contents
1. Accelerator-Rich Architectures — Computing Beyond Processors
2. Whither Reconfigurable Computing?
3. An FPGA-Based Floating Point Unit for Rounding Error Analysis
4. The Shroud of Turing
5. Smart Module Redundancy — Approaching Cost Efficient Radiation Tolerance
6. Analysing Reconfigurable Computing Systems
7. Custom Computing or Vector Processing?
8. Maximum Performance Computing with Dataflow Technology
9. Future DREAMS: Dynamically Reconfigurable Extensible Architectures for Manycore Systems
10. Compact Code Generation and Throughput Optimization for Coarse-Grained Reconfigurable Arrays
11. Some Statistical Experiments with Spatially Correlated Variation Maps
12. On-Chip FPGA Debugging and Validation: From Academia to Industry, and Back Again
13. Enabling Survival Instincts in Electronic Systems: An Energy Perspective
Document Text Contents

Page 126

Chapter 8

Maximum Performance Computing with Dataflow Technology

Michael Munday*, Oliver Pell*, Oskar Mencer*,† and Michael J. Flynn*,‡
*Maxeler Technologies

†Imperial College London
‡Stanford University

Reconfigurable computers, generally based upon field programmable gate array (FPGA)
technology, have been used successfully as a platform for performance-critical applications in a
variety of industries. Applications targeted at reconfigurable computers can exploit their
fine-grained parallelism, predictable low-latency performance and very high data throughput per watt.
Traditional techniques for designing configurations are, however, generally considered
time-consuming and cumbersome, and this has limited the commercial use of reconfigurable computers.
To solve this problem, Maxeler Technologies Ltd, working closely with Imperial College London, has
developed powerful new tools and hardware based on the dataflow computing paradigm. In this
chapter we explore these tools and provide examples of how they have enabled the development of a
number of high-performance commercial applications.

8.1. Introduction

The continuously increasing speed and memory capacity of supercomputers have,
over the past decades, allowed for the creation of ever more complex and
accurate mathematical simulations. High performance computing (HPC) nevertheless
faces challenges, chief among them the monetary and environmental costs of
purchasing and running an HPC system. The electricity costs alone for an exascale
supercomputer are estimated to exceed $80 million a year [Ref. 1].
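As a rough sanity check on that figure, the short Python sketch below converts an annual electricity bill into the average power draw it implies. The flat tariffs ($0.05 to $0.15 per kWh) and the function names are illustrative assumptions, not values taken from [Ref. 1]; the sketch also assumes a single flat tariff and a constant load over the whole year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h; leap years ignored


def implied_average_power_mw(yearly_bill_usd: float, price_usd_per_kwh: float) -> float:
    """Average power (MW) implied by a yearly electricity bill at an assumed flat tariff."""
    if price_usd_per_kwh <= 0:
        raise ValueError("price_usd_per_kwh must be positive")
    energy_kwh = yearly_bill_usd / price_usd_per_kwh  # energy bought per year, in kWh
    average_kw = energy_kwh / HOURS_PER_YEAR          # kWh spread evenly over the year -> kW
    return average_kw / 1000.0                        # kW -> MW


def annual_cost_usd(average_power_mw: float, price_usd_per_kwh: float) -> float:
    """Inverse calculation: yearly bill for a constant load at an assumed flat tariff."""
    return average_power_mw * 1000.0 * HOURS_PER_YEAR * price_usd_per_kwh


if __name__ == "__main__":
    # Hypothetical tariffs; the $80 million figure is the estimate quoted in the text.
    for price in (0.05, 0.10, 0.15):
        mw = implied_average_power_mw(80e6, price)
        print(f"${price:.2f}/kWh -> {mw:.1f} MW average draw")
```

At an assumed $0.10/kWh, $80 million a year corresponds to an average draw of roughly 91 MW; because the implied power is inversely proportional to the tariff, a cheaper assumed tariff yields a proportionally larger figure.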

To create a supercomputer that achieves the maximum possible performance
for a given power/space budget, the architecture of the system needs to be
tailored to the applications of interest. This involves optimally balancing
resources such as memory, data storage and networking infrastructure based on
a detailed analysis of the applications. As well as these high-level optimizations,
the architecture of the chips in the system needs to provide both speed and low
power consumption.

Currently, the supercomputers on the TOP500 list [Ref. 2] are built from relatively
general-purpose servers that rely on CPUs (and, more recently, general-purpose
GPUs) for computation. The architectures of these chips suit a wide range of
tasks; however, this generality also means that their low-level architectures are
not necessarily optimal for the applications a given supercomputer is designed to run.
Figure 8.1 shows how little of a modern CPU is dedicated to actual computation.
The rest of the chip is dedicated to subsystems, such as caches, branch predictors
and schedulers, that are designed to speed up programs. Far higher performance and
efficiency can be achieved by designing an architecture that closely fits the
target application.

Fig. 8.1. Simplified diagram of an Intel Westmere 6-core processor chip, highlighting the approximate
portion of the chip performing computation versus other functions.

The underlying architecture of a computer system can be optimized by
developing application-specific integrated circuits (ASICs). Designing ASICs is,
however, very costly and limits how the supercomputer can be
adapted and improved over time. Peter Cheung has made many contributions to
reconfigurable computers based on chips such as field programmable gate arrays
(FPGAs), which offer a lower-cost way of unlocking the gains that architectural
customization can bring while retaining the programmability that makes general
Page 251

static scheduling, 126
subthreshold, 249
supercomputer, 138
superthreshold, 249
survivability, 237
survival, 237

Deep, 243
switched capacitor DC/DC convertor (SCCs)
swizzle-switch, 71
synaptic connections
  neural computation, 125
synaptic plasticity, 80
synaptic update
  neural computation, 124
synthesis, 156, 157

dependable, 242
manycore, 151
neural computation, 124
self-aware, 237

systolic array processor, 139

template, 3
thermal monitoring, 247
threshold, 250
throughput, 200, 201

radiation effects, 85
TILE64, 71
tolerance, 242
topology, 172, 173, 203
total ionizing dose (TID)
trace buffers, 222
tranche, 144
tranched credit derivatives, 143
transformations, 176, 177, 179, 202
transients, 242
transistor, 59
translation look-aside buffer (TLB), 4
triune model, 241

TSMC, 250
Turing, 57, 80
Turing machines, 68

uncertainty, 245
Universal Machine, 57

validation, 220
variation, 207, 210
vector processing, 117, 118
vector width
  BlueVec, 119
Veridae Systems, 225
verification, 220
virtual precision, 42, 54
Vivado HLS, 14
VLIW, 71
  processor, 167–170, 182

weakly programmable processor array, 168
wrappers, 170–172

Xilinx, 52, 84–86, 90

Yeung, Jackson H.C., 41, 51
yield, 214, 215

Zynq, 52