
Title: Implementation of Parametric Haar-like Transformations on FPGA
Language: English
File size: 1.2 MB
Total pages: 88
Table of Contents
Abstract
Abstract (in Finnish)
Preface
Contents
Symbols and abbreviations
1 Introduction
	1.1 Application
	1.2 Field Programmable Gate Arrays
	1.3 Implementation Methods
	1.4 Thesis Structure
2 Parametric Haar-like Transformations
	2.1 Linear Transformations
	2.2 Parametric Haar-like Transformations
		2.2.1 Parametric Representation of Unitary Transformations
		2.2.2 Parametric Representation of Haar-like Transformations
		2.2.3 Mapping Parametric Haar-like Transformations to Hardware
	2.3 Inverse Square Root Calculation on FPGAs
		2.3.1 Common Inverse Square Root Calculation Methods
3 Implementation
	3.1 Hardware Description Language Model
		3.1.1 Haar-like transformation VHDL mixed model
		3.1.2 Pipelined Haar-like transformation VHDL mixed model
		3.1.3 Processing Element VHDL mixed model
		3.1.4 Findings and summary
	3.2 High-Level Synthesis Implementation
		3.2.1 Algorithmic C Datatypes
		3.2.2 Processing Elements
		3.2.3 Class Based Hierarchy
		3.2.4 Flat Hierarchy
4 Design Optimization and Results
	4.1 Processing Elements
		4.1.1 Fixed-point Real Number Implementation
		4.1.2 Fixed-point Complex Number Implementation
		4.1.3 Floating-point Possibilities
	4.2 Class Based Hierarchy
		4.2.1 Fixed-point Real Number Implementation
		4.2.2 Fixed-point Complex Number Implementation
	4.3 Flat Hierarchy
		4.3.1 Fixed-point Real Number Implementation
		4.3.2 Fixed-point Complex Number Implementation
	4.4 Summary and Design Questions
5 Conclusions
References
A QR-Decomposition Example
B Bitwidths for the Fixed-Point Operations
C Accuracy Results for PE operations
                        
Document Text Contents
Page 1

Implementation of Parametric
Haar-like Transformations on FPGA

Mikko Koverola

School of Electrical Engineering

Thesis submitted for examination for the degree of Master of
Science in Technology.
Espoo 14.5.2018

Supervisor

Prof. Jussi Ryynänen

Advisor

Ph.D. David Guevorkian

Page 2

Copyright © 2018 Mikko Koverola

Page 44


of the C++ code is translated correctly to the RTL model, a channel class can be
used in the Catapult HLS. The ac_channel<T> class defines FIFO data transfer
between the RTL model’s processes that are generated from the C++ functions. The
parameter T defines the data type that is passed through the channel. For example,
ac_channel<ac_fixed<4,2,false> > defines a channel that passes unsigned fixed-point
values with four bits in total, two of them integer bits. [38]

3.2.2 Processing Elements

A high-performance PE is the enabling factor for a high-performance Haar-like
transformation and the inverse square root logic block lies at the heart of an efficient
PE. As discussed in Chapter 2, it is of utmost importance to choose a well suited
calculation method for the inverse square root logic. The Catapult HLS 10.2/754530
provides two methods for implementing the inverse square root. One method is to
implement the inverse square root functionality with the standard AC square root
and division functions and the other is a piecewise linear (PWL) inverse square root
function included in the AC math libraries.

The PWL inverse square root function is a polynomial approximation method
that can process fixed- and floating-point inputs. The square root function, on the
other hand, is implemented as an iterative calculation and accepts only fixed-point
inputs. For the fixed-point PE implementation, the results suggest that the PWL
inverse square root function uses fewer resources than the square root and division
functions, but its outputs are also less accurate. The results are discussed in
section 4.1.

The accuracy constraints for the PEs depend largely on how the Haar-like
transformation is used. If information about the expected input range is available,
it can be taken advantage of in the design process. For example, if it is known
that the transformations will process input vectors whose elements’ magnitudes
vary only from 0 to 1, the PEs can be designed to accommodate this range of
inputs with a given level of accuracy while minimizing the bitwidths. For the
investigation done in this master’s thesis, the input test vectors were restricted to
integer values between 1 and 32.

The fixed-point implementation of the PEs was done similarly to the VHDL
models. Conceptually, the same architecture as in figure 15 applies, although
without the delay logic. An example of the C++ implementation of the real-number
fixed-point PEs is presented below. The type definition names for each operation’s
output type are included in the example code comments.

Each operation’s output type is defined to be an AC fixed-point number. By
adjusting the lengths of the integer and fractional parts of the output types, the
accuracy level for the whole PE could be set. Also, by defining the output types as
AC complex types, easy migration to a complex-number implementation could be
achieved. The bitwidths used for each output type in every transformation size are
presented in appendix B.

Page 45


void PE(
    input_type in1,
    input_type in2,
    int K_ind1,
    int K_ind2,
    ctrl_type gen,
    ctrl_type sel,
    output_type &out1,
    output_type &out2
){
    if (gen == 1) { // Kernel generation logic
        if (!((in1 == 0) && (in2 == 0))) {
            sqr1 = in1 * in1;           // sqr1 = sqr_type
            sqr2 = in2 * in2;           // sqr2 = sqr_type
            Temp = sqr1 + sqr2;         // Temp = sum_type
            sqrt(Temp, sqroot);         // sqroot = sqroot_type
            div(One, sqroot, invNorm);  // invNorm = norm_type
        }
        if ((in1 == 0) && (in2 == 0)) { // If norm is zero, identity kernels
            K[K_ind1] = 1;              // K = kernel_type
            K[K_ind2] = 1;              // K = kernel_type
        }
        else if (sel == 1) {            // Generate bypass kernels
            K[K_ind1] = 0;              // K = kernel_type
            K[K_ind2] = 1;              // K = kernel_type
        }
        else {                          // Generate kernels
            K[K_ind1] = in1 * invNorm;  // K = kernel_type
            K[K_ind2] = in2 * invNorm;  // K = kernel_type
        }
    }
    // Multiplication logic
    out1 = (K[K_ind1] * in1) + (K[K_ind2] * in2);
    out2 = (K[K_ind2] * in1) - (K[K_ind1] * in2);
}

In the PEs, the most complicated calculations are done in the kernel generation
logic. Floating-point Matlab reference calculations were generated for the operations
in the kernel generation logic, and the reference output values were compared with
the output values of the fixed-point operations. The accuracy of the C++ fixed-point
values was set by first determining the needed integer part widths and then adjusting
the bitwidth of the fractional part to achieve the desired accuracy level (see figure
16). The accuracy and bitwidth optimization results are discussed in section 4.1.1
for the real number and in section 4.1.2 for the complex number implementations.

The Catapult HLS tool is able to unroll loops to parallelize the loop iterations.
This is done either by using a pragma or by setting the unrolling option in the
tool GUI. To investigate latency reductions, the iterative square root and division

Page 87


C Accuracy Results for PE operations

Table C1: The accuracy results for the fixed-point real number PE implementations’
kernel generation calculations.

Operation output   PE, Sqrt&Div    PE, Inv. Sqrt   Reference
In1                1.0000000000    1.0000000000    1.0000000000
In2                2.0000000000    2.0000000000    2.0000000000
Inverse Norm       0.4472122192    0.4467773437    0.4472135954
Kernel1            0.4472045898    0.4467773437    0.4472135954
Kernel2            0.8944396972    0.8935546875    0.8944271909
Out1               2.2360839843    2.2338867187    2.2360679774
Out2               0.0000000000    0.0000000000    0.0000000000

Table C2: The accuracy results for the fixed-point complex number PE implementations’
kernel generation calculations.

Operation output   PE, Sqrt&Div      PE, Inv. Sqrt    Reference
In1 Re             1.0000000000000   1.000000000000   1.000000000000
In1 Im             2.0000000000000   2.000000000000   2.000000000000
In2 Re             2.0000000000000   2.000000000000   2.000000000000
In2 Im             3.0000000000000   3.000000000000   3.000000000000
Inverse Norm       0.2357025146484   0.235534667968   0.235702260395
Kernel1 Re         0.2357025146484   0.235534667968   0.235702260395
Kernel1 Im         0.4714050292968   0.471069335937   0.471404520791
Kernel2 Re         0.4714050292968   0.471069335937   0.471404520791
Kernel2 Im         0.7071075439453   0.706604003906   0.707106781186
Out1 Re            4.2426452636718   4.239624023437   4.242640687119
Out1 Im            0.0000000000000   0.000000000000   0.000000000000
Out2 Re            0.0000000000000   0.000000000000   0.000000000000
Out2 Im            0.0000000000000   0.000000000000   0.000000000000

Page 88


Table C3: The accuracy results for the fixed-point and floating-point real number
PE implementations’ kernel generation calculations when using the PWL inverse
square root function.

Operation output   Fixed-point     Floating-point   Reference
In1                1.0000000000    1.0000000000     1.0000000000
In2                2.0000000000    2.0000000000     2.0000000000
Inverse Norm       0.4467773437    0.4468140602     0.4472135954
Kernel1            0.4467773437    0.4468140602     0.4472135954
Kernel2            0.8935546875    0.8936281204     0.8944271909
Out1               2.2338867187    2.2340707778     2.2360679774
Out2               0.0000000000    0.0000000000     0.0000000000
