Vivado HLS Example: FFT

Nitin Chandrachoodan
12 Sept 2019, 14:56

Summary

TL;DR: This video outlines how to implement a Fast Fourier Transform (FFT) with the Vivado High-Level Synthesis (HLS) tools. It covers generating reference data, writing C code, simulating, synthesizing, and packaging the IP for export. For simplicity the tutorial skips optimization, and it assumes a Linux environment. Python, with numeric libraries from the Anaconda distribution, is used to generate the test data, and the video encourages experimentation and adapting the steps to other platforms.

Takeaways

  • The video demonstrates an implementation of the Fast Fourier Transform (FFT) using Vivado HLS tools.
  • It covers the steps from generating reference data to exporting the IP for use in Vivado, excluding optimization topics.
  • Reference websites are provided for the source code and for tools such as the Anaconda distribution for Python.
  • The demonstration is conducted on a Linux system, but the principles are general and can be adapted to other systems.
  • The process begins by cloning a repository containing all the source code and data for the FFT example.
  • Test data is generated using a Python script that creates random complex values, computes an FFT, and stores the results in floating-point and fixed-point hex formats.
  • The Vivado HLS project is set up with the main source code and test bench, and the generated .txt files for input and expected output are added to it.
  • The FPGA part number is selected to match the hardware that will be used for implementation.
  • A C simulation is run to ensure the correctness of the implementation before proceeding to synthesis.
  • Post-synthesis RTL co-simulation is performed to confirm that the synthesized design matches the C simulation results.
  • Finally, the RTL is exported as a package for implementation on an FPGA, concluding the design process.

Q & A

  • What is the main topic of the video?

    -The video is about implementing a simple design, specifically the Fast Fourier Transform (FFT), using Vivado High-Level Synthesis (HLS) tools.

  • What are the steps followed in the video for implementing the FFT design?

    -The steps are: 1) Generate reference data for testing, 2) Write C code, 3) Simulate using Vivado HLS, 4) Synthesize and simulate post-synthesis, 5) Package the IP for export and use in Vivado tools for implementation.

  • Why are certain topics not covered in the video?

    -The video does not cover optimization of bit widths, data representations, or design optimization for size, speed, or other metrics to focus on the basic principles of using Vivado HLS tools.

  • Where is the source code for the demonstration available?

    -The source code is available at the website 'get-lab.com/Chandra/children/each-FPGA'.

  • What is the example used in the video for demonstrating the FFT using HLS?

    -The example is "FFT using HLS" from the demonstration repository. The audience is encouraged to clone the repository and modify it as they see fit, since experimentation is the best way to learn.

  • Which Python libraries are mentioned for data analysis and signal processing?

    -The video uses the numeric Python (NumPy) and scientific Python (SciPy) libraries, obtained through the Anaconda distribution, which bundles these and other libraries useful for data analysis and signal processing.

  • Is the video's demonstration platform-specific?

    -The demonstration is done on a Linux system, but the principles involved are general and can be applied to Windows systems or different Linux distributions with appropriate modifications.

  • What is the purpose of the 'data_gen_FFT.py' script mentioned in the video?

    -The 'data_gen_FFT.py' script generates random input data, computes an FFT, and stores the input and output data in both floating-point and 16-bit fixed-point hexadecimal formats for testing.

  • What is the significance of the tolerance limit set in the test bench?

    -The tolerance limit is set to account for small errors that may occur due to the conversion from floating-point to fixed-point, ensuring the system does not flag an error for minor discrepancies.

  • What does the synthesis report provide after running C synthesis in Vivado HLS?

    -The synthesis report provides information on the target clock period, estimated clock period achieved, latency, initiation interval, and resource usage of the synthesized FFT function.

  • What is the purpose of running RTL co-simulation after synthesis?

    -The RTL co-simulation is run to verify that the synthesized design produces the same results as the C simulation and to observe internal signals and the behavior of the design at the RTL level.

  • How does the video conclude?

    -The video concludes by demonstrating the export of the RTL as a package for implementation on an FPGA in Vivado, marking the end of the design process discussed.

Outlines

00:00

Introduction to Fast Fourier Transform (FFT) with Vivado HLS

This video introduces a simple design implementation of the Fast Fourier Transform (FFT) using Vivado High-Level Synthesis (HLS) tools. The process includes generating reference data for testing, writing C code, simulating, synthesizing, simulating post-synthesis, and packaging the IP for export. The video does not cover optimization of bit widths, data representations, or design size and speed. Reference websites are given for the source code and for the Anaconda distribution of Python libraries for data analysis and signal processing. The demonstration is conducted on a Linux system, but the principles are generalizable. The first step is to clone the source code repository and navigate to the FFT example directory, which contains folders for scripts, the HLS source code, and the FPGA implementation files. A temporary build directory is created to hold the synthesis projects. The script 'data_gen_FFT.py' generates random complex input data from a fixed seed, computes its FFT, and stores the input and output both as floating-point text files for the C test bench and as 16-bit fixed-point hex files for later testing on hardware.
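The data-generation step can be sketched in plain Python. This is a minimal, dependency-free sketch and not the repository's data_gen_FFT.py: the FFT length of 8, the seed, the input scaling, and the one-pair-per-line text layout are all assumptions, a direct DFT stands in for NumPy's FFT, and the hex output is not reproduced.

```python
import cmath
import random

def dft(x):
    """Direct O(N^2) DFT, used here in place of numpy.fft.fft to stay dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def gen_reference(n=8, seed=42):
    """Seeded random complex input and its DFT; n=8 and seed=42 are illustrative defaults."""
    rng = random.Random(seed)  # fixed seed -> identical data on every run
    # Dividing by n keeps every output bin inside (-1, 1), which helps a later 16-bit conversion.
    x = [complex(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) / n for _ in range(n)]
    return x, dft(x)

def write_float_file(path, values):
    """One 're im' pair per line: an assumed layout, not necessarily the script's."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v.real:.8f} {v.imag:.8f}\n")

def write_reference_files(n=8, seed=42):
    """Write the input and expected-output files named in the video."""
    x, X = gen_reference(n, seed)
    write_float_file("inp_cpp.txt", x)  # input read by the test bench
    write_float_file("out_cpp.txt", X)  # expected output
```

Calling write_reference_files() would write inp_cpp.txt and out_cpp.txt in the current directory. Because the seed is fixed, rerunning it reproduces exactly the same reference data, which is what makes a stored expected output meaningful.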

05:05

Setting Up the Vivado HLS Project and Simulation

The video continues with the setup of a Vivado HLS project: naming the project, adding the main source code 'FFT.cpp' with top function FFT, and adding the test bench 'FFT_TB.cpp' together with the '.txt' input and expected-output files produced by the data generation script. The part number of the FPGA on the Basys 3 board (xc7a35tcpg236-1) is specified. The test bench reads the input data, feeds it into the FFT function, and compares the output with the expected results, accepting the result when the L2 norm of the difference is below a tolerance of 0.01 to allow for floating-point to fixed-point conversion errors. The C simulation is run to confirm that the implementation is correct, and the synthesis report is reviewed: the 10 ns target clock period is met with an estimated 7.3 ns, and the latency and initiation interval are 833 clock cycles. The video emphasizes that no optimizations have been applied, so these figures describe an unoptimized design. The interface signals of the module are described: clock, reset, start, done, idle and ready for block-level control, and AXI stream data in/out ports with valid/ready handshaking.
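The valid/ready handshaking on the AXI stream ports follows the AXI4-Stream rule that a data word moves only on a clock edge where the producer's TVALID and the consumer's TREADY are both high. The trace below is a hypothetical illustration of that rule, not a trace from the video; the Cycle class and the transfers helper are names invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Cycle:
    tvalid: bool  # driven by the producer: "tdata holds a valid word"
    tready: bool  # driven by the consumer: "I can accept a word"
    tdata: int

def transfers(trace):
    """Return the words actually transferred: one per cycle where TVALID and TREADY are both high."""
    return [c.tdata for c in trace if c.tvalid and c.tready]

# Hypothetical trace of an input stream feeding the FFT block
trace = [
    Cycle(tvalid=False, tready=True,  tdata=0x0),  # no data offered yet
    Cycle(tvalid=True,  tready=False, tdata=0xA),  # consumer busy: 0xA is held, not lost
    Cycle(tvalid=True,  tready=True,  tdata=0xA),  # 0xA transferred
    Cycle(tvalid=True,  tready=True,  tdata=0xB),  # 0xB transferred
    Cycle(tvalid=False, tready=True,  tdata=0xC),  # TVALID low: 0xC ignored
]

print([hex(w) for w in transfers(trace)])  # ['0xa', '0xb']
```

The same rule applies at the output port, where the FFT module drives the valid signal and the next stage drives ready. The block-level start/done/idle/ready signals form a separate control handshake for the module as a whole.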

10:06

RTL Co-Simulation and Exporting the Design

The final part of the video covers running RTL co-simulation to verify the results against the C simulation. The co-simulation can use either the generated Verilog or VHDL, and it reports a latency of 833 clock cycles and an initiation interval of 834, measured from the simulated RTL. The waveform viewer is used to analyze the simulation results, highlighting the input and output data valid signals and the start/done signals that delimit each run of the module. The video concludes with the export of the RTL as a package for FPGA implementation in Vivado; various configuration options can be specified during export, but default values are used in this demonstration. The message "Finished export RTL" confirms successful completion, wrapping up the video's coverage of generating test data, C simulation, synthesis, co-simulation, and IP export for Vivado HLS designs.
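The latency and initiation interval can be recovered from the waveform timestamps quoted in the video by dividing time differences by the 10 ns clock period. This is a back-of-the-envelope sketch; the helper name cycles_between is illustrative.

```python
CLOCK_PERIOD_NS = 10.0  # target clock period set for synthesis

def cycles_between(t0_ns, t1_ns, period_ns=CLOCK_PERIOD_NS):
    """Number of clock periods between two waveform timestamps given in ns."""
    return round((t1_ns - t0_ns) / period_ns)

# Timestamps read off the co-simulation waveform in the video
start_ns = 125.0    # start goes high
done1_ns = 8465.0   # done for the first input sequence
done2_ns = 16805.0  # done for the second input sequence

latency_cycles = cycles_between(start_ns, done1_ns)  # start -> first done
ii_cycles = cycles_between(done1_ns, done2_ns)       # done -> next done

print(latency_cycles, ii_cycles)  # 834 834
```

The 834 periods between the two done pulses match the reported initiation interval. Start to done also spans 834 periods, one more than the reported latency of 833; the exact count depends on which clock edges are counted, so the waveform figure is best treated as a cross-check to within a cycle.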


Keywords

Fast Fourier Transform (FFT)

The Fast Fourier Transform is an efficient algorithm to compute the Discrete Fourier Transform (DFT) and its inverse. It is fundamental in signal processing for converting time-domain signals into the frequency domain. In the video, FFT is the core algorithm being implemented using Vivado HLS tools to demonstrate the process of high-level synthesis.
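To make the algorithm concrete, here is a minimal recursive radix-2 Cooley-Tukey FFT in Python, checked against a direct DFT. It is a generic textbook version for illustration; the video does not show the contents of FFT.cpp, so this should not be read as the structure of the HLS design.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor W_n^k times odd term
        out[k] = even[k] + t           # butterfly: top output
        out[k + n // 2] = even[k] - t  # butterfly: bottom output
    return out

def dft(x):
    """Direct O(N^2) definition, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

x = [1, 2, 3, 4, 0, 0, 0, 0]
print(max(abs(a - b) for a, b in zip(fft_radix2(x), dft(x))) < 1e-9)  # True
```

Each level of recursion splits the problem in half and performs n/2 butterflies, giving O(N log N) operations instead of the O(N²) of the direct sum.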

Vivado HLS

Vivado HLS (High-Level Synthesis) is a tool from Xilinx that allows developers to write code in C, C++, or SystemC and then automatically synthesize it into hardware description language (HDL) for deployment on FPGAs. The video script outlines the steps to use Vivado HLS for implementing an FFT algorithm.

Reference Data

Reference data in the context of the video refers to a set of pre-computed or known results used for testing and verifying the correctness of the FFT implementation. The script mentions generating this data using Python scripts to ensure the FFT implementation matches expected outcomes.

C Simulation

C Simulation is the process of running the C code to verify its functionality before synthesis into hardware. The script describes running a C simulation to ensure the FFT function operates correctly in software before proceeding to hardware synthesis.
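The pass/fail logic of the test bench described in the video (the L2 norm of the difference between computed and expected outputs, compared against a tolerance of 0.01, with a nonzero return value on failure) can be sketched as follows. The real test bench is C++ (FFT_TB.cpp); this Python version only mirrors the check, and the function names are illustrative.

```python
import math

TOLERANCE = 0.01  # tolerance value quoted in the video

def l2_error(computed, expected):
    """L2 norm of the element-wise difference between two complex sequences."""
    if len(computed) != len(expected):
        raise ValueError("length mismatch")
    return math.sqrt(sum(abs(c - e) ** 2 for c, e in zip(computed, expected)))

def check(computed, expected, tol=TOLERANCE):
    """Print PASS/FAIL and return 0 on success, 1 on failure, like a test bench's exit code."""
    err = l2_error(computed, expected)
    ok = err < tol
    print("PASS" if ok else "FAIL", f"(L2 error = {err:.3g})")
    return 0 if ok else 1

# A tiny deviation, like a fixed-point rounding error, passes; a larger one fails.
check([0.5 + 0.25j, -0.125j], [0.5 + 0.2501j, -0.125j])
check([0.5 + 0.25j, -0.125j], [0.6 + 0.25j, -0.125j])
```

Using a single norm over the whole output vector, rather than an exact comparison per sample, is what lets small quantization differences through while still catching a genuinely wrong result.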

Synthesis

In the video, synthesis refers to the process of converting the high-level C code into a hardware description language (HDL) that can be implemented on an FPGA. The script details the synthesis process using Vivado HLS and the resulting resource usage and timing reports.

Post-synthesis Simulation

Post-synthesis simulation is the act of simulating the synthesized hardware design to ensure it behaves as expected. The script mentions running RTL co-simulation after synthesis to confirm the FFT design's correctness.

Optimization

Optimization in the context of the video pertains to improving the design for metrics such as size, speed, or power efficiency. The script notes that the tutorial does not cover optimization, focusing instead on the basic implementation process.

Bit Widths and Data Representations

Bit widths and data representations refer to the size of numerical data types and how they are stored or represented in the system. The script mentions that the tutorial does not delve into optimizing these aspects, which are crucial for efficient hardware implementation.
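As an illustration of what a 16-bit fixed-point representation involves, the sketch below converts floats to signed 16-bit integers and 4-digit hex words, similar in spirit to the hex files the data-generation script writes. The Q1.15 format (15 fractional bits), round-to-nearest, and saturation are assumptions; the video only says "16-bit fixed point", and choosing this split is exactly the bit-width question the tutorial leaves aside.

```python
FRAC_BITS = 15  # assumed Q1.15: 1 sign bit, 15 fractional bits

def to_fixed16(v, frac_bits=FRAC_BITS):
    """Quantize a float to a signed 16-bit integer, rounding to nearest and saturating."""
    q = round(v * (1 << frac_bits))
    return max(-32768, min(32767, q))

def to_hex16(q):
    """Two's-complement bit pattern of a 16-bit integer as 4 lowercase hex digits."""
    return f"{q & 0xFFFF:04x}"

def from_fixed16(q, frac_bits=FRAC_BITS):
    """Convert back to float, to measure the quantization error."""
    return q / (1 << frac_bits)

for v in (0.5, -0.5, 0.1234, 1.0):
    q = to_fixed16(v)
    print(v, to_hex16(q), from_fixed16(q) - v)
```

For in-range values the rounding error is at most half a least-significant bit, 2^-16 (about 1.5e-5) per component, while 1.0 and above saturate to 7fff. Small errors of this kind are what the test bench tolerance is meant to absorb.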

Numeric Python Libraries

Numeric Python Libraries, such as those included in the Anaconda distribution, provide tools for numerical computations, which are used in the video for generating reference data. The script mentions using these libraries to create the initial data for the FFT algorithm.

FPGA

FPGA stands for Field-Programmable Gate Array, which is a type of programmable hardware used for implementing digital circuits. The video is about implementing an FFT algorithm on an FPGA using Vivado HLS, showcasing the process from C code to hardware deployment.

Data Packing

Data packing in the context of the video refers to the technique of combining multiple data elements into a single, larger data element to optimize data transfer and storage. The script mentions using a data pack directive to combine real and imaginary parts of complex numbers into 32-bit values.
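Packing a complex sample into one 32-bit word can be illustrated with two 16-bit two's-complement fields. The bit order used here (real part in bits 15:0, imaginary part in bits 31:16) and the function names are assumptions for illustration; the actual layout produced by the data pack directive should be checked in the generated RTL.

```python
def pack_complex16(re_q, im_q):
    """Pack two signed 16-bit integers into one 32-bit word: real in bits 15:0, imag in 31:16 (assumed order)."""
    for v in (re_q, im_q):
        if not -32768 <= v <= 32767:
            raise ValueError("value does not fit in 16 bits")
    return ((im_q & 0xFFFF) << 16) | (re_q & 0xFFFF)

def _to_signed16(u):
    """Reinterpret a 16-bit unsigned field as two's complement."""
    return u - 0x10000 if u & 0x8000 else u

def unpack_complex16(word):
    """Inverse of pack_complex16: returns (real, imag) as signed integers."""
    return _to_signed16(word & 0xFFFF), _to_signed16((word >> 16) & 0xFFFF)

w = pack_complex16(16384, -16384)  # 0.5 - 0.5j if the fields are read as Q1.15
print(f"{w:08x}", unpack_complex16(w))  # c0004000 (16384, -16384)
```

Masking each field with 0xFFFF before shifting is what keeps a negative real part from smearing sign bits into the imaginary half of the word.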

Highlights

Introduction to implementing a simple design, the Fast Fourier Transform (FFT), using Vivado High-Level Synthesis (HLS) tools.

Steps for generating reference data for testing, writing C code, simulating, synthesizing, and packaging the IP for export in Vivado HLS.

Omission of topics like optimization of bit widths, data representations, and design optimization for size, speed, or other metrics.

Reference to the source code website 'get-lab.com/Chandra/Children/each-FPGA' for the demonstration of FFT using HLS.

Mention of the Anaconda distribution for Python libraries useful in data analysis and signal processing.

Adaptation of scripts for other languages like MATLAB is possible, but requires modification for appropriate functionality.

The demonstration is conducted on a Linux system, but the principles are generalizable to other systems like Windows.

Cloning the repository that contains all the source code and data for the FFT example.

Explanation of the directory structure and the purpose of the 'scripts', 'VHLS', and 'Vivado' folders.

Process of generating test data using a Python script that leverages the numeric Python library.

Generation of input and output data files in both floating-point and 16-bit fixed-point hexadecimal formats.

Creation of an HLS project with the main source code and test bench, including the part number selection for FPGA implementation.

Description of the test bench that reads input data, feeds it into the FFT function, and compares the output to expected results.

Use of a tolerance limit to account for small errors due to the conversion from floating-point to fixed point.

Running C simulation to ensure the correctness of the implementation before synthesis.

The synthesis report shows the 10 ns target clock period met (estimated 7.3 ns) and an estimated latency of 833 clock cycles.

Interface description of the FFT module, including signals for data streams and handshaking.

Observation of the RTL co-simulation results, confirming the same outcome as the C simulation.

Exporting the RTL as a package for implementation on an FPGA, concluding the video on Vivado HLS design testing and simulation.

Transcripts

00:02

In this video we are going to look at an implementation of a simple design, namely the fast Fourier transform, using Vivado high-level synthesis tools. The steps that we will follow will be: first, generate reference data that will be used for testing; write C code and simulate using Vivado HLS; synthesize; simulate post-synthesis; and then finally, package the IP for export and use in the Vivado tools for implementation. We will not be covering certain topics in this video, in particular optimization of bit widths and data representations, and optimization of the design in any sense, for size, speed or any other metric. There are a couple of useful reference websites over here. One of them is the source code of this entire demonstration.

00:58

All the source code used for this demonstration is available at the website get-lab.com/Chandra/children/each-FPGA. The website itself is called demos with FPGAs, and the example that we will be using is FFT using HLS.

01:12

You are free to clone the information from this website and modify it as you see fit; experimentation is the best way that you can learn. In addition to this, we will also be using the numeric Python libraries, in particular the Anaconda distribution, which you can get from anaconda.com/distribution. This includes numeric Python, scientific Python and various other libraries that are useful for data analysis as well as signal processing.

01:46

Installing and using these tools is beyond the scope of this video. If you are more familiar with other languages such as MATLAB, feel free to use them instead; you will have to adapt some of the scripts to get them to work appropriately. The idea of this demo video is more to show you the principles involved than to say that this is the only way to work.

02:10

Please note that this entire demonstration is done on a Linux system, and following the instructions given over here will be easiest if you have a similarly set up Linux system. However, the principles involved are general, and even if you have a Windows system or a different distribution of Linux, you should not find it too hard to modify the instructions appropriately.

02:33

We will start by creating a clone of the repository that contains all the source code and data. Once you have a copy of the data, you can cd into the appropriate directory and the folder corresponding to the FFT example. There are three subfolders over here, namely scripts; VHLS, which holds the Vivado HLS source code; and Vivado, which will be used for the FPGA implementation later. To keep things clean, we will create a temporary build directory and store all our synthesis projects inside that folder.

03:19

The first thing we need to do is to generate the data to be used for testing. I will first open up the script corresponding to that; it is in the scripts folder and is called data_gen_FFT.py. It's a Python script; it uses a numeric Python library and defines a few functions that help us to output data. Without going into the details of exactly how the different functions work, all that you need to understand at the moment is that it generates random input data starting from a known seed, for repeatability. The data is generated as random complex values, an FFT is computed, and the input as well as the generated output are stored into two files in floating-point format. The same data and the result are also stored in hex format into two other files; these will be used later for testing on hardware in Vivado.

04:22

We can run the script as follows, and you will notice that four files have been created over here. The files inp_cpp.txt and out_cpp.txt contain the floating-point data that will be read by the C program, or Vivado HLS test bench, and the two .mem files have the same data, but now stored after conversion to 16-bit fixed point in hexadecimal format.

04:54

The next step is to start Vivado HLS and create a project.

05:05

We will create a new project and give it any suitable name, for example a5032. The main source code will be added from the VHLS folder, and it's FFT.cpp; the top function over here is FFT. Next we add the test bench, which is FFT_TB.cpp, but in addition to this we also need to add the two .txt files, inp_cpp.txt and out_cpp.txt, that were just generated by running the data_gen script.

05:52

The part number needs to be selected appropriately so that we can later implement the design on the FPGA. The simplest way to do this is to directly type the part number corresponding to the Basys 3 FPGA board. Over here it is xc7a35tcpg236-1, where cpg236 corresponds to a 236-pin package and -1 is the speed grade.

06:26

Once the project has been created, you can look at the source code, which is the FFT.cpp file that we had just added, and the test bench. The test bench itself is fairly straightforward. It reads in the inp_cpp.txt and out_cpp.txt files. inp_cpp.txt is used to generate the input data array, which we then feed into the FFT function, and out_cpp.txt is used to generate another array called exp_out, which is the expected output data. Finally, we go over the computed data, data_out, and the expected data, exp_out, calculate the difference between them and the L2 norm of that difference, and as long as that value is less than a certain tolerance limit, we say that the result is error-free.

07:26

The reason for applying this tolerance is that since we are converting from floating point, which was used in Python, to fixed point, which is being used in Vivado HLS, there could in fact be some small errors, and we don't want the system to report an error for that. The tolerance that is set over here is 0.01; you can play around with this value. You may also notice that there is a for loop here and the entire process is repeated twice. The reason for this will become clearer later when we discuss the post-synthesis RTL co-simulation. The first step after having written the source code is to run a C simulation.

08:13

You can click on the button that says Run C Simulation, and once we select OK, it launches a compiler and runs the simulation. It is important that you see the message PASS; this was printed out by the test bench, and the information message "CSim done with 0 errors" is printed out later by the Vivado HLS C simulation itself. The test bench has been written in such a way that if there were any errors, it would have printed the message FAIL and would also have returned a nonzero value. Now that we are comfortable that the C simulation passes and the code is doing the correct thing, we can run C synthesis.

09:05

After a short while, the synthesis report for the FFT function pops up. You can see that the target clock period that we had set was 10 nanoseconds, and the tool was actually able to achieve an estimated 7.3 nanoseconds. The latency and initiation interval are 833 clock cycles; keep in mind that we have not applied any optimizations, so this number is not surprising. As for the resource usage, once again we have not applied any kind of optimization, so there is nothing specific to expect here. The only other thing of interest is the interface itself. Of the signals that the module has, one is the clock, and then there is the reset and a start signal, which is used to start the module operation. As outputs, it generates a done signal, and it also indicates when it is idle and when it is ready to accept new inputs. The data in and data out signals have been declared as AXI stream in the FFT code, and as a result both of them show up as AXI streams, which have a valid signal to indicate when the input data is valid and a ready signal, driven by the module, to indicate when it is ready to accept new data. This is on the input side.

10:19

Conversely, on the output side the module generates a valid signal when it has valid data at the output and takes in a ready signal from the next stage that it feeds into. The reason why the data shows up as 32 bits is that we have also applied a data pack directive, which takes the complex values and packs them into single 32-bit values, combining the real and imaginary parts together.

10:53

Once we are satisfied with the synthesis results, we can click on Run C/RTL Co-simulation. This brings up a dialog where we can choose Verilog or VHDL; since both of them have been generated by the tool itself, we don't really have a specific reason to choose one over the other. The only thing of interest is that we can change the dump trace to all, so that we can see internal signals, and then click on OK. The RTL co-simulation takes a little longer than the C simulation, but at the end of it you should once again see the message PASS, which indicates that exactly the same result as the C simulation has been obtained.

11:35

You'll also notice up here that the co-simulation report says that the Verilog status is Pass, and that the latency is 833 clock cycles and the initiation interval is 834. These figures are actually measured from the simulated RTL. You can open the waveform viewer to see the results. The waveform viewer shows you all the data from the simulation in a Vivado environment, but once again this is only a simulation. The important signals, as far as we are concerned, are the design top signals: the C outputs, in particular all the signals corresponding to data out; the C inputs, all the signals corresponding to data in; and the block-level handshaking signals.

12:20

There are a few things to observe in this waveform. One of them is that since we ran the simulation twice in the test bench, there are two sections where input is being read: one here, and another around the 8 microsecond mark. What you can also see is that during those times the data valid signal is 1; that is to say, the data is actually being read into the FFT module.

12:50

On the output side, we once again have two segments where output is generated. The data valid signal is 1 in short intervals, and each time it becomes 1, there is a corresponding value of the output. This once again happens two times, corresponding to the fact that the for loop ran twice. The start signal goes high at the 125 nanosecond mark, and the corresponding done signal goes high at the 8.465 microsecond mark.

13:23

This difference, expressed in clock cycles, gives the latency. Similarly, the time between the done signal for the first input sequence, at 8.465 microseconds, and the done signal for the second input sequence, at 16.805 microseconds, gives us the initiation interval, or the time between successive input sequences. Once we are satisfied with the waveform, we can close it, go back to Vivado HLS, and export the RTL as a package that can then be imported into Vivado for implementation on an FPGA. There are a number of configuration options that can be specified here, but we will skip all of these and accept the default values. Once again, we will not be evaluating the generated RTL; that just gives us more detailed information on timing, which could be useful in certain circumstances, but for this simple example we are going to skip it. At the end you should see the message "Finished export RTL", which indicates that the RTL export completed successfully.

14:35

This brings us to the end of this video, where we discussed how to generate the sample data used for testing a Vivado HLS design, write the C code and a test bench, simulate, synthesize, co-simulate, and export the corresponding IP.

