Vivado HLS Example: FFT
Summary
TLDR: This video outlines the process of implementing a Fast Fourier Transform (FFT) using Vivado High-Level Synthesis (HLS) tools. It covers generating reference data, writing C code, simulating, synthesizing, and packaging the IP for export. The tutorial skips optimization for simplicity and assumes a Linux environment. It also uses Python for data generation and Anaconda for numeric libraries, emphasizing the importance of experimentation and adaptability across different platforms.
Takeaways
- 📚 The video demonstrates an implementation of the Fast Fourier Transform (FFT) using Vivado HLS tools.
- 🔍 It covers the steps from generating reference data to exporting the IP for use in Vivado tools, excluding optimization topics.
- 🌐 Reference websites are provided for the source code and additional tools like the Anaconda distribution for Python.
- 💻 The demonstration is conducted on a Linux system, but the principles are general and can be adapted to other systems.
- 📝 The process begins by cloning a repository containing all the source code and data for the FFT example.
- 🔢 Data for testing is generated using a Python script that creates random complex values, computes an FFT, and stores the results in different formats.
- 🛠️ The Vivado HLS project is set up with the main source code and test bench, including the generated .txt files for input and expected output.
- 🔄 The part number for the FPGA board is selected to match the hardware that will be used for implementation.
- 🔍 A C simulation is run to ensure the correctness of the implementation before proceeding to synthesis.
- 📈 Post-synthesis RTL co-simulation is performed to confirm that the synthesized design matches the C simulation results.
- 📦 Finally, the RTL is exported as a package for implementation on an FPGA, concluding the design process.
Q & A
What is the main topic of the video?
-The video is about implementing a simple design, specifically the Fast Fourier Transform (FFT), using Vivado High-Level Synthesis (HLS) tools.
What are the steps followed in the video for implementing the FFT design?
-The steps are: 1) Generate reference data for testing, 2) Write C code, 3) Simulate using Vivado HLS, 4) Synthesize and simulate post-synthesis, 5) Package the IP for export and use in Vivado tools for implementation.
Why are certain topics not covered in the video?
-The video does not cover optimization of bit widths, data representations, or design optimization for size, speed, or other metrics to focus on the basic principles of using Vivado HLS tools.
What is the source code of the demonstration available at?
-The source code is available on GitLab at 'gitlab.com/chandrachoodan/teach-fpga'.
What is the example used in the video for demonstrating the FFT using HLS?
-The example used is an FFT implementation with HLS, and the audience is encouraged to clone and modify the information from the provided website.
Which Python libraries are mentioned for data analysis and signal processing?
-The video mentions the use of numeric Python libraries, specifically the Anaconda distribution, which includes scientific Python and other libraries useful for data analysis and signal processing.
Is the video's demonstration platform-specific?
-The demonstration is done on a Linux system, but the principles involved are general and can be applied to Windows systems or different Linux distributions with appropriate modifications.
What is the purpose of the 'data_gen_FFT.py' script mentioned in the video?
-The 'data_gen_FFT.py' script generates random input data, computes an FFT, and stores the input and output data in both floating-point and 16-bit fixed-point hexadecimal formats for testing.
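The data-generation flow described in this answer can be sketched as follows. This is a minimal illustration, not the script from the repository: the transform length `N`, the number of fractional bits `FRAC_BITS`, the helper names, and the order in which the real and imaginary parts are written are all assumptions, and a direct DFT built on the standard library stands in for the NumPy FFT call so that the sketch has no dependencies.

```python
import cmath
import random

N = 32          # assumed transform length (not stated in the source)
FRAC_BITS = 8   # assumed number of fractional bits in the 16-bit format


def dft(x):
    """Direct O(N^2) DFT, standing in here for a library FFT call."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def to_hex16(value, frac_bits=FRAC_BITS):
    """Quantize a float to signed 16-bit fixed point and return 4 hex digits."""
    q = int(round(value * (1 << frac_bits)))
    q = max(-32768, min(32767, q))      # saturate to the 16-bit range
    return format(q & 0xFFFF, "04x")    # two's complement encoding


random.seed(0)  # known seed, so every run produces the same data
inp = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(N)]
out = dft(inp)

# Floating-point text lines (for the C test bench) and hex lines (for hardware).
float_lines = [f"{z.real:.6f} {z.imag:.6f}" for z in inp]
hex_lines = [to_hex16(z.real) + to_hex16(z.imag) for z in inp]  # assumed order
```

Writing `float_lines` and `hex_lines` (and the same for `out`) to files would give the four data files the answer refers to.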
What is the significance of the tolerance limit set in the test bench?
-The tolerance limit is set to account for small errors that may occur due to the conversion from floating-point to fixed-point, ensuring the system does not flag an error for minor discrepancies.
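The comparison can be illustrated with a short sketch. The actual test bench is written in C++; the Python below only mirrors the idea. The function names are hypothetical, and taking the L2 norm over the whole output array (rather than per element) is an assumption. The tolerance of 0.01 is the value quoted in the video.

```python
import math

TOL = 0.01  # tolerance quoted in the video


def l2_error(actual, expected):
    """L2 norm of the element-wise difference between two complex sequences."""
    return math.sqrt(sum(abs(a - e) ** 2 for a, e in zip(actual, expected)))


def check(actual, expected, tol=TOL):
    """Return 0 on pass and 1 on fail, like a test bench exit code."""
    return 0 if l2_error(actual, expected) < tol else 1
```

A small quantization error such as `check([1.001 + 0j], [1 + 0j])` passes, while a larger deviation such as `check([1.1 + 0j], [1 + 0j])` is flagged as a failure.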
What does the synthesis report provide after running C synthesis in Vivado HLS?
-The synthesis report provides information on the target clock period, estimated clock period achieved, latency, initiation interval, and resource usage of the synthesized FFT function.
What is the purpose of running RTL co-simulation after synthesis?
-The RTL co-simulation is run to verify that the synthesized design produces the same results as the C simulation and to observe internal signals and the behavior of the design at the RTL level.
How does the video conclude?
-The video concludes by demonstrating the export of the RTL as a package for implementation on an FPGA in Vivado, marking the end of the design process discussed.
Outlines
📚 Introduction to Fast Fourier Transform (FFT) with Vivado HLS
This video introduces a simple design implementation of the Fast Fourier Transform (FFT) using Vivado High-Level Synthesis (HLS) tools. The process includes generating reference data for testing, writing C code, simulating, synthesizing, post-synthesis simulation, and packaging the IP for export. The video does not cover optimization of bit widths, data representations, or design size and speed. Reference websites are provided for source code and FPGA examples, along with Python libraries for data analysis and signal processing. The demonstration is conducted on a Linux system, but the principles are generalizable. The first step is to clone the source code repository and navigate to the FFT example directory, which contains HLS source code and FPGA implementation files. A temporary build directory is created for synthesis projects. The script 'data_gen_FFT.py' generates random input data, computes an FFT, and stores the input and output in floating-point and hex formats, the latter for later testing on hardware.
🔧 Setting Up the Vivado HLS Project and Simulation
The video continues by detailing the setup of a Vivado HLS project, including naming the project and adding the main source code 'FFT.cpp' and the test bench 'FFT_TB.cpp'. The generated '.txt' files for input and expected output data are added alongside the test bench. The part number for the FPGA board is specified. The test bench reads the input data, feeds it into the FFT function, and compares the output with the expected results within a tolerance that accounts for floating-point to fixed-point conversion errors. The C simulation is run to ensure the correctness of the implementation, and the synthesis report is reviewed for achieved clock period, latency, and resource usage. The video emphasizes that no optimizations have been applied, and the results are expected to be basic. The interface signals of the module are described, including clock, reset, start, data in/out, and handshaking signals.
🔄 RTL Co-Simulation and Exporting the Design
The final part of the video script discusses running RTL co-simulation to verify the results against the C simulation. The co-simulation uses the generated VHDL or Verilog files and confirms that the design meets the expected latency and initiation interval. The waveform viewer is introduced to analyze the simulation results, highlighting the input and output data valid signals and the start/done signals for module operation. The video concludes with the export of the RTL as a package for FPGA implementation in Vivado, mentioning that various configuration options can be specified during export, but default values are used in this demonstration. The successful completion of the RTL export is indicated by a confirmation message, wrapping up the video's coverage on generating test data, C code simulation, synthesis, co-simulation, and IP export for Vivado HLS designs.
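The way latency and initiation interval are read off the waveform can be checked with simple arithmetic. The sketch below uses the timestamps quoted in the transcript (start at 125 ns, first done at 8.465 microseconds, second done at 16.805 microseconds) and the 10 ns target clock period; taking 10 ns as the simulation clock is an assumption, and the variable and function names are hypothetical.

```python
CLOCK_NS = 10      # target clock period; assumed to be the simulation clock
START_NS = 125     # start signal goes high
DONE1_NS = 8465    # done signal for the first input sequence
DONE2_NS = 16805   # done signal for the second input sequence


def cycles(t0_ns, t1_ns, clock_ns=CLOCK_NS):
    """Number of whole clock cycles between two timestamps in nanoseconds."""
    return (t1_ns - t0_ns) // clock_ns


latency_cycles = cycles(START_NS, DONE1_NS)    # start to first done
interval_cycles = cycles(DONE1_NS, DONE2_NS)   # first done to second done
print(latency_cycles, interval_cycles)
```

Both differences come to 834 cycles, which matches the reported initiation interval of 834 and is within one cycle of the reported latency of 833. Which clock edges the tool counts from is not shown in the source, so the one-cycle difference is left unexplained here.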
Keywords
💡Fast Fourier Transform (FFT)
💡Vivado HLS
💡Reference Data
💡C Simulation
💡Synthesis
💡Post-synthesis Simulation
💡Optimization
💡Bit Widths and Data Representations
💡Numeric Python Libraries
💡FPGA
💡Data Packing
Highlights
Introduction to implementing a simple design, the Fast Fourier Transform (FFT), using Vivado High-Level Synthesis (HLS) tools.
Steps for generating reference data for testing, writing C code, simulating, synthesizing, and packaging the IP for export in Vivado HLS.
Omission of topics like optimization of bit widths, data representations, and design optimization for size, speed, or other metrics.
Reference to the source code website 'gitlab.com/chandrachoodan/teach-fpga' for the demonstration of FFT using HLS.
Mention of the Anaconda distribution for Python libraries useful in data analysis and signal processing.
Adaptation of scripts for other languages like MATLAB is possible, but requires modification for appropriate functionality.
The demonstration is conducted on a Linux system, but the principles are generalizable to other systems like Windows.
Cloning of the repository containing all the source code and data for the FFT example.
Explanation of the directory structure and the purpose of the 'scripts', 'vhls', and 'vivado' folders.
Process of generating test data using a Python script that leverages the numeric Python library.
Generation of input and output data files in both floating-point and 16-bit fixed-point hexadecimal formats.
Initiation of an HLS project with the main source code and test bench, including the part number selection for FPGA implementation.
Description of the test bench that reads input data, feeds it into the FFT function, and compares the output to expected results.
Use of a tolerance limit to account for small errors due to the conversion from floating-point to fixed point.
Running C simulation to ensure the correctness of the implementation before synthesis.
Achievement of the target clock period and latency estimation in the synthesis report.
Interface description of the FFT module, including signals for data streams and handshaking.
Observation of the RTL co-simulation results, confirming the same outcome as the C simulation.
Exporting the RTL as a package for implementation on an FPGA, concluding the video on Vivado HLS design testing and simulation.
Transcripts
In this video we are going to look at an implementation of a simple design, namely the fast Fourier transform, using Vivado high-level synthesis tools. The steps that we will follow are: first, generate reference data that will be used for testing; write C code and simulate using Vivado HLS; synthesize; simulate post-synthesis; and then finally package the IP for export and use in the Vivado tools for implementation. We will not be covering certain topics in this video, in particular optimization of bit widths and data representations, and optimization of the design in any sense, for size, speed, or any other metric.
There are a couple of useful reference websites over here. One of them is the source code of this entire demonstration: all the source code used for this demonstration is available at the website gitlab.com/chandrachoodan/teach-fpga. The website itself is called Demos with FPGAs, and the example that we will be using is FFT using HLS. You are free to clone the information from this website and modify it as you see fit. Experimentation is the best way that you can learn.
In addition to this we will also be using the numeric Python libraries, in particular the Anaconda distribution, which you can get from anaconda.com/distribution. This includes numeric Python, scientific Python, and various other libraries that are useful for data analysis as well as signal processing. Installing and using these tools is beyond the scope of this video. If you are more familiar with other languages such as MATLAB, feel free to use them instead; you will have to adapt some of the scripts to get them to work appropriately. So the idea of this demo video is more to show you the principles involved rather than to say that this is specifically the only way to work.
Please note that this entire demonstration is done on a Linux system, and following the instructions given over here would be easiest if you have a similarly set up Linux system. However, the principles involved are general, and even if you have a Windows system or a different distribution of Linux you should not really find it too hard to modify the instructions appropriately.
We will start by creating a clone of the website that contains all the source code data. Once you have a copy of the data you can cd into the appropriate directory and the folder corresponding to the FFT example. There are three subfolders over here, namely scripts, vhls, which is the Vivado HLS source code, and vivado, which will be used for the FPGA implementation later. To keep things clean we will create a temporary build directory and store all our synthesis projects inside that folder.
The first thing we need to do is to generate the data to be used for testing. I will first open up the script corresponding to that. It is in the scripts folder and is called data_gen_FFT.py. While it's a Python script, it uses the numeric Python library and defines a few functions that help us to output data. Without going into the details of exactly how the different functions work, all that you need to understand at the moment is that it generates random input data starting from a known seed for repeatability. The data is generated as random complex values, an FFT is computed, and the input as well as the generated output are stored into two files in floating-point format. The same data and the result are also stored in hex format into two other files; these will be used later for the Vivado testing on hardware.
We can run the script as follows, and you will notice that four files have been created over here. inp_cpp.txt and out_cpp.txt contain the floating-point data that will be read by the C program, or Vivado HLS test bench, and inp_hex.mem and out_hex.mem have the same data but now stored after conversion to 16-bit fixed point in hexadecimal format.
The next step is to start Vivado HLS and generate a project. We will create a new project where we give it any name that is suitable, for example fft32. The main source code will be added from the vhls folder and it's FFT.cpp; the top function over here is FFT. Next we add the test bench, which is FFT_TB.cpp, but in addition to this we also need to add the two .txt files, inp_cpp.txt and out_cpp.txt, that were just generated by running the data_gen script.
The part number needs to be selected appropriately so that we can later on implement it on an FPGA. The simplest way to do this is to directly type the part number corresponding to the Basys 3 FPGA board over here. It is XC7A35T, then CPG236 corresponding to a 236-pin package, and -1 for the speed grade.
Once the project has been created you can look at the source code, which is the FFT.cpp file that we had just added, and the test bench. The test bench itself is fairly straightforward. It basically reads in the inp_cpp.txt and out_cpp.txt files. The inp_cpp.txt file is used to generate the input data array, which we then feed into the FFT function, and the out_cpp.txt file is used to generate another array called exp_out, which is the expected output data. Finally we can go over the computed data, that is data_out, and the expected data, which is exp_out, calculate the difference between them and the L2 norm of that, and as long as that value is less than a certain tolerance limit we will say that it is error-free.
The reason for applying this tolerance is that since we are converting from floating point, which was used in Python, to fixed point, which is being used for Vivado HLS, there could in fact be some small errors, and we don't want the system to show an error for that. The tolerance that is set over here is 0.01; you can play around with this value. You may also notice that there is a for loop out here and the entire process is repeated twice. The reason for this will become clearer later when we discuss the post-synthesis RTL co-simulation.
The first step after having written the source code is to run a C simulation.
You can click on the button there that says Run C Simulation, and once we select OK it goes ahead, launches a compiler, and runs the simulation. It is important that you see this message, PASS. This was printed out by the test bench, and the information message "CSim done with 0 errors" is printed out later by the Vivado HLS C simulation itself. The test bench has been written in such a way that if there were any errors it would have indicated a message FAIL and would also have returned a value which is not equal to zero. Now that we are comfortable that the C simulation passes and is doing the correct thing, we can run C synthesis.
After a short while the synthesis report for the FFT function pops up. You can see that the target clock period that we had set was 10 nanoseconds and the system was actually able to achieve an estimated 7.3 nanoseconds. The latency and initiation interval are 833 clock cycles. Keep in mind that we have not applied any optimizations, so this number is not surprising. As for the resource usage, once again we have not applied any kinds of optimizations, so there is nothing specific to expect out here.
The only other thing of interest is the interface itself. The signals that the module has are: one is the clock, and then the reset and a start signal which is used to start the module operation. As outputs it generates a done signal; it also indicates when it is idle and when it is ready to accept new inputs. The data_in and data_out signals have been declared as AXI streams in the FFT code out here, and as a result both of them show up as AXI streams, which have the data, a valid signal to indicate when the input is valid, and a ready signal that is given out by the module over here to indicate when it is ready to accept new data. This is on the input side. Conversely, on the output side the module generates a valid signal when it has valid data generated at the output and takes in a ready signal from the next stage, which it can then feed into.
The reason why the data is showing up as 32 bits is that we have also applied a data pack directive, which takes the complex values and packs them into single 32-bit values, combining the real and imaginary parts together.
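The packing described here can be pictured with a small sketch. It assumes 16-bit signed real and imaginary parts, with the real part placed in the lower half of the 32-bit word; the field order and the function names are assumptions for illustration, not taken from the video.

```python
def pack_complex16(re, im):
    """Pack two signed 16-bit integers into one 32-bit word.

    Assumption: real part in bits 15..0, imaginary part in bits 31..16.
    """
    return ((im & 0xFFFF) << 16) | (re & 0xFFFF)


def unpack_complex16(word):
    """Recover the signed 16-bit real and imaginary parts from a packed word."""
    def signed(v):
        return v - 0x10000 if v & 0x8000 else v

    return signed(word & 0xFFFF), signed((word >> 16) & 0xFFFF)
```

For example, `pack_complex16(1, -1)` gives `0xFFFF0001`, and `unpack_complex16` returns the original pair `(1, -1)`.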
Once we are satisfied with the synthesis results we can click on Run C/RTL Cosimulation. This brings up a choice of simulator language, Verilog or VHDL. Since both of them have been generated by the tool itself, we don't really have a specific choice of one over the other. The only thing of interest is that we can change the dump trace to all, so that we can see internal signals, and click on OK.
The RTL co-simulation takes a little longer than the C simulation, but at the end of it you should once again see the message PASS, which indicates that exactly the same result as the C simulation has been obtained. You'll also notice up here that the co-simulation report says that the Verilog status is passed, that the latency is 833 clock cycles, and that the initiation interval is 834. This is actually measured from the implementation.
You can open the waveform viewer in order to actually see the results. The waveform viewer shows you all the data from the simulation in a Vivado environment, but once again this is only a simulation. The important thing as far as we are concerned is the design top signals. You can look at the C outputs, in particular all the signals corresponding to data_out, the C inputs, all the signals corresponding to data_in, and the block-level handshaking signals.
There are a few things to observe in this waveform. One of them is the fact that since we ran the simulation twice in the test bench, you have two sections over here where input data is being read, here as well as around the 8 microsecond mark. You can also see that during those times the data valid is one, that is to say the data is actually being read into the FFT module. On the output side we have two segments once again where output is generated. The data valid is one in short intervals, and each time that the data valid becomes equal to one we find that there is a corresponding value of the output. This once again happens two times, corresponding to the fact that the for loop ran twice. The start signal goes high at the 125 nanosecond mark and the corresponding done signal goes high at the 8.465 microsecond mark. This difference in clock cycles is used to compute the latency. Similarly, the time between the done signal corresponding to the first input sequence, which is at 8.465 microseconds, and the second input sequence, which ended at 16.805 microseconds, gives us the initiation interval, or the time between successive samples.
Once we are satisfied with the waveform we can close this, go back to Vivado HLS, and now we can export the RTL as a package that can then be imported into Vivado for implementation on an FPGA. There are a number of configuration options that can be specified over here, but we will just skip all of these and accept the default values. Once again, we will not be evaluating the generated RTL; that just gives us more detailed information on timing, which could be useful in certain circumstances, but for this simple example we are going to skip it. At the end you should see the message "Finished export RTL", which indicates that the RTL export completed successfully.
This brings us to the end of this video, where we discussed how we can generate the sample data that can be used for testing a Vivado HLS design, write the C code and a test bench, simulate, synthesize, co-simulate, and export the corresponding IP.