Image Processing on Zynq (FPGAs) : Part 1 Introduction

Vipin Kizheppatt
29 Mar 2020 · 19:39

Summary

TLDR: This tutorial series introduces image processing on FPGAs and Zynq chips, treating grayscale images as two-dimensional arrays and color images as three-dimensional arrays. It differentiates between point processing, where each output pixel depends solely on the corresponding input pixel, and neighborhood processing, which also considers neighboring pixels. The architecture streams images from DDR memory to an IP core via a DMA controller. The tutorial also covers convolution operations using kernels and the challenges of processing image edges, and it discusses system design considerations, including buffering strategies and the use of line buffers and multiplexers for efficient processing. The series will progress to coding and implementation on FPGAs.

Takeaways

  • 📚 The tutorial series will cover image processing on FPGAs and Zynq chips, with a focus on practical application rather than theoretical details.
  • 🖼️ Images are described as two-dimensional arrays, with grayscale images being 2D and color images typically 3D arrays, where each element represents a pixel.
  • 🔄 Image processing is divided into point processing and neighborhood processing, with point processing involving transformations that depend solely on the value of a single pixel.
  • 🔀 In point processing, operations like image inversion can be performed by simple transformations such as 255 minus the input pixel value.
  • 🌐 For system design in image processing, images are initially stored in external DDR memory and then streamed to the processing IP via a DMA controller.
  • 🔄 Neighborhood processing involves considering the pixel's neighbors for operations, which can include different types of neighbors and convolution with a kernel.
  • 🛠️ Hardware implementation for neighborhood processing requires buffering parts of the image within the FPGA due to non-consecutive pixel processing needs.
  • 💾 Line buffers are used to store parts of the image within the FPGA, and multiplexers are employed to intelligently reuse lines for efficient processing.
  • 🔗 The tutorial discusses the importance of parallelizing data transmission and processing to improve system performance, such as adding extra line buffers.
  • 🛡️ Interrupt-based processing is mentioned as a method to handle data streaming between the IP and the PS, with an interrupt service routine to manage data flow efficiently.

Q & A

  • What is the main focus of the tutorial series mentioned in the transcript?

    -The tutorial series focuses on image processing on FPGAs and specifically on Zynq chips.

  • Which textbook is recommended for detailed information on image processing?

    -For detailed information on image processing, the transcript recommends 'Digital Image Processing' by Gonzalez.

  • How are images represented in terms of data structure?

    -Images are represented as arrays or matrices. Grayscale images are two-dimensional arrays with one intensity value per pixel, while color images are three-dimensional arrays with RGB values for each pixel.
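
    As an illustration (not from the video), a minimal NumPy sketch of these two representations:

        import numpy as np

        # Grayscale: a 2D array, one intensity byte per pixel.
        gray = np.zeros((512, 512), dtype=np.uint8)     # shape (rows, cols)

        # Color: a 3D array, three bytes (R, G, B) per pixel.
        color = np.zeros((512, 512, 3), dtype=np.uint8)

        print(gray.shape)    # (512, 512)
        print(color.shape)   # (512, 512, 3)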

  • What are the two broad categories of image processing mentioned in the transcript?

    -The two broad categories of image processing mentioned are point processing and neighborhood processing.

  • How does point processing differ from neighborhood processing?

    -In point processing, the value of a pixel in the output image depends only on the corresponding pixel in the input image. In contrast, in neighborhood processing, the value of a pixel in the output image depends on the pixel and its neighboring pixels in the input image.

  • What is an example of a point processing operation discussed in the transcript?

    -An example of a point processing operation is image inversion, where the pixel value in the output image is calculated as 255 minus the pixel value in the input image.
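
    A one-line NumPy sketch of this operation (my example, following the 255 minus pixel rule from the video):

        import numpy as np

        img = np.array([[0, 64], [128, 255]], dtype=np.uint8)
        inverted = 255 - img   # each output pixel depends only on its own input pixel
        print(inverted)        # [[255 191]
                               #  [127   0]]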

  • How is the image data transferred from external DDR memory to the image processing IP in the described architecture?

    -The image data is transferred from external DDR memory to the image processing IP using a DMA controller, which is configured by a driver running on the processor.
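
    The video does not show driver code; as a rough software model (a sketch only, not the Xilinx DMA API), the memory-to-stream channel can be pictured as a generator that delivers one image line at a time to the IP:

        import numpy as np

        def dma_stream_lines(ddr_image):
            # Model of the DMA MM2S channel: the driver has already programmed
            # the transfer; each yield is one line arriving at the IP over AXI-Stream.
            for line in ddr_image:
                yield line.copy()

        ddr = np.arange(16, dtype=np.uint8).reshape(4, 4)   # image sitting in "DDR"
        for line in dma_stream_lines(ddr):
            print(255 - line)   # stand-in for the IP consuming the stream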

  • What is the role of line buffers in the neighborhood processing architecture?

    -Line buffers in the neighborhood processing architecture are used to store parts of the image necessary for processing. They allow for the buffering of image lines to facilitate the convolution operation with a kernel.
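
    A behavioral sketch (assumptions mine: 512-pixel lines, a NumPy array in place of BRAM) of a line buffer, and of the roughly 1.5 KB total the video quotes for three of them:

        import numpy as np

        class LineBuffer:
            # One image line held on-chip (BRAM-like); capacity equals the
            # image width, e.g. 512 bytes per buffer for a 512x512 grayscale image.
            def __init__(self, width):
                self.mem = np.zeros(width, dtype=np.uint8)

            def load(self, line):
                self.mem[:] = line   # refill with a new row streamed in from DDR

            def __getitem__(self, col):
                return self.mem[col]

        bufs = [LineBuffer(512) for _ in range(3)]   # 3 x 512 B = 1536 B total
        bufs[0].load(np.full(512, 7, dtype=np.uint8))
        print(bufs[0][10])   # 7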

  • Why is pure streaming architecture not suitable for neighborhood processing?

    -Pure streaming architecture is not suitable for neighborhood processing because the pixels being processed are not consecutive. The convolution operation requires specific groups of pixels that may not be in sequential order.

  • What is the purpose of multiplexers in the line buffer architecture?

    -Multiplexers in the line buffer architecture are used to intelligently select and reuse the correct line buffers for the convolution operation, ensuring that the same line is not unnecessarily fetched multiple times from external memory.
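
    The selection logic amounts to a rotating index; a small sketch (mine) of how the three muxes remap physical buffers onto logical lines after each row:

        # Three physical buffers; 'base' tracks which one holds the oldest row.
        # The muxes map logical line1/line2/line3 onto physical buffers.
        buffers = ["buf0", "buf1", "buf2"]   # stand-ins for line-buffer contents
        base = 0
        for row in range(3):
            line1 = buffers[(base + 0) % 3]
            line2 = buffers[(base + 1) % 3]
            line3 = buffers[(base + 2) % 3]
            print(f"row {row}: convolve using {line1}, {line2}, {line3}")
            # the oldest buffer is refilled with the next image line, then rotate
            base = (base + 1) % 3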

  • How does adding a fourth line buffer improve the system performance in the architecture?

    -Adding a fourth line buffer allows data transmission and processing to happen in parallel, reducing idle time and improving overall system performance by enabling continuous data flow and processing without waiting for one operation to complete before starting another.
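
    A schedule sketch (mine) of the resulting overlap: at every step, three buffers feed the convolution while the DMA refills the fourth.

        NUM_BUFS = 4
        for step in range(4):
            compute = [(step + i) % NUM_BUFS for i in range(3)]
            fill = (step + 3) % NUM_BUFS
            print(f"step {step}: convolve buffers {compute}, DMA fills buffer {fill}")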

  • What is the significance of the interrupt-based processing mentioned in the transcript?

    -Interrupt-based processing is significant as it allows for efficient communication between the IP and the processor, enabling the processor to send the next line of data to the IP as soon as a convolution operation is completed, thus maintaining a smooth and continuous processing workflow.
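
    A toy Python model (mine, not driver code) of that handshake, with a callback standing in for the interrupt service routine:

        def run_convolution(image, on_row_done):
            # Model of the IP: after finishing each output row of 3x3
            # convolution it "raises an interrupt" by invoking the ISR callback.
            h = len(image)
            for row in range(h - 2):        # a 3x3 kernel yields h-2 output rows
                # ... convolve one output row here ...
                if row + 3 < h:             # more lines remain in DDR
                    on_row_done(next_line=row + 3)

        def isr(next_line):
            # ISR on the PS: kick off the DMA transfer of the next image line
            print(f"ISR: DMA streams line {next_line} to a free line buffer")

        run_convolution([[0] * 8 for _ in range(6)], isr)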

Outlines

00:00

🖼️ Introduction to Image Processing

This paragraph introduces a tutorial series on image processing, specifically focusing on FPGAs and Zynq chips. It suggests referring to a standard textbook, such as 'Digital Image Processing' by Gonzalez, for detailed information. The tutorial is divided into multiple parts, starting with an introduction to the theory of image processing. Images are described as two-dimensional arrays, with grayscale images represented by single-byte intensity values and color images by three RGB values. The paragraph categorizes image processing into point processing, where output pixel values depend solely on the corresponding input pixel, and neighborhood processing, which will be detailed in subsequent parts. The architecture for designing a pixel processing system is briefly mentioned, involving interfacing with a DMA controller and processing images stored in external DDR memory.

05:00

🔍 Neighborhood Processing in Image Processing

Paragraph 2 delves into neighborhood processing, a technique where the value of a pixel in the output image depends on its neighbors in the input image. It explains that different types of neighbors can be considered, from immediate to more distant ones, and that this typically involves a 2D convolution with a smaller matrix known as a kernel. The process involves multiplying values in the kernel with corresponding pixels in the image and accumulating the results to form the output image. The paragraph also addresses the challenge of processing edge pixels, which may not have the full complement of neighbors, and suggests solutions such as adding dummy rows or reducing the output image's resolution. The hardware implementation of neighborhood processing is also touched upon, noting that it cannot use a pure streaming architecture due to the non-consecutive nature of pixel processing.
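
Both edge strategies are easy to check numerically; a NumPy sketch (my example sizes):

    import numpy as np

    img = np.arange(25, dtype=np.uint8).reshape(5, 5)

    # Option 1: surround the image with dummy zeros so the output
    # keeps the input resolution after a 3x3 convolution.
    padded = np.pad(img, 1, mode="constant", constant_values=0)
    print(padded.shape)    # (7, 7) -> 5x5 output

    # Option 2: skip the edge pixels; the output shrinks by (kernel - 1).
    k = 3
    print(img.shape[0] - (k - 1), img.shape[1] - (k - 1))   # 3 3
    # For a 512x512 input this gives the 510x510 output from the video.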

10:01

💾 Buffering and Processing in Hardware Design

Paragraph 3 discusses the practicalities of buffering and processing images in hardware design for neighborhood processing. It explains that buffering the entire image is impractical due to memory constraints in FPGAs, so only parts of the image necessary for processing are buffered. The concept of line buffers is introduced, which are small memories used to store one line of an image. The paragraph describes how line buffers and multiplexers are used to efficiently reuse the same line buffer multiple times during convolution operations. It also mentions the importance of the order of lines in convolution and how multiplexers are used to select the correct lines. The idea of adding a fourth line buffer to improve system performance by allowing data transmission and processing to occur in parallel is introduced.

15:03

🛠️ System Architecture and Interrupt-Based Processing

The final paragraph outlines the system architecture for image processing, emphasizing the use of interrupt-based processing to improve efficiency. It describes a process where images are initially stored in DDR memory, and the DMA controller is configured to stream data to line buffers within the FPGA. Convolution operations are performed, and upon completion of each row, an interrupt is sent to the processor. An interrupt service routine then sends the next line of data to continue the convolution. This process repeats until the entire image is processed. The paragraph also mentions that after convolution, the processed data is streamed back to the DMA controller and then returned to DDR memory. The architecture includes line buffers, multiplexers, and an interrupt mechanism to ensure that data transmission and processing occur efficiently and in parallel.

Keywords

💡Image Processing

Image processing refers to the application of various algorithms to perform operations on images, such as filtering, enhancement, and transformation. In the context of the video, image processing is the main theme, focusing on how to implement these operations on FPGAs and Zynq chips. The script mentions that images are treated as two-dimensional arrays, with operations like pixel value transformations being central to the discussion.

💡Grayscale Image

A grayscale image is an image composed of shades of gray, which can be represented as a two-dimensional array where each pixel has a single intensity value. In the video script, grayscale images are used to illustrate the concept of point processing, where the pixel value in the output image is determined solely by the value of the corresponding pixel in the input image.

💡Color Image

A color image is represented by a three-dimensional array, where each pixel has three values corresponding to the Red, Green, and Blue (RGB) color channels. The script uses color images to contrast with grayscale images, highlighting that color images require more data per pixel and different processing techniques.

💡Point Processing

Point processing is a type of image processing where the output pixel value is determined by a specific operation on the corresponding input pixel value. The script explains that in point processing, the transformation is independent of the surrounding pixels, using the example of image inversion where the output pixel is calculated as 255 minus the input pixel value.

💡Neighborhood Processing

Neighborhood processing involves operations where the output pixel value depends on the values of neighboring pixels in the input image. This is in contrast to point processing and is used for operations like edge detection and blurring. The script describes neighborhood processing by explaining how the value of a pixel in the output image is influenced by the pixel and its immediate neighbors.

💡Pixel

A pixel, short for 'picture element,' is the smallest addressable element in an image represented on a screen. The script discusses pixels in the context of both grayscale and color images, explaining that each pixel in a grayscale image has one byte of data, while in a color image, each pixel has three bytes representing the RGB values.

💡DMA Controller

A Direct Memory Access (DMA) controller is a hardware device that allows certain hardware subsystems to access main system memory, independent of the central processing unit (CPU). In the script, the DMA controller is integral to the system architecture for image processing, facilitating the transfer of image data between external memory and the image processing IP.

💡FPGA

A Field-Programmable Gate Array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing. The script mentions FPGAs as the hardware platform on which image processing algorithms are implemented, highlighting their role in performing image processing tasks efficiently.

💡Zynq Chip

The Zynq chip is a type of System on Chip (SoC) that combines a processing system with programmable logic, allowing for both processing and reconfigurable hardware capabilities. The script refers to the Zynq chip as a specific platform where image processing on FPGAs is being discussed.

💡2D Convolution

2D Convolution is a mathematical operation used in image processing to apply a filter to an image. It involves multiplying the values of the image pixels with a smaller matrix called a kernel and summing the results to produce a new image. The script describes the process of 2D convolution in the context of neighborhood processing, explaining how it is used to compute the output pixel values based on the input pixel and its neighbors.
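
A direct sketch of the multiply-and-accumulate loop (mine; like the video's description, it slides the kernel without the flip that formally distinguishes convolution from correlation):

    import numpy as np

    def convolve2d_valid(img, kernel):
        # 'Valid' 2D convolution: slide the kernel one pixel at a time
        # and multiply-and-accumulate (MAC) over each window.
        kh, kw = kernel.shape
        oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
        out = np.zeros((oh, ow), dtype=np.int32)
        for r in range(oh):
            for c in range(ow):
                window = img[r:r + kh, c:c + kw]
                out[r, c] = np.sum(window * kernel)   # the MAC step
        return out

    img = np.arange(16, dtype=np.int32).reshape(4, 4)
    box = np.ones((3, 3), dtype=np.int32)             # summing kernel
    print(convolve2d_valid(img, box))                 # [[45 54]
                                                      #  [81 90]]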

💡Line Buffer

A line buffer is a small memory used to store one line of an image within an FPGA for processing. The script discusses the use of line buffers in the hardware architecture for image processing, explaining how they are used to buffer parts of the image necessary for convolution operations, thus improving the efficiency of the processing.

Highlights

Introduction to image processing on FPGAs and Zynq chips.

Reference to the textbook 'Digital Image Processing' by Gonzalez for detailed image processing techniques.

Images are two-dimensional arrays, with grayscale images represented by one byte per pixel and color images by three values (RGB).

Classification of image processing into point processing and neighborhood processing.

Point processing involves transformations that depend solely on the value of a single input pixel.

Examples of point processing include image inversion and gray level mapping.

General architecture for designing a system for pixel processing, interfacing with a DMA controller.

Image data is streamed from external DDR memory to the image processing IP via a DMA controller.

The IP processes pixels based on data width, which can be adjusted for performance.
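
For instance (my sketch, assuming 8-bit grayscale pixels), a 32-bit stream beat carries four pixels that the IP can process in one cycle:

    # Pack/unpack four 8-bit pixels into one 32-bit AXI-Stream word.
    def pack4(p0, p1, p2, p3):
        return p0 | (p1 << 8) | (p2 << 16) | (p3 << 24)

    def unpack4(word):
        return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]

    word = pack4(10, 20, 30, 40)
    inverted = pack4(*[255 - p for p in unpack4(word)])   # 4 pixels per "cycle"
    print(unpack4(inverted))   # [245, 235, 225, 215]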

After processing, the image is streamed back to the DMA controller and then to DDR memory.

Neighborhood processing considers the value of neighboring pixels in addition to the pixel of interest.

2D convolution is used in neighborhood processing with a smaller matrix called the kernel.

Different strategies for handling edge pixels in neighborhood processing, such as padding or reducing output resolution.

Hardware implementation challenges for neighborhood processing due to non-consecutive pixel processing.

Use of line buffers and multiplexers to efficiently reuse image lines for convolution.

System architecture improvements by adding a fourth line buffer to allow parallel data transmission and processing.

Interrupt-based processing to coordinate data streaming and convolution operations.

Overview of the complete system architecture, including DDR memory, DMA controller, IP, and interrupt service routine.

Transcripts

[00:00] Hello. In this next set of tutorials we will be finding out how to do image processing on FPGAs, and on the Zynq chip in particular. I'm not going into the details of image processing; for that you may refer to a standard textbook, maybe Digital Image Processing by Gonzalez, which is a good reference book. This is going to be quite a long tutorial, so I am breaking it into many different parts. This part is the introduction to image processing, basically the theory part. Now, as you know, images are nothing but two-dimensional arrays; they are like matrices of two dimensions if they are grayscale, and if they are color images we usually say they are three-dimensional. In these two dimensions, each box here represents a pixel. If it is grayscale, you usually have one byte representing the intensity of that pixel; if it is a color image, each box will have three values representing the RGB values of that pixel. Broadly, we classify image processing into two kinds: one is called point processing, the other is called neighborhood processing. In point processing you have an input image, you do some operation on that image, and you get the output image. The particular pixel value in the output image depends on the particular pixel value in the input image, so you are basically doing some transformation operation on a particular input pixel and you get the corresponding output pixel. Just remember, the value of the output pixel depends only upon the value of the input pixel at that corresponding position. As for what transformation operation you are doing: for example, we have the inversion operation, and we have already seen in detail how this is done. When you are doing the inversion operation, the pixel value in the output image will be 255 minus the pixel value in the input image. There are a lot of other point processing operations also. For example, you have something called gray level mapping. Here what you do is take an input pixel value and add a bias to it. Depending upon whether the bias is a positive or a negative number, you will get a picture with increased brightness or a picture with decreased brightness. These are what we call point processing.

[03:04] Now, this is the general architecture we will follow when we design a system for pixel processing, and we have already seen this architecture: we actually designed this IP for doing image processing (the inversion operation, basically) and we interfaced it with a DMA controller. In this architecture, the image is initially stored in the external DDR memory. How you get the image into that external DDR depends upon what kind of interface you have. You may use Ethernet, USB, or PCI Express high-speed interfaces, or if you have a low-speed interface like UART you can use that also, which is what we basically used in our previous tutorial. From that DDR, the pixels are streamed to the image processing IP with the help of the DMA controller. Basically, we will have a driver running on the processor which will configure the DMA controller, and it will send the image data from the external DDR to the DMA controller, and from there it is streamed to your IP. Your IP will be processing the pixels depending upon the data width. For example, in our tutorial the data width was 32, which means you can process 4 pixels at a time, but you don't have to: you can make it 8, 16, 32, 64, whatever width you choose, which may improve the performance. After you process the data using your IP, you will stream it back to the DMA controller, and the DMA controller will send that processed picture back to the DDR using the AXI4 interface. Finally, once the processed image is in the DDR, you can send it to the external world through your interfaces, or to a display controller driving a monitor. In our previous tutorial we used a UART interface to send it back to the computer and view it on the computer.

[05:10] Now, the other popular kind of image processing technique is called neighborhood processing. In neighborhood processing, the pixel value in your output image depends not only upon the pixel in the corresponding position in the input image but also on the neighbors of that pixel. For example, if I am processing this pixel, the value of the corresponding pixel in the output depends on this pixel as well as its neighbors. We can have different kinds of neighbors: there are the eight immediate ones, as you can see, but in many cases we take more neighbors, because we can also take pixels which are further away from this pixel. So here we have eight, and these others are also neighbors, just not immediate ones. Basically, what you are doing is taking your input image and doing a 2D convolution with a smaller matrix called the kernel. This is the mathematical representation: it is basically a multiplication operation, where this is your image and this is the so-called kernel; you multiply them together and you add the products together.

[06:32] To make things clearer, let me show this picture. This is your input image, which is a 2D matrix. You will have a smaller matrix called a mask or a kernel, whose size can be three by three, five by five, three by five, different sizes depending upon what operation you are doing. What you will do is multiply the values in the mask with the corresponding pixels in the image (so here, with a three by five, you multiply these, these, these, and this), and after that you add them together. Basically it is the MAC operation, the multiply-and-accumulate operation, and that multiplied-and-accumulated result will be the corresponding pixel in the output image. Again, showing it pictorially: assume this is your picture and this is the mask; here I am taking a three by three, and there will be some values in it in a practical case. You place the mask on the picture, multiply each pair of corresponding values, add them, and you get the output pixel value for this pixel. After that you move the mask by one pixel, repeat the operation, and you get the value for the next pixel in the output image, and you keep on doing it, you keep on moving the mask.

[08:06] Now, when you are doing this you will notice that if you want to calculate the values of the pixels on the edges of the image, you do not have eight neighbors. For example, the corner ones have only three neighbors, and the other ones on the edges have only five neighbors. How do we solve this issue? There are different ways. You may assume a dummy row on the top, and on the left, right, and bottom edges also, and these dummy rows and columns may have value 0, or you may replicate the same values as on the edges, and then you can do the calculation; that is one way of doing it. Another way is that your output picture simply will not have values corresponding to the pixels on the edges. In this case the resolution of the output picture will be slightly less than the resolution of the input picture. For example, if you have a 512 by 512 input image, the output will be 510 by 510, because we are neglecting this row, this row, and these two columns: row-wise there will be a reduction of 2 pixels, and column-wise there will be a reduction of 2 pixels, so you will have 510 by 510. That is how we usually solve the issue of the pixels on the edges. Now, once you finish the convolution for this particular row, you will move the mask to the next row. Again, you are not shifting by 3 pixels even though it is a 3 by 3; you are shifting only by one pixel, so keep that in mind. You repeat this until you finish the convolution of the entire picture.

[09:57] Now, when you design hardware for doing this, you cannot use a pure streaming architecture here. Why? Because in this particular case the pixels you are processing are not consecutive. For example, these 3 pixels are consecutive, but these 3 pixels are not consecutive with these 3 pixels. When I am streaming data from my external memory, I will always be streaming, say, starting from here: I will first stream this entire line, then I stream this entire line, and so on and so forth. So pure streaming we cannot use, because I need only these 3 pixels, then these 3, and the remaining I don't want; then I want these 3, then these 3, and so on and so forth. So a pure streaming-based implementation is not possible, and you will have to buffer the image inside your IP before you start processing. Now, it won't be practical to buffer the entire image inside your IP, because how much memory you need for buffering varies with the image size. Again, for a 512 by 512 grayscale image you need around 256 kilobytes, and if you remember, inside the FPGA we have something called block RAMs, small memory blocks which are 36 kilobits in size in this FPGA, and we have look-up tables and flip-flops, so you can make buffers or small memories using them. But the number of BRAMs is quite limited; if you check the particular FPGA we are using, there will be a few hundred BRAMs usually. And if you use look-up tables and flip-flops to make this memory, which we call distributed RAM, it is going to use a lot of CLBs, so you won't have many CLBs left for implementing the remaining logic. So practically we will never buffer an entire picture inside the FPGA unless the picture is really, really small.

[12:00] What we will do instead is buffer only a part of the image inside our IP, the part which is necessary for doing the processing. For example, if you are using this 3 by 3 kernel for processing, then if I buffer 3 lines of my image inside the FPGA, it is enough for me to start processing. Say I buffer these 3 lines; then I use the kernel to do the convolution, and I can keep on moving the kernel until I finish these 3 lines. After that I will need this fourth line, which I can send from the external memory to my IP. So what we usually have is something called line buffers. Line buffers are again small memories, like RAMs, which are used for storing one line of the image inside the FPGA. As I mentioned before, for this 512 by 512 image and 3x3 kernel I need to buffer only 3 lines, so basically I need only 3 line buffers, and at 512 bytes per line that is only 1536 bytes, about one and a half kilobytes. What the size of a line buffer is depends upon the width of the image, basically the resolution, the width actually. So as I mentioned, from the DDR we will send three lines of data to my IP, I'll process those three lines, then I will send the next line, then I will process, and so on and so forth. Now remember, for convolution we might have to reuse the same line buffer multiple times. What does that mean? In this case, when I am doing convolution I am using these three lines; next time I am going to use these three lines. So this line I am using twice, here for convolution as well as here for convolution. And this third line, if you see, I need to use for the first round of convolution, then I need to use it again, and after this I will move my kernel down and use it again, so the same line I need to use three times. For better performance it won't make sense to send the same line information again and again from the external memory to our IP, so using some intelligent multiplexing scheme we can send each line only once and somehow reuse it.

[14:46] This is the architecture we will be using. This is the image, which is in the DDR initially, and these are our line buffers, three of them, and these are three multiplexers. You will see that each multiplexer is getting data from all three line buffers, and using some control signal we can choose which line buffer is chosen as the output of the multiplexer. Now, line 1, line 2, and line 3 represent the first, second, and third lines used for the 2D convolution, and they are fixed: this is always line 1, this is line 2, this is line 3. Which line buffer will be used as line 1, which as line 2, and which as line 3 will be decided by these three multiplexers. Initially, the first three lines of my image are stored in these three line buffers straightforwardly, so there is a one-to-one mapping, and I will do the convolution operation. Once I finish the first row of convolution, I no longer need the first line buffer, so I can replace the contents of that line buffer with the fourth line. So what I will do is send the fourth line and store it in the first line buffer. Now I will adjust my multiplexers in such a way that the second line buffer is chosen as line 1, the third one as line 2, and the first one as line 3, and I can do the convolution. Sometimes it really doesn't matter which are the first, second, and third lines, and in some cases the order really matters; that is why we keep the multiplexers such that they always properly choose the correct lines for the convolution operation.

[16:36] Now, the system architecture. The first thing we will do, for improving the overall system performance, is add a fourth line buffer. Why are we doing this? For example, here I am sending the first three lines of data. Now I cannot send any more data, because I don't have any more free line buffers. So I'll send the first three lines, then I will wait for the convolution to be over, then I will send the fourth line, and while I am sending the fourth line the convolution operation can't happen, because there is no data for doing convolution. This is more like a software operation: you are sending data, then you are doing convolution, more like sequential execution. But in hardware, of course, we need to parallelize things to make it faster. So what we can do is add a fourth line buffer (I haven't shown it in the picture here). First we will send these three line buffers' data and start the convolution, and while the convolution is in progress we will send the fourth line to the fourth line buffer. Once this convolution is over, we will use these two and the fourth line buffer for the next convolution, and in parallel I will send the fifth line of data to the first line buffer. So here data transmission and data processing happen in parallel, which will improve our overall system performance. That is what we will do first.

[18:07] Then finally, when we connect everything together, it looks more or less exactly the same as our previous architecture, but in this particular case we are going to use interrupt-based processing. So we will have an interrupt line coming from our IP to the PS, and we will also have the corresponding interrupt service routine. What will happen is that, as usual, we will store the picture initially in the DDR; then we will configure the DMA controller to stream the first four lines to the IP, to the four line buffers, and it will be doing the convolution operation. Once it finishes one row of convolution, it will send an interrupt to the processor, and as soon as we get the interrupt, the interrupt service routine will send the next line of data to the IP. This operation keeps on happening until the entire image is processed. Once one line of convolution is over, your IP will be streaming it to the DDR through the DMA controller, which is the same as before: after convolution we can stream the data out, and it will come to the DMA controller, which will send it back to the DDR through the stream interface here and the AXI4 interface here. So that architecture remains the same.

[19:30] OK, so in the next tutorial we will actually start coding, and we will see how to implement this architecture.


Related Tags
Image Processing, FPGA, Zynq Chip, Digital Signal, Pixel Manipulation, Hardware Design, DMA Controller, Convolution Operation, Buffering Techniques, Parallel Processing