But what is a convolution?

3Blue1Brown
18 Nov 2022 · 23:01

Summary

TL;DR: This script explores the concept of convolution, a mathematical operation used in various fields like image processing and probability theory. It explains how convolution combines two lists or functions, such as multiplying polynomials or analyzing dice probabilities, to yield new insights. The video also delves into the practical application of convolution in blurring images and detecting edges. Furthermore, it introduces a faster algorithm for computing convolutions using the Fast Fourier Transform (FFT), which is more efficient than traditional methods, especially for large datasets.

Takeaways

  • 🔢 Convolution is a mathematical operation that combines two lists of numbers or functions to produce a new list or function. It's different from simple addition or multiplication.
  • 🎲 The concept is introduced through the example of rolling dice to illustrate how the sums can be visualized and calculated using convolution.
  • 📊 Convolution is widely used in various fields such as image processing, probability theory, and solving differential equations, including in the context of polynomial multiplication.
  • 💡 The script explains that convolution can be visualized by aligning one list with a flipped version of the other and calculating the sum of products at different offsets.
  • 🎛️ In probability, convolution helps calculate the probability of various outcomes when dealing with non-uniform distributions, like weighted dice.
  • 🖼️ In image processing, convolution is used for effects like blurring, edge detection, and sharpening by applying different 'kernels' or grids of values over the image.
  • 📉 The script discusses how a moving average can be seen as a type of convolution, smoothing out data by averaging it within a sliding window.
  • 🔄 The output of a convolution is typically longer than either of the original lists, but in computer science contexts it is often deliberately truncated to match the original size.
  • 🌀 The Fast Fourier Transform (FFT) algorithm is highlighted as a way to compute convolutions much more efficiently (O(n log n) instead of O(n²)) by leveraging the properties of polynomial multiplication.
  • 🧮 An interesting connection is made to the idea that standard integer multiplication can also be viewed as a form of convolution, hinting at the possibility of more efficient multiplication algorithms for very large numbers.

Q & A

  • What are the three basic ways to combine two lists of numbers?

    -The three basic ways to combine two lists of numbers are by adding them term by term, multiplying them term by term, and using a convolution, which is less commonly discussed but very fundamental.

  • How is convolution different from addition and multiplication of lists?

    -Convolution is different from addition and multiplication of lists because it involves flipping one of the lists, aligning them at various offset values, taking pairwise products, and then summing them up, which creates a genuinely new sequence rather than just combining existing values.

  • In what contexts are convolutions commonly used?

    -Convolutions are ubiquitous in image processing, form a core construct in probability theory, are used frequently in solving differential equations, and also appear in polynomial multiplication.

  • How does the concept of convolution relate to rolling dice and calculating probabilities?

    -Convolution relates to rolling dice and calculating probabilities by visualizing the outcomes on a grid where diagonals represent constant sums, allowing for easy counting of combinations that result in a given sum.

  • What is the significance of the normal distribution in the context of probability as explained in the script?

    -The script treats the normal distribution as a teaser: while preparing the visuals, the narrator had an 'aha moment' about why convolution explains the special role normal distributions play in probability and why the bell curve is such a natural shape for a function, but the full explanation is deferred to the follow-up video on the continuous case.

  • How does the script explain the concept of convolution in the context of image processing?

    -The script explains convolution in image processing by using the example of blurring an image. It demonstrates how a grid of values (kernel) is used to multiply and sum pixel values, creating a smoothed or blurred version of the original image.

  • What is a kernel in the context of image processing?

    -In image processing, a kernel is a small grid of values that is used to apply effects to an image through convolution. Different kernels can be used for various effects like blurring, edge detection, or sharpening.

  • How does the script connect the concept of convolution to polynomial multiplication?

    -The script connects convolution to polynomial multiplication by showing that the process of creating a multiplication table for polynomials and summing along diagonals is equivalent to a convolution operation, which helps in expanding and combining polynomial terms efficiently.

  • What is the fast Fourier transform (FFT) and how does it relate to computing convolutions?

    -The fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform and its inverse very efficiently. It relates to computing convolutions by allowing the conversion of convolution operations into multiplications in the frequency domain, which can be computed much faster, especially for large lists.

  • How does the script suggest that the concept of convolution can be used to multiply large integers more efficiently?

    -The script suggests that by treating large integers as coefficients of polynomials, sampling them at roots of unity, and using the FFT algorithm, it's possible to multiply the integers more efficiently with an algorithm that requires O(n log n) operations instead of the traditional O(n^2).

Outlines

00:00

🔢 Combining Lists and Functions: An Introduction to Convolution

The paragraph introduces the concept of combining two lists or functions in various ways, such as addition and multiplication, and then delves into a less commonly discussed method known as convolution. Convolution is a fundamental operation in many fields, including image processing, probability theory, and differential equations. The paragraph uses the example of multiplying polynomials to illustrate convolution and sets the stage for a deeper exploration of the topic in the video. It also hints at the connection between convolution and normal distributions in probability without going into detail, deferring the more comprehensive explanation to a follow-up video on the continuous case.

05:05

🎲 Convolving Dice Rolls: Understanding Convolution Through Probability

This paragraph uses the example of rolling dice to explain the concept of convolution in a probabilistic context. It describes how the sum of outcomes from two dice can be visualized and calculated through a grid arrangement, where diagonals represent constant sums. The concept is then extended to non-uniform dice, where the probabilities of each face are different, requiring a multiplication of probabilities for each pair of outcomes that sum to a particular value. The paragraph introduces the formal definition of convolution as a process of flipping one list, aligning it at different offsets, multiplying pairwise products, and summing them to get a new sequence of values.

10:06

📊 Convolution in Image Processing: Blurring and Edge Detection

The paragraph explores the application of convolution in image processing, specifically in blurring and edge detection. It describes how a grid of values, or kernel, is used to multiply and sum pixel values in an image to achieve a blurring effect. The paragraph contrasts different types of kernels, from uniform grids that produce a basic blur to Gaussian distributions that simulate an out-of-focus effect. It also demonstrates how altering the kernel can be used for edge detection by highlighting variations in pixel values. The concept of kernels and their impact on image processing is central to this section, showcasing the versatility of convolution in visual effects.

15:08

🧮 Fast Convolution Algorithms: The Power of FFT

This paragraph discusses the computational efficiency of convolution operations, particularly in the context of large data sets. It contrasts the direct computation of convolution, which is O(n^2), with the use of the Fast Fourier Transform (FFT) algorithm, which reduces the complexity to O(n log n). The paragraph explains how the FFT algorithm leverages the redundancy in the system of equations generated by evaluating polynomials at roots of unity, thus enabling a much faster computation. It also touches on the broader implications of this algorithm for various applications, including probability distributions and image processing, and suggests that the concept of convolutions appearing in different areas of math can lead to powerful and efficient algorithms.

20:11

🔍 Convolution and Polynomial Multiplication: A Deeper Dive

The final paragraph ties together the concept of convolution with polynomial multiplication, explaining that the process of expanding and combining like terms in polynomial multiplication is equivalent to convolution. It introduces the idea that instead of directly computing the convolution of two lists, one can sample the polynomials at specific points, multiply the samples, and then solve a system of equations to recover the coefficients. The paragraph highlights the potential for a more efficient algorithm for polynomial multiplication using this approach, especially when the numbers involved are very large. It concludes by suggesting that the connection between convolution and multiplication in various mathematical contexts can lead to faster algorithms and deeper insights.

Keywords

💡Convolution

Convolution is a mathematical operation that combines two functions or lists of numbers to produce a third function or list that represents the amount of overlap between the two as one is shifted over the other. In the video, convolution is central to understanding various applications such as image processing, probability distributions, and solving differential equations. The script uses the example of rolling dice to illustrate how convolution can be used to calculate probabilities of different sums, highlighting its fundamental role in combining and analyzing data sets.

💡Probability Distributions

A probability distribution is a statistical description of the likelihood of a random variable taking on each of a set of possible values. In the video, the concept is used to explain how convolution can be applied to calculate the likelihood of different outcomes when rolling dice with non-uniform probabilities. This demonstrates how convolution can be used to model and predict outcomes in probabilistic scenarios.

💡Image Processing

Image processing refers to any form of signal processing for which the input is an image, such as a photograph or video frame, and the output need not be an image. The video discusses how convolution is used in image processing to perform operations like blurring and edge detection. The script describes how applying a convolution with a specific kernel can blur an image, while choosing a different kernel can detect edges, showcasing convolution's versatility in manipulating visual data.

💡Polynomials

A polynomial is an expression consisting of variables and coefficients, involving only the operations of addition, subtraction, multiplication, and non-negative integer exponentiation. The video script uses polynomials to explain the connection between polynomial multiplication and convolution. It illustrates that multiplying two polynomials can be viewed as a convolution operation on their coefficients, which is a key insight that leads to more efficient computational methods.

💡Kernel

In the context of the video, a kernel is a small matrix used in image processing to perform various operations like blurring or edge detection through convolution. The script explains how different kernels can be used to achieve different effects on an image, such as a kernel with equal values for averaging (blurring) or a Gaussian kernel for a more realistic blur that mimics out-of-focus photography.

💡Fourier Transform

The Fourier Transform is a mathematical technique that decomposes a function into a set of frequencies. In the video, the script introduces the concept of the discrete Fourier transform (DFT) as part of the explanation for the fast Fourier transform (FFT) algorithm, which is used to compute convolutions more efficiently. The Fourier Transform is key to understanding how the FFT algorithm can reduce the computational complexity of convolutions from O(n^2) to O(n log n).

💡Fast Fourier Transform (FFT)

The Fast Fourier Transform (FFT) is an algorithm to compute the discrete Fourier transform (DFT) and its inverse. The video script highlights the FFT as a crucial tool in efficiently computing convolutions, especially for large data sets. By transforming the data into the frequency domain, the FFT allows for the multiplication of two data sets' spectra and then transforming back to the time domain to obtain the convolved output, all in a fraction of the time it would take with a direct convolution approach.

💡Discrete Case

The discrete case refers to scenarios where data is represented by a finite number of discrete points or samples. In the video, the script focuses on the discrete case of convolution, which is relevant to digital signal processing and digital image processing. The discussion of discrete convolution helps to build an understanding of the algorithmic and computational aspects of the operation, which is essential for practical applications in computer science and engineering.

💡Continuous Case

The continuous case, in contrast to the discrete case, involves data that can take on any value within a continuum. The video script mentions that the continuous case will be discussed in a separate part, focusing on probability distributions. This distinction is important for understanding the different mathematical treatments and applications of convolution in various fields, such as differential equations and continuous signal processing.

💡Algorithm

An algorithm is a set of rules or steps used to solve a problem or perform a computation. The video script discusses various algorithms related to convolution, including the direct computation of convolution and the more efficient FFT-based approach. Understanding these algorithms is crucial for optimizing the performance of operations that involve convolution, such as in image processing or probability theory.

Highlights

Exploration of combining two lists or functions through addition and multiplication, introducing convolution as a fundamental but less commonly discussed method.

Convolution's widespread applications in image processing, probability theory, and solving differential equations.

The convolution process as a new operation for combining lists or functions, distinct from simple addition or multiplication.

Visual explanation of convolution as a blending process, particularly in the context of polynomial multiplication.

Detailed discussion on the discrete case of convolution, emphasizing its importance in computational algorithms.

Introduction of a clever algorithm for computing convolutions, setting the stage for exploring continuous cases in a follow-up.

Use of probability and dice rolls as an accessible example to illustrate the concept of convolution.

Visualization of convolution through the alignment and multiplication of probability arrays, representing different outcomes.

Explanation of how convolution can be used to calculate probabilities with non-uniform dice, showcasing its adaptability.

The convolution of sequences as a method for generating new sequences through pairwise products and summation.

Demonstration of convolution with a simple numerical example, clarifying the process with step-by-step calculations.

Application of convolution in moving averages and data smoothing, providing a practical use case in statistics.

Image processing example using convolution to blur images, illustrating the technique's impact on visual data.

Discussion on the use of different kernels in image processing to achieve various effects, such as blurring and edge detection.

Introduction of the fast Fourier transform (FFT) as a method for efficiently computing convolutions, offering a significant computational advantage.

Comparison of computational efficiency between direct convolution and FFT-based methods, highlighting the latter's superior performance.

The polynomial multiplication example as a bridge between convolution and the fast Fourier transform, demonstrating the interconnectedness of mathematical concepts.

The potential for a faster algorithm for integer multiplication based on convolution, hinting at the broader implications of mathematical techniques.

Transcripts

play00:00

Suppose I give you two different lists of numbers, or maybe two different functions,

play00:04

and I ask you to think of all the ways you might combine those two lists to get

play00:07

a new list of numbers, or combine the two functions to get a new function.

play00:12

Maybe one simple way that comes to mind is to simply add them together term by term.

play00:17

Likewise with the functions, you can add all the corresponding outputs.

play00:20

In a similar vein, you could also multiply the two lists

play00:23

term by term and do the same thing with the functions.

play00:26

But there's another kind of combination just as fundamental as both of those,

play00:30

but a lot less commonly discussed, known as a convolution.

play00:34

But unlike the previous two cases, it's not something that's

play00:37

merely inherited from an operation you can do to numbers.

play00:39

It's something genuinely new for the context of lists of numbers or combining functions.

play00:45

They show up all over the place, they are ubiquitous in image processing,

play00:49

it's a core construct in the theory of probability,

play00:51

they're used a lot in solving differential equations,

play00:54

and one context where you've almost certainly seen it, if not by this name,

play00:58

is multiplying two polynomials together.

play01:00

As someone in the business of visual explanations, this is an especially great topic,

play01:04

because the formulaic definition in isolation and without context can look kind of

play01:09

intimidating, but if we take the time to really unpack what it's saying,

play01:12

and before that actually motivate why you would want something like this,

play01:16

it's an incredibly beautiful operation.

play01:18

And I have to admit, I actually learned a little something

play01:21

while putting together the visuals for this project.

play01:23

In the case of convolving two different functions,

play01:25

I was trying to think of different ways you might picture what that could mean,

play01:29

and with one of them I had a little bit of an aha moment for why it is that

play01:33

normal distributions play the role that they do in probability,

play01:36

why it's such a natural shape for a function.

play01:39

But I'm getting ahead of myself, there's a lot of setup for that one.

play01:41

In this video, our primary focus is just going to be on the discrete case,

play01:45

and in particular building up to a very unexpected but very clever algorithm for

play01:49

computing these.

play01:50

And I'll pull out the discussion for the continuous case into a second part.

play01:58

It's very tempting to open up with the image processing examples,

play02:01

since they're visually the most intriguing, but there are a couple bits of finickiness

play02:05

that make the image processing case less representative of convolutions overall,

play02:09

so instead let's kick things off with probability,

play02:11

and in particular one of the simplest examples that I'm sure everyone here has

play02:15

thought about at some point in their life, which is rolling a pair of dice and

play02:18

figuring out the chances of seeing various different sums.

play02:22

And you might say, not a problem, not a problem.

play02:24

Each of your two dice has six different possible outcomes,

play02:27

which gives us a total of 36 distinct possible pairs of outcomes,

play02:31

and if we just look through them all we can count up how many pairs have a given sum.

play02:36

And arranging all the pairs in a grid like this,

play02:39

one pretty nice thing is that all of the pairs that have a constant sum are visible

play02:43

along one of these different diagonals.

play02:45

So simply counting how many exist on each of those diagonals

play02:48

will tell you how likely you are to see a particular sum.

play02:53

And I'd say, very good, very good, but can you think of

play02:55

any other ways that you might visualize the same question?

play02:59

Other images that can come to mind to think of

play03:01

all the distinct pairs that have a given sum?

play03:04

And maybe one of you raises your hand and says, yeah, I've got one.

play03:08

Let's say you picture these two different sets of possibilities each in a row,

play03:12

but you flip around that second row.

play03:13

That way all of the different pairs which add up to seven line up vertically like this.

play03:19

And if we slide that bottom row all the way to the right,

play03:22

then the unique pair that adds up to two, the snake eyes, are the only ones that align,

play03:26

and if I schlunk that over one unit to the right,

play03:28

the pairs which align are the two different pairs that add up to three.

play03:32

And in general, different offset values of this lower array,

play03:36

which remember I had to flip around first, reveal all the distinct pairs that

play03:40

have a given sum.

play03:44

As far as probability questions go, this still isn't especially interesting because

play03:48

all we're doing is counting how many outcomes there are in each of these categories.

play03:52

But that is with the implicit assumption that there's

play03:55

an equal chance for each of these faces to come up.

play03:58

But what if I told you I have a special set of dice that's not uniform?

play04:02

Maybe the blue die has its own set of numbers describing the probabilities for

play04:05

each face coming up, and the red die has its own unique distinct set of numbers.

play04:10

In that case, if you wanted to figure out, say,

play04:12

the probability of seeing a 2, you would multiply the probability

play04:16

that the blue die is a 1 times the probability that the red die is a 1.

play04:20

And for the chances of seeing a 3, you look at the two distinct

play04:23

pairs where that's possible, and again multiply the corresponding

play04:26

probabilities and then add those two products together.

play04:30

Similarly, the chances of seeing a 4 involves multiplying together

play04:33

three different pairs of possibilities and adding them all together.

play04:36

And in the spirit of setting up some formulas, let's name these top probabilities a₁,

play04:41

a₂, a₃, and so on, and name the bottom ones b₁, b₂, b₃, and so on.

play04:46

And in general, this process where we're taking two different arrays of numbers,

play04:50

flipping the second one around, and then lining them up at various different

play04:54

offset values, taking a bunch of pairwise products and adding them up,

play04:57

that's one of the fundamental ways to think about what a convolution is.

play05:04

So just to spell it out a little more exactly, through this process,

play05:08

we just generated probabilities for seeing 2, 3, 4, on and on up to 12,

play05:12

and we got them by mixing together one list of values, a, and another list of values, b.

play05:17

In the lingo, we'd say the convolution of those two sequences gives us this new sequence,

play05:22

the new sequence of 11 values, each of which looks like some sum of pairwise products.
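
As a quick check of this dice picture, here is a minimal NumPy sketch (the weights below are made up for illustration; any two distributions that each sum to 1 will do):

    import numpy as np

    # Two weighted six-sided dice: convolving the two probability lists
    # gives the distribution of the sum, 11 values for the totals 2 to 12.
    blue = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])  # P(blue face = 1..6)
    red  = np.array([0.3, 0.1, 0.1, 0.1, 0.1, 0.3])  # P(red face = 1..6)
    sums = np.convolve(blue, red)
    print(len(sums))    # 11
    print(sums.sum())   # 1.0, up to floating-point rounding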

play05:27

If you prefer, another way you could think about the same operation is to first

play05:31

create a table of all the pairwise products, and then add up along all these diagonals.

play05:37

Again, that's a way of mixing together these two

play05:39

sequences of numbers to get us a new sequence of 11 numbers.

play05:43

It's the same operation as the sliding windows thought, just another perspective.

play05:47

Putting a little notation to it, here's how you might see it written down.

play05:50

The convolution of a and b, denoted with this little asterisk, is a new list,

play05:54

and the nth element of that list looks like a sum,

play05:58

and that sum goes over all different pairs of indices, i and j,

play06:01

so that the sum of those indices is equal to n.

play06:05

It's kind of a mouthful, but for example, if n was 6,

play06:08

the pairs we're going over are 1 and 5, 2 and 4, 3 and 3, 4 and 2, 5 and 1,

play06:13

all the different pairs that add up to 6.
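
Spelled out as a formula, with ∗ denoting convolution as in the narration, the definition reads

    (a ∗ b)ₙ = Σ aᵢ bⱼ,  summed over all index pairs with i + j = n,

so for n = 6 the sum is a₁b₅ + a₂b₄ + a₃b₃ + a₄b₂ + a₅b₁.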

play06:16

But honestly, however you write it down, the notation is secondary in

play06:19

importance to the visual you might hold in your head for the process.

play06:23

Here, maybe it helps to do a super simple example,

play06:26

where I might ask you what's the convolution of the list 1, 2, 3 with the list 4, 5, 6.

play06:31

You might picture taking both of these lists, flipping around that second one,

play06:35

and then starting with it slid all the way over to the left.

play06:38

Then the pair of values which align are 1 and 4,

play06:40

multiply them together, and that gives us our first term of our output.

play06:43

Slide that bottom array one unit to the right,

play06:46

the pairs which align are 1 and 5, and 2 and 4; multiply those pairs,

play06:50

add them together, and that gives us 13, the next entry in our output.

play06:54

Slide things over once more, and we'll take 1 times 6 plus 2 times 5 plus 3 times 4,

play07:00

which happens to be 28.

play07:02

One more slide and we get 2 times 6 plus 3 times 5, and that gives us 27.

play07:07

And finally the last term will look like 3 times 6.

play07:10

If you'd like, you can pull up whatever your favorite programming

play07:13

language is and your favorite library that includes various numerical operations,

play07:17

and you can confirm I'm not lying to you.

play07:18

If you take the convolution of 1, 2, 3 against 4,

play07:21

5, 6, this is indeed the result that you'll get.
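
For instance, a one-liner with NumPy (assuming it is installed) reproduces the worked example:

    import numpy as np

    # Full convolution of [1, 2, 3] with [4, 5, 6]: each offset of the
    # sliding-window process above contributes one output term.
    print(np.convolve([1, 2, 3], [4, 5, 6]))   # [ 4 13 28 27 18]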

play07:25

We've seen one case where this is a natural and desirable operation,

play07:29

adding up two probability distributions, and another common example would be a

play07:32

moving average.

play07:34

Imagine you have some long list of numbers, and you take

play07:37

another smaller list of numbers that all add up to 1.

play07:40

In this case, I just have a little list of 5 values and they're all equal to 1/5.

play07:44

Then if we do this sliding window convolution process,

play07:46

and kind of close our eyes and sweep under the rug what happens at

play07:50

the very beginning of it, once our smaller list of values entirely

play07:53

overlaps with the bigger one, think about what each term in this convolution really means.

play07:59

At each iteration, what you're doing is multiplying each of the values

play08:03

from your data by 1/5 and adding them all together,

play08:06

which is to say you're taking an average of your data inside this little window.

play08:11

Overall, the process gives you a smoothed out version of the original data,

play08:14

and you could modify this starting with a different little list of numbers,

play08:18

and as long as that little list all adds up to 1,

play08:20

you can still interpret it as a moving average.

play08:23

In the example shown here, that moving average

play08:25

would be giving more weight towards the central value.

play08:28

This also results in a smoothed out version of the data.
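
Here is a sketch of the moving-average idea in NumPy (the data is random noise, just a stand-in for "some long list of numbers"):

    import numpy as np

    # Five kernel values, each equal to 1/5, so each output is an average
    # of the data inside a sliding window. mode='valid' keeps only the
    # positions where the small list fully overlaps the big one, sweeping
    # the edge effects under the rug as described.
    data = np.random.random(100)
    kernel = np.ones(5) / 5
    smoothed = np.convolve(data, kernel, mode="valid")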

play08:33

If you do kind of a two-dimensional analog of this,

play08:35

it gives you a fun algorithm for blurring a given image.

play08:38

And I should say the animations I'm about to show are modified from something

play08:42

I originally made for part of a set of lectures I did with the Julia lab at

play08:46

MIT for a certain open courseware class that included an image processing unit.

play08:51

There we did a little bit more to dive into the code behind all of this,

play08:54

so if you're curious, I'll leave you some links.

play08:56

But focusing back on this blurring example, what's going on is I've got this

play09:00

little 3x3 grid of values that's marching along our original image, and if we zoom in,

play09:05

each one of those values is 1/9, and what I'm doing at each iteration is

play09:09

multiplying each of those values by the corresponding pixel that it sits on top of.

play09:13

And of course in computer science, we think of colors as little vectors of three values,

play09:17

representing the red, green, and blue components.

play09:20

When I multiply all these little values by 1/9 and I add them together,

play09:24

it gives us an average along each color channel,

play09:26

and the corresponding pixel for the image on the right is defined to be that sum.

play09:31

The overall effect, as we do this for every single pixel on the image,

play09:35

is that each one kind of bleeds into all of its neighbors,

play09:38

which gives us a blurrier version than the original.

play09:41

In the lingo, we'd say that the image on the right is a

play09:44

convolution of our original image with a little grid of values.

play09:48

Or more technically, maybe I should say that it's the convolution

play09:51

with a 180 degree rotated version of that little grid of values.

play09:54

Not that it matters when the grid is symmetric,

play09:56

but it's just worth keeping in mind that the definition of a convolution,

play10:00

as inherited from the pure math context, should always invite you to think about

play10:03

flipping around that second array.
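
A minimal sketch of this box blur, assuming SciPy and treating the image as a single grayscale channel (an RGB image would simply repeat this per channel):

    import numpy as np
    from scipy.signal import convolve2d

    # A 3x3 grid of values, each 1/9: every output pixel is the average
    # of its neighborhood, so each pixel bleeds into its neighbors.
    image = np.random.random((64, 64))   # stand-in for a grayscale image
    kernel = np.ones((3, 3)) / 9
    blurred = convolve2d(image, kernel, mode="same", boundary="symm")

Here mode="same" truncates the pure-math output back to the size of the original image, a point the video returns to later.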

play10:05

If we modify this slightly, we can get a much more elegant

play10:08

blurring effect by choosing a different grid of values.

play10:11

In this case, I have a little 5x5 grid, but the distinction is not so much its size.

play10:15

If we zoom in, we notice that the value in the middle

play10:18

is a lot bigger than the value towards the edges.

play10:21

And where this is coming from is they're all sampled from a bell curve,

play10:24

known as a Gaussian distribution.

play10:26

That way, when we multiply all of these values by the corresponding

play10:29

pixel that they're sitting on top of, we're giving a lot more weight

play10:33

to that central pixel, and much less towards the ones out at the edge.

play10:36

And just as before, the corresponding pixel on the right is defined to be this sum.

play10:41

As we do this process for every single pixel, it gives a blurring effect,

play10:44

which much more authentically simulates the notion of putting your lens out of focus,

play10:48

or something like that.
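
One way to build such a kernel, sketched with an arbitrary bell-curve width since the video doesn't specify its exact parameters:

    import numpy as np
    from scipy.signal import convolve2d

    # Sample a Gaussian on a 5x5 grid and normalize so the weights sum
    # to 1: the central pixel gets the most weight, the edges the least.
    x = np.arange(-2, 3)
    g = np.exp(-x**2 / 2.0)
    kernel = np.outer(g, g)
    kernel /= kernel.sum()

    image = np.random.random((64, 64))   # stand-in for a grayscale image
    blurred = convolve2d(image, kernel, mode="same", boundary="symm")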

play10:49

But blurring is far from the only thing that you can do with this idea.

play10:53

For instance, take a look at this little grid of values,

play10:56

which involves some positive numbers on the left,

play10:58

and some negative numbers on the right, which I'll color with blue and red respectively.

play11:03

Take a moment to see if you can predict and understand

play11:06

what effect this will have on the final image.

play11:10

So in this case, I'll just be thinking of the image as grayscale instead of colored,

play11:14

so each of the pixels is just represented by one number instead of three.

play11:18

And one thing worth noticing is that as we do this convolution,

play11:21

it's possible to get negative values.

play11:23

For example, at this point here, if we zoom in,

play11:25

the left half of our little grid sits entirely on top of black pixels,

play11:28

which would have a value of zero, but the right half of negative values all sit on

play11:32

top of white pixels, which would have a value of one.

play11:36

So when we multiply corresponding terms and add them together,

play11:39

the results will be very negative.

play11:41

And the way I'm displaying this with the image on the right

play11:43

is to color negative values red and positive values blue.

play11:46

Another thing to notice is that when you're on a patch that's all the same color,

play11:50

everything goes to zero, since the sum of the values in our little grid is zero.

play11:55

This is very different from the previous two examples where the sum of our little

play11:58

grid was one, which let us interpret it as a moving average and hence a blur.

play12:03

All in all, this little process basically detects wherever there's

play12:06

variation in the pixel value as you move from left to right,

play12:09

and so it gives you a kind of way to pick up on all the vertical edges from your image.

play12:16

And similarly, if we rotated that grid around so that it varies as you move from

play12:20

the top to the bottom, this will be picking up on all the horizontal edges,

play12:24

which in the case of our little pie creature image does result in some pretty

play12:28

demonic eyes.
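
A sketch of the idea with one plausible such kernel (the video's exact numbers aren't given; this one is akin to a Sobel filter):

    import numpy as np
    from scipy.signal import convolve2d

    # Positive values on the left, negative on the right, summing to 0:
    # flat patches map to zero, while left-right variation gives large
    # positive or negative responses, i.e. vertical edges. Transposing
    # the kernel picks up horizontal edges instead.
    kernel = np.array([[1, 0, -1],
                       [2, 0, -2],
                       [1, 0, -1]]) / 4.0

    image = np.random.random((64, 64))   # stand-in for a grayscale image
    vertical_edges = convolve2d(image, kernel, mode="same", boundary="symm")
    horizontal_edges = convolve2d(image, kernel.T, mode="same", boundary="symm")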

play12:30

This smaller grid, by the way, is often called a kernel,

play12:32

and the beauty here is how just by choosing a different kernel,

play12:35

you can get different image processing effects, not just blurring or edge detection,

play12:39

but also things like sharpening.

play12:40

For those of you who have heard of a convolutional neural network,

play12:44

the idea there is to use data to figure out what the kernels should be in

play12:47

the first place, as determined by whatever the neural network wants to detect.

play12:52

Another thing I should maybe bring up is the length of the output.

play12:55

For something like the moving average example,

play12:57

you might only want to think about the terms when both of the windows

play13:00

fully align with each other.

play13:02

Or in the image processing example, maybe you want

play13:04

the final output to have the same size as the original.

play13:07

Now, convolutions as a pure math operation always produce an

play13:10

array that's bigger than the two arrays that you started with,

play13:13

at least assuming one of them doesn't have a length of one.

play13:16

Just know that in certain computer science contexts,

play13:19

you often want to deliberately truncate that output.

play13:24

Another thing worth highlighting is that in the computer science context,

play13:28

this notion of flipping around that kernel before you let it march across

play13:31

the original often feels really weird and just uncalled for, but again,

play13:35

note that that's what's inherited from the pure math context,

play13:38

where like we saw with the probabilities, it's an incredibly natural thing to do.

play13:43

And actually, I can show you one more pure math example where

play13:45

even the programmers should care about this one,

play13:48

because it opens the doors for a much faster algorithm to compute all of these.

play13:52

To set up what I mean by faster here, let me go back and pull up some Python again,

play13:56

and I'm going to create two different relatively big arrays.

play13:59

Each one will have a hundred thousand random elements in it,

play14:03

and I'm going to assess the runtime of the convolve function from the NumPy library.

play14:08

And in this case, it runs it for multiple different iterations, tries to find an average,

play14:12

and it looks like, on this computer at least, it averages at 4.87 seconds.

play14:16

By contrast, if I use a different function from the SciPy library called fftconvolve,

play14:21

which is the same thing just implemented differently,

play14:25

that only takes 4.3 milliseconds on average, so three orders of magnitude improvement.

play14:30

And again, even though it flies under a different name,

play14:32

it's giving the same output that the other convolve function does,

play14:36

it's just doing something to go about it in a cleverer way.
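
To reproduce the comparison (timings will of course vary by machine; the numbers quoted above are from the narrator's computer):

    import numpy as np
    from scipy.signal import fftconvolve
    from timeit import timeit

    # Two arrays of one hundred thousand random elements each.
    a = np.random.random(100_000)
    b = np.random.random(100_000)

    # Direct convolution vs. the FFT-based implementation; both return
    # the same full-length convolution.
    print("np.convolve:", timeit(lambda: np.convolve(a, b), number=1), "s")
    print("fftconvolve:", timeit(lambda: fftconvolve(a, b), number=1), "s")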

play14:42

Remember how with the probability example, I said another way you could

play14:45

think about the convolution was to create this table of all the pairwise products,

play14:49

and then add up those pairwise products along the diagonals.

play14:53

There's of course nothing specific to probability.

play14:55

Any time you're convolving two different lists of numbers,

play14:57

you can think about it this way.

play14:59

Create this kind of multiplication table with all pairwise products,

play15:02

and then each sum along the diagonal corresponds to one of your final outputs.

play15:07

One context where this view is especially natural

play15:10

is when you multiply together two polynomials.

play15:13

For example, let me take the little grid we already have and replace the top terms

play15:18

with 1, 2x, and 3x squared, and replace the other terms with 4, 5x, and 6x squared.

play15:24

Now, think about what it means when we're creating all of

play15:26

these different pairwise products between the two lists.

play15:29

What you're doing is essentially expanding out the full product of

play15:32

the two polynomials I have written down, and then when you add up along the diagonal,

play15:37

that corresponds to collecting all like terms.

play15:40

Which is pretty neat.

play15:41

Expanding a polynomial and collecting like terms

play15:44

is exactly the same process as a convolution.
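
In code, this observation is a one-liner (again assuming NumPy):

    import numpy as np

    # (1 + 2x + 3x^2)(4 + 5x + 6x^2): convolving the coefficient lists
    # expands the product and collects like terms in a single step.
    print(np.convolve([1, 2, 3], [4, 5, 6]))
    # [ 4 13 28 27 18]  ->  4 + 13x + 28x^2 + 27x^3 + 18x^4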

play15:47

But this allows us to do something that's pretty cool,

play15:50

because think about what we're saying here.

play15:52

We're saying if you take two different functions and you multiply them together,

play15:56

which is a simple pointwise operation, that's the same thing as if you had

play16:00

first extracted the coefficients from each one of those, assuming they're polynomials,

play16:05

and then taken a convolution of those two lists of coefficients.

play16:09

What makes that so interesting is that convolutions feel,

play16:12

in principle, a lot more complicated than simple multiplication.

play16:15

And I don't just mean conceptually they're harder to think about.

play16:18

I mean, computationally, it requires more steps to perform a convolution

play16:22

than it does to perform a pointwise product of two different lists.

play16:26

For example, let's say I gave you two really big polynomials,

play16:29

say each one with a hundred different coefficients.

play16:32

Then if the way you multiply them was to expand out this product,

play16:36

you know, filling in this entire 100 by 100 grid of pairwise products,

play16:40

that would require you to perform 10,000 different products.

play16:43

And then, when you're collecting all the like terms along the diagonals,

play16:47

that's another set of around 10,000 operations.

play16:50

More generally, in the lingo, we'd say the algorithm is O of n squared,

play16:54

meaning for two lists of size n, the way that the number of

play16:58

operations scales is in proportion to the square of n.

play17:01

On the other hand, if I think of two polynomials in terms of their outputs, for example,

play17:06

sampling their values at some handful of inputs,

play17:09

then multiplying them only requires as many operations as the number of samples,

play17:13

since again, it's a pointwise operation.

play17:16

And with polynomials, you only need finitely many

play17:18

samples to be able to recover the coefficients.

play17:20

For example, two outputs are enough to uniquely specify a linear polynomial,

play17:24

three outputs would be enough to uniquely specify a quadratic polynomial,

play17:29

and in general, if you know n distinct outputs,

play17:32

that's enough to uniquely specify a polynomial that has n different coefficients.

play17:37

Or, if you prefer, we could phrase this in the language of systems of equations.

play17:41

Imagine I tell you I have some polynomial, but I don't tell you what the coefficients are.

play17:45

Those are a mystery to you.

play17:46

In our example, you might think of this as the product that we're trying to figure out.

play17:50

And then, suppose I say, I'll just tell you what the outputs of this polynomial

play17:54

would be if you inputted various different inputs, like 0, 1, 2, 3, on and on,

play17:59

and I give you enough so that you have as many equations as you have unknowns.

play18:04

It even happens to be a linear system of equations, so that's nice,

play18:07

and in principle, at least, this should be enough to recover the coefficients.
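
A sketch of that recovery step on a small example (the inputs 0, 1, 2 and the quadratic are arbitrary choices for illustration):

    import numpy as np

    # Sample 1 + 2x + 3x^2 at three inputs, then solve the linear
    # (Vandermonde) system to recover the "mystery" coefficients.
    coeffs = np.array([1.0, 2.0, 3.0])        # pretend these are unknown
    xs = np.array([0.0, 1.0, 2.0])
    ys = np.polyval(coeffs[::-1], xs)         # the outputs we're told

    V = np.vander(xs, N=3, increasing=True)   # rows are [1, x, x^2]
    print(np.linalg.solve(V, ys))             # [1. 2. 3.]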

play18:11

So the rough algorithm outline then would be, whenever you want to convolve two lists

play18:16

of numbers, you treat them like they're coefficients of two polynomials,

play18:20

you sample those polynomials at enough outputs, multiply those samples point-wise,

play18:24

and then solve this system to recover the coefficients as a sneaky backdoor way to find

play18:29

the convolution.

play18:31

And, as I've stated it so far, at least, some of you could rightfully complain, Grant,

play18:36

that is an idiotic plan, because, for one thing,

play18:38

just calculating all these samples for one of the polynomials we know already takes

play18:43

on the order of n-squared operations.

play18:45

Not to mention, solving that system is certainly going to be

play18:48

computationally as difficult as just doing the convolution in the first place.

play18:52

So, like, sure, we have this connection between multiplication and convolutions,

play18:56

but all of the complexity happens in translating from one viewpoint to the other.

play19:01

But there is a trick, and those of you who know about Fourier

play19:04

transforms and the FFT algorithm might see where this is going.

play19:07

If you're unfamiliar with those topics, what I'm

play19:09

about to say might seem completely out of the blue.

play19:12

Just know that there are certain paths you could have

play19:14

walked in math that make this more of an expected step.

play19:17

Basically, the idea is that we have a freedom of choice here.

play19:20

If instead of evaluating at some arbitrary set of inputs, like 0, 1, 2, 3,

play19:24

on and on, you choose to evaluate on a very specially selected set of complex numbers,

play19:29

specifically the ones that sit evenly spaced on the unit circle,

play19:32

what are known as the roots of unity, this gives us a friendlier system.
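
To make that concrete: evaluating a polynomial at the n evenly spaced points on the unit circle is exactly what np.fft.fft computes from the coefficient list (a sketch, using NumPy's sign convention for the roots):

    import numpy as np

    # Evaluate 1 + 2x + 3x^2 (zero-padded to length 4) at the four
    # roots of unity, and compare against the FFT of the coefficients.
    coeffs = np.array([1.0, 2.0, 3.0, 0.0])
    n = len(coeffs)
    roots = np.exp(-2j * np.pi * np.arange(n) / n)
    samples = np.array([np.polyval(coeffs[::-1], r) for r in roots])

    print(np.allclose(samples, np.fft.fft(coeffs)))   # True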

play19:38

The basic idea is that by finding a number where taking its powers falls into

play19:42

this cycling pattern, it means that the system we generate is going to have a

play19:46

lot of redundancy in the different terms that you're calculating,

play19:49

and by being clever about how you leverage that redundancy,

play19:52

you can save yourself a lot of work.

play19:56

This set of outputs that I've written has a special name.

play19:58

It's called the discrete Fourier transform of the coefficients,

play20:02

and if you want to learn more, I actually did another lecture for that same Julia MIT

play20:06

class all about discrete Fourier transforms, and there's also a really excellent video on

play20:11

the channel Reducible talking about the fast Fourier transform,

play20:14

which is an algorithm for computing these more quickly.

play20:17

Also Veritasium recently did a really good video on FFTs, so you've got lots of options.

play20:22

And that fast algorithm really is the point for us.

play20:25

Again, because of all this redundancy, there exists a method to go from

play20:28

the coefficients to all of these outputs, where instead of doing on

play20:32

the order of n squared operations, you do on the order of n times the

play20:35

log of n operations, which is much much better as you scale to big lists.

play20:39

And, importantly, this FFT algorithm goes both ways.

play20:42

It also lets you go from the outputs to the coefficients.

play20:46

So, bringing it all together, let's look back at our algorithm outline.

play20:49

Now we can say, whenever you're given two long lists of numbers,

play20:52

and you want to take their convolution, first compute the fast Fourier transform of

play20:56

each one of them, which, in the back of your mind,

play20:59

you can just think of as treating them like they're the coefficients of a polynomial,

play21:03

and evaluating it at a very specially selected set of points.

play21:06

Then, multiply together the two results that you just got, point-wise,

play21:10

which is nice and fast, and then do an inverse fast Fourier transform,

play21:13

and what that gives you is the sneaky backdoor way to compute the convolution that

play21:17

we were looking for.

play21:19

But this time, it only involves O of n log n operations.
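
Put together, the whole backdoor fits in a few lines (a sketch assuming NumPy; the inputs are real, so only the real part is kept at the end):

    import numpy as np

    def fft_convolve(a, b):
        # Pad to the full output length, "evaluate" both coefficient
        # lists at the roots of unity via the FFT, multiply pointwise,
        # then invert to read off the convolution.
        n = len(a) + len(b) - 1
        return np.fft.ifft(np.fft.fft(a, n) * np.fft.fft(b, n)).real

    print(fft_convolve([1, 2, 3], [4, 5, 6]))   # ~[ 4. 13. 28. 27. 18.]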

play21:23

That's really cool to me.

play21:25

This very specific context where convolutions show up,

play21:27

multiplying two polynomials, opens the doors for an algorithm

play21:30

that's relevant everywhere else where convolutions might come up.

play21:34

If you want to add probability distributions, do some large image processing,

play21:38

whatever it might be, and I just think that's such a good example of why you should be

play21:42

excited when you see some operation or concept in math show up in a lot of seemingly

play21:46

unrelated areas.

play21:48

If you want a little homework, here's something that's fun to think about.

play21:51

Explain why when you multiply two different numbers,

play21:54

just ordinary multiplication the way we all learn in elementary school,

play21:57

what you're doing is basically a convolution between the digits of those numbers.

play22:02

There's some added steps with carries and the like, but the core step is a convolution.
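
Here is a sketch of that homework in NumPy, with digits stored least significant first and the carries handled after the convolution:

    import numpy as np

    a = [3, 2, 1]                    # 123, least significant digit first
    b = [6, 5, 4]                    # 456
    raw = np.convolve(a, b)          # [18 27 28 13 4], before carrying

    digits, carry = [], 0
    for d in raw:
        carry, digit = divmod(int(d) + carry, 10)
        digits.append(digit)
    while carry:
        carry, digit = divmod(carry, 10)
        digits.append(digit)

    print(int("".join(map(str, digits[::-1]))))   # 56088 == 123 * 456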

play22:07

In light of the existence of a fast algorithm,

play22:09

what that means is if you have two very large integers,

play22:12

then there exists a way to find their product that's faster than the method we learn

play22:16

in elementary school, that instead of requiring O of n squared operations,

play22:20

only requires O of n log n, which doesn't even feel like it should be possible.

play22:25

The catch is that before this is actually useful in practice,

play22:28

your numbers would have to be absolutely monstrous.

play22:31

But still, it's cool that such an algorithm exists.

play22:35

And next up, we'll turn our attention to the continuous

play22:37

case with a special focus on probability distributions.


Related Tags

Convolution, Probability, Image Processing, Polynomial Multiplication, FFT Algorithm, Discrete Math, Data Smoothing, Edge Detection, Algorithm Optimization, Numerical Computation