What are Maximum Likelihood (ML) and Maximum a posteriori (MAP)? ("Best explanation on YouTube")

Iain Explains Signals, Systems, and Digital Comms
19 Dec 2020 · 18:19

Summary

TLDR: This script explores the concepts of Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation in the context of signal processing. It uses the example of a system with a constant 'a' and additive white Gaussian noise (AWGN) to illustrate how to estimate an unknown variable 'x' from measurements 'y'. The video explains the process of finding the MLE by maximizing the probability density function (PDF) of 'y' given 'x', and contrasts it with MAP, which incorporates prior knowledge about the distribution of 'x'. The script clarifies the difference between the two by visualizing the PDF plots for various 'x' values and evaluating them at the measurement 'y-bar'. It concludes by noting that MLE and MAP are often the same in digital communications, where symbols are equally likely.

Takeaways

  • πŸ“š The script discusses two statistical estimation methods: Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP).
  • πŸ” MLE is an estimation technique where the estimate of a parameter is the one that maximizes the likelihood function given the observed data.
  • πŸ“ˆ The script uses the example of a system with a constant 'a' and additive white Gaussian noise (AWGN) to explain MLE, where measurements 'y' are taken to estimate an unknown 'x'.
  • 🌑️ Practical applications of the discussed concepts include estimating temperature in IoT devices, digital signal processing, and radar tracking.
  • πŸ“‰ The likelihood function for a Gaussian noise model is given by a normal distribution with mean 'a times x' and variance 'sigma squared'.
  • πŸ“Š To find the MLE, one must plot the likelihood function for various values of 'x' and identify the peak that corresponds to the measured 'y'.
  • βœ‚οΈ MAP estimation incorporates prior knowledge about the distribution of 'x', in addition to the likelihood function, to find the estimate that maximizes the posterior probability.
  • πŸ€– The Bayesian rule is applied in MAP to combine the likelihood function and the prior distribution of 'x', normalizing by the probability density function of 'y'.
  • πŸ“ The script emphasizes the importance of understanding the difference between MLE and MAP, especially when prior information about the parameter distribution is available.
  • πŸ”„ The maximum a priori (MAPri) estimate is the value of 'x' that maximizes the prior distribution without considering any measurements.
  • πŸ”„ If the prior distribution is uniform, meaning all values of 'x' are equally likely, then MLE and MAP yield the same result, which is often the case in digital communications.
  • πŸ‘ The script encourages viewers to engage with the content by liking, subscribing, and visiting the channel's webpage for more categorized content.

Q & A

  • What is the basic model described in the script for signal estimation?

    -The basic model described in the script is y = a * x + n, where 'a' is a constant, 'x' is the signal we want to estimate, and 'n' represents additive white Gaussian noise (AWGN).
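
    A minimal simulation of this model (not from the video; the gain a = 2, true x = 5, and noise level sigma = 1 are illustrative values) shows how a single noisy measurement 'y' is generated:

```python
import numpy as np

rng = np.random.default_rng(0)

a = 2.0        # known constant (e.g., channel or amplifier gain) -- illustrative value
x_true = 5.0   # the unknown quantity we want to estimate (e.g., a temperature)
sigma = 1.0    # standard deviation of the additive white Gaussian noise

# One realization of the model y = a*x + n, with n ~ N(0, sigma^2)
n = rng.normal(0.0, sigma)
y = a * x_true + n
print(f"measured y = {y:.3f} (noise-free value a*x = {a * x_true})")
```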

  • What are some practical examples where the described model could be applied?

    -The model could be applied in scenarios such as measuring temperature in an IoT device, digital signal processing in the presence of noise, and radar tracking where 'x' could represent grid coordinates of a target.

  • What is the maximum likelihood estimate (MLE) and how is it found?

    -The maximum likelihood estimate (MLE) is the value of 'x' that maximizes the probability density function (pdf) of 'y' given 'x'. It is found by evaluating the pdf for different values of 'x' and selecting the one that maximizes the function for the measured value of 'y'.

  • How does the Gaussian noise affect the pdf of 'y' given a specific 'x'?

    -Given a specific 'x', the pdf of 'y' will have a Gaussian shape with a mean value of 'a' times 'x'. The noise is Gaussian, so the distribution of 'y' will be centered around this mean.

  • What is the formula for the pdf of 'y' given 'x' in the context of Gaussian noise?

    -The formula for the pdf of 'y' given 'x' is f(y|x) = 1 / (sqrt(2 * pi * σ^2)) * exp(-(y - a * x)^2 / (2 * σ^2)), where σ is the standard deviation of the Gaussian noise.
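
    A small sketch of the grid search this implies (the numbers below are assumptions for illustration, not from the video): evaluate f(y|x) at the measured value y-bar for a range of candidate 'x' and keep the maximizer, which for this linear-Gaussian model lands at y-bar / a.

```python
import numpy as np

def likelihood(y, x, a, sigma):
    """Gaussian pdf f(y|x) with mean a*x and variance sigma^2."""
    return np.exp(-(y - a * x) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

a, sigma = 2.0, 1.0      # assumed known constants
y_bar = 10.7             # the measured value (illustrative)

x_grid = np.linspace(0.0, 10.0, 10001)   # candidate values of x to search over
x_mle = x_grid[np.argmax(likelihood(y_bar, x_grid, a, sigma))]
print(f"x_MLE = {x_mle:.3f}, y_bar / a = {y_bar / a:.3f}")
```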

  • How does the maximum likelihood estimation differ from maximum a posteriori estimation?

    -Maximum likelihood estimation only considers the likelihood function given the observed data, whereas maximum a posteriori estimation also incorporates prior knowledge about the distribution of 'x', combining it with the likelihood function.

  • What is the Bayesian formula used to express the maximum a posteriori (MAP) estimate?

    -The Bayesian formula used for MAP is arg max_x (f(y|x) * f(x)) / f(y), where f(y|x) is the likelihood function, f(x) is the prior distribution of 'x', and f(y) is the marginal likelihood.
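
    Continuing the same sketch (the Gaussian prior below, centred at 4 with spread 0.5, is an assumption for illustration, not from the video): the MAP search maximizes the product f(y|x) * f(x) over the same grid, and the denominator f(y) is dropped because it does not depend on 'x'.

```python
import numpy as np

def gaussian_pdf(z, mean, std):
    return np.exp(-(z - mean) ** 2 / (2 * std ** 2)) / np.sqrt(2 * np.pi * std ** 2)

a, sigma = 2.0, 1.0
y_bar = 10.7                                   # measured value (illustrative)

x_grid = np.linspace(0.0, 10.0, 10001)
lik = gaussian_pdf(y_bar, a * x_grid, sigma)   # likelihood f(y_bar | x)
prior = gaussian_pdf(x_grid, 4.0, 0.5)         # prior f(x): belief that x is near 4

x_mle = x_grid[np.argmax(lik)]                 # ignores the prior
x_map = x_grid[np.argmax(lik * prior)]         # arg max of f(y|x) * f(x); f(y) omitted

print(f"x_MLE = {x_mle:.3f}, x_MAP = {x_map:.3f}")
# The MAP estimate is pulled from y_bar/a toward the prior mean, by an amount that
# depends on how concentrated the prior is relative to the measurement noise.
```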

  • Why does the denominator f(y) in the MAP formula often get ignored during the maximization process?

    -The denominator f(y) is a normalizing constant that is independent of 'x', so it does not affect the location of the maximum and can be ignored during the maximization process.

  • How does the concept of 'maximum a priori' relate to 'maximum a posteriori'?

    -The maximum a priori estimate is the value of 'x' that maximizes the prior distribution f(x) before any measurements are taken. Maximum a posteriori is similar but also includes the likelihood of the observed data, providing a more informed estimate.

  • In what type of scenarios would the maximum likelihood estimate be the same as the maximum a posteriori estimate?

    -The maximum likelihood estimate would be the same as the maximum a posteriori estimate when the prior distribution of 'x' is uniform, meaning all values of 'x' are equally likely, which is often the case in digital communications.
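
    A quick numeric check of this point (the two symbol values and the noise level are assumptions for illustration): with two equally likely symbols, the uniform prior scales every candidate by the same constant, so ML and MAP detection pick the same symbol.

```python
import numpy as np

def gaussian_pdf(z, mean, std):
    return np.exp(-(z - mean) ** 2 / (2 * std ** 2)) / np.sqrt(2 * np.pi * std ** 2)

a, sigma = 1.0, 0.8
symbols = np.array([-1.0, +1.0])   # candidate values of x (two equally likely symbols)
prior = np.array([0.5, 0.5])       # uniform prior over the symbols
y_bar = 0.3                        # received noisy measurement (illustrative)

lik = gaussian_pdf(y_bar, a * symbols, sigma)
x_ml = symbols[np.argmax(lik)]
x_map = symbols[np.argmax(lik * prior)]
print(x_ml, x_map)                 # identical decisions: the prior rescales every candidate equally
```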

Outlines

00:00

πŸ” Introduction to Maximum Likelihood and Maximum a Posteriori Estimation

This paragraph introduces the concepts of Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP) in the context of signal processing and detection. It uses the example of a system where measurements 'y' are taken in the presence of Additive White Gaussian Noise (AWGN) to estimate an unknown variable 'x'. The paragraph explains that MLE involves finding the value of 'x' that maximizes the probability density function (pdf) of 'y' given 'x'. The examples of estimating temperature in an IoT device, a digital signal in noise, and radar tracking coordinates are given to illustrate the application of these concepts. The explanation also includes the mathematical representation of the Gaussian pdf for the given model.

05:00

📈 Visualizing Maximum Likelihood Estimation

The second paragraph delves deeper into the process of Maximum Likelihood Estimation by visualizing the probability density function (pdf) for different values of 'x'. It describes the task of finding the argument that maximizes the pdf when a measurement 'y' is given. The paragraph illustrates this with a series of Gaussian plots, each representing the pdf of 'y' given different hypothetical values of 'x'. The goal is to identify the peak of the Gaussian that aligns with the measured value 'y-bar', which corresponds to the Maximum Likelihood Estimate (MLE) of 'x'. The explanation simplifies the concept by showing how the MLE can be found graphically and mathematically, concluding with the formula for the MLE in the context of the Gaussian example.
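
To tie the graphical search to the closed form the paragraph ends with (a toy check with assumed numbers, not from the video): whichever y-bar is measured, the winning candidate is the one whose Gaussian is centred on y-bar, so a * x-hat MLE = y-bar and x-hat MLE = y-bar / a.

```python
import numpy as np

a, sigma = 2.0, 1.0
x_grid = np.linspace(-10.0, 10.0, 40001)

for y_bar in (-3.2, 0.5, 7.9):                       # a few hypothetical measurements
    # Normalizing constant omitted: it does not change which x maximizes the likelihood.
    lik = np.exp(-(y_bar - a * x_grid) ** 2 / (2 * sigma ** 2))
    x_mle = x_grid[np.argmax(lik)]
    print(f"y_bar = {y_bar:+.1f} -> x_MLE = {x_mle:+.4f} (y_bar / a = {y_bar / a:+.4f})")
```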

10:01

📚 Transitioning from Maximum Likelihood to Maximum a Posteriori

This paragraph transitions from discussing Maximum Likelihood Estimation to introducing Maximum a Posteriori estimation. It highlights the difference between the two by pointing out that MLE does not assume any prior knowledge about the distribution of 'x', while MAP incorporates prior information. The paragraph explains the concept of 'a priori', which is the expected distribution of 'x' before any measurements are taken, and contrasts it with 'a posteriori', which considers the updated distribution after measurements. The explanation includes the Bayesian formula for MAP, emphasizing how it combines the likelihood function with the prior distribution of 'x', and discusses the normalization factor 'f of y', which is independent of 'x' and can be ignored in the maximization process.

15:02

🌐 Understanding Maximum a Posteriori Estimation with Examples

The final paragraph provides a comprehensive understanding of Maximum a Posteriori estimation by explaining how it differs from Maximum Likelihood Estimation. It describes the process of weighting the likelihood function by the prior distribution of 'x' and then finding the value of 'x' that maximizes this product. The paragraph uses the example of radar tracking to illustrate how prior knowledge about the location of ships in a harbor can influence the MAP estimate. It also clarifies that if the prior distribution is uniform, the MAP estimate is equivalent to the MLE estimate, which is often the case in digital communications where symbols are equally likely. The paragraph concludes by encouraging viewers to engage with the content and explore further resources on the channel's webpage.

Keywords

💡Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation is a statistical method used to estimate the parameters of a model. It involves finding the parameter values that maximize the likelihood function, which is the probability of observing the given data. In the video, MLE is used to estimate 'x', the variable of interest, based on measurements 'y' that include noise. The script explains that MLE does not assume any prior knowledge about the distribution of 'x' and focuses on the value of 'x' that maximizes the probability of the observed measurements.

💡Maximum A Posteriori (MAP)

Maximum A Posteriori is a Bayesian statistical method that estimates the most likely value for a parameter given prior knowledge. Unlike MLE, MAP incorporates prior beliefs or information about the distribution of the parameter before new data is observed. In the context of the video, MAP is used to refine the estimate of 'x' by considering both the likelihood of the data and the prior distribution of 'x', leading to a more informed estimate that may differ from the MLE if the prior information is strong.

💡Additive White Gaussian Noise (AWGN)

Additive White Gaussian Noise refers to a type of noise that is added to a signal and follows a Gaussian probability distribution with a mean of zero. In the script, AWGN is used to model the noise 'n' that corrupts the measurements 'y' in the system. The presence of AWGN affects the accuracy of the parameter estimation, and understanding its properties is crucial for developing effective estimation techniques.

💡Likelihood Function

The likelihood function is a mathematical expression that represents the probability of observing the given data for different values of a parameter. In the video, the likelihood function is central to both MLE and MAP, as it is used to determine the values of 'x' that are most consistent with the observed measurements 'y'. The script illustrates how this function is maximized in MLE and how it is combined with the prior distribution in MAP.

💡Bayesian Rules

Bayesian Rules are a set of mathematical formulas used in Bayesian statistics to update or calculate conditional probabilities. In the script, Bayesian rules are used to express the MAP estimate in terms of the likelihood function and the prior distribution. The video explains how these rules allow for the incorporation of prior knowledge into the estimation process, which is a key distinction between MLE and MAP.

💡Prior Distribution

The prior distribution represents the probability distribution of a parameter before new data is observed. It encapsulates prior beliefs or information about the parameter. In the video, the prior distribution of 'x' is used in MAP to adjust the likelihood function, reflecting prior knowledge that certain values of 'x' are more likely than others, which can be crucial in applications like radar tracking or temperature measurement.

💡Posterior Distribution

The posterior distribution is the updated probability distribution of a parameter after accounting for new data. It is central to Bayesian analysis and is derived using Bayesian rules. In the context of the video, the posterior density appears as f(x|y), the pdf of 'x' given the measurement 'y': MAP is defined as the value of 'x' that maximizes it, so the prior distribution is updated by the likelihood function to reflect the new information from the measurements.

💡Digital Signal

A digital signal is a type of signal that represents information in a discrete rather than continuous form. In the script, the digital signal is an example of 'x', the parameter being estimated, which can be corrupted by noise when transmitted or received. The video uses this example to illustrate how MLE and MAP can be applied to recover the original signal from a noisy observation.

💡Radar Tracking

Radar tracking is a process used to monitor and determine the position and movement of objects, such as ships, using radar systems. In the video, radar tracking serves as an example where the position coordinates ('x') of a target are estimated from noisy radar measurements ('y'). The script discusses how prior knowledge about the typical locations of ships within a harbor can be used in MAP to improve the estimation of their positions.

💡Temperature Measurement

Temperature measurement is an example used in the script to illustrate the application of MLE and MAP in estimating a physical quantity. The script mentions that temperature in an IoT device could be the parameter 'x' being estimated from noisy measurements ('y'), such as those affected by amplifier noise. The video explains how the prior knowledge that temperature values are bounded (e.g., not negative 4,000 degrees) can influence the estimation process.

Highlights

Introduction to the concepts of maximum likelihood and maximum a posteriori estimation in the context of signal processing and detection.

Explanation of a basic system model with a constant 'a' and additive white Gaussian noise (AWGN) 'n'.

Discussion of practical examples where the system model is applicable, such as temperature measurement in IoT devices.

Illustration of how to estimate 'x' from measurements 'y' using the maximum likelihood estimate (MLE).

Definition and explanation of the maximum likelihood function and its role in estimating 'x'.

Clarification that the likelihood f(y|x) is a pdf conditioned on 'x' and only evaluated at the measured value of 'y', which is what makes the maximum likelihood definition less intuitive at first.

Derivation of the probability density function (pdf) for the Gaussian example.

Visualization of the Gaussian function for different values of 'x' and the task of finding the argument that maximizes it.

Graphical representation of how to find the maximum likelihood estimate by comparing different Gaussian plots.

Explanation of the maximum a posteriori (MAP) estimation considering prior knowledge about the distribution of 'x'.

Introduction to the concept of maximum a priori estimation before taking any measurements.

Bayesian interpretation of maximum a posteriori estimation using the likelihood function and prior distribution.

Difference between maximum likelihood and maximum a posteriori in terms of incorporating prior knowledge.

Practical implications of maximum likelihood and maximum a posteriori in digital communications where symbols are equally likely.

Encouragement for viewers to engage with the content through likes and subscriptions for more informative videos.

Transcripts

00:02

So what are maximum likelihood and maximum a posteriori for estimation and detection? I like to think of an example, so let's think of y = a x + n. Let's think of this system where a is a constant, and let's say n is additive white Gaussian noise (AWGN). In this system we're going to take measurements y and we're going to try to work out what x is. There are lots of examples of this: x could be the temperature in an IoT device, and we might be measuring it with an amplifier, which is then going to have noise in the amplifier, so one example is measuring temperature in an IoT device. What about a digital signal? Maybe it's a digital signal in noise, where x is the digital data, a is the gain of the channel, and n is the noise from the receiver electronics. Another one might be that x is two-dimensional, a vector, so it might be grid coordinates in radar tracking, of a target in radar. All of these different examples fit into this category: we're going to measure y and we're going to try to work out what x is.

So let's start with maximum likelihood. The maximum likelihood estimate of x, we'll write it x-hat with MLE for maximum likelihood estimate, is going to be a function of y, the measurement that we've measured, and the definition is that it equals the argument that maximizes, over x, this function: f of y given x, the pdf of y given the value x. This is the equation for the maximum likelihood estimate; it means the value of x that maximizes this function. So this is a pdf, and we're given a value of x. Now what does this mean? It's not particularly intuitive, because you would think that we're given the value of y, since we're going to take a measurement y. It turns out that in maximum a posteriori that is what we're looking at, but for maximum likelihood, no, we have this function, so I'd like to try to understand that function a little bit more.

So for this model, let's think of what that function is, this pdf of y given x. If you were given x, if you knew what x is, what would be the distribution of y? In this case, if you were given a value of x, then the distribution of y would have a Gaussian shape, because the noise is Gaussian, and it would have a mean value of this constant a times x. So in the Gaussian example, and that's an important point, the definition above is the general formula, but writing down the Gaussian example helps us to understand it, the pdf of y given x equals 1 divided by the square root of 2 pi sigma squared, times the exponential of minus (y minus a x) all squared divided by 2 sigma squared. That is the pdf, that is this function, for this Gaussian example.

04:33

Okay, so I like to now think of this in terms of what the actual plots look like and what this is asking us to do. We can plot this function, so let me plot one example of it. Here is an example of this function, plotted as a function of y. It's a Gaussian, because in this example it's a Gaussian, and the peak of the Gaussian is at the value y = a x. So here's a Gaussian drawn at the value a times x. Now I'm going to draw it for a number of different values of x, because we have to find the argument of the maximum over all different values of x. What this says is we have to search all the values of x, look at this function for each value of x, and find the one which is the maximum. So let me draw this here for x1: this is a plot, for x1, of f of y given x1. Then let me draw it for some other values as well, so we can visualize what we're having to do in this maximization. Here's another one: maybe this is x2, which gives a times x2 at this point, so this is the plot for x2, and it's a Gaussian shifted so that it's centred at a times x2. And then there'll be other ones; a general one, say, has its peak at a times xn, so this is for xn, and it has a Gaussian shape with a peak at a times xn, where xn is bigger. We're still plotting with respect to y. So here are different versions of this function. The function is a Gaussian, and when we look at different values of x it shifts the Gaussian across, because of the example we're looking at.

So what's our task here? We're going to measure a value y, and we'd like to evaluate this function to find the value of x that maximizes it. Here we've drawn that function for different values of x: x1, x2, up to xn. So let's say we make a measurement, y equals y-bar. Where is y-bar? Well, these plots are over values of y, so let's say y-bar is this value here; I'll draw it on each plot, all the way down. So we've taken our measurement, we've measured y-bar, and we'd like to find the maximum likelihood estimate of x. Maybe it's the temperature, maybe it's the digital signal from our noisy measurement, or maybe it's grid coordinates in radar. What are we doing? We're looking at this y-bar and we're going to evaluate this function at y equals y-bar. That value there is f of y at y-bar given x1; this value here is f of y at y-bar given x2; and this value here is f of y at y-bar, the exact value we've measured, given xn. We're going to look at all these values, for all the different values of x, because that's what we have to do, and we have to find the value that is the biggest. And I think you can see there's a plot in here somewhere, between x2 and xn, where the peak of the Gaussian sits exactly over y-bar. That plot gives the biggest value you're going to get at y-bar, and it corresponds to x-hat MLE (we put a hat on it for the estimate). So this was the plot for x1, this was the plot for x2, this was the plot for xn, and this is the plot that corresponds to the maximum likelihood estimate of x. Its peak sits at a times x-hat MLE, and it is a function of y. So once we've measured this, because the peak is at y-bar, we have y-bar equals a times x-hat MLE, and the maximum likelihood estimate of x for this example is y-bar divided by a.

10:06

Okay, so that is what we mean by the maximum likelihood estimate. I think the important thing is really to understand what you're doing: you're plotting this function for different values of x, over all the different values of x, and you're finding the one where there's a maximum at the value of y-bar that you've measured. If we'd measured a different value of y-bar, a different one of these would be the maximum; hopefully you can see that.

Okay, this is maximum likelihood. Well, what about maximum a posteriori? The one thing we haven't mentioned is that everything so far has assumed nothing about the distribution of x. For example, if it was the temperature example, let's take that case, we know that it's very unlikely that x is going to be negative 4,000, because there are no temperatures at negative 4,000, whether you're measuring in Celsius or Fahrenheit, or Kelvin for that matter. So it can be that sometimes you have some information about the distribution of x. For example, for grid coordinates in radar you might know that objects tend to exist near certain places. Let's say it's a maritime radar and you're monitoring ship movements in a port: then you'll know where the ships tend to go in the port, and you'll have some information that some areas of the harbour are more likely places to find the ships than other areas of the harbour. So if you know something about the distribution of x, then you can do something other than just maximum likelihood, which did not assume any knowledge about that.

So let's say we have this other situation where we know something about the distribution of x. What would we do if we didn't even take any measurements at all? If we knew the distribution of x, the pdf of x over the values that x can take, and we were to not take any measurements and just ask ourselves where we are likely to find the value (if it's the radar, where we are likely to find the ship), then what we could do is find the argument that maximizes this function over the values of x, and this value of x-hat would be called the maximum a priori estimate. This is Latin, a p-r-i-o-r-i: priori means prior to the measurement. So if we didn't make any measurement, and we just know where we expect values of x to be, then we can find the value of x that maximizes the density of x, and this is what we would call our maximum a priori estimate of x. This is leading up to maximum a posteriori: a priori is before you've taken the measurement, prior to taking the measurement.

So what does a posteriori mean? Well, the maximum a posteriori estimate (we use MAP for that) equals the argument that maximizes, over values of x, the function f of x given y. As I said before, this is the one that's a little bit more intuitive, because this is given the measurement: now you're going to maximize the pdf of x conditioned on the measurement. And in this case, based on Bayesian rules, we know that this can be written as the arg max over x of f of y given x, times f of x, divided by f of y.

14:12

So this is just Bayesian rules, this is the Bayesian formula here. In this case we've got the likelihood function, which you'll recognize from above. We've still got to look through all the different values of x and maximize, but now it's this function times f of x, times the information that we have about x. So we're now weighting the likelihood function by our prior knowledge of where we expect to see the boats, if it's the radar example of boats in a harbour. But we've also got this divided by f of y. Now, what do we know about this? Well, it's the probability density function for y, and we don't know it just from our measurement, because we've only measured one value of y, but what we do know is we can calculate it, again by using the Bayesian formula: it's the integral, over all the values of x, of f of y given x times f of x. And because it's integrated over all the values of x, it is actually independent of the particular value of x that we're searching for. This function is sometimes a bit counter-intuitive, and it takes a bit of time to think through, but it is a value which is the overall probability density function of y; it's not conditioned on x, and it doesn't depend on x. So therefore, in this arg max, we can ignore the denominator, and what we've got in our MAP, maximum a posteriori, is that we're doing exactly the same as maximum likelihood, except we're weighting this value by f of x.

So how does that relate to these pictures? Well, all we're doing is this: if you took a value of y, if you made a measurement y-bar like we did before, then instead of just drawing these plots, taking these values, and looking for the maximum of these values, now you have to take these values multiplied by f of x. In this case it would be this value here multiplied by f of x at x1, in this case this value multiplied by f of x at x2, in this case this one multiplied by f of x at x3. And this one here, which I didn't draw in before, is f of y at y-bar given x-hat MLE, so you'd have this one multiplied by f of x at the x that corresponds to this value here. I wrote it there with the maximum likelihood estimate because that was the value when you didn't have the prior knowledge; but if you do have the prior knowledge and you're going to do a MAP estimate, you need to multiply by that prior knowledge, and that's really the difference between maximum likelihood and maximum a posteriori.

Naturally enough, if the distribution was uniform, if all values of x were equally likely, then the maximum likelihood estimate is exactly the same as the maximum a posteriori estimate, and that's often the case in digital communications. So unlike radar, where you know where the targets might be more likely to be than others, and temperature, where you know certain values of temperature are more likely than others, in digital signals you often have very accurate compression coding or source coding, which means that the symbols are equally likely: the ones and zeros, or the different constellation points, are equally likely. And so for digital communications it is often the case that the maximum likelihood estimate is the same as the maximum a posteriori estimate.

So if you found this video helpful, give it a thumbs up; it helps others to find the video. Subscribe to the channel for more videos, and check out the webpage in the link below for a full categorised list of all the videos on the channel.

Related Tags
Machine Learning, Signal Processing, Estimation Techniques, Noise Reduction, Bayesian Rules, Gaussian Noise, IoT Devices, Digital Signals, Radar Tracking, Prior Knowledge, Likelihood Function