What are Maximum Likelihood (ML) and Maximum A Posteriori (MAP)? ("Best explanation on YouTube")
Summary
TLDR: This script explores the concepts of Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) in the context of signal processing. It uses the example of a system with a constant 'a' and additive white Gaussian noise (AWGN) to illustrate how to estimate an unknown variable 'x' from measurements 'y'. The video explains the process of finding the MLE by maximizing the probability density function (PDF) of 'y' given 'x', and contrasts it with MAP, which incorporates prior knowledge about the distribution of 'x'. The script clarifies the difference between the two by visualizing the PDF plots for various 'x' values and how they are affected by the measurement 'y-bar'. It concludes by noting that MLE and MAP are often the same in digital communications, where symbols are equally likely.
Takeaways
- 📚 The script discusses two statistical estimation methods: Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP).
- 🔍 MLE is an estimation technique where the estimate of a parameter is the one that maximizes the likelihood function given the observed data.
- 📈 The script uses the example of a system with a constant 'a' and additive white Gaussian noise (AWGN) to explain MLE, where measurements 'y' are taken to estimate an unknown 'x'.
- 🌡️ Practical applications of the discussed concepts include estimating temperature in IoT devices, digital signal processing, and radar tracking.
- 📉 The likelihood function for a Gaussian noise model is given by a normal distribution with mean 'a times x' and variance 'sigma squared'.
- 📊 To find the MLE, one evaluates the likelihood at the measured 'y' for various values of 'x' and picks the 'x' whose Gaussian places the largest density at that measurement.
- ✂️ MAP estimation incorporates prior knowledge about the distribution of 'x', in addition to the likelihood function, to find the estimate that maximizes the posterior probability.
- 🤖 The Bayesian rule is applied in MAP to combine the likelihood function and the prior distribution of 'x', normalizing by the probability density function of 'y'.
- 📝 The script emphasizes the importance of understanding the difference between MLE and MAP, especially when prior information about the parameter distribution is available.
- 🔄 The 'maximum a priori' estimate is the value of 'x' that maximizes the prior distribution f(x) alone, before any measurements are taken.
- 🔄 If the prior distribution is uniform, meaning all values of 'x' are equally likely, then MLE and MAP yield the same result, which is often the case in digital communications.
- 👍 The script encourages viewers to engage with the content by liking, subscribing, and visiting the channel's webpage for more categorized content.
Q & A
What is the basic model described in the script for signal estimation?
-The basic model described in the script is y = a * x + n, where 'a' is a constant, 'x' is the signal we want to estimate, and 'n' represents additive white Gaussian noise (AWGN).
What are some practical examples where the described model could be applied?
-The model could be applied in scenarios such as measuring temperature in an IoT device, digital signal processing in the presence of noise, and radar tracking where 'x' could represent grid coordinates of a target.
What is the maximum likelihood estimate (MLE) and how is it found?
-The maximum likelihood estimate (MLE) is the value of 'x' that maximizes the probability density function (pdf) of 'y' given 'x'. It is found by evaluating the pdf for different values of 'x' and selecting the one that maximizes the function for the measured value of 'y'.
How does the Gaussian noise affect the pdf of 'y' given a specific 'x'?
-Given a specific 'x', the pdf of 'y' will have a Gaussian shape with a mean value of 'a' times 'x'. The noise is Gaussian, so the distribution of 'y' will be centered around this mean.
What is the formula for the pdf of 'y' given 'x' in the context of Gaussian noise?
-The formula for the pdf of 'y' given 'x' is f(y|x) = 1 / (sqrt(2 * pi * σ^2)) * exp(-(y - a * x)^2 / (2 * σ^2)), where σ is the standard deviation of the Gaussian noise.
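As a sanity check on that formula, here is a minimal Python sketch (the values of a, σ, and the y grid are assumptions for illustration, not from the video): evaluating f(y|x) over a sweep of y shows a Gaussian centered at a*x for each hypothetical x.

```python
import numpy as np

def likelihood(y, x, a=2.0, sigma=1.0):
    """pdf of y given x for the model y = a*x + n, with n ~ N(0, sigma^2)."""
    return np.exp(-(y - a * x) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

a, sigma = 2.0, 1.0
y = np.linspace(-10.0, 10.0, 1001)       # sweep y to see the shape of f(y|x)
for x in (0.0, 1.0, 3.0):                # a few hypothetical values of x
    f = likelihood(y, x, a, sigma)
    print(f"x = {x}: f(y|x) peaks at y = {y[np.argmax(f)]:.2f} (expected a*x = {a * x})")
```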
How does the maximum likelihood estimation differ from maximum a posteriori estimation?
-Maximum likelihood estimation only considers the likelihood function given the observed data, whereas maximum a posteriori estimation also incorporates prior knowledge about the distribution of 'x', combining it with the likelihood function.
What is the Bayesian formula used to express the maximum a posteriori (MAP) estimate?
-The Bayesian formula used for MAP is x̂_MAP = arg max_x [ f(y|x) * f(x) / f(y) ], where f(y|x) is the likelihood function, f(x) is the prior distribution of 'x', and f(y) is the marginal likelihood.
Why does the denominator f(y) in the MAP formula often get ignored during the maximization process?
-The denominator f(y) is a normalizing constant that is independent of 'x', so it does not affect the location of the maximum and can be ignored during the maximization process.
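A minimal numeric sketch of that point (the Gaussian prior and all parameter values are assumptions for illustration): once 'y' is measured, f(y) is just a number, so dividing by it rescales the curve without moving its peak.

```python
import numpy as np

a, sigma = 2.0, 1.0                      # model y = a*x + n, n ~ N(0, sigma^2)
mu0, sigma0 = 0.0, 2.0                   # assumed Gaussian prior on x
y_meas = 5.0                             # assumed single measurement

x = np.linspace(-10.0, 10.0, 20001)
lik = np.exp(-(y_meas - a * x) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
prior = np.exp(-(x - mu0) ** 2 / (2 * sigma0 ** 2)) / np.sqrt(2 * np.pi * sigma0 ** 2)

f_y = np.trapz(lik * prior, x)           # marginal f(y): a single number, no x dependence
print(x[np.argmax(lik * prior)])         # argmax of the numerator alone...
print(x[np.argmax(lik * prior / f_y)])   # ...is unchanged after dividing by f(y)
```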
How does the concept of 'maximum a priori' relate to 'maximum a posteriori'?
-Maximum a priori is the estimate of 'x' that maximizes the prior distribution f(x) alone, before any measurements are taken. Maximum a posteriori additionally weights that prior by the likelihood of the observed data, providing a more informed estimate.
In what type of scenarios would the maximum likelihood estimate be the same as the maximum a posteriori estimate?
-The maximum likelihood estimate would be the same as the maximum a posteriori estimate when the prior distribution of 'x' is uniform, meaning all values of 'x' are equally likely, which is often the case in digital communications.
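Here is a small sketch of that digital communications case (BPSK over the same model; all parameter values are assumptions): with equally likely symbols, the MAP decision rule reduces to the ML rule.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma = 1.0, 0.8
symbols = np.array([-1.0, 1.0])          # BPSK constellation for x
priors = np.array([0.5, 0.5])            # equally likely symbols (uniform prior)

x_true = rng.choice(symbols, size=20)
y = a * x_true + rng.normal(0.0, sigma, size=20)

# likelihood of each measurement under each candidate symbol
lik = np.exp(-(y[:, None] - a * symbols[None, :]) ** 2 / (2 * sigma ** 2))

ml_hat = symbols[np.argmax(lik, axis=1)]
map_hat = symbols[np.argmax(lik * priors[None, :], axis=1)]
print(np.array_equal(ml_hat, map_hat))   # True: the uniform prior changes nothing
```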
Outlines
🔍 Introduction to Maximum Likelihood and Maximum a Posteriori Estimation
This paragraph introduces the concepts of Maximum Likelihood Estimation (MLE) and Maximum a Posteriori (MAP) in the context of signal processing and detection. It uses the example of a system where measurements 'y' are taken in the presence of Additive White Gaussian Noise (AWGN) to estimate an unknown variable 'x'. The paragraph explains that MLE involves finding the value of 'x' that maximizes the probability density function (pdf) of 'y' given 'x'. The examples of estimating temperature in an IoT device, a digital signal in noise, and radar tracking coordinates are given to illustrate the application of these concepts. The explanation also includes the mathematical representation of the Gaussian pdf for the given model.
📈 Visualizing Maximum Likelihood Estimation
The second paragraph delves deeper into the process of Maximum Likelihood Estimation by visualizing the probability density function (pdf) for different values of 'x'. It describes the task of finding the argument that maximizes the pdf when a measurement 'y' is given. The paragraph illustrates this with a series of Gaussian plots, each representing the pdf of 'y' given different hypothetical values of 'x'. The goal is to identify the peak of the Gaussian that aligns with the measured value 'y-bar', which corresponds to the Maximum Likelihood Estimate (MLE) of 'x'. The explanation simplifies the concept by showing how the MLE can be found graphically and mathematically, concluding with the formula for the MLE in the context of the Gaussian example.
📚 Transitioning from Maximum Likelihood to Maximum a Posteriori
This paragraph transitions from discussing Maximum Likelihood Estimation to introducing Maximum a Posteriori estimation. It highlights the difference between the two by pointing out that MLE does not assume any prior knowledge about the distribution of 'x', while MAP incorporates prior information. The paragraph explains the concept of 'a priori', which is the expected distribution of 'x' before any measurements are taken, and contrasts it with 'a posteriori', which considers the updated distribution after measurements. The explanation includes the Bayesian formula for MAP, emphasizing how it combines the likelihood function with the prior distribution of 'x', and discusses the normalization factor 'f of y', which is independent of 'x' and can be ignored in the maximization process.
🌐 Understanding Maximum a Posteriori Estimation with Examples
The final paragraph provides a comprehensive understanding of Maximum a Posteriori estimation by explaining how it differs from Maximum Likelihood Estimation. It describes the process of weighting the likelihood function by the prior distribution of 'x' and then finding the value of 'x' that maximizes this product. The paragraph uses the example of radar tracking to illustrate how prior knowledge about the location of ships in a harbor can influence the MAP estimate. It also clarifies that if the prior distribution is uniform, the MAP estimate is equivalent to the MLE estimate, which is often the case in digital communications where symbols are equally likely. The paragraph concludes by encouraging viewers to engage with the content and explore further resources on the channel's webpage.
Keywords
💡Maximum Likelihood Estimation (MLE)
💡Maximum A Posteriori (MAP)
💡Additive White Gaussian Noise (AWGN)
💡Likelihood Function
💡Bayes' Rule
💡Prior Distribution
💡Posterior Distribution
💡Digital Signal
💡Radar Tracking
💡Temperature Measurement
Highlights
Introduction to the concepts of maximum likelihood and maximum a posteriori estimation in the context of signal processing and detection.
Explanation of a basic system model with a constant 'a' and additive white Gaussian noise (AWGN) 'n'.
Discussion of practical examples where the system model is applicable, such as temperature measurement in IoT devices.
Illustration of how to estimate 'x' from measurements 'y' using the maximum likelihood estimate (MLE).
Definition and explanation of the maximum likelihood function and its role in estimating 'x'.
Clarification that the likelihood f(y|x) is conditioned on 'x', even though it is 'y' that is actually measured.
Derivation of the probability density function (pdf) for the Gaussian example.
Visualization of the Gaussian function for different values of 'x' and the task of finding the argument that maximizes it.
Graphical representation of how to find the maximum likelihood estimate by comparing different Gaussian plots.
Explanation of the maximum a posteriori (MAP) estimation considering prior knowledge about the distribution of 'x'.
Introduction to the concept of maximum a priori estimation before taking any measurements.
Bayesian interpretation of maximum a posteriori estimation using the likelihood function and prior distribution.
Difference between maximum likelihood and maximum a posteriori in terms of incorporating prior knowledge.
Practical implications of maximum likelihood and maximum a posteriori in digital communications where symbols are equally likely.
Encouragement for viewers to engage with the content through likes and subscriptions for more informative videos.
Transcripts
So what are maximum likelihood and maximum a posteriori for estimation and detection? I like to think of an example, so let's think of y = a*x + n: a system where a is a constant and n is additive white Gaussian noise (AWGN). In this system we're going to take measurements y and try to work out what x is.

There are lots of examples of this. x could be the temperature in an IoT device; we might be measuring it with an amplifier, which then adds noise, so one example is measuring temperature in an IoT device. What about a digital signal? Maybe it's a digital signal in noise, where x is the digital data, a is the gain of the channel, and n is the noise from the receiver electronics. Another one: x might be two-dimensional, a vector, say the grid coordinates of a target in radar tracking. All of these different examples fit into this category: we measure y and try to work out what x is. Okay, so let's start with maximum likelihood.
The maximum likelihood estimate of x (we write x̂ for an estimate of x, with MLE for maximum likelihood estimate) is a function of y, the measurement we've taken. The definition is the argument of the maximization over x, the value of x that maximizes this function:

x̂_MLE(y) = arg max_x f(y|x)

where f(y|x) is the pdf of y given the value x. So this is the equation for the maximum likelihood estimate: the value of x that maximizes this function. Note that this is a pdf in which we are given a value of x. That is not particularly intuitive, because you would think we're given the value of y: we take a measurement y, so you would think that's what we're given. It turns out that in maximum a posteriori that is what we condition on, but for maximum likelihood, no, we have this function, so I'd like to try to understand that function a little more.
more okay so for the gaussian let's
for this model here let's think of what
that function is
okay so this pdf of y given x
well if if you were to be given x if you
knew what x is
what would be the distribution of y well
in this case here if you were given a
value of x then the distribution of y
well it would have a gaussian shape
because the noise is gaussian
and it would have a mean value of this
constant a
times x so in this case in the in the
gaussian example
so i'm going to write in gaussian
example here
so that's an important thing this is a
general formula
i'll put a box around it because it's
the general formula
but then i'm going to write down an
example of gaussian which helps us to
to understand it so this is f of y the
pdf of y
given x equals 1 divided by the square
root of 2 pi sigma exponential
of minus y minus
ax all squared divided by 2
sigma squared okay so that is
the pdf that is this function for this
gaussian example
Okay, so I like to now think of this in terms of what the actual plots look like and what this is asking us to do. Let me plot one example of that function. Plotted as a function of y it's a Gaussian, because in this example the noise is Gaussian, and the peak of the Gaussian is at the value y = a*x. So here's a Gaussian drawn with its peak at a times x. Now I'm going to draw it for a number of different values of x, because we have to find the argument of the maximum over all different values of x. What this says is: we have to search all the values of x, look at this function for each value of x, and find the one which is the maximum.

So let me draw this for x1: this is a plot of f(y|x1). Then let me draw it for some other values as well, so we can visualize what we're having to do in this maximization. Here's another one: x2 gives a times x2 at this point, so the plot for x2 is a Gaussian shifted so that it's centered at a*x2. And there'll be other ones; a general one, say xn, has a Gaussian shape with its peak at a*xn, where xn is bigger, and we're still plotting with respect to y. So here are different versions of this function. The function is a Gaussian, and when we look at different values of x it shifts the Gaussian across, because this is the example we're looking at.
Okay, so what is our task here? Well, we're going to measure a value y, and we'd like to evaluate this function to find the value of x that maximizes it; we've drawn that function for different values of x: x1, x2, up to xn. So let's say we make a measurement, y = y-bar. Where is y-bar? These plots show values of y, so let's say y-bar is this value here; I'll draw it on every plot, all the way down. So we've taken our measurement, we've measured y-bar, and we'd like to find the maximum likelihood estimate of x: maybe it's the temperature, maybe it's the digital signal from our noisy measurement, or maybe it's grid coordinates in radar.

So what are we doing? We look at this y-bar and we evaluate the function at y = y-bar. One value is f(y-bar|x1); this one here is f(y-bar|x2); and this value here is f(y-bar|xn), at the exact value we've measured. We're going to look at all these values, for all the different values of x, because that's what we have to do, and we have to find the value that is the biggest.
I think you can see that there's a plot in here, somewhere between x2 and xn, where the peak of the Gaussian sits exactly over y-bar. One of these plots is that one, and it gives the biggest value you're going to get at y-bar; that value corresponds to x̂_MLE (and we put a hat on it for the estimate). So this was the plot for x1, this was the plot for x2, this was the plot for xn, and this is the plot at the maximum: the plot that corresponds to the maximum likelihood estimate of x. Its peak, I think you can see, sits at a times x̂_MLE, which is a function of y. So once we've taken the measurement, because the peak sits at y-bar, we have y-bar = a * x̂_MLE, and the maximum likelihood estimate of x for this example is

x̂_MLE = y-bar / a

Okay, so that is what we mean by the maximum likelihood estimate. The important thing is really to understand what you're doing: you plot this function over all the different values of x, and you find the one where there's a maximum at the value y-bar that you've measured. If we'd measured a different value of y-bar, a different one of these plots would be the maximum; hopefully you can see that.
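To make the graphical search concrete, here is a minimal numeric sketch (a, σ, and y-bar are assumed values, not from the video): a grid search over candidate x lands on y-bar / a.

```python
import numpy as np

a, sigma, y_bar = 2.0, 1.0, 3.0          # assumed values for illustration
x = np.linspace(-10.0, 10.0, 200001)     # fine grid of candidate x values
lik = np.exp(-(y_bar - a * x) ** 2 / (2 * sigma ** 2))  # constant factor omitted

print(x[np.argmax(lik)], y_bar / a)      # both 1.5: argmax of f(y_bar|x) is y_bar/a
```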
Okay, that's maximum likelihood; well, what about maximum a posteriori? The one thing we haven't mentioned is that everything so far has assumed nothing about the distribution of x. Take the temperature example: we know it's very unlikely that x is going to be negative 4000, because there are no temperatures at negative 4000, whether you're measuring in Celsius or Fahrenheit, or Kelvin for that matter. So it can be that sometimes you have some information about the distribution of x. For example, for grid coordinates in radar you might know that objects tend to exist in certain places: say it's a maritime radar and you're monitoring ship movements in a port; then you'll know where the ships tend to go, and that some areas of the harbour are more likely places to find ships than other areas. So if you know something about the distribution of x, you can do something other than just maximum likelihood, which did not assume any knowledge about that. So let's say we have this other situation, where we know something about the distribution of x.
What would we do if we didn't even take any measurements at all? If we knew the distribution of x (the pdf of x over the values that x can take) and we were to ask ourselves where it is likely to find the value (if it's the radar, where is it likely to find the ship?), then what we could do is find the argument that maximizes this function over the values of x:

x̂ = arg max_x f(x)

and this value of x̂ would be called the maximum a priori estimate. That is Latin, a p-r-i-o-r-i; priori means prior to the measurement. So if we didn't make any measurement and we just know where we expect values of x to be, we can find the value of x that maximizes the density of x, and that is what we would call our maximum a priori estimate of x. This is leading up to maximum a posteriori: a priori is before you've taken the measurement, prior to taking the measurement.
So what could we do, and what does "a posteriori" mean? Well, the maximum a posteriori estimate (we use MAP for that) is

x̂_MAP = arg max_x f(x|y)

and as I said before, this is the one that's a little more intuitive, because this is given the measurement: now you're going to maximize the pdf of x conditioned on the measurement. In this case, based on Bayes' rule, we know this can be written as

x̂_MAP = arg max_x f(y|x) * f(x) / f(y)

So this is just Bayes' rule; this is the Bayesian formula.
In this formula we've got the likelihood function; you'll recognize it from earlier. We've still got to look through all the different values of x and maximize, but now it's that function times f(x): times the information that we have about x. So we're now weighting the likelihood function by our prior knowledge of where we expect to see the boats, in the radar example of boats in a harbour. But we've also got this division by f(y).

Now, what do we know about f(y)? Well, it's the probability density function of y. We don't know it just from our measurement (we've only measured one value of y), but what we do know is that we can calculate it, by expanding over all the values of x:

f(y) = ∫ f(y|x) f(x) dx

and because it's integrated over all the values of x, it is actually independent of the particular value of x that we're searching for. This function is sometimes a bit counter-intuitive, and it takes a bit of time to think through, but it is the overall probability density function of y: it's not conditioned on x, and it doesn't depend on x. Therefore, in this arg max, we can ignore the denominator, and what we've got in our MAP, maximum a posteriori, is that we're doing exactly the same as maximum likelihood, except we're weighting the value by f(x).
So how does that relate to the pictures? Well, all we're doing is this: if you took a value of y, if you made a measurement y-bar like we did before, then instead of just drawing those plots, taking those values, and looking for the maximum of them, now you take each value multiplied by f(x). In this case it would be the first value multiplied by f(x1); in this case, this value multiplied by f(x2); in this case, this one multiplied by f(x3). And the one I didn't draw in, f(y-bar|x̂_MLE), you'd have multiplied by f(x) at the x that corresponds to that plot. I labelled it there with the maximum likelihood estimate, because that was the value when you didn't have the prior knowledge; but if you do have the prior knowledge and you're going to do a MAP estimate, you need to multiply by that prior knowledge. And that's really the difference between maximum likelihood and maximum a posteriori.
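A minimal sketch of that weighting step, assuming a Gaussian prior on x and illustrative parameter values: multiplying the likelihood values by f(x) pulls the argmax away from y-bar / a toward the prior mean, and for the Gaussian case it matches the known closed form.

```python
import numpy as np

a, sigma = 2.0, 1.0                      # model y = a*x + n, n ~ N(0, sigma^2)
mu0, sigma0 = 0.0, 0.5                   # assumed prior: x ~ N(mu0, sigma0^2)
y_bar = 6.0                              # assumed measurement

x = np.linspace(-10.0, 10.0, 200001)
lik = np.exp(-(y_bar - a * x) ** 2 / (2 * sigma ** 2))
prior = np.exp(-(x - mu0) ** 2 / (2 * sigma0 ** 2))

x_mle = x[np.argmax(lik)]                # 3.0, i.e. y_bar / a
x_map = x[np.argmax(lik * prior)]        # pulled toward mu0 by the prior

# closed form for a Gaussian likelihood times a Gaussian prior
x_map_exact = (a * y_bar / sigma**2 + mu0 / sigma0**2) / (a**2 / sigma**2 + 1 / sigma0**2)
print(x_mle, x_map, x_map_exact)         # 3.0, 1.5, 1.5
```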
Naturally enough, if the distribution is uniform, if all values of x are equally likely, then the maximum likelihood estimate is exactly the same as the maximum a posteriori estimate, and that's often the case in digital communications. So unlike radar, where you know the targets are more likely to be in some places than others, and temperature, where you know certain values are more likely than others, in digital signals you often have very effective compression coding, or source coding, which means that the symbols are equally likely: the ones and zeros, or the different constellation points, are equally likely. And so for digital communications it's often the case that maximum likelihood is the same as maximum a posteriori.
So if you found this video helpful, give it a thumbs up; it helps others to find the video. Subscribe to the channel for more videos, and check out the webpage in the link below for a full categorised list of all the videos on the channel.