Hidden Markov Model Clearly Explained! Part - 5

Normalized Nerd
26 Dec 2020 · 09:32

Summary

TL;DR: In this 'Normalized Nerd' video, the presenter delves into Hidden Markov Models (HMMs), a concept derived from Markov chains, which are widely used in fields like bioinformatics and natural language processing. The video offers both intuition and mathematical insights into HMMs, illustrating their workings with a hypothetical scenario involving weather and mood. It explains the model's components, including the transition and emission matrices, and demonstrates how to calculate the probability of a sequence of observed variables. The presenter also introduces the use of Bayes' theorem in determining the most likely sequence of hidden states, providing a clear and engaging explanation of the complex topic.

Takeaways

  • 😀 Hidden Markov Models (HMMs) are an extension of Markov chains and are used in various fields such as bioinformatics, natural language processing, and speech recognition.
  • 🔍 The video aims to explain both the intuition and the mathematics behind HMMs, including the role of Bayes' theorem in the process.
  • 🌤️ The script uses a hypothetical town with three types of weather (rainy, cloudy, sunny) to illustrate the concept of a Markov chain where the weather tomorrow depends only on today's weather.
  • 🤔 It introduces the concept of 'hidden' states by showing that while we can't observe the weather directly, we can infer it from Jack's mood, which is an observed variable dependent on the weather.
  • 📊 The script explains the use of matrices to represent the probabilities of state transitions (transition matrix) and the probabilities of observed variables given the states (emission matrix).
  • 📚 It emphasizes the importance of understanding the basics of Markov chains before diving into HMMs, suggesting viewers watch previous videos for a foundation.
  • 🧩 The video provides a step-by-step example of calculating the joint probability of an observed mood sequence and a hypothetical weather sequence using the Markov property and matrices.
  • 🔑 The concept of the stationary distribution of a Markov chain is introduced as necessary for calculating the probability of the initial state in the HMM.
  • 🔍 The video poses the question of finding the most likely sequence of hidden states given an observed sequence, a common problem in applications of HMMs.
  • 📈 The script explains the formal mathematical approach to solving HMM problems using Bayes' theorem and the joint probability distribution of hidden and observed variables.
  • 📝 The video concludes by encouraging viewers to rewatch if they didn't understand everything, highlighting the complexity of the topic and the elegance of the mathematical solution.

Q & A

  • What is the main topic discussed in the video?

    -The main topic discussed in the video is Hidden Markov Models (HMMs), their concept, intuition, and mathematics, with applications in bioinformatics, natural language processing, and speech recognition.

  • What is the relationship between Hidden Markov Models and Markov Chains?

    -Hidden Markov Models are derived from Markov Chains. They consist of an ordinary Markov Chain and a set of observed variables, where the states of the Markov Chain are unknown or hidden, but some variables dependent on the states can be observed.

  • Why is the concept of 'emission matrix' important in HMMs?

    -The 'emission matrix' in HMMs captures the probabilities corresponding to the observed variables, which depend only on the current state of the Markov Chain. It's essential for understanding the relationship between the hidden states and the observable outcomes.

  • What is the role of Bayes' Theorem in Hidden Markov Models?

    -Bayes' Theorem is used in HMMs to find the most likely sequence of hidden states given a sequence of observed variables. It helps in rewriting the problem of finding the probability of hidden states given observed variables into a more manageable form.

  • How does the video script illustrate the concept of Hidden Markov Models using Jack's hypothetical town?

    -The script uses Jack's town, where the weather (rainy, cloudy, sunny) and Jack's mood (sad, happy) are the variables. The weather is the hidden state, and Jack's mood is the observed variable. The script explains how these variables interact and how HMMs can be used to infer the most likely weather from Jack's mood.

  • What is the significance of the 'transition matrix' in the context of the video?

    -The 'transition matrix' represents the probabilities of transitioning from one state to another in the Markov Chain. In the video, it shows how the weather changes from one day to the next, which is essential for modeling the sequence of states in HMMs.

  • What is the purpose of the 'stationary distribution' in calculating the probability of the first state in HMMs?

    -The 'stationary distribution' is used to find the probability of the initial state in a Markov Chain. It is necessary because the probability of the first state cannot be directly observed and must be inferred from the long-term behavior of the system.

  • How does the video script explain the computation of the probability of a given scenario in HMMs?

    -The script explains the computation by breaking down the scenario into a product of terms from the emission matrix and transition matrix, and using the stationary distribution for the initial state probability. It provides a step-by-step approach to calculate the joint probability of the observed mood sequence and the weather sequence.

  • What is the most likely weather sequence for a given mood sequence according to the video?

    -The video does not state the specific sequence in words but explains the process of finding the most likely weather sequence for a given mood sequence: compute the joint probability for every candidate weather sequence and select the one with the maximum probability.

  • How does the video script help in understanding the formal mathematics behind HMMs?

    -The script introduces symbols to represent the hidden states and observed variables, explains the use of Bayes' Theorem, and simplifies the problem into a form that can be maximized. It provides a clear explanation of the mathematical expressions involved in HMMs, making the formal mathematics more accessible.

Outlines

00:00

🌦️ Introduction to Hidden Markov Models

This paragraph introduces the concept of Hidden Markov Models (HMMs), explaining their derivation from Markov chains and their applications in various fields such as bioinformatics, natural language processing, and speech recognition. The speaker encourages viewers to watch previous videos for a better understanding of Markov chains, which are foundational to HMMs. The video promises to cover both the intuition and mathematical aspects of HMMs, including how Bayes' theorem is applied. The scenario of Jack's town with three types of weather and its effect on Jack's mood is used to illustrate the concept of hidden states and observed variables in HMMs. The transition and emission matrices are introduced as key components in modeling the system.
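
To make the two matrices concrete, here is a minimal sketch of how they could be laid out in Python with NumPy. The only probability actually quoted in the video is P(sad | rainy) = 0.9; every other number below is an illustrative placeholder rather than a value shown on screen.

    import numpy as np

    states = ["rainy", "cloudy", "sunny"]   # hidden states of the Markov chain
    moods = ["sad", "happy"]                # observed variable

    # Transition matrix: A[i, j] = P(weather tomorrow = states[j] | weather today = states[i]).
    # Placeholder values; each row must sum to 1.
    A = np.array([
        [0.5, 0.3, 0.2],   # rainy  -> rainy, cloudy, sunny
        [0.4, 0.2, 0.4],   # cloudy -> ...
        [0.1, 0.4, 0.5],   # sunny  -> ...
    ])

    # Emission matrix: B[i, k] = P(mood = moods[k] | weather = states[i]).
    # Only P(sad | rainy) = 0.9 is stated in the video; the rest are placeholders.
    B = np.array([
        [0.9, 0.1],        # rainy  -> sad, happy
        [0.3, 0.7],        # cloudy -> ...
        [0.2, 0.8],        # sunny  -> ...
    ])

    assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)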

05:01

🔍 Analyzing a Scenario with Hidden Markov Models

This paragraph delves deeper into the mechanics of HMMs by analyzing a hypothetical scenario where Jack's mood sequence is observed over three days. The task is to determine the most likely weather sequence that corresponds to this mood sequence. The speaker explains the process of calculating the joint probability of the observed mood sequence and the weather sequence, using the transition and emission matrices. The concept of the stationary distribution is introduced to determine the initial probability of the weather states. The paragraph concludes with a teaser about the formal mathematics behind HMMs, hinting at the use of Bayes' theorem and the importance of understanding the joint probability distribution of the hidden states and observed variables.
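
Written out, the six-term product for the scenario described above (sunny/happy, cloudy/happy, sunny/sad) is:

    P(scenario) = π(sunny) · P(happy | sunny) · P(sunny → cloudy) · P(happy | cloudy) · P(cloudy → sunny) · P(sad | sunny)

The first factor comes from the stationary distribution (0.549 for sunny in the video), the emission factors from the red matrix, and the transition factors from the green matrix; with the video's numbers the product evaluates to 0.00391.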


Keywords

💡Hidden Markov Model

Hidden Markov Models (HMMs) are a statistical tool used to model sequences of data where the actual underlying system state is not directly observable but can be inferred from the observable outcomes. In the video, HMMs are presented as an extension of Markov chains, useful in various fields like bioinformatics and natural language processing. The script uses the scenario of a hypothetical town with weather conditions and a character's mood to illustrate how HMMs can be used to infer hidden states from observable data.
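
Putting together the pieces introduced later in the video, the model is fully specified by three ingredients:

    HMM = (transition matrix of the hidden chain, emission matrix for the observations, initial state distribution, here taken to be the stationary distribution)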

💡Markov Chain

A Markov chain is a mathematical system that undergoes transitions from one state to another according to certain probabilistic rules. The key feature of a Markov chain is that no memory of past states is required: only the present state influences the future state. In the context of the video, the weather in Jack's town is modeled as a Markov chain where the probability of tomorrow's weather depends solely on today's weather.
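
Symbolically, the Markov property used throughout the video is:

    P(x_i | x_{i-1}, x_{i-2}, ..., x_0) = P(x_i | x_{i-1})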

💡Bayes Theorem

Bayes Theorem is a fundamental principle in probability theory and statistics that describes how to update the probabilities of hypotheses when given evidence. In the video, Bayes Theorem is highlighted for its role in HMMs to calculate the probability of a sequence of hidden states given a sequence of observed states, which is central to the process of inference in HMMs.
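
In the notation of the video (hidden sequence X, observed sequence Y), the theorem and the way it is applied read:

    P(X | Y) = P(Y | X) · P(X) / P(Y)
    argmax_X P(X | Y) = argmax_X P(Y | X) · P(X)

The denominator P(Y) does not depend on X, which is why it can be neglected when searching for the most likely hidden sequence.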

💡Observed Variables

In the context of HMMs, observed variables are the outcomes or data points that can be directly measured or seen. In the video script, Jack's mood is the observed variable that provides clues about the hidden state, which is the weather. The mood is used to infer the likely weather conditions despite not being able to observe them directly.

💡Transition Matrix

A transition matrix in a Markov chain is a matrix that describes the probabilities of moving from one state to another. In the video, the transition matrix is used to represent the probabilities of weather changing from one day to the next in Jack's town, which is a key component in calculating the probabilities of different weather sequences.
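
Under the row convention used in the sketches above (the video's own layout is not reproduced in this summary), the entries are:

    A[i, j] = P(next state = j | current state = i),  with Σ_j A[i, j] = 1 for every row i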

💡Emission Matrix

The emission matrix in an HMM is used to describe the probabilities of observing certain outcomes given a particular state. In the video, the emission matrix captures the probabilities of Jack being in a certain mood given the current weather, which is crucial for inferring the hidden states from the observed moods.
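
The assumption attached to this matrix is that each observation depends only on the current hidden state, so the likelihood of the whole observed sequence factorizes into emission terms:

    P(y_i | x_0, ..., x_i, y_0, ..., y_{i-1}) = P(y_i | x_i),  and therefore  P(Y | X) = ∏_i P(y_i | x_i)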

💡Stationary Distribution

The stationary distribution of a Markov chain is the long-term proportion of time the chain spends in each state, assuming it has reached equilibrium. In the video, the stationary distribution is used to determine the initial state probabilities for the Markov chain, which is necessary for calculating the probability of a sequence of events.
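
The video mentions two ways to obtain this distribution: the normalized left eigenvector, or repeated matrix multiplication. A small sketch of both, using the same placeholder transition matrix as in the earlier snippet, could look like the following. (With the video's actual matrix, the sunny entry of this vector is 0.549; with placeholder values the numbers will differ.)

    import numpy as np

    A = np.array([[0.5, 0.3, 0.2],      # placeholder transition matrix
                  [0.4, 0.2, 0.4],      # (same illustrative values as earlier)
                  [0.1, 0.4, 0.5]])

    # Method 1: normalized left eigenvector of A for eigenvalue 1 (i.e. eigenvector of A.T).
    vals, vecs = np.linalg.eig(A.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    pi = pi / pi.sum()                  # normalize so the entries sum to 1

    # Method 2: repeated matrix multiplication from any starting distribution.
    pi2 = np.full(3, 1.0 / 3.0)
    for _ in range(1000):
        pi2 = pi2 @ A

    print(pi, pi2)                      # the two estimates should agree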

💡Joint Probability

Joint probability refers to the probability of two or more events occurring together. In the video, the joint probability is used to calculate the likelihood of a sequence of observed moods and the corresponding hidden weather states over a series of days.
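
Under the two assumptions above (Markov property for the hidden chain, emissions depending only on the current state), this joint probability factorizes into the product the video maximizes, with 2n terms for n observed days (six terms in the three-day example):

    P(X, Y) = P(Y | X) · P(X) = π(x_0) · ∏_{i=1}^{n-1} P(x_i | x_{i-1}) · ∏_{i=0}^{n-1} P(y_i | x_i)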

💡Most Likely Sequence

The most likely sequence in the context of HMMs is the sequence of hidden states that maximizes the joint probability given the observed sequence. The video script discusses finding this sequence by calculating the probabilities of various possible weather sequences and identifying the one with the highest probability.
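
The creator's Python script is not shown in the video, but a minimal brute-force sketch of the idea, enumerating all 3^3 candidate weather sequences for the three observed days, might look like this. It reuses the placeholder matrices from the earlier sketches, so its output will not match the 0.04105 reported in the video.

    from itertools import product
    import numpy as np

    # Same placeholder matrices as in the earlier sketches (not the video's actual numbers).
    A = np.array([[0.5, 0.3, 0.2], [0.4, 0.2, 0.4], [0.1, 0.4, 0.5]])   # transitions
    B = np.array([[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]])                  # emissions (sad, happy)
    pi = np.full(3, 1.0 / 3.0)
    for _ in range(1000):            # stationary distribution by repeated multiplication
        pi = pi @ A

    def joint_prob(weather, observed_moods):
        """Joint probability of a hidden weather sequence and an observed mood sequence."""
        p = pi[weather[0]] * B[weather[0], observed_moods[0]]
        for t in range(1, len(weather)):
            p *= A[weather[t - 1], weather[t]] * B[weather[t], observed_moods[t]]
        return p

    observed = [1, 1, 0]             # moods over three days: happy, happy, sad
    candidates = product(range(3), repeat=len(observed))   # all 3**3 weather sequences
    best = max(candidates, key=lambda w: joint_prob(w, observed))
    print(best, joint_prob(best, observed))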

💡Natural Language Processing

Natural Language Processing (NLP) is a field of computer science that deals with the interaction between computers and human language. In the video, it is mentioned as one of the domains where HMMs are particularly useful, likely for tasks such as speech recognition or part-of-speech tagging, where the sequence of words or tags is modeled.

💡Bioinformatics

Bioinformatics is an interdisciplinary field that uses computational methods to analyze and interpret biological data. The video mentions HMMs as a tool in bioinformatics, likely referring to their use in modeling biological sequences such as DNA, RNA, or proteins, where the states might represent different types of nucleotides or amino acids.

Highlights

Introduction to Hidden Markov Models (HMMs) and their derivation from Markov chains.

HMMs' applications in bioinformatics, natural language processing, and speech recognition.

Explanation of the intuition and mathematics behind Hidden Markov Models.

Importance of understanding Markov chains before diving into HMMs.

Use of a hypothetical town scenario to illustrate the concept of a Markov chain.

Description of the weather and mood states in the hypothetical town as a Markov chain model.

Introduction of the concept of hidden states in HMMs through the weather example.

Differentiation between the transition matrix and the emission matrix in HMMs.

Explanation of how observed variables depend only on the current state, not on previous states.

Calculation of the probability of a given mood sequence using the emission and transition matrices.

Discussion on finding the most likely sequence of hidden states given an observed sequence.

Use of the stationary distribution to determine the probability of the initial state.

Computational approach to finding the most probable sequence using a Python script.

Formal mathematical explanation involving Bayes' theorem in HMMs.

Derivation of the joint probability distribution of observed and hidden states.

Clarification of the process to maximize the probability of a sequence of hidden states given observed data.

Encouragement for viewers to rewatch the video for better understanding.

Call to action for comments, suggestions, and subscriptions to support the channel.

Transcripts

Hello people from the future, welcome to Normalized Nerd. Today I'm gonna talk about hidden Markov models. This is a concept derived from our old friend Markov chains, and it is very useful in bioinformatics, natural language processing, speech recognition and in many other domains. In this video I will go through both the intuition and the mathematics behind it, so make sure to watch this video till the end, and yes, I will also show you how Bayes' theorem helps us in this process. So if you find value in this video, please consider subscribing and hit the bell icon; that will help me a lot. So let's get started.

Well, to understand hidden Markov models you obviously need to know what a Markov chain really is, so I would highly recommend you to watch my previous videos before continuing with this one. Okay, let's start our discussion with our friend Jack, who lives in a hypothetical town. There exist only three kinds of weather in the town: rainy, cloudy and sunny. On any given day only one of them will occur, and the weather tomorrow depends only on today's weather. Yes, we can model this as a simple Markov chain. Let me add the state transitions, and here are the transition probabilities. Also assume that on any given day Jack can have one of the two following moods: sad or happy. Moreover, his mood depends on the weather of that particular day. I'm gonna represent it with red arrows; here are the corresponding probabilities. So according to our diagram there's a 90% chance that Jack is sad given that it's a rainy day. Hopefully it's clear to you too.

Okay, so now our problem is: we don't live in Jack's town, so we can't possibly know what's the weather on a particular day. However, we can contact Jack over the internet and get to know his mood. So the states of the Markov chain are unknown or hidden from us, but we can observe some variables that are dependent on the states. And this, my friend, is called a hidden Markov model. In other words, a hidden Markov model consists of an ordinary Markov chain and a set of observed variables. Please note that Jack's mood, that is the observed variable, depends only on today's weather, not on the previous day's mood. To make things more clear, I am going to write the probabilities into a matrix. You are already familiar with the green matrix, that is the transition matrix, and the red one captures the probabilities corresponding to the observed variables, which is also known as the emission matrix. Let me add the indices to make it more interpretable. This is the correct time to pause and convince yourself that you have understood everything so far.

Okay, let's consider a particular scenario. We will look at three consecutive days. Suppose on the first day it was sunny and Jack was happy, the next day was cloudy and Jack was happy, and on the third day it was sunny again but Jack was sad. Well, I know that we can't observe the hidden states, but just assume that this scenario happened; trust me, analyzing this will help us a lot. Okay, the question I want to ask here is: what is the probability of this scenario occurring? More precisely, what is the joint probability of the observed mood sequence and the weather sequence? Pause here if you would like to try. By using the Markov property we can compute this as a product of six terms: first the probability of a sunny day, then the probability of a happy mood given a sunny day, then the transition probability from sunny to cloudy, and so on. But how did we get these six terms? Please bear with me till the end to understand the underlying mathematics. We can fill the red terms from the emission matrix and the green ones from the transition matrix. But what about the first term? How can we find the probability of the first state? Yes, we need the stationary distribution of the Markov chain for this purpose. You can use the normalized left eigenvectors or repeated matrix multiplication to compute the stationary distribution; I have explained both of them in my previous videos. So as you can see, the sunny state has a probability of 0.549. Now we can compute the product, and the answer turns out to be 0.00391.

Okay, let me hide the states now, so we only have a sequence of the observed variable. Here I'm going to ask a more interesting question: what is the most likely weather sequence for this given mood sequence? Huh, there are many possible permutations, right? We can have rainy-cloudy-sunny, cloudy-rainy-sunny, rainy-cloudy-rainy, and so on. To find the most likely sequence we need to compute the probability corresponding to each of them and find the one with the maximum probability. We can calculate the probability just like we did in the last case. I wrote a Python script to do the computations and found that this weather sequence maximizes the joint probability, and the probability is 0.04105.

If you are still watching this, I believe you would also like to know the formal mathematics behind this, so get ready for some symbols instead of emojis. I will represent the hidden states of our Markov chain by X and the observed variables by Y. We can rewrite our problem like this. This simply means: find that particular sequence of X for which the probability of X given Y is maximum. Please note that in a hidden Markov model we observe the sequence of Y; that's why I have written X given Y. Now, if you notice carefully, you will see there's no direct way to find this probability. Here comes Bayes' theorem. By the way, I have a video explaining Bayes' theorem; you can watch that if you want. So by using Bayes' theorem we can rewrite this as the probability of Y given X times the probability of X, upon the probability of Y. For all practical purposes we can neglect the denominator, and the numerator is just the joint probability distribution of X and Y. Now we are going to further simplify this. Let's take the first part, the probability of Y given X. According to our assumption, Y_i depends only on X_i, so we can write the probability of Y given X as a product; we can fill all these terms from our red matrix. For the second term, the probability of X, we must use the Markov property that says X_i depends only on X_(i-1), so we can convert P(X) into this product. Now there's one subtlety involved: for X_0 we must use the stationary distribution vector, as I already showed you earlier. Okay, so now replace these things into the above expression, and we have got the thing that we want to maximize. You can clearly see that this expression is just a product of 2n terms. I hope now you fully understand the origin of those six terms in the example that I computed earlier. Isn't it super elegant? Well, I guess it is.

I know it's not the easiest thing, so don't worry if you didn't get everything at once; you can always replay the video to remove your confusions. So that was all for this video, guys. Do comment below your thoughts and suggestions, and a subscription will be a major help to me. Goodbye guys, stay safe and thanks for watching.


Related Tags
Hidden Markov Models, Markov Chains, Bayesian Theorem, Bioinformatics, NLP, Speech Recognition, Weather Modeling, Mood Analysis, Probability Theory, Machine Learning, Statistical Modeling