Time Series Talk: Autoregressive Model
Summary
TLDR: This video delves into the Autoregressive (AR) model, a favored method for time series forecasting. It emphasizes the model's strength in predicting future values based on past data, using a milk distributor's monthly demand as an example. The presenter introduces the concept of autoregression, discusses the importance of selecting relevant lags to avoid overfitting, and employs the Partial Autocorrelation Function (PACF) chart to determine the optimal model. The simplicity and intuitive nature of AR models are highlighted, making them accessible for viewers new to time series analysis.
Takeaways
- 📈 The script introduces the AR (autoregressive) model, a time series forecasting model that predicts future values based on past values of the same variable.
- 🔍 The importance of using past values for prediction is emphasized, as it's a natural approach to forecasting, considering the inherent patterns in time series data.
- 🚚 The example of a milk distributor needing to predict monthly milk demand illustrates the practical application of the AR model in a business context.
- 📊 A visual representation of milk demand over time is suggested, highlighting the cyclical pattern that can be leveraged for prediction.
- 📝 The notation M_t for the current month's demand and M_{t-1}, M_{t-2}, etc., for past demands is introduced to formalize the model.
- ❌ The script warns against overfitting by including too many lags in the model, advocating for a simpler model that captures the essential patterns.
- 📉 The concept of the partial autocorrelation function (PACF) is introduced as a tool to determine which lags have a significant direct effect on the current demand.
- 📈 The PACF chart helps in selecting relevant lags for the model by identifying those with significant correlations outside the confidence bands.
- 📝 A potential AR model is outlined, including an intercept, coefficients for selected lags, and an error term, based on the PACF analysis.
- 🔧 The script suggests that the chosen model should be tested and refined, acknowledging that while the basics are covered, further complexities will be discussed in future videos.
- 👍 The presenter expresses a preference for the AR model due to its simplicity and intuitive approach to forecasting based on past values.
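As an illustration of the model outlined in the takeaways, the equation below writes it out with the lags that the transcript's PACF walkthrough keeps (1, 2, 4, and 12). The β coefficients follow the transcript's "beta" naming; the symbol ε_t for the error term is an assumed, conventional choice.

```latex
M_t = \beta_0 + \beta_1 M_{t-1} + \beta_2 M_{t-2} + \beta_4 M_{t-4} + \beta_{12} M_{t-12} + \varepsilon_t
```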
Q & A
What is the main topic of the video?
-The main topic of the video is time series forecasting, specifically focusing on the Autoregressive (AR) model.
What does 'autoregressive' mean in the context of the AR model?
-In the context of the AR model, 'autoregressive' means that the model predicts future values of a variable based on its own past values.
Why is the AR model considered powerful in time series forecasting?
-The AR model is considered powerful because it leverages the natural pattern of a variable's past values to predict its future values, which can lead to stronger predictions if a pattern emerges.
What is the example scenario used in the video to illustrate the AR model?
-The example scenario is that of a milk distributor who wants to predict the monthly demand for milk to avoid overproduction or undersupply.
What is the significance of plotting the quantity of milk demanded over time?
-Plotting the quantity of milk demanded over time helps to visualize patterns and trends that can be used to make predictions about future demand.
What notation is introduced to represent the quantity of milk demanded in the video?
-The notation introduced is M_t for the quantity of milk demanded in the current month, and M_{t-n} for the quantity demanded n months ago.
Why might including all lags from 1 through 12 in the model be problematic?
-Including all lags from 1 through 12 might lead to overfitting, where the model is too closely tuned to the specific data and may not generalize well over time.
What is the role of the partial autocorrelation function (PACF) in selecting lags for the AR model?
-The PACF helps in determining which lags have a significant direct correlation with the current period's milk demand, excluding the effects of intermediate periods, thus guiding the selection of important lags for the model.
How does the video suggest determining the best AR model for the milk demand forecasting scenario?
-The video suggests using the PACF plot to identify lags with significant direct correlations and then constructing an AR model that includes those lags.
What is the importance of preferring a simpler model when possible in regression modeling?
-A simpler model is preferred when it can perform as well as a more complex model because it is likely to be more robust and hold up better over time, avoiding issues like overfitting.
What does the video suggest as the next steps after constructing the AR model?
-The video suggests that after constructing the AR model based on the PACF plot, the next steps would involve testing the model and considering other factors that might influence milk demand in future videos.
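One common way to make the "direct correlation" idea from the answers above precise is sketched here: the PACF at lag k is the coefficient on the lag-k term in a regression that also includes every shorter lag. The symbols φ, e_t, and n (the number of observations), and the 1.96/√n cutoff (an approximate 95% band when no correlation remains), are assumptions; the video only refers to red bands on the chart.

```latex
M_t = \phi_{k,0} + \phi_{k,1} M_{t-1} + \dots + \phi_{k,k} M_{t-k} + e_t,
\qquad \mathrm{PACF}(k) = \phi_{k,k}

\text{keep lag } k \text{ if } \left| \hat{\phi}_{k,k} \right| > \frac{1.96}{\sqrt{n}}
```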
Outlines
📈 Introduction to AR Model in Time Series Forecasting
This paragraph introduces the concept of the Autoregressive (AR) model in time series forecasting. The speaker emphasizes the natural inclination to predict future values of a variable based on its past values, using the example of a milk distributor needing to predict monthly milk demand. The paragraph sets the stage for a detailed exploration of the AR model, highlighting its simplicity and effectiveness in capturing patterns in time series data.
📊 Utilizing Partial Autocorrelation Function (PACF) for Model Selection
The second paragraph delves into the practical application of the Partial Autocorrelation Function (PACF) in selecting the appropriate lags for an AR model. The speaker explains how PACF helps identify the direct correlation of past values on the current value, excluding the effects of intermediate lags. The example of milk demand forecasting continues, illustrating how to determine which lags have a significant impact on current demand. The paragraph concludes with a basic model structure based on the selected lags, showcasing the simplicity and intuitive nature of the AR model for time series prediction.
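The lag selection described in this outline can be sketched in Python. This is a minimal, standard-library-only illustration, not the presenter's code: it estimates the PACF with the Durbin-Levinson recursion and keeps the lags whose values fall outside an approximate 95% band of ±1.96/√n. The function names, the band formula, and the synthetic series (a simulated process with one true lag, standing in for real demand data) are all assumptions.

```python
import math
import random


def autocorr(x, max_lag):
    """Sample autocorrelations r[0..max_lag] of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson recursion)."""
    r = autocorr(x, max_lag)
    out = []
    prev = []  # coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            cur = [phi_kk]
        else:
            num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        out.append(phi_kk)
        prev = cur
    return out


def significant_lags(x, max_lag):
    """Lags whose PACF lies outside the approximate 95% band (assumed cutoff)."""
    band = 1.96 / math.sqrt(len(x))
    return [k for k, v in enumerate(pacf(x, max_lag), start=1) if abs(v) > band]


# Hypothetical data: a simulated series where only lag 1 has a direct effect.
random.seed(0)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + random.gauss(0, 1))

partials = pacf(series, 12)
chosen = significant_lags(series, 12)
print("PACF at lags 1-12:", [round(v, 3) for v in partials])
print("Lags outside the band:", chosen)
```

On real data, a library routine such as `pacf` in `statsmodels.tsa.stattools` would normally replace the hand-written recursion. Because the band is a 95% band, an occasional unimportant lag can still land outside it by chance.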
Keywords
💡Time Series Forecasting
💡Autoregressive Model (AR Model)
💡Regression
💡Lag
💡Overfitting
💡Partial Autocorrelation Function (PACF)
💡Statistical Significance
💡Intercept
💡Coefficient
💡Error Term
💡Model Simplicity
Highlights
Introduction to the AR model (Autoregressive model) for time series forecasting.
Explanation of 'autoregressive' as predicting future values based on past values of the same variable.
The importance of using time series forecasting in various applications such as predicting item prices or quantities.
The natural inclination to predict current values based on historical data.
The potential for patterns to emerge in time series data that can be used for stronger predictions.
The example of a milk distributor needing to predict monthly milk demand.
Visualizing time series data with a plot to identify patterns.
The concept of overfitting and the preference for simpler models in regression analysis.
Introduction of the PACF (Partial Autocorrelation Function) for model selection.
The process of selecting significant lags for the AR model based on PACF values.
The exclusion of lags with no direct correlation from the model to avoid unnecessary complexity.
The use of statistical significance (red bands) to determine which lags to include in the model.
Constructing an AR model with selected lags and coefficients based on PACF analysis.
The simplicity and intuitive nature of the AR model, making it a preferred choice for time series forecasting.
The practical application of the AR model in predicting milk demand based on past data.
The potential for future videos to delve deeper into other factors affecting time series forecasting.
Transcripts
in this video we're going to be
continuing our exploration into
time series forecasting and we'll be
talking about one of my all-time
favorite models the AR model or the auto
regressive model let's just talk about
the name for a second before we get into
this really easy example auto regressive
so that means that it's a regression
that you're probably familiar with right
you're trying to predict something based
on other things but this is a specific
type of regression it's an auto
regression which means you're trying to
predict something based on past values
of that same thing and that's a really
powerful point that I think doesn't get
emphasized enough in
time series forecasting videos or courses
is that it's very natural to want to
predict something maybe it's the price
of some kind of item or it's the
quantity of something you need or it's
the number of houses sold per month
whatever it is of course there's a lot
of factors going into each thing such as
the weather or the stock market or many
other different things but what's more
natural than saying I want to predict
the value of that thing today based on
what the value of that thing was
yesterday based on what the value of
that thing was last week last month last
year going back right because that
thing's gonna change in maybe some
particular way maybe it's not been
predictable at all but chances are that
there could be some pattern that emerges
and if we can capture that pattern we
can get a much stronger prediction
especially if we incorporate all those
more common things that people think of
when you do a regression all these other
factors okay so I wanted to just give
you guys a really really gentle
introduction into why auto regression is
a very powerful concept now let's get
into the example and how you would
figure out what is the best auto
regressive model for your situation
so in this setup you are a milk salesman
more particularly you are a distributor
of milk you ship milk all over the
country and one really big problem for
you is month by month you want to know
how much milk should I produce so that I
can have the exact amount or pretty
much the right amount to ship to
everyone who needs it I don't want to
have too much right because then I have
milk which is going to spoil I don't
want to have too little because then I
can't
fulfill all my orders so you want to
know exactly how much milk should I load
onto the truck this month so let's say
you go ahead and see if you can use
time series forecasting or an auto
regressive model maybe for this kind of
situation so the first thing you do is
you go ahead and draw up a plot where the
y-axis is the quantity of milk that is
shipped and the x-axis is time so here
we're saying each of these blocks
separated by the purple dotted lines are
years so here's 2016 2017 and 2018 and
you make a chart of how much milk was
demanded in each of those months so each
of these black dots here is a month maybe
let's say and you draw it out you can
already see a pretty clear pattern here
right as you go into the month into the
year the quantity of milk demanded goes
up up up and a little bit more halfway
past a given year then it dips right and
then maybe it plateaus and then at the
beginning of the next year it starts all
over again up and then down up and then
down so this is a very predictable
pattern that you can take advantage of
to predict exactly how much milk you
might need for any given month in the
future in 2019 and beyond now how would
we figure that out how would we figure
out let me introduce some notation here
so we can write a model in just a second
let's say M sub T is the quantity of
milk that is demanded this month let's
say M sub t minus 1 is the quantity of
milk that was demanded last month so
minus 1 and M sub t minus 12 for example
is the quantity of milk that was
demanded 12 months ago or this time last
year okay so this is our notation for
quantity of milk demanded of course the
thing I'm trying to predict is M sub T
because I'm in my current time period
and the thing I have available to
predict with are all these M sub t minus
1 minus 2 minus 12 however much I want
however much data I actually have right
so one naive approach you could say hey
why don't I just throw every single lag
from 1 through 12 maybe into the model
then I'll have a great prediction model
right because I'm incorporating all the
data that I have well you might get a
seemingly strong model but it's gonna be
prone to a lot of statistical issues
like overfitting which just means that
it's too tuned to your certain data
and besides in statistics in regression
modeling if a simpler model can do the
job or pretty much same job as a very
complicated model we're going to prefer
that simple model because it's going to
hold up better over time so for that
reason we want to figure out only which
lags only which of these t minus whatever are
important for our situation we're going
to be using our good friend the PACF
chart or partial autocorrelation
function so if you haven't seen my video
on autocorrelation and partial
autocorrelation go ahead and watch that
if you really don't want to watch it
then the basics of PACF are that the
PACF at a given lag so for example PACF
of lag 1 is going to be the direct
correlation actually maybe better to say
the PACF of 3 it's going to be the
direct correlation of the quantity of
milk demanded three months ago on the
quantity of milk today without
considering so removing the effects of
the intermediary time periods which are
so we're trying to do M sub t minus 3's direct
effect on M sub t that means it removes
the effect of M sub t minus 2 quantity of
milk demanded two months ago and M
sub t minus 1 quantity of milk just
last month
it's the direct effect so it's pretty
natural here we only want to keep the
lags whose direct effects are high in
magnitude either positive or negative
if those direct effects are zero or
statistically very close to zero we
don't want to include those lags
because if some certain lag has no
direct correlation with our quantity of
milk demanded today why would we include
it it's not important it's just going to
make our model noisy and cluttered right
so we only want to include the lags
whose PACF are above these red bands
and these red bands basically you can
think of them as anything within the red
bands we think is statistically
close to zero anything outside the red
bands are statistically different than
zero so we have evidence to say that
anything outside the red bands is actually
different from zero so let's just go
through our chart and see lag one
definitely is statistically different
than zero in a positive direction lag two
statistically different from zero in the
negative direction
lag three does not cut it because it's
below the top error band lag four does cut
it statistically different from zero in
the negative direction and let's say all
these lags in between do not cut it but
the lag at twelve or one year ago
twelve months ago does cut it and it's
very strong okay and let's just say that
all the lags after twelve are
statistically close to zero they don't cut
it so we're only concerned with these
four that do cut it okay so what might a
good model look like a good model might
look like
of course we first start out with the
thing we're trying to predict which is M
sub t we have a coefficient here
beta naught the intercept and then we
have beta 1 and of course the first
lag is M sub t minus 1 plus beta 2 M
sub t minus 2 then 3 didn't cut it so we
have 4 plus beta 4 M sub t minus 4 and
then we had one more beta 12 M sub t
minus 12 and we need to include that
error term so let me box this model in a
different color
purple here so this based on our
evidence could be a good model to help
us predict the quantity of milk demanded
today based on the quantity of milk
demanded a month ago two months ago four
months ago and 12 months ago okay and we
deduced that based on the PACF plot
which again is just measuring the direct
correlation of the price of milk some
number of lags ago on the price I'm
sorry quantity of milk some months ago
on the quantity of milk today that is
the basics of an AR model and the reason
I like it so much is just its
simplicity its simplicity starting from
the concept of it predicting something
based on past values of that thing to
figuring out a model based on this PACF
which is very intuitive to think about
going from there to actually creating
your model and testing okay this was a
very gentle introduction to AR models
of course there's many other factors
going into this but we will save those
for a future video okay so until next
time
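To make the final model from the transcript concrete, here is a hedged Python sketch that fits an AR model on lags 1, 2, 4, and 12 by ordinary least squares and produces a one-step forecast. It uses only the standard library. The function names, the simulated demand series, and the "true" coefficients used to generate it are hypothetical; the video gives no data or fitted values.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, lags):
    """Least-squares fit of M_t = b0 + sum of b_k * M_{t-k} over the given lags.

    Returns [b0, b_lag1, b_lag2, ...] in the same order as `lags`.
    """
    p = max(lags)
    rows = [[1.0] + [series[t - k] for k in lags] for t in range(p, len(series))]
    y = series[p:]
    dim = len(lags) + 1
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(dim)]
    return solve(xtx, xty)


def forecast_next(series, lags, coefs):
    """One-step-ahead prediction from the most recent observations."""
    return coefs[0] + sum(c * series[-k] for c, k in zip(coefs[1:], lags))


# Hypothetical monthly demand, simulated from assumed coefficients.
random.seed(1)
true_coefs = {1: 0.3, 2: -0.2, 4: -0.1, 12: 0.3}  # illustrative values only
demand = [14.0] * 12
for _ in range(4000):
    nxt = 10.0 + sum(c * demand[-k] for k, c in true_coefs.items())
    demand.append(nxt + random.gauss(0, 1))

lags = [1, 2, 4, 12]
coefs = fit_ar(demand, lags)
print("intercept and lag coefficients:", [round(c, 3) for c in coefs])
print("next-month forecast:", round(forecast_next(demand, lags, coefs), 2))
```

With real monthly demand in place of the simulated list, `forecast_next(demand, lags, coefs)` gives the predicted demand for the coming month. Libraries such as statsmodels offer `AutoReg` for the same purpose; the hand-rolled solver here is only meant to show what the fit involves.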