Time Series Talk : Autoregressive Model

ritvikmath
11 Apr 2019 · 08:54

Summary

TLDR: This video delves into the Autoregressive (AR) model, a favored method for time series forecasting. It emphasizes the model's strength in predicting future values based on past data, using a milk distributor's monthly demand as an example. The presenter introduces the concept of autoregression, discusses the importance of selecting relevant lags to avoid overfitting, and uses the Partial Autocorrelation Function (PACF) chart to choose which lags to include in the model. The simplicity and intuitive nature of AR models are highlighted, making them accessible for viewers new to time series analysis.

Takeaways

  • 📈 The script introduces the AR (Auto-Regressive) model, a time series forecasting model that predicts future values based on past values of the same variable.
  • 🔍 The importance of using past values for prediction is emphasized, as it's a natural approach to forecasting, considering the inherent patterns in time series data.
  • 🚚 The example of a milk distributor needing to predict monthly milk demand illustrates the practical application of the AR model in a business context.
  • 📊 A visual representation of milk demand over time is suggested, highlighting the cyclical pattern that can be leveraged for prediction.
  • 📝 The notation M_t for the current month's demand and M_{t-1}, M_{t-2}, etc., for past months' demand is introduced to formalize the model.
  • ❌ The script warns against overfitting by including too many lags in the model, advocating for a simpler model that captures the essential patterns.
  • 📉 The concept of the partial autocorrelation function (PACF) is introduced as a tool to determine which lags have a significant direct effect on the current demand.
  • 📈 The PACF chart helps in selecting relevant lags for the model by identifying those whose partial autocorrelations fall outside the confidence bands.
  • 📝 A potential AR model is outlined, including an intercept, coefficients for selected lags, and an error term, based on the PACF analysis.
  • 🔧 The script suggests that the chosen model should be tested and refined, acknowledging that while the basics are covered, further complexities will be discussed in future videos.
  • 👍 The presenter expresses a preference for the AR model due to its simplicity and intuitive approach to forecasting based on past values.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is time series forecasting, specifically focusing on the Autoregressive (AR) model.

  • What does 'autoregressive' mean in the context of the AR model?

    -In the context of the AR model, 'autoregressive' means that the model predicts future values of a variable based on its own past values.

  • Why is the AR model considered powerful in time series forecasting?

    -The AR model is considered powerful because it leverages the natural pattern of a variable's past values to predict its future values, which can lead to stronger predictions if a pattern emerges.

  • What is the example scenario used in the video to illustrate the AR model?

    -The example scenario is that of a milk distributor who wants to predict the monthly demand for milk to avoid overproduction or undersupply.

  • What is the significance of plotting the quantity of milk demanded over time?

    -Plotting the quantity of milk demanded over time helps to visualize patterns and trends that can be used to make predictions about future demand.

  • What notation is introduced to represent the quantity of milk demanded in the video?

    -The notation introduced is M_t for the quantity of milk demanded in the current month, and M_{t-n} for the quantity demanded n months ago.

  • Why might including all lags from 1 through 12 in the model be problematic?

    -Including all lags from 1 through 12 might lead to overfitting, where the model is too closely tuned to the specific data and may not generalize well over time.

  • What is the role of the partial autocorrelation function (PACF) in selecting lags for the AR model?

    -The PACF helps in determining which lags have a significant direct correlation with the current period's milk demand, excluding the effects of intermediate periods, thus guiding the selection of important lags for the model.

  • How does the video suggest determining the best AR model for the milk demand forecasting scenario?

    -The video suggests using the PACF plot to identify lags with significant direct correlations and then constructing an AR model that includes those lags.

  • What is the importance of preferring a simpler model when possible in regression modeling?

    -A simpler model is preferred when it can perform as well as a more complex model because it is likely to be more robust and hold up better over time, avoiding issues like overfitting.

  • What does the video suggest as the next steps after constructing the AR model?

    -The video suggests that after constructing the AR model based on the PACF plot, the next steps would involve testing the model and considering other factors that might influence milk demand in future videos.
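As a rough illustration of the model-construction step discussed in the answers above, the following Python sketch fits an AR model restricted to a chosen set of lags using ordinary least squares. The function names `solve` and `fit_ar`, the simulated series, and the coefficient values are assumptions made for this example; the video does not say how the coefficients are estimated.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(series, lags):
    """Least-squares fit of m_t = b0 + sum over k in lags of b_k * m_{t-k}.

    Returns (intercept, {lag: coefficient}).
    """
    start = max(lags)  # first index with enough history for every lag
    rows = [[1.0] + [series[t - k] for k in lags] for t in range(start, len(series))]
    targets = [series[t] for t in range(start, len(series))]
    p = len(lags) + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(p)]
    beta = solve(xtx, xty)
    return beta[0], dict(zip(lags, beta[1:]))


# Simulated demand (not real data): depends on last month and on 12 months ago.
random.seed(1)
demand = [25.0] * 12
for _ in range(5000):
    demand.append(5 + 0.3 * demand[-1] + 0.5 * demand[-12] + random.gauss(0, 1))

intercept, coefficients = fit_ar(demand, [1, 12])
print(intercept, coefficients)
```

With enough data the estimated coefficients should land near the values used in the simulation (0.3 and 0.5). In practice a statistics library would normally do this fitting; the hand-rolled solver is only there to keep the sketch free of dependencies.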

Outlines

00:00

📈 Introduction to AR Model in Time Series Forecasting

This paragraph introduces the concept of the Autoregressive (AR) model in time series forecasting. The speaker emphasizes the natural inclination to predict future values of a variable based on its past values, using the example of a milk distributor needing to predict monthly milk demand. The paragraph sets the stage for a detailed exploration of the AR model, highlighting its simplicity and effectiveness in capturing patterns in time series data.

05:02

📊 Utilizing Partial Autocorrelation Function (PACF) for Model Selection

The second paragraph delves into the practical application of the Partial Autocorrelation Function (PACF) in selecting the appropriate lags for an AR model. The speaker explains how PACF helps identify the direct correlation of past values on the current value, excluding the effects of intermediate lags. The example of milk demand forecasting continues, illustrating how to determine which lags have a significant impact on current demand. The paragraph concludes with a basic model structure based on the selected lags, showcasing the simplicity and intuitive nature of the AR model for time series prediction.
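Written out, the model structure this section refers to, using the four lags the video selects (1, 2, 4, and 12), might look like the equation below. The symbol \varepsilon_t for the error term is an assumed notation; the video only calls it "the error term".

```latex
M_t = \beta_0 + \beta_1 M_{t-1} + \beta_2 M_{t-2} + \beta_4 M_{t-4} + \beta_{12} M_{t-12} + \varepsilon_t
```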

Keywords

💡Time Series Forecasting

Time series forecasting is a method used to predict future values based on previously observed values. It is central to the video's theme, as the presenter discusses using it to predict the demand for milk. The script uses this concept to explain how one might forecast the quantity of milk needed for the next month based on historical data.

💡Autoregressive Model (AR Model)

The autoregressive model, or AR model, is a statistical approach that uses past values of a variable to predict future values. It is a key concept in the video, where the presenter explains its application in forecasting milk demand. The script illustrates this by discussing how the model can incorporate past milk demand data to make predictions.

💡Regression

Regression is a statistical process to estimate the relationships among variables. In the context of the video, it is used to predict future values of a time series based on its past values. The script emphasizes the autoregressive aspect of regression, highlighting its use in predicting milk demand.

💡Lag

In time series analysis, a lag is the number of time steps separating an observation from an earlier observation of the same variable; the earlier observation is called a lagged value. The script discusses using lags in the AR model to predict current milk demand based on past demands, such as 'M sub t minus 1' for last month's demand.
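A small Python sketch can make the idea of a lag concrete. It pairs each observation with the value recorded a given number of steps earlier; the function name `lagged_pairs` and the demand figures are hypothetical and do not come from the video.

```python
def lagged_pairs(series, lag):
    """Pair each value with the value observed `lag` steps earlier.

    Returns a list of (m_{t-lag}, m_t) tuples, one for every time step t
    that has at least `lag` earlier observations.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [(series[t - lag], series[t]) for t in range(lag, len(series))]


# Hypothetical monthly demand figures, for illustration only.
demand = [100, 110, 125, 140, 130, 120]

print(lagged_pairs(demand, 1))  # (last month, this month)
print(lagged_pairs(demand, 2))  # (two months ago, this month)
```

Each pair is one training example for a regression of the current value on that lagged value, which is why longer lags leave fewer usable rows.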

💡Overfitting

Overfitting occurs when a statistical model is too complex and captures noise in the data rather than the underlying pattern. The video warns against overfitting by suggesting not to include all lags in the model but only those that are statistically significant, as per the PACF chart.

💡Partial Autocorrelation Function (PACF)

The partial autocorrelation function is a statistical tool used to measure the direct correlation between an observation in a time series and an observation at a prior time, excluding the effects of intermediate observations. The script uses the PACF chart to determine which lags to include in the AR model for milk demand forecasting.
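For readers who want to see how such values can be computed, here is a minimal Python sketch of the PACF using the Durbin-Levinson recursion on sample autocorrelations. This is one standard way to estimate partial autocorrelations, not necessarily the method behind the chart in the video; the function names and the simulated series are assumptions.

```python
import random


def autocorrelations(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations for lags 1 .. max_lag (Durbin-Levinson)."""
    r = autocorrelations(series, max_lag)
    result = []
    prev = []  # prev[j] holds phi_{k-1, j+1} from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        result.append(phi_kk)
        prev = current
    return result


# Simulated series (not real data) in which only lag 1 has a direct effect.
random.seed(0)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + random.gauss(0, 1))

for lag, value in enumerate(pacf(series, 5), start=1):
    print(lag, round(value, 3))
```

In the simulated series, lag 2 is correlated with the present only through lag 1, so its partial autocorrelation should come out close to zero even though its ordinary autocorrelation does not.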

💡Statistical Significance

Statistical significance refers to the likelihood that a result is not due to random chance. In the video, the presenter uses the PACF chart to identify lags with statistical significance, indicating a direct and meaningful correlation with current milk demand, which should be included in the model.
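The selection rule can be sketched in a few lines of Python. The sketch assumes the bands are drawn at the common large-sample approximation of plus or minus 1.96 divided by the square root of the number of observations; the video does not say how its red bands are computed. The PACF values below are hypothetical numbers chosen to mimic the chart described in the video, and `significant_lags` is an illustrative name.

```python
import math


def significant_lags(pacf_by_lag, n_obs, z=1.96):
    """Return the lags whose PACF lies outside the band +/- z / sqrt(n_obs)."""
    band = z / math.sqrt(n_obs)
    return sorted(lag for lag, value in pacf_by_lag.items() if abs(value) > band)


# Hypothetical PACF values; the video does not state the actual numbers.
example_pacf = {
    1: 0.60, 2: -0.45, 3: 0.20, 4: -0.40, 5: 0.05, 6: -0.10,
    7: 0.08, 8: 0.02, 9: -0.06, 10: 0.11, 11: 0.15, 12: 0.70,
}

# 36 observations corresponds to three years of monthly data (2016 to 2018).
print(significant_lags(example_pacf, n_obs=36))  # → [1, 2, 4, 12]
```

With 36 observations the band sits at roughly 0.33, so the hypothetical lag 3 value of 0.20 is treated as statistically indistinguishable from zero and left out.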

💡Intercept

In a regression model, the intercept is the point where the line crosses the y-axis. The script mentions the intercept in the context of the AR model, where it represents the expected value of milk demand when all other variables (lags) are zero.

💡Coefficient

A coefficient in a regression model represents the factor by which the value of an independent variable contributes to the dependent variable. The script discusses coefficients (beta values) associated with each lag in the AR model, which quantify the influence of past milk demands on the current demand.

💡Error Term

The error term in a regression model accounts for the variation in the dependent variable that cannot be explained by the model. The script includes an error term in the AR model to represent the unpredicted fluctuations in milk demand.
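To show how the intercept, the coefficients, and the error term fit together, here is a minimal Python sketch of a one-step-ahead forecast. The forecast replaces the unknown error term with its expected value of zero. The function name `ar_forecast`, the coefficient values, and the demand history are all hypothetical.

```python
def ar_forecast(history, intercept, coefficients):
    """One-step-ahead AR forecast with the error term set to zero.

    history      -- past values, oldest first; history[-1] is last month
    intercept    -- beta_0
    coefficients -- dict mapping lag k to beta_k
    """
    longest = max(coefficients)
    if len(history) < longest:
        raise ValueError("need at least %d past values" % longest)
    # history[-lag] is the value observed `lag` months before the forecast month.
    return intercept + sum(
        beta * history[-lag] for lag, beta in coefficients.items()
    )


# Hypothetical demand for the previous 12 months, oldest first.
history = [80, 85, 95, 110, 120, 130, 125, 115, 105, 100, 100, 98]

# Hypothetical coefficients for the lags selected in the video (1, 2, 4, 12).
betas = {1: 0.5, 2: -0.2, 4: -0.1, 12: 0.7}

print(ar_forecast(history, intercept=10.0, coefficients=betas))
```

The difference between this forecast and the demand that actually materializes is the realized error for that month.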

💡Model Simplicity

Model simplicity refers to the preference for a model with fewer parameters that performs as well as a more complex model. The video emphasizes the importance of simplicity in the AR model, suggesting that simpler models are more robust and less prone to overfitting, as they are easier to interpret and maintain.

Highlights

Introduction to the AR model (Autoregressive model) for time series forecasting.

Explanation of 'auto regressive' as predicting future values based on past values of the same variable.

The importance of using time series forecasting in various applications such as predicting item prices or quantities.

The natural inclination to predict current values based on historical data.

The potential for patterns to emerge in time series data that can be used for stronger predictions.

The example of a milk distributor needing to predict monthly milk demand.

Visualizing time series data with a plot to identify patterns.

The concept of overfitting and the preference for simpler models in regression analysis.

Introduction of the PACF (Partial Autocorrelation Function) for model selection.

The process of selecting significant lags for the AR model based on PACF values.

The exclusion of lags with no direct correlation from the model to avoid unnecessary complexity.

The use of statistical significance (red bands) to determine which lags to include in the model.

Constructing an AR model with selected lags and coefficients based on PACF analysis.

The simplicity and intuitive nature of the AR model, making it a preferred choice for time series forecasting.

The practical application of the AR model in predicting milk demand based on past data.

The potential for future videos to delve deeper into other factors affecting time series forecasting.

Transcripts

play00:00

in this video we're going to be

play00:01

continuing our exploration into

play00:03

time series forecasting and we'll be

play00:06

talking about one of my all-time

play00:07

favorite models the AR model or the auto

play00:10

regressive model let's just talk about

play00:12

the name for a second before we get into

play00:14

this really easy example auto regressive

play00:17

so that means that it's a regression

play00:18

that you're probably familiar with right

play00:20

you're trying to predict something based

play00:22

on other things but this is a specific

play00:23

type of regression it's an auto

play00:26

regression which means you're trying to

play00:28

predict something based on past values

play00:31

of that same thing and that's a really

play00:34

powerful point that I think doesn't get

play00:35

emphasized enough in

play00:37

time series forecasting videos or courses

play00:40

is that it's very natural to want to

play00:43

predict something maybe it's the price

play00:45

of some kind of item or it's the

play00:47

quantity of something you need or it's

play00:49

the number of houses sold per month

play00:51

whatever it is of course there's a lot

play00:53

of factors going into each thing such as

play00:54

the weather or the stock market or many

play00:58

other different things but what's more

play00:59

natural than saying I want to predict

play01:01

the value of that thing today based on

play01:04

what the value of that thing was

play01:06

yesterday based on what the value of

play01:08

that thing was last week last month last

play01:09

year going back right because that

play01:12

thing's gonna change in maybe some

play01:14

particular way maybe it's not going to be

play01:16

predictable at all but chances are that

play01:18

there could be some pattern that emerges

play01:20

and if we can capture that pattern we

play01:23

can get a much stronger prediction

play01:24

especially if we incorporate all those

play01:26

more common things that people think of

play01:28

when you do a regression all these other

play01:30

factors okay so I wanted to just give

play01:32

you guys a really really gentle

play01:34

introduction into why auto regression is

play01:36

a very powerful concept now let's get

play01:39

into the example and how you would

play01:41

figure out what is the best auto

play01:43

regressive model for your situation

play01:45

so in this setup you are a milk salesman

play01:48

more particularly you are a distributor

play01:52

of milk you ship milk all over the

play01:54

country and one really big problem for

play01:56

you is month by month you want to know

play01:58

how much milk should I produce so that I

play02:02

can have the exact amount or pretty

play02:04

much the right amount to ship to

play02:06

everyone who needs it I don't want to

play02:08

have too much right because then I have

play02:10

milk which is going to spoil I don't

play02:12

want to have too little because then I

play02:13

can't

play02:13

fulfill all my orders so you want to

play02:14

know exactly how much milk should I load

play02:16

onto the truck this month so let's say

play02:20

you go ahead and see if you can use

play02:21

time series forecasting or an auto

play02:23

regressive model maybe for this kind of

play02:26

situation so the first thing you do is

play02:28

you go ahead and draw a plot where the

play02:29

y-axis is the quantity of milk that is

play02:32

shipped and the x-axis is time so here

play02:35

we're saying each of these blocks

play02:36

separated by the purple dotted lines are

play02:38

years so here's 2016 2017 and 2018 and

play02:41

you make a chart of how much milk was

play02:44

demanded in each of those months so each

play02:46

of these black dots is a month maybe

play02:48

let's say and you draw it out you can

play02:51

already see a pretty clear pattern here

play02:53

right as you go into the month into the

play02:55

year the quantity of milk demanded goes

play02:57

up up up and a little bit more halfway

play03:00

past a given year then it dips right and

play03:02

then maybe it plateaus and then at the

play03:04

beginning of the next year it starts all

play03:06

over again up and then down up and then

play03:07

down so this is a very predictable

play03:10

pattern that you can take advantage of

play03:12

to predict exactly how much milk you

play03:13

might need for any given month in the

play03:15

future in 2019 and beyond now how would

play03:18

we figure that out how would we figure

play03:20

out let me introduce some notation here

play03:22

so we can write a model in just a second

play03:24

let's say M sub T is the quantity of

play03:28

milk that is demanded this month let's

play03:31

say M sub t minus 1 is the quantity of

play03:33

milk that was demanded last month so

play03:36

minus 1 and M sub t minus 12 for example

play03:39

is the quantity of milk that was

play03:41

demanded 12 months ago or this time last

play03:43

year okay so this is our notation for

play03:47

quantity of milk demanded of course the

play03:49

thing I'm trying to predict is M sub T

play03:51

because I'm in my current time period

play03:52

and the thing I have available to

play03:55

predict with are all these M sub t minus

play03:58

1 minus 2 minus 12 however much I want

play04:00

however much data I actually have right

play04:03

so one naive approach you could say hey

play04:05

why don't I just throw every single lag

play04:07

from 1 through 12 maybe into the model

play04:10

then I'll have a great prediction model

play04:12

right because I'm incorporating all the

play04:13

data that I have well you might get a

play04:16

seemingly strong model but it's gonna be

play04:19

prone to a lot of statistical issues

play04:20

like overfitting which just means that

play04:23

it's too too tuned to your certain data

play04:26

and besides in statistics in regression

play04:29

modeling if a simpler model can do the

play04:31

job or pretty much same job as a very

play04:33

complicated model we're going to prefer

play04:35

that simple model because it's going to

play04:37

hold up better over time so for that

play04:39

reason we want to figure out only which

play04:41

lags only which of these t minus whatever are

play04:45

important for our situation we're going

play04:48

to be using our good friend the PACF

play04:51

chart or partial autocorrelation

play04:52

function so if you haven't seen my video

play04:55

on autocorrelation and partial

play04:57

autocorrelation go ahead and watch that

play04:59

if you really don't want to watch it

play05:01

then the basics of PACF are that the

play05:05

PACF at a given lag so for example PACF

play05:07

of lag 1 is going to be the direct

play05:10

correlation actually maybe better to say

play05:13

the PACF of 3 it's going to be the

play05:15

direct correlation of the quantity of

play05:18

milk demanded three months ago on the

play05:21

quantity of milk today without

play05:22

considering so removing the effects of

play05:25

the intermediary time periods which are

play05:27

so we're trying to do M sub t minus 3's direct

play05:30

effect on M sub t that means it removes

play05:33

the effect of M sub t minus 2 quantity of

play05:36

milk demanded two months ago and M

play05:38

sub t minus 1 quantity of milk just

play05:40

last month

play05:41

it's the direct effect so it's pretty

play05:44

natural here we only want to keep the

play05:46

lags whose direct effects are high in

play05:49

magnitude either positive or negative

play05:51

if those direct effects are zero or

play05:53

statistically very close to zero we

play05:56

don't want to include those lags

play05:57

because if some certain lag has no

play05:59

direct correlation with our quantity of

play06:02

milk demanded today why would we include

play06:04

it it's not important it's just going to

play06:05

make our model noisy and cluttered right

play06:07

so we only want to include the lags

play06:11

whose PACF are above these red bands

play06:14

and these red bands basically you can

play06:16

think of them as anything within the red

play06:18

bands we think is statistically

play06:20

close to zero anything outside the red

play06:23

bands are statistically different than

play06:25

zero so we have evidence to say that

play06:27

anything outside the red bands is actually

play06:29

different from zero so let's just go

play06:31

through our chart and see lag one

play06:32

definitely is statistically different

play06:34

than zero in a positive direction lag 2

play06:37

statistically different from zero in the

play06:38

negative direction

play06:39

lag three does not cut it because it's

play06:42

below the top error band lag four does cut

play06:46

it statistically different from zero in

play06:48

the negative direction and let's say all

play06:50

these lags in between do not cut it but

play06:52

the lag at twelve or one year ago

play06:55

twelve months ago does cut it and it's

play06:57

very strong okay and let's just say that

play06:59

all the lags after twelve are

play07:01

statistically close to zero they don't cut

play07:04

it so we're only concerned with these

play07:06

four that do cut it okay so what might a

play07:10

good model look like a good model might

play07:12

look like

play07:12

of course we first start out with the

play07:14

thing we're trying to predict which is M

play07:16

sub t we have a coefficient here beta

play07:21

naught the intercept and then we

play07:23

have beta one and of course the first

play07:26

lag is M sub t minus 1 plus beta 2 M

play07:31

sub t minus 2 then 3 didn't cut it so we

play07:34

have 4 plus beta 4 M sub t minus 4 and

play07:39

then we had one more beta 12 M sub t

play07:43

minus 12 and we need to include that

play07:45

error term so let me box this model in a

play07:48

different color

play07:49

purple here so this based on our

play07:53

evidence could be a good model to help

play07:56

us predict the quantity of milk demanded

play07:58

today based on the quantity of milk

play08:00

demanded a month ago two months ago four

play08:02

months ago and 12 months ago okay and we

play08:05

deduced that based on the PA CF plot

play08:07

which again is just measuring the direct

play08:10

correlation the price of milk some

play08:12

number of lags ago on the price I'm

play08:14

sorry quantity of milk some months ago

play08:16

on the quantity of milk today that is

play08:19

the basics of an AR model and the reason

play08:21

I like it so much is just its

play08:23

simplicity its simplicity starting from

play08:26

the concept of it predicting something

play08:28

based on past values of that thing to

play08:30

figuring out a model based on this PACF

play08:34

which is very intuitive to think about

play08:36

going from there to actually creating

play08:38

your model and testing okay this was a

play08:41

very gentle introduction to AR models

play08:44

of course there's many other factors

play08:46

going into this but we will save those

play08:48

for in a future video okay so until next

play08:51

time
