Probability Distributions

IIT Madras - B.S. Degree Programme
23 Dec 2021 | 22:23

Summary

TLDR: This business analytics lecture focuses on probability distributions, emphasizing fitting distributions to data. It explains discrete and continuous distributions, contrasting probability mass with density functions. The session explores three data analysis approaches: trace-driven simulation, theoretical distribution fitting, and empirical distribution creation. The advantages of theoretical distributions over empirical ones are discussed, noting the limitations of relying solely on observed data for predictive modeling.

Takeaways

  • 📊 Probability distributions are statistical models that show possible outcomes for a given event or action.
  • 📈 For discrete variables, distributions are represented by possible values with corresponding probabilities; for continuous variables, by a density function.
  • 📚 The focus of the session is on fitting distributions to data rather than just describing them.
  • 📉 Trace driven simulation uses actual collected data directly in simulations without fitting a theoretical distribution first.
  • 🔍 Fitting a theoretical distribution involves checking how well it represents the data, such as normal or uniform distributions.
  • 🛠 If theoretical distributions do not fit well, empirical distributions can be created from the collected data itself.
  • 🔑 Empirical distributions are built from the data collected and are not an attempt to fit a pre-existing model to the data.
  • 📝 Building an empirical distribution involves arranging data in ascending order and defining a distribution function from rank order statistics.
  • 📊 For grouped data, a piecewise linear function can represent the distribution function, estimating the proportion of observations in each interval.
  • 🔑 The building blocks of any distribution include density functions, distribution functions, and moments around the mean.
  • 🔬 Empirical distributions are useful when no theoretical distribution fits the data well, but they are limited by the range of the collected data.

Q & A

  • What is the primary focus of the second session of the business analytics course?

    -The primary focus of the second session is to discuss probability distributions, specifically how to fit a distribution to a given set of data.

  • What are the two main types of probability distributions discussed in the script?

    -The two main types of probability distributions discussed are discrete and continuous distributions.

  • How are discrete random variables represented in a probability distribution?

    -For discrete random variables, the probability distribution is represented by all possible values of the random variable along with the corresponding probabilities for each value.

  • What is the difference between the representation of a discrete and a continuous probability distribution?

    -A discrete probability distribution is represented by probability masses, while a continuous distribution is represented by a density function, where the y-axis represents the probability density instead of the probability itself.

  • What is the significance of the normal distribution in the context of grades of a course?

    -The normal distribution signifies that grades are expected to follow a bell-shaped curve, with a few very high and very low marks, and a majority of students scoring in the middle range.

  • What is meant by 'trace driven simulation' in the context of using business data?

    -Trace driven simulation refers to the direct use of collected data in simulations without fitting a theoretical distribution to the data first. It involves using the actual data points, such as monthly sales volumes, directly in the analysis.

  • What is a 'theoretical distribution' and how does it differ from an empirical distribution?

    -A theoretical distribution is a pre-defined statistical distribution, such as the normal, uniform, binomial, Poisson, or exponential distribution. It differs from an empirical distribution, which is built from the actual data collected, rather than being a pre-defined model.

  • Why might one choose to create an empirical distribution instead of using a theoretical one?

    -One might choose to create an empirical distribution if the collected data does not fit well with any of the available theoretical distributions, allowing for a custom distribution that better represents the data.

  • What are the building blocks needed to characterize a normal distribution?

    -The building blocks needed to characterize a normal distribution include the density function and distribution function, from which parameters like mean, standard deviation, and moments around the mean can be estimated.

  • How can one build an empirical distribution from ungrouped data?

    -To build an empirical distribution from ungrouped data, one can arrange the data in ascending order, calculate rank order statistics, and then define a distribution function based on these ordered values.

  • What are the limitations of using empirical distributions compared to theoretical distributions?

    -Empirical distributions are limited by the range of data used to create them and may not accurately represent values outside of this range. They can also be biased towards the pattern of the collected data and are not as versatile as theoretical distributions for generating new values for simulations.

Outlines

00:00

📊 Introduction to Probability Distributions and Data Fitting

This paragraph introduces the concept of probability distributions as statistical models that represent possible outcomes of events. It distinguishes between discrete and continuous distributions, with the former using probability mass functions and the latter using density functions. The speaker emphasizes the importance of fitting distributions to given data, such as grades following a normal distribution or sales being uniformly distributed, and sets the stage for discussing how to apply this in business analytics.

05:07

📈 Methods of Utilizing Business Data in Analysis

The second paragraph delves into how business data can be used in simulations, either through trace-driven simulation, where actual data points are directly used, or by fitting theoretical distributions like normal, uniform, binomial, Poisson, or exponential to the data. The paragraph also introduces the concept of empirical distributions, which are created when theoretical distributions do not fit the data well, and discusses the process of building these distributions from collected data.

10:09

📊 Building Empirical Distributions from Data

This section explains how to construct empirical distributions from ungrouped and grouped data. For ungrouped data, the process involves arranging data in ascending order and creating a distribution function based on rank order statistics. For grouped data, a piecewise linear function can be defined using the intervals and the count of data points within each interval. The paragraph highlights the limitations of empirical distributions, such as being restricted by the range of collected data and the potential for bias.

15:09

🔱 Comparing Data Utilization Approaches and Theoretical Distribution Fitting

The fourth paragraph compares the three approaches to using data: trace-driven simulation, fitting theoretical distributions, and creating empirical distributions. It discusses the use of the first approach for model validation and the preference for theoretical distributions over empirical ones due to their flexibility and lack of bias towards collected data. The speaker also touches on the challenges of empirical distributions, such as their inability to simulate values outside the observed range.

20:14

📉 The Limitations and Considerations of Empirical Distributions

The final paragraph focuses on the limitations of empirical distributions, such as their potential bias towards the pattern of collected data and the restriction to the range of observed values. It also discusses the importance of considering theoretical distributions when appropriate, especially in fields like reliability engineering where specific distributions, such as the Weibull distribution, are commonly used and tested for fit.

Keywords

💡Probability Distribution

A probability distribution is a statistical model that describes the likelihood of various outcomes in an event or experiment. In the video, it is described as showing the possible outcomes and their corresponding probabilities for discrete or continuous random variables. Examples include normal and uniform distributions.

💡Discrete Random Variable

A discrete random variable is one that has specific, countable values. The video explains that for such variables, the probability distribution shows all possible values and their associated probabilities. For instance, the probability of a random variable taking on values like 1, 2, etc.

💡Continuous Random Variable

A continuous random variable can take on any value within a given range. Unlike discrete random variables, their probability distributions are represented by density functions. The video uses sales data and grades as examples, illustrating how continuous variables are visualized with a density function.

💡Density Function

A density function represents the probability density of a continuous random variable. In the video, it is explained that while we can't assign probabilities to specific values of continuous variables, we can discuss their density over an interval. Examples include normal and uniform distribution graphs.

💡Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric around the mean, showing that data near the mean are more frequent in occurrence. The video uses the example of grades to explain how this distribution assumes a bell-shaped curve.

💡Uniform Distribution

A uniform distribution is one where all outcomes are equally likely within a given range. In the video, it is exemplified with sales data, suggesting that sales next month could be uniformly distributed between $100,000 and $200,000, indicating equal probability for any sales value within this range.

💡Trace Driven Simulation

Trace driven simulation uses actual collected data directly in simulations without fitting it to a theoretical distribution. The video mentions using 36 months of sales data directly in analysis as an example of this approach.

💡Theoretical Distribution

Theoretical distributions are standard statistical models like normal, binomial, or exponential distributions used to fit data. The video discusses fitting these distributions to collected data and evaluating the goodness of fit to ensure the model is appropriate.

💡Empirical Distribution

An empirical distribution is built from actual data rather than fitting it to a theoretical model. The video explains how sales data can be used to create an empirical distribution when theoretical distributions do not fit well, using observed values to define the distribution.

💡Goodness of Fit

Goodness of fit measures how well a theoretical distribution matches observed data. In the video, it is highlighted as a critical step in determining whether the chosen theoretical distribution is appropriate for the data.

Highlights

The session focuses on fitting probability distributions to given data, a crucial aspect of business analytics.

Probability distributions are statistical models representing possible outcomes of events or actions.

Discrete random variables are associated with probabilities for each possible value, while continuous variables are represented by a density function.

The difference between discrete and continuous distributions lies in the representation of probability versus density.

Normal distribution is often assumed for grades in academic settings, indicating a bell-shaped curve with many average scores and fewer extremes.

Uniform distribution is used in business settings, such as sales predictions, where all values within a range are equally likely.

Trace driven simulation uses actual collected data directly in simulations without fitting any distributions.

Fitting a theoretical distribution involves selecting a known distribution like normal, uniform, or Poisson and checking its fit to the data.

Empirical distributions are created when theoretical distributions do not fit the data well, using the actual data to build a distribution.

Building an empirical distribution involves arranging data in ascending order and defining a distribution function from the data.

For grouped data, a piecewise linear function can be created to represent the distribution function of an empirical distribution.

Empirical distributions are built from collected data and are not fitted to the data but rather constructed directly from it.

The building blocks of any distribution include density functions, distribution functions, and moments around the mean.

Trace driven simulation is mainly used to validate existing models by comparing model outputs with actual outcomes.

Theoretical distributions are generally preferred over empirical ones due to their flexibility and lack of bias towards collected data patterns.

Empirical distributions may be limited by the range of the data used to build them, potentially restricting the simulation of values outside observed ranges.

There are compelling reasons to use specific theoretical distributions, such as the Weibull distribution in reliability engineering.

Transcripts

play00:13

Hi this is the second session of the business analytics course and we are going to discuss

play00:20

probability distributions.

play00:22

Most importantly we are going to discuss how are we going to fit a distribution to a given

play00:28

data.

play00:29

So, first of all let us do a recap, what are probability distributions?

play00:32

We already have discussed it in other courses but what do we think what do we recall as

play00:38

probability distributions.

play00:41

So, essentially probability distributions are some kind of a statistical model that

play00:48

shows possible outcomes of a particular event or course of action that event may take.

play00:57

So, essentially probability distributions for a discrete random variable may look like

play01:03

all the possible values of the random variable along with the corresponding probabilities

play01:07

that the random variable will take on that particular value.

play01:10

And for a continuous distribution we generally represent that by a density function.

play01:16

For example if you recall we may have said that the x axis represents the values of the

play01:23

random variable the y axis represents the probability and for a discrete random variable

play01:28

we will say that what is the probability that x takes on a value equal to one and then we

play01:32

would have said some probability what is the probability that x takes on a particular value

play01:36

2 and we would have said some probability.

play01:38

So, this is how the probability distribution looks like for a discrete random variable.

play01:44

Now for a continuous random variable we still have the same format which essentially means

play01:52

that x axis still represents the value of the random variable y axis represents some

play01:58

form of probability but we do not say probability we usually if you recall we say density function.

play02:04

Density function and then we would have drawn something like this for potential values of

play02:10

the random variable x.

play02:13

What is the difference here in the earlier diagram we had discrete probability masses

play02:18

because the random variable was discrete.

play02:21

Here we have continuous values of the random variable and therefore we can't really say

play02:25

that there is a probability mass sitting at a particular point.

play02:28

For example let us say that we are still talking about x taking on a value equal to 2.

play02:33

We cannot say that this is the probability, this is only the probability density.

play02:38

So, we only talk about density and for density we need a small interval to actually define

play02:44

some probability.

play02:45

So, you recall all of that.
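The mass-versus-density distinction recalled above can be sketched in a few lines of Python. This is only an illustration: the probability table and the normal distribution with mean 2 and standard deviation 1 are made-up assumptions, not values from the lecture.

```python
from statistics import NormalDist

# Discrete case: each possible value carries a probability mass.
# Hypothetical table, chosen only for illustration.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}
p_x_equals_2 = pmf[2]            # this IS a probability

# Continuous case: the curve gives a density, not a probability.
# Hypothetical normal distribution centred at 2.
dist = NormalDist(mu=2.0, sigma=1.0)
density_at_2 = dist.pdf(2.0)     # height of the curve at x = 2

# A probability only appears once we take a small interval around 2.
h = 0.01
prob_near_2 = dist.cdf(2.0 + h) - dist.cdf(2.0 - h)
approx_prob = density_at_2 * (2 * h)   # density times interval width

print(f"P(X = 2), discrete:            {p_x_equals_2}")
print(f"density at 2, continuous:      {density_at_2:.4f}")
print(f"P(1.99 <= X <= 2.01), exact:   {prob_near_2:.6f}")
print(f"density * width, approximate:  {approx_prob:.6f}")
```

The last two printed numbers nearly coincide, which is the sense in which a density needs a small interval before it turns into a probability.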

play02:47

So, the focus of this session is not to re-describe density functions and probability distributions.

play02:55

The focus of this session is to go one step beyond and say that well I have data now and

play03:01

how do I fit some distributions to data or what do I do with that data.

play03:06

So, for example let us say that we in academic settings we hear this quite a lot.

play03:15

So, grades of a course follow a normal distribution what do I mean by that.

play03:19

So, what do I mean by that essentially grades.

play03:22

So, a random variable is grades here.

play03:26

So, the grades out of 100.

play03:27

Let us say so, random variable is grades and then it follows a normal distribution which

play03:33

essentially means that we are going to assume that this is a nice bell-shaped curve

play03:39

and then some people are going to get a very high mark some people are going to get very

play03:43

low marks unfortunately and there are whole bunch of people who are going to be in between.

play03:47

So, that is what we mean by normal distribution once again the y axis represents the density.

play03:54

So, this is just a recall or sometimes in the business settings we may say something

play04:02

like this: sales next month are expected to be uniformly distributed.

play04:07

So, what do we mean by that.

play04:09

So, I may say that sales can be as low as a hundred thousand dollars, sales can be as

play04:16

high as two hundred thousand dollars this is sales next month, sales in the next month

play04:21

So, it can be hundred thousand dollars or it can be two hundred thousand dollars but

play04:26

instead of assuming a normal distribution.

play04:28

So, on x axis is sales here is your 100000, here is your 200000 and we are saying that

play04:38

it is uniformly distributed.

play04:41

So, you know what uniform distribution is once again y axis represents the density.

play04:46

So, these are essentially probability distributions normal distribution uniform distribution.

play04:52

We have taken two examples of continuous distribution but you get the idea.

play04:59

So, that's how we define probability distributions that's how we use probability distributions.
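For the uniform sales example, the density and the resulting probabilities can be written out directly. The sketch below uses the lecture's range of 100,000 to 200,000 dollars; the function names and the sub-interval of 120,000 to 150,000 are assumptions added for illustration.

```python
LOW, HIGH = 100_000.0, 200_000.0   # sales range from the lecture, in dollars

def uniform_pdf(x):
    """Constant density inside the range, zero outside it."""
    return 1.0 / (HIGH - LOW) if LOW <= x <= HIGH else 0.0

def uniform_cdf(x):
    """Probability that sales are at most x."""
    if x < LOW:
        return 0.0
    if x >= HIGH:
        return 1.0
    return (x - LOW) / (HIGH - LOW)

# Probability that next month's sales land in a hypothetical sub-interval.
prob_mid_range = uniform_cdf(150_000) - uniform_cdf(120_000)
print(f"density inside the range: {uniform_pdf(150_000)}")
print(f"P(120k <= sales <= 150k): {prob_mid_range:.2f}")
```

Note that the y axis value is 1/100,000 everywhere in the range: a flat density, with probability coming from the width of the interval considered.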

play05:06

So, now how are we going to go about using data.

play05:14

So, let us say that I have business data that I have collected, the business data may be

play05:19

about sales volumes the business data may be about the defaulters on loans or the business

play05:26

data may be the salary hikes that the employees got in a particular year.

play05:31

It may be about any business context for this kind of a data we can directly use the data

play05:40

and use it in our simulations there is no need to fit any distributions.

play05:44

This is typically called trace driven simulation.

play05:47

So, let us say that we have collected sales volume over a period of time.

play05:50

Let us say we have a monthly sales volume for the last three years which essentially

play05:55

means that I have 36 values in my data-set.

play05:57

So, instead of first fitting a distribution to the 36 values and then using the distribution

play06:06

in my further analysis I can directly use these 36 values in my analysis.

play06:11

So, if I want to simulate I will simulate directly using these 36 values, this is generally

play06:17

called trace driven simulation.
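A minimal sketch of trace-driven simulation, assuming a toy stockout model that is not part of the lecture: the 36 recorded values are replayed as they are, and no distribution is fitted. The sales figures are generated here only so the example runs; in practice they would be the recorded history.

```python
import random

# Hypothetical data: 36 monthly sales volumes (units sold).
rng = random.Random(0)
monthly_sales = [rng.randint(100, 200) for _ in range(36)]

def count_stockout_months(demand_trace, stock_per_month):
    """Replay the recorded demand in its original order.

    Hypothetical model: the shelf is refilled to `stock_per_month` at the
    start of every month, and a stockout occurs when demand exceeds it.
    """
    stockouts = 0
    for demand in demand_trace:
        if demand > stock_per_month:
            stockouts += 1
    return stockouts

stockouts = count_stockout_months(monthly_sales, stock_per_month=150)
print(f"{stockouts} of {len(monthly_sales)} recorded months would have run out of stock")
```

The model only ever sees the 36 values that were actually observed, which is the defining feature of this approach.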

play06:21

The second method is to actually fit a theoretical distribution.

play06:25

What do you mean by theoretical distribution, theoretical distribution is all these things

play06:28

that we spoke about earlier normal distribution, uniform distribution, binomial distribution

play06:33

for discrete, Poisson distribution for discrete, exponential distribution for continuous these

play06:39

are all theoretical distributions.

play06:41

So, what we may do is for the sales volume data that we may have, sales volume data that

play06:47

I may have, I may try to fit, quote unquote fit, a distribution to my data.

play06:55

And obviously I cannot simply say OK normal distribution fits very well I have to go beyond

play07:00

that and I have to actually check whether the fit that I have assumed is actually good.

play07:04

And I am using these terms in a very deliberate way because these are precisely the technical

play07:10

terms which are going to be helpful later on.

play07:12

So, we always are going to say we are going to fit a distribution we are going to check

play07:17

how good is this fitment.
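Fitting and then checking the fit can be sketched as follows. This excerpt of the lecture does not name a specific check, so the Kolmogorov-Smirnov statistic used here is an assumption: it is one common measure, namely the largest gap between the sample's step-shaped distribution function and the fitted one. The sales data are hypothetical.

```python
from statistics import NormalDist

def ks_statistic(data, dist):
    """Largest vertical gap between the sample's step CDF and dist.cdf."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        f = dist.cdf(x)
        # The step CDF jumps from (i-1)/n to i/n at x, so check both sides.
        gap = max(gap, i / n - f, f - (i - 1) / n)
    return gap

# Hypothetical 36 monthly sales volumes, generated only for illustration.
sales = NormalDist(mu=150, sigma=20).samples(36, seed=1)

# Step 1: fit, here by estimating mean and standard deviation from the data.
fitted = NormalDist.from_samples(sales)

# Step 2: check how good the fit is.
d = ks_statistic(sales, fitted)
print(f"fitted mean={fitted.mean:.1f}, sd={fitted.stdev:.1f}, KS gap={d:.3f}")
```

A smaller gap means a closer fit. Turning the gap into a formal accept or reject decision needs critical values, which are left out of this sketch.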

play07:20

Now let us say that our business data that we have collected is a particularly tricky

play07:25

data set and it does not fit very well with lot of theoretical distributions or the other

play07:33

way around, theoretical, most of the theoretical distributions do not fit to our data.

play07:37

What are we going to do?

play07:39

Well it is not the end of the world instead of trying to fit already available distributions

play07:45

like a negative binomial or a double exponential.

play07:49

Instead of fitting those kind of already available distributions to the data what you can do

play07:54

is we can actually create our own distributions.

play07:57

I mean this is like making rules as we go along typically Kelvin category but we create

play08:06

our own distributions and those distributions are called empirical distributions.

play08:11

So, the sales volume data that I already spoke about using that data we say that well what

play08:18

would be the distribution where these 36 values could have come from.

play08:22

So, using these 36 values we build our own empirical distribution and use that distribution

play08:27

in our future analysis.

play08:30

Now what are these empirical distributions have you discussed empirical distributions

play08:35

in your earlier courses most probably you have.

play08:38

So, let us quickly recall that.

play08:41

So, what are these empirical distributions?

play08:43

Empirical distributions are essentially distributions built from the data that we already have collected.

play08:49

We are not fitting a distribution to the data we are actually building a distribution from

play08:54

the data that we have collected please notice the difference.

play08:58

So, let us go beyond.

play09:01

So, how does one build a distribution?

play09:03

First of all what are the building blocks when we say we are building a distribution.

play09:07

How do we build a distribution for example normal distribution let us take simplest,

play09:13

normal.

play09:14

If we were to say that I want to characterize a normal distribution what would we need to

play09:18

characterize a normal distribution well we will need the building blocks.

play09:22

So, what are these building blocks?

play09:23

So, essential building blocks of any distribution are the density functions, the distribution

play09:30

functions, and we may also want to define some moments, the first moment around the

play09:36

mean, the second moment around the mean which can be built using density also.

play09:41

So, we have to estimate these parameters.

play09:43

So, essentially defining a distribution means identifying a density function or a distribution

play09:50

function from the density function you can identify the building blocks like moments

play09:55

around the mean.

play09:57

Mean standard deviation and so on.
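As a small illustration of getting moments from a distribution, the sketch below computes them for a made-up discrete distribution. One detail worth noting: the mean is the first moment about the origin, while the variance is the second moment about the mean. The probability table and function names are assumptions.

```python
from math import sqrt

# Hypothetical discrete distribution: value -> probability.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}

def raw_moment(pmf, k):
    """k-th moment about the origin: sum of x**k times probability."""
    return sum((x ** k) * p for x, p in pmf.items())

def central_moment(pmf, k):
    """k-th moment about the mean."""
    mu = raw_moment(pmf, 1)
    return sum(((x - mu) ** k) * p for x, p in pmf.items())

mean = raw_moment(pmf, 1)
variance = central_moment(pmf, 2)
std_dev = sqrt(variance)
print(f"mean={mean:.2f}, variance={variance:.2f}, standard deviation={std_dev:.2f}")
```

For a continuous distribution the sums become integrals against the density, but the building blocks are the same.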

play09:59

Let us take an example of how to build an empirical distribution.

play10:03

So, let us say the data is ungrouped.

play10:05

So, let us say that we have collected X1 X2 X3 values.

play10:09

So, the X1 value, X2 value, X3 value and let us say all the way to X36.

play10:16

These are our 36 sales volume data for 36 months in our data set.

play10:23

Now what we are going to do is we are going to arrange them in an ascending order.

play10:27

So, X1 value was the first value that was recorded which was the first month but what

play10:33

we are going to do now is we are going to arrange it in an ascending order where the

play10:37

smallest value is called X bracket 1, OK X bracket 1, second smallest value is called

play10:43

X bracket 2 and the largest value is called X bracket n in our case X bracket 36, may

play10:50

not be the sales volume in the 36th month; it is actually the maximum sales

play10:58

volume that we have found in our data set.

play11:02

So, these are called rank order statistics let us not worry about rank order statistics.

play11:11

So, once we have arranged the data in an ascending order you can actually define a distribution

play11:16

function in this way.

play11:18

This is not our own creation.

play11:20

These are, these definitions are usually available in any standard statistics textbook, all right!

play11:27

So, this is one way.

play11:28

I mean by no means we are saying that this is the only way of defining a distribution

play11:32

function.

play11:33

Now once we get a distribution function we all know how to get a density function and

play11:37

from density function we know how to get moments around the mean.

play11:41

This is for ungrouped data.

play11:43

So, this is for ungrouped data.
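The formula shown on the lecture slide is not captured in this transcript. The sketch below therefore assumes one common textbook definition: the distribution function is 0 below X(1), 1 from X(n) onward, and between X(i) and X(i+1) it rises linearly from (i-1)/(n-1) to i/(n-1). The sample values are made up.

```python
from bisect import bisect_right

def make_empirical_cdf(data):
    """Build a piecewise-linear distribution function from ungrouped data."""
    xs = sorted(data)              # X(1) <= X(2) <= ... <= X(n)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    def cdf(x):
        if x < xs[0]:
            return 0.0
        if x >= xs[-1]:
            return 1.0
        i = bisect_right(xs, x)    # 1-based rank i with X(i) <= x < X(i+1)
        left, right = xs[i - 1], xs[i]
        return (i - 1) / (n - 1) + (x - left) / ((n - 1) * (right - left))

    return cdf

# Hypothetical sales volumes in recording order, not sorted order.
cdf = make_empirical_cdf([10, 20, 40, 30])
for x in (5, 10, 15, 20, 35, 40):
    print(f"F({x}) = {cdf(x):.4f}")
```

As the lecture says, this is one way among several. A step function that jumps by 1/n at each observation is another common choice.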

play11:47

Now if the data were grouped meaning that I only know that in this interval I have ten

play11:53

values in the other interval I have some eight values in some other interval I have some

play11:58

five values, if I have grouped data.

play12:02

So, let us say that intervals we define k intervals.

play12:07

So, we have intervals k such intervals and I know that in each interval I have some n1,

play12:13

n2, n3 values.

play12:14

So, in the first interval I have n1 values in the second interval I have n2 values third

play12:19

interval I have n3 values kth interval I have nk values and that gives me my total sample

play12:24

size of n.

play12:25

So, what we can do is we can create a piece wise linear function G using this definition

play12:34

where each G of aj is essentially proportion of the samples, proportion of the observations

play12:45

up to that point up to that interval.

play12:48

So, once again a very non unique way of defining a distribution function.

play12:53

Once again notice that this is a distribution function why do I know that this is a distribution

play12:56

function because its value below the smallest value is 0 and its value beyond the

play13:03

highest value is 1 which is a typical definition of a distribution function which goes from

play13:09

zero to one.

play13:12

And once again our usual methods are going to kick in where we have a distribution function

play13:18

from there we get the density function and so on.

play13:21

So, these are examples of how we can build empirical distribution.
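For grouped data, the piecewise-linear function G described above can be sketched as follows. The slide's exact definition is not in this transcript, so the code assumes the usual construction: G at each interval boundary a_j is the proportion of observations up to that boundary, with straight lines in between. The interval boundaries are hypothetical; the counts of ten, eight and five follow the lecture's example.

```python
from bisect import bisect_right
from itertools import accumulate

def make_grouped_cdf(edges, counts):
    """edges: a_0 < a_1 < ... < a_k; counts: n_1, ..., n_k per interval."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    n = sum(counts)
    # G(a_0) = 0 and G(a_j) = (n_1 + ... + n_j) / n
    g_at_edges = [0.0] + [c / n for c in accumulate(counts)]

    def cdf(x):
        if x < edges[0]:
            return 0.0
        if x >= edges[-1]:
            return 1.0
        j = bisect_right(edges, x)           # edges[j-1] <= x < edges[j]
        frac = (x - edges[j - 1]) / (edges[j] - edges[j - 1])
        return g_at_edges[j - 1] + frac * (g_at_edges[j] - g_at_edges[j - 1])

    return cdf

# Hypothetical intervals [100, 120), [120, 140), [140, 160).
G = make_grouped_cdf(edges=[100, 120, 140, 160], counts=[10, 8, 5])
for x in (90, 110, 120, 130, 160):
    print(f"G({x}) = {G(x):.4f}")
```

The function is 0 below the first boundary and 1 from the last boundary onward, which is the check the lecture uses to confirm that it is a distribution function.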

play13:27

Let us go back why are we saying that why did we build these empirical distributions

play13:34

in the first place.

play13:36

We are saying that we have data we have collected data that data may be for any context it may

play13:41

be sales or marketing data it may be financial analysis data it may be stock price

play13:47

data.

play13:48

So, let us say that a technical analyst wants to analyze wants to invest in the stock market.

play13:55

Now what are technical analysts well figure out why do not you search for it and then

play14:01

we will describe it in the next sessions.

play14:03

So, technical analysts let us say that they want to invest and for their investment decisions

play14:09

they have collected stock prices for the last three months.

play14:16

Let us say that I have actually tick level data, tick level data means I get data not

play14:22

every hour of a trading day, I may get data every minute or every second.

play14:28

So, I have huge data sets I mean that data set will be huge.

play14:32

Now I want to decide whether the stock is going to move up or move down.

play14:37

Now I have to predict whether I have whole massive data set of all the stock prices up

play14:44

to that point for the last three months and now I am saying tomorrow the market opens

play14:47

at 10 o'clock what is going to be the opening price of this particular stock for which I

play14:54

have collected data.

play14:55

Now how do you go about doing this? We said the first option is to just use the three

play15:00

months of data that you have collected plain data that you have collected use the same

play15:04

values.

play15:06

That would be called trace driven simulation.

play15:09

Second approach would be for the three month data that you have collected why do not you

play15:13

fit a distribution and there has been enough and more research on what is a good fit for

play15:18

a stock price data.

play15:21

Obviously everybody wants to crack that problem and very clearly that I have not solved that

play15:27

problem because if I had cracked that problem I would not be sitting here it is already

play15:31

11 o'clock I would be using my distribution and playing with the market.

play15:37

So, you can fit a distribution for the three months of data that you have collected and

play15:41

I have a whole bunch of candidate distributions available.

play15:45

Normal distribution, uniform distribution, lognormal distribution, Weibull distribution,

play15:51

the full family, not the full family, the full forest.

play15:55

And the third way is well the three months of data that I have collected is for a fairly

play16:01

weird stock, none of the distributions amicably fit the data and therefore I want to define

play16:08

my own distributions.

play16:10

And therefore we got into the empirical distributions.

play16:16

Therefore we got into the empirical distributions.

play16:18

So, these are two examples of how to build empirical distributions from the data that

play16:23

we have.

play16:24

Now let us go back and go to step number two what if I want to fit theoretical distributions

play16:29

how do I go about doing that.

play16:33

So, before we do that let us quickly take a look at how these three approaches compare

play16:39

with each other.

play16:40

Usually approach one which is using the plain three months data is usually used to validate

play16:48

the models.

play16:49

We already have a model you already have the output and you want to validate whether that

play16:53

output is correct or not.

play16:54

So, what you do is you push these three months of data into your model and your model generates

play17:00

an output and you compare that output with the reality with the existing system which

play17:06

is what happens tomorrow check and whether that matches.

play17:10

So, essentially our trace driven simulation is mainly used to validate a model

play17:17

that you already may have built using something, some different approach.

play17:23

So, you have some prior knowledge how to build models for stock prices you have already done

play17:27

that now you want to check whether that model is correct or not.

play17:31

And therefore you feed into that model these three months of data whatever comes out of

play17:35

this model should match with what happened in reality or so should come close to each

play17:39

other.

play17:42

The drawback of this approach is you are going to test your model only with the data that

play17:47

you have collected.

play17:48

So, for example going back to the sales volume data you only have 36 values.

play17:53

So, your model is going to be tested only using the 36 values that you have actually

play17:58

observed and fed into the model, and that may not be enough.

play18:04

Even with the three months of minute level data on the stock prices let us say the stock

play18:10

price was fairly stable during these three months there was no turbulence in the market.

play18:15

So, how will you test whether your model works well in a turbulent period?

play18:20

Now this data that you have collected will not let you simulate that, because this

play18:25

data was collected from a fairly stable stock market period.

play18:31

So, those are some of the problems.

play18:34

Approaches 2 and 3, building your own distributions or using a theoretical distribution, kind of

play18:40

avoid these problems.

play18:42

Because what you can do is once you have built a distribution you can generate values from

play18:46

those distributions which are not restricted to the 36 values that you have actually observed

play18:51

in your sample.
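
The contrast can be sketched in a few lines of Python. This is only an illustration: the 36 sales values below are made up (the actual data is not reproduced here), and the normal distribution is an assumed choice of fitted model, not a claim that it fits the sales data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sample of 36 sales volumes (illustrative values only).
observed = [round(rng.gauss(500, 60)) for _ in range(36)]

# Approach 1 (trace-driven): replay only values that were actually observed.
trace_values = [rng.choice(observed) for _ in range(1000)]

# Approach 2 (theoretical): estimate the parameters of an assumed normal
# distribution from the sample, then generate fresh values from it.
mu = statistics.mean(observed)
sigma = statistics.stdev(observed)
fitted_values = [rng.gauss(mu, sigma) for _ in range(1000)]

# Count how many simulated values were never seen in the original sample.
observed_set = set(observed)
new_from_trace = sum(v not in observed_set for v in trace_values)
new_from_fit = sum(v not in observed_set for v in fitted_values)

print("values never observed, trace-driven:", new_from_trace)
print("values never observed, fitted model:", new_from_fit)
```

The trace-driven run can only ever repeat the 36 observed values, while the fitted model produces values in between and beyond them.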

play18:53

So, compared to approach 1 I would say approaches 2 and 3 are preferable that way.

play19:00

However if you can actually find a theoretical distribution that fits your data I would generally

play19:06

avoid building empirical distributions.

play19:09

Therefore I would say that theoretical distributions are preferred over empirical distributions.

play19:16

The problem with empirical distributions is very similar to the problem that we have for

play19:22

approach one.

play19:23

Now when you build an empirical distribution from the data that you have,

play19:29

the shape of the distribution is completely governed by the data that you have used to

play19:34

build the distribution functions.

play19:37

Remember, your distribution functions are built from the

play19:41

data that you have.

play19:42

So, the shape of the distribution will be completely governed by your data.

play19:45

Now once again if the data follows a particular pattern then it is quite likely that the distribution

play19:54

will be biased towards that.

play19:58

The other problem is that the distributions we build are usually restricted by the smallest

play20:05

and the largest value.

play20:07

So, here the distribution is 0 for all the values less than the smallest value that

play20:14

you have observed.

play20:15

The distribution is 1 at or beyond the maximum value that you have observed in the sample,

play20:20

which may not be true.

play20:22

That this is the smallest value I have observed in the sample does not mean that sales cannot

play20:26

be lower than this.

play20:28

That this is the maximum value I have observed in the sample does not mean that my sales

play20:31

cannot be more than that.

play20:33

However, the distribution that you build using these data will pretty much say so.

play20:39

The distribution that we have built will say that the probability of finding a sales volume

play20:43

less than the smallest value is zero.

play20:47

And indirectly speaking the probability of finding a sales volume bigger than the

play20:53

maximum value is again 0, or almost 0.

play20:56

So, those are the problems.

play20:58

So, we are still not able to go beyond whatever we have observed in our sample.
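
A small Python sketch makes the boundary behaviour visible. It uses the simple step-function form of the empirical distribution function, which may differ from an interpolated form, but the point about the two ends carries over. The sales values are hypothetical and `empirical_cdf` is a name chosen here for illustration.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return cdf

# Hypothetical sales volumes (illustrative values only).
sales = [420, 455, 470, 480, 495, 510, 530, 560]
F = empirical_cdf(sales)

below_min = F(min(sales) - 1)         # just below the smallest observation
inside = F(495)                       # inside the observed range
at_max = F(max(sales))                # at the largest observation
above_max = 1 - F(max(sales) + 100)   # chance of exceeding the maximum

print(below_min, inside, at_max, above_max)
```

Whatever sample is supplied, this function assigns zero probability to anything below the smallest value and zero probability to anything above the largest one.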

play21:05

So, those are the problems.

play21:11

So, if you want to test the validity of our system using data that

play21:19

comes from an empirical distribution we may have a problem, because we cannot simulate values

play21:24

which are outside of the range of the data that was fed in.

play21:27

So, those are some of the issues with empirical distributions.

play21:32

Now there may be some compelling reasons for using a particular theoretical distribution.

play21:37

For example let us say that you have data about reliability.

play21:40

Now in reliability engineering the Weibull distribution has very high importance.

play21:46

So, for any data that is about reliability I would

play21:54

like to test whether it fits the Weibull family, whether it comes close to it.
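
One informal way to check closeness to the Weibull family is a Weibull probability plot: if the data are Weibull with shape k and scale λ, then ln(-ln(1 - F(x))) is a straight line in ln(x) with slope k. The Python sketch below fits that line by least squares. It is a rough check under stated assumptions, not a formal goodness-of-fit test: the failure times are simulated, the plotting positions (i - 0.5)/n are one common choice among several, and `weibull_plot_fit` is a name made up for this example.

```python
import math
import random

def weibull_plot_fit(sample):
    """Least-squares line on Weibull probability-plot coordinates.

    Returns (shape, scale, r_squared). Sample values must be positive.
    """
    ordered = sorted(sample)
    n = len(ordered)
    u = [math.log(x) for x in ordered]
    # Plotting positions (i - 0.5) / n approximate F at the i-th ordered value.
    v = [math.log(-math.log(1 - (i - 0.5) / n)) for i in range(1, n + 1)]

    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))

    shape = s_uv / s_uu                    # slope of the fitted line
    intercept = v_bar - shape * u_bar      # equals -shape * ln(scale)
    scale = math.exp(-intercept / shape)
    r_squared = s_uv ** 2 / (s_uu * s_vv)  # how straight the plot is
    return shape, scale, r_squared

rng = random.Random(7)
# Simulated failure times from a Weibull with scale 100 and shape 1.5.
failure_times = [rng.weibullvariate(100, 1.5) for _ in range(200)]

shape, scale, r2 = weibull_plot_fit(failure_times)
print(f"shape ~ {shape:.2f}, scale ~ {scale:.1f}, R^2 = {r2:.3f}")
```

An R^2 close to 1 suggests the points lie near a straight line, which is what "coming close" to the Weibull family looks like on this plot.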

play22:00

So, in those cases also, why not test a theoretical distribution first?

play22:11

So, that's the difference between fitting a theoretical distribution and fitting

play22:16

an empirical distribution.
