MINI-LESSON 8: Power Laws (maximally simplified)
Summary
TL;DR: This lecture delves into the concept of power laws and the Pareto distribution, illustrating their prevalence through the 80/20 rule. It contrasts power laws with Gaussian and sub-exponential distributions, highlighting their unique properties, such as a constant ratio in tail behavior and their scale-free nature. The speaker explains the implications of power laws in real-world scenarios, such as wealth distribution and company sizes, emphasizing that they remain intuitive and workable even when traditional statistical measures like the mean and variance do not exist.
Takeaways
- 📚 The lecture discusses the concept of power laws and the Pareto distribution, introducing them in the context of different distribution classes.
- 🔍 The Pareto principle, often cited as the 80/20 rule, is highlighted as a real-world example of power law distribution, illustrating how a small percentage of people own a large percentage of resources.
- 📈 Power laws are characterized by a constant ratio in the tail of the distribution, unlike Gaussian distributions where the ratio changes as you move further from the mean.
- 📊 The script differentiates between three main classes of distributions: Gaussian, subexponential, and power laws, each with their own properties and implications for statistical inference.
- 🤔 The Gaussian distribution is safe for statistical inference due to its convergence properties, but power laws present challenges due to their fat tails and lack of traditional statistical measures like mean and variance.
- 📉 Power law distributions are intuitive and can be understood through the concept of recursion, where the same 80/20 rule can be applied repeatedly to smaller subsets.
- 📝 The script explains that power laws can be mathematically expressed as a constant times x to the power of negative alpha for large x, which is key to understanding their behavior.
- 📊 The log-log plot is a useful tool for identifying power laws, as it should show a straight line with a slope of minus alpha for distributions that follow a power law.
- 🔢 The 'alpha' parameter in power laws determines the thickness of the tail; lower alpha values indicate fatter tails and higher values result in thinner tails approaching Gaussian behavior.
- 🚫 Power laws with alpha less than or equal to one have no mean, and those with alpha less than or equal to two have no variance, challenging traditional statistical analysis.
- 🌐 The lecture concludes by emphasizing the importance of understanding power laws for statistical inference and for developing a more empirical and mathematically solid view of the world.
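The recursive 80/20 point in the takeaways can be checked in a few lines. This is a minimal sketch using the lecture's own numbers (an 80/20 split recursed three times); the function name is mine:

```python
# Recursively apply the 80/20 rule: at each step, 80% of the remaining
# wealth is held by the top 20% of the remaining population.
def recurse_8020(steps):
    people, wealth = 1.0, 1.0
    for _ in range(steps):
        people *= 0.20   # keep only the top 20% of people...
        wealth *= 0.80   # ...who hold 80% of the wealth at that level
    return people, wealth

people, wealth = recurse_8020(3)
# three recursions give roughly "1% owns 50%", as in the lecture
print(f"top {people:.1%} of people hold {wealth:.1%} of the wealth")
# → top 0.8% of people hold 51.2% of the wealth
```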
Q & A
What is the Pareto Principle often associated with?
-The Pareto Principle is often associated with the idea that 80% of the effects come from 20% of the causes, specifically in the context of wealth distribution where 20% of the people own 80% of the land.
What does the term 'fractal' imply in the context of the Pareto Principle?
-In the context of the Pareto Principle, 'fractal' implies that the 80/20 distribution can be recursively applied at different levels, such as within the 20% itself, leading to further refinements like 50% of the land being owned by 1% of the people.
What are the three main classes of distributions mentioned in the script?
-The three main classes mentioned are Gaussian (including the 'lower Gaussian' like the binomial and the 'upper Gaussian', the Gaussian proper), subexponential (including the lognormal distribution), and power laws (including the Pareto, Student's t, and stable distributions).
What is special about the probability ratios in a power law distribution compared to a Gaussian distribution?
-In a power law distribution, the probability ratios are constant regardless of the value of x, whereas in a Gaussian distribution, these ratios change as x increases, showing an acceleration of decline but not maintaining a constant ratio.
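That contrast can be made concrete with a small numerical sketch (standard library only; the alpha = 2 Pareto tail and the doubling factor are illustrative choices, not the lecture's):

```python
import math

def gauss_sf(x):
    # P(X > x) for a standard normal, via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2))

def pareto_sf(x, alpha=2.0):
    # P(X > x) for a Pareto tail with minimum 1: x ** -alpha
    return x ** -alpha

# Ratio P(X > x) / P(X > 2x): it explodes for the Gaussian as x grows,
# but stays pinned at 2**alpha = 4 for the power law, whatever x is.
for x in (1.0, 2.0, 3.0):
    g = gauss_sf(x) / gauss_sf(2 * x)
    p = pareto_sf(x) / pareto_sf(2 * x)
    print(f"x = {x}: Gaussian ratio {g:,.0f}, Pareto ratio {p:.1f}")
```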
Why is the Gaussian distribution considered safe for statistical inference?
-The Gaussian distribution is considered safe for statistical inference because, under the central limit theorem, the sum of a large sample tends to converge to a Gaussian distribution, allowing for more predictable and reliable statistical analysis.
What does the term 'thick tail' or 'semi-fat tail' refer to in the context of distributions?
-In the context of distributions, 'thick tail' or 'semi-fat tail' refers to distributions with heavier tails than the Gaussian that still do not follow a power law; extreme events are more likely than under a Gaussian but less likely than under a power law.
What does 'scale invariance' mean in the context of power law distributions?
-Scale invariance refers to the property that the ratio of exceedance probabilities P(X > x) / P(X > nx) depends only on n, not on x, so the same proportional pattern holds at every scale of x.
What does the 'Lindy effect' imply about the life expectancy of non-perishable items or concepts?
-The Lindy effect implies that the life expectancy of non-perishable items or concepts is proportional to their current age, suggesting that the longer something has already existed, the longer it is expected to continue to exist.
How does the script differentiate between Gaussian and power law distributions in terms of life expectancy?
-The script differentiates by stating that in Gaussian distributions the expected remaining life shrinks as the age threshold increases, while in power law distributions the conditional life expectancy grows in proportion to the threshold, reflecting the 'Lindy effect' (the constant case is the intermediate, exponential class).
What mathematical property characterizes power law distributions?
-Power law distributions are characterized by the property that the probability of exceeding a value x, for large x, can be approximated as a constant times x to the power of negative alpha, which is indicative of the distribution's tail behavior.
What are the implications of different alpha values in power law distributions?
-Different alpha values in power law distributions indicate the fatness of the tail: lower alpha values result in fatter tails (more extreme events), while higher alpha values result in thinner tails. Alpha equal to one indicates a Cauchy distribution with no mean, and alpha values of two or lower imply no variance.
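The alpha thresholds in that answer follow from the closed-form moments of a Pareto tail (minimum 1). The formulas below are standard; the printed grid of alpha values is just illustration:

```python
import math

def pareto_mean(alpha):
    # E[X] = alpha / (alpha - 1); the integral diverges when alpha <= 1
    return alpha / (alpha - 1) if alpha > 1 else math.inf

def pareto_variance(alpha):
    # Var[X] = E[X^2] - E[X]^2, with E[X^2] = alpha / (alpha - 2);
    # the second moment diverges when alpha <= 2
    if alpha <= 2:
        return math.inf
    return alpha / (alpha - 2) - pareto_mean(alpha) ** 2

for alpha in (1.0, 1.5, 2.0, 3.0):
    print(f"alpha={alpha}: mean={pareto_mean(alpha)}, "
          f"variance={pareto_variance(alpha)}")
```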
Outlines
📚 Introduction to Power Laws and Pareto Distribution
The speaker begins by expressing excitement about discussing power laws and the Pareto distribution. They introduce the concept by referencing the 80/20 rule, which originated from Vilfredo Pareto's observation of wealth distribution in Italy. The talk aims to contrast power laws with other distributions, highlighting the fractal nature of power laws where the same proportions recur at different scales. The speaker also outlines the three main classes of distributions: Gaussian, subexponential, and power laws, with a focus on the Pareto distribution's unique characteristics and its prevalence in real-world phenomena.
📉 Understanding Power Laws and Their Intuitive Nature
This paragraph delves into the intuitive aspects of power laws, contrasting them with the Gaussian distribution. The speaker explains the constant ratio characteristic of power laws, as opposed to the accelerating decline seen in Gaussian distributions. They provide examples of wealth distribution and the predictability of power laws, even without a defined mean or variance. The summary also touches on scale invariance in the context of power laws, emphasizing the predictability and the constant probability ratio regardless of the scale.
📈 Gaussian vs. Power Law: Life Expectancy and Conditional Expectation
The speaker compares Gaussian properties with power law characteristics, focusing on life expectancy as an example. In Gaussian distributions, the expected remaining life shrinks as the age threshold increases. However, with power laws, the conditional expectation remains a multiple of the threshold, indicating a different approach to understanding life spans and aging. The paragraph also discusses the implications of these distributions in real-world scenarios, such as the life expectancy of companies and the concept of 'shock' mortality in the sub-exponential class.
📊 Mathematical Characterization of Power Laws
This section provides a mathematical explanation of power laws, defining them as a tail behavior where the probability of exceeding a value x decreases as a constant times x to the negative power of alpha. The speaker describes how to identify power laws through log-log plots and the implications of different alpha values on the distribution's properties. The summary highlights the unique statistical properties of power laws, such as the absence of mean and variance when alpha is less than or equal to one and two, respectively.
🌟 Conclusion: Power Laws in Statistical Inference and Worldview
In the concluding paragraph, the speaker summarizes the importance of understanding power laws in statistical inference and in shaping our worldview. They emphasize the limitations of traditional statistical measures like the mean and variance in the context of power law distributions and encourage a shift towards more empirical and mathematically solid approaches. The speaker also mentions their book, Statistical Consequences of Fat Tails, suggesting that understanding these concepts can greatly enhance our analysis and interpretation of real-world data.
Keywords
💡Power Laws
💡Pareto Distribution
💡80/20 Law
💡Fractal
💡Gaussian Distribution
💡Subexponential
💡Scale Invariance
💡Tail of Distribution
💡Lindy Effect
💡Sigma
Highlights
Power laws and the Pareto distribution are introduced as key concepts in understanding statistical distributions.
The 80/20 rule is associated with the Pareto principle, illustrating the uneven distribution of resources like land ownership in Italy.
Fractal nature of power laws is explained, showing how the 80/20 principle can be recursively applied.
Three main classes of distributions are presented: Gaussian, subexponential, and power laws.
Gaussian-class distributions are characterized by rapid convergence to the Gaussian under summation of large samples.
Subexponential distributions are characterized, including the lognormal distribution.
Power law distributions are detailed, including Pareto, Student's t, and stable distributions.
The concept of scale invariance in power laws is introduced, showing a constant exceedance-probability ratio regardless of the value of x.
Gaussian distributions are critiqued because their tail-probability ratios keep changing as values increase, unlike the scale-free ratios of power laws.
The practical implications of power laws in wealth distribution and company sizes are discussed.
The lack of a mean in power law distributions is highlighted, yet their predictability is emphasized.
A simplified characterization of power laws is provided through the probability of exceeding x.
The Lindy effect is introduced in the context of the three classes of distributions, affecting life expectancy and aging.
Conditional expectation in Gaussian distributions is contrasted with the proportionally growing conditional life expectancy in power law distributions.
The mathematical expression for power laws is given: the exceedance probability behaves as a constant times x to the minus alpha for large x.
The importance of the alpha parameter in power laws is explained, affecting the fatness of the tail.
The implications of alpha values on statistical measures like mean, variance, and kurtosis are discussed.
The transcript concludes with a summary of the importance of understanding power laws for statistical inference and worldview.
Transcripts
Friends, hello again. I'm extremely happy, as you can see, to do these mini-lectures. I'm going to try to explain power laws and the Pareto distribution very quickly, and put them in perspective with other classes of distributions, namely the three main ones that I'll present in a minute. But first let me tell you why the idea is very intuitive and very well known.

A lot of people associate power laws with the 80/20 law — principle, guideline — which came from the observation that in Italy 20 percent of the people owned 80 percent of the land. That is how it was born, with the work of Vilfredo Pareto. It is fractal, and we'll see what fractal means: fractal in the sense that you can recurse. Those top 20 percent also have an 80/20 inside them, so you apply the rule to them; you do it again on each successive top 20 percent, until you get to about 50 and 1. So 50 percent of the land is owned by one percent of the people — compatible with the same proportion being preserved at every scale — and we'll see how these proportions get preserved, with the notion of a power law and the power-law decline of the tail of the distribution.
One thing to mention, as I said, is the classes of distributions. We have mainly three classes. The first one I would call Gaussian. It's not really just the Gaussian, but the distributions that, under summation — as we saw with the law of large numbers and the central limit theorem — converge to the Gaussian rather rapidly. Within the Gaussian class, of course, we have the lower Gaussian, like the binomial, and the upper Gaussian, the Gaussian proper — because a binomial doesn't go to infinity: it has finite support, whereas the Gaussian proper has infinite support, from minus infinity to plus infinity. Below the binomial you have the Bernoulli, and of course below the Bernoulli the so-called degenerate distribution. In this Gaussian class of distributions you can do a lot of statistical inference; you're safe. Under summation, when you have a large sample, the average of that sample converges; everybody's happy. (Sorry if you're not seeing very well here; let me raise the board a little.)
Okay, second class: let's call this class subexponential. And of course we draw the line here between two classes: power laws, and the subexponential-but-not-power-law, which we won't give a separate name. In there you have the lognormal — a nasty distribution — and below it, say, the exponential, the Laplace in between, maybe the gamma, which is okay; they're all related by reparameterization or mild transformation; and then you have a bunch of other things. Let's call this the semi-fat-tailed class: thick-tailed, but not power laws.

Now, within power laws, you have the Pareto distribution, or distributions with Pareto tails: the Student t, or, something more general, the stable distribution. Within here you have the Lévy, some parameterization of the stable distribution, and you have the Cauchy — the Cauchy can actually be parameterized as a Student t or as a stable distribution. So those are the distributions. Give me a second to erase the board — and if this taxonomy is too technical for you, forget about it, because power laws are very intuitive, very easy to understand.
Okay, let me show you what happens with the Gaussian. You have a decline: the probability of exceeding three sigma — three standard deviations, or three of whatever — is one in 740. The probability of exceeding four sigma is one in about 32,000. The probability of exceeding five sigma: one in 3.5 million. The probability of exceeding six sigma: one in a billion. What you notice is an acceleration of the decline. But there is one central attribute, and it's as follows: the ratio of the three-sigma probability to the six-sigma probability is very different from the ratio of five sigma to ten sigma — ten sigma is around one in 10 to the 23rd. So as you go higher with a Gaussian, that ratio keeps changing. With a power law you don't have that: you have a constant ratio. Two sigma divided by four sigma — or two dollars divided by four dollars, or two billion divided by four billion — gives the same probability ratio as five divided by ten.
Let's say I tell you how many people are richer than a given amount, and I take one million: one in 62.5 people. (Let's assume that's right — maybe soon, with hyperinflation, it would be 99 or 100 percent of the people — but let's assume it works.) Richer than 2 million: 1 in 250. Richer than 4 million: 1 in 1,000. So you know automatically what richer than 8 million is going to be: 1 in 4,000. You double the threshold, you multiply the rarity by four. Very simple: that's how a power law makes things easy, and you can predict. And incidentally, this distribution here does not have a variance, but you can understand it perfectly. You can even take a fatter-tailed power law, one that corresponds more to what we have in real life: richer than 1 million, I'll start again with 1 in 62.5; then 1 in 125; 1 in 250; 1 in 500. That power law does not have a mean. Guess what: we understand it, we know how it works. It doesn't have a mean, but we can work with it, we can live with it.
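The doubling arithmetic above can be sketched in one function. The 1-in-62.5 anchor at $1M is the lecture's number; the function name and the explicit alpha values (2 for the first sequence, 1 for the fatter-tailed one) are mine:

```python
# Tail P(X > x) proportional to x ** -alpha: doubling the wealth threshold
# multiplies the rarity by 2 ** alpha.  alpha = 2 reproduces the first
# sequence in the lecture; alpha = 1 the fatter-tailed (no-mean) one.
def one_in(millions, alpha, base=62.5):
    # "1 in `base`" people are richer than $1M (the lecture's anchor)
    return base * millions ** alpha

for m in (1, 2, 4, 8):
    print(f"richer than ${m}M: 1 in {one_in(m, alpha=2):g}"
          f"  |  fatter tail: 1 in {one_in(m, alpha=1):g}")
# alpha=2 gives 62.5, 250, 1000, 4000; alpha=1 gives 62.5, 125, 250, 500
```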
So the way we're going to simplify, and again characterize, power laws is as follows: take the probability of exceeding x over the probability of exceeding n times x (in the earlier example we used n equal to 2). For a Gaussian, or a Gaussian-basin distribution, that ratio depends on n and on x. For a power law it depends only on n: you don't care where x is.

Let me give you another example with the Gaussian, the x-to-2x ratio. The ratio of the one-sigma to the two-sigma exceedance probability is about 7; of two sigma to four sigma, about 718; of four sigma to eight sigma, over 10 to the 10th. So it increases. For a power law — depending on how it's parameterized — the two-sigma to four-sigma ratio is four times, as we saw earlier, and it stays four times at every doubling. This invariance is what we call scale invariance. The Pareto distribution tells you the following: the ratio of millionaires to half-millionaires is approximately the same as the ratio of two-billionaires to billionaires. It's not going to be perfect, and sometimes it only holds for x large: if you take salaries, you will not find that property at the bottom of the income distribution, but you will find it high up.

Now, for Gaussian variables you don't have that property. The ratio of people who are three meters tall to people who are one and a half meters tall is not the same as the ratio at six meters — there are no six-meter people, you see, because of the rapidity of the decline. But you can apply this to the sales of companies and things like that: this is the mark of a power law. I'm not going to go much further, because I've written a lot on it.
But now let's talk about Lindy in the three classes of distributions. Let's start with a property of the Gaussian, what I call the conditional expectation: E of X conditional on X higher than K. And let's take this example, life expectancy. What is the expected lifespan given X higher than zero? Let's say it's 80 years. Now, what is the life expectancy conditional on being 80 years or older? You reach 80 years — what's the life expectancy? It's going to be 92, in other words an extra 12 years. (Actually, if you're healthy, much more than 92. Incidentally, if you lift weights, if you don't hang around economists, if you read the right things, you live longer.) All right: what is the life expectancy of X conditional on X higher than 100? That's going to be, say, 102 and a half — plus two and a half years. So, as you see, the extra life expectancy shrinks; that's how things work in the Gaussian domain. And then, of course, what is the life expectancy assuming X is higher than 120? Well, you'd die the second after. In other words, as K goes up, the conditional life expectancy goes to K. You see, that's the property of the Gaussian: if I tell you the average deviation given that it's higher than 10 sigma, it is 10 sigma.
Now, for a power law you don't have this property. Keep the same setup, and let's do increasing numbers. Typically for a power law: what is the expectation of X given X higher than zero? Say we start at 10. Now we raise the threshold: given X higher than 10, it's maybe 20 — a multiple of 10. Higher than 20: 40. Higher than 40: 80. Effectively, the ratio here is two times; it could be one and a half times. What's the average size of a company bigger than a billion? Two billion, or something like that: a constant multiple. What's the average market move bigger than two sigma? Three sigma. Bigger than five sigma? Seven and a half sigma — however you define sigma. What's interesting is that these multiples are invariant. This is the power-law behavior: it produces an expectation that is a multiple of the threshold. People grasp it when I present it that way, when I tell them: if a technology is older than 10 years, it's going to live to 20; older than 20 years, maybe 40 — because the expectation has to be a multiple of the threshold.

Now, when we talk about the subexponential class — the intermediate class that is not a power law — guess what: there the remaining expectancy is a constant. Some people say the crocodile almost has that property: whenever you spot a crocodile, say it has 20 years to live. If it's 10 years old, it has 20 years to live; if it's 20 years old, it has 20 more years to live. It's like what we call Poisson arrival: if things only die from shocks, there's no aging — there is only Poisson arrival in that class. So the borderline case, the exponential, has this property: constant residual life expectancy, if lifespans are distributed according to it.
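The three regimes in this part of the lecture (Gaussian residual life shrinks, exponential stays constant, power law grows) can be put side by side numerically. The closed forms below are standard; the parameter choices (alpha = 2, unit rate) are mine:

```python
import math

def gauss_residual(k):
    # standard normal: E[X - k | X > k] = phi(k) / Phi_bar(k) - k,
    # which shrinks toward 0 as k grows -- no Lindy effect
    phi = math.exp(-k * k / 2) / math.sqrt(2 * math.pi)
    sf = 0.5 * math.erfc(k / math.sqrt(2))
    return phi / sf - k

def expo_residual(k, rate=1.0):
    # memoryless: constant residual life 1 / rate -- the "crocodile" case
    return 1.0 / rate

def pareto_residual(k, alpha=2.0):
    # Pareto tail (min 1, alpha > 1): E[X | X > k] = alpha * k / (alpha - 1),
    # so the residual k / (alpha - 1) GROWS with k -- the Lindy effect
    return k / (alpha - 1)

for k in (1.0, 2.0, 4.0, 8.0):
    print(f"k={k}: gauss {gauss_residual(k):.3f}  "
          f"expo {expo_residual(k):.3f}  pareto {pareto_residual(k):.3f}")
```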
And you see it in a lot of things, where you may have a combination. To give an example, there was a paper recently — it actually generated debate — that found that humans visibly have some decay, because the mortality rate increases with time, so there is aging; but once you reach a certain age, the annual mortality rate becomes about 50 percent and stays there. So no matter how old the person is — whether 110 or 120 — the person would have, say, two and a half more years to go, regardless. It becomes flat, meaning there's no more aging, and that produces a completely different distribution: it means that human life is unlimited but short — that's what the title of the paper said. It generated some debate because some people weren't happy with the data. But the argument can be used for things in nature that we see get very old, like turtles: maybe they're not immortal — you're never immortal, because accidents will kill you — but those accidents are memoryless, they're Poisson-distributed. With Poisson arrival of shocks you get this constant life expectancy. So, I have very rapidly explained power laws and the differences between these classes. Let me get a little more mathematical now.
Let's say the probability of exceeding x. For a power law it is going to be — or converge to, for x very large — some constant times x to the minus alpha. Some people use something more elaborate, which you'll see in my book, a slowly varying function, but it tells you the same thing. So mathematically we can express power laws as follows: a power law is a statement about the tail, not the whole probability distribution; the probability of exceeding x, for x very large, is approximated by some constant times x to the minus alpha. Let's forget about what's in front — some scaling parameter — that's the property of a power law.

Now take the log. The log of the exceedance probability — of the survival function, however you want to call it — is going to be approximately minus alpha times log of x, plus something. This equation is a central one; it comes from scalability: whenever you see logs, think scalable. And it gives us the following: take a log-log plot, log of x against log of the probability of exceeding x. You're going to have something curved at first, and then it becomes a straight line with slope minus alpha. Typically that's how we identify a power law. Sometimes the line falls off at the end because you don't have enough observations — by definition you don't have a lot in the tails.
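The log-log identification described above can be simulated end to end. Samples come from the inverse-transform trick X = U^(−1/α); the fitted slope should land near −α. The choices of alpha = 2, the sample size, and the 10% tail cutoff are mine:

```python
import math
import random

random.seed(42)
ALPHA = 2.0
N = 100_000

# Pareto samples (minimum 1) by inverse transform: X = U ** (-1/alpha)
xs = sorted(random.random() ** (-1 / ALPHA) for _ in range(N))

# empirical exceedance probability at the i-th order statistic,
# keeping only the upper tail where the power law is supposed to hold
pts = [(math.log(xs[i]), math.log((N - i) / N))
       for i in range(int(0.90 * N), N - 1)]

# least-squares slope of log P(X > x) against log x
mx = sum(x for x, _ in pts) / len(pts)
my = sum(y for _, y in pts) / len(pts)
slope = (sum((x - mx) * (y - my) for x, y in pts) /
         sum((x - mx) ** 2 for x, _ in pts))
print(f"fitted log-log slope: {slope:.2f} (theory: {-ALPHA})")
```

As the lecture warns, the deepest part of the empirical tail is noisy, so the fit is only approximate there.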
And that alpha: the lower the alpha, the fatter the tail. Alpha equal to one, you have the Cauchy. The higher the alpha, the thinner the tail: say alpha equals three, you get something like the S&P 500 — the returns, not the price, fall somewhere between two and a half and three. Something like alpha equals two, you've got earthquakes; the sizes of cities fall somewhere between one and two. For the Gaussian, the log-log plot becomes vertical, so the tail alpha is effectively infinite.

One property you need to note: when alpha is one or lower, there is no mean. When alpha is two or lower, no variance. When alpha is four or lower, no kurtosis — or the moment is infinite, depending on whether it is odd or even. And this is crazy, because it tells you that you can do statistics, we can understand everything, but notions like variance, kurtosis, and even the mean no longer have any meaning. Yet, as I showed you, we can figure out the properties: we can work with wealth, we can work with these problems. The traditional notions that we saw in the beginning just do not apply.
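The closing point — moments can be meaningless while the distribution stays perfectly workable — shows up clearly in simulation. Here alpha = 1.2 (barely-finite mean, infinite variance) and the sample sizes are my choices:

```python
import random
import statistics

random.seed(7)
ALPHA = 1.2   # fat tail: the mean barely exists, the variance is infinite

def pareto_sample(n, alpha=ALPHA):
    # inverse transform, minimum 1: X = U ** (-1/alpha)
    return [random.random() ** (-1 / alpha) for _ in range(n)]

# Sample means swing wildly across runs (infinite variance, so no
# normal-speed CLT), but quantiles such as the median stay stable.
for _ in range(3):
    xs = pareto_sample(100_000)
    print(f"mean {statistics.fmean(xs):8.2f}   "
          f"median {statistics.median(xs):.3f} "
          f"(theory {2 ** (1 / ALPHA):.3f})")
```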
Let me stop here, because I don't want to get too mathematical; I'd rather be mathematical somewhere I can relax and express things in math in an uninhibited way. The math, incidentally, is in my book: the whole book, Statistical Consequences of Fat Tails, is about the characterization of fat tails — the different classes, the different colors, if you want, of fat tails. The class with alpha equal to one or lower I called 'fuhgetaboudit', spelled the way you should be pronouncing it, because there you can't do traditional statistics at all: there is no mean, although sometimes you have the illusion that there is one.

So thanks a lot for attending this lecture. Remember, this is a central lecture: with its intuition we can go very, very far in redoing statistical inference and redesigning our views of the world on a more empirical — and of course more mathematically solid — basis. Thank you, and have a great day.