The most important ideas in modern statistics

Very Normal
29 Oct 2023 · 18:25

Summary

TL;DR: This video discusses eight revolutionary ideas that shaped statistics from 1970 to 2021. It highlights counterfactual causal inference, the bootstrap, simulation-based inference, overparameterized models and regularization, multi-level models, and the central role of computational power. It also covers adaptive decision analysis, key statistical algorithms, robust inference as a safeguard against violated assumptions, and the innovative use of plots and visuals for data analysis.

Takeaways

  • 📊 Statistics is an evolving field with influential ideas shaping its trajectory.
  • 📈 Andrew Gelman and Aki Vehtari are authorities in Bayesian statistics, co-authors of the textbook Bayesian Data Analysis.
  • 🔍 Counterfactual causal inference makes it possible to move toward causal statements from observational data.
  • 🔄 The bootstrap is a versatile algorithm for estimating the sampling distribution of a statistic from a single dataset.
  • 💻 The rise of computational power has made computation central to statistics, enabling complex simulations and analyses.
  • 🔧 Overparameterized models such as neural networks offer extreme flexibility for modeling a wide range of phenomena.
  • 🔧 Regularization techniques prevent overfitting in flexible models by enforcing a degree of simplicity.
  • 📈 Multi-level models, also known as hierarchical or mixed effect models, aggregate information across groups and support more nuanced analyses.
  • 🔄 The Expectation-Maximization (EM) algorithm and the Metropolis algorithm are key statistical algorithms for complex estimation and sampling problems.
  • 🔧 Adaptive decision analysis allows experiments to be modified based on interim data, improving design and decision-making.
  • 🔍 Robust inference provides trustworthy statistical analyses even when assumptions are violated, giving more confidence in the results.

Q & A

  • What is the main focus of the video?

    -The video discusses eight innovations in statistics that have significantly shaped the field, making it accessible to a general audience.

  • Who are Andrew Gelman and Aki Vehtari, and why are they considered authorities in statistics?

    -Andrew Gelman and Aki Vehtari are renowned statisticians known for their work in Bayesian statistics. They are considered authorities because they co-wrote the textbook Bayesian Data Analysis, which is highly respected among practitioners.

  • What is the difference between experimental data and observational data in statistics?

    -Experimental data comes from controlled experiments where researchers can manipulate variables, allowing for causal claims. Observational data, on the other hand, comes from observing real-world scenarios where researchers cannot control who receives a treatment, limiting them to making only correlational claims.

  • How does counterfactual causal inference help in dealing with observational data?

    -Counterfactual causal inference allows statisticians to make adjustments to observational data, getting closer to causal statements by considering what would have happened in an alternate reality where the treatment was not applied.

  • What is the bootstrap method and why is it significant?

    -The bootstrap is an algorithm for estimating the sampling distribution of a statistic by resampling with replacement from the original dataset. It is significant because it simplifies the creation of confidence intervals, applies to many kinds of statistics, and highlights the importance of computation in statistics.

  • What is the role of simulations in statistics?

    -Simulations allow statisticians to assess experiments and new statistical models without actually conducting them, saving resources and time. They can be used to evaluate the power and type I error of experimental designs, such as clinical trials.

  • Why is increasing the number of parameters in a statistical model beneficial?

    -Increasing the number of parameters in a model provides more flexibility, allowing it to better represent complex real-world scenarios. This can lead to more accurate predictions and a better understanding of the data.

  • What is regularization in the context of statistical models?

    -Regularization is a technique used to prevent overfitting in extremely flexible models by enforcing simplicity. It helps balance the complexity of the model, ensuring it does not just approximate the data but represents a more general phenomenon.

  • How do multi-level models differ from simpler statistical models?

    -Multi-level models, also known as hierarchical or mixed effect models, assume additional structure over the parameters, allowing for the aggregation of data from different levels or groups. This structure is useful for combining information from various sources and can be applied in both frequentist and Bayesian frameworks.

  • What is the expectation maximization (EM) algorithm and its significance?

    -The EM algorithm is a statistical algorithm used for estimation problems, particularly when the model contains latent variables or unobserved data. It allows for the estimation of parameters in complex models that cannot be solved directly.

  • What is robust inference and its importance in statistics?

    -Robust inference provides trustworthy statistical analyses even when assumptions are violated. It ensures that the results of statistical analyses, such as confidence intervals or estimated values, remain reliable even if the underlying assumptions are not perfectly met.

Outlines

00:00

📊 Introduction to Statistical Innovations

The video begins by discussing the importance of statistics in various aspects of life and introduces eight innovations that have shaped the field. The speaker, Christian, aims to make statistics accessible and highlights the authority of Andrew Gelman and Aki Vehtari, authors of a thought-provoking essay on the most important statistical ideas of the past 50 years. The script also touches on the limitations of observational data and introduces counterfactual causal inference, which moves observational analyses closer to causal statements.

05:00

πŸ” The Bootstrap Method and Computational Power

This paragraph delves into the bootstrap method, a technique for estimating the sampling distribution of a statistic using a single dataset. It emphasizes the significance of computation in statistics and how the rise of computational power has facilitated simulations and the development of complex statistical models. The script also mentions the importance of understanding statistical parameters and the role of overparameterization in allowing models to capture more complex realities, such as those represented by neural networks.
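
As a rough illustration of the simulation-based assessment mentioned here, the sketch below estimates the power of a two-sample t-test in base R; the effect size, per-arm sample size, and alpha level are illustrative assumptions, not values from the video.

```r
# Sketch: estimating the power of a two-sample t-test by simulation in base R.
# The effect size, per-arm sample size, and alpha are illustrative assumptions.
set.seed(1)
n_sims    <- 5000
n_per_arm <- 50
effect    <- 0.4   # assumed standardized treatment effect

rejections <- replicate(n_sims, {
  control   <- rnorm(n_per_arm, mean = 0,      sd = 1)
  treatment <- rnorm(n_per_arm, mean = effect, sd = 1)
  t.test(treatment, control)$p.value < 0.05
})

mean(rejections)   # estimated power; set effect to 0 to estimate the type I error instead
```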

10:01

📈 Multi-Level Models and Bayesian Approaches

The third paragraph discusses multi-level models, also known as hierarchical or mixed effect models, which are used to aggregate data from multiple sources and incorporate prior knowledge into the analysis. It highlights the flexibility of Bayesian methods, particularly in handling small sample sizes and the ability to choose different priors for various levels of the model. The paragraph also touches on the importance of computers and computational power in the advancement of statistical algorithms and the development of more complex models.

15:01

🔬 Algorithms and Robust Inference

This section introduces two key statistical algorithms: the Expectation-Maximization (EM) algorithm for estimation problems and the Metropolis algorithm for generating samples from complex probability distributions. It also discusses adaptive decision analysis, which allows for the modification of experiments based on interim data, and robust inference, which provides trustworthy analyses even when assumptions are violated. The paragraph concludes with a discussion on the importance of visualizing data through plots and the impact of new technologies on the evolution of statistics.

Keywords

💡Statistics

Statistics is a field of research that deals with the collection, analysis, interpretation, presentation, and organization of data. In the video, it's highlighted as an evolving discipline with influential ideas that have shaped its current practice, including the use of computational power and innovative methodologies.

💡Biostatistics

Biostatistics is the application of statistical methods to the biological sciences, particularly in the context of health and medicine. The video emphasizes the importance of biostatistics students being familiar with revolutionary ideas in the field, which have a significant impact on research and practice.

💡Counterfactual Causal Inference

This framework allows for the analysis of observational data to make causal statements, similar to those from controlled experiments. It involves comparing the observed outcome with a hypothetical alternative outcome that would have occurred under different conditions. In the video, this concept is crucial for understanding causal effects in real-world scenarios where controlled experiments are not possible.
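
A minimal way to formalize this, using standard potential-outcomes notation (the video only writes y₁ and y₀ informally; the notation here is a common convention, not spelled out in the source):

```latex
% Each unit i has two potential outcomes, but only one is ever observed
% (the fundamental problem of causal inference):
\tau_i = Y_i(1) - Y_i(0) \qquad \text{(individual causal effect)}

% Adjustment methods for observational data target population summaries such as
\mathrm{ATE} = \mathbb{E}\big[\, Y(1) - Y(0) \,\big]
```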

💡Bootstrap

The bootstrap is a resampling technique used to estimate the sampling distribution of a statistic without the need for multiple datasets or complex mathematical derivations. It involves repeatedly sampling with replacement from the original dataset to create new datasets and calculate the statistic of interest. The video highlights the bootstrap's simplicity and wide applicability in statistics.
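
A minimal sketch of the procedure in base R, assuming an arbitrary simulated dataset and the median as the statistic of interest (neither choice comes from the video):

```r
# Minimal nonparametric bootstrap in base R for the sampling distribution of the
# median. The dataset and the choice of statistic are arbitrary illustrations.
set.seed(42)
x <- rexp(100, rate = 1)    # the single observed dataset
B <- 2000                   # number of bootstrap datasets

boot_medians <- replicate(B, {
  resample <- sample(x, size = length(x), replace = TRUE)  # sample with replacement
  median(resample)                                         # statistic of interest
})

hist(boot_medians)                          # approximate sampling distribution
quantile(boot_medians, c(0.025, 0.975))     # percentile 95% confidence interval
```

The same loop works for essentially any statistic; only the line computing `median(resample)` changes.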

💡Overparameterization

Overparameterization refers to the practice of adding a large number of parameters to a statistical model to increase its flexibility. This can be seen in neural networks, where each connection is associated with a parameter. The video discusses the trade-off between model complexity and the ability to approximate a wide range of functions, and the use of regularization techniques to prevent overfitting.
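
A small base-R sketch of both points in this entry: many parameters give flexibility, and an L2 (ridge) penalty reins the fit back toward simplicity. The polynomial basis and penalty value stand in for the neural-network case and are purely illustrative assumptions.

```r
# Sketch in base R: a heavily parameterized model is flexible enough to chase noise,
# and a ridge penalty regularizes it. Data, degree, and lambda are illustrative.
set.seed(9)
n <- 50
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

X <- cbind(1, poly(x, degree = 20))   # 21 parameters for 50 observations

fit_coefs <- function(lambda) {
  # Ridge solution; lambda = 0 gives ordinary (unpenalized) least squares
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_unpenalized <- fit_coefs(0)
beta_ridge       <- fit_coefs(5)

# The penalized coefficients are much smaller in magnitude: the model keeps its
# many parameters but is pushed back toward a simpler, smoother function.
c(unpenalized = sum(beta_unpenalized^2), ridge = sum(beta_ridge^2))
```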

💡Multi-level Models

Also known as hierarchical or mixed effect models, these statistical models account for additional structure over the parameters, allowing for the aggregation of data from different levels or groups. The video explains how multi-level models can be used to combine information from various sources and the importance of choosing appropriate priors for model parameters.
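
A hedged sketch of fitting such a model, assuming the lme4 package (an external package not named in the video) and simulated data in which each subject j has their own treatment effect theta_j:

```r
# Sketch of a multi-level (mixed effect) model with lme4 (assumed installed).
# Subject-specific treatment effects are partially pooled toward a population effect.
library(lme4)

set.seed(3)
n_subj <- 20; n_obs <- 10
subject   <- factor(rep(1:n_subj, each = n_obs))
treatment <- rep(c(0, 1), times = n_subj * n_obs / 2)

subj_intercept <- rnorm(n_subj, 0, 1.0)   # subject-level baselines
subj_effect    <- rnorm(n_subj, 1, 0.5)   # theta_j: subject-specific treatment effects
y <- 2 + subj_intercept[subject] + subj_effect[subject] * treatment +
     rnorm(n_subj * n_obs)

# Random intercept and random treatment slope for each subject
fit <- lmer(y ~ treatment + (treatment | subject))
summary(fit)   # the fixed effect for treatment estimates the population-level effect
```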

💡Expectation Maximization (EM) Algorithm

The EM algorithm is a statistical method used for estimating parameters in models where the likelihood function is difficult to maximize directly. It involves iteratively performing an expectation (E) step and a maximization (M) step. The video mentions the EM algorithm as an example of how computational power has enabled the development of complex statistical models and algorithms.
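
A minimal base-R sketch of EM for a two-component Gaussian mixture with latent class labels; a shared, known unit variance keeps the updates short, and the data and starting values are illustrative:

```r
# EM for a two-component Gaussian mixture with unknown means and mixing weight,
# shared known variance of 1. The true labels are "lost", as in the latent-class setting.
set.seed(11)
x <- c(rnorm(150, mean = -2), rnorm(150, mean = 3))

mu <- c(-1, 1); pi1 <- 0.5                 # crude starting values
for (iter in 1:100) {
  # E-step: posterior probability that each point belongs to component 1
  d1 <- pi1 * dnorm(x, mu[1], 1)
  d2 <- (1 - pi1) * dnorm(x, mu[2], 1)
  gamma <- d1 / (d1 + d2)

  # M-step: update the mixing weight and the component means
  pi1 <- mean(gamma)
  mu  <- c(sum(gamma * x) / sum(gamma),
           sum((1 - gamma) * x) / sum(1 - gamma))
}
round(c(pi1 = pi1, mu1 = mu[1], mu2 = mu[2]), 2)   # should land near 0.5, -2, 3
```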

💡Metropolis Algorithm

The Metropolis algorithm, and its descendants, are used to generate samples from complex probability distributions, even when the posterior distribution is not easily described by a formula. This algorithm is significant for Bayesian statistics, as it allows for the simulation of data from distributions that are otherwise intractable. The video discusses the importance of such algorithms in statistical practice.
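
A minimal random-walk Metropolis sampler in base R; the target density, proposal scale, and chain length are illustrative assumptions, chosen only to show the accept/reject mechanics:

```r
# Random-walk Metropolis targeting a density known only up to a constant
# (here an unnormalized two-component mixture, standing in for an ugly posterior).
set.seed(13)
log_target <- function(theta) log(0.3 * dnorm(theta, -2, 0.8) +
                                  0.7 * dnorm(theta,  2, 1.2))

n_iter <- 10000
draws <- numeric(n_iter)
current <- 0
for (i in 1:n_iter) {
  proposal <- current + rnorm(1, sd = 1)                     # symmetric proposal
  log_accept <- log_target(proposal) - log_target(current)   # Metropolis ratio
  if (log(runif(1)) < log_accept) current <- proposal        # accept or stay put
  draws[i] <- current
}

# Summaries recovered from the samples, with no closed-form expression needed
c(mean = mean(draws), quantile(draws, c(0.025, 0.975)))
```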

💡Adaptive Decision Analysis

This concept involves modifying an experiment based on interim data, allowing for early stopping of a trial if certain conditions are met, such as lack of efficacy or promising results. The video explains how adaptive decision analysis can improve the design and outcomes of clinical trials, making them more efficient and responsive to preliminary findings.
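
A toy simulation of one pre-planned interim futility look in base R; the stopping threshold and final alpha are illustrative choices, not a validated group-sequential design:

```r
# Sketch of an adaptive design: stop early for futility at a single interim look
# if there is essentially no signal, otherwise run to completion.
set.seed(17)
simulate_trial <- function(effect, n_total = 200, n_interim = 100) {
  control   <- rnorm(n_total, 0, 1)
  treatment <- rnorm(n_total, effect, 1)

  # Interim look: stop for futility if the one-sided p-value is large
  p_interim <- t.test(treatment[1:n_interim], control[1:n_interim],
                      alternative = "greater")$p.value
  if (p_interim > 0.5) return(c(stopped_early = 1, success = 0))

  # Otherwise test at the final analysis
  p_final <- t.test(treatment, control, alternative = "greater")$p.value
  c(stopped_early = 0, success = as.numeric(p_final < 0.025))
}

# Under no treatment effect, roughly half the trials stop early, saving resources
rowMeans(replicate(2000, simulate_trial(effect = 0)))
```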

💡Robust Inference

Robust inference provides trustworthy statistical analyses even when some assumptions are violated. It allows for the use of statistical methods that are less sensitive to outliers or deviations from expected distributions. The video mentions the sample median as a robust estimator compared to the mean, which is more influenced by outliers.
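
A quick base-R illustration of that last point, using simulated data contaminated with a few gross outliers:

```r
# The median barely moves when extreme outliers contaminate the sample,
# while the mean is dragged toward them.
set.seed(5)
clean <- rnorm(100, mean = 10, sd = 2)
contaminated <- c(clean, 95, 110, 120)   # three gross outliers

c(mean_clean   = mean(clean),        median_clean = median(clean),
  mean_dirty   = mean(contaminated), median_dirty = median(contaminated))
```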

💡Propensity Score Matching

Propensity score matching is a technique used to create comparable groups in observational studies, aiming to mimic the conditions of a randomized controlled trial. It involves estimating a score that predicts the likelihood of receiving a treatment and using this score to match individuals in treatment and control groups. The video discusses the importance of correct model specification for the technique to be effective and the existence of robust versions that can handle model misspecification.
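
A rough base-R sketch of the idea: fit a logistic-regression propensity model, then match each treated unit to the nearest control on that score by hand. Real analyses typically use dedicated packages (e.g. MatchIt), and all simulated values here are illustrative assumptions.

```r
# Hand-rolled propensity score matching on simulated, confounded data.
set.seed(21)
n <- 1000
age <- rnorm(n, 50, 10); severity <- rnorm(n)
p_treat <- plogis(-0.05 * (age - 50) + 0.8 * severity)   # confounded assignment
treated <- rbinom(n, 1, p_treat)
outcome <- 2 * treated + 0.1 * age + 1.5 * severity + rnorm(n)

# Model 1: the propensity score (probability of treatment given covariates)
ps <- fitted(glm(treated ~ age + severity, family = binomial))

# Nearest-neighbor matching on the propensity score, with replacement
treated_idx <- which(treated == 1); control_idx <- which(treated == 0)
matches <- sapply(treated_idx, function(i) {
  control_idx[which.min(abs(ps[control_idx] - ps[i]))]
})

# Matched difference in means as the effect estimate; it should land reasonably
# close to the true effect of 2 used in this simulation.
mean(outcome[treated_idx] - outcome[matches])
```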

Highlights

Statistics is a field of research with influential ideas that have changed its trajectory.

Christian aims to make statistics accessible for practical application in daily life.

Andrew Gelman and Aki Vehtari published an essay on the most important statistical ideas of the past 50 years.

Gelman and Vehtari are authorities in Bayesian statistics, known for their work on Bayesian data analysis.

The essay discusses statistical innovations from 1970 to 2021, focusing on modern statistics.

Counterfactual causal inference allows making causal statements from observational data.

The bootstrap method is a general algorithm for estimating the sampling distribution of a statistic using a single dataset.

Simulation-based inference uses computational power to assess experiments and statistical models without actual data collection.

Overparameterized models and neural networks provide extreme flexibility in modeling complex phenomena.

Regularization techniques help balance complexity in extremely flexible models.

Multi-level models, also known as hierarchical or mixed effect models, are used to aggregate data from multiple sources.

The expectation-maximization (EM) algorithm is used for estimating parameters in complex models with latent classes.

The Metropolis algorithm and its descendants enable generation of samples from complex probability distributions.

Adaptive decision analysis allows for modifying experiments based on interim data collection.

Robust inference provides trustworthy statistical analyses even when assumptions are violated.

Propensity score matching is a technique used to estimate causal effects by matching similar individuals in treatment and control groups.

Plots and visuals are essential tools for examining data and assessing statistical models.

The tidyverse framework, popularized by Hadley Wickham, simplifies data manipulation and visualization in R.
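
A tiny example of that workflow (dplyr plus ggplot2) using the built-in mtcars data, purely for illustration:

```r
# Wrangle and plot in a few lines with the tidyverse.
library(dplyr)
library(ggplot2)

mtcars |>
  mutate(cyl = factor(cyl)) |>
  ggplot(aes(x = wt, y = mpg, colour = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "Exploratory plot with ggplot2")
```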

Transcripts

00:00

Most people only ever interact with statistics for a limited part of their lives, but statistics is a field of research. Like other areas, statistics has evolved; influential ideas have come and changed its trajectory. As a student of biostatistics, it's my responsibility to be familiar with these revolutionary ideas in the field. In this video we'll talk about eight innovations in statistics that have shaped how we know it today, and I'll do my best to explain what these innovations are and why they were so impactful in a way that makes sense to a general audience. If you're new to the channel, welcome. My name is Christian, and my goal is to make statistics accessible to more people so that they can apply it to their daily lives.

In 2021, Andrew Gelman and Aki Vehtari published an article in the Journal of the American Statistical Association, or JASA. JASA is one of the most prestigious journals in the field of statistics, so publishing here is a big deal. But instead of a research manuscript, Gelman and Vehtari published an essay titled "What are the most important statistical ideas of the past 50 years?", and this article is what motivates this video. But two statisticians do not make up an entire field of statistics, so what gives these two authors the authority to answer such a question? The essay was meant to be thought-provoking, not authoritative, though I would argue that both Andrew Gelman and Aki Vehtari are in fact authorities in the field. They are widely known among practitioners of Bayesian statistics, since they basically wrote the Bible on it: Bayesian Data Analysis. As of the writing of this video, Andrew Gelman is a professor at Columbia University in both statistics and political science, and Aki Vehtari is a professor of computational probabilistic modeling at Aalto University in Finland. Andrew Gelman also maintains a fantastic blog on statistics, political science, and their intersection, which I highly recommend. The article considers statistical innovations that happened from around 1970 to 2021, so this is the time period I'll call modern statistics. Without further ado, let's have a look at the list.

02:02

In an ideal world, all data comes from experiments, where a researcher can control who receives an intervention and who doesn't. When we can do this in a carefully controlled manner, such as in an RCT, we can claim cause and effect between an intervention and some outcome of interest. But we live in the real world, and the real world sometimes gives us observational data, where we can't control who receives a treatment and who doesn't. We can still perform statistical analyses on observational data, but we cannot make the same causal claims about it, only correlational claims. That was until counterfactual causal inference came onto the scene. This framework allows us to take observational data and make adjustments in a way that gets us closer to causal statements. How this works is the topic of an entire other video, so I'll give you the basic breakdown. Let's consider a world where I have an upcoming test; I can choose to study a bit more, or I can choose not to. In this reality I choose to study, and I get some score on the test later, which I'll denote y sub one. If a supernatural statistician wanted to know whether this decision caused the change in my score, they would have to examine another reality: the reality where I didn't choose to study, and measure the test score of that version of me who didn't study. I'll call that outcome y sub zero. The only difference between these two versions of me is that I chose to study in one but not in the other. This unobserved version of myself is called the counterfactual, because this version of me is counter to what actually, or factually, happened. The causal effect of my studying on my test score is then the difference between y sub one and y sub zero. The fundamental problem of causal inference is that we can only ever observe one reality, and therefore one outcome; in essence, it's a missing data problem. The counterfactual framework is important because it gave statisticians a way to formalize causal effects in mathematical models. This is significant because several fields of study are prone to having more observational data, such as economics and psychology.

04:07

If you've been with my channel for a while, you may be familiar with this one already. That video delves into more technical detail, but I'll briefly explain what it is here. The bootstrap is a general algorithm for estimating the sampling distribution of a statistic. Ordinarily this would require gathering multiple data sets, which no one has time for, or a mathematical derivation, which I don't have time for. Rather than do either of these, the bootstrap takes the interesting approach of reusing data from a single data set. The bootstrap generates several bootstrap data sets by sampling with replacement from the original. For each of these bootstrap data sets, a statistic of interest is calculated, and its distribution can be derived from this entire collection. This is incredibly valuable not only because it's super simple, and therefore easy for more people to use, but because it's applicable to many kinds of statistics. We can use the bootstrap to create confidence intervals for point parameters, like a regression coefficient, or we could create confidence bands for coefficient functions, like we might see in functional data analysis. The bootstrap is significant not only because of its usefulness but because it highlights the significance of computation in statistics. A quote from one of my heroes is very relevant here: "You see, killbots have a preset kill limit. Knowing their weakness, I sent wave after wave of my own men at them until they reached their limit and shut down." Instead of human life, statisticians can do a lot just by using wave after wave of our own computers' processing power. The rise of computational power has made it easier to perform simulations, and simulated data allows us to assess experiments and new statistical models. For example, simulations can be used to assess the power and type I error of experimental designs for clinical trials without actually needing to run them, which means a lot of money and effort is saved for pharmaceutical companies. Another example of simulation-based inference comes from Bayesian statistics. Bayesians encode knowledge in the form of prior probability distributions on parameters. Using these priors, we can actually simulate data from a prior distribution and check whether the resulting data actually makes sense; this is called a prior predictive check. The same can be done for the posterior distribution of a parameter, which makes it a posterior predictive check, and these are incredibly useful for validating our models.

06:31

To understand this idea, we need some context on statistical parameters. One way to view parameters is as representations of ideas that are important to us within statistical models. In a two-sample t-test, the mean parameter represents the difference between two groups, such as a placebo and a treatment group. In linear regression, we're interested in the coefficient associated with treatment, which represents the associated change in the outcome that the treatment has. Statistical models are approximations of the real world, but we can actually change our models to match the real world a little better. One way we can do this is by increasing the number of parameters in the model. Consider simple linear regression: it tells you that the distribution of an outcome shifts according to this coefficient. But what if we expect this change to vary over time? In the current model there's no parameter for time, so the model simply can't capture this complexity. We can move up a level by incorporating more parameters into the model, adding a coefficient for both time and the interaction between time and treatment. What if we suspect that each individual in the study will react differently to the treatment? The current model tells us that this single parameter explains the change for the population on average. To give everyone their own subject-specific effect, we can make the model even more complex and turn it into a mixed effect model. More parameters, more flexibility. Overparameterized models take this idea to the extreme: make the model extremely flexible by adding tons and tons of parameters. Neural networks are a prime example of this. Each edge in a neural network is associated with a parameter, or weight, along with some extra bias parameters. We can easily overparameterize by making these networks very large, and by doing so, the universal approximation theorem tells us that these networks can approximate a wide variety of functions. This extra flexibility is important because it lets us model a wider range of phenomena that simpler models just can't handle. One problem with extremely flexible models is that they may start to approximate the data itself rather than representing a more general phenomenon we can learn from. Statisticians employ regularization techniques, which help to balance out this complexity by enforcing that these models maintain some degree of simplicity.

08:45

Multi-level models, also known as hierarchical or mixed effect models, are models that assume additional structure over the parameters. For example, multi-level models are commonly used to aggregate several n-of-1 trials together. Each individual is associated with their own treatment effect, which we'll denote theta-j to indicate that each individual has their own effect; these individuals form the second level of the model. The first level can be thought of as describing the distribution, or structure, of these individual effects. In an n-of-1 context, the first level might be a normal distribution centered at some population treatment effect theta with some variance sigma squared. In different contexts, the units of the second level of the model could be different things. In a study taking place over many locations, these may be different hospitals or cities, something to indicate a cluster of related units. In a basket trial, each second-level unit is a specific disease, and we suspect that their treatment effects will be similar because they share a common mutation. In meta-analyses, the second-level units could be estimated effects from individual research studies. Andrew Gelman says that he used the multi-level model as a way to combine different sources of information into a single analysis. This kind of structure is incredibly common in statistics, and that's why multi-level models take a spot on the list. Multi-level models can be both frequentist and Bayesian, so why is Bayesian specifically mentioned in the article? My guess is that the Bayesian framework allows us to incorporate prior knowledge into the models. This is especially helpful when deciding on priors for the first-level parameters, especially on the variance. If you choose a wide, uninformative prior, it encourages the resulting model to treat second-level units as being independent of each other. On the other hand, choosing a narrow, informed prior allows us to pool data together, which can help us estimate treatment effects for second-level units with small sample sizes. Being able to choose different priors gives statisticians much more flexibility in the modeling process.

10:48

A recurring theme among the top eight ideas is the importance of computers and computational power to the development of statistics. Advances in technology have allowed more complex models to be invented for harder problems, and to account for this, several important statistical algorithms have been invented to help solve them. An algorithm is just a set of steps that can be followed, so a statistical algorithm is an algorithm designed to help with some statistical problem. There are so many types of statistical problems out there that it's hard to get an appreciation for how useful these algorithms are, so I'll explain two to give you a taste. The expectation-maximization algorithm, or EM algorithm, is famously known from the 1977 paper in the Journal of the Royal Statistical Society, another heavy-hitting journal in statistics. The EM algorithm solves an estimation problem, which is where we need to use data to compute educated guesses about the values of parameters in a model; maximum likelihood estimation is another example of an estimation approach. What makes the EM algorithm distinct is that it tries to estimate the parameters in a model that we can't solve directly. One instance where this can happen is in the case of mixture models with so-called latent classes. In this type of model we have data that may come from one of several groups, but we don't have the group labels to tell us who belongs where. Without delving into the details, the EM algorithm gives us a way to still estimate the parameters in this model despite not knowing these classes. The second example is the Metropolis algorithm and its more modern descendants. The Metropolis algorithm is interesting because its roots actually stem from physics as opposed to statistics. It is significant because it lets us generate samples from very complex probability distributions. Random number generation according to some distribution may seem weird, but it's important for statisticians to be able to do it. The posterior distribution that comes from Bayes' rule can turn ugly if we turn away from conveniences like conjugate families; the posterior can be so ugly that we can't even derive an equation for it. But despite this, we can still generate samples from a complicated posterior thanks to the Metropolis algorithm. Even if we don't have a formula for the posterior distribution, we can still use the generated samples to recover important quantities about the distribution, such as the mean, the quantiles, and credible intervals. These two algorithms are just two examples mentioned in the article; there are many I couldn't cover, and still more that have been developed since the article was written.

13:19

When statisticians designed experiments, it used to be a set-and-forget type of thing: figure out the sample size and just run the experiment to completion. But midway through the experiment we might need to stop it, and under a frequentist framework this would hurt our power and our p-value interpretation. In modern times we have a way to account for this. Adaptive decision analysis is the idea that maybe we don't have to wait for the entire experiment to finish; instead, we can adapt our experiment based on data we collect in the interim, before it finishes. In the context of clinical trials, we may decide to stop a trial early if preliminary evidence suggests that the treatment sucks. Conversely, if a treatment shows early promise, we can even stop based on efficacy. These changes still have to be decided ahead of time to make sure that we make good decisions overall and that the trial is well designed.

14:12

Statisticians have to make a lot of assumptions. If these assumptions are right, or at least plausible, then we can feel comfortable trusting the results of statistical analyses, stuff like confidence intervals or estimated values. But of course assumptions won't always be right, and it's often hard to even know whether they actually are or not. That's where robust inference comes in: robust statistics still provides trustworthy statistical analyses even in the face of violated assumptions. If we have a robust model, then we don't have to be so reliant on possibly shaky assumptions. The sample median is often cited as a robust estimator of a typical value in a distribution compared to the mean. We often hear that the mean is unduly influenced by outliers in a data set, and this is true, but what assumption do outliers violate? Many times we assume a distribution to be normal, and normal distributions have the property that most of their probability is concentrated near the mean; you often hear this phrased as the 68-95-99.7 rule. Outliers challenge this concentration. If there can be many outliers, it poses a danger that the data may come from a so-called heavy-tailed distribution, where extreme events are more likely, and this would violate the normal distribution assumption. In causal inference there's a technique called propensity score matching. Propensity score matching is used to try to match people in a treatment group to people in a control group who are very similar to them; by doing this, you can produce estimates that better resemble a causal effect. Propensity score matching requires two models: one to estimate the effect of the treatment on the outcome, and another to produce the score that is used to match people together. Both of these models have to be correctly specified for the results to be useful. Correct specification essentially means that we choose the correct model for its purpose, but this is almost never the case. To account for this, there are robust versions of propensity score matching that allow one of these models to be wrong. The fewer assumptions we have to make, the better, and we have to make sure our models can actually account for this.

16:15

Yes, you read that right: we're done with the theory, we're done with the computation, and we're going back to plots and visuals. Plots give us a way to examine our data and assess our statistical models. It's just easier to learn from your data if you can look at it rather than just have it in a CSV, and it's undeniable that this skill is an important part of any statistician's or data scientist's toolkit. There's even an entire paradigm of R programming dedicated to formalizing exploratory data analysis: there are people who code in boring base R, and then there are people who code using the tidyverse framework, popularized by the god Hadley Wickham. The tidyverse set of packages makes it extremely easy to get your data into R, clean it, and visualize it. I highly recommend learning it, and I hope to have a more in-depth video on it in the future.

17:02

What does it mean for an idea to be important? At first I thought that a statistical idea would be important if the paper that introduced it was cited many times, but this was not the case; the authors specifically mention avoiding citation counts. Rather, they view important ideas as those that have influenced the development of ideas that, in turn, have influenced statistical practice. I highly recommend reading the original article. It's free to read online on Andrew Gelman's blog; you can just Google "most important ideas in statistics" and look for his name. This video only covers part of the article; it's full of citations, so readers can pick it up and read more about any particular bullet point they're interested in. Other articles have even performed actual statistical analyses to answer this question. If you think the authors missed a cool idea, tell me about it in the comments. I hope that I've shown you that statistics didn't stop with the two-sample t-test and linear regression; new technologies create new types of data, so statistics needs to innovate to keep up. If you think I've earned it, please like the video and subscribe to the channel for more. I've also started a newsletter to accompany the YouTube channel so that people can get my videos delivered straight to their inbox. I'll see you all on the next one.


Related Tags
Statistical Innovations, Causal Inference, Bootstrap Method, Computational Statistics, Overparameterization, Neural Networks, Regularization, Multi-Level Models, Adaptive Decision Analysis, Robust Statistics, Data Visualization, Exploratory Data Analysis, Statistical Algorithms, EM Algorithm, Metropolis Algorithm, Propensity Score Matching, Statistical Practice, Andrew Gelman, Aki Vehtari, Modern Statistics, Statistical Education, Statistical Research, Statistical Methods, Statistical Software, Data Science