MINI LECTURE 13 - Claims that Violence Has Dropped Are Not Statistical
Summary
TL;DR: The script discusses common statistical errors in analyzing data from 'Extremistan', where large outliers dominate. It critiques Steven Pinker's claim that violence has declined, arguing that the available data are insufficient to support it. The speaker emphasizes the need for large sample sizes in extreme environments and the importance of robust historical data. The speaker also highlights the challenges of quantitative historiography, such as inflated estimates and the revision of historical events, and proposes a method of synthetic histories to address these issues.
Takeaways
- 📊 The script discusses the challenges of using statistical methods to analyze data from 'Extremistan', where traditional statistical techniques may not apply due to the presence of extreme outliers.
- 📚 It critiques Steven Pinker's work, arguing that his claims about the decline of violence are not supported by sufficient data.
- 🔍 The importance of distinguishing between 'Mediocristan' and 'Extremistan' is highlighted, where 'Mediocristan' refers to distributions with thin tails that quickly converge to a Gaussian distribution, while 'Extremistan' requires much larger sample sizes for statistical claims.
- 📉 The concept of 'fat tails' in statistical distributions is explained, indicating that events with large impacts are much more common than what traditional statistical models predict.
- ⏳ The mean inter-arrival time for events causing extreme violence (like those affecting 20 million or 50 million people) is discussed, emphasizing the long periods that may elapse between such events.
- 🚫 It is argued that one cannot claim a decrease in violence based on current data, as the nature of violence is extremely fat-tailed, and the sample size of historical data is insufficient.
- 🗓️ The script touches on the unreliability of historical records and the inflation of numbers over time, which can distort the analysis of past violent events.
- 🔬 A methodological approach using synthetic histories and bootstrapping is introduced to account for variability in historical data and to ensure robust statistical findings.
- 📈 The script mentions the technical problem of working with data that does not strictly follow a power law due to bounded support and introduces a log transformation to address this issue.
- 🌐 The findings from the analysis of violence are compared to other areas such as pandemics and financial markets, suggesting that pandemics may have even fatter tails than wars.
- 📝 The speaker emphasizes the need for careful definition of events and robustness checks in historiography to ensure that statistical claims are valid and not influenced by potential inaccuracies in historical data.
Q & A
What is the main issue discussed in the script regarding statistical claims about violence?
-The script discusses the issue of making statistical claims about a decrease in violence without sufficient data. It highlights that in 'Extremistan', where the largest events are rare and dominate the totals, the law of large numbers operates too slowly for a short observation window to establish a statistically significant trend.
What is the critique of Steven Pinker's approach in his book regarding the decline of violence?
-The critique is that Pinker did not establish a solid statistical basis for his claim that violence has declined. The script suggests that his theories rest on a limited number of data points and may not be scientifically robust, given what statistical analysis in 'Extremistan' requires.
What are the concepts of 'Mediocristan' and 'Extremistan' as mentioned in the script?
-In the script, 'Mediocristan' refers to a class of distributions that are thin-tailed and quickly converge to the Gaussian basin, where the law of large numbers works effectively. 'Extremistan', on the other hand, refers to distributions with fat tails, where rare, extreme observations dominate, so statistical claims require a much larger sample size.
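The difference in convergence speed can be illustrated with a small simulation. This is only a sketch under stated assumptions: a standard Gaussian stands in for 'Mediocristan' and a Pareto distribution with tail exponent 1.2 stands in for 'Extremistan'. Neither the exponent nor the sample sizes come from the script. In theory, a 100-fold larger sample tightens the Gaussian sample mean by roughly a factor of 10, but the Pareto sample mean by only about a factor of 2.

```python
import random
import statistics

def spread_of_sample_mean(draw, n, trials, rng):
    """Median absolute deviation of the sample mean across `trials` samples of size n."""
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]
    centre = statistics.median(means)
    return statistics.median(abs(m - centre) for m in means)

def thin_tailed(rng):
    # Gaussian stand-in for Mediocristan (assumption for illustration)
    return rng.gauss(0.0, 1.0)

def fat_tailed(rng):
    # Pareto stand-in for Extremistan; the exponent 1.2 is an assumed value
    return rng.paretovariate(1.2)

rng = random.Random(13)  # fixed seed so runs are repeatable
N_SMALL, N_LARGE, TRIALS = 100, 10_000, 100

# How much does the sample mean tighten when the sample grows 100-fold?
gauss_ratio = (spread_of_sample_mean(thin_tailed, N_SMALL, TRIALS, rng)
               / spread_of_sample_mean(thin_tailed, N_LARGE, TRIALS, rng))
pareto_ratio = (spread_of_sample_mean(fat_tailed, N_SMALL, TRIALS, rng)
                / spread_of_sample_mean(fat_tailed, N_LARGE, TRIALS, rng))

print(f"Gaussian: spread shrinks by a factor of about {gauss_ratio:.1f}")
print(f"Pareto:   spread shrinks by a factor of about {pareto_ratio:.1f}")
```

The median absolute deviation is used as the spread measure because the variance of the Pareto sample mean is infinite at this exponent.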
What does the script suggest about the reliability of historical data on violence?
-The script suggests that historical data on violence can be unreliable due to factors such as inflated numbers for political reasons, miscalculations based on tax records, and the difficulty in accurately estimating the magnitude of historical events like wars and rebellions.
How does the script address the problem of defining 'events' in the context of violence?
-The script addresses the problem by defining events in terms of 'exceedance', such as events that kill more than a certain number of people relative to today's population. It emphasizes the importance of being careful with the definition of events to avoid inaccuracies in statistical analysis.
What is the significance of the mean inter-arrival time mentioned in the script?
-The mean inter-arrival time is significant as it provides an average measure of how long it takes for a large-scale violent event to occur. The script uses this measure to argue that a much larger sample size, such as a century or more, is needed to make a statistical claim about a decrease in such events.
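A minimal sketch of the arithmetic behind this point, assuming memoryless (exponential) waiting times between events. The event years below are HYPOTHETICAL placeholders, not historical data, and the 70-year window is an illustrative choice. The point is only that when the mean gap is longer than the observation window, a quiet window is unsurprising.

```python
import math
import statistics

# HYPOTHETICAL years of events exceeding some casualty threshold (not real data)
event_years = [1200, 1350, 1400, 1620, 1850, 1940]

# Gaps between consecutive events and their average
gaps = [later - earlier for earlier, later in zip(event_years, event_years[1:])]
mean_gap = statistics.fmean(gaps)

def prob_quiet(window_years, mean_interarrival):
    """P(no event in the window) if waiting times are exponential (assumption)."""
    return math.exp(-window_years / mean_interarrival)

p_quiet_70 = prob_quiet(70, mean_gap)
print(f"mean inter-arrival time: {mean_gap:.0f} years")
print(f"P(no event in 70 years): {p_quiet_70:.2f}")
```

With these placeholder numbers, a 70-year stretch without a large event happens more often than not even though nothing about the underlying process has changed.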
How does the script discuss the issue of autocorrelation in the context of violent events?
-The script mentions that the autocorrelation function of violent events is mostly noise, indicating that the occurrence of one event does not predict the occurrence of another. This suggests that violent events are memoryless and do not follow a predictable pattern.
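As a rough illustration of what "mostly noise" looks like, the sketch below computes the sample autocorrelation of simulated waiting times. The series is synthetic: independent exponential gaps with an assumed mean of 100 years, not data from the script. For an independent series of length n, the values at nonzero lags should hover within roughly 2/sqrt(n) of zero.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum((series[i] - mean) * (series[i + lag] - mean)
                         for i in range(n - lag))
    return covariance_sum / variance_sum

rng = random.Random(13)  # fixed seed so runs are repeatable
# Synthetic, independent waiting times; the mean of 100 years is an assumption
gaps = [rng.expovariate(1 / 100) for _ in range(2000)]

acf = [autocorrelation(gaps, lag) for lag in range(6)]
for lag, value in enumerate(acf):
    print(f"lag {lag}: {value:+.3f}")
```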
What is the method used in the script to account for variations in historical estimates of violent events?
-The script describes a method of creating synthetic historical paths by using both the lower and higher estimates for the number of casualties in violent events, as well as randomizing between these estimates. This approach is used to test the robustness of statistical findings across different scenarios.
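The idea can be sketched as follows. Everything specific here is an assumption made for illustration: the (low, high) pairs are HYPOTHETICAL, the draw between them is uniform, and the statistic (share of total casualties due to the largest event) is a stand-in for whatever tail measure is actually being tested. A finding counts as robust if it barely moves across the all-low, all-high, and randomized histories.

```python
import random

# HYPOTHETICAL (low, high) casualty estimates in millions; not real data
EVENTS = [(0.5, 1.5), (2.0, 8.0), (10.0, 40.0), (0.1, 0.3), (3.0, 5.0)]

def synthetic_history(events, rng):
    """One alternative history: each event gets a value between its two estimates."""
    return [rng.uniform(low, high) for low, high in events]

def largest_event_share(history):
    """Illustrative statistic: fraction of all casualties due to the largest event."""
    return max(history) / sum(history)

def robustness_band(events, n_paths, rng):
    """5th and 95th percentiles of the statistic across randomized histories."""
    values = sorted(largest_event_share(synthetic_history(events, rng))
                    for _ in range(n_paths))
    return values[int(0.05 * n_paths)], values[int(0.95 * n_paths)]

rng = random.Random(13)  # fixed seed so runs are repeatable
share_all_low = largest_event_share([low for low, _ in EVENTS])
share_all_high = largest_event_share([high for _, high in EVENTS])
band_low, band_high = robustness_band(EVENTS, 2000, rng)

print(f"all-low estimates:  {share_all_low:.3f}")
print(f"all-high estimates: {share_all_high:.3f}")
print(f"randomized 5%-95%:  {band_low:.3f} to {band_high:.3f}")
```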
What is the technical problem mentioned in the script regarding the use of power laws in analyzing violence?
-The technical problem is that power laws require a random variable to have support between a number and infinity, but in the case of violence, there are both lower and upper bounds. The script mentions a solution involving log transformation to work with a non-power law that has power law properties under transformation.
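The summary does not state the transformation itself, so the sketch below should be read as an assumption: it uses one smooth logarithmic map, phi(y) = L - H * log((H - y) / (H - L)), that sends a variable bounded in [L, H) to [L, infinity) while leaving values far below H almost unchanged. The bounds `L` and `H` are hypothetical placeholders (a minimum event size and a population-sized ceiling).

```python
import math

L = 10_000.0         # HYPOTHETICAL lower bound on event size
H = 7_200_000_000.0  # HYPOTHETICAL upper bound (order of world population)

def phi(y):
    """Map y in [L, H) to [L, infinity); log1p keeps precision when y << H."""
    return L - H * math.log1p(-(y - L) / (H - L))

def phi_inverse(z):
    """Map z in [L, infinity) back to the bounded scale [L, H)."""
    return L - (H - L) * math.expm1((L - z) / H)

for y in (L, 1e6, 1e9, 7e9):
    z = phi(y)
    print(f"y = {y:.3e} -> phi(y) = {z:.3e} -> back = {phi_inverse(z):.3e}")
```

Small events are nearly untouched while values approaching `H` are stretched toward infinity, so a power law can be fitted on the transformed scale and the results mapped back with `phi_inverse`.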
How does the script compare the fat-tailed nature of pandemics to wars and financial derivatives?
-The script suggests that pandemics have an even fatter tail than wars, implying that they are rarer but potentially more extreme events. It also mentions that financial derivatives and the size of cities have fat tails but are less extreme than pandemics.