Shapiro Wilk in SPSS

Prof. Essa

28 Feb 2025 06:09

Summary

TLDR: This video explains the Shapiro-Wilk test, a statistical method used to determine if a sample comes from a normally distributed population. It covers setting up the test in SPSS, interpreting descriptive statistics, and examining Shapiro-Wilk results, emphasizing that small sample sizes (≤50) are ideal for this test. Through practical examples with height and purchase data, the video demonstrates how to identify normality, recognize outliers, and interpret histograms and QQ plots. Key insights include the sensitivity of the test to sample size and guidance on when to use the Kolmogorov-Smirnov test for larger datasets, offering viewers clear, actionable steps for assessing normality.

Takeaways

😀 The Shapiro-Wilk test is used to determine if a sample comes from a normal distribution.
😀 The null hypothesis for the Shapiro-Wilk test is that the data follows a normal distribution.
😀 The Shapiro-Wilk test is sensitive to sample size; with larger samples, even minor deviations from normality are more likely to produce statistically significant results.
😀 It is recommended to use the Shapiro-Wilk test for small sample sizes (n ≤ 50).
😀 If the sample size is large (n ≥ 250), consider using the Kolmogorov-Smirnov (KS) test instead of Shapiro-Wilk.
😀 To perform the test in SPSS, choose ‘Explore’ under Descriptive Statistics, then move the variables to the ‘Dependent List’.
😀 The ‘Options’ menu in SPSS allows you to control how missing values are treated, either by excluding cases listwise or pairwise.
😀 Visualizations such as histograms and QQ plots help confirm the results of the Shapiro-Wilk test by showing the distribution of the data.
😀 A small p-value (typically < 0.05) from the Shapiro-Wilk test is evidence that the data does not come from a normal distribution, leading to rejection of the null hypothesis.
😀 For a sample of heights, the Shapiro-Wilk test did not show significant evidence against normality, so the data is consistent with a normal distribution.
😀 For a sample of recent purchases, the Shapiro-Wilk test showed a tiny p-value, suggesting the data is not normally distributed, confirmed by skewed histograms and QQ plots.
😀 Be aware of extreme outliers in your data, as they can influence the results of normality tests and visualizations.
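
The decision rule in the takeaways above can be sketched outside SPSS as well. The following is a minimal Python illustration using `scipy.stats.shapiro`. It assumes SciPy and NumPy are available. The two samples are synthetic stand-ins for the heights and purchases variables, not the data from the video, and the names `shapiro_report` and `ALPHA` are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for the video's two variables (NOT the video's data).
heights = rng.normal(loc=170, scale=8, size=40)          # roughly symmetric
purchases = rng.lognormal(mean=3.0, sigma=1.0, size=40)  # strongly right-skewed

ALPHA = 0.05  # conventional significance level


def shapiro_report(name, values, alpha=ALPHA):
    """Run Shapiro-Wilk and print the decision about H0 (normality)."""
    w, p = stats.shapiro(values)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: W = {w:.3f}, p = {p:.4f} -> {decision}")
    return w, p


heights_w, heights_p = shapiro_report("heights", heights)
purchases_w, purchases_p = shapiro_report("purchases", purchases)
```

A W statistic close to 1 with a p-value above 0.05 means there is no significant evidence against normality. A lower W with a very small p-value, as expected for the skewed sample, leads to rejecting the null hypothesis.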

Q & A

What is the purpose of the Shapiro-Wilk test?
-The Shapiro-Wilk test is used to determine whether a random sample comes from a normally distributed population. It tests the null hypothesis that the data is normally distributed.
Why is the Shapiro-Wilk test recommended for small sample sizes?
-The test is sensitive to sample size. For small samples (≤50), it reliably detects non-normality. Large samples tend to produce statistically significant results even for minor deviations from normality.
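
The sample-size effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions: the mildly skewed gamma distribution, the seed, and the sample sizes 30 and 3000 are arbitrary choices for illustration, and `shapiro_p` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)


def shapiro_p(n):
    """Draw n values from a mildly right-skewed gamma distribution
    and return the Shapiro-Wilk p-value for that sample."""
    sample = rng.gamma(shape=20.0, scale=1.0, size=n)
    return stats.shapiro(sample).pvalue


p_small = shapiro_p(30)    # small sample: mild skew is hard to detect
p_large = shapiro_p(3000)  # large sample: the same mild skew is flagged

print(f"n = 30:   p = {p_small:.4f}")
print(f"n = 3000: p = {p_large:.2e}")
```

Both samples come from the same slightly non-normal population. Only the sample size changes, yet the large sample yields a far smaller p-value.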
What is the null hypothesis in the Shapiro-Wilk test?
-The null hypothesis states that the population from which the sample is drawn is normally distributed.
How should missing data be handled in the test setup?
-You can handle missing data in two ways: 'exclude cases listwise' excludes any case with missing values, while 'exclude cases pairwise' calculates statistics using all non-missing values for each variable individually.
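
The difference between the two options can be shown with a few hypothetical cases. This is a plain-Python sketch of the idea, not SPSS's implementation. The values and variable names are made up for illustration, and `None` marks a missing value.

```python
from statistics import mean

# Hypothetical cases; None marks a missing value.
cases = [
    {"height": 170.0, "purchases": 25.0},
    {"height": 165.0, "purchases": None},
    {"height": None, "purchases": 40.0},
    {"height": 180.0, "purchases": 31.0},
]
variables = ["height", "purchases"]

# Listwise: keep only cases that are complete on every variable.
complete = [c for c in cases if all(c[v] is not None for v in variables)]
listwise = {v: [c[v] for c in complete] for v in variables}

# Pairwise: for each variable, keep every non-missing value of that variable.
pairwise = {v: [c[v] for c in cases if c[v] is not None] for v in variables}

for v in variables:
    print(f"{v}: listwise n = {len(listwise[v])}, mean = {mean(listwise[v]):.2f}")
    print(f"{v}: pairwise n = {len(pairwise[v])}, mean = {mean(pairwise[v]):.2f}")
```

Listwise exclusion leaves every variable with the same two complete cases, while pairwise exclusion keeps three values per variable, so the two options can report different sample sizes and statistics.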
What do the significance values in the Shapiro-Wilk test indicate?
-A small p-value (typically <0.05) indicates that the data significantly deviates from normality, leading to rejection of the null hypothesis. A larger p-value means there is no significant evidence against normality; it does not prove that the data is normally distributed.
How can histograms and QQ plots help in evaluating normality?
-Histograms provide a visual sense of the data distribution, while QQ plots show how closely data points follow a normal distribution line. Together, they visually confirm the Shapiro-Wilk test results.
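
How closely the points follow the line in a QQ plot can also be summarized numerically. The sketch below uses `scipy.stats.probplot`, which returns the QQ plot coordinates together with a fitted line and the correlation `r` of the points. The two samples are synthetic, and `qq_correlation` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic samples for illustration only.
symmetric = rng.normal(size=200)    # drawn from a normal distribution
skewed = rng.exponential(size=200)  # strongly right-skewed


def qq_correlation(values):
    """Return the correlation between ordered sample values and
    theoretical normal quantiles (the points of a normal QQ plot)."""
    (theoretical, ordered), (slope, intercept, r) = stats.probplot(values, dist="norm")
    return r


r_symmetric = qq_correlation(symmetric)
r_skewed = qq_correlation(skewed)

print(f"normal sample: r = {r_symmetric:.4f}")
print(f"skewed sample: r = {r_skewed:.4f}")
```

A correlation very close to 1 corresponds to points hugging the reference line. To draw the plot itself, `probplot` also accepts a `plot` argument such as `matplotlib.pyplot`, assuming Matplotlib is installed.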
What can extreme variance and range in data suggest about normality?
-Extremely high variance or a large range, especially compared to similar datasets, may indicate skewed data or outliers, suggesting that the data may not be normally distributed.
Why might the Shapiro-Wilk test and KS test give different results?
-The Shapiro-Wilk test is more suitable for small samples, while the KS test is better for large samples (≥250). Differences in sensitivity to sample size and distribution shape can lead to different outcomes.
How should outliers be assessed when testing for normality?
-Outliers can distort normality tests. Checking extreme values in the data and visualizing distributions with histograms or box plots can help identify potential outliers that influence the test results.
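
One common way to flag candidate outliers is the box-plot rule based on the interquartile range (IQR). The sketch below is an assumption about how such a check might look, not the procedure shown in the video. The data values are made up, and the 1.5 × IQR multiplier is the usual box-plot convention.

```python
from statistics import quantiles

# Hypothetical purchase counts with one extreme value.
data = [12, 15, 14, 13, 16, 15, 14, 13, 15, 120]

# Quartiles of the sample (the "inclusive" method treats the data as the full range).
q1, _median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Box-plot fences: values beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```

Flagged values are candidates for a closer look rather than automatic removals, since a single extreme value can be enough to make a normality test reject.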
What did the example results for heights and purchases demonstrate?
-The heights dataset had a p-value above 0.05, which is consistent with approximate normality, while the purchases dataset had a tiny p-value, indicating significant deviation from normality due to skewness and extreme values.