Machine learning for daily realised volatility prediction - Alexandra Gkolia

UCL Financial Computing

23 Sept 2022 · 09:48

Summary

TLDR: This Master's project in Computational Finance uses machine learning to predict daily realized volatility, aiming to improve on traditional models like GARCH. The project explores LightGBM, LSTM, and Gradient Boosting models with data from 1,000 U.S. stocks over 10 years, including the 2008 financial crisis. Key findings include LightGBM achieving an accuracy of 88.99%, 5% better than the results of the study the project builds on. Feature importance analysis highlighted external factors like treasury bill rates and firm-specific data as crucial for prediction. The study concludes that machine learning provides more flexible and accurate predictions than conventional approaches, with potential for further improvement through feature optimization and hybrid models.

Takeaways

😀 The project focused on predicting daily realized volatility using machine learning, aiming to improve out-of-sample performance compared to traditional models like GARCH.
😀 The study used a 10-year dataset (2004–2013) of 1,000 U.S. stocks, which included daily and firm-specific features.
😀 Unlike previous studies that used monthly data, this project worked with daily data, which led to improved model performance.
😀 Key machine learning models used included Gradient Boosted Regression Trees (GBRT), Long Short-Term Memory (LSTM) networks, and LightGBM (a more efficient version of GBRT).
😀 The LightGBM model outperformed the others, achieving an accuracy of 88.99%, representing a 5% improvement over the previous study by Philippovich and colleagues (2021).
😀 External features, such as treasury bill rates and median market volatility, were found to be important predictors of volatility and improved model accuracy.
😀 It was more challenging to predict volatility in high-volatility stocks, and models trained on low-volatility stocks tended to perform better.
😀 Feature selection was an essential part of the process: highly correlated features were removed and only relevant predictors were kept.
😀 The project demonstrated that machine learning models do not require assumptions about data distribution, offering an advantage over traditional volatility models.
😀 The study identified areas for future improvement, including more aggressive feature selection, better hyperparameter optimization, and exploring hybrid and non-linear models like NARX for better accuracy.

Q & A

What was the main goal of your project?
-The main goal of the project was to predict daily realized volatility using machine learning methods. The focus was on improving prediction accuracy over traditional volatility models like GARCH, which often struggle with out-of-sample performance.
How is realized volatility defined in your project?
-Realized volatility in this project is defined as the square root of the sum of squared daily returns over a 21-day period. This measure captures how much an asset's price fluctuates over the specified time window.
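Written out, the definition is RV_t = sqrt(r_{t-20}^2 + r_{t-19}^2 + ... + r_t^2), where r_t is the return on day t. The sketch below computes this rolling 21-day measure for a single stock. It assumes `returns` already holds daily returns (the summary does not say whether log or simple returns were used, or whether the result was annualized), and the function name `realized_volatility` is hypothetical.

```python
import math


def realized_volatility(returns, window=21):
    """Rolling realized volatility: square root of the sum of squared daily
    returns over the trailing `window` days (21 days as described above).

    Returns a list aligned with `returns`; entries are None until a full
    window of history is available.
    """
    out = []
    for t in range(len(returns)):
        if t + 1 < window:
            out.append(None)  # not enough history for a full window yet
            continue
        window_returns = returns[t + 1 - window : t + 1]
        out.append(math.sqrt(sum(r * r for r in window_returns)))
    return out


# Usage: 21 days of a constant 1% return
rv = realized_volatility([0.01] * 21)
print(rv[-1])  # sqrt(21 * 0.01**2), roughly 0.0458
```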
Why did you choose to work with machine learning models instead of traditional volatility models?
-Traditional models like GARCH and linear realized volatility models often have poor out-of-sample performance, particularly during periods of market stress (e.g., the 2008 financial crisis). Machine learning models, on the other hand, can potentially offer more accurate predictions by learning complex patterns from data.
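For reference, the standard GARCH(1,1) benchmark updates the conditional variance with a fixed parametric recursion, sigma2_{t+1} = omega + alpha * r_t^2 + beta * sigma2_t. This fixed functional form is what the machine learning models are not bound by. The sketch below only filters variances for given parameters; the parameter values in the usage line are illustrative, and maximum-likelihood estimation is omitted. The function name `garch11_variance_path` is hypothetical.

```python
def garch11_variance_path(returns, omega, alpha, beta, initial_var):
    """Run the GARCH(1,1) recursion
        sigma2[t+1] = omega + alpha * r[t]**2 + beta * sigma2[t]
    for fixed parameters (no estimation is done here).

    Returns len(returns) + 1 variances; the last entry is the
    one-step-ahead variance forecast.
    """
    if omega <= 0 or alpha < 0 or beta < 0:
        raise ValueError("need omega > 0 and alpha, beta >= 0")
    sigma2 = [initial_var]
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Usage with illustrative parameters; alpha + beta < 1 keeps the process
# covariance-stationary, with long-run variance omega / (1 - alpha - beta).
path = garch11_variance_path([0.01, -0.03, 0.002], omega=1e-6, alpha=0.1, beta=0.85, initial_var=1e-4)
print(path[-1])
```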
What are some of the key differences between your approach and the original paper you based your project on?
-The key difference is that while the original paper by Philippovich et al. used monthly data, I worked with daily frequency data. Additionally, I used 10 years of data (2004-2013), which includes the 2008 financial crisis, a period where traditional models often fail to perform well.
How did you handle the data for your analysis?
-I used data from the top 1,000 U.S. stocks by market capitalization. Data processing included standardizing the dataset, removing outliers below the 1st and above the 99th percentiles, and selecting features by eliminating highly correlated variables to improve model performance.
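The three preprocessing steps can be sketched per feature column as follows. This is an illustration under assumptions, not the project's code: outliers are handled here by clipping (winsorizing) at the 1st and 99th percentiles, although dropping those observations is an equally plausible reading. The correlation cut-off of 0.9 and all function names are hypothetical, since the summary gives no threshold.

```python
import math
import statistics


def clip_percentiles(values, lo_pct=1, hi_pct=99):
    """Winsorize a column: clip values to its lo_pct-th and hi_pct-th percentiles."""
    cuts = statistics.quantiles(values, n=100)  # 99 cut points: 1st..99th percentile
    lo, hi = cuts[lo_pct - 1], cuts[hi_pct - 1]
    return [min(max(v, lo), hi) for v in values]


def standardize(values):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length columns."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def drop_correlated(features, threshold=0.9):
    """Greedy filter over a dict of name -> column: keep a feature only if its
    absolute correlation with every already-kept feature is below `threshold`."""
    kept = {}
    for name, col in features.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept
```

The greedy filter keeps whichever of two correlated features appears first, so in practice the feature order (for example, by prior importance) decides which one survives.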
Which machine learning models did you investigate for predicting volatility?
-I investigated several models, including Gradient Boosted Regression Trees (GBRT), LightGBM (an efficient variant of GBRT), Long Short-Term Memory (LSTM) networks, and a bucketing approach that separated stocks into high and low volatility groups.
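The bucketing approach can be sketched as a split of the cross-section into two groups before training a separate model on each. The summary does not say where the boundary was drawn; the split at the median of each stock's average realized volatility below is an assumption, as are the function name and the example tickers. To avoid look-ahead bias, the averages would be computed on the training period only.

```python
import statistics


def bucket_by_volatility(avg_vol):
    """Split tickers into 'low' and 'high' volatility buckets at the
    cross-sectional median of their average realized volatility.

    `avg_vol` maps ticker -> average realized volatility (training period).
    A separate model would then be fitted on each bucket.
    """
    cutoff = statistics.median(avg_vol.values())
    buckets = {"low": [], "high": []}
    for ticker, vol in avg_vol.items():
        buckets["high" if vol > cutoff else "low"].append(ticker)
    return buckets


# Usage with hypothetical tickers
print(bucket_by_volatility({"AAA": 0.10, "BBB": 0.30, "CCC": 0.20, "DDD": 0.50}))
```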
Which model performed best in your project, and what was its accuracy?
-The best-performing model was LightGBM, which achieved an accuracy of 88.99% in the best case. This was a 5% improvement over the model results presented in the original paper by Philippovich et al.
What challenges did you encounter when predicting volatility for high-volatility stocks?
-The models struggled more with predicting volatility for high-volatility stocks compared to low-volatility stocks. This may be due to the increased unpredictability and more complex behavior of high-volatility stocks during turbulent market periods.
How did external factors impact the performance of your models?
-External factors, such as the treasury bill rate, overnight returns, and firm size, played a significant role in improving prediction accuracy. Even without using volatility features themselves, the models achieved 66% accuracy, showing that external factors are crucial for predicting volatility.
What are some potential areas for further improvement in your approach?
-Further improvements could involve more aggressive feature selection to reduce the feature space and enhance efficiency. Additionally, optimizing hyperparameters more effectively and investigating hybrid models, such as NARX (nonlinear autoregressive with exogenous inputs) models, could also improve performance.
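A NARX model predicts the target from its own past values and past values of exogenous inputs, y_t = f(y_{t-1}, ..., y_{t-p}, x_{t-1}, ..., x_{t-q}), where f is a nonlinear function such as a neural network. The sketch below builds only the lagged input rows for one exogenous series; the lag orders p and q, the choice of f, and the function name are hypothetical. It uses only past values of x to avoid look-ahead.

```python
def narx_design(y, x, p=2, q=2):
    """Build (rows, targets) for a NARX-style model
        y[t] ~ f(y[t-1..t-p], x[t-1..t-q]).

    `y` is the target series (e.g. realized volatility) and `x` an
    exogenous series (e.g. a treasury bill rate) of the same length.
    Each row lists the p target lags followed by the q exogenous lags.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    rows, targets = [], []
    for t in range(max(p, q), len(y)):
        lags_y = [y[t - k] for k in range(1, p + 1)]
        lags_x = [x[t - k] for k in range(1, q + 1)]
        rows.append(lags_y + lags_x)
        targets.append(y[t])
    return rows, targets


# Usage: rows can be fed to any nonlinear regressor
rows, targets = narx_design([1, 2, 3, 4], [10, 20, 30, 40], p=2, q=1)
print(rows, targets)
```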
