Stock Market Forecasting
Tech Stack: Python, Pandas, NumPy, Matplotlib, Scikit-learn, yfinance, Statsmodels, TensorFlow/Keras
This project compares statistical and deep learning approaches to stock price forecasting. It implements ARIMA, SARIMA, and LSTM models to predict daily prices for three major tech stocks (GOOGL, AAPL, AMZN). The LSTM reduced RMSE by 83-91% relative to ARIMA.
Problem Statement
Stock price prediction is one of the most challenging problems in financial analysis due to market volatility and complex temporal dependencies. Traditional statistical models often struggle with these complexities, while deep learning approaches show promise for capturing sequential patterns. This project systematically compares these methodologies to determine which approach delivers superior forecasting accuracy for short-term price prediction.
Overview
- Goal: Compare forecasting performance across different model architectures.
- Data: Daily stock price data for GOOGL, AAPL, and AMZN via yfinance API.
- Models: ARIMA, SARIMA, LSTM Neural Networks.
- Evaluation: RMSE, MAE across all three stocks.
- Best RMSE: LSTM achieved 2.90 (AMZN), 3.37 (GOOGL), 4.38 (AAPL).
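The two evaluation metrics can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's evaluation code: the function names and the sample prices are made up for the example.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: average size of a miss, in price units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Illustrative prices only, not project data.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 100.0, 101.0, 108.0]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
```

Both metrics are in the same units as the price, so they are only comparable between models on the same stock, not across stocks trading at different price levels.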
Technical Approach
Data Engineering: Collected and preprocessed historical stock data with proper scaling and normalization for neural network training.
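The source does not say which scaler was used. As a sketch, assuming min-max scaling, the important detail is that the minimum and maximum are taken from the training portion only and then reused on later data. The helper names and prices below are hypothetical.

```python
import numpy as np

def fit_min_max(train):
    """Learn scaling bounds from the training data only."""
    return float(np.min(train)), float(np.max(train))

def scale(values, lo, hi):
    """Map values to [0, 1] relative to the training bounds."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert the scaling so predictions are back in price units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Illustrative prices only, not project data.
prices = np.array([100.0, 104.0, 102.0, 110.0, 108.0, 115.0])
train, test = prices[:4], prices[4:]

lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)   # may exceed 1.0 if prices rise past the training max
restored = unscale(test_scaled, lo, hi)
```

Scikit-learn's `MinMaxScaler` provides the same fit-then-transform pattern. Errors should be computed after inverting the scaling so that RMSE and MAE are reported in price units.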
Feature Engineering: Created lag features, rolling statistics (moving averages, volatility), and technical indicators to capture market dynamics.
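A possible shape for these features, sketched with pandas. The column names, the window length, and the choice to shift rolling statistics by one day are assumptions for illustration; the source does not list the exact features or indicators used.

```python
import pandas as pd

def add_features(close: pd.Series, window: int = 3) -> pd.DataFrame:
    """Build lag and rolling features that use only past information."""
    df = pd.DataFrame({"close": close})
    df["lag_1"] = df["close"].shift(1)          # yesterday's close
    df["return_1d"] = df["close"].pct_change()  # daily return
    # shift(1) before rolling so the feature for day t excludes day t itself
    df["ma"] = df["close"].shift(1).rolling(window).mean()
    df["volatility"] = df["return_1d"].shift(1).rolling(window).std()
    return df.dropna()

# Illustrative prices only, not project data.
close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0])
features = add_features(close, window=3)
print(features)
```

The first rows are dropped because the lagged rolling windows are not yet full; with a 3-day window, the first usable row here is index 4.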
Statistical Modeling: Implemented ARIMA and SARIMA models using Statsmodels, with proper stationarity testing and parameter optimization.
Deep Learning Architecture: Designed a multi-layer LSTM neural network with dropout regularization and a tuned input sequence length for temporal pattern recognition.
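The sequence length determines how the series is cut into training samples. The sketch below shows only that windowing step in NumPy, producing the `(samples, timesteps, features)` shape that Keras LSTM layers expect. The function name and `seq_len=3` are illustrative, and the network itself is not shown.

```python
import numpy as np

def make_windows(series, seq_len):
    """Turn a 1-D series into (X, y) pairs: seq_len past values -> next value."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + seq_len] for i in range(len(series) - seq_len)])
    y = series[seq_len:]
    # Add a trailing feature axis: (samples, timesteps, 1)
    return X[..., np.newaxis], y

# Simple increasing series so the windows are easy to read.
series = np.arange(10, dtype=float)
X, y = make_windows(series, seq_len=3)

print(X.shape, y.shape)
print(X[0].ravel(), "->", y[0])
```

A longer sequence length gives the model more context per sample but leaves fewer samples to train on.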
Model Validation: Used time-based train/validation/test splits to prevent data leakage and ensure realistic evaluation.
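A time-based split keeps the rows in chronological order and cuts them at fixed points, so every validation and test observation comes after all training observations. The 70/15/15 proportions below are an assumption for illustration; the source does not state the split sizes.

```python
import numpy as np

def chronological_split(data, train_frac=0.70, val_frac=0.15):
    """Split ordered data into train/validation/test without shuffling."""
    n = len(data)
    train_end = int(round(n * train_frac))
    val_end = int(round(n * (train_frac + val_frac)))
    return data[:train_end], data[train_end:val_end], data[val_end:]

# Day indices stand in for dated price rows.
days = np.arange(100)
train, val, test = chronological_split(days)

print(len(train), len(val), len(test))
```

A random shuffle would let the model train on days that come after the ones it is tested on, which inflates accuracy for time series.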
Key Achievements
- Successfully implemented and compared three different forecasting approaches.
- Demonstrated LSTM's superior ability to capture non-linear temporal dependencies.
- Built robust evaluation framework with multiple performance metrics.
- Created scalable pipeline for multi-stock analysis and comparison.
- Reduced RMSE by 83-91% relative to ARIMA on all three tech stocks.
Results & Analysis
The LSTM model achieved a much lower RMSE than ARIMA on all three stocks:
- AAPL: LSTM RMSE (4.38) vs ARIMA (26.54) - 83% improvement
- AMZN: LSTM RMSE (2.90) vs ARIMA (33.42) - 91% improvement
- GOOGL: LSTM RMSE (3.37) vs ARIMA (27.11) - 88% improvement
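The percentages above follow from the relative RMSE reduction, (ARIMA - LSTM) / ARIMA. A short check using the reported values (the function and variable names are illustrative):

```python
# Reported RMSE values: (LSTM, ARIMA) per ticker.
results = {
    "AAPL": (4.38, 26.54),
    "AMZN": (2.90, 33.42),
    "GOOGL": (3.37, 27.11),
}

def rmse_reduction_pct(lstm_rmse, arima_rmse):
    """Relative RMSE reduction of the LSTM over ARIMA, in percent."""
    return (arima_rmse - lstm_rmse) / arima_rmse * 100.0

reductions = {
    ticker: rmse_reduction_pct(lstm, arima)
    for ticker, (lstm, arima) in results.items()
}

for ticker, pct in reductions.items():
    print(f"{ticker}: {pct:.1f}% lower RMSE")
```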