
Predicting Stock Markets with Machine Learning: Building Stocky


Can machine learning really predict stock market movements? I built a full-stack ML platform to find out. Spoiler alert: markets are harder to predict than you'd think—but the journey taught me invaluable lessons about feature engineering, model deployment, and realistic ML expectations.

The Challenge

Stock market prediction has captivated developers and data scientists for decades. The promise is tantalizing: use historical price data and technical indicators to forecast tomorrow's movements, potentially making profitable trading decisions. But the reality is far more nuanced.

I set out to build Stocky, an educational stock prediction platform that combines 30+ technical indicators with a Random Forest classifier to predict whether a stock will go UP or DOWN the next day. The goal wasn't to get rich (markets are largely efficient, after all), but to understand the full pipeline from data collection to model deployment in a production web application.

  • Model accuracy: 52%
  • Technical indicators: 30+
  • Training days: 2,700+
  • Decision trees: 300

Architecture: Full-Stack ML Pipeline

Stocky is built as a complete machine learning web application with three main layers:

1. Frontend: React + Tailwind CSS

The user interface needed to be fast, responsive, and intuitive. I used React 19 with Tailwind CSS for rapid development and a modern look. Key features include:

  • Lazy component loading: 12+ heavy components (Portfolio, Paper Trading, Risk Metrics) loaded on-demand using React.lazy() to keep initial bundle size small
  • Interactive charts: Recharts library for beautiful, responsive price history visualization
  • Real-time predictions: Instant feedback with confidence scores and probability breakdowns
  • Dark mode: Theme toggle with localStorage persistence for user preference
  • Error recovery: Comprehensive error boundaries and retry logic with exponential backoff
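Retry with exponential backoff is language-agnostic. Stocky implements it in the React frontend, which isn't shown here, so the following is a minimal Python sketch of the same pattern. The helper name, attempt limit, and delays are assumptions for illustration, and the random jitter is an extra the post doesn't mention.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call fn(); after each failure wait base_delay * 2**attempt (plus jitter), then retry.

    Hypothetical helper: names and defaults are illustrative, not taken from Stocky.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # this sketch retries any error; real code should retry only transient ones
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter keeps many clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(retry_with_backoff(flaky, sleep=lambda s: None))
```

Injecting sleep makes the helper testable without real waiting; the delays double each attempt (0.5 s, 1 s, 2 s, ...) up to max_delay.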

2. Backend: Flask + Python ML

The Flask backend serves as the bridge between the frontend and machine learning models. I designed 14 REST API endpoints covering:

  • Core predictions: Single and batch predictions with confidence scoring
  • Market data: Historical OHLCV data via yfinance API integration
  • Risk analytics: Sharpe ratio, volatility, Beta, max drawdown, VaR calculations
  • News integration: Real-time stock news feeds for context
  • User authentication: JWT-based auth with secure password hashing (werkzeug)
  • Search functionality: Stock symbol/name autocomplete
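
```python
# Example: Intelligent Model Routing
from joblib import load as load_model  # assumes the models were saved with joblib (or plain pickle)

ETF_SYMBOLS = {'SPY', 'QQQ', 'DIA', 'IWM'}  # hypothetical list; the real is_etf() isn't shown

def is_etf(symbol):
    return symbol.upper() in ETF_SYMBOLS

def get_appropriate_model(symbol):
    if symbol.upper().endswith('-USD'):  # Cryptocurrency pairs, e.g. BTC-USD
        return load_model('crypto_model.pkl')
    elif is_etf(symbol):  # ETFs use the SPY-trained model
        return load_model('spy_model.pkl')
    else:  # Individual stocks
        return load_model('enhanced_model.pkl')
```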

3. ML Models: Random Forest + Feature Engineering

The heart of Stocky is its machine learning system. I trained multiple Random Forest classifiers on 10 years of market data (2015-2025), with each model specialized for different asset types.
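Two details make or break this setup: each row's label must come from the next day's close, and the train/test split must follow time order so the model never trains on data from after the test period. Below is a minimal scikit-learn sketch under stated assumptions: synthetic random-walk prices stand in for real data, the column names and 80/20 split are hypothetical, and only n_estimators=300 comes from this post.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def make_target(df):
    """Target = 1 if the NEXT day's close is higher than today's, else 0."""
    out = df.copy()
    out["Target"] = (out["Close"].shift(-1) > out["Close"]).astype(int)
    return out.iloc[:-1]  # the last row has no next day to label


def chronological_split(df, test_frac=0.2):
    """Earlier rows for training, later rows for testing; never shuffle."""
    n_test = int(round(len(df) * test_frac))
    cut = len(df) - n_test
    return df.iloc[:cut], df.iloc[cut:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic random-walk closes standing in for ~2,700 days of real data
    df = pd.DataFrame({"Close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2700)))})
    df["Return_1d"] = df["Close"] / df["Close"].shift(1) - 1
    df["Return_5d"] = df["Close"] / df["Close"].shift(5) - 1
    df = make_target(df).dropna()  # drop early rows whose features need more history

    train, test = chronological_split(df)
    features = ["Return_1d", "Return_5d"]
    model = RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1)
    model.fit(train[features], train["Target"])
    print(f"Test accuracy: {model.score(test[features], test['Target']):.2%}")
```

On pure random-walk prices like these, test accuracy should land near 50%. That makes a useful sanity check: if a pipeline scores far above chance on noise, something is leaking future information into the features.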

Feature Engineering: The Secret Sauce

Raw price data alone isn't enough for meaningful predictions. The key is engineering features that capture market behavior patterns. I implemented 30 technical indicators across five categories:

Price-Based Features

  • Returns: Daily, 2-day, and 5-day percentage changes
  • Momentum: 10-period price momentum and acceleration
  • Gaps: Opening price vs. previous close (market sentiment)
  • Intraday range: High-low spread relative to close
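A minimal pandas sketch of these price-based features, assuming a DataFrame with Open, High, Low, and Close columns (the layout yfinance returns). The column names and the exact momentum and acceleration definitions are assumptions, since the post doesn't spell them out.

```python
import pandas as pd


def add_price_features(df):
    """Price-based features from an OHLC DataFrame (Open, High, Low, Close columns)."""
    out = df.copy()
    close = out["Close"]
    prev_close = close.shift(1)
    # Percentage returns over 1, 2, and 5 days
    for n in (1, 2, 5):
        out[f"Return_{n}d"] = close / close.shift(n) - 1
    # Momentum as the 10-period price change; acceleration as its day-over-day change
    out["Momentum_10"] = close - close.shift(10)
    out["Acceleration"] = out["Momentum_10"].diff()
    # Gap: today's open relative to yesterday's close
    out["Gap"] = out["Open"] / prev_close - 1
    # Intraday range: high-low spread relative to the close
    out["Range_Pct"] = (out["High"] - out["Low"]) / close
    return out
```

Every feature at row t uses only data available by the close of day t, which keeps it safe to pair with a next-day target.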

Moving Averages

  • Simple Moving Averages (SMA): 5, 20, and 50-period averages
  • Exponential Moving Averages (EMA): 12 and 26-period for trend detection
  • Ratio features: SMA_5/SMA_20 and SMA_20/SMA_50 to capture trend strength
  • Distance metrics: How far current price is from key moving averages
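The moving-average features map directly onto pandas rolling and exponentially weighted windows. A sketch with hypothetical column names:

```python
import pandas as pd


def add_moving_average_features(df):
    """SMA/EMA levels plus ratio and distance features from a Close column."""
    out = df.copy()
    close = out["Close"]
    for n in (5, 20, 50):
        out[f"SMA_{n}"] = close.rolling(n).mean()
    for n in (12, 26):
        # adjust=False gives the recursive EMA used by most charting tools
        out[f"EMA_{n}"] = close.ewm(span=n, adjust=False).mean()
    # Ratios above 1 mean the shorter average sits above the longer one (uptrend)
    out["SMA_5_20_Ratio"] = out["SMA_5"] / out["SMA_20"]
    out["SMA_20_50_Ratio"] = out["SMA_20"] / out["SMA_50"]
    # Distance: fraction by which the close is above (+) or below (-) each SMA
    out["Dist_SMA_20"] = close / out["SMA_20"] - 1
    out["Dist_SMA_50"] = close / out["SMA_50"] - 1
    return out
```

Expressing trend as ratios and distances rather than raw price levels keeps the features comparable across stocks trading at very different prices.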

Volatility Indicators

  • Bollinger Bands: Upper, lower, and position within bands
  • ATR (Average True Range): Measures market volatility
  • Historical volatility: Standard deviation of returns
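A sketch of the three volatility features, assuming 20-day Bollinger Bands at 2 standard deviations, a 14-day ATR, and 252 trading days for annualization. These are common defaults, not parameters confirmed by the post. For brevity the ATR here is a simple rolling mean of the true range; Wilder's original definition uses a smoothed average instead.

```python
import numpy as np
import pandas as pd


def add_volatility_features(df, bb_window=20, bb_k=2.0, atr_window=14, vol_window=20):
    """Bollinger Bands, ATR, and historical volatility from High/Low/Close columns."""
    out = df.copy()
    close, high, low = out["Close"], out["High"], out["Low"]

    # Bollinger Bands: rolling mean +/- k rolling standard deviations
    mid = close.rolling(bb_window).mean()
    std = close.rolling(bb_window).std()
    out["BB_Upper"] = mid + bb_k * std
    out["BB_Lower"] = mid - bb_k * std
    # Position within the bands: 0 at the lower band, 1 at the upper band
    out["BB_Position"] = (close - out["BB_Lower"]) / (out["BB_Upper"] - out["BB_Lower"])

    # True range: the largest of today's range and the gaps from yesterday's close
    prev_close = close.shift(1)
    true_range = pd.concat(
        [high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1
    ).max(axis=1)
    out["ATR"] = true_range.rolling(atr_window).mean()

    # Historical volatility: rolling std of daily returns, annualized
    returns = close / prev_close - 1
    out["Hist_Vol"] = returns.rolling(vol_window).std() * np.sqrt(252)
    return out
```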

Momentum Oscillators

  • RSI (Relative Strength Index): Overbought/oversold conditions (0-100)
  • MACD: Moving Average Convergence Divergence with signal line and histogram
  • Stochastic Oscillator: Position of the close within the recent high-low range
  • Williams %R: Momentum indicator similar to the stochastic oscillator, scaled from -100 to 0
  • Rate of Change (ROC): Velocity of price changes
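RSI and MACD have the most moving parts of these oscillators, so here is a pandas sketch of both using the conventional 14-period RSI and 12/26/9 MACD settings (assumed; the post only confirms the 12- and 26-period EMAs). The RSI uses an exponential form of Wilder's smoothing rather than seeding it with a 14-day simple average, a common approximation.

```python
import pandas as pd


def rsi(close, window=14):
    """RSI on a 0-100 scale, with Wilder-style smoothing (EMA with alpha = 1/window)."""
    delta = close.diff()
    gain = delta.clip(lower=0)     # up moves, zero otherwise
    loss = (-delta).clip(lower=0)  # down moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    rs = avg_gain / avg_loss  # inf when there were no losses, giving RSI = 100
    return 100 - 100 / (1 + rs)


def macd(close, fast=12, slow=26, signal=9):
    """Return the MACD line, its signal line, and the histogram (their difference)."""
    macd_line = close.ewm(span=fast, adjust=False).mean() - close.ewm(span=slow, adjust=False).mean()
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line, macd_line - signal_line
```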

Volume Analysis

  • Volume Ratio: Current volume vs. 20-day average
  • On-Balance Volume (OBV): Cumulative volume-based pressure
  • Money Flow Index (MFI): Volume-weighted RSI
  • Volume Change: Day-over-day volume differences
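A sketch of the four volume features, assuming Close, High, Low, and Volume columns. The window lengths (20 days for the volume ratio, 14 for MFI) are assumptions, and OBV here starts at zero rather than at the first day's volume, a common variant.

```python
import numpy as np
import pandas as pd


def add_volume_features(df, ratio_window=20, mfi_window=14):
    """Volume ratio, day-over-day volume change, OBV, and MFI."""
    out = df.copy()
    close, volume = out["Close"], out["Volume"]
    out["Volume_Ratio"] = volume / volume.rolling(ratio_window).mean()
    out["Volume_Change"] = volume.diff()  # day-over-day change in shares traded

    # OBV: add volume on up days, subtract it on down days, then accumulate
    direction = np.sign(close.diff()).fillna(0)
    out["OBV"] = (direction * volume).cumsum()

    # MFI: an RSI-style ratio built from money flow = typical price * volume
    typical = (out["High"] + out["Low"] + close) / 3
    flow = typical * volume
    tp_change = typical.diff()
    pos = flow.where(tp_change > 0, 0.0).rolling(mfi_window).sum()
    neg = flow.where(tp_change < 0, 0.0).rolling(mfi_window).sum()
    out["MFI"] = 100 - 100 / (1 + pos / neg)
    return out
```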
Pro Tip: Feature engineering is more impactful than model selection. I spent 70% of my time crafting features and only 30% tuning models. In my experience with finance ML, quality features matter more than complex algorithms.

The Hard Truth: 52% Accuracy

After training on 2,700 days of SPY data with 300 decision trees and extensive hyperparameter tuning, my model achieved 51.88% test accuracy. That's barely better than a coin flip (50%). Why?

  • Market efficiency: Public technical indicators are already priced in
  • External factors: News, earnings, macro events dominate short-term moves
  • Non-stationarity: Market behavior patterns change over time (2008 ≠ 2020 ≠ 2025)
  • Black swan events: COVID-19, financial crises aren't predictable from price patterns
  • Noise vs. signal: Day-to-day price changes behave much like a random walk
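It's worth checking how far 51.88% really is from 50%. The post doesn't give the test-set size, so this sketch assumes a hypothetical 20% hold-out of 2,700 days (540 days) and uses the normal approximation to the binomial distribution.

```python
import math


def accuracy_confidence_interval(accuracy, n, z=1.96):
    """Normal-approximation 95% interval for an accuracy measured on n test days."""
    se = math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - z * se, accuracy + z * se


def z_vs_coin_flip(accuracy, n, p0=0.5):
    """z-score of the observed accuracy against the 50% coin-flip null hypothesis."""
    return (accuracy - p0) / math.sqrt(p0 * (1 - p0) / n)


if __name__ == "__main__":
    n_test = 540  # hypothetical: a 20% hold-out of ~2,700 days
    low, high = accuracy_confidence_interval(0.5188, n_test)
    print(f"95% CI: {low:.1%} to {high:.1%}")
    print(f"z vs. 50%: {z_vs_coin_flip(0.5188, n_test):.2f}")
```

Under that assumption the 95% interval runs from roughly 47.7% to 56.1% and the z-score against 50% is about 0.87, well below 1.96, so the result is not statistically distinguishable from a coin flip. A majority-class baseline (always predicting whichever label is more common in training) is another useful yardstick, since up and down days are rarely perfectly balanced.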

This was actually the most valuable lesson: realistic ML expectations. Media hype suggests AI can predict anything with 90%+ accuracy. Reality is messier. Financial markets are partially efficient, making consistent prediction nearly impossible using only technical analysis.

What I Learned: The goal of Stocky isn't to make money—it's educational. Understanding WHY prediction is hard is more valuable than claiming false accuracy. Honest ML means acknowledging limitations.

Technical Challenges & Solutions

Challenge 1: Large Model File Deployment

Problem: The trained scikit-learn models are 14-79 MB each, too large to commit comfortably to Git (GitHub warns on files over 50 MB and rejects files over 100 MB).

Solution: I stored the models with Git LFS (Large File Storage) and added an automatic fallback: on startup, the backend checks for the .pkl files and, if any are missing locally, downloads them from their LFS URL on GitHub.
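One subtlety: if the repository is cloned without Git LFS installed, the .pkl files exist but contain only small text pointers, so a plain existence check isn't enough. A minimal sketch of the startup check, using a hypothetical ensure_model() helper; the download URL is a placeholder because the post doesn't give it.

```python
import os
import urllib.request

# Every Git LFS pointer file begins with this line
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is a Git LFS pointer instead of the real binary."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def ensure_model(path, url, download=urllib.request.urlretrieve):
    """Download the model from url if it's missing or only an LFS pointer.

    url is a placeholder: the real download location isn't given in the post.
    """
    if os.path.exists(path) and not is_lfs_pointer(path):
        return path  # the real model is already on disk
    tmp = path + ".part"
    download(url, tmp)     # write to a temporary file first so a failed
    os.replace(tmp, path)  # download never leaves a half-written model behind
    return path
```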

Challenge 2: API Rate Limiting

Problem: Yahoo Finance, which the yfinance library queries, throttles repeated requests for market data.

Solution: I built a caching layer keyed by symbol and timeframe (e.g., "AAPL-1mo"). Historical data is cached after the first request, so subsequent loads are instant. I also added a batch prediction endpoint that returns predictions for multiple symbols in a single API call.
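A minimal sketch of a cache keyed the same way ("AAPL-1mo"). The fetch function is injected; in a real backend it might wrap a yfinance call such as yfinance.Ticker(symbol).history(period=period). The class name and the 15-minute expiry are assumptions: the post only says data is cached after the first request, and some expiry keeps recent prices from going stale.

```python
import time


class MarketDataCache:
    """In-memory cache keyed by "SYMBOL-period", e.g. "AAPL-1mo" (hypothetical class)."""

    def __init__(self, fetch, ttl_seconds=15 * 60, clock=time.monotonic):
        self._fetch = fetch  # callable(symbol, period) -> data
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (timestamp, data)

    def get(self, symbol, period):
        key = f"{symbol.upper()}-{period}"
        hit = self._store.get(key)
        if hit is not None and self._clock() - hit[0] < self._ttl:
            return hit[1]  # fresh cache hit: no network call
        data = self._fetch(symbol, period)
        self._store[key] = (self._clock(), data)
        return data
```

Normalizing the symbol to upper case means "aapl" and "AAPL" share one cache entry, and injecting the clock lets the expiry be tested without waiting.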

Challenge 3: Frontend Performance

Problem: Loading all 20+ React components up front made the initial page load slow (3-4 seconds).

Solution: I used React.lazy() for code splitting. Only the core prediction UI loads initially; advanced panels (Portfolio, Paper Trading, Risk Metrics) load on demand when the user opens their tabs. This cut the initial bundle from 500 KB to 180 KB.

Challenge 4: Cross-Origin Resource Sharing (CORS)

Problem: The React frontend (on Vercel) couldn't call the Flask backend (on Railway) because the browser blocked the cross-origin requests.

Solution: I configured Flask-CORS with an explicit origin whitelist, credential support, and preflight handling, and added an environment variable so the frontend's API_URL can point at either the local dev server or the production backend.

Deployment: Railway + Vercel

Stocky uses a dual-platform deployment strategy:

  • Backend (Railway): Python Flask app with Gunicorn server, environment variables for database and secrets, automatic deployment on Git push
  • Frontend (Vercel): React app with optimized build, environment variable for API URL, serverless edge network for fast global access

Splitting the two lets the frontend and backend scale and deploy independently, with a clear separation of concerns.

Key Features Beyond Predictions

Watchlist Management

Users can save favorite stocks, add or remove multiple symbols at once, export the list to CSV, and get refreshed predictions automatically when viewing the watchlist.

Paper Trading Simulator

Practice trading strategies with a virtual $10,000 account. Execute buy/sell orders, track P&L, and visualize performance, all without risking real money.
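The core of a paper-trading simulator is a small ledger of cash, positions, and cost basis. A minimal sketch, assuming average-cost accounting and fills at exactly the price passed in; the class and method names are hypothetical, not Stocky's API.

```python
from dataclasses import dataclass, field


@dataclass
class PaperAccount:
    """Hypothetical paper-trading ledger: virtual cash plus share positions."""
    cash: float = 10_000.0
    positions: dict = field(default_factory=dict)   # symbol -> shares held
    cost_basis: dict = field(default_factory=dict)  # symbol -> total cost of shares held

    def buy(self, symbol, shares, price):
        cost = shares * price
        if shares <= 0 or cost > self.cash:
            raise ValueError("invalid quantity or insufficient cash")
        self.cash -= cost
        self.positions[symbol] = self.positions.get(symbol, 0) + shares
        self.cost_basis[symbol] = self.cost_basis.get(symbol, 0.0) + cost

    def sell(self, symbol, shares, price):
        held = self.positions.get(symbol, 0)
        if shares <= 0 or shares > held:
            raise ValueError("invalid quantity or not enough shares")
        avg_cost = self.cost_basis[symbol] / held  # average-cost accounting
        self.cash += shares * price
        self.positions[symbol] = held - shares
        self.cost_basis[symbol] -= avg_cost * shares
        if self.positions[symbol] == 0:
            del self.positions[symbol], self.cost_basis[symbol]

    def total_value(self, prices):
        """Cash plus open positions marked at the given prices (symbol -> price)."""
        return self.cash + sum(n * prices[s] for s, n in self.positions.items())

    def pnl(self, prices, starting_cash=10_000.0):
        return self.total_value(prices) - starting_cash
```

For example, buying 10 shares at $150 and selling 4 at $160 leaves $9,140 in cash and 6 shares with a $900 cost basis; marked at $155, total P&L is $70.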

Risk Analytics Dashboard

Calculate professional risk metrics: Sharpe ratio (risk-adjusted return), Beta (sensitivity to market moves), maximum drawdown, volatility, and Value-at-Risk (VaR).
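Each of these metrics is a short NumPy computation over daily returns. A sketch, assuming aligned arrays of daily simple returns for the asset and a market benchmark (such as SPY), 252 trading days per year, a zero risk-free rate by default, and historical (not parametric) VaR at 95%.

```python
import numpy as np

TRADING_DAYS = 252  # common annualization convention


def risk_metrics(returns, market_returns, risk_free_rate=0.0, var_level=0.95):
    """Risk metrics from aligned arrays of daily simple returns (asset and market)."""
    r = np.asarray(returns, dtype=float)
    m = np.asarray(market_returns, dtype=float)

    # Annualized volatility and Sharpe ratio (risk_free_rate is an annual rate)
    volatility = r.std(ddof=1) * np.sqrt(TRADING_DAYS)
    excess = r - risk_free_rate / TRADING_DAYS
    sharpe = excess.mean() / excess.std(ddof=1) * np.sqrt(TRADING_DAYS)

    # Beta: covariance with the market divided by the market's variance
    beta = np.cov(r, m)[0, 1] / m.var(ddof=1)

    # Max drawdown: worst drop from a running peak of the growth-of-$1 curve
    wealth = np.concatenate(([1.0], np.cumprod(1 + r)))
    max_drawdown = (wealth / np.maximum.accumulate(wealth) - 1).min()

    # Historical VaR: the daily loss exceeded on only (1 - var_level) of days
    var = -np.percentile(r, 100 * (1 - var_level))

    return {"volatility": volatility, "sharpe": sharpe, "beta": beta,
            "max_drawdown": max_drawdown, "var": var}
```

Passing the benchmark's own returns as both arguments yields a beta of exactly 1, a quick sanity check on the wiring.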

Portfolio Tracking

Monitor hypothetical holdings, calculate returns, track performance vs. benchmarks, and visualize asset allocation.

Lessons Learned

  1. Feature engineering > Model complexity: Random Forest with good features beats neural networks with raw data
  2. Market prediction is inherently limited: 52% accuracy is realistic; beware anyone claiming 90%+
  3. Production ML requires DevOps: Model versioning, caching, error handling, monitoring—not just training
  4. User experience matters: Fast load times, error recovery, loading states, responsive design
  5. Full-stack integration is hard: CORS, authentication, state management, deployment—lots of moving parts
  6. Educational honesty: Acknowledge limitations; don't oversell capabilities

What's Next?

Future enhancements I'm considering:

  • LSTM/Transformer models: Deep learning for time-series prediction
  • Sentiment analysis: Integrate news/social media sentiment
  • Backtesting engine: Evaluate strategy performance over historical periods
  • Real-time WebSockets: Live market data updates
  • Advanced charting: TradingView integration for technical analysis

Conclusion

Building Stocky taught me that machine learning isn't magic—it's a tool with real limitations, especially in unpredictable domains like finance. The 52% accuracy is honest: barely better than random chance. But the journey was invaluable for learning:

  • Full-stack ML deployment (Flask + React + scikit-learn)
  • Feature engineering and domain expertise importance
  • Production-grade error handling and caching
  • Realistic ML expectations and ethical communication
  • User-focused application design

The real prediction? Markets will remain unpredictable. But understanding WHY through hands-on ML projects makes you a better developer and data scientist.

Try Stocky

Explore stock predictions and see machine learning in action. Remember: for educational purposes only—not investment advice!