Stocky - ML Stock Prediction Platform
Educational stock market prediction tool using machine learning and technical analysis
Project Overview
Stocky is a full-stack machine learning web application that predicts next-day stock price movements using advanced technical analysis and Random Forest classification. The platform provides UP/DOWN directional predictions with confidence scores for stocks, ETFs, and cryptocurrencies, serving as an educational tool to understand market prediction limitations and machine learning applications in finance.
Built with a Python Flask backend and React frontend, Stocky demonstrates production-grade ML deployment, featuring intelligent model routing, real-time market data integration, and comprehensive risk analytics. The application supports user authentication, watchlist management, paper trading, and portfolio tracking, all wrapped in a mobile-responsive interface with dark mode support.
Architecture Overview
System Flow
- User Request: Frontend sends prediction request with stock symbol
- API Routing: Flask backend receives request, validates symbol
- Data Fetching: yfinance API downloads latest market data
- Feature Engineering: Calculate 30 technical indicators (RSI, MACD, Bollinger Bands, etc.)
- Model Selection: Intelligent routing picks appropriate model (crypto vs stock)
- Prediction: Random Forest classifier generates UP/DOWN prediction with confidence
- Response: JSON payload returned with prediction, confidence, and metadata
- Visualization: React displays result with charts, risk metrics, and historical data
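The flow above can be sketched as a single function. This is a minimal sketch, not the project's actual code: `predict_symbol`, the injected `fetch`, `featurize`, and `predict_proba_up` callables, the symbol pattern, and the payload keys are all hypothetical names chosen for illustration.

```python
import re
from typing import Callable, Sequence

# Hypothetical validation rule: 1-10 letters, digits, dots, or dashes (e.g. "SPY", "BTC-USD").
SYMBOL_RE = re.compile(r"^[A-Z0-9.\-]{1,10}$")


def predict_symbol(
    symbol: str,
    fetch: Callable[[str], Sequence[float]],
    featurize: Callable[[Sequence[float]], Sequence[float]],
    predict_proba_up: Callable[[Sequence[float]], float],
) -> dict:
    """Run validate -> fetch -> featurize -> predict and build a JSON-ready payload."""
    symbol = symbol.strip().upper()
    if not SYMBOL_RE.match(symbol):
        return {"error": "invalid symbol", "symbol": symbol}

    closes = fetch(symbol)             # stands in for the yfinance download
    features = featurize(closes)       # stands in for the 30 technical indicators
    p_up = predict_proba_up(features)  # model probability of the UP class

    direction = "UP" if p_up >= 0.5 else "DOWN"
    confidence = max(p_up, 1.0 - p_up)  # probability of the predicted class
    return {
        "symbol": symbol,
        "prediction": direction,
        "confidence": round(confidence, 4),
    }
```

Passing the three stages in as callables keeps the sketch runnable without network access and makes each stage easy to replace in a test.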
Key Features
🎯 Smart Predictions
UP/DOWN directional forecasts with probability-based confidence scores for stocks, ETFs, and cryptocurrencies using ensemble ML models.
📊 Technical Analysis
30+ indicators including RSI, MACD, Bollinger Bands, Stochastic Oscillator, ATR, MFI, OBV, and momentum-based features.
👁️ Watchlist Manager
Save favorite stocks, bulk operations, export functionality, and automatic prediction updates for tracked symbols.
📈 Portfolio Tracking
Monitor hypothetical holdings, calculate returns, track performance metrics, and visualize portfolio allocation.
📄 Paper Trading
Simulated trading environment with virtual account, order execution, and P&L tracking without risking real capital.
⚠️ Risk Analytics
Calculate Sharpe ratio, volatility, maximum drawdown, Beta, and Value-at-Risk (VaR) for comprehensive risk assessment.
📰 Real-time News
Integrated news feeds for selected stocks with sentiment context to inform trading decisions.
🔐 User Authentication
JWT-based authentication system with secure password hashing, user profiles, and data persistence.
📱 Mobile Responsive
Fully optimized for mobile devices with touch-friendly interfaces and adaptive layouts.
🌙 Dark Mode
Toggle between light and dark themes with smooth transitions and persistent user preferences.
⚡ Performance
Lazy component loading, data caching, code splitting, and optimized bundle size for fast load times.
🔄 Batch Processing
Get predictions for multiple symbols simultaneously with a single API call for efficiency.
Technology Stack
Backend (Python)
- Flask 2.3 - Web framework
- scikit-learn - ML models
- pandas/numpy - Data processing
- yfinance - Market data API
- Peewee - SQLite ORM
- PyJWT - Authentication
- Gunicorn - Production server
- XGBoost/CatBoost - Ensemble models
Frontend (JavaScript)
- React 19 - UI framework
- Tailwind CSS - Styling
- Axios - HTTP client
- Recharts - Data visualization
- Lucide React - Icons
- Context API - State management
- React.lazy - Code splitting
- localStorage - Persistence
Machine Learning
- Random Forest (300 estimators)
- Gradient Boosting
- Feature Selection (SelectKBest)
- Time-series Cross-Validation
- Class Balancing (market bias)
- Model Versioning
- Ensemble Voting
- Hyperparameter Tuning
Deployment
- Railway - Backend hosting
- Vercel - Frontend hosting
- Git LFS - Model file storage
- Environment Variables
- CORS Configuration
- Production Optimization
- Auto-deployment (CI/CD)
- Error Monitoring
Machine Learning Implementation
Model Architecture
Stocky uses a Random Forest classifier with 300 decision trees, trained on approximately 2,700 days of SPY (S&P 500 ETF) historical data from 2015 to 2025. The model reaches 51.88% test accuracy, only slightly above random chance (50%), which is realistic given the inherent unpredictability of financial markets.
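To make the prediction target concrete, here is a minimal sketch of how next-day UP/DOWN labels and a time-ordered train/test split could be built. The function names, the 20% test fraction, and the choice to label flat days as DOWN are assumptions for illustration, not details taken from the project.

```python
from typing import List, Sequence, Tuple


def next_day_labels(closes: Sequence[float]) -> List[int]:
    """Label day t with 1 (UP) if close[t+1] > close[t], else 0 (DOWN).

    The final day has no next-day close, so it gets no label.
    """
    return [1 if closes[t + 1] > closes[t] else 0 for t in range(len(closes) - 1)]


def chronological_split(n_samples: int, test_fraction: float = 0.2) -> Tuple[range, range]:
    """Split indices so that every test sample comes after every training sample."""
    n_test = int(n_samples * test_fraction)
    n_train = n_samples - n_test
    return range(0, n_train), range(n_train, n_samples)


closes = [100.0, 101.5, 101.0, 102.2, 102.2, 103.0]
labels = next_day_labels(closes)  # [1, 0, 1, 0, 1]
train_idx, test_idx = chronological_split(len(labels), test_fraction=0.2)
```

Splitting by time rather than at random keeps future prices out of the training set, which matters when judging an accuracy figure this close to 50%.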
Feature Engineering (30 Technical Indicators)
Price-Based Features:
• Daily Returns, 2-day/5-day returns
• Price momentum (10-period)
• Price acceleration
• Gap detection (open vs previous close)
Moving Averages:
• SMA_5, SMA_20, SMA_50
• EMA_12, EMA_26
• Ratio-based (SMA_5_20_Ratio, SMA_20_50_Ratio)
• Distance metrics (Price_to_SMA5, Price_to_SMA20)
Volatility Indicators:
• Bollinger Bands (position, upper, lower)
• Average True Range (ATR)
• Historical volatility (std dev)
Momentum Indicators:
• RSI (Relative Strength Index)
• MACD + Signal + Histogram
• Stochastic Oscillator
• Williams %R
• Rate of Change (ROC)
Volume Indicators:
• Volume Ratio (current/20-day avg)
• On-Balance Volume (OBV)
• Money Flow Index (MFI)
• Volume Change
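A few of the indicators listed above can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, and the RSI here uses plain averages of gains and losses (the variant sometimes called Cutler's RSI) rather than Wilder's smoothed version, so its values may differ from the project's.

```python
from typing import Sequence


def sma(values: Sequence[float], window: int) -> float:
    """Simple moving average of the last `window` values."""
    if len(values) < window:
        raise ValueError("not enough data for the requested window")
    return sum(values[-window:]) / window


def rsi(closes: Sequence[float], period: int = 14) -> float:
    """RSI over the last `period` price changes, using plain averages."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def sma_ratio(closes: Sequence[float], fast: int = 5, slow: int = 20) -> float:
    """Ratio feature in the spirit of SMA_5_20_Ratio: fast SMA divided by slow SMA."""
    return sma(closes, fast) / sma(closes, slow)
```

Ratio and distance features such as `sma_ratio` are scale-free, so one model can be applied to assets with very different price levels.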
Intelligent Model Routing
The system automatically selects the optimal prediction model based on asset type:
- Cryptocurrencies (-USD suffix): Uses specialized high-volatility models trained on BTC/XRP data
- Stocks & ETFs: Uses enhanced SPY model with 300-tree Random Forest
- Fallback Mechanism: Auto-downloads from Git LFS if model not found locally
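The routing rules above can be sketched as follows. The file paths and function names are hypothetical, and `download` is an injected callable standing in for whatever fetches the model from Git LFS. The project's real paths and download logic are not shown in this document.

```python
from pathlib import Path
from typing import Callable

# Hypothetical file names, used only for illustration.
CRYPTO_MODEL = Path("models/crypto_model.pkl")
STOCK_MODEL = Path("models/spy_enhanced_model.pkl")


def is_crypto(symbol: str) -> bool:
    """Treat any symbol ending in -USD (e.g. BTC-USD) as a cryptocurrency."""
    return symbol.strip().upper().endswith("-USD")


def select_model_path(symbol: str) -> Path:
    """Pick the crypto model for -USD symbols and the SPY-trained model otherwise."""
    return CRYPTO_MODEL if is_crypto(symbol) else STOCK_MODEL


def ensure_model(path: Path, download: Callable[[Path], None]) -> Path:
    """Return `path`, calling `download(path)` first if the file is missing locally."""
    if not path.exists():
        download(path)
    return path
```

For example, `select_model_path("btc-usd")` returns the crypto path, while `select_model_path("AAPL")` returns the stock path.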
Model Performance Metrics
Educational Note: The roughly 52% accuracy (51.88% on the test set) reflects realistic market prediction limitations. Stock markets are influenced by countless factors beyond technical indicators, including macro events, news, sentiment, and randomness. This project demonstrates an ML application in finance while acknowledging those prediction challenges.
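One way to see how close 51.88% is to chance is a z-score against the 50% baseline. The sketch below uses a normal approximation. The test-set size of 540 is a hypothetical value (about 20% of 2,700 days), not a figure reported by the project, so the result is illustrative only.

```python
import math


def accuracy_z_score(accuracy: float, n_test: int, baseline: float = 0.5) -> float:
    """z-score of an observed accuracy against a fixed baseline (normal approximation)."""
    standard_error = math.sqrt(baseline * (1.0 - baseline) / n_test)
    return (accuracy - baseline) / standard_error


# n_test=540 is an assumed test-set size, not a number from the project.
z = accuracy_z_score(0.5188, n_test=540)
print(f"z = {z:.2f}")  # below about 1.96 is not significant at the 5% level (two-sided)
```

Under that assumed sample size the z-score is below 1, so the edge over chance would not be statistically distinguishable from noise.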
API Endpoints
Core Prediction Endpoints
GET /api/predict/<symbol> - Get UP/DOWN prediction with confidence
GET /api/historical/<symbol> - OHLCV historical data
GET /api/info/<symbol> - Company information
POST /api/predict/batch - Batch predictions (multiple symbols)
GET /api/news/<symbol> - Real-time stock news
GET /api/search?q=<query> - Search stocks by name/symbol
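The batch endpoint's core logic might look like the sketch below. It is an assumption-laden illustration: the `predict_batch` name, the batch limit of 20, and the `predictions`/`errors` response shape are hypothetical, and `predict_one` stands in for the single-symbol prediction function.

```python
from typing import Callable, Dict, Iterable, List

MAX_BATCH = 20  # hypothetical limit, not taken from the project


def predict_batch(
    symbols: Iterable[str],
    predict_one: Callable[[str], dict],
    max_batch: int = MAX_BATCH,
) -> Dict[str, List[dict]]:
    """Predict each unique symbol once and collect failures instead of aborting."""
    unique: List[str] = []
    for raw in symbols:
        symbol = raw.strip().upper()
        if symbol and symbol not in unique:
            unique.append(symbol)
    if len(unique) > max_batch:
        raise ValueError(f"at most {max_batch} symbols per request")

    predictions: List[dict] = []
    errors: List[dict] = []
    for symbol in unique:
        try:
            predictions.append(predict_one(symbol))
        except Exception as exc:  # one bad symbol should not fail the whole batch
            errors.append({"symbol": symbol, "error": str(exc)})
    return {"predictions": predictions, "errors": errors}
```

Returning per-symbol errors alongside successful predictions lets the client render partial results from one request.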
Advanced Features
GET /api/risk-metrics/<symbol> - Sharpe, volatility, beta, VaR, max drawdown
GET /api/assets - List of supported assets
GET /api/model/info - Model metadata (type, features, estimators)
GET /api/health - Health check with version info
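The quantities returned by the risk-metrics endpoint can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: 252 trading days per year, a zero risk-free rate by default, and a simple empirical quantile for VaR. The project's exact conventions are not given in this document.

```python
import math
import statistics
from typing import Sequence

TRADING_DAYS = 252  # common annualization convention (assumption)


def annualized_volatility(returns: Sequence[float]) -> float:
    """Sample standard deviation of daily returns, scaled to one year."""
    return statistics.stdev(returns) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(returns: Sequence[float], risk_free_daily: float = 0.0) -> float:
    """Annualized mean excess return divided by its standard deviation."""
    excess = [r - risk_free_daily for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


def max_drawdown(prices: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a negative fraction (0.0 if none)."""
    peak, worst = prices[0], 0.0
    for price in prices:
        peak = max(peak, price)
        worst = min(worst, price / peak - 1.0)
    return worst


def historical_var(returns: Sequence[float], level: float = 0.95) -> float:
    """Historical VaR as a positive loss; quantile conventions differ between tools."""
    ordered = sorted(returns)
    index = min(len(ordered) - 1, round((1.0 - level) * len(ordered)))
    return -ordered[index]


def beta(asset_returns: Sequence[float], market_returns: Sequence[float]) -> float:
    """Sample covariance of asset and market returns divided by market variance."""
    mean_a = statistics.mean(asset_returns)
    mean_m = statistics.mean(market_returns)
    n = len(market_returns)
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns)) / (n - 1)
    return cov / statistics.variance(market_returns)
```

Each function takes plain sequences of daily returns or prices, so the same code applies whether the data comes from yfinance or from a fixture in a test.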
Authentication Endpoints
POST /api/auth/register - Create user account
POST /api/auth/login - Login (returns JWT token)
GET /api/auth/me - Get user profile (requires token)
PUT /api/auth/update - Update profile (requires token)
POST /api/auth/follow/<user_id> - Follow user (requires token)
POST /api/auth/unfollow/<user_id> - Unfollow user (requires token)
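To illustrate what the token returned by the login endpoint contains, here is a standard-library sketch of issuing and verifying an HS256 JWT. The project itself lists PyJWT, whose `jwt.encode` and `jwt.decode` would normally do this work. The function names, the claim set (`sub`, `iat`, `exp`), and the one-hour lifetime below are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(user_id: int, secret: bytes, ttl_seconds: int = 3600,
                now: Optional[int] = None) -> str:
    """Build header.payload.signature with an expiry claim."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": str(user_id), "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[int] = None) -> dict:
    """Return the payload if the signature matches and the token has not expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    now = int(time.time()) if now is None else now
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```

A protected route such as `/api/auth/me` would call something like `verify_token` on the bearer token and reject the request when it raises.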
Technical Highlights
🎨 Lazy Component Loading
12+ React components are loaded on demand using React.lazy() and Suspense, which shrinks the initial bundle and shortens the first page load.
💾 Smart Data Caching
Historical data cached by symbol-timeframe keys, providing instant subsequent loads and reducing API calls to rate-limited yfinance endpoints.
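A cache keyed by symbol and timeframe can be sketched as below. It is written in Python for consistency with the backend, and the `HistoryCache` class, its five-minute expiry, and the `loader` callable are hypothetical. The document does not say where the project's cache lives or whether entries expire.

```python
import time
from typing import Any, Callable, Dict, Tuple


class HistoryCache:
    """Cache historical data under (symbol, timeframe) keys with a time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get_or_load(self, symbol: str, timeframe: str,
                    loader: Callable[[str, str], Any]) -> Any:
        """Return a fresh cached value, or call `loader` and cache its result."""
        key = (symbol.upper(), timeframe)
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = loader(symbol, timeframe)
        self._store[key] = (now, value)
        return value
```

With this shape, repeated requests for the same symbol and timeframe within the expiry window never reach the rate-limited data source.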
🔄 Error Recovery System
Comprehensive error handling with exponential backoff retry logic, network status detection, toast notifications, and error boundaries to prevent crashes.
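Exponential backoff means doubling the wait between retries up to a cap. The sketch below is in Python for consistency with the other examples, although the same schedule applies in a React client. The function name, the attempt count, and the delay values are assumptions, not the project's settings.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call `operation`, waiting base_delay * 2**attempt between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

With the defaults, a call that keeps failing waits 0.5 s, 1 s, and 2 s before the fourth and final attempt raises.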
📱 Adaptive Charting
Dynamic interval selection based on timeframe (5-min for intraday, daily for months, weekly for years) with intelligent data point reduction.
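The interval choice and point reduction can be sketched as follows. The thresholds in `choose_interval` are assumptions that match the description only loosely. The interval strings follow yfinance's naming (`5m`, `1d`, `1wk`). The 200-point default comes from the frontend performance notes in this README. Evenly spaced sampling is one simple reduction strategy and may not be the one the project uses.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def choose_interval(span_days: int) -> str:
    """Map a chart span to a bar interval; thresholds are illustrative."""
    if span_days <= 1:
        return "5m"   # intraday
    if span_days < 365:
        return "1d"   # weeks to months
    return "1wk"      # one year or more


def downsample(points: Sequence[T], max_points: int = 200) -> List[T]:
    """Keep at most `max_points` evenly spaced points, always including both ends."""
    n = len(points)
    if n <= max_points:
        return list(points)
    step = (n - 1) / (max_points - 1)
    return [points[round(i * step)] for i in range(max_points)]
```

Keeping the first and last points preserves the chart's visible start and end values even after thinning.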
🔐 JWT Authentication
Token-based API authentication with secure password hashing (Werkzeug), localStorage persistence, and Context API state management.
🚀 Production Deployment
Separate Railway (backend) and Vercel (frontend) deployments with environment variables, auto-deployment on Git push, and Git LFS for model files.
Development Process
Challenges & Solutions
Challenge: Large Model File Deployment
Problem: ML model files (14-79 MB) too large for standard Git repositories
Solution: Implemented Git LFS (Large File Storage) with automatic fallback download mechanism from GitHub on first API request
Challenge: Prediction Accuracy
Problem: Initial models showed only 46% accuracy (worse than random)
Solution: Enhanced feature engineering with 30 indicators, class balancing, hyperparameter tuning, and specialized crypto models, which raised accuracy to about 52%
Challenge: Frontend Performance
Problem: Heavy components caused slow initial page load
Solution: Implemented lazy loading with React.lazy(), code splitting, data caching, and intelligent chart data reduction to max 200 points
Challenge: API Rate Limiting
Problem: yfinance API throttles repeated requests
Solution: Built caching system with symbol-timeframe keys, batch prediction endpoint, and intelligent data reuse across components
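Class balancing, named in the Prediction Accuracy solution above, can be illustrated with inverse-frequency weights. The sketch uses the same formula as scikit-learn's `class_weight="balanced"` option, n_samples / (n_classes * class_count). Whether the project uses that option or another balancing method is not stated here, so treat this as one plausible approach.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Example: 6 UP days and 4 DOWN days, mimicking a mild upward market bias.
weights = balanced_class_weights([1] * 6 + [0] * 4)
```

Here the rarer DOWN class receives the larger weight, which discourages a model from simply predicting UP every day.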
Future Enhancements
- Deep learning models (LSTM, Transformer) for time-series prediction
- Sentiment analysis integration from news/social media
- Backtesting system to evaluate strategy performance
- Real-time WebSocket updates for live market data
- Advanced charting with TradingView integration
- Multi-timeframe analysis (1min, 5min, hourly, daily)
- Options pricing and Greeks calculations
- Portfolio optimization algorithms
Lessons Learned
- Market Prediction is Hard: Even with 30 features and advanced ML, achieving >55% accuracy is extremely difficult because markets are inherently unpredictable
- Feature Engineering Matters: Quality indicators (RSI, MACD, Bollinger Bands) proved more impactful than model complexity
- Production Deployment Challenges: Git LFS, environment variables, CORS, and model file management require careful planning
- Performance Optimization: Lazy loading, caching, and code splitting essential for React apps with heavy components
- User Experience: Error recovery, loading states, and responsive design crucial for production applications
- Authentication Security: JWT tokens + secure hashing + HTTPS required for protecting user data
- API Design: RESTful endpoints, batch processing, and versioning important for scalable backend architecture