
MarketRegimeNet

An end-to-end machine learning trading system: an ensemble of two Transformer models (LuTransformer, Informer), a hybrid RNN (LSTM-GRU), and LightGBM, with temperature calibration and Kelly Criterion position sizing, backed by a FastAPI live trading bot. Validated on SPY 5-minute bars.


Technical Highlights

  • Four-model ensemble — two Transformer-based architectures (LuTransformer, Informer), a hybrid recurrent network (LSTM-GRU), and gradient boosting (LightGBM) — weighted per prediction horizon using squared AUC-PR scores
  • Post-hoc temperature calibration (LBFGS) converts overconfident neural network logits into reliable probabilities — required for correct Kelly position sizing
  • Cost-sensitive loss with an asymmetric cost matrix handles extreme class imbalance without oversampling
  • Walk-forward cross-validation with expanding windows prevents temporal data leakage that standard k-fold CV introduces in time series
  • Kelly Criterion ties probability calibration directly to position sizing — miscalibrated models automatically receive smaller allocations via a Brier score penalty term
  • Automated feature selection pipeline combines mutual information, F-statistic, and pairwise redundancy filtering to reduce 223 raw features to the top 80 and auto-updates the training config
  • Fault-tolerant training saves per-fold checkpoints so a multi-hour training run can resume from the last completed fold after any interruption

Backtested Performance

Holdout period: 2026-01-14 to 2026-02-19 (1,901 × 5-minute bars). Initial cash: $5,000. Kelly fraction: 1.0, conservatism: 1.0.

Metric             Ensemble   Buy & Hold
Total Return       +0.36%     −0.37%
Outperformance     +0.73%
Sharpe Ratio       0.61
Max Drawdown       1.44%
AUC-PR (H0)        0.385
Brier Score (H0)   0.1134
Accuracy           67.8%
Total Trades       163

How It Works

Raw OHLCV (5-minute SPY)
        │
        ▼
Feature Engineering  ──  223 features
  ├── 158 QLib Alpha158 quantitative factors
  ├── ICT market structure (BOS, CHoCH, Order Blocks, FVG, Liquidity)
  ├── RSI, Momentum, Volatility, VWAP deviation
  └── Time & volume features
        │
        ▼
Walk-Forward Cross-Validation (3 folds, expanding window)
        │
        ├── LSTM-GRU Hybrid  (recurrent, captures sequential patterns)
        ├── LuTransformer         (Transformer — custom dual-tower SRU encoder)
        ├── Informer         (Transformer — ProbSparse attention, O(L log L))
        └── LightGBM         (gradient boosting on 16.4K flattened features)
                │
                ▼
    Temperature Calibration  ──  per-horizon T fitted via LBFGS
                │
                ▼
    Weighted Voting Meta-Learner  ──  squared AUC-PR weights per horizon
                │
                ▼
      Kelly Criterion Position Sizing
                │
                ▼
       Backtest  /  Live Trading API

Base Models

Four models with complementary inductive biases are trained independently and combined into a weighted voting ensemble.

LSTM-GRU Hybrid

A custom hybrid architecture that stacks GRU, LSTM, and GRU layers sequentially. GRUs are efficient at capturing short-term recurrence while LSTMs excel at retaining long-range memory — combining both layers allows the model to exploit patterns at multiple time scales within the 5-minute bar sequence.
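A minimal PyTorch sketch of that stack; the layer widths follow the gru_hidden/lstm_hidden values shown in the Configuration section, but the dropout placement and classification head are assumptions, not the repo's exact implementation:

```python
import torch
import torch.nn as nn

class LSTMGRUHybrid(nn.Module):
    """GRU -> LSTM -> GRU stack over a (batch, seq_len, features) window."""
    def __init__(self, n_features=80, hidden=128, n_classes=3, dropout=0.2):
        super().__init__()
        self.gru1 = nn.GRU(n_features, hidden, batch_first=True)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.gru2 = nn.GRU(hidden, hidden, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(hidden, n_classes)      # Peak / Slope / Bottom logits

    def forward(self, x):
        x, _ = self.gru1(x)      # short-term recurrence
        x, _ = self.lstm(x)      # long-range memory
        x, _ = self.gru2(x)
        return self.head(self.drop(x[:, -1]))         # classify from the last timestep

logits = LSTMGRUHybrid()(torch.randn(4, 205, 80))     # -> shape (4, 3)
```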

LuTransformer — Custom Transformer (adapted from Hidformer)

A custom adaptation of Hidformer (Liu et al., 2024) retargeted for short-term intraday classification. The original Hidformer uses a hierarchical dual-tower architecture with a segment-and-merge token mixer designed for long-term forecasting. This implementation replaces the frequency-domain tower with a Segmented Recurrent Unit (SRU) encoder that tokenizes the input sequence into fixed-length segments and pools them hierarchically, producing a compact token representation before attention. This reduces the quadratic cost of standard self-attention while preserving local structure — important for intraday market data where adjacent bars are highly correlated.

Informer — Efficient Transformer

A Transformer using ProbSparse self-attention, which approximates the full attention matrix in O(L log L) by selecting the queries with the highest contribution. The encoder also applies convolutional distillation between layers to reduce sequence length progressively. This makes it practical for long input windows (seq_len = 205 bars) without the memory cost of full attention.
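The query-selection idea can be illustrated with a toy NumPy sketch. The real Informer also subsamples keys when scoring queries and operates per attention head; this simplification scores every query against all keys:

```python
import numpy as np

def probsparse_attention(Q, K, V, c=5.0):
    """Toy ProbSparse attention over (L, d) matrices.

    Only the u = c * ln(L) most "active" queries get full softmax attention;
    the rest fall back to the mean of V, as in Informer's lazy-query shortcut.
    """
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                        # (L, L)
    sparsity = scores.max(axis=1) - scores.mean(axis=1)  # query activeness measure
    u = min(L, int(c * np.log(L)))
    top = np.argsort(sparsity)[-u:]                      # indices of active queries
    out = np.tile(V.mean(axis=0), (L, 1))                # lazy queries -> mean(V)
    w = np.exp(scores[top] - scores[top].max(axis=1, keepdims=True))
    out[top] = (w / w.sum(axis=1, keepdims=True)) @ V    # full attention for top-u
    return out
```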

LightGBM

Gradient boosted decision trees operating on the full flattened feature matrix (205 timesteps × 80 features = 16,400 inputs per sample). LightGBM provides a non-sequential baseline with built-in feature importance, regularization (L1 + L2), and column subsampling to prevent overfitting on high-dimensional inputs. Feature importance from LightGBM also drives the feature selection step that reduces 223 raw features to the top 80.
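The flattening step is simple but worth seeing; the shapes follow the numbers above, while the batch size and the commented LightGBM call are illustrative:

```python
import numpy as np

# Hypothetical batch: 256 samples of 205 timesteps x 80 selected features.
X_seq = np.random.rand(256, 205, 80)
y = np.random.randint(0, 3, size=256)          # Peak / Slope / Bottom labels

X_flat = X_seq.reshape(len(X_seq), -1)         # (256, 16400) inputs per sample
assert X_flat.shape == (256, 205 * 80)

# import lightgbm as lgb
# model = lgb.LGBMClassifier(n_estimators=1000, reg_alpha=0.01, reg_lambda=0.01,
#                            colsample_bytree=0.8)   # L1 + L2 + column subsampling
# model.fit(X_flat, y)
```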


Key Design Decisions

Weighted Voting with Squared Weights

Each base model receives a per-horizon weight based on its validation AUC-PR score, squared to amplify differences between strong and weak models:

w(i, h) = score(i, h)² / Σⱼ score(j, h)²

Squaring preserves relative ranking without completely discarding weaker models, and allows a different model to dominate at each prediction horizon.
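In code, the per-horizon weighting is a short normalization; the scores below are hypothetical AUC-PR values for the four base models:

```python
import numpy as np

def squared_weights(scores):
    """w_i = s_i^2 / sum_j s_j^2, computed independently per horizon."""
    s = np.asarray(scores, dtype=float) ** 2
    return s / s.sum()

# Hypothetical per-horizon AUC-PR scores for the four base models:
w = squared_weights([0.40, 0.35, 0.30, 0.20])
# The best and worst raw scores differ by 2x, but their weights differ by 4x,
# so stronger models dominate without weaker ones being discarded entirely.
```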

Temperature Calibration

Neural networks are systematically overconfident — their raw softmax outputs do not reflect true event probabilities. After cross-validation, each model's logits are rescaled by a learned scalar temperature T fitted on pooled held-out validation data using LBFGS. A separate T is fitted per prediction horizon (H0–H5).

Accurate calibration is critical for Kelly Criterion: over-confident probabilities lead to overbetting and large drawdowns. Temperature is clamped to [0.1, 8.0] to prevent LBFGS from diverging to extreme values (T = 10,038 was observed without clamping, which collapses all predictions to uniform).
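A minimal version of that fit in plain PyTorch (a sketch; the repo's TemperatureCalibrationManager.py fits one T per model and horizon and may differ in detail):

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, t_min=0.1, t_max=8.0):
    """Fit one scalar temperature T on pooled held-out logits via LBFGS.

    Optimizing log T keeps T positive; the final clamp mirrors the
    T_MIN/T_MAX bounds described above.
    """
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return float(log_t.exp().clamp(t_min, t_max))

# Usage: probs = torch.softmax(val_logits / fit_temperature(val_logits, val_labels), dim=-1)
```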

Cost-Sensitive Loss

The label distribution is extremely imbalanced: ~98% Slope, ~1% Peak, ~1% Bottom. Rather than reweighting classes, a cost matrix penalizes misclassifications asymmetrically — missing a Peak or Bottom costs far more than a false alarm:

                   Actual Peak   Actual Slope   Actual Bottom
Predicted Peak           0            500            5000
Predicted Slope       3000              0            3000
Predicted Bottom      5000            500               0

This forces the model to recall rare regimes at the cost of some precision, which is the correct trade-off for directional trading.
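One common way to turn such a matrix into a differentiable loss is to minimize the expected cost under the predicted distribution; this is a sketch, and the repo's utilities/cost_sensitive_loss.py may differ in detail:

```python
import torch

# cost[i][j] = cost of predicting class i when the actual class is j,
# ordered (Peak, Slope, Bottom) as in the matrix above.
COST = torch.tensor([[   0.,  500., 5000.],
                     [3000.,    0., 3000.],
                     [5000.,  500.,    0.]])

def cost_sensitive_loss(logits, labels, cost=COST):
    """Mean expected misclassification cost under the softmax distribution."""
    probs = torch.softmax(logits, dim=-1)   # (N, C)
    per_pred_cost = cost[:, labels].T       # (N, C): cost of each possible prediction
    return (probs * per_pred_cost).sum(dim=-1).mean()
```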

Kelly Criterion Position Sizing

Position sizes are computed from calibrated probabilities using a conservative Kelly formula that automatically reduces exposure when model calibration is poor:

f*  =  2p − 1                           # raw Kelly fraction
α   =  max(0, 1 − c · RMSE / |f*|)      # calibration penalty
f   =  α · f*                           # safe Kelly fraction

Low-confidence signals receive near-zero allocation without any hard probability threshold. The conservatism coefficient c can be tuned between 1.0 and 2.0.
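The three formulas translate directly into code; the probability and RMSE values in the examples are illustrative:

```python
def kelly_fraction(p, rmse, c=1.0, kelly_mult=1.0):
    """Safe Kelly size from a calibrated probability p of the favorable move.

    rmse is the model's calibration error; c is the conservatism coefficient.
    kelly_mult < 1 gives fractional Kelly (e.g. 0.5 for half-Kelly).
    """
    f_star = 2.0 * p - 1.0                           # raw even-odds Kelly fraction
    if f_star == 0.0:
        return 0.0
    alpha = max(0.0, 1.0 - c * rmse / abs(f_star))   # calibration penalty
    return kelly_mult * alpha * f_star               # safe Kelly fraction

kelly_fraction(0.65, rmse=0.05)   # strong, well-calibrated edge -> ~0.25
kelly_fraction(0.52, rmse=0.05)   # weak edge: the penalty drives the size to 0.0
```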

pred_len = 6 for Training

Training with 6 prediction horizons instead of 1 increases the proportion of Peak and Bottom labels in the training set from ~0.83% to ~8% — a 10× increase that provides substantially more minority-class gradient signal per epoch. Only H0 is used in production; H1–H5 collapse to near-100% Slope predictions but are essential for training data efficiency.

Walk-Forward Cross-Validation

Standard k-fold cross-validation leaks future market data into the training set. A temporal splitter with 3 expanding windows ensures each validation fold only sees data from after the training window, giving realistic out-of-sample performance estimates.
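The splitter can be sketched as follows; the 15% validation fraction is an assumption, not the repo's setting:

```python
def expanding_window_splits(n_samples, n_folds=3, val_frac=0.15):
    """Expanding-window walk-forward splits.

    Each fold validates on a block that lies strictly after its training
    window, so no future bars leak into training.
    """
    val_len = int(n_samples * val_frac)
    splits = []
    for k in range(n_folds):
        val_end = n_samples - (n_folds - 1 - k) * val_len
        val_start = val_end - val_len
        splits.append((range(0, val_start), range(val_start, val_end)))
    return splits

for train_idx, val_idx in expanding_window_splits(1000):
    print(len(train_idx), "train ->", len(val_idx), "val")
# 550 train -> 150 val
# 700 train -> 150 val
# 850 train -> 150 val
```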


Project Structure

├── Core/
│   ├── config.py                        # All hyperparameters and model configuration
│   ├── data_sources.py                  # Yahoo Finance / Alpaca / Polygon / Schwab
│   └── path_manager.py                  # Versioned model checkpoint resolution
│
├── Training/
│   ├── train_ensemble.py                # Entry point: full training pipeline
│   ├── TimeSeriesSimpleEnsemble.py      # Orchestrates CV + deployment training
│   ├── WeightedVotingMetaLearner.py     # Squared-weight ensemble combiner
│   ├── TemperatureCalibrationManager.py # Post-hoc probability calibration (LBFGS)
│   ├── DeepLearningBaseModelAdapter.py  # PyTorch Lightning wrapper for DL models
│   ├── LightGBMBaseModelAdapter.py      # LightGBM wrapper with feature flattening
│   ├── StockFeaturesCreator.py          # 223-feature engineering pipeline
│   ├── StockFeatureSelector.py          # LightGBM importance-based feature selection
│   └── TimeSeriesSplitter.py            # Walk-forward CV splitter
│
├── TimeSeriesLib/models/
│   ├── LSTM_GRU.py                      # Hybrid GRU → LSTM → GRU architecture
│   ├── LuTransformer.py                 # Dual-tower SRU Transformer
│   ├── LuTransformerEncoder.py          # SRU segmentation + hierarchical pooling
│   └── Informer.py                      # ProbSparse attention Transformer
│
├── features/
│   ├── QLibFeaturesCreator.py           # 158 QLib Alpha158 quantitative factors
│   ├── leak_free_ict_indicators.py      # ICT: BOS, CHoCH, Order Blocks, FVG, Liquidity
│   └── ...                              # RSI, Momentum, Volatility, VWAP
│
├── utilities/
│   ├── cost_sensitive_loss.py           # Asymmetric cost matrix loss function
│   ├── metrics.py                       # AUC-PR, Brier score
│   └── ldam_loss.py                     # LDAM+DRW (meta-learner training)
│
├── predict_ensemble.py                  # Prediction, backtesting, charting
└── TradeBot/                            # FastAPI live trading integration

Installation

git clone <repo-url>
cd MarketRegimeNet
pip install -r requirements.txt

Requires Python 3.10+ and a CUDA-capable GPU for deep learning model training.


Usage

Training

# Full training with walk-forward CV (3 folds)
python train_ensemble.py --ticker SPY

# Auto-update the holdout date to 30 days before today, then train
python train_ensemble.py --ticker SPY --auto-update-holdout

# Delete existing checkpoints and train from scratch
python train_ensemble.py --ticker SPY --from-scratch

# Run feature selection then train from scratch
python train_ensemble.py --ticker SPY --select-features

# Skip base model retraining — re-fit calibration and meta-learner only
python train_ensemble.py --ticker SPY --skip-base-models

# Train or resume a specific model version (a new version is created by default)
python train_ensemble.py --ticker SPY --version v2.0_2026-03-01

# Train base models on all samples (including the validation set), skipping validation, before deploying the model
python train_ensemble.py --ticker SPY --deployment --version v1.0_2026-03-01

Training resumes automatically from the last completed fold if interrupted — no extra flag needed.

Prediction & Backtesting

# Backtest on holdout data with Kelly position sizing
python predict_ensemble.py --ticker SPY

# Download fresh data and predict on latest market state
python predict_ensemble.py --ticker SPY --use-fresh-data

# Half-Kelly (more conservative allocation)
python predict_ensemble.py --ticker SPY --kelly-fraction 0.5

# Raise conservatism coefficient (penalizes calibration error more aggressively)
python predict_ensemble.py --ticker SPY --kelly-conservatism 2.0

# Use a specific data file
python predict_ensemble.py --ticker SPY --data-file data/market/SPY_max_5m_data.csv

Feature Selection

# Analyze all features and write the top 80 to config.py
python Training/StockFeatureSelector.py --ticker SPY --update-config

Feature selection uses three methods in combination — mutual information, F-statistic, and target correlation — normalized and averaged into a single importance score. Features below the importance threshold or above the redundancy threshold are removed first, then pairwise highly-correlated features are pruned (keeping the higher-scoring one). The result is written back to Core/config.py automatically.
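The combination-and-pruning logic can be sketched like this (the thresholds and scoring inputs are simplified; the real StockFeatureSelector.py may differ):

```python
import numpy as np

def combined_importance(scores_by_method):
    """Min-max normalize each method's scores, then average into one ranking."""
    norm = []
    for s in scores_by_method:
        s = np.asarray(s, dtype=float)
        rng = s.max() - s.min()
        norm.append((s - s.min()) / rng if rng > 0 else np.zeros_like(s))
    return np.mean(norm, axis=0)

def prune_redundant(X, importance, corr_threshold=0.95):
    """Of each highly correlated feature pair, drop the lower-scoring one."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = np.ones(X.shape[1], dtype=bool)
    for i in np.argsort(-importance):                # visit best features first
        if keep[i]:
            redundant = (corr[i] > corr_threshold) & keep
            redundant[i] = False                     # never drop i itself
            keep[redundant & (importance < importance[i])] = False
    return np.where(keep)[0]                         # surviving feature indices
```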


Configuration

Active base models and all hyperparameters are in Core/config.py. Toggle models by commenting or uncommenting their blocks:

'base_models': {
    'lstm_gru':  { 'type': 'lstm_gru',  'gru_hidden': 128, 'lstm_hidden': 128, 'dropout': 0.2 },
    'lutransformer':  { 'type': 'lutransformer',  'num_tokens': 32,  'segment_length': 16 },
    'informer':  { 'type': 'informer' },
    'lightgbm':  { 'type': 'lightgbm',  'n_estimators': 1000, ... },
}

Key global hyperparameters:

Parameter       Value       Description
seq_len         205         Input window (~17 hours of 5-minute bars)
pred_len        6           Prediction horizons H0–H5
d_model         128         Transformer embedding dimension
cv_folds        3           Walk-forward cross-validation folds
T_MIN / T_MAX   0.1 / 8.0   Temperature calibration clamp bounds

Model Versioning

Trained models are saved under data/models/SPY_5m_classification/<version>/ with:

  • Per-fold checkpoints and validation logits
  • Per-model, per-horizon temperature parameters
  • AUC-PR scores used for ensemble weighting
  • current_version.txt for active version tracking and rollback

Acknowledgments

  • Hidformer — LuTransformer is adapted from: Z. Liu, Y. Cao, H. Xu, Y. Huang, Q. He, X. Chen, X. Tang, X. Liu. Hidformer: Hierarchical dual-tower transformer using multi-scale mergence for long-term time series forecasting. Expert Systems With Applications, 239 (2024), 122412. https://doi.org/10.1016/j.eswa.2023.122412
  • LSTM-GRU — architecture inspired by: I. Akouaouch, A. Bouayad. A new deep learning approach for predicting high-frequency short-term cryptocurrency price. Bulletin of Electrical Engineering and Informatics, 14(1) (2025). https://doi.org/10.11591/eei.v14i1.7377
  • Time-Series-Library — base Informer implementation
  • Qlib Alpha158 — quantitative alpha factor library
