An end-to-end machine learning trading system: an ensemble of Transformer models (LuTransformer, Informer), a hybrid RNN (LSTM-GRU), and LightGBM, with temperature calibration and Kelly Criterion position sizing, backed by a FastAPI live trading bot. Validated on SPY 5-minute bars.
- Four-model ensemble — two Transformer-based architectures (LuTransformer, Informer), a hybrid recurrent network (LSTM-GRU), and gradient boosting (LightGBM) — weighted per prediction horizon using squared AUC-PR scores
- Post-hoc temperature calibration (LBFGS) converts overconfident neural network logits into reliable probabilities — required for correct Kelly position sizing
- Cost-sensitive loss with an asymmetric cost matrix handles extreme class imbalance without oversampling
- Walk-forward cross-validation with expanding windows prevents temporal data leakage that standard k-fold CV introduces in time series
- Kelly Criterion ties probability calibration directly to position sizing — miscalibrated models automatically receive smaller allocations via a Brier score penalty term
- Automated feature selection pipeline combines mutual information, F-statistic, and pairwise redundancy filtering to reduce 223 raw features to the top 80 and auto-updates the training config
- Fault-tolerant training saves per-fold checkpoints so a multi-hour training run can resume from the last completed fold after any interruption
Holdout period: 2026-01-14 to 2026-02-19 (1,901 × 5-minute bars). Initial cash: $5,000. Kelly fraction: 1.0, conservatism: 1.0.
| Metric | Ensemble | Buy & Hold |
|---|---|---|
| Total Return | +0.36% | −0.37% |
| Outperformance | +0.73% | — |
| Sharpe Ratio | 0.61 | — |
| Max Drawdown | 1.44% | — |
| AUC-PR (H0) | 0.385 | — |
| Brier Score (H0) | 0.1134 | — |
| Accuracy | 67.8% | — |
| Total Trades | 163 | — |
Raw OHLCV (5-minute SPY)
│
▼
Feature Engineering ── 223 features
├── 158 QLib Alpha158 quantitative factors
├── ICT market structure (BOS, CHoCH, Order Blocks, FVG, Liquidity)
├── RSI, Momentum, Volatility, VWAP deviation
└── Time & volume features
│
▼
Walk-Forward Cross-Validation (3 folds, expanding window)
│
├── LSTM-GRU Hybrid (recurrent, captures sequential patterns)
├── LuTransformer (Transformer — custom dual-tower SRU encoder)
├── Informer (Transformer — ProbSparse attention, O(L log L))
└── LightGBM (gradient boosting on 16K flattened features)
│
▼
Temperature Calibration ── per-horizon T fitted via LBFGS
│
▼
Weighted Voting Meta-Learner ── squared AUC-PR weights per horizon
│
▼
Kelly Criterion Position Sizing
│
▼
Backtest / Live Trading API
Four models with complementary inductive biases are trained independently and combined into a weighted voting ensemble.
A custom hybrid architecture that stacks GRU, LSTM, and GRU layers sequentially. GRUs are efficient at capturing short-term recurrence while LSTMs excel at retaining long-range memory — combining both layers allows the model to exploit patterns at multiple time scales within the 5-minute bar sequence.
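The stacking idea can be sketched in PyTorch. This is a minimal illustration, not the repo's `LSTM_GRU.py`: the class name, layer sizes, and single-layer recurrences are assumptions.

```python
import torch
import torch.nn as nn

class GruLstmGru(nn.Module):
    """Illustrative GRU -> LSTM -> GRU stack (sizes are hypothetical)."""
    def __init__(self, n_features: int, hidden: int = 128, n_classes: int = 3):
        super().__init__()
        self.gru1 = nn.GRU(n_features, hidden, batch_first=True)   # short-term recurrence
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)      # long-range memory
        self.gru2 = nn.GRU(hidden, hidden, batch_first=True)       # re-summarize
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        h, _ = self.gru1(x)
        h, _ = self.lstm(h)
        h, _ = self.gru2(h)
        return self.head(h[:, -1])         # classify from the last timestep

logits = GruLstmGru(n_features=80)(torch.randn(4, 205, 80))
print(logits.shape)  # torch.Size([4, 3])
```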
A custom adaptation of Hidformer (Liu et al., 2024) retargeted for short-term intraday classification. The original Hidformer uses a hierarchical dual-tower architecture with a segment-and-merge token mixer designed for long-term forecasting. This implementation replaces the frequency-domain tower with a Segmented Recurrent Unit (SRU) encoder that tokenizes the input sequence into fixed-length segments and pools them hierarchically, producing a compact token representation before attention. This reduces the quadratic cost of standard self-attention while preserving local structure — important for intraday market data where adjacent bars are highly correlated.
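The token-count reduction can be sketched with plain segment pooling. This is a simplification: the actual SRU encoder runs a recurrent unit over each segment and pools hierarchically, while this sketch only shows how 205 bars shrink to a handful of tokens before attention (segment length of 16 is illustrative).

```python
import numpy as np

def segment_tokens(x: np.ndarray, segment_length: int = 16) -> np.ndarray:
    """Pad the sequence to a multiple of segment_length, then mean-pool
    each segment into one token (token count = ceil(seq_len / segment_length))."""
    seq_len, n_feat = x.shape
    pad = (-seq_len) % segment_length              # 205 -> pad 3 -> 208
    x = np.pad(x, ((0, pad), (0, 0)))
    return x.reshape(-1, segment_length, n_feat).mean(axis=1)

tokens = segment_tokens(np.zeros((205, 80)))       # 205 bars -> 13 tokens
print(tokens.shape)  # (13, 80)
```

Attention over 13 tokens instead of 205 bars is what keeps the cost low while each token still summarizes a contiguous, locally correlated span of bars.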
A Transformer using ProbSparse self-attention, which approximates the full attention matrix in O(L log L) by selecting the queries with the highest contribution. The encoder also applies convolutional distillation between layers to reduce sequence length progressively. This makes it practical for long input windows (seq_len = 205 bars) without the memory cost of full attention.
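The query-selection heuristic behind ProbSparse attention can be sketched in NumPy. This is a simplification: the real Informer also subsamples keys when estimating the sparsity measure, and `c = 5.0` is an illustrative sampling factor.

```python
import numpy as np

def probsparse_top_queries(Q: np.ndarray, K: np.ndarray, c: float = 5.0) -> np.ndarray:
    """Score each query by max - mean of its scaled dot products with the keys,
    then keep only the top u = c * ln(L) dominant queries."""
    L, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # (L, L) attention scores
    sparsity = scores.max(axis=1) - scores.mean(axis=1)
    u = min(L, int(np.ceil(c * np.log(L))))        # L=205 -> u=27
    return np.argsort(sparsity)[::-1][:u]

rng = np.random.default_rng(0)
idx = probsparse_top_queries(rng.normal(size=(205, 64)), rng.normal(size=(205, 64)))
print(len(idx))  # 27
```

Queries with a flat score distribution (max close to mean) contribute little beyond uniform attention, so dropping them approximates the full attention matrix at O(L log L) cost.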
Gradient boosted decision trees operating on the full flattened feature matrix (205 timesteps × 80 features = 16,400 inputs per sample). LightGBM provides a non-sequential baseline with built-in feature importance, regularization (L1 + L2), and column subsampling to prevent overfitting on high-dimensional inputs. Feature importance from LightGBM also drives the feature selection step that reduces 223 raw features to the top 80.
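The flattening step is a single reshape; shapes follow the text, and the LightGBM fit itself is omitted to keep the sketch dependency-free (the flat matrix would feed `lightgbm.LGBMClassifier`).

```python
import numpy as np

# Hypothetical batch: 32 samples, each a 205-timestep x 80-feature window
X_seq = np.random.default_rng(0).normal(size=(32, 205, 80))

# One row per sample: 205 * 80 = 16,400 columns, ready for gradient boosting
X_flat = X_seq.reshape(len(X_seq), -1)
print(X_flat.shape)  # (32, 16400)
```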
Each base model receives a per-horizon weight based on its validation AUC-PR score, squared to amplify differences between strong and weak models:
w(i, h) = score(i, h)² / Σⱼ score(j, h)²
Squaring preserves relative ranking without completely discarding weaker models, and allows a different model to dominate at each prediction horizon.
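In code, the squared-weight rule is a few lines; the AUC-PR scores below are hypothetical values for one horizon.

```python
def horizon_weights(scores):
    """Squared AUC-PR weights: w_i = s_i**2 / sum_j s_j**2."""
    sq = [s * s for s in scores]
    total = sum(sq)
    return [s / total for s in sq]

# Hypothetical validation AUC-PR scores for four base models at one horizon
w = horizon_weights([0.38, 0.30, 0.25, 0.20])
```

Compared with linear weighting, the strongest model's share grows (0.43 vs 0.34 here) while the weakest still keeps a nonzero vote.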
Neural networks are systematically overconfident — their raw softmax outputs do not reflect true event probabilities. After cross-validation, each model's logits are rescaled by a learned scalar temperature T fitted on pooled held-out validation data using LBFGS. A separate T is fitted per prediction horizon (H0–H5).
Accurate calibration is critical for Kelly Criterion: over-confident probabilities lead to overbetting and large drawdowns. Temperature is clamped to [0.1, 8.0] to prevent LBFGS from diverging to extreme values (T = 10,038 was observed without clamping, which collapses all predictions to uniform).
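A minimal sketch of temperature fitting, using SciPy's L-BFGS-B in place of the repo's PyTorch LBFGS; the optimizer bounds play the role of the [0.1, 8.0] clamp. The synthetic data and function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_softmax, softmax

T_MIN, T_MAX = 0.1, 8.0   # clamp bounds

def fit_temperature(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fit a scalar temperature T by minimizing NLL on held-out logits,
    bounded so the optimizer cannot diverge to extreme values."""
    def nll(t):
        logp = log_softmax(logits / t[0], axis=1)
        return -logp[np.arange(len(labels)), labels].mean()
    res = minimize(nll, x0=[1.0], method="L-BFGS-B", bounds=[(T_MIN, T_MAX)])
    return float(res.x[0])

# Synthetic demo: a model whose logits are 4x too sharp (overconfident)
rng = np.random.default_rng(0)
true_logits = rng.normal(size=(2000, 3))
labels = np.array([rng.choice(3, p=p) for p in softmax(true_logits, axis=1)])
T = fit_temperature(4.0 * true_logits, labels)   # fitted T lands near 4
```

Dividing the observed logits by the fitted T recovers probabilities close to the ones the labels were actually drawn from.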
The label distribution is extremely imbalanced: ~98% Slope, ~1% Peak, ~1% Bottom. Rather than reweighting classes, a cost matrix penalizes misclassifications asymmetrically — missing a Peak or Bottom costs far more than a false alarm:
| | Actual Peak | Actual Slope | Actual Bottom |
|---|---|---|---|
| Predicted Peak | 0 | 500 | 5000 |
| Predicted Slope | 3000 | 0 | 3000 |
| Predicted Bottom | 5000 | 500 | 0 |
This forces the model to recall rare regimes at the cost of some precision, which is the correct trade-off for directional trading.
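The expected-cost loss implied by this matrix can be sketched as follows (rows are predicted class, columns actual, matching the table; the function name is illustrative).

```python
import numpy as np

# Rows: predicted class, columns: actual class (Peak, Slope, Bottom)
COST = np.array([[   0,  500, 5000],
                 [3000,    0, 3000],
                 [5000,  500,    0]], dtype=float)

def expected_cost_loss(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean expected misclassification cost: sum_k p_k * COST[k, y].
    Differentiable in probs, so usable as a training objective."""
    per_sample = (probs * COST[:, labels].T).sum(axis=1)
    return float(per_sample.mean())

# Always predicting Slope on an actual Peak pays the full 3000 miss cost
loss = expected_cost_loss(np.array([[0.0, 1.0, 0.0]]), np.array([0]))
```

Because missing a Peak (3000) dwarfs a false alarm (500), gradient descent is pushed toward recalling the rare classes.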
Position sizes are computed from calibrated probabilities using a conservative Kelly formula that automatically reduces exposure when model calibration is poor:
f* = 2p − 1 # raw Kelly fraction
α = max(0, 1 − c · RMSE / |f*|) # calibration penalty
f  = α · f*                       # safe Kelly fraction

Low-confidence signals receive near-zero allocation without any hard probability threshold. The conservatism coefficient c can be tuned between 1.0 and 2.0.
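A direct transcription of these formulas (function and argument names are illustrative):

```python
def kelly_fraction(p: float, rmse: float, c: float = 1.0) -> float:
    """Calibration-penalized Kelly sizing.
    p: calibrated probability of the up move; rmse: calibration RMSE
    (the square root of the Brier score); c: conservatism coefficient."""
    f_star = 2.0 * p - 1.0                             # raw Kelly fraction
    if f_star == 0.0:
        return 0.0
    alpha = max(0.0, 1.0 - c * rmse / abs(f_star))     # calibration penalty
    return alpha * f_star                              # safe Kelly fraction

kelly_fraction(0.75, rmse=0.1)   # strong, well-calibrated signal -> 0.4 long
kelly_fraction(0.52, rmse=0.1)   # weak edge: penalty drives allocation to 0
```

Note how the penalty is relative to edge size: an RMSE of 0.1 barely dents a 0.5 edge but entirely wipes out a 0.04 edge, which is exactly the soft thresholding described above.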
Training with 6 prediction horizons instead of 1 increases the proportion of Peak and Bottom labels in the training set from ~0.83% to ~8% — a 10× increase that provides substantially more minority-class gradient signal per epoch. Only H0 is used in production; H1–H5 collapse to near-100% Slope predictions but are essential for training data efficiency.
Standard k-fold cross-validation leaks future market data into the training set. A temporal splitter with 3 expanding windows ensures each validation fold only sees data from after the training window, giving realistic out-of-sample performance estimates.
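A minimal expanding-window splitter illustrating the idea; fold boundaries here are evenly spaced, and the repo's TimeSeriesSplitter may differ in details.

```python
def walk_forward_splits(n_samples: int, n_folds: int = 3):
    """Expanding-window splits: each fold trains on everything before its
    validation block, so no future data leaks into training."""
    edges = [n_samples * (i + 1) // (n_folds + 1) for i in range(n_folds + 1)]
    for i in range(n_folds):
        train = list(range(0, edges[i]))           # grows every fold
        val = list(range(edges[i], edges[i + 1]))  # strictly after train
        yield train, val

splits = list(walk_forward_splits(100))
# fold 0: train [0..24],  val [25..49]
# fold 1: train [0..49],  val [50..74]
# fold 2: train [0..74],  val [75..99]
```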
├── Core/
│ ├── config.py # All hyperparameters and model configuration
│ ├── data_sources.py # Yahoo Finance / Alpaca / Polygon / Schwab
│ └── path_manager.py # Versioned model checkpoint resolution
│
├── Training/
│ ├── train_ensemble.py # Entry point: full training pipeline
│ ├── TimeSeriesSimpleEnsemble.py # Orchestrates CV + deployment training
│ ├── WeightedVotingMetaLearner.py # Squared-weight ensemble combiner
│ ├── TemperatureCalibrationManager.py # Post-hoc probability calibration (LBFGS)
│ ├── DeepLearningBaseModelAdapter.py # PyTorch Lightning wrapper for DL models
│ ├── LightGBMBaseModelAdapter.py # LightGBM wrapper with feature flattening
│ ├── StockFeaturesCreator.py # 223-feature engineering pipeline
│ ├── StockFeatureSelector.py # LightGBM importance-based feature selection
│ └── TimeSeriesSplitter.py # Walk-forward CV splitter
│
├── TimeSeriesLib/models/
│ ├── LSTM_GRU.py # Hybrid GRU → LSTM → GRU architecture
│ ├── LuTransformer.py # Dual-tower SRU Transformer
│ ├── LuTransformerEncoder.py # SRU segmentation + hierarchical pooling
│ └── Informer.py # ProbSparse attention Transformer
│
├── features/
│ ├── QLibFeaturesCreator.py # 158 QLib Alpha158 quantitative factors
│ ├── leak_free_ict_indicators.py # ICT: BOS, CHoCH, Order Blocks, FVG, Liquidity
│ └── ... # RSI, Momentum, Volatility, VWAP
│
├── utilities/
│ ├── cost_sensitive_loss.py # Asymmetric cost matrix loss function
│ ├── metrics.py # AUC-PR, Brier score
│ └── ldam_loss.py # LDAM+DRW (meta-learner training)
│
├── predict_ensemble.py # Prediction, backtesting, charting
└── TradeBot/ # FastAPI live trading integration
git clone <repo-url>
cd MarketRegimeNet
pip install -r requirements.txt

Requires Python 3.10+ and a CUDA-capable GPU for deep learning model training.
# Full training with walk-forward CV (3 folds)
python train_ensemble.py --ticker SPY
# Auto-update the holdout date to 30 days before today, then train
python train_ensemble.py --ticker SPY --auto-update-holdout
# Delete existing checkpoints and train from scratch
python train_ensemble.py --ticker SPY --from-scratch
# Run feature selection then train from scratch
python train_ensemble.py --ticker SPY --select-features
# Skip base model retraining — re-fit calibration and meta-learner only
python train_ensemble.py --ticker SPY --skip-base-models
# Train or resume a specific model version (a new version is created by default)
python train_ensemble.py --ticker SPY --version v2.0_2026-03-01
# Train base models on all samples (including the validation set), skipping validation, before deploying the model
python train_ensemble.py --ticker SPY --deployment --version v1.0_2026-03-01

Training resumes automatically from the last completed fold if interrupted; no extra flag needed.
# Backtest on holdout data with Kelly position sizing
python predict_ensemble.py --ticker SPY
# Download fresh data and predict on latest market state
python predict_ensemble.py --ticker SPY --use-fresh-data
# Half-Kelly (more conservative allocation)
python predict_ensemble.py --ticker SPY --kelly-fraction 0.5
# Raise conservatism coefficient (penalizes calibration error more aggressively)
python predict_ensemble.py --ticker SPY --kelly-conservatism 2.0
# Use a specific data file
python predict_ensemble.py --ticker SPY --data-file data/market/SPY_max_5m_data.csv

# Analyze all features and write the top 80 to config.py
python Training/StockFeatureSelector.py --ticker SPY --update-config

Feature selection uses three methods in combination (mutual information, F-statistic, and target correlation), normalized and averaged into a single importance score. Features below the importance threshold or above the redundancy threshold are removed first, then pairwise highly correlated features are pruned, keeping the higher-scoring one. The result is written back to Core/config.py automatically.
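The scoring-and-pruning logic can be sketched with scikit-learn's selectors. Function name, thresholds, and the demo data are illustrative, not the repo's StockFeatureSelector.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

def select_features(X, y, top_k=80, corr_threshold=0.95):
    """Normalize and average three scores, then greedily keep top features,
    pruning any candidate highly correlated with an already-kept one."""
    mi = mutual_info_classif(X, y, random_state=0)
    f_stat, _ = f_classif(X, y)
    tgt_corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

    def norm(v):
        v = np.nan_to_num(v)
        return (v - v.min()) / (v.max() - v.min() + 1e-12)

    score = (norm(mi) + norm(f_stat) + norm(tgt_corr)) / 3.0
    selected = []
    for j in np.argsort(score)[::-1]:          # best first
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < corr_threshold
               for s in selected):
            selected.append(j)
        if len(selected) == top_k:
            break
    return selected

# Demo: column 0 drives the label; column 1 is a near-duplicate of column 0
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)
y = (X[:, 0] > 0).astype(int)
sel = select_features(X, y, top_k=3)   # keeps one of {0, 1}, prunes the other
```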
Active base models and all hyperparameters are in Core/config.py. Toggle models by commenting or uncommenting their blocks:
'base_models': {
'lstm_gru': { 'type': 'lstm_gru', 'gru_hidden': 128, 'lstm_hidden': 128, 'dropout': 0.2 },
'lutransformer': { 'type': 'lutransformer', 'num_tokens': 32, 'segment_length': 16 },
'informer': { 'type': 'informer' },
'lightgbm': { 'type': 'lightgbm', 'n_estimators': 1000, ... },
}

Key global hyperparameters:
| Parameter | Value | Description |
|---|---|---|
| `seq_len` | 205 | Input window (~17 hours of 5-minute bars) |
| `pred_len` | 6 | Prediction horizons H0–H5 |
| `d_model` | 128 | Transformer embedding dimension |
| `cv_folds` | 3 | Walk-forward cross-validation folds |
| `T_MIN` / `T_MAX` | 0.1 / 8.0 | Temperature calibration clamp bounds |
Trained models are saved under data/models/SPY_5m_classification/<version>/ with:
- Per-fold checkpoints and validation logits
- Per-model, per-horizon temperature parameters
- AUC-PR scores used for ensemble weighting
- `current_version.txt` for active version tracking and rollback
- Hidformer — LuTransformer is adapted from: Z. Liu, Y. Cao, H. Xu, Y. Huang, Q. He, X. Chen, X. Tang, X. Liu. Hidformer: Hierarchical dual-tower transformer using multi-scale mergence for long-term time series forecasting. Expert Systems With Applications, 239 (2024), 122412. https://doi.org/10.1016/j.eswa.2023.122412
- LSTM-GRU — architecture inspired by: I. Akouaouch, A. Bouayad. A new deep learning approach for predicting high-frequency short-term cryptocurrency price. Bulletin of Electrical Engineering and Informatics, 14(1) (2025). https://doi.org/10.11591/eei.v14i1.7377
- Time-Series-Library — base Informer implementation
- Qlib Alpha158 — quantitative alpha factor library