AlphaBacktestExperiment
The keystone "model used as alpha" experiment — train a model, register it, deploy it as
DeployedModelAlpha, run a backtest, and persist combined ML + trading metrics under one MLflow parent run.
When to use
Use AlphaBacktestExperiment whenever you want to answer the question
"how does this model perform when its predictions actually drive
trades?" The standard Experiment family computes IC / RMSE / MAE in
isolation; AlphaBacktestExperiment adds Sharpe / Sortino / hit-rate
and links them back to the trained ModelVersion so the Strategy
Browser, MLflow UI, and Postgres catalog all converge.
Shape
| Concept | Class / table |
|---|---|
| Orchestrator | alphaswarm.ml.alpha_backtest_experiment::AlphaBacktestExperiment |
| Combined metrics | alphaswarm.ml.alpha_metrics |
| Combined run row | MLAlphaBacktestRun (Alembic 0025) |
| Per-bar audit (opt-in) | MLPredictionAudit (Alembic 0025) |
| Celery task | alphaswarm.tasks.ml_tasks.run_alpha_backtest_experiment (queue ml) |
| REST | POST /ml/alpha-backtest-runs, GET /ml/alpha-backtest-runs[/{id}/predictions] |
Workflow
Metric vocabulary
The combined metrics blob persisted on MLAlphaBacktestRun.combined_metrics rolls up:
- ML-side: ic_spearman, ic_pearson, icir, mae, rmse, hit_rate
- Trading-side: sharpe, sortino, calmar, max_drawdown, total_return, turnover_adj_sharpe
- Combined scalar: score = combined_score(ml_metrics, trading_metrics). The default weighting in alphaswarm/ml/alpha_metrics.py prioritises Sharpe (0.45) but also rewards IC / IR / hit-rate, so a high-IC model that fails to translate to PnL is penalised.
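
As an illustration of how such a scalar could be assembled, the sketch below takes a weighted sum over a few of the metrics listed above. Only the 0.45 Sharpe weight comes from this page. The other weights, the names combined_score_sketch and ILLUSTRATIVE_WEIGHTS, and the lack of any normalisation or clipping are assumptions; see alphaswarm/ml/alpha_metrics.py for the actual definition.

```python
# Hypothetical weights: only the Sharpe value (0.45) is documented.
ILLUSTRATIVE_WEIGHTS = {
    "sharpe": 0.45,
    "ic_spearman": 0.25,  # assumed
    "icir": 0.15,         # assumed
    "hit_rate": 0.15,     # assumed
}


def combined_score_sketch(ml_metrics, trading_metrics, weights=None):
    """Weighted sum over ML and trading metrics; missing metrics count as 0."""
    weights = ILLUSTRATIVE_WEIGHTS if weights is None else weights
    merged = {**ml_metrics, **trading_metrics}
    return sum(w * merged.get(name, 0.0) for name, w in weights.items())


# Same ML metrics, two different trading outcomes.
ml_metrics = {"ic_spearman": 0.08, "icir": 0.5, "hit_rate": 0.54}
high_ic_no_pnl = combined_score_sketch(ml_metrics, {"sharpe": -0.2})
high_ic_with_pnl = combined_score_sketch(ml_metrics, {"sharpe": 1.4})

# A per-strategy override that scores on Sharpe alone.
sharpe_only = combined_score_sketch(ml_metrics, {"sharpe": 1.4}, weights={"sharpe": 1.0})

print(high_ic_no_pnl, high_ic_with_pnl, sharpe_only)
```

With identical ML metrics, the run whose Sharpe is negative scores well below the run whose predictions translate into PnL, which is the behaviour the default weighting is described as encouraging.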
Calling from code
from alphaswarm.ml.alpha_backtest_experiment import AlphaBacktestExperiment

experiment = AlphaBacktestExperiment(
    dataset_cfg=dataset_cfg,
    model_cfg=model_cfg,
    strategy_cfg=strategy_cfg,
    backtest_cfg=backtest_cfg,
    run_name="ridge-alpha-backtest",
    train_first=True,
    capture_predictions=True,
)
result = experiment.run()
print(result.combined_metrics)
Calling from REST
curl -XPOST http://localhost:8000/ml/alpha-backtest-runs \
  -H 'content-type: application/json' \
  -d @configs/ml/alpha_backtest/ridge_alpha_backtest.yaml
The response is a TaskAccepted envelope; subscribe to
/chat/stream/{task_id} for progress events.
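
For callers who prefer Python over curl, the following stdlib-only sketch builds the same POST request and derives the progress-stream URL. The helper names, the example payload key run_name, and the task_id field read from the TaskAccepted envelope are assumptions; the real request schema and response shape are defined by the API, not by this snippet.

```python
import json
import urllib.request


def build_submit_request(base_url, payload):
    """Build (but do not send) the POST request for a new alpha-backtest run."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ml/alpha-backtest-runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def progress_stream_url(base_url, task_id):
    """Return the URL to subscribe to for progress events of a task."""
    return f"{base_url.rstrip('/')}/chat/stream/{task_id}"


# Hypothetical payload: the actual body schema is not documented on this page.
request = build_submit_request(
    "http://localhost:8000", {"run_name": "ridge-alpha-backtest"}
)

# To actually submit against a running server (field name "task_id" is assumed):
# with urllib.request.urlopen(request) as response:
#     accepted = json.load(response)
#     print(progress_stream_url("http://localhost:8000", accepted["task_id"]))
```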
Where this goes wrong
- Forgetting train_first=False when re-using an existing deployment_id will trigger a re-train. Set it explicitly.
- The combined-metric weights are heuristic; customise them per strategy by passing weights={...} to combined_score.
- MLPredictionAudit is gated behind ALPHASWARM_ML_PREDICTION_AUDIT_ENABLED; default is false to keep the table small. Enable it for forensic explainability.
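
Before relying on per-bar audit rows, it can help to check whether the flag is set in the current environment. The sketch below is one way to do that. The helper name and the set of accepted truthy spellings are assumptions, since this page only states that the default is false; the project's own settings loader may parse the variable differently.

```python
import os

FLAG = "ALPHASWARM_ML_PREDICTION_AUDIT_ENABLED"

# Assumed truthy spellings; the real parser may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


def prediction_audit_enabled(environ=None):
    """Return True if the audit flag is set to a truthy value (default: off)."""
    environ = os.environ if environ is None else environ
    return environ.get(FLAG, "false").strip().lower() in _TRUTHY


if not prediction_audit_enabled():
    print(f"{FLAG} is off; MLPredictionAudit rows will not be written.")
```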