PredictorHub
Status: Phase 5 shipped (Alembic 0044). Hub:
alphaswarm/ml/predictors/.
Why unify
The report calls out two empirical findings from the literature:
- XGBoost regression -- significantly superior accuracy at pure numerical return prediction (low-noise, structured features)
- LSTM classification -- demonstrably better at directional classification over medium-term 7-30 day horizons (sequence-aware, handles regime shifts)
The platform already had both models available under
alphaswarm/ml/models/, but they were registered with
different config keys, trained via different code paths, and
serialised inconsistently. Phase 5 consolidates them under a single
:class:`PredictorSpec` shape that the hub uses to pick the right
factory.
PredictorSpec
The spec is a hash-locked Pydantic model:
from alphaswarm.ml.predictors import PredictorSpec
# XGBoost regression: predict next-day return
spec_xgb = PredictorSpec(
    name="xgb_returns_1d",
    model_kind="xgboost",
    label_kind="regression",
    target_horizon="1d",
    feature_columns=["mom_5", "mom_20", "rsi_14", "vol_20"],
    target_column="ret_1d",
    hyperparams={"max_depth": 6, "learning_rate": 0.05, "n_estimators": 500},
)
# LSTM classification: predict 20-day direction (binary)
spec_lstm = PredictorSpec(
    name="lstm_direction_20d",
    model_kind="lstm",
    label_kind="classification",
    target_horizon="20d",
    feature_columns=["close", "volume", "rsi_14", "macd"],
    target_column="dir_20d",
    sequence_length=60,
    hyperparams={"hidden_size": 64, "num_layers": 2, "dropout": 0.2},
    classes=["down", "up"],
)
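How sequence_length and a forward-looking direction target could turn into training samples can be sketched as follows. This is an illustration only, not the hub's actual preprocessing: make_windows is a hypothetical helper, it works on a single close series instead of the four feature columns, and it assumes the direction label means "the close `horizon` steps ahead is above the current close".

```python
from typing import List, Sequence, Tuple


def make_windows(
    closes: Sequence[float],
    sequence_length: int,
    horizon: int,
) -> Tuple[List[List[float]], List[str]]:
    """Pair each trailing window of closes with a forward direction label.

    Hypothetical helper: the real pipeline may build samples differently.
    """
    windows: List[List[float]] = []
    labels: List[str] = []
    # t is the index of the last observation inside the window; we stop
    # early enough that closes[t + horizon] still exists.
    for t in range(sequence_length - 1, len(closes) - horizon):
        window = list(closes[t - sequence_length + 1 : t + 1])
        label = "up" if closes[t + horizon] > closes[t] else "down"
        windows.append(window)
        labels.append(label)
    return windows, labels


# Toy data: a strictly rising series, short window and horizon for readability.
closes = [100.0 + i for i in range(10)]
X, y = make_windows(closes, sequence_length=3, horizon=2)
print(len(X), X[0], y[0])
```

With the spec above, the equivalent call would use sequence_length=60 and horizon=20, so a series needs at least 80 observations to yield a single sample.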
Re-snapshotting the spec into the persistence layer:
from alphaswarm.ml.predictors import persist_predictor_spec
row_id, created = persist_predictor_spec(spec_xgb)
print(row_id, created) # created=True the first time, False if hash unchanged
PredictorHub
from alphaswarm.ml.predictors import PredictorHub
hub = PredictorHub()
model = hub.build(spec_xgb)
model.fit(X_train, y_train)
preds = model.predict(X_test)
The hub picks the right factory from the
(model_kind, label_kind) registry. Adding a new model:
from alphaswarm.ml.predictors import register_predictor
@register_predictor(model_kind="transformer", label_kind="classification")
def my_transformer_factory(spec):
    ...
    return TransformerClassifier(**spec.hyperparams)
Reference factories
The hub ships four reference factories matching the report's recommendations:
| model_kind | label_kind | Underlying class |
|---|---|---|
| xgboost | regression | :class:`XGBModel` from :mod:`alphaswarm.ml.models.tree` |
| xgboost | classification | :class:`XGBModel` (with binary or multi-class objective) |
| lstm | classification | :class:`LSTMModel` from :mod:`alphaswarm.ml.models.torch.lstm` |
| lstm | regression | :class:`LSTMModel` (regression head) |
Hash-locked versioning
The Phase 5 predictor_spec_versions table mirrors the spec-version
pattern used by AgentSpec / BotSpec / RLExperimentSpec /
AnalysisSpec. Re-running persist_predictor_spec with an unchanged
spec returns created=False; a single byte change to the spec body
(new feature, new hyperparam) produces a fresh row. This means every
"how was this model trained?" question has a precise answer pinned
by the SHA-256 hash.
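The idempotent behaviour described above can be illustrated with a small stand-alone sketch. It assumes the hash is a SHA-256 digest of a canonical JSON dump of the spec body, and it replaces the predictor_spec_versions table with an in-memory dict; spec_hash and persist_sketch are hypothetical names, and the real persist_predictor_spec may canonicalise and key rows differently.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def spec_hash(spec_body: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, no extra whitespace)."""
    canonical = json.dumps(spec_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# In-memory stand-in for the versions table: hash -> row id.
_rows: Dict[str, int] = {}


def persist_sketch(spec_body: Dict[str, Any]) -> Tuple[int, bool]:
    """Return (row_id, created); created is False when the hash already exists."""
    digest = spec_hash(spec_body)
    if digest in _rows:
        return _rows[digest], False
    row_id = len(_rows) + 1
    _rows[digest] = row_id
    return row_id, True


body = {
    "name": "xgb_returns_1d",
    "model_kind": "xgboost",
    "label_kind": "regression",
    "feature_columns": ["mom_5", "mom_20"],
    "hyperparams": {"max_depth": 6, "learning_rate": 0.05},
}

first = persist_sketch(body)
# Same content with keys in a different order hashes identically.
again = persist_sketch(dict(reversed(list(body.items()))))
# Adding a feature changes the canonical bytes, so a fresh row appears.
changed = persist_sketch({**body, "feature_columns": ["mom_5", "mom_20", "rsi_14"]})
print(first, again, changed)
```

Sorting keys before hashing keeps the digest independent of dict ordering, so only a genuine change to the spec content produces a new version.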
Wiring into agents
Phase 5 exposes the hub through the existing
/ml/test endpoints (REST) and three
DataMCP tools (agent-facing):
- data.ml.predictors.list -- list registered specs
- data.ml.predictors.train -- snapshot a spec + train
- data.ml.predictors.deploy_pair -- A/B-test two trained models
Agents query the catalogue first, snapshot a spec, train, and deploy without an ORM import.