Model serving

AlphaSwarm ships three serving adapters. All three share the same ModelDeployment protocol so call-sites, the CLI (alphaswarm serve ...), and the REST API speak one vocabulary regardless of the runtime underneath.

Backend	Adapter	CLI	Best for
MLflow Models	`MLflowServeDeployment`	`alphaswarm serve mlflow <uri>`	any flavor logged with `mlflow.log_model`, low-throughput research
Ray Serve	`RayServeDeployment`	`alphaswarm serve ray <uri>`	horizontally scaled batch inference
TorchServe	`TorchServeDeployment`	`alphaswarm serve torchserve <uri>`	low-latency PyTorch endpoints + batching

Model URIs

All adapters accept three URI shapes:

Filesystem path — ./data/models/alpha_v1.pkl
MLflow run — runs:/<run-id>/<artifact>
MLflow registry — models:/<name>/<stage> or models:/<name>/<version>

MLflow URIs are resolved via alphaswarm.mlops.serving.base.resolve_model, which optionally downloads the artifact locally when a backend needs filesystem access (TorchServe packaging) or passes the URI through (MLflow Serve).

PreprocessingSpec propagation

Every adapter honours the PreprocessingSpec attached to the model. At inference time:

MLflow Serve — flavor-specific (pyfunc handlers are expected to re-apply preprocessing inside the model class).
Ray Serve — the generated deployment loads the pickle and delegates to model.predict(df); when model.preprocessing_spec is set, the apply call happens in __call__ before predict.
TorchServe — the auto-generated AqpBaseHandler checks for a preprocessing_spec attribute and runs spec.apply(df) before every call.

Quick start

# Train something and log to MLflow
python scripts/train_agent.py --config configs/ml/lgbm.yaml

# Serve the latest production version via MLflow
alphaswarm serve mlflow models:/alphaswarm-lgbm/Production --port 5001

# Or via Ray Serve
alphaswarm serve ray models:/alphaswarm-lgbm/Production --num-replicas 4

# Or package for TorchServe
alphaswarm serve torchserve models:/alphaswarm-lstm/Production --model-name alphaswarm-lstm

Kubernetes

Manifests and Helm values for deploying each backend to the rpi_kubernetes cluster live under deploy/kubernetes/serving/ and are described in alphaswarm_docs/docs/how-to/mlops/k8s-deployment.md (Phase 5).

Model URIs​

PreprocessingSpec propagation​

Quick start​

Kubernetes​

Model URIs

PreprocessingSpec propagation

Quick start

Kubernetes