Model serving

AlphaSwarm ships three serving adapters. All three share the same ModelDeployment protocol so call-sites, the CLI (alphaswarm serve ...), and the REST API speak one vocabulary regardless of the runtime underneath.

| Backend | Adapter | CLI | Best for |
| --- | --- | --- | --- |
| MLflow Models | MLflowServeDeployment | alphaswarm serve mlflow <uri> | any flavor logged with mlflow.log_model, low-throughput research |
| Ray Serve | RayServeDeployment | alphaswarm serve ray <uri> | horizontally scaled batch inference |
| TorchServe | TorchServeDeployment | alphaswarm serve torchserve <uri> | low-latency PyTorch endpoints + batching |
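
As an illustration of how one shared protocol lets call-sites stay backend-agnostic, the sketch below defines a structural interface and a toy adapter that satisfies it. The method names (deploy, predict, shutdown), the EchoDeployment class, and serve_and_score are assumptions made for this example; the real ModelDeployment protocol in AlphaSwarm may look different.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class ModelDeployment(Protocol):
    """Hypothetical shape of the shared protocol (method names are assumed)."""

    def deploy(self, model_uri: str, **options: Any) -> str:
        """Start serving model_uri and return an endpoint identifier."""
        ...

    def predict(self, rows: list) -> list:
        """Score a batch of rows."""
        ...

    def shutdown(self) -> None:
        """Tear the deployment down."""
        ...


class EchoDeployment:
    """Toy adapter, used only to demonstrate structural typing."""

    def __init__(self) -> None:
        self.endpoint: Optional[str] = None

    def deploy(self, model_uri: str, **options: Any) -> str:
        self.endpoint = f"local://{model_uri}"
        return self.endpoint

    def predict(self, rows: list) -> list:
        # Placeholder "model": returns the number of fields in each row.
        return [len(row) for row in rows]

    def shutdown(self) -> None:
        self.endpoint = None


def serve_and_score(deployment: ModelDeployment, model_uri: str, rows: list) -> list:
    """Call-site code that depends only on the protocol, not on a backend."""
    deployment.deploy(model_uri)
    try:
        return deployment.predict(rows)
    finally:
        deployment.shutdown()
```

Because the protocol is structural, an adapter does not need to inherit from it; any class exposing the same methods can be passed to serve_and_score.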

Model URIs

All adapters accept three URI shapes:

  1. Filesystem path: ./data/models/alpha_v1.pkl
  2. MLflow run: runs:/<run-id>/<artifact>
  3. MLflow registry: models:/<name>/<stage> or models:/<name>/<version>

MLflow URIs are resolved via alphaswarm.mlops.serving.base.resolve_model. Depending on the backend, it either downloads the artifact to the local filesystem (needed for TorchServe packaging) or passes the URI through unchanged (MLflow Serve).
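
To make the three URI shapes concrete, here is a minimal sketch of how a URI could be classified before resolution. ModelRef, classify_model_uri, and is_version are hypothetical names invented for this example; this is not the implementation of resolve_model, only one way to distinguish the shapes listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical parsed form of a model URI."""

    kind: str                        # "path", "run", or "registry"
    target: str                      # filesystem path, run id, or model name
    selector: Optional[str] = None   # artifact path, stage, or version


def classify_model_uri(uri: str) -> ModelRef:
    """Classify a URI as an MLflow run, an MLflow registry entry, or a path."""
    if uri.startswith("runs:/"):
        run_id, _, artifact = uri[len("runs:/"):].partition("/")
        if not run_id or not artifact:
            raise ValueError(f"expected runs:/<run-id>/<artifact>, got {uri!r}")
        return ModelRef("run", run_id, artifact)
    if uri.startswith("models:/"):
        name, _, selector = uri[len("models:/"):].partition("/")
        if not name or not selector:
            raise ValueError(f"expected models:/<name>/<stage-or-version>, got {uri!r}")
        return ModelRef("registry", name, selector)
    # Anything without an MLflow scheme is treated as a filesystem path.
    return ModelRef("path", uri)


def is_version(selector: str) -> bool:
    """Assumption: a purely numeric selector is a version, otherwise a stage."""
    return selector.isdigit()
```

For example, classify_model_uri("models:/alphaswarm-lgbm/Production") yields a registry reference whose selector is a stage rather than a version.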

PreprocessingSpec propagation

Every adapter honours the PreprocessingSpec attached to the model. At inference time:

  • MLflow Serve: flavor-specific. pyfunc handlers are expected to re-apply preprocessing inside the model class.
  • Ray Serve: the generated deployment loads the pickle and delegates to model.predict(df). When model.preprocessing_spec is set, __call__ runs the spec's apply step on the input before calling predict.
  • TorchServe: the auto-generated AqpBaseHandler checks the model for a preprocessing_spec attribute and, if present, runs spec.apply(df) before every inference call.
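
The apply-then-predict pattern described for Ray Serve and TorchServe can be sketched in plain Python. Everything here is a stand-in: ScaleSpec imitates a PreprocessingSpec, SumModel imitates a trained model, Deployment imitates the generated wrapper, and rows are lists of dicts rather than DataFrames so the example has no third-party dependencies.

```python
from typing import Optional


class ScaleSpec:
    """Toy stand-in for a PreprocessingSpec: scales one column by a factor."""

    def __init__(self, column: str, factor: float) -> None:
        self.column = column
        self.factor = factor

    def apply(self, rows: list) -> list:
        # Return new rows; the caller's input is left unmodified.
        return [{**row, self.column: row[self.column] * self.factor} for row in rows]


class SumModel:
    """Toy model whose prediction is the sum of each row's values."""

    def __init__(self, preprocessing_spec: Optional[ScaleSpec] = None) -> None:
        self.preprocessing_spec = preprocessing_spec

    def predict(self, rows: list) -> list:
        return [sum(row.values()) for row in rows]


class Deployment:
    """Hypothetical wrapper mirroring the behaviour described above."""

    def __init__(self, model: SumModel) -> None:
        self.model = model

    def __call__(self, rows: list) -> list:
        # Apply preprocessing only when the model carries a spec.
        spec = getattr(self.model, "preprocessing_spec", None)
        if spec is not None:
            rows = spec.apply(rows)
        return self.model.predict(rows)
```

Using getattr with a None default keeps the wrapper compatible with models that were saved without a spec.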

Quick start

# Train something and log to MLflow
python scripts/train_agent.py --config configs/ml/lgbm.yaml

# Serve the latest production version via MLflow
alphaswarm serve mlflow models:/alphaswarm-lgbm/Production --port 5001

# Or via Ray Serve
alphaswarm serve ray models:/alphaswarm-lgbm/Production --num-replicas 4

# Or package for TorchServe
alphaswarm serve torchserve models:/alphaswarm-lstm/Production --model-name alphaswarm-lstm
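
Once the MLflow endpoint is up, a client could score rows along the following lines. This sketch assumes that alphaswarm serve mlflow exposes the stock MLflow scoring server (POST /invocations with a dataframe_split JSON body) on the port given above; that is not confirmed here, and the helper names and feature columns are hypothetical.

```python
import json
import urllib.request


def build_invocation_body(columns: list, rows: list) -> bytes:
    """Encode rows in MLflow's dataframe_split JSON format."""
    payload = {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }
    return json.dumps(payload).encode("utf-8")


def build_request(base_url: str, columns: list, rows: list) -> urllib.request.Request:
    """Build (but do not send) the scoring request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/invocations",
        data=build_invocation_body(columns, rows),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(base_url: str, columns: list, rows: list, timeout: float = 10.0):
    """Send the request and decode the JSON response from the server."""
    request = build_request(base_url, columns, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical feature names; requires a server listening on port 5001.
    print(score("http://localhost:5001", ["f1", "f2"], [[0.1, 0.2]]))
```

Separating request construction from sending makes the payload easy to inspect or unit-test without a running server.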

Kubernetes

Manifests and Helm values for deploying each backend to the rpi_kubernetes cluster live under deploy/kubernetes/serving/ and are described in alphaswarm_docs/docs/how-to/mlops/k8s-deployment.md (Phase 5).