Kubernetes deployment
AlphaSwarm ships Kustomize manifests under alphaswarm_platform/deploy/k8s/base/
that can be applied to any cluster. The manifests under base/serving/
add three model-serving backends (mlflow-serve, ray-serve, and torchserve) on top
of the existing api, worker, paper-trader, and streaming-ingester Deployments.
Image targets
The Dockerfile defines a shared base stage plus five buildable targets:
| Target | Entrypoint | Used by |
|---|---|---|
| base | (none) | shared base layer |
| paper | `alphaswarm paper run` | `paper-trader.yaml` |
| ingester | `alphaswarm-stream-ingest` | `ingester-*.yaml` |
| api (default) | `uvicorn alphaswarm.api.main:app` | `api.yaml`, `worker.yaml` |
| serving | `alphaswarm serve <backend>` | `serving/*.yaml` |
| ml-train | `alphaswarm-train` | CI training jobs, Ray Tune sweeps |
Build all five targets at once:

```shell
for target in paper ingester api serving ml-train; do
  docker build --target "$target" -t "alphaswarm-${target}:latest" .
done
```
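The loop above tags every image `:latest`, which makes it hard to tell which build a cluster is actually running. A minimal sketch of a variant that tags each target with an explicit version and can optionally push it; `REGISTRY`, `TAG`, `PUSH`, and `DRY_RUN` are hypothetical environment knobs (not settings the repository defines), and `registry.example.com/alphaswarm` is a placeholder.

```shell
# Sketch: build (and optionally push) every image target with an explicit tag.
# REGISTRY, TAG, PUSH and DRY_RUN are hypothetical knobs, not repository settings.
build_images() {
  local registry="${REGISTRY:-registry.example.com/alphaswarm}"
  local tag="${TAG:-dev}"
  local target image
  for target in paper ingester api serving ml-train; do
    image="${registry}/alphaswarm-${target}:${tag}"
    _run docker build --target "$target" -t "$image" .
    if [ "${PUSH:-0}" = "1" ]; then
      _run docker push "$image"
    fi
  done
}

# Print the command instead of executing it when DRY_RUN=1.
_run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `REGISTRY=registry.example.com/alphaswarm TAG=v0.3.1 PUSH=1 build_images` would build and push all five images, while adding `DRY_RUN=1` only prints the `docker` commands.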
Deploying to a Kubernetes cluster
AlphaSwarm is cluster-agnostic. The `alphaswarm_platform/deployments/kubernetes/` tree provisions
every shared dependency: MLflow in `alphaswarm-mlops`; MinIO, Postgres, Redis, and ChromaDB in
`alphaswarm-data-services`; Kafka, Schema Registry, and Flink in `alphaswarm-streaming`;
kube-prometheus-stack, Tempo, Loki, OTel, and Phoenix in `alphaswarm-observability`; and so on.
To deploy AlphaSwarm:
```shell
# Run from the alphaswarm repository root.

# 1. Install the operators / Helm releases that the AlphaSwarm CRDs depend on.
bash alphaswarm_platform/scripts/cluster_install/install-redpanda.sh
bash alphaswarm_platform/scripts/cluster_install/install-kube-prometheus-stack.sh
bash alphaswarm_platform/scripts/cluster_install/install-opentelemetry-operator.sh
bash alphaswarm_platform/scripts/cluster_install/install-spark-operator.sh
bash alphaswarm_platform/scripts/cluster_install/install-flink.sh

# 2. Apply the AlphaSwarm base kustomization (creates the alphaswarm-* namespaces
#    and the workload manifests).
kubectl apply -k alphaswarm_platform/deployments/kubernetes/base/
```
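Because the base is a plain Kustomize directory, cluster-specific changes such as pinned image tags can live in an overlay rather than in edits to `base/`. A minimal sketch that writes such an overlay, assuming a hypothetical `overlays/prod/` directory next to `base/` (so `../../base` resolves) and assuming the base manifests reference images named `alphaswarm-<target>`, as produced by the build loop; the Kustomize `images` transformer only rewrites container images whose name matches an entry.

```shell
# Sketch: scaffold a Kustomize overlay that reuses the AlphaSwarm base and pins
# image tags. The overlay location and registry are hypothetical, and the image
# names assume the base manifests reference alphaswarm-<target>.
write_overlay() {
  local dir="$1" registry="$2" tag="$3" target
  mkdir -p "$dir"
  {
    printf 'apiVersion: kustomize.config.k8s.io/v1beta1\n'
    printf 'kind: Kustomization\n'
    printf 'resources:\n  - ../../base\n'
    printf 'images:\n'
    for target in paper ingester api serving ml-train; do
      printf '  - name: alphaswarm-%s\n' "$target"
      printf '    newName: %s/alphaswarm-%s\n' "$registry" "$target"
      # Quote the tag so YAML never reads something like 1.10 as a number.
      printf '    newTag: "%s"\n' "$tag"
    done
  } > "$dir/kustomization.yaml"
}
```

For example, `write_overlay alphaswarm_platform/deployments/kubernetes/overlays/prod registry.example.com/alphaswarm v0.3.1` followed by `kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/prod/` would replace step 2 with the pinned variant.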
Selecting which model to serve
The three serving backends all read a single model_uri from the
alphaswarm-serving-env ConfigMap. Update it once, then restart the three Deployments so their
pods pick up the new value:
```shell
kubectl -n alphaswarm create configmap alphaswarm-serving-env \
  --from-literal=model_uri=models:/alphaswarm-alpha/Production \
  --from-literal=ray_serve_name=alphaswarm-alpha \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl -n alphaswarm rollout restart deploy mlflow-serve ray-serve torchserve
```
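The two commands can be wrapped in a helper that rejects a malformed URI before anything touches the cluster and then waits for each backend to finish rolling over. A minimal sketch; the function name and the `NAMESPACE`/`DRY_RUN` variables are hypothetical. The check accepts MLflow model-registry URIs of the form `models:/<name>/<version-or-stage>` and `models:/<name>@<alias>`; other schemes MLflow understands (such as `runs:/`) are deliberately rejected here, so widen the pattern if the backends are fed those.

```shell
# Sketch: point all three serving backends at a new MLflow model URI.
# NAMESPACE and DRY_RUN are hypothetical knobs; DRY_RUN=1 prints the kubectl
# commands instead of running them.
set_serving_model() {
  local uri="$1" serve_name="$2" ns="${NAMESPACE:-alphaswarm}" d
  # Accept models:/<name>/<version-or-stage> and models:/<name>@<alias> only.
  if ! printf '%s\n' "$uri" | grep -Eq '^models:/[^/@]+(/[^/@]+|@[^/@]+)$'; then
    echo "unsupported model URI: $uri" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "kubectl -n $ns create configmap alphaswarm-serving-env --from-literal=model_uri=$uri --from-literal=ray_serve_name=$serve_name --dry-run=client -o yaml | kubectl apply -f -"
    echo "kubectl -n $ns rollout restart deploy mlflow-serve ray-serve torchserve"
    return 0
  fi
  kubectl -n "$ns" create configmap alphaswarm-serving-env \
    --from-literal=model_uri="$uri" \
    --from-literal=ray_serve_name="$serve_name" \
    --dry-run=client -o yaml | kubectl apply -f - || return 1
  kubectl -n "$ns" rollout restart deploy mlflow-serve ray-serve torchserve || return 1
  # Block until each backend reports a completed rollout.
  for d in mlflow-serve ray-serve torchserve; do
    kubectl -n "$ns" rollout status "deploy/$d" --timeout=300s || return 1
  done
}
```

For example, `set_serving_model models:/alphaswarm-alpha/Production alphaswarm-alpha` reproduces the two commands above and then waits for the rollouts.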
Observability
- Every Deployment exports traces to `http://otel-collector:4317` (OTLP gRPC), matching the
  `rpi_kubernetes` collector conventions.
- Prometheus picks up metrics via the `ServiceMonitor` resources in
  `alphaswarm_platform/deploy/k8s/base/serving/servicemonitor.yaml`.
- AlphaSwarm's own metric surface is defined in `alphaswarm/mlops/metrics.py`:
  `alphaswarm_train_duration_seconds`, `alphaswarm_backtest_sharpe`, `alphaswarm_paper_pnl`,
  `alphaswarm_serve_requests_total`, and `alphaswarm_serve_latency_seconds`.
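The Prometheus Operator discovers scrape targets through ServiceMonitors: it selects Services (by default in the ServiceMonitor's own namespace) whose labels match `spec.selector.matchLabels`, then scrapes each named port listed under `spec.endpoints`. A minimal sketch of a generator for such a manifest; the `app.kubernetes.io/name` selector, the `metrics` port name, and the `release: kube-prometheus-stack` label are assumptions, not values copied from servicemonitor.yaml. The `release` label matters because, with its default Helm values, kube-prometheus-stack's Prometheus only picks up ServiceMonitors labelled with its own release name.

```shell
# Sketch: emit a ServiceMonitor for one AlphaSwarm Service.
# The selector label, port name, and release label are hypothetical; check them
# against the real Service definitions and the installed Prometheus release.
service_monitor() {
  local name="$1" ns="$2" release="${3:-kube-prometheus-stack}"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ${name}
  namespace: ${ns}
  labels:
    release: ${release}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ${name}
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
EOF
}
```

For example, `service_monitor mlflow-serve alphaswarm | kubectl apply -f -` would register the mlflow-serve Service, provided it exposes a port named `metrics`.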
Secrets
The alphaswarm-broker-secrets Secret supplies Alpaca, IBKR, and Tradier
credentials. The serving stack needs no secrets unless the MLflow tracking
URI requires authentication; in that case, set MLFLOW_TRACKING_TOKEN in
alphaswarm-env or in a dedicated Secret.
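If a dedicated Secret is preferred, the token can be kept out of any ConfigMap entirely. A minimal sketch that emits such a Secret using `stringData`, so the API server handles the base64 encoding; the Secret name `alphaswarm-mlflow-auth` is hypothetical, and the serving Deployments would still need an `envFrom` or `secretKeyRef` entry pointing at it.

```shell
# Sketch: emit a Secret carrying MLFLOW_TRACKING_TOKEN for the serving stack.
# The Secret name is hypothetical. The token is read from the environment rather
# than passed as an argument, and is assumed to contain no double quotes or
# backslashes (it is written into a double-quoted YAML string).
mlflow_token_secret() {
  local ns="${1:-alphaswarm}" name="${2:-alphaswarm-mlflow-auth}"
  if [ -z "${MLFLOW_TRACKING_TOKEN:-}" ]; then
    echo "MLFLOW_TRACKING_TOKEN is not set" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${name}
  namespace: ${ns}
type: Opaque
stringData:
  MLFLOW_TRACKING_TOKEN: "${MLFLOW_TRACKING_TOKEN}"
EOF
}
```

With the token exported, `mlflow_token_secret | kubectl apply -f -` creates or updates the Secret without writing the token to disk.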