Kubernetes deployment

AlphaSwarm ships Kustomize manifests under alphaswarm_platform/deploy/k8s/base/ that can be applied to any cluster. The manifests under base/serving/ add three model-serving backends on top of the existing api, worker, paper-trader, and streaming-ingester Deployments.

Image targets

The Dockerfile defines six targets: a shared `base` layer and five image targets built on top of it:

Target	Entrypoint	Used by
`base`	—	shared base layer
`paper`	`alphaswarm paper run`	`paper-trader.yaml`
`ingester`	`alphaswarm-stream-ingest`	`ingester-*.yaml`
`api` (default)	`uvicorn alphaswarm.api.main:app`	`api.yaml`, `worker.yaml`
`serving`	`alphaswarm serve <backend>`	`serving/*.yaml`
`ml-train`	`alphaswarm-train`	CI training jobs, Ray Tune sweeps

Build the five image targets at once:

for target in paper ingester api serving ml-train; do
  docker build --target "$target" -t "alphaswarm-${target}:latest" .
done
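
The loop above tags every image as `latest`, which is awkward when pushing to a registry or rolling back. A minimal sketch of the same loop with an explicit registry and tag follows; `REGISTRY`, `TAG`, and the `image_ref` / `build_all` helper names are assumptions for illustration, not part of AlphaSwarm.

```shell
# Hypothetical defaults: override REGISTRY and TAG for your environment.
REGISTRY="${REGISTRY:-registry.example.com/alphaswarm}"
TAG="${TAG:-dev}"

# Compose a full image reference for one Dockerfile target,
# e.g. registry.example.com/alphaswarm/alphaswarm-api:dev
image_ref() {
  printf '%s/alphaswarm-%s:%s\n' "$REGISTRY" "$1" "$TAG"
}

# Build every non-base target; stop at the first failing build.
build_all() {
  for target in paper ingester api serving ml-train; do
    docker build --target "$target" -t "$(image_ref "$target")" . || return 1
  done
}
```

With `TAG` set to something immutable, such as a Git commit SHA, each image can then be pushed with `docker push "$(image_ref api)"` and referenced unambiguously from the manifests.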

Deploying to a Kubernetes cluster

AlphaSwarm is cluster-agnostic. The alphaswarm_platform/deployments/kubernetes/ tree provisions every shared dependency (MLflow in alphaswarm-mlops, MinIO + Postgres + Redis + ChromaDB in alphaswarm-data-services, Kafka + Schema Registry + Flink in alphaswarm-streaming, kube-prometheus-stack + Tempo + Loki + OTel + Phoenix in alphaswarm-observability, and so on). To deploy AlphaSwarm:

# From the alphaswarm root
# 1. Install the operators / Helm releases that AlphaSwarm CRDs depend on.
bash alphaswarm_platform/scripts/cluster_install/install-redpanda.sh
bash alphaswarm_platform/scripts/cluster_install/install-kube-prometheus-stack.sh
bash alphaswarm_platform/scripts/cluster_install/install-opentelemetry-operator.sh
bash alphaswarm_platform/scripts/cluster_install/install-spark-operator.sh
bash alphaswarm_platform/scripts/cluster_install/install-flink.sh

# 2. Apply the AlphaSwarm base kustomization (creates alphaswarm-* namespaces and
#    the workload manifests).
kubectl apply -k alphaswarm_platform/deployments/kubernetes/base/
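
Before applying to a shared cluster, it can help to render the kustomization and diff it against live state. The sketch below relies only on `kubectl kustomize` and `kubectl diff -k`; the `preview_base` function name is illustrative, and the default path is the one used in step 2.

```shell
# Render the kustomization, then diff it against the live cluster.
# kubectl diff exits 0 when nothing would change and 1 when there are
# differences, so a status of 1 here is not an error.
preview_base() {
  local dir="${1:-alphaswarm_platform/deployments/kubernetes/base/}"
  # Fail early (status 2) if the kustomization itself does not build.
  kubectl kustomize "$dir" > /dev/null || return 2
  kubectl diff -k "$dir"
}
```

Calling `preview_base` with no arguments previews the base kustomization; passing a directory previews a different kustomization instead.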

Selecting which model to serve

The three serving backends all read a single model_uri from the alphaswarm-serving-env ConfigMap. Change it once and bounce the Deployments:

kubectl -n alphaswarm create configmap alphaswarm-serving-env \
  --from-literal=model_uri=models:/alphaswarm-alpha/Production \
  --from-literal=ray_serve_name=alphaswarm-alpha \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl -n alphaswarm rollout restart deploy mlflow-serve ray-serve torchserve
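
`rollout restart` returns as soon as the restart has been requested, so on its own it does not confirm that the new pods came up with the updated ConfigMap. A sketch of a follow-up check using `kubectl rollout status` is shown here; the `wait_for_serving` name and the 300-second default timeout are assumptions.

```shell
# Wait for each serving Deployment to finish rolling out.
# Returns non-zero as soon as one Deployment fails or misses the timeout.
wait_for_serving() {
  local timeout="${1:-300s}"
  for deploy in mlflow-serve ray-serve torchserve; do
    kubectl -n alphaswarm rollout status "deploy/${deploy}" \
      --timeout="$timeout" || return 1
  done
}
```

For example, `wait_for_serving 120s` blocks until all three backends are ready or the first one exceeds two minutes.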

Observability

Every Deployment exports traces to http://otel-collector:4317 (OTLP gRPC), matching the rpi_kubernetes collector conventions.
Prometheus picks up metrics via the ServiceMonitor resources in alphaswarm_platform/deploy/k8s/base/serving/servicemonitor.yaml.
AlphaSwarm's own metric surface is defined in alphaswarm/mlops/metrics.py: alphaswarm_train_duration_seconds, alphaswarm_backtest_sharpe, alphaswarm_paper_pnl, alphaswarm_serve_requests_total, alphaswarm_serve_latency_seconds.
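
As a quick check that the serving metrics are being scraped, the Prometheus HTTP API can be queried directly. This sketch assumes Prometheus is reachable at `PROM_URL` (for example through a port-forward) and that `alphaswarm_serve_requests_total` is a counter, as its `_total` suffix suggests; `PROM_URL` and `serve_request_rate` are illustrative names.

```shell
# Hypothetical address; point this at your Prometheus instance.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Ask Prometheus for the per-second request rate over the last 5 minutes.
# curl -G sends the --data-urlencode pair as a URL query string.
serve_request_rate() {
  curl -sS -G "${PROM_URL}/api/v1/query" \
    --data-urlencode 'query=sum(rate(alphaswarm_serve_requests_total[5m]))'
}
```

An empty `result` array in the JSON response would suggest that no series by that name is being scraped, in which case the ServiceMonitor selectors are a reasonable first place to look.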

Secrets

The alphaswarm-broker-secrets Secret supplies Alpaca / IBKR / Tradier credentials. The serving stack requires no secrets unless the MLflow tracking server needs authentication; in that case, set MLFLOW_TRACKING_TOKEN in alphaswarm-env or in a dedicated Secret.
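
If the token goes in a dedicated Secret, it can be created with the same dry-run-and-apply pattern used for the ConfigMap earlier. In this sketch the Secret name `alphaswarm-mlflow-auth` and the `apply_mlflow_token` function are hypothetical, and the serving Deployments would still need to reference the Secret (for example via `envFrom`) for the variable to reach the containers.

```shell
# Create or update a Secret holding the MLflow token.
# The token is read from the MLFLOW_TRACKING_TOKEN environment variable
# so that it stays out of shell history.
apply_mlflow_token() {
  local name="${1:-alphaswarm-mlflow-auth}"  # hypothetical Secret name
  if [ -z "${MLFLOW_TRACKING_TOKEN:-}" ]; then
    echo "MLFLOW_TRACKING_TOKEN is not set" >&2
    return 1
  fi
  kubectl -n alphaswarm create secret generic "$name" \
    --from-literal="MLFLOW_TRACKING_TOKEN=${MLFLOW_TRACKING_TOKEN}" \
    --dry-run=client -o yaml | kubectl apply -f -
}
```

Because the manifest is piped through `kubectl apply`, rerunning the function with a new token updates the existing Secret instead of failing on a name conflict.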
