
Kubernetes deployment

AlphaSwarm ships Kustomize manifests under alphaswarm_platform/deploy/k8s/base/ that can be applied to any cluster. The manifests under base/serving/ add three model-serving backends on top of the existing api, worker, paper-trader, and streaming-ingester Deployments.

Image targets

The Dockerfile defines six targets: a shared base layer plus five buildable images.

| Target | Entrypoint | Used by |
| --- | --- | --- |
| base | | shared base layer |
| paper | alphaswarm paper run | paper-trader.yaml |
| ingester | alphaswarm-stream-ingest | ingester-*.yaml |
| api (default) | uvicorn alphaswarm.api.main:app | api.yaml, worker.yaml |
| serving | alphaswarm serve <backend> | serving/*.yaml |
| ml-train | alphaswarm-train | CI training jobs, Ray Tune sweeps |

Build the five images (every target except base) in one loop:

for target in paper ingester api serving ml-train; do
  docker build --target "$target" -t "alphaswarm-${target}:latest" .
done

Deploying to a Kubernetes cluster

AlphaSwarm is cluster-agnostic. The alphaswarm_platform/deployments/kubernetes/ tree provisions every shared dependency: MLflow in alphaswarm-mlops; MinIO, Postgres, Redis, and ChromaDB in alphaswarm-data-services; Kafka, Schema Registry, and Flink in alphaswarm-streaming; kube-prometheus-stack, Tempo, Loki, OTel, and Phoenix in alphaswarm-observability; and so on. To deploy AlphaSwarm:

# From the alphaswarm root
# 1. Install the operators / Helm releases that AlphaSwarm CRDs depend on.
bash alphaswarm_platform/scripts/cluster_install/install-redpanda.sh
bash alphaswarm_platform/scripts/cluster_install/install-kube-prometheus-stack.sh
bash alphaswarm_platform/scripts/cluster_install/install-opentelemetry-operator.sh
bash alphaswarm_platform/scripts/cluster_install/install-spark-operator.sh
bash alphaswarm_platform/scripts/cluster_install/install-flink.sh

# 2. Apply the AlphaSwarm base kustomization (creates alphaswarm-* namespaces and
# the workload manifests).
kubectl apply -k alphaswarm_platform/deployments/kubernetes/base/
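
The two steps above can also be driven from a small script that stops at the first failure. The following is a minimal sketch and not part of AlphaSwarm: the function names deploy_commands and run_all are hypothetical, and the script list only mirrors the commands shown above. It defaults to a dry run that prints each command without executing it.

```python
import subprocess
from typing import List

SCRIPT_DIR = "alphaswarm_platform/scripts/cluster_install"
KUSTOMIZE_DIR = "alphaswarm_platform/deployments/kubernetes/base/"

# Install scripts, in the same order as the shell commands above.
INSTALL_SCRIPTS = [
    "install-redpanda.sh",
    "install-kube-prometheus-stack.sh",
    "install-opentelemetry-operator.sh",
    "install-spark-operator.sh",
    "install-flink.sh",
]


def deploy_commands() -> List[List[str]]:
    """Return the operator installs followed by the kustomize apply."""
    commands = [["bash", f"{SCRIPT_DIR}/{name}"] for name in INSTALL_SCRIPTS]
    commands.append(["kubectl", "apply", "-k", KUSTOMIZE_DIR])
    return commands


def run_all(dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in deploy_commands():
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # so a failed install stops the sequence before the apply.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Run it from the alphaswarm root, and switch to run_all(dry_run=False) once the printed commands look right.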

Selecting which model to serve

The three serving backends all read a single model_uri from the alphaswarm-serving-env ConfigMap. Change it once, then restart the Deployments so they pick up the new value:

kubectl -n alphaswarm create configmap alphaswarm-serving-env \
  --from-literal=model_uri=models:/alphaswarm-alpha/Production \
  --from-literal=ray_serve_name=alphaswarm-alpha \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl -n alphaswarm rollout restart deploy mlflow-serve ray-serve torchserve
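
The same ConfigMap can be generated programmatically, which makes it easy to reject a malformed URI before anything reaches the cluster. This is a minimal sketch under stated assumptions: the helper serving_configmap is hypothetical, and the check only accepts MLflow registry URIs of the form models:/<name>/<version-or-stage> or models:/<name>@<alias>. Drop the check if the backends are pointed at other URI schemes.

```python
import json
import re
from typing import Dict

# Assumption: only MLflow model-registry URIs are used for model_uri.
_MODEL_URI = re.compile(r"^models:/[^/@]+(/[^/@]+|@[^/@]+)$")


def serving_configmap(
    model_uri: str,
    ray_serve_name: str,
    namespace: str = "alphaswarm",
) -> Dict:
    """Build the alphaswarm-serving-env ConfigMap as a plain dict."""
    if not _MODEL_URI.match(model_uri):
        raise ValueError(f"not a model-registry URI: {model_uri!r}")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "alphaswarm-serving-env", "namespace": namespace},
        "data": {"model_uri": model_uri, "ray_serve_name": ray_serve_name},
    }


if __name__ == "__main__":
    manifest = serving_configmap(
        "models:/alphaswarm-alpha/Production", "alphaswarm-alpha"
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Piping the printed JSON into kubectl apply -f - has the same effect as the create/apply pipeline above; the rollout restart is still needed afterwards.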

Observability

  • Every Deployment exports traces to http://otel-collector:4317 (OTLP gRPC), matching the rpi_kubernetes collector conventions.
  • Prometheus picks up metrics via the ServiceMonitor resources in alphaswarm_platform/deploy/k8s/base/serving/servicemonitor.yaml.
  • AlphaSwarm's own metric surface is defined in alphaswarm/mlops/metrics.py: alphaswarm_train_duration_seconds, alphaswarm_backtest_sharpe, alphaswarm_paper_pnl, alphaswarm_serve_requests_total, alphaswarm_serve_latency_seconds.
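
One common way to point a container at the collector address above is through the standard OpenTelemetry SDK environment variables. The sketch below builds those entries in the Kubernetes container env format. Whether the AlphaSwarm manifests configure the exporter this way is an assumption, and the helper name otel_env is hypothetical; only the endpoint comes from the text above.

```python
from typing import Dict, List

# Collector address as stated above (OTLP over gRPC).
OTLP_ENDPOINT = "http://otel-collector:4317"


def otel_env(service_name: str) -> List[Dict[str, str]]:
    """Return container env entries in Kubernetes {name, value} form."""
    settings = {
        "OTEL_EXPORTER_OTLP_ENDPOINT": OTLP_ENDPOINT,
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
        # Service name shown on exported spans; chosen by the caller.
        "OTEL_SERVICE_NAME": service_name,
    }
    return [{"name": key, "value": value} for key, value in settings.items()]


if __name__ == "__main__":
    for entry in otel_env("mlflow-serve"):
        print(f"{entry['name']}={entry['value']}")
```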

Secrets

The alphaswarm-broker-secrets Secret supplies Alpaca / IBKR / Tradier credentials. The serving stack needs no secrets unless the MLflow tracking URI requires authentication; in that case, set MLFLOW_TRACKING_TOKEN in alphaswarm-env or in a dedicated Secret.
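
For the dedicated-Secret option, the manifest can be generated without the token ever being written to a file in the repository. This is a minimal sketch: the Secret name alphaswarm-mlflow-auth and the helper mlflow_token_secret are hypothetical, and the alphaswarm namespace is taken from the kubectl commands earlier on this page. The serving Deployments would still need to reference the Secret for the variable to reach the containers.

```python
import json
import os
from typing import Dict


def mlflow_token_secret(
    token: str,
    name: str = "alphaswarm-mlflow-auth",  # hypothetical Secret name
    namespace: str = "alphaswarm",
) -> Dict:
    """Build an Opaque Secret carrying MLFLOW_TRACKING_TOKEN."""
    if not token:
        raise ValueError("token must be non-empty")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # stringData takes plain text; the API server stores it
        # base64-encoded under data.
        "stringData": {"MLFLOW_TRACKING_TOKEN": token},
    }


if __name__ == "__main__":
    # Read the token from the environment so it stays out of shell history.
    secret = mlflow_token_secret(os.environ["MLFLOW_TRACKING_TOKEN"])
    print(json.dumps(secret, indent=2))
```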