Operations runbook — Kubernetes deployment
End-to-end walkthrough for shipping AlphaSwarm to any Kubernetes cluster (EKS,
AKS, GKE, vanilla k3s, or the Raspberry Pi k3s cluster owned by
rpi_kubernetes). AlphaSwarm is fully self-contained: every shared service it
depends on (Postgres, Redis, Kafka, MinIO, MLflow, observability stack,
etc.) ships in alphaswarm_platform/deployments/kubernetes/. There is no implicit
dependency on rpi_kubernetes or any other repository.
Prerequisites
- kubectl 1.30+ with a current context pointing at the target cluster.
- Cluster admin access (you'll create namespaces and RBAC).
- A container registry the cluster can pull from (Docker Hub / ECR / ACR / GCR).
- An ingress controller (ingress-nginx recommended) and cert-manager with a letsencrypt-prod ClusterIssuer for the AlphaSwarm TLS hosts.
- Auth0 tenant configured per alphaswarm_docs/architecture/decisions/003-auth0-zero-trust.md (default tenant alphaswarm-fund.us.auth0.com).
- Cluster operators / CRDs installed via alphaswarm_platform/scripts/cluster_install/ (Strimzi, Spark Operator, OpenTelemetry Operator, Phoenix, Redpanda, etc.). Run the relevant installer before applying the AlphaSwarm base kustomization.
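The kubectl version requirement can be checked mechanically before starting. The following is a minimal sketch, assuming the JSON printed by kubectl version --client -o json is passed in as a string; the function names are illustrative and not part of the AlphaSwarm repository.

```python
import json
import re

MINIMUM = (1, 30)  # minimum kubectl version from the prerequisites list


def client_version(version_json: str) -> tuple[int, int]:
    """Extract (major, minor) from `kubectl version --client -o json` output."""
    git_version = json.loads(version_json)["clientVersion"]["gitVersion"]
    match = re.match(r"v(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognised kubectl version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_json: str, minimum: tuple[int, int] = MINIMUM) -> bool:
    """True when the client is at least the required major.minor."""
    return client_version(version_json) >= minimum
```

To use it, capture the output with subprocess.run(["kubectl", "version", "--client", "-o", "json"], capture_output=True, text=True, check=True).stdout and pass it to meets_minimum.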
Targeted runbooks
- Two-node tower+laptop bootstrap: tower-cluster-deploy.md
- Blue/green domain cutover: alphaswarm-fund-blue-green-cutover.md
Step 1 — provision Auth0 (one-time)
$env:AUTH0_DOMAIN = "your-tenant.us.auth0.com"
$env:AUTH0_M2M_CLIENT_ID = "..."
$env:AUTH0_M2M_CLIENT_SECRET = "..."
$env:ALPHASWARM_SYNC_URL = "https://api.alphaswarm.enterprise.com/_internal/auth0/sync"
python alphaswarm_platform/build/scripts/provision_auth0.py --dry-run # preview
python alphaswarm_platform/build/scripts/provision_auth0.py # apply
This idempotently creates the API resource server, the four roles, and the post-login Action.
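A quick guard against running the provisioner with an unset or placeholder variable can be sketched as below. It assumes provision_auth0.py reads all four variables shown above, which this runbook does not state explicitly; missing_vars is a hypothetical helper.

```python
import os

# Variable names taken from Step 1; whether the script needs all four
# is an assumption.
REQUIRED = (
    "AUTH0_DOMAIN",
    "AUTH0_M2M_CLIENT_ID",
    "AUTH0_M2M_CLIENT_SECRET",
    "ALPHASWARM_SYNC_URL",
)


def missing_vars(env, required=REQUIRED):
    """Return the required names that are unset, blank, or still '...'."""
    return [
        name
        for name in required
        if env.get(name, "").strip() in ("", "...")
    ]
```

Calling missing_vars(os.environ) before the dry run returns an empty list when everything is populated.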
Step 2 — generate the K8s ConfigMap + Secret scaffold
make generate-config ENV=k8s
Produces:
- alphaswarm_platform/deployments/kubernetes/base/configmaps/alphaswarm-config.yaml (commit this)
- alphaswarm_platform/deployments/kubernetes/base/secrets/alphaswarm-secrets.yaml.template (DO NOT commit values; CI/CD or external-secrets-operator patches real values)
Step 3 — build + push images
$env:IMAGE_TAG = "rc-$(git rev-parse --short HEAD)-$(Get-Date -Format yyyy-MM-dd)"
make build-client IMAGE_TAG=$env:IMAGE_TAG
make build-cp IMAGE_TAG=$env:IMAGE_TAG
# Optional (only if the Dockerfiles exist in alphaswarm_platform/build/docker/*)
make build-worker IMAGE_TAG=$env:IMAGE_TAG
make build-ingestion IMAGE_TAG=$env:IMAGE_TAG
docker login
docker push docker.io/julianwiley/alphaswarm-client:$env:IMAGE_TAG
docker push docker.io/julianwiley/alphaswarm-controller:$env:IMAGE_TAG
# Push these two only if you built them above
docker push docker.io/julianwiley/alphaswarm-worker:$env:IMAGE_TAG
docker push docker.io/julianwiley/alphaswarm-ingestion:$env:IMAGE_TAG
If make build-worker or make build-ingestion reports a missing Dockerfile,
pin those image tags to known-good prebuilt registry tags in the target overlay
before applying.
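For CI scripts that are not written in PowerShell, the same tag format can be reproduced as follows. This is a sketch of the rc-<short sha>-<date> convention used in Step 3; build_image_tag and image_ref are illustrative names, and the short SHA is assumed to come from git rev-parse --short HEAD.

```python
from datetime import date

REGISTRY = "docker.io/julianwiley"  # registry prefix used by the push commands


def build_image_tag(short_sha: str, day: date) -> str:
    """Compose the release-candidate tag: rc-<short sha>-<yyyy-MM-dd>."""
    return f"rc-{short_sha}-{day.isoformat()}"


def image_ref(component: str, tag: str) -> str:
    """Full image reference for a component such as 'client' or 'controller'."""
    return f"{REGISTRY}/alphaswarm-{component}:{tag}"
```

For example, image_ref("client", build_image_tag("abcdef01", date(2026, 5, 19))) yields the reference that docker push expects for the client image.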
Step 3b — one-shot Alembic migration (cluster)
After alphaswarm-api is pullable on the cluster, run:
kubectl apply -f alphaswarm_platform/deployments/kubernetes/base/jobs/alembic-upgrade.yaml
kubectl -n alphaswarm wait --for=condition=complete job/alphaswarm-alembic-upgrade --timeout=900s
kubectl -n alphaswarm logs job/alphaswarm-alembic-upgrade
The Job uses the same alphaswarm-config / alphaswarm-secrets env as alphaswarm-core and targets
postgresql.alphaswarm-data-services.svc.cluster.local (the AlphaSwarm-owned Postgres in the
alphaswarm-data-services namespace). Re-apply the Job only when you need to run the
upgrade to head again, and delete the previous Job first (kubectl -n alphaswarm delete job alphaswarm-alembic-upgrade),
because kubectl apply does not re-run a Job that has already completed.
alembic/env.py widens alembic_version.version_num to VARCHAR(128) automatically
before migrations run (revision slugs longer than 32 characters otherwise fail at
0039_extended_instrument_taxonomy).
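The width problem is easy to see by measuring the revision slugs named in this runbook against Alembic's stock VARCHAR(32) version column. The sketch below assumes the revision identifiers are exactly the slugs shown; too_long is a hypothetical helper.

```python
ALEMBIC_DEFAULT_WIDTH = 32  # stock width of alembic_version.version_num

# Revision slugs mentioned in this runbook.
REVISIONS = [
    "0015_dbt_foundation",
    "0039_extended_instrument_taxonomy",
    "0040_normalized_identifiers_backfill",
    "0045_pgvector_foundation",
]


def too_long(revisions, width=ALEMBIC_DEFAULT_WIDTH):
    """Return the revisions that would not fit in a column of the given width."""
    return [revision for revision in revisions if len(revision) > width]
```

With the default width, too_long(REVISIONS) reports 0039_extended_instrument_taxonomy (33 characters) as the first offender, which matches the failure point described above; at a width of 128 nothing is reported.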
Brownfield Postgres (pre-Alembic or partial schema)
If alembic upgrade head fails with DuplicateTable / DuplicateColumn, the database
was created outside Alembic tracking. From a workstation with the API image and a
port-forward to cluster Postgres:
kubectl -n alphaswarm-data-services port-forward svc/postgresql 15432:5432
$env:ALPHASWARM_POSTGRES_DSN = "postgresql+psycopg2://alphaswarm:alphaswarm@host.docker.internal:15432/alphaswarm"
# Optional: stamp to the highest revision whose objects already exist, then upgrade.
# $env:ALPHASWARM_ALEMBIC_STAMP_REVISION = "0015_dbt_foundation"
bash scripts/cluster_alembic_upgrade.sh
Use ALPHASWARM_POSTGRES_DSN (maps to settings.postgres_dsn) — not a raw DATABASE_URL
alias. Migration 0040_normalized_identifiers_backfill can take several minutes on
large instruments tables.
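Choosing the stamp revision amounts to walking the migration history in order and stopping at the first revision whose objects are missing. The sketch below illustrates that rule only; the mapping from revision to created tables is hypothetical and would have to be read from the actual migration files.

```python
def highest_existing_revision(revisions, existing_tables):
    """Return the last revision, in order, whose tables all already exist.

    revisions: ordered list of (revision_id, set_of_tables_it_creates).
    existing_tables: set of table names currently present in the database.
    Returns None when even the first revision's tables are missing.
    """
    stamp = None
    for revision_id, tables in revisions:
        if not set(tables) <= set(existing_tables):
            break  # first revision with missing objects: stop here
        stamp = revision_id
    return stamp
```

The set of existing tables could be gathered with SQLAlchemy's inspect(engine).get_table_names() over the port-forwarded connection; the result is the value to place in ALPHASWARM_ALEMBIC_STAMP_REVISION.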
Postgres prerequisites (alphaswarm-data-services)
Migration 0045_pgvector_foundation requires the vector extension in the alphaswarm
database. On existing clusters (init script applied before the alphaswarm DB was added),
run once as the Postgres superuser:
kubectl -n alphaswarm-data-services exec deploy/postgresql -- psql -U postgres -d alphaswarm -c "CREATE EXTENSION IF NOT EXISTS vector;"
Fresh installs use the AlphaSwarm-owned alphaswarm_platform/deployments/kubernetes/base-services/postgres-shared/
manifests, whose init SQL creates the alphaswarm role/database and enables
vector there.
Step 4 — pin the image tag in the target overlay
Edit alphaswarm_platform/deployments/kubernetes/overlays/<env>/kustomization.yaml:
images:
- name: docker.io/julianwiley/alphaswarm-client
newTag: rc-abcdef01-2026-05-19
...
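Before applying, it can help to confirm that every pinned entry in the overlay carries the tag you just pushed. The following is a rough, line-based sketch that assumes the simple name/newTag layout shown above; it is not a YAML parser, and pinned_tags and unpinned are illustrative names.

```python
import re


def pinned_tags(kustomization_text: str) -> dict[str, str]:
    """Map each `- name:` entry to the `newTag:` that follows it."""
    tags = {}
    current = None
    for line in kustomization_text.splitlines():
        name = re.match(r"\s*-\s*name:\s*(\S+)", line)
        tag = re.match(r"\s*newTag:\s*(\S+)", line)
        if name:
            current = name.group(1)
        elif tag and current:
            tags[current] = tag.group(1).strip("\"'")
    return tags


def unpinned(kustomization_text: str, expected_tag: str) -> list[str]:
    """Image names whose newTag differs from the expected tag."""
    pins = pinned_tags(kustomization_text)
    return sorted(name for name, tag in pins.items() if tag != expected_tag)
```

An empty list from unpinned(text, tag) means every entry the parser found points at the expected tag.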
Docker Hub pull secret (private repos)
Deployments reference dockerhub-pull-secret. Create it in both workload
namespaces (alphaswarm and alphaswarm-admin) before rollout:
$env:DOCKERHUB_USER = "<dockerhub-username>"
$env:DOCKERHUB_TOKEN = "<dockerhub-access-token>" # hub.docker.com → Account Settings → Security
kubectl create secret docker-registry dockerhub-pull-secret `
--docker-server=https://index.docker.io/v1/ `
--docker-username=$env:DOCKERHUB_USER `
--docker-password=$env:DOCKERHUB_TOKEN `
-n alphaswarm --dry-run=client -o yaml | kubectl apply -f -
kubectl create secret docker-registry dockerhub-pull-secret `
--docker-server=https://index.docker.io/v1/ `
--docker-username=$env:DOCKERHUB_USER `
--docker-password=$env:DOCKERHUB_TOKEN `
-n alphaswarm-admin --dry-run=client -o yaml | kubectl apply -f -
Public repositories can omit the secret by removing imagePullSecrets from
the deployment manifests.
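For reference, kubectl create secret docker-registry stores a Secret of type kubernetes.io/dockerconfigjson whose .dockerconfigjson key holds a small JSON document. The sketch below builds that payload by hand, which is useful when the secret is supplied through a secret store instead; docker_config_json is a hypothetical helper.

```python
import base64
import json


def docker_config_json(server: str, username: str, token: str) -> str:
    """Build the JSON stored under the `.dockerconfigjson` key of a pull secret."""
    # The `auth` field is base64("username:password").
    auth = base64.b64encode(f"{username}:{token}".encode()).decode()
    return json.dumps(
        {"auths": {server: {"username": username, "password": token, "auth": auth}}}
    )
```

The server argument should match the --docker-server value used above (https://index.docker.io/v1/ for Docker Hub).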
Step 5 — apply
# Dry-run first
kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/tower-dev --dry-run=server
# Apply
kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/tower-dev
# Verify
kubectl -n alphaswarm get pods,svc,hpa,pdb
kubectl -n alphaswarm-admin get pods,svc
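The verification step can be scripted by reading the pod list as JSON and reporting anything that is not Ready. This is a minimal sketch, assuming the input is the output of kubectl -n alphaswarm get pods -o json; not_ready is an illustrative name.

```python
import json


def not_ready(pods_json: str) -> list[str]:
    """Names of pods that are neither Ready nor successfully completed."""
    names = []
    for pod in json.loads(pods_json)["items"]:
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":
            continue  # completed Job pods (e.g. the Alembic Job) never become Ready
        ready = any(
            condition["type"] == "Ready" and condition["status"] == "True"
            for condition in status.get("conditions", [])
        )
        if not ready:
            names.append(pod["metadata"]["name"])
    return names
```

An empty result for both the alphaswarm and alphaswarm-admin namespaces is a reasonable signal to continue to the next step.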
Step 6 — populate the Secret
If you're not using external-secrets-operator, populate the placeholder Secret manually:
kubectl -n alphaswarm create secret generic alphaswarm-secrets `
--from-literal=ALPHASWARM_DATABASE_PASSWORD="<value>" `
--from-literal=ALPHASWARM_AUTH_M2M_CLIENT_SECRET="<value>" `
--from-literal=ALPHASWARM_SESSION_COOKIE_SECRET="<value>" `
--dry-run=client -o yaml | kubectl apply -f -
For external-secrets-operator users, point an ExternalSecret at your secret store (Vault / SSM / Key Vault / Secret Manager) and let the operator create the K8s Secret.
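Whichever route populates the Secret, it is worth confirming that no key is missing or still holds the <value> placeholder. The sketch below assumes the input is the output of kubectl -n alphaswarm get secret alphaswarm-secrets -o json, where values are base64-encoded under data; unpopulated_keys is a hypothetical helper and the key list is the one from the manual command above.

```python
import base64
import json

REQUIRED_KEYS = (
    "ALPHASWARM_DATABASE_PASSWORD",
    "ALPHASWARM_AUTH_M2M_CLIENT_SECRET",
    "ALPHASWARM_SESSION_COOKIE_SECRET",
)


def unpopulated_keys(secret_json: str, required=REQUIRED_KEYS) -> list[str]:
    """Required keys that are absent, empty, or still set to '<value>'."""
    data = json.loads(secret_json).get("data") or {}
    bad = []
    for key in required:
        raw = base64.b64decode(data.get(key, ""))
        value = raw.decode("utf-8", "replace").strip()
        if value in ("", "<value>"):
            bad.append(key)
    return bad
```

The function only reports key names, so its output is safe to print in CI logs.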
Step 7 — DNS + TLS
The Ingresses expect:
- alpha-swarm.ai -> alphaswarm-client Service in the alphaswarm namespace
- api.alpha-swarm.ai -> alphaswarm-core Service in the alphaswarm namespace
- manage.alpha-swarm.ai -> alphaswarm-cp Service in the alphaswarm-admin namespace
Point DNS at the NGINX Ingress controller's LoadBalancer IP. cert-manager handles TLS via the letsencrypt-prod ClusterIssuer (configure separately).
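A DNS check before requesting certificates avoids failed ACME challenges. The following sketch assumes the ingress controller exposes a plain IP address (some cloud load balancers expose a hostname instead, in which case this comparison does not apply); resolve and misrouted are illustrative names.

```python
import socket

HOSTS = ("alpha-swarm.ai", "api.alpha-swarm.ai", "manage.alpha-swarm.ai")


def resolve(host: str) -> set[str]:
    """Resolve a host to its set of addresses; empty set if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def misrouted(expected_ip: str, hosts=HOSTS, resolver=resolve) -> list[str]:
    """Hosts whose DNS answer does not include the expected load balancer IP."""
    return [host for host in hosts if expected_ip not in resolver(host)]
```

The expected IP can be read from the EXTERNAL-IP column of the ingress controller's Service; the resolver parameter exists so the check can be exercised without network access.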
Step 8 — smoke test
# Client should serve the SPA shell
curl -fsS https://alpha-swarm.ai/ | findstr /i /c:"<!doctype html"
# Control plane health (unauthenticated)
curl -fsS https://manage.alpha-swarm.ai/manage/health
# OpenAPI spec
curl -fsS https://manage.alpha-swarm.ai/manage/openapi.json | python -m json.tool | findstr title
# Cluster verification helper
bash scripts/verify_tower_cluster.sh
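The same smoke checks can be expressed without curl or findstr, which is convenient on hosts where neither is available. This is a sketch under the assumption that the SPA shell begins with an HTML doctype and that the control plane serves a standard OpenAPI document; the function names are illustrative.

```python
import json
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> str:
    """GET a URL and return the body; raises on HTTP errors, like curl -f."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", "replace")


def looks_like_spa_shell(body: str) -> bool:
    """Case-insensitive check for a leading HTML doctype."""
    return body.lstrip().lower().startswith("<!doctype html")


def openapi_title(spec_json: str) -> str:
    """Return info.title from an OpenAPI document."""
    return json.loads(spec_json)["info"]["title"]
```

For example, looks_like_spa_shell(fetch("https://alpha-swarm.ai/")) and openapi_title(fetch("https://manage.alpha-swarm.ai/manage/openapi.json")) cover the first and third checks above.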
Rollback
# Restore the overlay's kustomization.yaml from the previous commit (previous image tag), then redeploy.
git checkout HEAD~1 -- alphaswarm_platform/deployments/kubernetes/overlays/dev/kustomization.yaml
make deploy-k8s ENV=dev
Or, for an immediate rollback that doesn't touch git:
kubectl -n alphaswarm rollout undo deployment/alphaswarm-client
kubectl -n alphaswarm rollout undo deployment/alphaswarm-core
kubectl -n alphaswarm rollout undo deployment/alphaswarm-worker
kubectl -n alphaswarm-admin rollout undo deployment/alphaswarm-cp