ADR 002 — Single multi-stage container for the AlphaSwarm client surface
- Status: Accepted (2026-05-18); superseded for alphaswarm_ui by ADR 011 on 2026-05-25. Still in force for the local-operator alphaswarm_client/ packaging path.
- Authors: Platform team
- Supersedes: None
- Superseded by: ADR 011 (CDN-fronted standalone for alphaswarm_ui), cloud surface only
- Related: ADR 001 (Vite static export), ADR 005 (separated control plane), ADR 011 (CDN-fronted standalone for alphaswarm_ui), ADR 012 (Solara deprecation)
Scope narrowing (2026-05-25): This ADR's decisions apply ONLY to the local-operator alphaswarm_client/ packaging. The cloud-hosted alphaswarm_ui/ surface (at alpha-swarm.ai / app.alpha-swarm.ai) is governed by ADR 011 and uses a clean Next.js standalone container with no ASGI proxy stage. See ADR 011 for the cloud rationale.
Context
Today AlphaSwarm runs the Vite frontend on :3001 (compose :3002), the legacy Next.js webui on :3000 (now stopped), the legacy Solara UI on :8765, and the FastAPI API on :8000. Operators have to juggle four URLs and four health probes. The alphaswarm_client Docker image is a chance to collapse these into one.
Three packaging options were considered:
- One container per surface: separate alphaswarm-frontend, alphaswarm-solara, and alphaswarm-api images; an external Ingress/NGINX layer fans out traffic.
- Sidecar pattern: one Pod per surface, sharing localhost via an nginx sidecar.
- Single multi-stage build: Stage 1 builds Vite, Stage 2 prepares Solara, Stage 3 (production) is a python:3.11-slim runtime that serves both as static + ASGI mount and proxies API traffic.
Decision
alphaswarm_client is one image built from a three-stage Dockerfile that ships:
- Stage 1 (ui-builder, node:20-alpine): runs pnpm --dir alphaswarm_client build, output to /app/out/. Node is dropped after this stage.
- Stage 2 (solara-builder, python:3.11-slim): installs Solara + legacy UI deps, pre-warms component caches, verifies legacy_ui.app is importable.
- Stage 3 (production, python:3.11-slim): installs FastAPI + uvicorn + httpx + websockets + python-jose + alphaswarm_core. Copies Vite assets from Stage 1 and Solara from Stage 2. Exposes port 8080. No Node, no npm.
The Stage 3 runtime mounts:
- /static → Vite assets from Stage 1
- /legacy → Solara ASGI app
- /webui → legacy Next.js export (rollback only)
- /api/* → reverse-proxied to ALPHASWARM_CORE_API_URL
- /ml/* → reverse-proxied to ALPHASWARM_ML_API_URL
- /mcp/* → reverse-proxied to ALPHASWARM_MCP_URL
- /manage/* → reverse-proxied to ALPHASWARM_CONTROL_PLANE_URL
- /ws/* → WebSocket proxy with reconnect-with-backoff
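The reverse-proxy part of this mount table can be illustrated with a minimal, standard-library-only sketch. It is not the code in alphaswarm/api/proxy.py: the names PROXY_ROUTES and resolve_upstream are hypothetical, and the sketch assumes the path prefix is kept when forwarding (the real proxy may strip it).

```python
from typing import Mapping, Optional

# Path prefix -> env var naming the upstream, copied from the mount list.
# PROXY_ROUTES and resolve_upstream are illustrative names, not project APIs.
PROXY_ROUTES = {
    "/api/": "ALPHASWARM_CORE_API_URL",
    "/ml/": "ALPHASWARM_ML_API_URL",
    "/mcp/": "ALPHASWARM_MCP_URL",
    "/manage/": "ALPHASWARM_CONTROL_PLANE_URL",
}


def resolve_upstream(path: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the upstream URL for a proxied path.

    Returns None when the path is not a proxied prefix or when the
    matching env var is unset or empty.
    Assumption: the prefix is preserved, so /api/runs maps to <core>/api/runs.
    """
    for prefix, var in PROXY_ROUTES.items():
        if path.startswith(prefix):
            base = env.get(var)
            if not base:
                return None
            # Avoid a double slash when the configured base ends with "/".
            return base.rstrip("/") + path
    return None
```

Under these assumptions, resolve_upstream("/api/runs", {"ALPHASWARM_CORE_API_URL": "http://alphaswarm-core:8000"}) yields "http://alphaswarm-core:8000/api/runs", while a path such as /static/app.js is not proxied and yields None.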
Consequences
Positive
- One image, one health probe (/health), one set of securityContext rules.
- Stable URL surface for operators: bookmarks, dashboards, and runbooks don't break when backends move.
- All backend addresses live in ConnectivityConfig env vars. The same image runs in compose with ALPHASWARM_*_URL=http://alphaswarm-core:8000 or in K8s with http://alphaswarm-core.default.svc.cluster.local.
- Auth0 callback URLs stay constant. The Vite app sees one origin; the FastAPI proxy injects M2M Authorization headers for cross-service calls.
- Smaller blast radius. The control plane is a separate container on a separate Docker network (alphaswarm-admin vs alphaswarm-internal); they only talk over the proxy.
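To make the "all backend addresses live in env vars" point concrete, here is a minimal sketch of an env-driven config object. It is not the real ConnectivityConfig in alphaswarm_core/connectivity/config.py: the class name ConnectivityConfigSketch, its field names, and the rule that only the core API URL is required are all assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ConnectivityConfigSketch:
    """Hypothetical stand-in for ConnectivityConfig; fields are assumed."""

    core_api_url: str
    ml_api_url: Optional[str] = None
    mcp_url: Optional[str] = None
    control_plane_url: Optional[str] = None

    @classmethod
    def from_env(
        cls, env: Optional[Mapping[str, str]] = None
    ) -> "ConnectivityConfigSketch":
        # Default to the process environment; tests can pass a plain dict.
        env = os.environ if env is None else env
        core = env.get("ALPHASWARM_CORE_API_URL")
        if not core:
            # Assumption: the core API is the only mandatory backend.
            raise ValueError("ALPHASWARM_CORE_API_URL must be set")
        return cls(
            core_api_url=core.rstrip("/"),
            ml_api_url=env.get("ALPHASWARM_ML_API_URL"),
            mcp_url=env.get("ALPHASWARM_MCP_URL"),
            control_plane_url=env.get("ALPHASWARM_CONTROL_PLANE_URL"),
        )
```

With this shape, moving between compose and K8s changes only the environment (for example http://alphaswarm-core:8000 versus http://alphaswarm-core.default.svc.cluster.local), not the image.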
Negative
- Builds are larger and slower than per-surface images. Mitigated by Docker layer caching and buildx (~3 min cold, ~30s incremental).
- Scaling assumes Vite + Solara + proxy throughput grow together. In practice Vite assets are CDN-fronted by NGINX Ingress and the proxy is the bottleneck — a single container HPA on CPU is fine.
- Rolling back to webui-only or Solara-only means env-flag toggles (ALPHASWARM_CLIENT_ENABLE_LEGACY_UI, ALPHASWARM_CLIENT_ENABLE_SOLARA) rather than swapping deployments.
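The env-flag rollback mentioned in the last item could look roughly like the sketch below. The helper names flag_enabled and enabled_mounts are hypothetical, and so are two assumptions baked into the code: that ALPHASWARM_CLIENT_ENABLE_SOLARA gates /legacy while ALPHASWARM_CLIENT_ENABLE_LEGACY_UI gates /webui, and that both flags default to off.

```python
from typing import List, Mapping

# Accepted spellings for "on"; anything else (or unset) counts as off.
TRUE_VALUES = {"1", "true", "yes", "on"}


def flag_enabled(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Parse a boolean env flag, falling back to `default` when unset."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def enabled_mounts(env: Mapping[str, str]) -> List[str]:
    """Return the UI mounts the runtime would expose for this environment.

    Assumed mapping (not confirmed by the ADR):
      ALPHASWARM_CLIENT_ENABLE_SOLARA    -> /legacy (Solara ASGI app)
      ALPHASWARM_CLIENT_ENABLE_LEGACY_UI -> /webui  (legacy Next.js export)
    """
    mounts = ["/static"]  # Vite assets are always served
    if flag_enabled(env, "ALPHASWARM_CLIENT_ENABLE_SOLARA"):
        mounts.append("/legacy")
    if flag_enabled(env, "ALPHASWARM_CLIENT_ENABLE_LEGACY_UI"):
        mounts.append("/webui")
    return mounts
```

The trade-off is the one the ADR names: a rollback becomes a config change plus a restart of the same image, rather than a swap to a different deployment.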
Alternatives considered
- One container per surface — rejected. Adds 3 health probes, 3 Ingress rules, 3 image tags to keep in lockstep on every release. The operator experience regresses.
- Sidecar pattern — rejected. Mixing sidecars + multi-process supervision in one Pod adds significant Pod-startup ordering risk for marginal CPU savings.
Implementation references
- Multi-stage Dockerfile: alphaswarm_platform/build/docker/alphaswarm_client/Dockerfile
- FastAPI proxy: alphaswarm/api/proxy.py
- WebSocket proxy with reconnect: alphaswarm/api/ws/proxy.py
- ConnectivityConfig: alphaswarm_core/connectivity/config.py