Operations runbook — Edge deployment

Deploying AlphaSwarm to edge / on-prem locations where the standard cloud K8s overlays don't fit.

Reference shapes

Shape A — single VM with Docker Compose

The simplest edge deployment: one Linux VM running the docker-compose stack with the admin overlay.

git clone https://github.com/julianwiley/alphaswarm.git
cd alphaswarm

# Generate config + bring up
make generate-config ENV=local
make dev-admin

Suitable for: dev labs, single-tenant trials, training environments.

Not suitable for: multi-node fault tolerance, HPA, NetworkPolicy enforcement.

Shape B — k3s on a single edge box

For sites with a single VM but where you want production-style observability + Pod-level lifecycle:

curl -sfL https://get.k3s.io | sh -
kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/dev

k3s ships with Traefik as its default ingress controller (in place of NGINX Ingress) and a built-in service load balancer (ServiceLB, formerly Klipper). To keep the same Ingress manifests as production, you can install NGINX Ingress on top.
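If you go the NGINX route, one common approach is to install k3s with Traefik disabled so the two controllers do not compete for ports 80/443. The sketch below is an assumption about how a site might do this, not the project's documented procedure: the function names are hypothetical, and the Helm step assumes Helm is already installed on the box.

```shell
# Sketch only. Function names are hypothetical; nothing runs until you call them.

# Install k3s with the bundled Traefik ingress controller disabled.
install_k3s_without_traefik() {
  curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable=traefik" sh -
}

# Install the community ingress-nginx controller with Helm (assumes Helm is present).
install_ingress_nginx() {
  # k3s writes its kubeconfig here; reading it usually requires root.
  export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
  helm upgrade --install ingress-nginx ingress-nginx \
    --repo https://kubernetes.github.io/ingress-nginx \
    --namespace ingress-nginx --create-namespace
}
```

Call install_k3s_without_traefik in place of the plain installer line, then install_ingress_nginx, then apply the overlay as shown above.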

Shape C — rpi_kubernetes (4-node k3s lab)

The reference home/edge cluster uses two sibling repos:

  1. rpi_kubernetes: k3s bootstrap, portal, FinOps policies, storage class.
  2. alphaswarm: every shared service + AlphaSwarm workload under alphaswarm_platform/deployments/kubernetes/.

# In rpi_kubernetes (portal + cluster bootstrap only)
kubectl apply -k kubernetes/

# In alphaswarm (AlphaSwarm shared infra + app overlays)
kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/dev

Streaming install helpers live under alphaswarm_platform/scripts/cluster_install/ (install-flink.sh, install-alphavantage.sh, build-flink-jobs.sh). See streaming.md for the full order.

Edge-specific concerns

Image distribution

Edge sites often have slow / metered uplinks. Mirror the AlphaSwarm images into an on-site registry:

docker pull ghcr.io/julianwiley/alphaswarm-client:latest-stable
docker tag ghcr.io/julianwiley/alphaswarm-client:latest-stable mirror.local:5000/alphaswarm-client:latest-stable
docker push mirror.local:5000/alphaswarm-client:latest-stable
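When more than one image has to be mirrored, the pull/tag/push sequence above can be wrapped in a small helper. This is a minimal sketch: MIRROR defaults to the registry host used in this runbook, and the DRY_RUN switch and function names are hypothetical conveniences, not part of AlphaSwarm.

```shell
# Sketch only. MIRROR matches the example registry above; DRY_RUN is a hypothetical flag.
MIRROR="${MIRROR:-mirror.local:5000}"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Map <registry>/<org>/<name>:<tag> to <MIRROR>/<name>:<tag>.
mirror_ref() {
  echo "${MIRROR}/${1##*/}"
}

# Pull from upstream, retag for the on-site registry, and push.
mirror_image() {
  local src="$1" dst
  dst="$(mirror_ref "$src")"
  run docker pull "$src" &&
    run docker tag "$src" "$dst" &&
    run docker push "$dst"
}
```

For example, DRY_RUN=1 mirror_image ghcr.io/julianwiley/alphaswarm-client:latest-stable prints the three docker commands without executing them; drop DRY_RUN to run them for real.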

Then override the image tags in your overlay:

# alphaswarm_platform/deployments/kubernetes/overlays/edge-site-a/kustomization.yaml
images:
  - name: ghcr.io/julianwiley/alphaswarm-client
    newName: mirror.local:5000/alphaswarm-client
    newTag: latest-stable

Auth0 unreachability

Edge sites may have intermittent connectivity to Auth0's JWKS endpoint. The JWT validator caches JWKS for ALPHASWARM_CP_AUTH_JWKS_TTL_SECONDS (default 600s); set it higher (e.g. 3600s) so the cache spans typical outage windows.

In hard offline scenarios, set ALPHASWARM_AUTH_ENFORCE=permissive so that requests whose tokens cannot be validated fall through to the local-default identity, with each violation written to the audit log. The operator UI shows a yellow banner while this mode is active.
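As an illustration, the two settings above could be carried in an env-style fragment like the following. How the variables reach the control plane (compose env file, ConfigMap, or overlay patch) depends on the site and is not shown; the permissive line is left commented out because it is meant only for hard offline operation.

```shell
# Sketch of an edge-site env fragment using the variables named above.

# Cache JWKS for one hour so the cache spans typical uplink outages (default is 600s).
export ALPHASWARM_CP_AUTH_JWKS_TTL_SECONDS=3600

# Hard offline only: uncomment to fall back to the local-default identity.
# export ALPHASWARM_AUTH_ENFORCE=permissive
```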

Storage

Edge sites should NOT rely on the in-cluster Postgres + Redis. Provision durable storage upstream and point AlphaSwarm at it via the connectivity matrix:

ALPHASWARM_DATABASE_URL=postgresql://alphaswarm:****@cloud-postgres.example.com:5432/alphaswarm
ALPHASWARM_REDIS_URL=rediss://cloud-redis.example.com:6380
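Because the edge box now depends on upstream endpoints, it can help to confirm they are reachable before starting the stack. The sketch below is a hypothetical preflight helper, not an AlphaSwarm tool: it extracts host and port from each URL and attempts a TCP connection using bash's /dev/tcp and the coreutils timeout command. The parsing is deliberately simple and assumes the password contains no "/" or "@".

```shell
# Sketch only. Helper names are hypothetical.

# Print "host port" for scheme://[user[:pass]@]host[:port][/path].
url_host_port() {
  local url="$1" default_port="$2" rest hostport host port
  rest="${url#*://}"      # drop the scheme
  rest="${rest%%/*}"      # drop the path
  hostport="${rest##*@}"  # drop credentials, if any
  host="${hostport%%:*}"
  if [ "$hostport" = "$host" ]; then
    port="$default_port"  # no explicit port in the URL
  else
    port="${hostport##*:}"
  fi
  printf '%s %s\n' "$host" "$port"
}

# Return 0 if a TCP connection to the URL's host:port succeeds within 3 seconds.
check_endpoint() {
  local url="$1" default_port="$2" hp host port
  hp="$(url_host_port "$url" "$default_port")"
  host="${hp% *}"
  port="${hp#* }"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

Typical use would be check_endpoint "$ALPHASWARM_DATABASE_URL" 5432 and check_endpoint "$ALPHASWARM_REDIS_URL" 6379, failing the deploy early if either returns non-zero. A successful TCP connect does not prove that credentials or TLS are correct.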

Telemetry

Edge sites should forward telemetry to a central observability collector. Set ALPHASWARM_OTEL_COLLECTOR_URL to the gateway endpoint; the control plane streams MetricPoints + AlertEvents to it via OTLP.
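A minimal example of that setting is shown below. The host name is a placeholder, and the URL form is an assumption: 4317 is the conventional OTLP/gRPC port and 4318 the conventional OTLP/HTTP port, so use whichever your gateway actually exposes.

```shell
# Sketch only: otel-gateway.example.com is a hypothetical central collector.
export ALPHASWARM_OTEL_COLLECTOR_URL="http://otel-gateway.example.com:4317"
```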

Cutover from compose to k3s

If you started on shape A and want to move to shape B:

  1. Take a Postgres dump while the compose stack is still running: docker exec alphaswarm-postgres pg_dump -U alphaswarm alphaswarm > alphaswarm.sql.
  2. docker compose down to stop the compose stack.
  3. Bring up shape B per the recipe above.
  4. Restore: kubectl exec -i -n alphaswarm deploy/alphaswarm-postgres -- psql -U alphaswarm alphaswarm < alphaswarm.sql. The -i flag is required so that kubectl forwards the dump on stdin.
  5. Verify /manage/health and /health both return 200.
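The final verification step can be scripted. The sketch below assumes the two health paths listed above are served from a single base URL; that base URL and the function names are hypothetical, so substitute whatever host and port your ingress exposes.

```shell
# Sketch only. Function names and the base URL passed in are hypothetical.

# Print the HTTP status code for a URL (curl prints 000 when the connection fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Check both health endpoints; return non-zero if either is not 200.
verify_health() {
  local base="$1" path code rc=0
  for path in /manage/health /health; do
    code="$(http_status "${base}${path}")" || true
    if [ "$code" = "200" ]; then
      echo "ok   ${path}"
    else
      echo "FAIL ${path} (HTTP ${code:-none})"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, verify_health http://localhost:8080 prints one line per endpoint and exits non-zero if the cutover is not yet healthy, which makes it easy to retry in a loop while Pods are starting.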

No code changes required — the connectivity matrix abstracts which backend is hosting which service.