Operations runbook — Edge deployment
Deploying AlphaSwarm to edge / on-prem locations where the standard cloud K8s overlays don't fit.
Reference shapes
Shape A — single VM with Docker Compose
The simplest edge deployment: one Linux VM running the docker-compose stack with the admin overlay.
git clone https://github.com/julianwiley/alphaswarm.git
cd alphaswarm
# Generate config + bring up
make generate-config ENV=local
make dev-admin
Suitable for: dev labs, single-tenant trials, training environments.
Not suitable for: multi-node fault tolerance, HPA, NetworkPolicy enforcement.
Shape B — k3s on a single edge box
For single-VM sites where you still want production-style observability and Pod-level lifecycle management:
curl -sfL https://get.k3s.io | sh -
kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/dev
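k3s ships with Traefik (in place of the NGINX Ingress controller) and a built-in service load balancer (Klipper). You can install the NGINX Ingress controller on top if you want to keep the same Ingress manifests as production. In that case, disable the bundled Traefik with the k3s server flag --disable=traefik so the two controllers don't compete for ports 80/443.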
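The Shape B bootstrap can be wrapped in a small helper that optionally skips Traefik. This is a minimal sketch, not part of the repo: `run`, `bootstrap_k3s` and the `DRY_RUN` switch are hypothetical names. `INSTALL_K3S_EXEC` and `--disable=traefik` are standard k3s installer/server options. Installing the NGINX Ingress controller itself is not shown.

```shell
# Hypothetical helper: run a command string, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    sh -c "$1"
  fi
}

# Hypothetical Shape B bootstrap. Pass "no" to skip the bundled Traefik so an
# NGINX Ingress controller can take ports 80/443 instead.
bootstrap_k3s() {
  local with_traefik="${1:-yes}"
  if [ "$with_traefik" = "no" ]; then
    run "curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --disable=traefik' sh -" || return 1
  else
    run "curl -sfL https://get.k3s.io | sh -" || return 1
  fi
  # k3s writes its kubeconfig to /etc/rancher/k3s/k3s.yaml (root-readable by
  # default), so this step may need sudo or an explicit KUBECONFIG.
  run "kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/dev"
}
```

Export DRY_RUN=1 and call bootstrap_k3s no to review the two commands before running them for real from an alphaswarm checkout.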
Shape C — rpi_kubernetes (4-node k3s lab)
The reference home/edge cluster uses two sibling repos:
- rpi_kubernetes: k3s bootstrap, portal, FinOps policies, storage class.
- alphaswarm: every shared service + AlphaSwarm workload under alphaswarm_platform/deployments/kubernetes/.
# In rpi_kubernetes (portal + cluster bootstrap only)
kubectl apply -k kubernetes/
# In alphaswarm (AlphaSwarm shared infra + app overlays)
kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/dev
Streaming install helpers live under alphaswarm_platform/scripts/cluster_install/ (install-flink.sh, install-alphavantage.sh, build-flink-jobs.sh). See streaming.md for the full install order.
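The two Shape C applies run in a fixed order: the cluster bootstrap repo first, then the AlphaSwarm overlay. A small wrapper can enforce that order and fail fast when a checkout is missing. This is a sketch, assuming both repos are checked out locally with the layouts used in the commands above. `apply_shape_c` and `DRY_RUN` are hypothetical names. The streaming helpers are left out on purpose, because streaming.md defines their order.

```shell
# Hypothetical helper: apply the Shape C repos in the order shown above,
# rpi_kubernetes (bootstrap + portal) first, then the AlphaSwarm dev overlay.
apply_shape_c() {
  local rpi_dir="$1" alpha_dir="$2" d
  local rpi_k="$rpi_dir/kubernetes"
  local alpha_k="$alpha_dir/alphaswarm_platform/deployments/kubernetes/overlays/dev"
  # Fail fast if either checkout is missing, before touching the cluster.
  for d in "$rpi_k" "$alpha_k"; do
    if [ ! -d "$d" ]; then
      echo "missing kustomize dir: $d" >&2
      return 1
    fi
  done
  for d in "$rpi_k" "$alpha_k"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kubectl apply -k $d"
    else
      kubectl apply -k "$d" || return 1
    fi
  done
}
```

For example, apply_shape_c ~/src/rpi_kubernetes ~/src/alphaswarm (with DRY_RUN=1 exported first to review the commands).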
Edge-specific concerns
Image distribution
Edge sites often have slow / metered uplinks. Mirror the AlphaSwarm images into an on-site registry:
docker pull ghcr.io/julianwiley/alphaswarm-client:latest-stable
docker tag ghcr.io/julianwiley/alphaswarm-client:latest-stable mirror.local:5000/alphaswarm-client:latest-stable
docker push mirror.local:5000/alphaswarm-client:latest-stable
Then override the image name and tag in a site-specific overlay:
# alphaswarm_platform/deployments/kubernetes/overlays/edge-site-a/kustomization.yaml
images:
  - name: ghcr.io/julianwiley/alphaswarm-client
    newName: mirror.local:5000/alphaswarm-client
    newTag: latest-stable
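A site usually mirrors more than one image, so the pull/tag/push steps and the matching images: entries can be generated from one list. This is a sketch, assuming every image lives under ghcr.io/julianwiley/ and keeps its repository name and tag in the mirror. Only alphaswarm-client is named in this runbook, so any other image names you pass in are your own. The helper names and the `MIRROR_REGISTRY` / `DRY_RUN` variables are hypothetical.

```shell
# Hypothetical helpers for mirroring GHCR images and pointing an overlay at them.
SOURCE_PREFIX="ghcr.io/julianwiley/"
MIRROR_REGISTRY="${MIRROR_REGISTRY:-mirror.local:5000}"

# ghcr.io/julianwiley/<name>:<tag> -> <mirror>/<name>:<tag>
mirror_ref() {
  case "$1" in
    "$SOURCE_PREFIX"*) printf '%s/%s\n' "$MIRROR_REGISTRY" "${1#"$SOURCE_PREFIX"}" ;;
    *) echo "unexpected source image: $1" >&2; return 1 ;;
  esac
}

# Pull, retag and push each image (prints the docker commands when DRY_RUN=1).
mirror_images() {
  local src dst
  for src in "$@"; do
    dst=$(mirror_ref "$src") || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'docker pull %s\ndocker tag %s %s\ndocker push %s\n' "$src" "$src" "$dst" "$dst"
    else
      docker pull "$src" && docker tag "$src" "$dst" && docker push "$dst" || return 1
    fi
  done
}

# Emit kustomize images: entries for bare repo names such as alphaswarm-client.
kustomize_images_block() {
  local tag="$1" name
  shift
  echo "images:"
  for name in "$@"; do
    printf '  - name: %s%s\n' "$SOURCE_PREFIX" "$name"
    printf '    newName: %s/%s\n' "$MIRROR_REGISTRY" "$name"
    printf '    newTag: %s\n' "$tag"
  done
}
```

As an alternative to generating the YAML, running kustomize edit set image ghcr.io/julianwiley/alphaswarm-client=mirror.local:5000/alphaswarm-client:latest-stable inside the overlay directory writes an equivalent entry.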
Auth0 unreachability
Edge sites may have intermittent connectivity to Auth0's JWKS endpoint. The JWT validator caches JWKS for ALPHASWARM_CP_AUTH_JWKS_TTL_SECONDS (default 600s); set it higher (e.g. 3600s) so the cache spans typical outage windows.
In hard offline scenarios, set ALPHASWARM_AUTH_ENFORCE=permissive so requests that cannot be authenticated fall through to the local-default identity and the violation is audit-logged. The operator UI shows a yellow banner when this mode is active.
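A preflight check before go-live can surface both auth settings. This is a sketch, not an AlphaSwarm tool: the variable names and the 600s default come from the paragraphs above. The 3600s threshold only mirrors the suggested value and is not enforced by AlphaSwarm. `check_edge_auth` is a hypothetical name.

```shell
# Hypothetical preflight: report the effective edge auth settings.
check_edge_auth() {
  local ttl="${ALPHASWARM_CP_AUTH_JWKS_TTL_SECONDS:-600}"   # runbook default
  local enforce="${ALPHASWARM_AUTH_ENFORCE:-}"
  case "$ttl" in
    ''|*[!0-9]*) echo "ERROR: JWKS TTL '$ttl' is not a whole number of seconds"; return 1 ;;
  esac
  if [ "$ttl" -lt 3600 ]; then
    echo "WARN: JWKS TTL is ${ttl}s; consider >= 3600s for sites with flaky uplinks"
  fi
  if [ "$enforce" = "permissive" ]; then
    echo "WARN: ALPHASWARM_AUTH_ENFORCE=permissive; auth failures fall back to local-default identity"
  fi
  return 0
}
```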
Storage
Edge sites should NOT rely on the in-cluster Postgres + Redis. Provision durable storage upstream and point AlphaSwarm at it via the connectivity matrix:
ALPHASWARM_DATABASE_URL=postgresql://alphaswarm:****@cloud-postgres.example.com:5432/alphaswarm
ALPHASWARM_REDIS_URL=rediss://cloud-redis.example.com:6380
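A quick way to catch a site still pointing at in-cluster storage is to inspect the hosts in these URLs. This is a sketch under stated assumptions. The hostname patterns treated as "local or in-cluster" (localhost, 127.*, *.svc, *.svc.cluster.local) are guesses to adapt to your naming. The TLS check follows the rediss:// example above. `url_host` and `check_edge_storage` are hypothetical names.

```shell
# Hypothetical preflight for the Storage settings. Hostname patterns are assumptions.

# postgresql://user:pass@host:port/db -> host (IPv6 literals are not handled)
url_host() {
  local rest="${1#*://}"
  rest="${rest%%/*}"            # keep the authority: [user[:pass]@]host[:port]
  rest="${rest##*@}"            # drop credentials
  printf '%s\n' "${rest%%:*}"   # drop port
}

check_edge_storage() {
  local status=0 host
  host=$(url_host "${ALPHASWARM_DATABASE_URL:-}")
  case "$host" in
    ''|localhost|127.*|*.svc|*.svc.cluster.local)
      echo "WARN: ALPHASWARM_DATABASE_URL host '$host' looks local or in-cluster"
      status=1 ;;
  esac
  # The example above uses rediss:// (TLS); anything else is flagged.
  case "${ALPHASWARM_REDIS_URL:-}" in
    rediss://*) ;;
    *) echo "WARN: ALPHASWARM_REDIS_URL is not a rediss:// (TLS) URL"; status=1 ;;
  esac
  return "$status"
}
```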
Telemetry
Edge sites should forward telemetry to a central observability collector. Set ALPHASWARM_OTEL_COLLECTOR_URL to the gateway endpoint; the control plane streams MetricPoints + AlertEvents to it via OTLP.
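A matching sanity check for the collector setting is sketched below. It assumes the control plane expects a full URL with an http:// or https:// scheme, which this runbook does not state. Ports 4317 and 4318 are only the conventional OTLP gRPC and HTTP ports, so a gateway may use others. `check_edge_telemetry` is a hypothetical name.

```shell
# Hypothetical preflight for ALPHASWARM_OTEL_COLLECTOR_URL (scheme requirement is an assumption).
check_edge_telemetry() {
  local url="${ALPHASWARM_OTEL_COLLECTOR_URL:-}"
  case "$url" in
    '') echo "WARN: ALPHASWARM_OTEL_COLLECTOR_URL is unset"; return 1 ;;
    http://*|https://*) ;;
    *) echo "WARN: '$url' has no http:// or https:// scheme"; return 1 ;;
  esac
  # Informational only: gateways behind a load balancer often listen elsewhere.
  case "$url" in
    *:4317|*:4317/*) echo "INFO: port 4317 (conventional OTLP/gRPC)" ;;
    *:4318|*:4318/*) echo "INFO: port 4318 (conventional OTLP/HTTP)" ;;
  esac
  return 0
}
```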
Cutover from compose to k3s
If you started on shape A and want to move to shape B:
- Take a Postgres dump while the compose stack is still running (after docker compose down the container no longer exists): docker exec alphaswarm-postgres pg_dump -U alphaswarm alphaswarm > alphaswarm.sql
- Run docker compose down to stop the compose stack.
- Bring up shape B per the recipe above.
- Restore the dump (-i forwards the local file to psql on stdin): kubectl exec -i -n alphaswarm deploy/alphaswarm-postgres -- psql -U alphaswarm alphaswarm < alphaswarm.sql
- Verify that /manage/health and /health both return 200.
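The cutover can be scripted so the steps always run in a safe order. This is a sketch under several assumptions: it runs from the alphaswarm checkout (so docker compose finds the stack), k3s is already installed, and the dev overlay creates deploy/alphaswarm-postgres in the alphaswarm namespace, as the restore command above implies. `run`, `cutover_compose_to_k3s` and `DRY_RUN` are hypothetical names. The rollout wait only confirms the pod is Ready per its probes. The health-endpoint check is left manual because the host and port depend on the site.

```shell
# Hypothetical helper: run a command string, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    sh -c "$1"
  fi
}

cutover_compose_to_k3s() {
  local dump="${1:-alphaswarm.sql}"   # keep the path free of spaces
  # Dump first: after `docker compose down` the Postgres container is removed.
  run "docker exec alphaswarm-postgres pg_dump -U alphaswarm alphaswarm > $dump" || return 1
  run "docker compose down" || return 1
  run "kubectl apply -k alphaswarm_platform/deployments/kubernetes/overlays/dev" || return 1
  # Wait for the new Postgres pod before restoring into it.
  run "kubectl -n alphaswarm rollout status deploy/alphaswarm-postgres --timeout=300s" || return 1
  # -i forwards the local dump on stdin; without it psql receives no input.
  run "kubectl exec -i -n alphaswarm deploy/alphaswarm-postgres -- psql -U alphaswarm alphaswarm < $dump"
}
```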
No code changes required — the connectivity matrix abstracts which backend is hosting which service.