Tenant-router auth rollout runbook
Operator companion to Edge authentication & cell routing and the manifests at
alphaswarm_platform/deployments/kubernetes/edge/alphaswarm-tenant-router/. Follows the cell-router cutover — run that first if the Envoy edge is not serving yet.
The tenant-router ships fail-closed: AUTH_MODE=required with an
empty issuer, so a fresh apply crash-loops with a SettingsError
until you stamp real IdP values. That is intentional — complete this
runbook to bring the edge up authenticated.
1. Prerequisites
- The IdP is provisioned (Auth0 via terraform/modules/auth0_identity or Entra via alphaswarm_entra_directory) and the per-cell backends already validate the same issuer/audience (ALPHASWARM_AUTH_OIDC_ISSUER/..._AUDIENCE in alphaswarm-config, stamped by build/scripts/sync_auth0_env_to_k8s.py).
- The claims pipeline stamps the namespaced routing claims (https://alphaswarm.internal/tenant_id, workspace_id, and, for B2B premium plans, tier). See Auth0 Actions / MSAL setup.
- The cells registry has at least one state=active cell per tier you route to (curl -sS $CP/manage/cells | jq '.data[].tier').
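You can check the claims pipeline before touching the router by decoding a freshly issued token locally. The sketch below is minimal POSIX shell and makes a few assumptions:

- GNU coreutils base64 is available.
- workspace_id, like tenant_id, is namespaced under https://alphaswarm.internal/.
- The helper names are hypothetical.

The string match is only a smoke test. It does not verify the signature.

```shell
# Hypothetical helpers: decode a JWT payload and check the routing claims.
# Assumes GNU coreutils base64 and that the payload JSON does not escape "/".

# Print the decoded payload (second dot-separated segment, base64url, unpadded).
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the "=" padding that base64url strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Succeed only if tenant_id and workspace_id are present under the namespace.
has_routing_claims() {
  payload=$(jwt_payload "$1") || return 1
  for claim in tenant_id workspace_id; do
    case $payload in
      *"\"https://alphaswarm.internal/$claim\""*) ;;
      *) echo "missing claim: https://alphaswarm.internal/$claim" >&2; return 1 ;;
    esac
  done
  # tier is only expected for B2B premium plans, so report it rather than require it.
  case $payload in
    *'"https://alphaswarm.internal/tier"'*) echo "tier claim present" >&2 ;;
  esac
}
```

For example, run has_routing_claims "$TOKEN" with a token fetched as in step 4. A non-zero exit names the first missing claim.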
2. Stamp the auth ConfigMap
Edit (or overlay-patch) alphaswarm-tenant-router-config in
deployments/kubernetes/edge/alphaswarm-tenant-router/configmap.yaml:
```yaml
data:
  ALPHASWARM_TENANT_ROUTER_AUTH_MODE: "permissive" # step 3 flips to required
  ALPHASWARM_TENANT_ROUTER_OIDC_ISSUER: "https://<tenant>.us.auth0.com/"
  ALPHASWARM_TENANT_ROUTER_OIDC_AUDIENCE: "https://api.alphaswarm.internal/manage"
```
The JWKS URI is derived from the issuer
(<issuer>/.well-known/jwks.json); set
ALPHASWARM_TENANT_ROUTER_JWKS_URI only for non-standard IdPs. Only
asymmetric algorithms are accepted: if you override
OIDC_ALGORITHMS, any HS* value is refused at boot.
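When reviewing a stamp, you can approximate both rules in shell. This sketch models the documented behaviour; it is not the router's code. It assumes:

- The router strips one trailing slash from the issuer. Auth0 issuers end in /.
- OIDC_ALGORITHMS is a comma-separated list.
- The allow-list is the standard JWS asymmetric set. The router's exact list may differ.

Both helper names are hypothetical.

```shell
# Hypothetical sketch of the documented defaults; not the router's implementation.

# Derive <issuer>/.well-known/jwks.json, assuming one trailing "/" is stripped
# first so Auth0-style issuers ("https://<tenant>.us.auth0.com/") do not yield "//".
derive_jwks_uri() {
  printf '%s/.well-known/jwks.json\n' "${1%/}"
}

# Accept a comma-separated algorithm list only if every entry is asymmetric.
# The allow-list is the standard JWS RSA / RSA-PSS / ECDSA / EdDSA set (an assumption).
check_algorithms() {
  old_ifs=$IFS
  IFS=,
  for alg in $1; do
    alg=$(printf '%s' "$alg" | tr -d ' ')   # tolerate "RS256, ES256"
    case $alg in
      RS256|RS384|RS512|PS256|PS384|PS512|ES256|ES384|ES512|EdDSA) ;;
      *) IFS=$old_ifs
         echo "refused non-asymmetric algorithm: $alg" >&2
         return 1 ;;
    esac
  done
  IFS=$old_ifs
}
```

If the derived URI does not serve keys for your IdP, read jwks_uri from <issuer>/.well-known/openid-configuration (standard OIDC discovery). Then set ALPHASWARM_TENANT_ROUTER_JWKS_URI to that value.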
Apply and restart:
```shell
kubectl apply -k alphaswarm_platform/deployments/kubernetes/edge/
kubectl -n alphaswarm-edge rollout restart deploy/alphaswarm-tenant-router
kubectl -n alphaswarm-edge rollout status deploy/alphaswarm-tenant-router
```
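A missing or placeholder stamp only surfaces as a SettingsError crash-loop after the restart. A local preflight against configmap.yaml before the apply can save that cycle. This is a hypothetical sketch, not shipped tooling. It assumes a flat KEY: "value" data block with no YAML anchors or multi-line values.

```shell
# Hypothetical preflight for the ConfigMap; a sketch, not shipped tooling.

# Print the value of KEY from a flat "KEY: value" YAML file, dropping
# a trailing "# comment" and surrounding double quotes.
cm_value() {
  sed -n "s/^[[:space:]]*$2:[[:space:]]*//p" "$1" | head -n 1 |
    sed 's/[[:space:]]#.*$//; s/[[:space:]]*$//; s/^"//; s/"$//'
}

# Fail unless AUTH_MODE is required/permissive and issuer/audience are
# non-empty and no longer contain a "<...>" placeholder.
preflight() {
  mode=$(cm_value "$1" ALPHASWARM_TENANT_ROUTER_AUTH_MODE)
  case $mode in
    required|permissive) ;;
    *) echo "AUTH_MODE is '$mode', expected required or permissive" >&2; return 1 ;;
  esac
  for key in ALPHASWARM_TENANT_ROUTER_OIDC_ISSUER ALPHASWARM_TENANT_ROUTER_OIDC_AUDIENCE; do
    val=$(cm_value "$1" "$key")
    case $val in
      ''|*'<'*) echo "$key is empty or a placeholder: '$val'" >&2; return 1 ;;
    esac
  done
}
```

This reads only the base configmap.yaml. If you stamp through an overlay patch, render it first with kubectl kustomize <overlay-dir> > rendered.yaml, then run preflight rendered.yaml.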
3. Canary in permissive, then enforce
In permissive mode the router denies invalid tokens but lets anonymous
requests through, flagged with x-alphaswarm-auth: anonymous (per-cell
gates still reject them where they require auth). Watch the decision counters:
```shell
kubectl -n alphaswarm-edge port-forward svc/alphaswarm-tenant-router 8080 &
curl -s localhost:8080/metrics | grep authz_decisions_total
# alphaswarm_tenant_router_authz_decisions_total{decision="allow",mode="permissive",reason="verified"} 1042
# alphaswarm_tenant_router_authz_decisions_total{decision="allow",mode="permissive",reason="anonymous"} 3
# alphaswarm_tenant_router_authz_decisions_total{decision="deny",mode="permissive",reason="expired_token"} 7
```
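To read the anonymous share from a scrape rather than by eye, a small awk filter over the Prometheus text output is enough. This is a hypothetical helper. It assumes label values contain no spaces, so the sample value is the second whitespace-separated field.

```shell
# Hypothetical helper: share of authz decisions that were anonymous allows,
# read from a Prometheus text-format scrape on stdin. Counters are cumulative
# since pod start, so compare scrapes taken after the rollout settles.
anonymous_share() {
  awk '
    /^#/ { next }                                  # skip HELP/TYPE lines
    /authz_decisions_total\{/ {
      total += $2                                  # value follows "name{labels} "
      if ($0 ~ /reason="anonymous"/) anon += $2
    }
    END {
      if (total == 0) { print "0.0000" }
      else { printf "%.4f\n", anon / total }
    }'
}
```

Usage: curl -s localhost:8080/metrics | anonymous_share. On the sample counters shown above (3 anonymous out of 1052 decisions), this prints 0.0029.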
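When reason="anonymous" stays near zero over a representative window (only
unauthenticated probes remain), flip to enforcement:
```yaml
data:
  ALPHASWARM_TENANT_ROUTER_AUTH_MODE: "required"
```
Re-apply and restart as in step 2, re-open the port-forward (it ends when
the pod it was bound to is replaced), and confirm that readyz reports the
new posture: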
```shell
curl -s localhost:8080/readyz | jq
# {"status":"ok","cells":3,"auth_mode":"required","cba_mode":"enforce",...}
```
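To script the posture check, you can gate on the readyz body directly. This is a hypothetical sketch. It assumes the field names shown above and registry_stale from the failure-mode table. It also makes a judgment call: a stale registry fails the gate, so enforcement is only declared done against a fresh cell view.

```shell
# Hypothetical gate: read a readyz body on stdin and succeed only if it reports
# status "ok", auth_mode "required", and no stale registry. Whitespace is
# stripped first so compact and pretty-printed JSON both match.
readyz_gate() {
  body=$(tr -d ' \n\t')
  case $body in
    *'"status":"ok"'*) ;;
    *) echo "readyz status is not ok" >&2; return 1 ;;
  esac
  case $body in
    *'"auth_mode":"required"'*) ;;
    *) echo "auth_mode is not required" >&2; return 1 ;;
  esac
  case $body in
    *'"registry_stale":true'*) echo "registry_stale is true" >&2; return 1 ;;
  esac
}
```

Usage: curl -s localhost:8080/readyz | readyz_gate && echo enforced.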
4. Verification checks
```shell
# Anonymous is denied (required mode):
curl -s -o /dev/null -w '%{http_code}\n' -XPOST localhost:8080/ext_authz/v3/check \
  -H 'content-type: application/json' \
  -d '{"attributes":{"request":{"http":{"headers":{}}}}}'
# 401

# A live SPA token is verified and routed:
TOKEN=$(...fetch from the SPA / device flow...)
curl -s -XPOST localhost:8080/ext_authz/v3/check \
  -H 'content-type: application/json' \
  -d "{\"attributes\":{\"request\":{\"http\":{\"headers\":{\"authorization\":\"Bearer ${TOKEN}\"}}}}}" \
  -D - -o /dev/null | grep -i x-alphaswarm
# x-alphaswarm-cell: cell-shared-std-us-east-1a
# x-alphaswarm-auth: verified
# x-alphaswarm-sub: auth0|...
```
End-to-end through the edge, a tampered or expired token must produce
401 from Envoy, and x-alphaswarm-* request headers sent by the client
must arrive at the cell overwritten with verified values.
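For the tampered-token case, it helps to produce a token whose signature is guaranteed not to verify, and to build the check body without hand-escaping quotes. Both helpers below are hypothetical. The tampering rule is a standard base64url detail: the final character of a segment can hold padding bits that lenient decoders ignore, so the sketch alters the first signature character instead.

```shell
# Hypothetical helpers for the end-to-end negative check.

# Build the minimal CheckRequest body used in step 4 for a bearer token.
authz_body() {
  printf '{"attributes":{"request":{"http":{"headers":{"authorization":"Bearer %s"}}}}}' "$1"
}

# Flip the FIRST character of the signature segment. The last base64url
# character may carry unused padding bits that lenient decoders ignore;
# the first character always carries six data bits.
tamper_token() {
  signed=${1%.*}                             # header.payload
  sig=${1##*.}                               # signature
  first=$(printf '%s' "$sig" | cut -c1)
  rest=$(printf '%s' "$sig" | cut -c2-)
  if [ "$first" = A ]; then first=B; else first=A; fi
  printf '%s.%s%s\n' "$signed" "$first" "$rest"
}

# Usage against the port-forwarded router (expect 401):
#   curl -s -o /dev/null -w '%{http_code}\n' -XPOST localhost:8080/ext_authz/v3/check \
#     -H 'content-type: application/json' -d "$(authz_body "$(tamper_token "$TOKEN")")"
```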
5. Cross-cell CBA keys (Phase 5 §8.5)
Cross-cell calls present a Cell-Bound-Authorization (CBA) JWT. The
validator, co-located in the router, reads each source cell's
verification keys from that cell's cells-registry annotation, so publish
them when you enable cross-cell MCP:
```shell
curl -sS -XPATCH "$CP/manage/cells/cell-shared-std-us-east-1a" \
  -H "authorization: Bearer $MGMT_TOKEN" -H 'content-type: application/json' \
  -d '{"annotations":{"alphaswarm.internal/cba-jwks":"{\"keys\":[...]}"}}'
```
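The annotation value is the JWKS serialized as a JSON string, which is easy to mis-escape by hand. The helper below is a hypothetical sketch that builds the PATCH body from a JWKS file. It also refuses files that carry a "d" member, which is the JWK private-key parameter for RSA, EC and OKP keys.

```shell
# Hypothetical helper: wrap a JWKS file as the string value of the
# alphaswarm.internal/cba-jwks annotation, producing the PATCH body above.
cba_patch_body() {
  # Publish verification (public) keys only: "d" is the JWK private-key member.
  if grep -q '"d"[[:space:]]*:' "$1"; then
    echo "refusing: $1 contains private key material (\"d\")" >&2
    return 1
  fi
  # Valid JSON strings cannot contain raw newlines, so dropping them only
  # removes layout whitespace; then escape backslashes and double quotes.
  escaped=$(tr -d '\n\r' < "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"annotations":{"alphaswarm.internal/cba-jwks":"%s"}}' "$escaped"
}
```

Pass the result as -d "$(cba_patch_body cell-jwks.json)". Here cell-jwks.json is a placeholder for the source cell's public JWKS. If jq is available, jq -c '{annotations: {"alphaswarm.internal/cba-jwks": tojson}}' cell-jwks.json produces an equivalent body.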
CBA_MODE=enforce (the default) is safe before any workload mints CBAs,
because requests without the header pass through. During key rollout,
set CBA_MODE=monitor to log would-be denials instead, and check
cba_decisions_total{decision="deny"} before returning to enforce.
Single-cell edges should additionally pin
ALPHASWARM_TENANT_ROUTER_CBA_DESTINATION_CELL_ID to their own cell id.
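For the monitor-to-enforce decision described above, you can compare the deny counter across two saved scrapes taken a representative window apart. This is a hypothetical sketch. It matches any metric name ending in cba_decisions_total, since the full prefixed name is not given here. It also assumes the sample value is the second field.

```shell
# Hypothetical helpers for the monitor -> enforce decision.

# Sum every cba_decisions_total sample labelled decision="deny" in a
# Prometheus text scrape on stdin (value assumed to be the second field).
cba_denies() {
  awk '!/^#/ && /cba_decisions_total\{/ && /decision="deny"/ { s += $2 } END { print s + 0 }'
}

# Print how many would-be denials were added between two saved scrapes.
# A negative result means the counters reset (pod restart) in between.
cba_deny_delta() {
  before=$(cba_denies < "$1")
  after=$(cba_denies < "$2")
  awk -v a="$before" -v b="$after" 'BEGIN { print b - a }'
}
```

Usage:

1. Save a scrape with curl -s localhost:8080/metrics > before.prom.
2. Wait out the window, then save a second scrape to after.prom.
3. Run cba_deny_delta before.prom after.prom.

A result of 0 means no new would-be denials over the window.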
6. Rollback
Auth enforcement is config-only, so no image rollback is needed:
- Flip AUTH_MODE back to permissive, then run kubectl -n alphaswarm-edge rollout restart deploy/alphaswarm-tenant-router. Do NOT use disabled: that insecure mode also demands ALLOW_INSECURE=true and is for local dev only.
- Before you re-enforce, use the decision counters (/metrics) and the structured authz_deny logs to identify what was being denied. The logs carry these reason codes: missing_token, expired_token, wrong_audience, wrong_issuer, no_matching_key, forbidden_algorithm, jwks_unreachable.
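To see which reason codes dominate, you can tally the authz_deny log lines. This is a hypothetical sketch. It assumes one JSON object per log line with a "reason" field; the exact log shape is not specified here, so adjust the pattern to match yours.

```shell
# Hypothetical helper: tally authz_deny reason codes from router logs on stdin.
# Assumes one JSON object per line with a "reason" field; adjust the pattern
# if your log format differs.
deny_reasons() {
  grep 'authz_deny' |
    sed -n 's/.*"reason"[[:space:]]*:[[:space:]]*"\([a-z_]*\)".*/\1/p' |
    sort | uniq -c | sort -rn
}

# Usage (logs from one pod of the deployment, last hour):
#   kubectl -n alphaswarm-edge logs deploy/alphaswarm-tenant-router --since=1h | deny_reasons
```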
Failure modes worth knowing
| Symptom | Cause | Response |
|---|---|---|
| Pod crash-loops with SettingsError | Issuer or audience missing while AUTH_MODE is required or permissive | Stamp the ConfigMap (step 2). |
| All requests return 401 with jwks_unreachable | Router cannot reach the IdP JWKS (egress 443 blocked, or wrong issuer) | Check the NetworkPolicy and the issuer URL. Once warmed, the JWKS cache serves stale keys, so this bites hardest on cold boots. |
| 401 no_matching_key after IdP key rotation | The token's kid is not in the cached JWKS | The router automatically force-refreshes once per unknown kid; persistent failures mean the issuer/JWKS URI points at the wrong tenant. |
| 503 no_cell_available for premium users | No active shared-prem cell | Explicit tiers are never downgraded: activate a cell for the tier or fix the claims pipeline. |
| readyz shows registry_stale: true | Control plane unreachable for longer than REGISTRY_STALENESS_WARN_SECONDS | Routing continues on the last-known-good cells; restore alphaswarm-cp before making placement changes. |