ADR 003 — Auth0 zero-trust two-layer security model
- Status: Accepted (2026-05-18)
- Authors: Platform team
- Supersedes: None
- Related: ADR 005 — separated control plane, alphaswarm_docs/identity.md, alphaswarm_docs/auth0-actions.md
Context
AlphaSwarm already uses Auth0 for the operator UI via the in-flight alphaswarm/auth/providers/auth0.py plugin (AGENTS hard rule 27). What's missing for the refactor is the second layer: cryptographic JWT validation at every service boundary, resource-scoped claims so users only see their own resources, and a per-role scope matrix that the alphaswarm_controller micro-project can enforce without ever importing alphaswarm.*.
Three identity strategies were considered:
- Self-hosted Keycloak — full control, but operations burden and one more stateful service per cluster.
- Single-layer Auth0 (current state) — Auth0 only for the SPA login. Backend services still trust user-injected headers via session cookies.
- Two-layer Auth0 (recommended in prompt) — Auth0 OIDC for the SPA + JWT (RS256) bearer tokens validated independently by every service via JWKS.
Decision
Adopt the two-layer Auth0 model with the following invariants:
- The Vite SPA in alphaswarm_client performs Authorization Code + PKCE against the Auth0 tenant. Access tokens are short-lived (1 h) JWTs with aud=https://api.alphaswarm.internal/manage.
- Every backend service — alphaswarm (FastAPI API), alphaswarm_controller (micro-project), and the rpi_kubernetes management/backend shim — re-validates JWTs against the Auth0 JWKS independently using the shared validator in alphaswarm_core/auth/. No service trusts a header set by another service.
- Auth0 Post-Login Action (template in alphaswarm_platform/terraform/modules/auth0_identity/post_login_action.js.tftpl) calls POST /_internal/auth0/sync to fetch user-specific custom claims and injects them into the access token under the https://alphaswarm.internal/ namespace:
  - https://alphaswarm.internal/org_id — tenancy boundary
  - https://alphaswarm.internal/roles — coarse role list (alphaswarm-viewer, alphaswarm-admin, alphaswarm-operator)
  - https://alphaswarm.internal/resources — explicit resource ID allowlist (org-scoped)
  - https://alphaswarm.internal/workspace_id, https://alphaswarm.internal/team_ids — existing tenancy hints
- M2M tokens for service-to-service calls (e.g. alphaswarm_client → alphaswarm_controller) mint through Auth0 Client Credentials. The proxy in alphaswarm/api/proxy.py attaches a cached M2M token; alphaswarm_controller validates it like any other JWT.
- The four-role RBAC matrix from the refactor prompt becomes the canonical scope grid:

  | Role | Scopes granted |
  | --- | --- |
  | alphaswarm-viewer | read:infrastructure |
  | alphaswarm-operator | read:infrastructure + manage:agents |
  | alphaswarm-admin | read:infrastructure + manage:agents + manage:infrastructure |
  | alphaswarm-superadmin | All of the above + admin:cluster (only role that bypasses filter_resources) |

- Every list endpoint in both alphaswarm and alphaswarm_controller passes its result list through alphaswarm_core.auth.resource_filter.filter_resources(items, jwt_payload) before returning. The filter respects admin:cluster (returns everything) and otherwise intersects against the resources claim.
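The scope grid and the filtering rule above can be sketched as follows. This is a minimal illustration, not the contents of alphaswarm_core/auth/resource_filter.py: only the filter_resources(items, jwt_payload) signature, the claim names, and the role-to-scope grid come from this ADR. The ROLE_SCOPES table, the scopes_for helper, deriving scopes from the roles claim, and the assumption that each item is a dict with an "id" key are hypothetical.

```python
from typing import Any, Iterable

# Claim namespace taken from this ADR.
NAMESPACE = "https://alphaswarm.internal/"

# Role -> scopes, transcribed from the scope grid (hypothetical representation).
ROLE_SCOPES: dict[str, frozenset[str]] = {
    "alphaswarm-viewer": frozenset({"read:infrastructure"}),
    "alphaswarm-operator": frozenset({"read:infrastructure", "manage:agents"}),
    "alphaswarm-admin": frozenset(
        {"read:infrastructure", "manage:agents", "manage:infrastructure"}
    ),
    "alphaswarm-superadmin": frozenset(
        {
            "read:infrastructure",
            "manage:agents",
            "manage:infrastructure",
            "admin:cluster",
        }
    ),
}


def scopes_for(jwt_payload: dict[str, Any]) -> frozenset[str]:
    """Union of the scopes granted by every role in the namespaced roles claim."""
    granted: set[str] = set()
    for role in jwt_payload.get(NAMESPACE + "roles", []):
        # Unknown roles grant nothing rather than raising.
        granted |= ROLE_SCOPES.get(role, frozenset())
    return frozenset(granted)


def filter_resources(
    items: Iterable[dict[str, Any]], jwt_payload: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return only the items the caller may see.

    admin:cluster bypasses filtering; otherwise the result is the
    intersection of the items with the namespaced resources claim.
    Assumption: each item exposes its resource ID under the "id" key.
    """
    if "admin:cluster" in scopes_for(jwt_payload):
        return list(items)
    allowed = set(jwt_payload.get(NAMESPACE + "resources", []))
    return [item for item in items if item.get("id") in allowed]
```

With a viewer token whose resources claim is ["node-1", "node-3"], a list containing node-1, node-2, and node-3 comes back as the node-1 and node-3 entries. A token carrying alphaswarm-superadmin gets all three, and a token with no resources claim gets an empty list.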
Consequences
Positive
- Zero-trust between services. A compromised alphaswarm_client container can issue requests but cannot forge claims — the control plane re-validates.
- Resource scoping moves from "frontend hides things" to "backend cannot return things". Defence in depth.
- Auth0 is already in production for the SPA; the only delta is adding M2M tokens and the resources claim.
- The alphaswarm_controller micro-project gets a clean security boundary without importing alphaswarm.auth.* — it depends on alphaswarm_core/auth/ only.
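The M2M delta mentioned above mostly amounts to the proxy holding a client-credentials token and refreshing it shortly before expiry. The sketch below illustrates that caching pattern only. The CachedM2MToken class, the injected fetch callable, and the 60-second skew are assumptions for illustration; the access_token and expires_in fields follow the standard OAuth 2.0 token response. A real fetch would POST to the tenant's token endpoint, which is left out here to keep the example self-contained.

```python
import time
from typing import Any, Callable, Optional


class CachedM2MToken:
    """Caches a client-credentials access token until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict[str, Any]],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch is a hypothetical callable that performs the token request
        # and returns the decoded JSON response.
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, refreshing it when inside the skew window."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            response = self._fetch()
            self._token = response["access_token"]
            self._expires_at = now + float(response["expires_in"])
        return self._token

    def auth_header(self) -> dict[str, str]:
        """Header the proxy would attach to an outbound request."""
        return {"Authorization": f"Bearer {self.get()}"}
```

Injecting the clock keeps the refresh logic testable without sleeping. The receiving service still validates the token like any other JWT, so the cache is purely a client-side optimisation.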
Negative
- Every API request pays JWKS verification cost (~0.2 ms with lru_cache). Acceptable.
- The https://alphaswarm/ → https://alphaswarm.internal/ namespace rename requires one release of dual-reading both namespaces (handled by auth_claims_namespace_aliases setting).
- Operators need to be onboarded to one of the four roles before they can use the new control plane — solved by /build/scripts/provision_auth0.py running on bootstrap.
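Dual-reading during the namespace rename can be pictured as a lookup that tries the new prefix first and falls back to the aliases. The sketch below is an assumption about how auth_claims_namespace and auth_claims_namespace_aliases might be consumed; the read_claim helper and its parameters are hypothetical and not taken from alphaswarm/config/settings.py.

```python
from typing import Any, Sequence

# Sentinel so that a claim explicitly set to None or [] still counts as present.
_MISSING = object()


def read_claim(
    payload: dict[str, Any],
    name: str,
    namespace: str = "https://alphaswarm.internal/",
    aliases: Sequence[str] = ("https://alphaswarm/",),
    default: Any = None,
) -> Any:
    """Read a namespaced claim, preferring the current namespace over aliases."""
    for prefix in (namespace, *aliases):
        value = payload.get(prefix + name, _MISSING)
        if value is not _MISSING:
            return value
    return default


# Tokens minted before and after the rename resolve to the same value.
old_token = {"https://alphaswarm/org_id": "org-1"}
new_token = {"https://alphaswarm.internal/org_id": "org-1"}
assert read_claim(old_token, "org_id") == read_claim(new_token, "org_id") == "org-1"
```

Once the dual-reading release has passed, the alias tuple would be emptied and only the new namespace would be consulted.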
Alternatives considered
- Self-hosted Keycloak — rejected. Adds operational burden without business value. Auth0 plays well with Terraform (already in alphaswarm_platform/terraform/modules/auth0_identity/).
- Cookie-only sessions — rejected. Backend services would have to trust whatever set the cookie; doesn't compose with the cross-service M2M case.
- Opaque tokens with introspection — rejected. Adds a round trip per request against Auth0's /oauth/token/introspect, and Auth0's free tier rate-limits it.
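For contrast with per-request introspection, local validation needs only the token and a cached JWKS document. The sketch below shows just the key-selection step: parse the JWT header, insist on RS256, and pick the JWKS entry whose kid matches. It does not verify the signature, which requires a cryptography library, and the select_signing_key helper is hypothetical rather than the validator in alphaswarm_core/auth/validator.py.

```python
import base64
import json
from typing import Any


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def select_signing_key(token: str, jwks: dict[str, Any]) -> dict[str, Any]:
    """Return the JWKS entry that should verify this token.

    Only inspects the header; signature, expiry, and audience checks
    must still follow before the token is trusted.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("token is not a three-part compact JWT")
    header = json.loads(_b64url_decode(parts[0]))
    # Pin the algorithm so a token cannot downgrade itself to "none" or HS256.
    if header.get("alg") != "RS256":
        raise ValueError("unexpected alg: %r" % header.get("alg"))
    kid = header.get("kid")
    for key in jwks.get("keys", []):
        if key.get("kid") == kid and key.get("kty") == "RSA":
            return key
    raise KeyError("no RSA key in JWKS with kid=%r" % kid)
```

Because the JWKS document changes only on key rotation, it can be cached in process, so this step involves no network call on the request path.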
Implementation references
- JWT validator: alphaswarm_core/auth/validator.py (extracted from alphaswarm/auth/providers/auth0.py)
- Resource filter: alphaswarm_core/auth/resource_filter.py
- Claims namespace setting: alphaswarm/config/settings.py::auth_claims_namespace, auth_claims_namespace_aliases
- Auth0 Action template: alphaswarm_platform/terraform/modules/auth0_identity/post_login_action.js.tftpl
- Sync endpoint: alphaswarm/api/routes/auth0_sync.py
- Terraform Auth0 module: alphaswarm_platform/terraform/modules/auth0_identity/main.tf
- Provisioning script: alphaswarm_platform/build/scripts/provision_auth0.py