ADR 003 — Auth0 zero-trust two-layer security model

Status: Accepted (2026-05-18)
Authors: Platform team
Supersedes: None
Related: ADR 005 — separated control plane, alphaswarm_docs/identity.md, alphaswarm_docs/auth0-actions.md

Context

AlphaSwarm already uses Auth0 for the operator UI via the in-flight alphaswarm/auth/providers/auth0.py plugin (AGENTS hard rule 27). What's missing for the refactor is the second layer: cryptographic JWT validation at every service boundary, resource-scoped claims so users only see their own resources, and a per-role scope matrix that the alphaswarm_controller micro-project can enforce without ever importing alphaswarm.*.

Three identity strategies were considered:

Self-hosted Keycloak: full control, but it adds operational burden and one more stateful service per cluster.
Single-layer Auth0 (current state): Auth0 handles only the SPA login. Backend services still trust identity headers and session cookies supplied with the request rather than verifying a token themselves.
Two-layer Auth0 (recommended in the refactor prompt): Auth0 OIDC for the SPA, plus RS256-signed JWT bearer tokens that every service validates independently via JWKS.

Decision

Adopt the two-layer Auth0 model with the following invariants:

The Vite SPA in alphaswarm_client performs Authorization Code + PKCE against the Auth0 tenant. Access tokens are short-lived (1 h) JWTs with aud = https://api.alphaswarm.internal/manage.
Every backend service — alphaswarm (FastAPI API), alphaswarm_controller (micro-project), and the rpi_kubernetes management/backend shim — re-validates JWTs against the Auth0 JWKS independently using the shared validator in alphaswarm_core/auth/. No service trusts a header set by another service.
An Auth0 Post-Login Action (template in alphaswarm_platform/terraform/modules/auth0_identity/post_login_action.js.tftpl) calls POST /_internal/auth0/sync to fetch user-specific custom claims and injects them into the access token under the https://alphaswarm.internal/ namespace:
- https://alphaswarm.internal/org_id — tenancy boundary
- https://alphaswarm.internal/roles — coarse role list (alphaswarm-viewer, alphaswarm-operator, alphaswarm-admin, alphaswarm-superadmin)
- https://alphaswarm.internal/resources — explicit resource ID allowlist (org-scoped)
- https://alphaswarm.internal/workspace_id, https://alphaswarm.internal/team_ids — existing tenancy hints
M2M tokens for service-to-service calls (e.g. alphaswarm_client → alphaswarm_controller) are minted through the Auth0 Client Credentials flow. The proxy in alphaswarm/api/proxy.py attaches a cached M2M token; alphaswarm_controller validates it like any other JWT.
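The cached M2M token mentioned above could be handled along these lines. This is a minimal sketch, not the code in alphaswarm/api/proxy.py: the class name `CachedM2MToken`, the `fetch` callable, and the 60-second refresh margin are all assumptions made for illustration. In a real proxy, `fetch` would perform the client-credentials grant against the tenant's token endpoint and return the token with its lifetime.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical contract: returns (access_token, expires_in_seconds), for
# example by requesting a client_credentials grant from the Auth0 tenant.
FetchToken = Callable[[], Tuple[str, float]]


@dataclass
class CachedM2MToken:
    """Caches one M2M access token and renews it shortly before expiry."""

    fetch: FetchToken
    clock: Callable[[], float] = time.monotonic
    refresh_margin: float = 60.0  # assumed: renew 60 s before expiry
    _token: Optional[str] = field(default=None, init=False, repr=False)
    _expires_at: float = field(default=0.0, init=False, repr=False)

    def get(self) -> str:
        """Return the cached token, minting a new one when near expiry."""
        near_expiry = self.clock() >= self._expires_at - self.refresh_margin
        if self._token is None or near_expiry:
            token, expires_in = self.fetch()
            self._token = token
            self._expires_at = self.clock() + expires_in
        return self._token

    def auth_header(self) -> Dict[str, str]:
        """Header the proxy would attach to outgoing requests."""
        return {"Authorization": f"Bearer {self.get()}"}
```

Injecting the clock keeps the cache testable without sleeping. The receiving service does not need to know the token was cached, since it validates it like any other JWT.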

The four-role RBAC matrix from the refactor prompt becomes the canonical scope grid:

Role	Scopes granted
`alphaswarm-viewer`	`read:infrastructure`
`alphaswarm-operator`	`read:infrastructure` + `manage:agents`
`alphaswarm-admin`	`read:infrastructure` + `manage:agents` + `manage:infrastructure`
`alphaswarm-superadmin`	All of the above + `admin:cluster` (only role that bypasses `filter_resources`)
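
The matrix above can be written down as data. The following is a sketch only: the names `ROLE_SCOPES`, `scopes_for_roles`, and `has_scope` are hypothetical, and treating an unknown role as granting nothing is an assumption rather than documented behaviour.

```python
from typing import FrozenSet, Iterable

# Role-to-scope grid transcribed from the table above.
ROLE_SCOPES = {
    "alphaswarm-viewer": frozenset({"read:infrastructure"}),
    "alphaswarm-operator": frozenset({"read:infrastructure", "manage:agents"}),
    "alphaswarm-admin": frozenset(
        {"read:infrastructure", "manage:agents", "manage:infrastructure"}
    ),
    "alphaswarm-superadmin": frozenset(
        {
            "read:infrastructure",
            "manage:agents",
            "manage:infrastructure",
            "admin:cluster",
        }
    ),
}


def scopes_for_roles(roles: Iterable[str]) -> FrozenSet[str]:
    """Union of the scopes granted by each role; unknown roles grant nothing."""
    granted = set()
    for role in roles:
        granted |= ROLE_SCOPES.get(role, frozenset())
    return frozenset(granted)


def has_scope(roles: Iterable[str], required: str) -> bool:
    """True when at least one of the roles grants the required scope."""
    return required in scopes_for_roles(roles)
```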

Every list endpoint in both alphaswarm and alphaswarm_controller passes its result list through alphaswarm_core.auth.resource_filter.filter_resources(items, jwt_payload) before returning. The filter respects admin:cluster (returns everything) and otherwise intersects against the resources claim.
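
A minimal sketch of the filtering rule just described follows. It is not the code in alphaswarm_core/auth/resource_filter.py. It assumes that each item is a mapping with an `id` key, and that granted scopes arrive either in the space-delimited `scope` claim or in a `permissions` array. A token without a resources claim sees nothing, which is an assumed fail-closed choice.

```python
from typing import Any, Iterable, List, Mapping, Set

NAMESPACE = "https://alphaswarm.internal/"


def _granted_scopes(jwt_payload: Mapping[str, Any]) -> Set[str]:
    """Collect scopes from the assumed `scope` string and `permissions` list."""
    scopes = set(str(jwt_payload.get("scope") or "").split())
    scopes.update(jwt_payload.get("permissions") or [])
    return scopes


def filter_resources(
    items: Iterable[Mapping[str, Any]], jwt_payload: Mapping[str, Any]
) -> List[Mapping[str, Any]]:
    """Keep only the items the token is allowed to see."""
    if "admin:cluster" in _granted_scopes(jwt_payload):
        return list(items)  # bypass: admin:cluster returns everything
    allowed = set(jwt_payload.get(NAMESPACE + "resources") or [])
    # Assumption: every item exposes its resource ID under "id".
    return [item for item in items if item.get("id") in allowed]
```

Because the intersection happens on the server after the query, a client that guesses another tenant's resource ID still receives an empty result.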

Consequences

Positive

Zero-trust between services. A compromised alphaswarm_client container can issue requests but cannot forge claims — the control plane re-validates.
Resource scoping moves from "frontend hides things" to "backend cannot return things". Defence in depth.
Auth0 is already in production for the SPA; the only delta is adding M2M tokens and the resources claim.
The alphaswarm_controller micro-project gets a clean security boundary without importing alphaswarm.auth.* — it depends on alphaswarm_core/auth/ only.
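
To illustrate what independent re-validation involves, here is a sketch of the registered-claim checks a service could run after a JWT library has already verified the RS256 signature against the JWKS. It is not the shared validator in alphaswarm_core/auth/. The function name, the `ClaimsError` type, and the issuer URL used in the usage line are hypothetical; only the audience value comes from this ADR.

```python
import time
from typing import Any, Mapping, Optional


class ClaimsError(Exception):
    """Raised when a verified token carries unacceptable claims."""


def check_registered_claims(
    payload: Mapping[str, Any],
    *,
    audience: str,
    issuer: str,
    now: Optional[float] = None,
    leeway: float = 0.0,
) -> None:
    """Reject tokens with the wrong issuer, wrong audience, or past expiry."""
    current = time.time() if now is None else now

    if payload.get("iss") != issuer:
        raise ClaimsError("unexpected issuer")

    # `aud` may be a single string or a list of strings.
    aud = payload.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audience not in audiences:
        raise ClaimsError("token not issued for this API")

    exp = payload.get("exp")
    if not isinstance(exp, (int, float)) or current >= exp + leeway:
        raise ClaimsError("token expired or missing exp")
```

A service would call it as `check_registered_claims(payload, audience="https://api.alphaswarm.internal/manage", issuer="https://example-tenant.auth0.com/")`, where the issuer is a placeholder for the real tenant URL.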

Negative

Every API request pays the JWT signature verification cost (~0.2 ms with the JWKS cached via lru_cache). Acceptable.
The https://alphaswarm/ → https://alphaswarm.internal/ namespace rename requires one release of dual-reading both namespaces (handled by the auth_claims_namespace_aliases setting).
Operators need to be onboarded to one of the four roles before they can use the new control plane. This is handled by alphaswarm_platform/build/scripts/provision_auth0.py, which runs on bootstrap.
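
The dual-read during the namespace rename could look roughly like this. It is a sketch under assumptions: `read_claim` is a hypothetical helper, and the two constants stand in for whatever values auth_claims_namespace and auth_claims_namespace_aliases hold in alphaswarm/config/settings.py.

```python
from typing import Any, Mapping, Sequence

# Assumed values for auth_claims_namespace and auth_claims_namespace_aliases.
PRIMARY_NAMESPACE = "https://alphaswarm.internal/"
NAMESPACE_ALIASES = ("https://alphaswarm/",)


def read_claim(
    payload: Mapping[str, Any],
    name: str,
    namespace: str = PRIMARY_NAMESPACE,
    aliases: Sequence[str] = NAMESPACE_ALIASES,
    default: Any = None,
) -> Any:
    """Read a custom claim, preferring the new namespace over legacy aliases."""
    for prefix in (namespace, *aliases):
        key = prefix + name
        if key in payload:
            return payload[key]
    return default
```

Checking the new namespace first means a token that carries both spellings resolves to the new value, so the alias tuple can be emptied once the transition release has shipped.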

Alternatives considered

Self-hosted Keycloak — rejected. Adds operational burden without business value. Auth0 plays well with Terraform (already in alphaswarm_platform/terraform/modules/auth0_identity/).
Cookie-only sessions — rejected. Backend services would have to trust whatever set the cookie; doesn't compose with the cross-service M2M case.
Opaque tokens with introspection — rejected. Adds a round trip per request against Auth0's /oauth/token/introspect, and Auth0's free tier rate-limits it.

Implementation references

JWT validator: alphaswarm_core/auth/validator.py (extracted from alphaswarm/auth/providers/auth0.py)
Resource filter: alphaswarm_core/auth/resource_filter.py
Claims namespace setting: alphaswarm/config/settings.py::auth_claims_namespace, auth_claims_namespace_aliases
Auth0 Action template: alphaswarm_platform/terraform/modules/auth0_identity/post_login_action.js.tftpl
Sync endpoint: alphaswarm/api/routes/auth0_sync.py
Terraform Auth0 module: alphaswarm_platform/terraform/modules/auth0_identity/main.tf
Provisioning script: alphaswarm_platform/build/scripts/provision_auth0.py
