ADR 003 — Auth0 zero-trust two-layer security model

Context

AlphaSwarm already uses Auth0 for the operator UI via the in-flight alphaswarm/auth/providers/auth0.py plugin (AGENTS hard rule 27). What's missing for the refactor is the second layer: cryptographic JWT validation at every service boundary, resource-scoped claims so users only see their own resources, and a per-role scope matrix that the alphaswarm_controller micro-project can enforce without ever importing alphaswarm.*.

Three identity strategies were considered:

  1. Self-hosted Keycloak — full control, but operations burden and one more stateful service per cluster.
  2. Single-layer Auth0 (current state) — Auth0 only for the SPA login. Backend services still trust user-injected headers via session cookies.
  3. Two-layer Auth0 (recommended in prompt) — Auth0 OIDC for the SPA + JWT (RS256) bearer tokens validated independently by every service via JWKS.

Decision

Adopt the two-layer Auth0 model with the following invariants:

  1. The Vite SPA in alphaswarm_client performs Authorization Code + PKCE against the Auth0 tenant. Access tokens are short-lived (1 h) JWTs with aud = https://api.alphaswarm.internal/manage.

  2. Every backend service — alphaswarm (FastAPI API), alphaswarm_controller (micro-project), and the rpi_kubernetes management/backend shim — re-validates JWTs against the Auth0 JWKS independently using the shared validator in alphaswarm_core/auth/. No service trusts a header set by another service.

  3. Auth0 Post-Login Action (template in alphaswarm_platform/terraform/modules/auth0_identity/post_login_action.js.tftpl) calls POST /_internal/auth0/sync to fetch user-specific custom claims and injects them into the access token under the https://alphaswarm.internal/ namespace:

    • https://alphaswarm.internal/org_id — tenancy boundary
    • https://alphaswarm.internal/roles — coarse role list (alphaswarm-viewer, alphaswarm-operator, alphaswarm-admin, alphaswarm-superadmin)
    • https://alphaswarm.internal/resources — explicit resource ID allowlist (org-scoped)
    • https://alphaswarm.internal/workspace_id, https://alphaswarm.internal/team_ids — existing tenancy hints

  4. M2M tokens for service-to-service calls (e.g. alphaswarm_client → alphaswarm_controller) are minted through the Auth0 Client Credentials flow. The proxy in alphaswarm/api/proxy.py attaches a cached M2M token; alphaswarm_controller validates it like any other JWT.

  5. The four-role RBAC matrix from the refactor prompt becomes the canonical scope grid:

    Role                    Scopes granted
    alphaswarm-viewer       read:infrastructure
    alphaswarm-operator     read:infrastructure + manage:agents
    alphaswarm-admin        read:infrastructure + manage:agents + manage:infrastructure
    alphaswarm-superadmin   All of the above + admin:cluster (only role that bypasses filter_resources)

  6. Every list endpoint in both alphaswarm and alphaswarm_controller passes its result list through alphaswarm_core.auth.resource_filter.filter_resources(items, jwt_payload) before returning. The filter respects admin:cluster (returns everything) and otherwise intersects against the resources claim.
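
Invariants 5 and 6 can be sketched together as follows. This is an illustration only, not the contents of alphaswarm_core/auth/resource_filter.py. It assumes that scopes are derived from the roles claim (the real service may instead read them from a scope or permissions claim), and that each item is a dict with an "id" key; both are assumptions made for the example.

```python
from typing import Any, Iterable

NAMESPACE = "https://alphaswarm.internal/"

# Role -> scopes, transcribed from the matrix in invariant 5.
_READ = {"read:infrastructure"}
_OPERATE = _READ | {"manage:agents"}
_ADMIN = _OPERATE | {"manage:infrastructure"}
ROLE_SCOPES: dict[str, frozenset[str]] = {
    "alphaswarm-viewer": frozenset(_READ),
    "alphaswarm-operator": frozenset(_OPERATE),
    "alphaswarm-admin": frozenset(_ADMIN),
    "alphaswarm-superadmin": frozenset(_ADMIN | {"admin:cluster"}),
}


def scopes_for(jwt_payload: dict[str, Any]) -> set[str]:
    """Union of the scopes granted by every role in the token (assumed derivation)."""
    granted: set[str] = set()
    for role in jwt_payload.get(NAMESPACE + "roles", []):
        # Unknown roles grant nothing rather than raising.
        granted |= ROLE_SCOPES.get(role, frozenset())
    return granted


def filter_resources(items: Iterable[dict[str, Any]], jwt_payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return everything for admin:cluster, otherwise only allowlisted resource IDs."""
    items = list(items)
    if "admin:cluster" in scopes_for(jwt_payload):
        return items
    allowed = set(jwt_payload.get(NAMESPACE + "resources", []))
    # The "id" key is a hypothetical item shape chosen for this sketch.
    return [item for item in items if item.get("id") in allowed]
```

A token with no resources claim and no admin:cluster scope yields an empty list, so the filter fails closed.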

Consequences

Positive

  • Zero-trust between services. A compromised alphaswarm_client container can issue requests but cannot forge claims — the control plane re-validates.
  • Resource scoping moves from "frontend hides things" to "backend cannot return things". Defence in depth.
  • Auth0 is already in production for the SPA; the only delta is adding M2M tokens and the resources claim.
  • The alphaswarm_controller micro-project gets a clean security boundary without importing alphaswarm.auth.* — it depends on alphaswarm_core/auth/ only.
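
The cached M2M token from invariant 4 could look roughly like the sketch below. The class name, the leeway value, and the injected fetch callable are all hypothetical; in a real proxy the fetch would perform the Client Credentials request against the tenant's /oauth/token endpoint and return the token with its expires_in value.

```python
import time
from typing import Callable, Optional


class CachedM2MToken:
    """Caches a client-credentials access token until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], tuple[str, float]],  # returns (access_token, expires_in_seconds)
        leeway_s: float = 60.0,                  # refresh this many seconds early (assumed value)
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._leeway_s = leeway_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one when it is missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._leeway_s:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token

    def auth_header(self) -> dict[str, str]:
        """Header to attach to the outgoing service-to-service request."""
        return {"Authorization": f"Bearer {self.get()}"}
```

The sketch is not thread-safe; a proxy serving concurrent requests would need a lock (or an asyncio equivalent) around the refresh so that only one caller hits Auth0 at a time.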

Negative

  • Every API request pays the JWT verification cost (~0.2 ms with the JWKS cached via lru_cache). Acceptable.
  • The https://alphaswarm/ → https://alphaswarm.internal/ namespace rename requires one release that dual-reads both namespaces (handled by the auth_claims_namespace_aliases setting).
  • Operators need to be onboarded to one of the four roles before they can use the new control plane; alphaswarm_platform/build/scripts/provision_auth0.py handles this when it runs on bootstrap.
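
Dual-reading during the namespace rename might be sketched as below. The helper name read_claim and the module-level constants are hypothetical stand-ins for the auth_claims_namespace and auth_claims_namespace_aliases settings; the sketch assumes the new namespace always takes precedence over an alias.

```python
from typing import Any, Sequence

# Stand-ins for auth_claims_namespace and auth_claims_namespace_aliases.
PRIMARY_NAMESPACE = "https://alphaswarm.internal/"
NAMESPACE_ALIASES: tuple[str, ...] = ("https://alphaswarm/",)

_MISSING = object()  # sentinel so falsy claim values (e.g. []) are still returned


def read_claim(
    payload: dict[str, Any],
    name: str,
    default: Any = None,
    namespace: str = PRIMARY_NAMESPACE,
    aliases: Sequence[str] = NAMESPACE_ALIASES,
) -> Any:
    """Look up a custom claim under the primary namespace, then under each alias."""
    for prefix in (namespace, *aliases):
        value = payload.get(prefix + name, _MISSING)
        if value is not _MISSING:
            return value
    return default
```

Once every issued token carries the new namespace, emptying the alias list removes the fallback without touching call sites.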

Alternatives considered

  • Self-hosted Keycloak — rejected. Adds operational burden without business value. Auth0 plays well with Terraform (already in alphaswarm_platform/terraform/modules/auth0_identity/).
  • Cookie-only sessions — rejected. Backend services would have to trust whatever set the cookie; doesn't compose with the cross-service M2M case.
  • Opaque tokens with introspection — rejected. Adds a round trip per request against Auth0's /oauth/token/introspect, and Auth0's free tier rate-limits it.

Implementation references

  • JWT validator: alphaswarm_core/auth/validator.py (extracted from alphaswarm/auth/providers/auth0.py)
  • Resource filter: alphaswarm_core/auth/resource_filter.py
  • Claims namespace setting: alphaswarm/config/settings.py::auth_claims_namespace, auth_claims_namespace_aliases
  • Auth0 Action template: alphaswarm_platform/terraform/modules/auth0_identity/post_login_action.js.tftpl
  • Sync endpoint: alphaswarm/api/routes/auth0_sync.py
  • Terraform Auth0 module: alphaswarm_platform/terraform/modules/auth0_identity/main.tf
  • Provisioning script: alphaswarm_platform/build/scripts/provision_auth0.py