Edge authentication & cell routing
Every request entering the hosted platform crosses one authentication
decision point before it reaches a cell: `alphaswarm-edge` (Envoy)
makes two `ext_authz` callouts to `alphaswarm-tenant-router`, which
verifies identity and decides cell placement in one pass.
Fail-closed verification
The router's posture is an explicit setting
(`ALPHASWARM_TENANT_ROUTER_AUTH_MODE`), and the default is the strict
one (see the rollout runbook for the operational details):
| Mode | No token | Invalid token | Valid token |
|---|---|---|---|
| `required` (default, hosted cells) | 401 | 401 | allow |
| `permissive` (canary/migration) | allow, flagged | 401 | allow |
| `disabled` (local dev; needs `ALLOW_INSECURE=true` too) | allow | unsigned decode | unsigned decode |
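
The mode table above can be read as a small decision function. The following is a minimal sketch, not the router's actual code: `AuthMode`, `decide`, the 200 status for allowed requests, and the note strings are hypothetical names and values chosen for illustration.

```python
from enum import Enum
from typing import Optional, Tuple


class AuthMode(str, Enum):
    """Values of ALPHASWARM_TENANT_ROUTER_AUTH_MODE as listed in the table."""
    REQUIRED = "required"
    PERMISSIVE = "permissive"
    DISABLED = "disabled"


def decide(mode: AuthMode, token_present: bool,
           token_valid: Optional[bool]) -> Tuple[int, str]:
    """Return (status, note) for one request, row by row from the table.

    token_valid is only consulted when a token is present and the mode
    actually verifies signatures (required / permissive).
    """
    if mode is AuthMode.DISABLED:
        # Local dev only: claims are decoded without signature verification.
        return (200, "unsigned decode" if token_present else "allow")
    if not token_present:
        if mode is AuthMode.REQUIRED:
            return (401, "missing token")
        # Permissive lets the request through but marks it for follow-up.
        return (200, "allow, flagged")
    if not token_valid:
        # An invalid token is rejected in both verifying modes.
        return (401, "invalid token")
    return (200, "allow")
```

Note that `permissive` only relaxes the missing-token case: a token that is present but fails verification is still rejected.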
Three design rules keep the edge honest:
- Boot-time refusal. In `required`/`permissive` the pod exits at startup unless issuer + audience (and a derivable JWKS URI) are configured. A crash-looping edge is strictly better than one that silently routes unauthenticated traffic.
- Asymmetric algorithms only. `RS*`/`PS*`/`ES*`/`EdDSA` are the only acceptable JWT algorithms; `HS*` and `none` are rejected before any key material is consulted, closing the alg-confusion class of attacks. Verification semantics mirror `alphaswarm_core.auth.jwt_validator.JwtValidator` (kid selection, one forced JWKS refresh on unknown kid for key rotation, TTL cache that serves stale on IdP blips).
- Identity headers are always overwritten. On every ALLOW the router emits the full verified set (`x-alphaswarm-sub`, `x-alphaswarm-tenant`, `x-alphaswarm-workspace`, `x-alphaswarm-org`, `x-alphaswarm-auth`), empty when a claim is absent, so a client can never smuggle its own `x-alphaswarm-*` values past the edge. Per-cell FastAPI gates (`alphaswarm.api.security`) still re-validate the JWT; the edge is defense-in-depth, not the only boundary (AGENTS rule 11 applies at every layer).
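
The three rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the router's implementation: the function names, the explicit algorithm list (one possible expansion of `RS*`/`PS*`/`ES*`/`EdDSA`), the claim names mapped to each header, and the `auth_state` value are all hypothetical.

```python
from typing import Dict, Mapping, Optional

# Assumed expansion of RS*/PS*/ES*/EdDSA; HS* and "none" are simply absent.
ALLOWED_ALGS = frozenset({
    "RS256", "RS384", "RS512",
    "PS256", "PS384", "PS512",
    "ES256", "ES384", "ES512",
    "EdDSA",
})

# Header -> claim mapping; the claim names are assumptions for illustration.
CLAIM_FOR_HEADER = {
    "x-alphaswarm-sub": "sub",
    "x-alphaswarm-tenant": "tenant_id",
    "x-alphaswarm-workspace": "workspace_id",
    "x-alphaswarm-org": "organization_id",
}


def check_boot_config(mode: str, issuer: Optional[str],
                      audience: Optional[str], jwks_uri: Optional[str]) -> None:
    """Refuse to start in a verifying mode without a complete IdP config."""
    if mode in ("required", "permissive") and not (issuer and audience and jwks_uri):
        raise RuntimeError("auth mode %r needs issuer, audience and JWKS URI" % mode)


def is_allowed_alg(alg: object) -> bool:
    """Check the JWT header's alg before any key lookup happens."""
    return isinstance(alg, str) and alg in ALLOWED_ALGS


def identity_headers(claims: Mapping[str, object], auth_state: str) -> Dict[str, str]:
    """Build the full header set; absent claims become empty strings.

    Emitting every header on every ALLOW overwrites whatever the client sent.
    """
    headers = {
        header: str(claims.get(claim) or "")
        for header, claim in CLAIM_FOR_HEADER.items()
    }
    headers["x-alphaswarm-auth"] = auth_state
    return headers
```

The key design point is that `identity_headers` never returns a partial set: a missing claim yields an empty header rather than no header, so a client-supplied value cannot survive.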
B2C / B2B tier routing
Cell selection composes the multi-tenancy model with the deployment tiers from RESTRUCTURING_PLAN.md §6.1:
| Plan | JWT tier claim | Cell tier | Tenancy strategy |
|---|---|---|---|
| B2C consumer | (none) or shared-std | shared-std | shared_schema_rls |
| B2B premium | shared-prem | shared-prem | schema_per_tenant |
| Regulated enterprise | (registry pinning) | silo-reg | database_per_enterprise |
| Custom contract | (registry pinning) | silo-custom | hybrid |
Resolution order, per request:
- Registry pinning is authoritative: a tenant listed in a cell's `pinned_tenants` always lands there (silo cells, controlled migrations), regardless of token claims.
- The verified `tier` claim (namespaced `https://alphaswarm.internal/tier`, stamped by the Auth0 Action / Entra claims pipeline) selects the tier. An explicit tier is honored or refused with 503, never silently downgraded onto another tier's tenancy strategy.
- Default tier (`shared-std`) otherwise.
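
The resolution order above might look like this in code. It is a sketch under assumptions: `Cell`, `TierUnavailable`, and `resolve_tier` are hypothetical names, and "refused with 503" is modeled here as "no active cell exists in the requested tier", which is only one plausible refusal condition.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional, Sequence, Tuple

TIER_CLAIM = "https://alphaswarm.internal/tier"
DEFAULT_TIER = "shared-std"


class TierUnavailable(Exception):
    """The caller is expected to translate this into a 503 response."""


@dataclass(frozen=True)
class Cell:
    # Hypothetical registry record; only the fields the sketch needs.
    name: str
    tier: str
    pinned_tenants: FrozenSet[str] = frozenset()
    active: bool = True


def resolve_tier(tenant_id: Optional[str], claims: Mapping[str, object],
                 cells: Sequence[Cell]) -> Tuple[str, Optional[Cell]]:
    """Return (tier, pinned_cell); pinned_cell is None for unpinned tenants."""
    # 1. Registry pinning wins over anything the token says.
    for cell in cells:
        if tenant_id is not None and tenant_id in cell.pinned_tenants:
            return cell.tier, cell
    # 2. An explicit, verified tier claim is honored or refused, never downgraded.
    requested = claims.get(TIER_CLAIM)
    if requested:
        if not any(c.tier == requested and c.active for c in cells):
            raise TierUnavailable(str(requested))
        return str(requested), None
    # 3. No pin and no claim: fall back to the default tier.
    return DEFAULT_TIER, None
```

Raising instead of falling through to `DEFAULT_TIER` in step 2 is what keeps a premium tenant from silently landing on the shared tier's tenancy strategy.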
Within a tier, unpinned tenants spread across active cells by
rendezvous (highest-random-weight) hashing keyed on
tenant_id → organization_id → sub: every router replica picks the
same cell with no shared state, a tenant is sticky to its cell, and
adding or draining a cell only remaps the tenants that hashed onto it.
Registry staleness (the router caches the control plane's
/manage/cells view) is reported, never failed closed — the data
plane keeps routing on last-known-good cells through a control-plane
outage, surfacing registry_stale in /readyz and a counter in
/metrics.
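
Rendezvous hashing as described above can be sketched in a few lines. This is an illustrative version only: the function names, the SHA-256 based weight, and the NUL separator are assumptions, not the router's actual hash construction.

```python
import hashlib
from typing import Iterable, Optional


def routing_key(tenant_id: Optional[str], organization_id: Optional[str],
                sub: Optional[str]) -> str:
    """Pick the first available identifier: tenant_id, organization_id, sub."""
    key = tenant_id or organization_id or sub
    if not key:
        raise ValueError("no identifier available to key the hash on")
    return key


def _weight(key: str, cell: str) -> int:
    """Pseudo-random weight for a (key, cell) pair, stable across replicas."""
    digest = hashlib.sha256(f"{cell}\x00{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cell(key: str, active_cells: Iterable[str]) -> str:
    """Return the active cell with the highest weight for this key.

    Sorting first makes the (extremely unlikely) tie-break deterministic
    regardless of the order in which a replica lists its cells.
    """
    cells = sorted(set(active_cells))
    if not cells:
        raise LookupError("no active cells in this tier")
    return max(cells, key=lambda cell: _weight(key, cell))
```

Because each key's weights for the remaining cells do not change when a cell is drained, only keys whose highest weight belonged to the drained cell move; every other tenant stays where it was, and no replica needs shared state to agree.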
Cell-Bound-Authorization (cross-cell calls)
Cross-cell calls are the highest-risk path (Phase 5 §8.5). The mint
side lives in alphaswarm.auth.cell_bound; the router hosts the
validator at POST /cell_bound/v1/check (the
alphaswarm-cell-bound-validator Service selects the same pods):
- No `Cell-Bound-Authorization` header → pass. External user traffic and same-cell calls never carry one; the response still emits empty `x-alphaswarm-cell-source-*` headers so smuggled values are stripped.
- Header present → the token must verify against the source cell's published keys (cells-registry annotation `alphaswarm.internal/cba-jwks`, JWKS JSON or PEM), with `iss` = source cell, `aud` = destination cell, a ≤90 s lifetime (mint stamps 60 s), required `jti`, and per-replica replay rejection. Valid CBAs inject `x-alphaswarm-cell-source` + `x-alphaswarm-cell-source-workload` (SPIFFE id) so destination-cell services can authorize the calling workload.
- `CBA_MODE=monitor` logs would-be denials without blocking (rollout aid); `enforce` is the default and is safe before any workload mints CBAs because headerless requests pass through.
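
The claim checks listed above, applied after the signature has been verified, could be sketched as below. Everything here is hypothetical scaffolding: `ReplayCache`, `check_cba_claims`, `should_block`, the denial strings, and the assumption that `aud` is a single string and that lifetime is measured as `exp - iat`. Signature verification against the source cell's published keys is assumed to have happened already and is not shown.

```python
import time
from typing import Dict, Mapping, Optional

MAX_LIFETIME_S = 90  # validator ceiling; the mint side stamps 60 s


class ReplayCache:
    """In-memory, per-replica record of jti values seen until they expire."""

    def __init__(self) -> None:
        self._seen: Dict[str, float] = {}

    def check_and_store(self, jti: str, exp: float, now: float) -> bool:
        """Return False if jti was already seen; otherwise remember it."""
        # Drop entries whose tokens have expired and can no longer replay.
        self._seen = {j: e for j, e in self._seen.items() if e > now}
        if jti in self._seen:
            return False
        self._seen[jti] = exp
        return True


def check_cba_claims(claims: Mapping[str, object], source_cell: str,
                     destination_cell: str, replay: ReplayCache,
                     now: Optional[float] = None) -> Optional[str]:
    """Return None when the claims are acceptable, else a denial reason."""
    now = time.time() if now is None else now
    if claims.get("iss") != source_cell:
        return "iss mismatch"
    if claims.get("aud") != destination_cell:
        return "aud mismatch"
    iat, exp = claims.get("iat"), claims.get("exp")
    if not isinstance(iat, (int, float)) or not isinstance(exp, (int, float)):
        return "missing iat/exp"
    if exp <= now:
        return "expired"
    if exp - iat > MAX_LIFETIME_S:
        return "lifetime too long"
    jti = claims.get("jti")
    if not isinstance(jti, str) or not jti:
        return "missing jti"
    if not replay.check_and_store(jti, exp, now):
        return "replay"
    return None


def should_block(reason: Optional[str], cba_mode: str = "enforce") -> bool:
    """In monitor mode a would-be denial is logged by the caller, not enforced."""
    return reason is not None and cba_mode == "enforce"
```

The replay check runs last so that a token failing an earlier check never occupies a cache slot. Since the cache is per replica, a replayed token can still be accepted once by each replica within its lifetime, which is why the short lifetime matters.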
Where things live
| Surface | Path |
|---|---|
| Router service + tests | alphaswarm_platform/tenant_router/ |
| Edge Envoy config (canonical template) | alphaswarm_platform/build/docker/alphaswarm-edge/envoy.template.yaml |
| Deployment (ConfigMap, NetworkPolicy, HPA, Services) | alphaswarm_platform/deployments/kubernetes/edge/alphaswarm-tenant-router/ |
| Backend JWT validation (per-cell) | alphaswarm/auth/oidc.py, alphaswarm_core/auth/jwt_validator.py |
| CBA mint/verify library | alphaswarm/auth/cell_bound.py |
| Operator runbook | Tenant-router auth rollout |
| Cutover history | Cell-router cutover |