
Service-level view

This page catalogues every service AlphaSwarm runs (the application workloads, the control plane, the data layer, the observability stack, and the external edge surface) at a single level of detail. It complements control-plane-topology.md (how services are discovered) and terraform-control-plane.md (how they are provisioned) with a "what is each service" reference.

The single source of truth for the service registry is alphaswarm_platform/configs/deployment/topology.yaml. This page is generated against that file plus each service's matching package contract. When a row drifts, the truth is the YAML.
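Because the YAML wins whenever a row drifts, a drift check reduces to comparing the registry against the rendered table rows. The snippet below is a minimal sketch, not the actual page generator: the field names (`role`, `port`), the function `find_drift`, and the sample data are assumptions for illustration, and the registry is passed in as an already-parsed dict so no YAML library is needed.

```python
from typing import Any

Registry = dict[str, dict[str, Any]]


def find_drift(registry: Registry, table_rows: Registry) -> list[str]:
    """Return human-readable differences; the registry is treated as the truth."""
    problems: list[str] = []
    for service_id, spec in sorted(registry.items()):
        row = table_rows.get(service_id)
        if row is None:
            problems.append(f"{service_id}: missing from the page")
            continue
        for field, expected in sorted(spec.items()):
            if row.get(field) != expected:
                problems.append(
                    f"{service_id}.{field}: page says {row.get(field)!r}, "
                    f"YAML says {expected!r}"
                )
    # Rows that exist on the page but have no registry entry are also drift.
    for service_id in sorted(set(table_rows) - set(registry)):
        problems.append(f"{service_id}: on the page but not in the YAML")
    return problems


# Hypothetical data shaped like a parsed `services:` block.
registry = {
    "alphaswarm-core": {"role": "api", "port": 8000},
    "alphaswarm-cp": {"role": "control-plane", "port": 9000},
}
page = {
    "alphaswarm-core": {"role": "api", "port": 8080},  # drifted port
}
drift = find_drift(registry, page)
```

An empty list means the page and the registry agree on every compared field.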

Reading the catalogue

Every service has its own detail page under services/ with the same layout:

  • Identity — id, role, label, package or upstream image.
  • Wire — protocol, port, health endpoint, public URL (if any).
  • Deployment — which compose / kustomize / AQP CR / Terraform template stands it up.
  • Dependencies — upstream services it calls, downstream services that call it.
  • Operations — runbooks, scaling notes, redaction posture, feature flags.

Detail pages link back to the canonical concept doc that owns each contract — they do not duplicate prose.
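The shared layout lends itself to a mechanical check. As a rough sketch, assuming each section appears as a Markdown heading whose text is exactly the section name (the heading convention and the helper `missing_sections` are assumptions, not part of the documented tooling):

```python
REQUIRED_SECTIONS = ("Identity", "Wire", "Deployment", "Dependencies", "Operations")


def missing_sections(page_text: str) -> list[str]:
    """List required sections with no matching heading line, in layout order."""
    headings = {
        line.lstrip("#").strip()
        for line in page_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]


# Hypothetical detail page that forgot its Dependencies section.
sample = """# alphaswarm-core
## Identity
## Wire
## Deployment
## Operations
"""
result = missing_sections(sample)
```

Here `result` names only the section the sample page lacks, so the same helper could gate a docs build.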

How services compose

                    ┌─ alphaswarm-website ──────────┐  public marketing
                    │  (Cloudflare Pages, no auth)  │
                    └───────────────────────────────┘

                          ▼ NEXT_PUBLIC_ALPHASWARM_APP_URL
B2C / B2B users  ─▶ alphaswarm-ui ─────┐
Internal staff   ─▶ alphaswarm-admin ──┼──▶ alphaswarm-cp ──▶ /manage/* control plane
Local power user ─▶ alphaswarm-client ─┤                  ──▶ /auth/*   identity broker
Operators (CLI)  ─▶ alphaswarm-cli ────┤                  ──▶ /proxy/*  connection mesh (Phase 5)

                                                  ▼ HTTP
                                      alphaswarm-core (FastAPI)

                           ┌──────────────────────┼──────────────────────┐
                           ▼                      ▼                      ▼
                   alphaswarm-worker     alphaswarm-executor      alphaswarm-beat    alphaswarm-ml-mcp
                    (light queues)         (heavy compute)          (scheduler)      (DataMCP /mcp/ml)

Data plane: postgres ─ redis ─ neo4j ─ chromadb ─ minio ─ iceberg(Polaris)
Streaming: kafka(Strimzi) | redpanda ─ schema-registry ─ flink ─ redpanda-connect
ML / orch: mlflow ─ argo-workflows ─ argo-events ─ bentoml ─ kserve ─ dagster ─ ragflow
Observability: otel-collector ─ prometheus ─ grafana ─ jaeger ─ loki ─ vector ─ victoriametrics ─ phoenix
Mesh ID: spire (issuer) ─▶ linkerd (mTLS) ─▶ vault-secrets-operator ─▶ pomerium (IAP)
Edge: cloudflared (alpha-swarm.ai) | cloudflared-aqp-green | alphaswarm-edge | tenant-router
Sandbox: agent-sandbox/gvisor ─▶ agent-sandbox/pool
Operators: aqp-controller-operator (8 AQP* CRDs) ─ bots-operator (4 QuantBot CRDs)
External: alphaswarm-docs (Cloudflare Pages) ─ alphaswarm-docs-status (Instatus) ─ alphaswarm-docs-archive

Identity flows from spire through linkerd through vault-secrets-operator to every workload pod; secrets land via ExternalSecret resources, never in values.yaml. The pomerium IAP wraps the bare /manage/* ingress.

Application services

Services that run AlphaSwarm code. Each is built from a Dockerfile in this workspace and is owned by the package that supplies its image.

| Service id | Role | Package | Image (key) | Port | Health | Public URL | Deployed via |
| --- | --- | --- | --- | --- | --- | --- | --- |
| alphaswarm-core | api | alphaswarm | api | 8000 | /readyz | (private) | base/alphaswarm-core, AQPMonolith CR, compose api |
| alphaswarm-worker | worker | alphaswarm | worker | | (none) | | base/alphaswarm-worker, AQPMonolith CR, compose worker |
| alphaswarm-executor | executor | alphaswarm | executor | | (none) | | base/alphaswarm-executor, compose alphaswarm-executor/worker-gpu |
| alphaswarm-beat | scheduler | alphaswarm | beat | | (none) | | base/alphaswarm-worker, AQPMonolith CR, compose beat |
| alphaswarm-cp | control-plane | alphaswarm_controller | cp | 9000 | /manage/readyz | https://manage.alpha-swarm.ai | base/alphaswarm-cp, compose alphaswarm-cp |
| alphaswarm-client | frontend | alphaswarm_client | frontend | 80 | / | (private) | base/alphaswarm-client, AQPClient CR, compose client |
| alphaswarm-ui | frontend | alphaswarm_ui | ui | 80 | /api/healthz | https://app.alpha-swarm.ai | (Vercel/Pages) AQPUI CR |
| alphaswarm-admin | admin | alphaswarm_admin | admin | 8900 | /admin/healthz | https://admin.alpha-swarm.ai | AQPAdmin CR, compose alphaswarm-admin |
| alphaswarm-ide | ide | alphaswarm_ide | ide | 3000 | / | (per-user) | alphaswarm-ide kustomize, AQPIDE CR |
| alphaswarm-ml-mcp | mcp | alphaswarm_models | (piggybacks on api) | 8000 | /mcp/ml/tools | | base/alphaswarm-core (extra route) |

Data layer

Stateful services owned by the platform — the AlphaSwarm runtime is a client of every row below.

| Service id | Role | Image | Port | Storage | Deployed via |
| --- | --- | --- | --- | --- | --- |
| postgres | database | pgvector/pgvector:pg16 | 5432 | 5 Gi (StatefulSet) | base-services/postgres-shared |
| redis | cache | redis:7-alpine (master) / redis-stack:7.4 (local) | 6379 | 2 Gi | base/redis-master, base-services/redis-shared |
| neo4j | graph | neo4j:5-community | 7474, 7687 | 5 Gi | base-services (cell-local), compose neo4j |
| chromadb | vector | chromadb/chroma:1.0.16 | 8000 / 8001 | (ephemeral) | base-services/chromadb, compose chromadb |
| mlflow | mlops | ghcr.io/mlflow/mlflow:v2.11.1 | 5000 | object store | base-services/mlflow, compose mlflow |

Object storage and the Iceberg catalog (MinIO + Polaris) live under the streaming/lakehouse umbrella; they are documented under base-services/minio and base-services/polaris in deployment patterns by category.

Observability

Telemetry is routed by otel-collector-gateway: metrics land in VictoriaMetrics and Prometheus (run in parallel during cutover), logs in Loki, traces in Jaeger, and the AI / LLM slice in Phoenix.

| Service id | Role | Image | Port | Deployed via |
| --- | --- | --- | --- | --- |
| otel-collector | observability | otel/opentelemetry-collector | 4317 | observability/opentelemetry-collector-gateway |
| prometheus | metrics | prom/prometheus (kube-prometheus-stack) | 9090 | observability/kube-prometheus-stack |
| grafana | dashboards | grafana/grafana | 3000 | observability/kube-prometheus-stack |
| jaeger | tracing | jaegertracing/all-in-one | 6831 / 16686 | observability/jaeger |
| loki | logs | grafana/loki:3.3.2 | 3100 | observability/loki |
| vector | log shipper | timberio/vector:0.43.0 | | observability/vector |
| victoriametrics | metrics | victoriametrics/victoria-metrics:v1.108.0 | 8428 | observability/victoriametrics |

Phoenix + the OTel operator are documented inline on otel-collector since they are part of the same telemetry pipeline.

External services

Hosted off-cluster — included here because the topology references them and operators need to know who runs them.

| Service id | Role | Hosted on | Public URL | Deployed via |
| --- | --- | --- | --- | --- |
| alphaswarm-docs | docs | Cloudflare Pages | https://docs.alpha-swarm.ai | Terraform module cloudflare_pages_docs |
| alphaswarm-website | marketing | Cloudflare Pages | https://alpha-swarm.ai | Terraform module cloudflare_pages_docs (forthcoming) |
| alphaswarm-docs-status | status page | Instatus SaaS | https://status.alpha-swarm.ai | Terraform module instatus |
| alphaswarm-docs-archive | archive | Cloudflare Pages | https://archive.alpha-swarm.ai | Terraform module cloudflare_pages_docs |

Deployment patterns

Every service above is deployable through one or more of the surfaces below. The deployment-templates catalogue maps each named pattern to a hash-locked TerraformStackSpec.

| Pattern | What it stands up | Template slug | Source |
| --- | --- | --- | --- |
| Local dev | k3d cluster + base + minimal observability | local-dev | templates/local-dev.yaml |
| k3d + MLOps | local-dev + Argo Workflows + Dagster + MLflow | k3d-with-mlops | templates/k3d-with-mlops.yaml |
| AWS minimum | Single-account ECS + Cognito + ALB + Bedrock Haiku | aws-minimum | templates/aws-minimum.yaml |
| AWS shared cell | EKS + base + base-services + observability + edge for one shared standard cell | aws-cell-shared-std | templates/aws-cell-shared-std.yaml |
| AWS shared cell (premium) | shared-std + dedicated node group + reserved capacity | aws-cell-shared-premium | templates/aws-cell-shared-premium.yaml |
| AWS silo tenant | Single-tenant cell with hard isolation | aws-silo-tenant | templates/aws-silo-tenant.yaml |
| GCP cell | GKE + Workload Identity + base + base-services | gcp-full-cell | templates/gcp-full-cell.yaml |
| Azure cell | AKS + Workload Identity + Entra-bound base | azure-full-cell | templates/azure-full-cell.yaml |
| rpi cluster | k3s on ARM64 | rpi-cluster | templates/rpi-cluster.yaml |
| Edge only | Cloudflare tunnels + Access apps + cloudflared-aqp-green | edge-only | templates/edge-only.yaml |
| Observability only | OTel + Prometheus + Loki + Jaeger + Phoenix + VictoriaMetrics | observability-only | templates/observability-only.yaml |
| MLOps only | Argo Workflows + Argo Events + BentoML + KServe + Dagster | mlops-only | templates/mlops-only.yaml |

Templates are discovered by alphaswarm.terraform.templates and surfaced through:

  • GET /terraform/templates and POST /terraform/stacks/from-template/{slug} (REST).
  • alphaswarm-cli deploy templates {list,describe,apply} (CLI).
  • data.terraform.templates.list_templates and data.terraform.templates.instantiate_template (MCP, used by the agentic plane).
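For the REST surface, the two calls can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the base URL, the JSON request body, and the `cluster_name` input are hypothetical, since the request and response schemas are not described on this page. Only the two paths and their HTTP methods come from the list above.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: wherever the API is reachable


def list_templates_request(base_url: str = BASE_URL) -> Request:
    """Build (but do not send) the request that lists available templates."""
    return Request(f"{base_url}/terraform/templates", method="GET")


def instantiate_request(slug: str, inputs: dict, base_url: str = BASE_URL) -> Request:
    """Build the request that creates a stack from the template named by slug."""
    body = json.dumps(inputs).encode("utf-8")  # assumed JSON body shape
    return Request(
        f"{base_url}/terraform/stacks/from-template/{quote(slug, safe='')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical input name; real templates define their own typed inputs.
req = instantiate_request("local-dev", {"cluster_name": "demo"})
```

Passing `req` to `urllib.request.urlopen` would send it. Building the request separately keeps the URL construction easy to test without a running API.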

Every instantiation flows through TerraformRuntime, so each apply records a terraform_runs ledger row and a spec snapshot, per AGENTS rules 42 and 43.
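To make the ledger idea concrete, the sketch below pairs a spec snapshot with a content hash so that a later run can be compared against what was actually applied. It is an assumption-laden illustration: `RunRecord`, its fields, and the SHA-256 over canonical JSON are hypothetical choices, not the real terraform_runs schema or the hashing scheme TerraformRuntime uses.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def canonical_json(spec: dict) -> str:
    """Serialise with sorted keys so equal specs always produce equal text."""
    return json.dumps(spec, sort_keys=True, separators=(",", ":"))


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical stand-in for one terraform_runs ledger row."""

    template_slug: str
    spec_snapshot: str
    spec_sha256: str
    started_at: datetime


def record_run(template_slug: str, spec: dict) -> RunRecord:
    """Snapshot the spec and hash the snapshot before anything is applied."""
    snapshot = canonical_json(spec)
    digest = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()
    return RunRecord(template_slug, snapshot, digest, datetime.now(timezone.utc))


# Two specs with the same content but different key order.
run_a = record_run("local-dev", {"name": "demo", "modules": []})
run_b = record_run("local-dev", {"modules": [], "name": "demo"})
```

Sorting keys before hashing means `run_a` and `run_b` share one digest, which is the property a hash-locked spec needs: identical content always maps to an identical hash.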

Building blocks (Jinja2 codegen)

The codegen layer at alphaswarm/terraform/codegen/templates/ ships per-module-kind Jinja2 templates. The standard-template catalogue adds five composite building blocks so users can compose their own stacks against typed inputs:

| Building block | Renders | Used by |
| --- | --- | --- |
| cell.tf.j2 | One cell: namespaces + base workloads + per-cell ingress + RBAC | aws-cell-shared-std, aws-silo-tenant, gcp-full-cell, azure-full-cell |
| observability_stack.tf.j2 | Full OTel + Prom + Loki + Jaeger + Phoenix + VictoriaMetrics overlay | observability-only, every cell template |
| mesh_identity.tf.j2 | spire → linkerd → vault-secrets-operator → pomerium chain | every cell template |
| mlops_stack.tf.j2 | Argo Workflows + Events + BentoML + KServe + Dagster | mlops-only, k3d-with-mlops |
| edge_stack.tf.j2 | cloudflared + access apps + tenant-router | edge-only, every public-facing cell template |

These are referenced from TerraformStackSpec.modules[].source with the tpl:// scheme — see the IaC runbook for the operator workflow.
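As an illustration of how a `tpl://` source might map onto a building block, the sketch below resolves a source string to one of the five template files. The URI shape `tpl://<block-name>` and the function `resolve_tpl_source` are assumptions made for this example. The page only states that the `tpl://` scheme exists, so check the IaC runbook for the actual syntax.

```python
from urllib.parse import urlsplit

# File names taken from the building-block catalogue; the short keys are assumed.
BUILDING_BLOCKS = {
    "cell": "cell.tf.j2",
    "observability_stack": "observability_stack.tf.j2",
    "mesh_identity": "mesh_identity.tf.j2",
    "mlops_stack": "mlops_stack.tf.j2",
    "edge_stack": "edge_stack.tf.j2",
}


def resolve_tpl_source(source: str) -> str:
    """Map an assumed tpl://<block-name> source to its Jinja2 template file."""
    parts = urlsplit(source)
    if parts.scheme != "tpl":
        raise ValueError(f"not a tpl:// source: {source!r}")
    try:
        return BUILDING_BLOCKS[parts.netloc]
    except KeyError:
        raise ValueError(f"unknown building block: {parts.netloc!r}") from None


template_file = resolve_tpl_source("tpl://mesh_identity")
```

Rejecting unknown schemes and block names up front keeps a typo in `modules[].source` from surfacing later as a confusing render error.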

Maintenance

This page and the per-service files mirror the YAML at alphaswarm_platform/configs/deployment/topology.yaml. When you add a service:

  1. Append the service to topology.yaml under services:.
  2. Add a row to the matching table above (by category).
  3. Add concepts/infrastructure/services/<id>.md using the layout on every existing detail page (Identity / Wire / Deployment / Dependencies / Operations).
  4. Add 'concepts/infrastructure/services/<id>' to sidebars.ts under the Services category.
  5. If the service is reachable across cells, also append a row to URL_FALLBACK_FIELDS in alphaswarm/config/topology_fallback.py.
  6. Either invoke the alphaswarm-index-curator or drop a debt note per the always-on alphaswarm-index-reflect rule.
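Steps 1, 3, and 4 of the checklist above are mechanical enough to verify in code. The following is a minimal sketch with hypothetical helper names and in-memory inputs; it does not read topology.yaml or sidebars.ts itself, and it leaves steps 2, 5, and 6 to human review.

```python
def expected_artifacts(service_id: str) -> dict[str, str]:
    """Derive the doc path and sidebar entry that steps 3 and 4 call for."""
    doc_id = f"concepts/infrastructure/services/{service_id}"
    return {"detail_page": f"{doc_id}.md", "sidebar_entry": f"'{doc_id}'"}


def outstanding_steps(
    service_id: str,
    registry_ids: set[str],
    doc_paths: set[str],
    sidebar_source: str,
) -> list[str]:
    """Return the checklist steps that still look incomplete for one service."""
    wanted = expected_artifacts(service_id)
    todo: list[str] = []
    if service_id not in registry_ids:
        todo.append("1: append to topology.yaml under services:")
    if wanted["detail_page"] not in doc_paths:
        todo.append(f"3: add {wanted['detail_page']}")
    if wanted["sidebar_entry"] not in sidebar_source:
        todo.append(f"4: add {wanted['sidebar_entry']} to sidebars.ts")
    return todo


# Hypothetical inputs: the service is registered, but its docs are not written yet.
todo = outstanding_steps(
    "alphaswarm-example",
    registry_ids={"alphaswarm-core", "alphaswarm-example"},
    doc_paths={"concepts/infrastructure/services/alphaswarm-core.md"},
    sidebar_source="items: ['concepts/infrastructure/services/alphaswarm-core']",
)
```

An empty result means the registry entry, detail page, and sidebar entry all exist for that service id.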

See also