Service-level view
This page catalogues every service AlphaSwarm runs (the application
workloads, the control plane, the data layer, the observability stack,
and the external edge surface) at a single level of detail. It complements
control-plane-topology.md (how services are discovered) and
terraform-control-plane.md (how they are provisioned) by answering
what each service is.
The single source of truth for the service registry is
alphaswarm_platform/configs/deployment/topology.yaml.
This page is generated against that file plus each service's matching
package contract. When a row drifts, the truth is the YAML.
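As a rough illustration of the "when a row drifts, the truth is the YAML" rule, the sketch below compares rows shown on this page against registry entries. It assumes both sides have already been loaded into plain dictionaries; the field names (`role`, `port`, `health`) and the `find_drift` helper are hypothetical and may not match the real topology.yaml schema.

```python
from typing import Any

# Hypothetical shape: service id -> field -> value.
# The real topology.yaml schema may differ.
Registry = dict[str, dict[str, Any]]


def find_drift(registry: Registry, table_rows: Registry) -> list[str]:
    """Report every place where the page disagrees with the registry."""
    problems: list[str] = []
    for service_id, row in sorted(table_rows.items()):
        truth = registry.get(service_id)
        if truth is None:
            problems.append(f"{service_id}: not in registry")
            continue
        for field, shown in sorted(row.items()):
            if truth.get(field) != shown:
                problems.append(
                    f"{service_id}.{field}: page says {shown!r}, "
                    f"YAML says {truth.get(field)!r}"
                )
    # Services present in the registry but absent from the page.
    for service_id in sorted(registry.keys() - table_rows.keys()):
        problems.append(f"{service_id}: missing from page")
    return problems


registry = {
    "alphaswarm-core": {"role": "api", "port": 8000, "health": "/readyz"},
    "alphaswarm-cp": {"role": "control-plane", "port": 9000, "health": "/manage/readyz"},
}
page = {
    # Deliberately wrong health path to show a drift report.
    "alphaswarm-core": {"role": "api", "port": 8000, "health": "/healthz"},
}
for line in find_drift(registry, page):
    print(line)
```

In every reported mismatch the registry value wins, so the fix is always to regenerate or edit the page, not the YAML.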
Reading the catalogue
Every service has its own detail page under
services/ with the same layout:
- Identity — id, role, label, package or upstream image.
- Wire — protocol, port, health endpoint, public URL (if any).
- Deployment — which compose / kustomize / AQP CR / Terraform template stands it up.
- Dependencies — upstream services it calls, downstream services that call it.
- Operations — runbooks, scaling notes, redaction posture, feature flags.
Detail pages link back to the canonical concept doc that owns each contract — they do not duplicate prose.
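The shared layout lends itself to a simple lint. The sketch below is one possible check, assuming each detail page marks its sections with Markdown `#` headings named exactly as in the list above; both that assumption and the `missing_sections` helper are illustrative, not part of the documented tooling.

```python
REQUIRED_SECTIONS = ["Identity", "Wire", "Deployment", "Dependencies", "Operations"]


def missing_sections(page_text: str) -> list[str]:
    """Return the required section names that never appear as a heading."""
    headings = set()
    for line in page_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Drop the leading '#' markers and surrounding whitespace.
            headings.add(stripped.lstrip("#").strip())
    return [name for name in REQUIRED_SECTIONS if name not in headings]


sample_page = """\
# alphaswarm-core
## Identity
## Wire
## Deployment
"""
print(missing_sections(sample_page))
```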
How services compose
          ┌─ alphaswarm-website ──────────┐  public marketing
          │ (Cloudflare Pages, no auth)   │
          └───────────────────────────────┘
                          │
                          ▼ NEXT_PUBLIC_ALPHASWARM_APP_URL
B2C / B2B users  ─▶ alphaswarm-ui ────┐
Internal staff   ─▶ alphaswarm-admin  ┼──▶ alphaswarm-cp ──▶ /manage/*  control plane
Local power user ─▶ alphaswarm-client ┤                  ──▶ /auth/*    identity broker
Operators (CLI)  ─▶ alphaswarm-cli    ┤                  ──▶ /proxy/*   connection mesh (Phase 5)
                                      │
                                      ▼ HTTP
                          alphaswarm-core (FastAPI)
                                      │
               ┌──────────────────────┼──────────────────────┐
               ▼                      ▼                      ▼
       alphaswarm-worker     alphaswarm-executor      alphaswarm-beat   alphaswarm-ml-mcp
        (light queues)         (heavy compute)          (scheduler)     (DataMCP /mcp/ml)

Data plane:    postgres ─ redis ─ neo4j ─ chromadb ─ minio ─ iceberg(Polaris)
Streaming:     kafka(Strimzi) | redpanda ─ schema-registry ─ flink ─ redpanda-connect
ML / orch:     mlflow ─ argo-workflows ─ argo-events ─ bentoml ─ kserve ─ dagster ─ ragflow
Observability: otel-collector ─ prometheus ─ grafana ─ jaeger ─ loki ─ vector ─ victoriametrics ─ phoenix
Mesh ID:       spire (issuer) ─▶ linkerd (mTLS) ─▶ vault-secrets-operator ─▶ pomerium (IAP)
Edge:          cloudflared (alpha-swarm.ai) | cloudflared-aqp-green | alphaswarm-edge | tenant-router
Sandbox:       agent-sandbox/gvisor ─▶ agent-sandbox/pool
Operators:     aqp-controller-operator (8 AQP* CRDs) ─ bots-operator (4 QuantBot CRDs)
External:      alphaswarm-docs (Cloudflare Pages) ─ alphaswarm-docs-status (Instatus) ─ alphaswarm-docs-archive
Identity flows from spire through linkerd through
vault-secrets-operator to every workload pod; secrets land via
ExternalSecret resources, never in values.yaml. The
pomerium IAP wraps the bare /manage/* ingress.
Application services
Services that run AlphaSwarm code. Each is built from a Dockerfile in this workspace and is owned by the package that supplies its image.
| Service id | Role | Pkg | Image (key) | Port | Health | Public URL | Deployed via |
|---|---|---|---|---|---|---|---|
| alphaswarm-core | api | alphaswarm | api | 8000 | /readyz | — (private) | base/alphaswarm-core, AQPMonolith CR, compose api |
| alphaswarm-worker | worker | alphaswarm | worker | — | (none) | — | base/alphaswarm-worker, AQPMonolith CR, compose worker |
| alphaswarm-executor | executor | alphaswarm | executor | — | (none) | — | base/alphaswarm-executor, compose alphaswarm-executor/worker-gpu |
| alphaswarm-beat | scheduler | alphaswarm | beat | — | (none) | — | base/alphaswarm-worker, AQPMonolith CR, compose beat |
| alphaswarm-cp | control-plane | alphaswarm_controller | cp | 9000 | /manage/readyz | https://manage.alpha-swarm.ai | base/alphaswarm-cp, compose alphaswarm-cp |
| alphaswarm-client | frontend | alphaswarm_client | frontend | 80 | / | — (private) | base/alphaswarm-client, AQPClient CR, compose client |
| alphaswarm-ui | frontend | alphaswarm_ui | ui | 80 | /api/healthz | https://app.alpha-swarm.ai | (Vercel/Pages) AQPUI CR |
| alphaswarm-admin | admin | alphaswarm_admin | admin | 8900 | /admin/healthz | https://admin.alpha-swarm.ai | AQPAdmin CR, compose alphaswarm-admin |
| alphaswarm-ide | ide | alphaswarm_ide | ide | 3000 | / | (per-user) | alphaswarm-ide kustomize, AQPIDE CR |
| alphaswarm-ml-mcp | mcp | alphaswarm_models | (piggybacks on api) | 8000 | /mcp/ml/tools | — | base/alphaswarm-core (extra route) |
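The Port and Health columns are enough to derive a probe address for each service. The sketch below shows one way to do that, assuming plain HTTP and a caller-supplied host; the `ServiceRow` type and `probe_url` helper are hypothetical names for illustration, and the real probes may be configured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceRow:
    """One row of the table above, reduced to the fields a probe needs."""
    service_id: str
    port: Optional[int]      # None where the table shows no port
    health: Optional[str]    # None where the table shows "(none)"


def probe_url(row: ServiceRow, host: str) -> Optional[str]:
    """Build a health-probe URL, or None for services with no HTTP surface."""
    if row.port is None or row.health is None:
        return None
    return f"http://{host}:{row.port}{row.health}"


rows = [
    ServiceRow("alphaswarm-core", 8000, "/readyz"),
    ServiceRow("alphaswarm-worker", None, None),
    ServiceRow("alphaswarm-cp", 9000, "/manage/readyz"),
]
for row in rows:
    print(row.service_id, probe_url(row, "localhost"))
```

Queue consumers such as the worker, executor, and beat rows have no port or health endpoint, so the helper returns `None` for them instead of inventing an address.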
Data layer
Stateful services owned by the platform — the AlphaSwarm runtime is a client of every row below.
| Service id | Role | Image | Port | Storage | Deployed via |
|---|---|---|---|---|---|
| postgres | database | pgvector/pgvector:pg16 | 5432 | 5 Gi (StatefulSet) | base-services/postgres-shared |
| redis | cache | redis:7-alpine (master) / redis-stack:7.4 (local) | 6379 | 2 Gi | base/redis-master, base-services/redis-shared |
| neo4j | graph | neo4j:5-community | 7474, 7687 | 5 Gi | base-services (cell-local), compose neo4j |
| chromadb | vector | chromadb/chroma:1.0.16 | 8000 / 8001 | (ephemeral) | base-services/chromadb, compose chromadb |
| mlflow | mlops | ghcr.io/mlflow/mlflow:v2.11.1 | 5000 | object store | base-services/mlflow, compose mlflow |
Object storage and the Iceberg catalog (MinIO + Polaris) live
under the streaming/lakehouse umbrella; they are documented under
base-services/minio and base-services/polaris in
deployment patterns by category.
Observability
Telemetry is routed by otel-collector-gateway: metrics go to VictoriaMetrics
and Prometheus (run in parallel during cutover), logs to Loki, traces to
Jaeger, and the AI / LLM slice to Phoenix.
| Service id | Role | Image | Port | Deployed via |
|---|---|---|---|---|
| otel-collector | observability | otel/opentelemetry-collector | 4317 | observability/opentelemetry-collector-gateway |
| prometheus | metrics | prom/prometheus (kube-prometheus-stack) | 9090 | observability/kube-prometheus-stack |
| grafana | dashboards | grafana/grafana | 3000 | observability/kube-prometheus-stack |
| jaeger | tracing | jaegertracing/all-in-one | 6831 / 16686 | observability/jaeger |
| loki | logs | grafana/loki:3.3.2 | 3100 | observability/loki |
| vector | log shipper | timberio/vector:0.43.0 | — | observability/vector |
| victoriametrics | metrics | victoriametrics/victoria-metrics:v1.108.0 | 8428 | observability/victoriametrics |
Phoenix + the OTel operator are documented inline on
otel-collector since they are part of the
same telemetry pipeline.
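The routing described at the top of this section can be summarised as a lookup table. The sketch below is only a reading aid: the signal names (`metrics`, `logs`, `traces`, `llm`) and the `backends_for` helper are labels chosen for this example, not collector configuration.

```python
# Signal type -> backends, as described in this section.
# The signal names are labels chosen for this sketch.
ROUTES: dict[str, tuple[str, ...]] = {
    "metrics": ("victoriametrics", "prometheus"),  # both, in parallel during cutover
    "logs": ("loki",),
    "traces": ("jaeger",),
    "llm": ("phoenix",),
}


def backends_for(signal: str) -> tuple[str, ...]:
    """Return the backends that receive a given signal type."""
    try:
        return ROUTES[signal]
    except KeyError:
        raise ValueError(f"unknown signal type: {signal!r}") from None


print(backends_for("metrics"))
```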
External services
Hosted off-cluster — included here because the topology references them and operators need to know who runs them.
| Service id | Role | Hosted on | Public URL | Deployed via |
|---|---|---|---|---|
| alphaswarm-docs | docs | Cloudflare Pages | https://docs.alpha-swarm.ai | Terraform module cloudflare_pages_docs |
| alphaswarm-website | marketing | Cloudflare Pages | https://alpha-swarm.ai | Terraform module cloudflare_pages_docs (forthcoming) |
| alphaswarm-docs-status | status page | Instatus SaaS | https://status.alpha-swarm.ai | Terraform module instatus |
| alphaswarm-docs-archive | archive | Cloudflare Pages | https://archive.alpha-swarm.ai | Terraform module cloudflare_pages_docs |
Deployment patterns
Every service above is deployable through one or more of the surfaces
below. The
deployment-templates catalogue
maps each named pattern to a hash-locked
TerraformStackSpec.
| Pattern | What it stands up | Template slug | Source |
|---|---|---|---|
| Local dev | k3d cluster + base + minimal observability | local-dev | templates/local-dev.yaml |
| k3d + MLOps | local-dev + Argo Workflows + Dagster + MLflow | k3d-with-mlops | templates/k3d-with-mlops.yaml |
| AWS minimum | Single-account ECS + Cognito + ALB + Bedrock Haiku | aws-minimum | templates/aws-minimum.yaml |
| AWS shared cell | EKS + base + base-services + observability + edge for one shared standard cell | aws-cell-shared-std | templates/aws-cell-shared-std.yaml |
| AWS shared cell (premium) | shared-std + dedicated node group + reserved capacity | aws-cell-shared-premium | templates/aws-cell-shared-premium.yaml |
| AWS silo tenant | Single-tenant cell with hard isolation | aws-silo-tenant | templates/aws-silo-tenant.yaml |
| GCP cell | GKE + Workload Identity + base + base-services | gcp-full-cell | templates/gcp-full-cell.yaml |
| Azure cell | AKS + Workload Identity + Entra-bound base | azure-full-cell | templates/azure-full-cell.yaml |
| Raspberry Pi cluster | k3s on ARM64 | rpi-cluster | templates/rpi-cluster.yaml |
| Edge only | Cloudflare tunnels + Access apps + cloudflared-aqp-green | edge-only | templates/edge-only.yaml |
| Observability only | OTel + Prometheus + Loki + Jaeger + Phoenix + VictoriaMetrics | observability-only | templates/observability-only.yaml |
| MLOps only | Argo Workflows + Argo Events + BentoML + KServe + Dagster | mlops-only | templates/mlops-only.yaml |
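To make the idea of a hash-locked spec concrete, the sketch below hashes a canonical JSON rendering of a spec and compares it with a stored digest. This is an assumption about how such locking could work (SHA-256 over sorted-key JSON); the actual TerraformStackSpec hashing scheme is not described on this page, and the spec fields shown are invented for the example.

```python
import hashlib
import json


def spec_hash(spec: dict) -> str:
    """Hash a spec via a canonical JSON form so key order does not matter."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_locked(spec: dict, locked_hash: str) -> bool:
    """True only if the spec still matches the digest recorded for it."""
    return spec_hash(spec) == locked_hash


# Hypothetical spec fields, for illustration only.
spec = {"template": "local-dev", "inputs": {"cluster_name": "dev", "nodes": 1}}
locked = spec_hash(spec)

reordered = {"inputs": {"nodes": 1, "cluster_name": "dev"}, "template": "local-dev"}
tampered = {"template": "local-dev", "inputs": {"cluster_name": "dev", "nodes": 3}}

print(verify_locked(reordered, locked))  # same content, different key order
print(verify_locked(tampered, locked))   # content changed after locking
```

Canonicalising before hashing means a template is rejected only when its content changes, not when a serializer reorders keys.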
Templates are discovered by
alphaswarm.terraform.templates
and surfaced through:
- GET /terraform/templates and POST /terraform/stacks/from-template/{slug} (REST).
- alphaswarm-cli deploy templates {list,describe,apply} (CLI).
- data.terraform.templates.list_templates and data.terraform.templates.instantiate_template (MCP, used by the agentic plane).
Every instantiation flows through TerraformRuntime so the apply
lands a terraform_runs ledger row + spec snapshot per AGENTS rule
42 / 43.
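For the REST surface, a client only needs the two routes listed above. The sketch below builds (but does not send) the corresponding requests with the standard library. The base URL, the JSON request body, and the absence of authentication headers are all assumptions made for the example; the real API may expect a different payload.

```python
import json
import urllib.request
from urllib.parse import quote


def list_templates_request(base_url: str) -> urllib.request.Request:
    """Build the GET request that lists available templates."""
    url = f"{base_url.rstrip('/')}/terraform/templates"
    return urllib.request.Request(url, method="GET")


def instantiate_request(base_url: str, slug: str, inputs: dict) -> urllib.request.Request:
    """Build the POST request that instantiates a stack from a template slug.

    The JSON body shape is a placeholder; the real schema is not shown here.
    """
    url = f"{base_url.rstrip('/')}/terraform/stacks/from-template/{quote(slug, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(inputs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL and inputs.
req = instantiate_request("http://localhost:8000", "local-dev", {"cluster_name": "dev"})
print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call; it is left out here so the sketch runs without a live API.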
Building blocks (Jinja2 codegen)
The codegen layer at
alphaswarm/terraform/codegen/templates/
ships per-module-kind Jinja2 templates. The standard-template catalogue
adds five composite building blocks so users can compose their own
stacks against typed inputs:
| Building block | Renders | Used by |
|---|---|---|
| cell.tf.j2 | One cell: namespaces + base workloads + per-cell ingress + RBAC | aws-cell-shared-std, aws-silo-tenant, gcp-full-cell, azure-full-cell |
| observability_stack.tf.j2 | Full OTel + Prom + Loki + Jaeger + Phoenix + VictoriaMetrics overlay | observability-only, every cell template |
| mesh_identity.tf.j2 | spire → linkerd → vault-secrets-operator → pomerium chain | every cell template |
| mlops_stack.tf.j2 | Argo Workflows + Events + BentoML + KServe + Dagster | mlops-only, k3d-with-mlops |
| edge_stack.tf.j2 | cloudflared + access apps + tenant-router | edge-only, every public-facing cell template |
These are referenced from TerraformStackSpec.modules[].source with
the tpl:// scheme — see
the IaC runbook for the
operator workflow.
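As an illustration of how a tpl:// source might be resolved to one of the building blocks above, the sketch below parses the scheme and looks the name up in a table. The URI form `tpl://<block-name>` and the `resolve_tpl_source` helper are assumptions for this example; the IaC runbook remains the authority on the real syntax.

```python
from urllib.parse import urlsplit

# Building blocks from the table above, keyed by a short name.
# The short names are an assumption of this sketch.
BUILDING_BLOCKS = {
    "cell": "cell.tf.j2",
    "observability_stack": "observability_stack.tf.j2",
    "mesh_identity": "mesh_identity.tf.j2",
    "mlops_stack": "mlops_stack.tf.j2",
    "edge_stack": "edge_stack.tf.j2",
}


def resolve_tpl_source(source: str) -> str:
    """Map a hypothetical 'tpl://<block-name>' source to its template file."""
    parts = urlsplit(source)
    if parts.scheme != "tpl":
        raise ValueError(f"not a tpl:// source: {source!r}")
    name = parts.netloc or parts.path.lstrip("/")
    try:
        return BUILDING_BLOCKS[name]
    except KeyError:
        raise ValueError(f"unknown building block: {name!r}") from None


print(resolve_tpl_source("tpl://cell"))
```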
Maintenance
This page and the per-service files mirror the YAML at
alphaswarm_platform/configs/deployment/topology.yaml.
When you add a service:
- Append the service to topology.yaml under services:.
- Add a row to the matching table above (by category).
- Add concepts/infrastructure/services/<id>.md using the layout on every existing detail page (Identity / Wire / Deployment / Dependencies / Operations).
- Add 'concepts/infrastructure/services/<id>' to sidebars.ts under the Services category.
- If the service is reachable across cells, also append a row to URL_FALLBACK_FIELDS in alphaswarm/config/topology_fallback.py.
- Either invoke the alphaswarm-index-curator or drop a debt note per the always-on alphaswarm-index-reflect rule.
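Two of the steps above (the detail page and the sidebar entry) are easy to check mechanically. The sketch below is one way to do it, assuming the docs root contains concepts/infrastructure/services/ and that sidebars.ts lists each page as a quoted string; the `missing_artifacts` helper and the temporary directory layout are illustrative only.

```python
import tempfile
from pathlib import Path

SERVICES_DIR = Path("concepts/infrastructure/services")


def missing_artifacts(service_id: str, docs_root: Path, sidebars_text: str) -> list[str]:
    """List the checklist artifacts that do not exist yet for a service."""
    problems = []
    detail_page = SERVICES_DIR / f"{service_id}.md"
    if not (docs_root / detail_page).is_file():
        problems.append(f"missing detail page: {detail_page.as_posix()}")
    sidebar_entry = f"'{SERVICES_DIR.as_posix()}/{service_id}'"
    if sidebar_entry not in sidebars_text:
        problems.append(f"missing sidebars.ts entry: {sidebar_entry}")
    return problems


# Demo against a throwaway docs tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / SERVICES_DIR).mkdir(parents=True)
    (root / SERVICES_DIR / "alphaswarm-core.md").write_text("Identity\n")
    sidebars = "items: ['concepts/infrastructure/services/alphaswarm-core']"
    ok = missing_artifacts("alphaswarm-core", root, sidebars)
    bad = missing_artifacts("new-service", root, sidebars)

print(ok)
print(bad)
```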
See also
- control-plane-topology.md: discovery contract + URL_FALLBACK_FIELDS semantics.
- terraform-control-plane.md: TerraformRuntime lifecycle + spec hash-locking.
- iac-runbook.md: quick reference for plan / apply / destroy + shipping a standard template.
- how-to/operations/local-setup.md: bring the stack up locally.
- how-to/operations/kubernetes-deploy.md: end-to-end Kubernetes walkthrough.