Observability stack
Phases 2c and 2d of the AlphaSwarm infra-expansion plan stand up the AlphaSwarm-owned
observability plane in the alphaswarm-observability namespace. Everything
the cluster previously read from rpi_kubernetes/observability/ is
re-homed here.
Components
| Component | Folder | Replaces |
|---|---|---|
| kube-prometheus-stack | observability/kube-prometheus-stack/ | rpi observability/prometheus/ |
| OpenTelemetry Operator | observability/opentelemetry-operator/ | new |
| OTel Collector (gateway + agent) | observability/opentelemetry-collector-gateway/ | rpi observability/otel-collector/ |
| Phoenix | observability/phoenix/ | new |
Routing rule (gateway)
The transform/ai_route processor in collector-gateway.yaml inspects every
span and tags it with alphaswarm.ai_trace=true when any of the following holds:
- attributes["openinference.span.kind"] != nil
- attributes["llm.model_name"] != nil
- attributes["agent.name"] != nil
Two trace pipelines (traces/ai, traces/infra) split on that
attribute. Tail sampling keeps every error trace and 100% of AI
traces; all other traces are sampled at 1%.
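The routing and sampling behaviour described above can be sketched in plain Python. This is an illustration of the logic only, not the collector configuration itself: the function names are hypothetical, the tag is assumed to be a boolean, and the hash-based 1% decision is a stand-in for whatever probabilistic policy the gateway actually uses.

```python
import hashlib

# Attribute keys that mark a span as AI-related (taken from the routing rule).
AI_MARKER_KEYS = ("openinference.span.kind", "llm.model_name", "agent.name")


def tag_ai_trace(attributes: dict) -> dict:
    """Return a copy of the span attributes, tagged if any AI marker is set."""
    tagged = dict(attributes)
    if any(tagged.get(key) is not None for key in AI_MARKER_KEYS):
        # Assumption: the tag is stored as a boolean rather than the string "true".
        tagged["alphaswarm.ai_trace"] = True
    return tagged


def select_pipeline(attributes: dict) -> str:
    """Pick the trace pipeline based on the alphaswarm.ai_trace tag."""
    if attributes.get("alphaswarm.ai_trace") is True:
        return "traces/ai"
    return "traces/infra"


def keep_trace(trace_id: str, is_ai: bool, has_error: bool,
               infra_rate: float = 0.01) -> bool:
    """Keep all AI and error traces; keep roughly infra_rate of the rest."""
    if is_ai or has_error:
        return True
    # Hypothetical stand-in for probabilistic sampling: map the trace ID to a
    # stable value in [0, 1) so the same trace always gets the same decision.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < infra_rate
```

A span carrying only `llm.model_name` would be tagged and routed to traces/ai, while a plain HTTP span with no error would go to traces/infra and survive sampling about 1% of the time.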
DataMCP tools
| Tool | Surface |
|---|---|
| data.observability.prometheus.query | Instant PromQL. |
| data.observability.prometheus.query_range | Range PromQL. |
| data.observability.prometheus.list_alerts | Active alerts. |
| data.observability.grafana.list_dashboards | Dashboard catalog. |
| data.observability.grafana.export_dashboard | Dashboard JSON. |
| data.observability.phoenix.list_projects | Phoenix projects. |
| data.observability.phoenix.get_trace | LLM / agent trace. |
| data.observability.phoenix.annotate_span | Write evaluator verdict. |
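To make the difference between the instant and range PromQL tools concrete, the sketch below builds the corresponding Prometheus HTTP API requests (`/api/v1/query` and `/api/v1/query_range`). Whether the DataMCP tools wrap these endpoints directly is an assumption, and the helper names and base URL are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str,
                      time: Optional[float] = None) -> str:
    """Build an instant-query URL: one evaluation at a single point in time."""
    params = {"query": promql}
    if time is not None:
        params["time"] = time  # Unix timestamp; the server uses "now" if omitted
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def range_query_url(base_url: str, promql: str,
                    start: float, end: float, step: str) -> str:
    """Build a range-query URL: repeated evaluation from start to end."""
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Hypothetical in-cluster address, for illustration only.
BASE = "http://prometheus.example:9090"
print(instant_query_url(BASE, "up"))
print(range_query_url(BASE, "up", 0, 60, "15s"))
```

The instant form returns one sample per series, while the range form returns a series of samples spaced by `step`, which is what a dashboard panel would typically need.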
Frontend
- /admin/topology: Phase 0 topology overview.
- /admin/observability/{prometheus,grafana,phoenix,otel} (Phase 6 follow-up): domain-scoped admin pages.