
Observability stack

Phases 2c and 2d of the AlphaSwarm infra-expansion plan stand up the AlphaSwarm-owned observability plane in the alphaswarm-observability namespace. Everything the cluster previously read from rpi_kubernetes/observability/ is now re-homed here.

Components

| Component | Folder | Replaces |
|---|---|---|
| kube-prometheus-stack | observability/kube-prometheus-stack/ | rpi observability/prometheus/ |
| OpenTelemetry Operator | observability/opentelemetry-operator/ | new |
| OTel Collector (gateway + agent) | observability/opentelemetry-collector-gateway/ | rpi observability/otel-collector/ |
| Phoenix | observability/phoenix/ | new |

Routing rule (gateway)

The transform/ai_route processor in collector-gateway.yaml inspects every span and tags it with alphaswarm.ai_trace=true when:

  • attributes["openinference.span.kind"] != nil, or
  • attributes["llm.model_name"] != nil, or
  • attributes["agent.name"] != nil.
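The tagging condition above can be mirrored in a few lines of Python for testing or documentation purposes. This is a minimal sketch, not the collector's implementation: the real rule is an OTTL statement in collector-gateway.yaml, and the function name tag_ai_span is hypothetical. It treats a key whose value is None as absent, matching the intent of OTTL's `!= nil` check.

```python
# Hypothetical Python mirror of the transform/ai_route condition.
# The authoritative logic is the OTTL statement in collector-gateway.yaml.
from typing import Any, Dict

# Attribute keys that mark a span as AI-related (taken from the routing rule).
AI_MARKER_KEYS = ("openinference.span.kind", "llm.model_name", "agent.name")


def tag_ai_span(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Set alphaswarm.ai_trace=True if any marker attribute is present.

    A key mapped to None is treated as missing, approximating the
    `attributes[key] != nil` checks in the OTTL condition.
    """
    if any(attributes.get(key) is not None for key in AI_MARKER_KEYS):
        attributes["alphaswarm.ai_trace"] = True
    return attributes
```

For example, a span carrying llm.model_name gains alphaswarm.ai_trace=True, while a plain HTTP span is left untouched.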

Two trace pipelines, traces/ai and traces/infra, split on that attribute. Tail sampling keeps every error trace and 100% of AI traces; all other traces are sampled at 1%.
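The pipeline split and the sampling policy can be sketched as two small decision functions. This is an illustration under stated assumptions, not the collector configuration: a trace is modeled as a list of span dicts with hypothetical "status" and "attributes" keys, and the 1% sample uses a SHA-256 hash of the trace ID, whereas the collector's tail sampling processor applies its own hashing. Hashing the trace ID (rather than calling a random generator) keeps the decision deterministic per trace.

```python
# Hypothetical sketch of the traces/ai vs traces/infra split and the
# tail-sampling decision. Span shape ("status", "attributes") is assumed.
import hashlib
from typing import Any, Iterable, Mapping

INFRA_SAMPLE_RATE = 0.01  # keep 1% of traces that are neither errors nor AI


def pipeline_for(span_attributes: Mapping[str, Any]) -> str:
    """Route a span by the alphaswarm.ai_trace tag set in the gateway."""
    if span_attributes.get("alphaswarm.ai_trace") is True:
        return "traces/ai"
    return "traces/infra"


def _trace_fraction(trace_id: str) -> float:
    """Map a trace ID deterministically into [0, 1)."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def keep_trace(trace_id: str, spans: Iterable[Mapping[str, Any]]) -> bool:
    """Return True if the whole trace should be retained."""
    span_list = list(spans)
    if any(span.get("status") == "ERROR" for span in span_list):
        return True  # error traces are always kept
    if any(span.get("attributes", {}).get("alphaswarm.ai_trace") is True
           for span in span_list):
        return True  # 100% of AI traces are kept
    return _trace_fraction(trace_id) < INFRA_SAMPLE_RATE
```

Because the decision is made once per trace, all spans of a sampled infra trace are kept or dropped together.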

DataMCP tools

| Tool | Surface |
|---|---|
| data.observability.prometheus.query | Instant PromQL. |
| data.observability.prometheus.query_range | Range PromQL. |
| data.observability.prometheus.list_alerts | Active alerts. |
| data.observability.grafana.list_dashboards | Dashboard catalog. |
| data.observability.grafana.export_dashboard | Dashboard JSON. |
| data.observability.phoenix.list_projects | Phoenix projects. |
| data.observability.phoenix.get_trace | LLM / agent trace. |
| data.observability.phoenix.annotate_span | Write evaluator verdict. |
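One plausible way the Prometheus and Grafana tools could sit on top of each backend is as thin wrappers over their public HTTP APIs. The sketch below only builds request URLs (no network calls); the base URLs are placeholders, authentication headers for Grafana are omitted, and the function names are hypothetical rather than the DataMCP implementation. Phoenix tools are left out because their backing endpoints are not described here.

```python
# Hypothetical mapping of DataMCP tools onto Prometheus and Grafana HTTP APIs.
# Base URLs are placeholders; the real in-cluster addresses may differ.
from typing import Dict, Optional
from urllib.parse import quote, urlencode

PROMETHEUS_URL = "http://prometheus.example:9090"  # assumed address
GRAFANA_URL = "http://grafana.example:3000"        # assumed address


def prometheus_query(promql: str, time: Optional[str] = None) -> str:
    """data.observability.prometheus.query -> GET /api/v1/query"""
    params: Dict[str, str] = {"query": promql}
    if time is not None:
        params["time"] = time
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode(params)}"


def prometheus_query_range(promql: str, start: str, end: str, step: str) -> str:
    """data.observability.prometheus.query_range -> GET /api/v1/query_range"""
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{PROMETHEUS_URL}/api/v1/query_range?{urlencode(params)}"


def prometheus_list_alerts() -> str:
    """data.observability.prometheus.list_alerts -> GET /api/v1/alerts"""
    return f"{PROMETHEUS_URL}/api/v1/alerts"


def grafana_list_dashboards() -> str:
    """data.observability.grafana.list_dashboards -> GET /api/search?type=dash-db"""
    return f"{GRAFANA_URL}/api/search?{urlencode({'type': 'dash-db'})}"


def grafana_export_dashboard(uid: str) -> str:
    """data.observability.grafana.export_dashboard -> GET /api/dashboards/uid/<uid>"""
    return f"{GRAFANA_URL}/api/dashboards/uid/{quote(uid, safe='')}"
```

PromQL expressions are URL-encoded by urlencode, so selectors such as rate(x[5m]) survive the round trip intact.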

Frontend

  • /admin/topology — Phase 0 topology overview.
  • (Phase 6 follow-up) /admin/observability/{prometheus,grafana,phoenix,otel} — domain-scoped admin pages.