otel-collector

The single OTLP ingress for the cluster. Every workload pod sends traces, metrics, and logs to this gateway; the gateway fans out by signal type to the appropriate backend.

Identity

Field | Value
Service id | otel-collector
Role | observability
Image | otel/opentelemetry-collector (gateway flavour), pinned in observability/opentelemetry-collector-gateway/
Port | 4317 (OTLP gRPC), 4318 (OTLP HTTP)
Health | :13133/ (health_check extension)
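
As a rough illustration of how the health endpoint above could be probed, here is a minimal sketch using only the Python standard library. The helper names `health_url` and `is_healthy` are hypothetical, and the sketch assumes the health_check extension answers HTTP 200 on `/` when the collector is up; check the deployed extension config before relying on that.

```python
import urllib.error
import urllib.request

HEALTH_PORT = 13133  # health_check port from the Identity table


def health_url(host: str, port: int = HEALTH_PORT) -> str:
    """Build the health-check URL for a collector host (hypothetical helper)."""
    return f"http://{host}:{port}/"


def is_healthy(host: str, port: int = HEALTH_PORT, timeout: float = 2.0) -> bool:
    """Return True only if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

Calling `is_healthy("otel-collector")` from inside the cluster would exercise the probe; the hostname is an assumption based on the service id.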

Deployment surfaces

Surface | Where
Compose | service otel-collector in alphaswarm_platform/compose/docker-compose.yml
Kustomize | observability/opentelemetry-collector-gateway/ (gateway Deployment + DaemonSet agent; canonical)
Operator | observability/opentelemetry-operator/ (auto-instrumentation CRDs)
Legacy | observability/otel-collector/ (rollback only; NOT wired to overlays)

Routing

Signal | Destination
traces.infrastructure | Jaeger (in-cell) / Tempo (cloud cells)
traces.ai (OpenInference spans) | phoenix
metrics | VictoriaMetrics + Prometheus (parallel during cutover)
logs | Loki (via Vector)

The trace split is performed by the OTel routing connector: spans tagged with service.namespace=alphaswarm.ai route to Phoenix; everything else goes to the infra trace pipeline.
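
The routing rule can be modelled in a few lines of Python. This is only a sketch of the decision described above, not the collector's actual connector configuration: the function names and the `cloud_cell` flag are hypothetical, and attributes are treated as a plain string mapping.

```python
from typing import Mapping

AI_NAMESPACE = "alphaswarm.ai"  # namespace value quoted in the text

# Signal names taken from the Routing table.
AI_PIPELINE = "traces.ai"
INFRA_PIPELINE = "traces.infrastructure"


def route_span(attrs: Mapping[str, str]) -> str:
    """Pick the trace pipeline for a span based on its service.namespace."""
    if attrs.get("service.namespace") == AI_NAMESPACE:
        return AI_PIPELINE
    # Missing or different namespace: fall through to the infra pipeline.
    return INFRA_PIPELINE


def trace_backend(attrs: Mapping[str, str], cloud_cell: bool = False) -> str:
    """Map a span to its destination; cloud_cell is a hypothetical flag."""
    if route_span(attrs) == AI_PIPELINE:
        return "Phoenix"
    return "Tempo" if cloud_cell else "Jaeger"
```

Note that a span with no `service.namespace` at all lands on the infra pipeline, which matches the "everything else" wording.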

Dependencies

Upstream: every alphaswarm workload pod (auto-instrumentation through the OTel operator + manual SDK init in alphaswarm/observability/).

Downstream: Jaeger (Tempo in cloud cells), Phoenix, Prometheus, VictoriaMetrics, Loki (via Vector).
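
On the upstream side, one way a workload could be pointed at the gateway is through the standard OpenTelemetry SDK environment variables. The sketch below builds such an environment; the helper names are hypothetical, the gateway hostname is an assumption, and setting the tags listed under Operations via `OTEL_RESOURCE_ATTRIBUTES` is a guess at how they might be attached rather than a description of the actual init code in alphaswarm/observability/.

```python
from typing import Dict, Optional
from urllib.parse import quote

OTLP_GRPC_PORT = 4317  # OTLP gRPC port from the Identity table


def resource_attributes(tenant_id: str, cell_id: str, service_id: str,
                        experiment_id: Optional[str] = None,
                        test_id: Optional[str] = None) -> str:
    """Encode resource tags as a key=value,key=value string."""
    pairs = {"tenant_id": tenant_id, "cell_id": cell_id, "service.id": service_id}
    # Optional tags are only emitted when set, as the text describes.
    if experiment_id:
        pairs["experiment_id"] = experiment_id
    if test_id:
        pairs["test_id"] = test_id
    # Percent-encode values so ',' and '=' cannot break the format.
    return ",".join(f"{k}={quote(v, safe='')}" for k, v in pairs.items())


def otel_env(gateway_host: str, **tags: str) -> Dict[str, str]:
    """Build SDK environment variables targeting the gateway (hypothetical)."""
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"http://{gateway_host}:{OTLP_GRPC_PORT}",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
        "OTEL_RESOURCE_ATTRIBUTES": resource_attributes(**tags),
    }
```

For example, `otel_env("otel-collector", tenant_id="t1", cell_id="c1", service_id="svc")` yields an endpoint of `http://otel-collector:4317`.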

Operations

  • Sampling: tail-based for traces — keep 100% of error spans, 5% of healthy traffic. Tuned per cell.
  • Resource tagging: every span carries tenant_id, cell_id, service.id (matching topology), and experiment_id / test_id when set.
  • Auto-instrumentation: Python via opentelemetry-distro; Node via the OTel operator's auto-injected sidecar; Go services use manual SDK.
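
The sampling policy in the first bullet can be sketched as a per-trace decision. This is a simplified model, not the collector's tail-sampling implementation: it assumes the decision is made once per trace after all spans are seen, that a trace counts as an error if any span errored, and it uses a hash of the trace id (a choice made here for determinism) to pick the 5% of healthy traces. The name `keep_trace` is hypothetical.

```python
import hashlib

HEALTHY_KEEP_RATIO = 0.05  # 5% of healthy traffic, per the Sampling bullet


def keep_trace(trace_id: str, has_error: bool,
               healthy_ratio: float = HEALTHY_KEEP_RATIO) -> bool:
    """Decide whether a completed trace is kept."""
    if has_error:
        # Error traces are always kept.
        return True
    # Hash the trace id into [0, 1) so the same trace always gets
    # the same decision, whichever gateway replica evaluates it.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < healthy_ratio
```

Because the ratio is a parameter, per-cell tuning amounts to passing a different `healthy_ratio` for each cell.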

See also