
Observability

Doc map: alphaswarm_docs/index.md · Progress bus reference: alphaswarm_docs/flows.md#cross-cutting-progress-bus.

AlphaSwarm ships with opt-in OpenTelemetry tracing covering the full request path: FastAPI → Celery → paper session → broker SDK → Postgres → Redis. Install the otel extra to enable it:

pip install -e ".[otel]"

Quick start (Docker)

docker compose up -d starts an OpenTelemetry Collector and Jaeger sidecar alongside the AlphaSwarm services. Each service is pre-wired with ALPHASWARM_OTEL_ENDPOINT=http://otel-collector:4317.

Open http://localhost:16686 and pick a service:

  • alphaswarm-api — FastAPI request handlers + Dash mount
  • alphaswarm-worker — Celery tasks (backtest, paper, ingestion)
  • alphaswarm-paper-trader — paper session loop

Configuration

All knobs live in alphaswarm.config.Settings / .env:

| Variable | Default | Purpose |
| --- | --- | --- |
| ALPHASWARM_OTEL_ENDPOINT | empty | OTLP endpoint. Empty → tracing disabled (safe dev default). |
| ALPHASWARM_OTEL_SERVICE_NAME | alphaswarm | Base service name. Suffixes -api, -worker, -paper added automatically. |
| ALPHASWARM_OTEL_SAMPLE_RATIO | 1.0 | Parent-based head sampler ratio. 0.1 = 10% of traces. |
| ALPHASWARM_OTEL_PROTOCOL | grpc | grpc (port 4317) or http/protobuf (port 4318). |
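To make the sample ratio concrete, here is a minimal sketch of a parent-based, ratio-driven head sampling decision in plain Python. It only illustrates the idea and is neither AlphaSwarm's code nor the OpenTelemetry SDK's sampler; `should_sample` and its parameters are hypothetical names.

```python
from typing import Optional

TRACE_ID_LIMIT = 1 << 64  # only the low 64 bits of the 128-bit trace ID are compared


def should_sample(trace_id: int, ratio: float, parent_sampled: Optional[bool] = None) -> bool:
    """Parent-based head sampling: honor the parent's decision, else apply the ratio."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    if parent_sampled is not None:
        # Child spans follow the parent so a trace is never half-recorded.
        return parent_sampled
    # Root span: a deterministic decision derived from the trace ID itself.
    return (trace_id % TRACE_ID_LIMIT) < ratio * TRACE_ID_LIMIT
```

With a ratio of 0.1, roughly one in ten root traces is kept, and every downstream service that receives the propagated context inherits that decision instead of sampling again.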

Instrumentation map

Auto-instrumented on startup (see alphaswarm/observability/tracing.py):

  • FastAPIInstrumentor — every route becomes a span
  • CeleryInstrumentor — every task becomes a span
  • SQLAlchemyInstrumentor — every query becomes a span (attached in alphaswarm/persistence/db.py when ALPHASWARM_OTEL_ENDPOINT is set)
  • HTTPXClientInstrumentor — every HTTPX call (broker REST, UI API client)
  • RedisInstrumentor — every Redis command (pub/sub, kill-switch, Celery broker)
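The wiring listed above might look roughly like the following sketch. It assumes the standard `opentelemetry-instrumentation-*` packages are installed; `instrument_all` is a hypothetical helper that condenses everything into one function for illustration, whereas the actual project splits this across `tracing.py` and `db.py`.

```python
import os
from typing import Any, Optional


def instrument_all(app: Any = None, engine: Any = None, endpoint: Optional[str] = None) -> bool:
    """Attach the auto-instrumentors; return False when tracing is disabled."""
    if endpoint is None:
        endpoint = os.environ.get("ALPHASWARM_OTEL_ENDPOINT", "")
    if not endpoint:
        return False  # empty endpoint: leave everything uninstrumented

    # Imported lazily so this module still loads without the otel extra.
    from opentelemetry.instrumentation.celery import CeleryInstrumentor
    from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
    from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
    from opentelemetry.instrumentation.redis import RedisInstrumentor
    from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor

    if app is not None:
        FastAPIInstrumentor.instrument_app(app)  # one span per route
    if engine is not None:
        SQLAlchemyInstrumentor().instrument(engine=engine)  # one span per query
    CeleryInstrumentor().instrument()  # one span per task
    HTTPXClientInstrumentor().instrument()  # one span per outgoing HTTPX call
    RedisInstrumentor().instrument()  # one span per Redis command
    return True
```

Gating on the endpoint before importing anything keeps the disabled path free of OpenTelemetry imports, which matches the "empty endpoint means tracing disabled" default.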

Manual spans are added via the @traced decorator (alphaswarm/observability/decorators.py):

from alphaswarm.observability import traced

@traced("paper.session.run")
async def run(self) -> PaperSessionResult:
    ...

The decorator works transparently on sync and async callables. When the otel extra isn't installed, the tracer is a no-op, so the decorator adds only negligible overhead.
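For readers curious how one decorator can cover both sync and async callables with a no-op fallback, here is a minimal sketch. It is not the code in `alphaswarm/observability/decorators.py`; `traced_sketch`, `_span`, and the tracer name are hypothetical, and the fallback simply uses `contextlib.nullcontext` when OpenTelemetry cannot be imported.

```python
import asyncio
import functools
import inspect
from contextlib import nullcontext

try:
    from opentelemetry import trace
    _tracer = trace.get_tracer("alphaswarm.sketch")  # hypothetical tracer name
except ImportError:  # otel extra not installed
    _tracer = None


def _span(name: str):
    # Real span when OpenTelemetry is importable, otherwise a do-nothing context.
    return _tracer.start_as_current_span(name) if _tracer else nullcontext()


def traced_sketch(name: str):
    def decorator(func):
        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                with _span(name):  # span stays open across the await
                    return await func(*args, **kwargs)
            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            with _span(name):
                return func(*args, **kwargs)
        return sync_wrapper
    return decorator


@traced_sketch("demo.add")
def add(a: int, b: int) -> int:
    return a + b


@traced_sketch("demo.async_add")
async def async_add(a: int, b: int) -> int:
    return a + b


if __name__ == "__main__":
    print(add(1, 2), asyncio.run(async_add(2, 3)))
```

Choosing the wrapper at decoration time rather than per call means the coroutine check runs once, and the async wrapper remains a coroutine function so frameworks that inspect it still treat it as awaitable.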

Custom exporters

The default is OTLP/gRPC. To use OTLP/HTTP instead:

ALPHASWARM_OTEL_PROTOCOL=http/protobuf
ALPHASWARM_OTEL_ENDPOINT=http://otel-collector:4318/v1/traces
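A mismatched protocol and port is an easy mistake when switching exporters. The following sketch checks an endpoint against the conventions used in this document (4317 for gRPC, 4318 plus a `/v1/traces` path for HTTP). `check_endpoint` and `DEFAULT_PORTS` are hypothetical names, not part of AlphaSwarm.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"grpc": 4317, "http/protobuf": 4318}


def check_endpoint(endpoint: str, protocol: str = "grpc") -> list:
    """Return human-readable warnings for a suspicious endpoint/protocol pair."""
    if protocol not in DEFAULT_PORTS:
        return [f"unknown protocol {protocol!r}"]
    if not endpoint:
        return []  # empty endpoint means tracing is disabled, nothing to check
    parts = urlsplit(endpoint)
    warnings = []
    expected = DEFAULT_PORTS[protocol]
    if parts.port is not None and parts.port != expected:
        warnings.append(f"port {parts.port} is unusual for {protocol} (expected {expected})")
    if protocol == "http/protobuf" and not parts.path.endswith("/v1/traces"):
        warnings.append("OTLP/HTTP endpoints in this guide end in /v1/traces")
    return warnings
```

For example, `check_endpoint("http://otel-collector:4317", "http/protobuf")` returns two warnings (wrong port, missing path), while the HTTP configuration shown above returns an empty list.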

For local development with only the Jaeger UI (no Collector), install the OTel SDK and point the endpoint at a local Jaeger all-in-one container:

docker run --rm -p 4317:4317 -p 16686:16686 jaegertracing/all-in-one:1.55
export ALPHASWARM_OTEL_ENDPOINT=http://localhost:4317

Kubernetes

Both the API/Worker image and the paper image have the OTel SDK installed. The Kustomize manifests set ALPHASWARM_OTEL_ENDPOINT to the in-cluster collector service; port-forward Jaeger with:

kubectl -n alphaswarm-dev port-forward svc/jaeger 16686:16686

Troubleshooting

Spans never show up in Jaeger.

  • Verify ALPHASWARM_OTEL_ENDPOINT is set in the container: docker compose exec api env | grep OTEL.
  • Check the collector logs for parsing errors: docker compose logs otel-collector.
  • Raise ALPHASWARM_OTEL_SAMPLE_RATIO to 1.0 while debugging so that every trace is sampled.
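If the variable is set and the collector logs are quiet, a quick reachability check from inside the container can rule out networking. This is a small sketch using only the standard library; `endpoint_reachable` is a hypothetical helper, and a successful TCP connect only shows the port is open, not that OTLP is being accepted.

```python
import socket
from urllib.parse import urlsplit


def endpoint_reachable(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parts = urlsplit(endpoint)
    if not parts.hostname or parts.port is None:
        raise ValueError(f"endpoint needs an explicit scheme, host and port: {endpoint!r}")
    try:
        with socket.create_connection((parts.hostname, parts.port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or DNS failure
        return False
```

Calling `endpoint_reachable("http://otel-collector:4317")` from the api container separates "the collector is down or unreachable" from "spans are being dropped after arrival".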

ImportError: opentelemetry-exporter-otlp-proto-grpc at startup.

  • You set ALPHASWARM_OTEL_ENDPOINT but didn't install the otel extra. The tracer logs a warning and continues as a no-op, but to silence it run pip install -e ".[otel]".
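To confirm which pieces are missing before reinstalling, a short import probe can help. This is a sketch only: `missing_modules` is a hypothetical helper, and `OTEL_MODULES` is an assumed list of modules the otel extra provides, to be adjusted to the real extra.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            missing.append(name)
    return missing


# Assumption: modules the otel extra is expected to provide.
OTEL_MODULES = [
    "opentelemetry.sdk.trace",
    "opentelemetry.exporter.otlp.proto.grpc.trace_exporter",
]

if __name__ == "__main__":
    print(missing_modules(OTEL_MODULES))
```

An empty list means the imports resolve and the warning has another cause, such as a different virtual environment being used at runtime.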

Tests emit real spans.

  • They shouldn't — tests/conftest.py installs an autouse fixture that resets ALPHASWARM_OTEL_ENDPOINT="" before each test. If you see real spans, check that the fixture is still in place.
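For reference, an autouse fixture of this kind could look like the sketch below. It is an assumption about the shape of the fixture, not a copy of `tests/conftest.py`; `clear_otel_endpoint` and `_no_real_spans` are hypothetical names.

```python
import pytest


def clear_otel_endpoint(monkeypatch: pytest.MonkeyPatch) -> None:
    """Blank the endpoint so the tracer stays a no-op for the current test."""
    monkeypatch.setenv("ALPHASWARM_OTEL_ENDPOINT", "")


@pytest.fixture(autouse=True)
def _no_real_spans(monkeypatch):
    # autouse=True applies this to every test without it being requested.
    clear_otel_endpoint(monkeypatch)
    yield
    # monkeypatch restores the previous value automatically on teardown.
```

If settings are cached at import time, blanking the environment variable alone may not be enough, in which case the cached settings object would also need resetting.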

Metrics (optional)

The OTel Collector config in alphaswarm_platform/deploy/otel/otel-collector-config.yaml also exports metrics on port 8889 via the Prometheus exporter, so you can point a Prometheus scraper at the collector to build service-level dashboards. The AlphaSwarm code doesn't emit custom metrics yet; PRs are welcome.

Tracing topology