Observability
Doc map: alphaswarm_docs/index.md · Progress bus reference: alphaswarm_docs/flows.md#cross-cutting-progress-bus.
AlphaSwarm ships with opt-in OpenTelemetry tracing covering the full request
path: FastAPI → Celery → paper session → broker SDK → Postgres →
Redis. Install the otel extra to enable it:
pip install -e ".[otel]"
Quick start (Docker)
docker compose up -d starts an OpenTelemetry Collector and Jaeger
sidecar alongside the AlphaSwarm services. Each service is pre-wired with
ALPHASWARM_OTEL_ENDPOINT=http://otel-collector:4317.
Open http://localhost:16686 and pick a service:
- alphaswarm-api: FastAPI request handlers + Dash mount
- alphaswarm-worker: Celery tasks (backtest, paper, ingestion)
- alphaswarm-paper-trader: paper session loop
Configuration
All knobs live in alphaswarm.config.Settings / .env:
| Variable | Default | Purpose |
|---|---|---|
| ALPHASWARM_OTEL_ENDPOINT | empty | OTLP endpoint. Empty → tracing disabled (safe dev default). |
| ALPHASWARM_OTEL_SERVICE_NAME | alphaswarm | Base service name. Suffixes -api, -worker, -paper are added automatically. |
| ALPHASWARM_OTEL_SAMPLE_RATIO | 1.0 | Parent-based head sampler ratio. 0.1 = 10% of traces. |
| ALPHASWARM_OTEL_PROTOCOL | grpc | grpc (port 4317) or http/protobuf (port 4318). |
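As a rough illustration of how these four variables fit together, the sketch below reads them from an environment mapping using the defaults from the table. `OtelSettingsSketch` and `load_otel_settings` are hypothetical names invented for this example; the real fields live in alphaswarm.config.Settings and may be parsed differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OtelSettingsSketch:
    """Hypothetical stand-in for the OTel-related fields of Settings."""

    endpoint: str = ""
    service_name: str = "alphaswarm"
    sample_ratio: float = 1.0
    protocol: str = "grpc"

    @property
    def enabled(self) -> bool:
        # An empty endpoint means tracing stays disabled.
        return bool(self.endpoint)

    def service_name_for(self, role: str) -> str:
        # A role suffix such as "api" or "worker" is appended to the base name.
        return f"{self.service_name}-{role}"


def load_otel_settings(env=None) -> OtelSettingsSketch:
    """Build settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    ratio = float(env.get("ALPHASWARM_OTEL_SAMPLE_RATIO", "1.0"))
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"sample ratio must be within [0.0, 1.0], got {ratio}")

    protocol = env.get("ALPHASWARM_OTEL_PROTOCOL", "grpc")
    if protocol not in ("grpc", "http/protobuf"):
        raise ValueError(f"unsupported protocol: {protocol!r}")

    return OtelSettingsSketch(
        endpoint=env.get("ALPHASWARM_OTEL_ENDPOINT", "").strip(),
        service_name=env.get("ALPHASWARM_OTEL_SERVICE_NAME", "alphaswarm"),
        sample_ratio=ratio,
        protocol=protocol,
    )
```

Passing a plain dict instead of relying on os.environ keeps the sketch easy to exercise in isolation, for example `load_otel_settings({"ALPHASWARM_OTEL_ENDPOINT": "http://otel-collector:4317"})`.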
Instrumentation map
Auto-instrumented on startup (see alphaswarm/observability/tracing.py):
- FastAPIInstrumentor: every route becomes a span
- CeleryInstrumentor: every task becomes a span
- SQLAlchemyInstrumentor: every query becomes a span (attached in alphaswarm/persistence/db.py when ALPHASWARM_OTEL_ENDPOINT is set)
- HTTPXClientInstrumentor: every HTTPX call (broker REST, UI API client)
- RedisInstrumentor: every Redis command (pub/sub, kill-switch, Celery broker)
Manual spans are added via the @traced decorator
(alphaswarm/observability/decorators.py):
from alphaswarm.observability import traced

@traced("paper.session.run")
async def run(self) -> PaperSessionResult:
    ...
The decorator works transparently on both sync and async callables. When the
otel extra isn't installed, the tracer is a no-op, so the decorator adds negligible overhead.
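To make the sync/async and no-op behaviour concrete, here is a minimal sketch of how such a decorator could be written. It is not the code in alphaswarm/observability/decorators.py; `traced_sketch`, `_span`, and the tracer name are hypothetical, and the fallback assumes that a missing opentelemetry package should simply result in no span.

```python
import asyncio
import functools
import inspect
from contextlib import nullcontext

try:
    from opentelemetry import trace

    _tracer = trace.get_tracer("alphaswarm.sketch")  # hypothetical tracer name
except ImportError:  # the otel extra is not installed
    _tracer = None


def _span(name):
    """Return a span context manager, or a no-op when otel is unavailable."""
    if _tracer is None:
        return nullcontext()
    return _tracer.start_as_current_span(name)


def traced_sketch(name):
    """Wrap a sync or async callable in a span called `name`."""

    def decorator(func):
        if inspect.iscoroutinefunction(func):

            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                # The span stays open until the awaited call completes.
                with _span(name):
                    return await func(*args, **kwargs)

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            with _span(name):
                return func(*args, **kwargs)

        return sync_wrapper

    return decorator


@traced_sketch("demo.add")
def add(a, b):
    return a + b


@traced_sketch("demo.add_async")
async def add_async(a, b):
    return a + b
```

Branching on `inspect.iscoroutinefunction` at decoration time keeps the wrapped async function a coroutine function, so callers and frameworks that inspect it still treat it as async.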
Custom exporters
The default is OTLP/gRPC. To use OTLP/HTTP instead:
ALPHASWARM_OTEL_PROTOCOL=http/protobuf
ALPHASWARM_OTEL_ENDPOINT=http://otel-collector:4318/v1/traces
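A common mistake when switching protocols is to change one variable but not the other. The sketch below is a hypothetical sanity check (the helper `check_endpoint` is not part of AlphaSwarm) that flags an endpoint whose port does not match the conventional port for the chosen protocol. A non-default port can be perfectly valid, so the result is best treated as a warning rather than an error.

```python
from urllib.parse import urlparse

# Conventional OTLP ports, as listed in the configuration table.
DEFAULT_PORTS = {"grpc": 4317, "http/protobuf": 4318}


def check_endpoint(protocol: str, endpoint: str) -> list[str]:
    """Return human-readable warnings for a protocol/endpoint pair."""
    if protocol not in DEFAULT_PORTS:
        return [f"unknown protocol {protocol!r}"]

    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return [f"endpoint {endpoint!r} is not an http(s) URL"]

    warnings = []
    expected = DEFAULT_PORTS[protocol]
    # Only compare when a port is given explicitly in the URL.
    if parsed.port is not None and parsed.port != expected:
        warnings.append(
            f"protocol {protocol!r} usually listens on port {expected}, "
            f"but the endpoint uses port {parsed.port}"
        )
    return warnings
```

For instance, `check_endpoint("grpc", "http://otel-collector:4318")` returns one warning, while the two configurations shown in this document return an empty list.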
For local development without the Compose stack, install the OTel SDK and point AlphaSwarm at a local Jaeger all-in-one container:
docker run --rm -p 4317:4317 -p 16686:16686 jaegertracing/all-in-one:1.55
export ALPHASWARM_OTEL_ENDPOINT=http://localhost:4317
Kubernetes
Both the API/Worker image and the paper image have the OTel SDK
installed. The Kustomize manifests set ALPHASWARM_OTEL_ENDPOINT to the
in-cluster collector service; port-forward Jaeger with:
kubectl -n alphaswarm-dev port-forward svc/jaeger 16686:16686
Troubleshooting
Spans never show up in Jaeger.
- Verify ALPHASWARM_OTEL_ENDPOINT is set in the container: docker compose exec api env | grep OTEL.
- Check the collector logs for parsing errors: docker compose logs otel-collector.
- Set ALPHASWARM_OTEL_SAMPLE_RATIO to 1.0 while debugging so that every trace is sampled.
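To see why a low sample ratio can make individual traces appear to be missing, the sketch below illustrates the general idea behind parent-based, trace-id ratio sampling. It is a simplified illustration written for this document, not the OTel SDK's or AlphaSwarm's code; `ratio_bound` and `should_sample` are hypothetical names.

```python
# Only the low 64 bits of the trace id take part in the comparison.
TRACE_ID_LIMIT = (1 << 64) - 1


def ratio_bound(ratio: float) -> int:
    """Upper bound (exclusive) below which a trace id is sampled."""
    return round(ratio * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, ratio: float, parent_sampled=None) -> bool:
    """Decide whether a span is recorded.

    parent_sampled is None for a root span; otherwise the child simply
    inherits the decision already made for its parent.
    """
    if parent_sampled is not None:
        return parent_sampled
    return (trace_id & TRACE_ID_LIMIT) < ratio_bound(ratio)
```

Because the decision is made once at the root and inherited downstream, a trace is either kept end to end or dropped end to end. With a ratio of 0.1, roughly nine out of ten requests leave no spans at all, which is why a ratio of 1.0 is the safer setting while debugging.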
ImportError: opentelemetry-exporter-otlp-proto-grpc at startup.
- You set ALPHASWARM_OTEL_ENDPOINT but didn't install the otel extra. The tracer logs a warning and continues as a no-op; to silence the warning, run pip install -e ".[otel]".
Tests emit real spans.
- They shouldn't: tests/conftest.py installs an autouse fixture that resets ALPHASWARM_OTEL_ENDPOINT="" before each test. If you see real spans, check that the fixture is still in place.
Metrics (optional)
The OTel Collector config in alphaswarm_platform/deploy/otel/otel-collector-config.yaml
also exports metrics on port 8889 via the Prometheus exporter, so you
can point a Prometheus scraper at the collector for JVM-style
service-level dashboards. The AlphaSwarm code doesn't emit custom metrics
yet — PRs welcome.