RL Lab — interactive RL builder

The RL Lab lives at `/rl/lab` in the AlphaSwarm webui and combines six surfaces into one shell:

Tab	Purpose	Component
Experiment	Compose env + reward + observation + action + agent + ensembler into one `RLExperimentSpec`, save, train.	`ExperimentBuilder.tsx`
Environment	Drag a data pipeline + env + observation + action + reward + termination onto the canvas; save spec.	`EnvironmentBuilder.tsx`
Reward	Drag reward terms, weight them, hit "Preview reward" → server-side decomposition over a synthetic trajectory.	`RewardModelBuilder.tsx`
Observation	Drag observation builders, preview output shape + feature names.	`ObservationBuilder.tsx`
Agent	Pick framework (SB3 / ElegantRL / RLlib / CleanRL / LLM-hybrid) + algorithm + hyperparams.	`AgentBuilder.tsx`
Component library	Browse every registered RL component, filter by tag / source / category.	`RlComponentLibrary.tsx`
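
As a rough picture of what the Reward tab's preview returns, the sketch below computes a per-term, per-step decomposition of a weighted reward over a small synthetic trajectory. It is not the server's implementation: the function name `decompose_reward`, the term names, the state fields, and the numbers are all made up for illustration.

```python
# Illustrative only: term names, state fields, and values are hypothetical.
def decompose_reward(terms, trajectory):
    """Return, for each step, every term's weighted contribution and the total."""
    steps = []
    for state in trajectory:
        contributions = {
            name: weight * fn(state) for name, (weight, fn) in terms.items()
        }
        steps.append({"terms": contributions, "total": sum(contributions.values())})
    return steps


# Each reward term is a (weight, function-of-state) pair.
terms = {
    "pnl": (1.0, lambda s: s["pnl"]),
    "drawdown_penalty": (0.5, lambda s: -s["drawdown"]),
}

# A two-step synthetic trajectory standing in for the one the server generates.
trajectory = [
    {"pnl": 2.0, "drawdown": 0.0},
    {"pnl": -1.0, "drawdown": 1.0},
]

decomposition = decompose_reward(terms, trajectory)
```

Keeping the per-term contributions next to the total makes it easy to see which term dominates at each step, which is the point of a preview before training.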

Routes

Path	Component
`/rl/lab`	`RlLabPage`
`/rl/library`	`RlComponentLibrary`
`/rl/builder/env`	`EnvironmentBuilder`
`/rl/builder/reward`	`RewardModelBuilder`
`/rl/builder/observation`	`ObservationBuilder`
`/rl/builder/agent`	`AgentBuilder`
`/rl/builder/experiment`	`ExperimentBuilder`
`/rl/runs`	`RlRunsPage`
`/rl/runs/[id]`	`RlRunDetailPage`
`/rl/runs/[id]/replay`	`RlReplayViewer`
`/rl`	Legacy `RlPage` (quick-train, application registry browser).
`/rl/zoo`	RL agent zoo (`/registry/agent`).

The builders all use the existing `WorkflowEditor` + xyflow stack with `domain="rl"`. The serializer in `webui/components/rl/serialize.ts` turns a `FlowGraph` into an `RLExperimentSpec` payload by bucketising nodes by their palette group (env / observation / action / reward / termination / agent / data pipeline / ensembler).
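
The bucketising step can be sketched as follows. This is a Python illustration of the idea, not a port of `serialize.ts`: the node shape (`id`, `group`, `params`), the payload key names, and the function name `bucketise` are assumptions made for the example.

```python
from collections import defaultdict

# Palette groups as named in the text. The payload keys on the right are
# assumptions for this sketch, not the real RLExperimentSpec field names.
GROUP_TO_SPEC_KEY = {
    "env": "env",
    "observation": "observation",
    "action": "action",
    "reward": "reward",
    "termination": "termination",
    "agent": "agent",
    "data pipeline": "data_pipeline",
    "ensembler": "ensembler",
}


def bucketise(nodes):
    """Group flow-graph nodes by palette group (hypothetical node shape)."""
    buckets = defaultdict(list)
    for node in nodes:
        group = node["group"]
        if group not in GROUP_TO_SPEC_KEY:
            raise ValueError(f"unknown palette group: {group!r}")
        buckets[GROUP_TO_SPEC_KEY[group]].append(
            {"id": node["id"], "params": node.get("params", {})}
        )
    return dict(buckets)


# Made-up nodes standing in for what the canvas would hold.
graph_nodes = [
    {"id": "n1", "group": "env", "params": {"name": "example-env"}},
    {"id": "n2", "group": "reward", "params": {"weight": 1.0}},
    {"id": "n3", "group": "reward", "params": {"weight": 0.5}},
    {"id": "n4", "group": "data pipeline"},
]
spec_payload = bucketise(graph_nodes)
```

Rejecting unknown groups up front, rather than silently dropping the node, is a design choice of this sketch; the real serializer may behave differently.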

API surface used

The lab calls the API endpoints in alphaswarm/api/routes/rl.py:

GET /rl/components — kind counts.
GET /rl/components/{kind} — list registered components per kind.
POST /rl/lab/preview-reward — reward decomposition.
POST /rl/lab/preview-observation — observation shape + features.
POST /rl/lab/preview-action — action transform sample.
POST /rl/specs — persist a spec.
POST /rl/specs/{slug}/run — kick off train / evaluate / paper / replay / walk-forward via the matching Celery task.
GET /rl/runs, GET /rl/runs/{id}, and the sub-resources .../equity, .../trajectories, .../reward-decomposition, .../episodes and .../actions return the runs ledger plus step-level data served from DuckDB views over the Iceberg tables.
POST /rl/runs/{id}/replay — re-roll a saved policy on a new window.
POST /rl/data-pipelines/preview — show first rows + array shapes.
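
To make the endpoint list concrete, here is a minimal sketch that builds (but does not send) requests for a few of the routes above using only the standard library. The base URL, the spec slug `my-spec`, the run id `run-123`, and the `{"mode": "train"}` body are hypothetical, and the sketch assumes the `...` prefix in the runs sub-resources stands for `/rl/runs/{id}`.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: wherever the API is served

# Sub-resources listed in the text for a single run.
RUN_SUBRESOURCES = {
    "equity", "trajectories", "reward-decomposition", "episodes", "actions",
}


def build_request(method, path, body=None):
    """Build a urllib Request; a JSON body is attached only when given."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def run_subresource_path(run_id, resource):
    """Path for a run sub-resource, assuming it hangs off /rl/runs/{id}."""
    if resource not in RUN_SUBRESOURCES:
        raise ValueError(f"unknown run sub-resource: {resource!r}")
    return f"/rl/runs/{run_id}/{resource}"


# Hypothetical slug, run id, and request body, for illustration only.
list_kinds = build_request("GET", "/rl/components")
start_run = build_request("POST", "/rl/specs/my-spec/run", {"mode": "train"})
equity = build_request("GET", run_subresource_path("run-123", "equity"))
```

Passing any of these objects to `urllib.request.urlopen` would perform the call; the request body shape accepted by the real API is not specified here.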

Run replay

The replay viewer (/rl/runs/[id]/replay) loads:

rl.equity_curves rows for the chosen episode (slider populates from the row count).
rl.trajectories rows for the chosen episode (each step shows reward + info JSON).

Both come from the DuckDB views generated by alphaswarm/rl/trajectories/duckdb_views.py.
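
The two loads above can be pictured with a small sketch: the slider range follows from the number of equity rows, and each trajectory step is shown with its reward and decoded info JSON. The column names (`equity`, `reward`, `info`), the helper names, and the sample rows are assumptions for illustration; the real view schemas are defined by `duckdb_views.py`.

```python
import json


def slider_bounds(equity_rows):
    """Inclusive 0-based slider range derived from the equity row count."""
    if not equity_rows:
        return (0, 0)
    return (0, len(equity_rows) - 1)


def step_view(trajectory_rows, step):
    """What one step could display: its reward plus the decoded info JSON."""
    row = trajectory_rows[step]
    return {"step": step, "reward": row["reward"], "info": json.loads(row["info"])}


# Made-up rows standing in for rl.equity_curves and rl.trajectories results.
equity_rows = [{"equity": 100.0}, {"equity": 101.5}, {"equity": 100.8}]
trajectory_rows = [
    {"reward": 0.0, "info": '{"position": 0}'},
    {"reward": 1.5, "info": '{"position": 1}'},
    {"reward": -0.7, "info": '{"position": 1}'},
]

low, high = slider_bounds(equity_rows)
current = step_view(trajectory_rows, high)
```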
