RL Lab — interactive RL builder
Lives at /rl/lab in the AlphaSwarm webui. Combines six surfaces into one
shell:
| Tab | Purpose | Component |
|---|---|---|
| Experiment | Compose env + reward + observation + action + agent + ensembler into one RLExperimentSpec, save, train. | ExperimentBuilder.tsx |
| Environment | Drag a data pipeline + env + observation + action + reward + termination onto the canvas; save spec. | EnvironmentBuilder.tsx |
| Reward | Drag reward terms, weight them, hit "Preview reward" → server-side decomposition over a synthetic trajectory. | RewardModelBuilder.tsx |
| Observation | Drag observation builders, preview output shape + feature names. | ObservationBuilder.tsx |
| Agent | Pick framework (SB3 / ElegantRL / RLlib / CleanRL / LLM-hybrid) + algorithm + hyperparams. | AgentBuilder.tsx |
| Component library | Browse every registered RL component, filter by tag / source / category. | RlComponentLibrary.tsx |
Routes
| Path | Component |
|---|---|
| /rl/lab | RlLabPage |
| /rl/library | RlComponentLibrary |
| /rl/builder/env | EnvironmentBuilder |
| /rl/builder/reward | RewardModelBuilder |
| /rl/builder/observation | ObservationBuilder |
| /rl/builder/agent | AgentBuilder |
| /rl/builder/experiment | ExperimentBuilder |
| /rl/runs | RlRunsPage |
| /rl/runs/[id] | RlRunDetailPage |
| /rl/runs/[id]/replay | RlReplayViewer |
| /rl | Legacy RlPage (quick-train, application registry browser). |
| /rl/zoo | RL agent zoo (/registry/agent). |
The builders all use the existing WorkflowEditor + xyflow stack with domain="rl".
The serializer in webui/components/rl/serialize.ts turns a FlowGraph into an
RLExperimentSpec payload by bucketising nodes according to their palette group
(env / observation / action / reward / termination / agent / data pipeline /
ensembler).
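
The bucketising step described above could be sketched roughly as follows. This is an illustration under assumptions, not the contents of serialize.ts: the `FlowNode`, `FlowGraph` and `BucketedNodes` shapes, the `group` field, and the `bucketiseNodes` name are all hypothetical. Only the eight group names come from the text.

```typescript
// Sketch only: FlowNode, FlowGraph and BucketedNodes are hypothetical shapes,
// not the real types used by the webui serializer.
type PaletteGroup =
  | "env"
  | "observation"
  | "action"
  | "reward"
  | "termination"
  | "agent"
  | "data pipeline"
  | "ensembler";

const PALETTE_GROUPS: PaletteGroup[] = [
  "env", "observation", "action", "reward",
  "termination", "agent", "data pipeline", "ensembler",
];

interface FlowNode {
  id: string;
  group: PaletteGroup; // assumed: each node records the palette group it came from
  params: Record<string, unknown>;
}

interface FlowGraph {
  nodes: FlowNode[];
}

type BucketedNodes = Record<PaletteGroup, FlowNode[]>;

function bucketiseNodes(graph: FlowGraph): BucketedNodes {
  // Create an empty bucket per group so missing groups show up as [].
  const buckets = {} as BucketedNodes;
  for (const group of PALETTE_GROUPS) {
    buckets[group] = [];
  }
  // Append each node to its group's bucket, preserving graph order.
  for (const node of graph.nodes) {
    buckets[node.group].push(node);
  }
  return buckets;
}

const demoGraph: FlowGraph = {
  nodes: [
    { id: "n1", group: "env", params: {} },
    { id: "n2", group: "reward", params: { weight: 0.7 } },
    { id: "n3", group: "reward", params: { weight: 0.3 } },
  ],
};
const demoBuckets = bucketiseNodes(demoGraph);
```

Keeping an explicit empty bucket for every group makes it easy for a later validation step to report which parts of an experiment are still missing.
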
API surface used
The lab calls the API endpoints in
alphaswarm/api/routes/rl.py:
- GET /rl/components: kind counts.
- GET /rl/components/{kind}: list registered components per kind.
- POST /rl/lab/preview-reward: reward decomposition.
- POST /rl/lab/preview-observation: observation shape + features.
- POST /rl/lab/preview-action: action transform sample.
- POST /rl/specs: persist a spec.
- POST /rl/specs/{slug}/run: kick off train / evaluate / paper / replay / walk-forward via the matching Celery task.
- GET /rl/runs, GET /rl/runs/{id}, .../equity, .../trajectories, .../reward-decomposition, .../episodes, .../actions: runs ledger + step-level data served from DuckDB views over the Iceberg tables.
- POST /rl/runs/{id}/replay: re-roll a saved policy on a new window.
- POST /rl/data-pipelines/preview: show first rows + array shapes.

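
As a hedged illustration of how a client might address two of the endpoints listed above, the sketch below builds request descriptors without sending them. The paths and run modes come from the list. The `RequestPlan` type, the helper names, and the `{ mode }` request body are assumptions; the real request schema is defined in alphaswarm/api/routes/rl.py.

```typescript
// Sketch only: RequestPlan, the helper names and the { mode } body are
// hypothetical and not taken from the actual API schema.
type RunMode = "train" | "evaluate" | "paper" | "replay" | "walk-forward";

interface RequestPlan {
  method: "GET" | "POST";
  path: string;
  body?: string;
}

// GET /rl/components/{kind}
function planListComponents(kind: string): RequestPlan {
  return { method: "GET", path: `/rl/components/${encodeURIComponent(kind)}` };
}

// POST /rl/specs/{slug}/run
function planRun(slug: string, mode: RunMode): RequestPlan {
  return {
    method: "POST",
    path: `/rl/specs/${encodeURIComponent(slug)}/run`,
    body: JSON.stringify({ mode }), // assumed payload field name
  };
}

const listPlan = planListComponents("reward");
const runPlan = planRun("my spec", "walk-forward");
```

Separating the request description from the transport keeps the URL construction easy to unit-test; a caller could pass the resulting `path`, `method` and `body` to `fetch` against whatever base URL the API is served from.
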
Run replay
The replay viewer (/rl/runs/[id]/replay) loads:
- rl.equity_curves rows for the chosen episode (slider populates from the row count).
- rl.trajectories rows for the chosen episode (each step shows reward + info JSON).

Both come from the DuckDB views generated by
alphaswarm/rl/trajectories/duckdb_views.py.
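
One way the replay viewer's slider logic described above might work is sketched below: the slider range is derived from the number of equity rows, and each slider position is joined to the trajectory row for the same step. The row shapes (`step`, `equity`, `reward`, `info` as JSON text) and the function names are assumptions made for illustration; the actual columns are whatever the DuckDB views expose.

```typescript
// Sketch only: the row shapes and function names are hypothetical.
interface EquityRow { step: number; equity: number; }
interface TrajectoryRow { step: number; reward: number; info: string; } // info: JSON text

interface ReplayFrame {
  step: number;
  equity: number;
  reward: number;
  info: unknown; // parsed info JSON
}

// The slider runs from 0 to the last equity row index (0 when there are no rows).
function sliderMax(equityRows: EquityRow[]): number {
  return Math.max(equityRows.length - 1, 0);
}

// Combine the equity row at a slider index with the matching trajectory step.
function frameAt(
  equityRows: EquityRow[],
  trajectoryRows: TrajectoryRow[],
  index: number,
): ReplayFrame | undefined {
  if (index < 0 || index >= equityRows.length) return undefined;
  const eq = equityRows[index];
  for (const row of trajectoryRows) {
    if (row.step === eq.step) {
      return {
        step: eq.step,
        equity: eq.equity,
        reward: row.reward,
        info: JSON.parse(row.info),
      };
    }
  }
  return undefined; // no trajectory row recorded for this step
}

const demoEquity: EquityRow[] = [
  { step: 0, equity: 100 },
  { step: 1, equity: 101.5 },
  { step: 2, equity: 100.9 },
];
const demoTrajectory: TrajectoryRow[] = [
  { step: 0, reward: 0, info: '{"position": 0}' },
  { step: 1, reward: 0.5, info: '{"position": 1}' },
  { step: 2, reward: -0.2, info: '{"position": 1}' },
];
const demoMax = sliderMax(demoEquity);
const demoFrame = frameAt(demoEquity, demoTrajectory, 1);
```
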