
RL Lab — interactive RL builder

The lab lives at /rl/lab in the AlphaSwarm webui. It combines six surfaces, one per tab, into a single shell:

Tab | Purpose | Component
Experiment | Compose env + reward + observation + action + agent + ensembler into one RLExperimentSpec; save and train. | ExperimentBuilder.tsx
Environment | Drag a data pipeline + env + observation + action + reward + termination onto the canvas; save the spec. | EnvironmentBuilder.tsx
Reward | Drag reward terms and weight them, then click "Preview reward" to get a server-side decomposition over a synthetic trajectory. | RewardModelBuilder.tsx
Observation | Drag observation builders; preview the output shape and feature names. | ObservationBuilder.tsx
Agent | Pick a framework (SB3 / ElegantRL / RLlib / CleanRL / LLM-hybrid), an algorithm, and hyperparameters. | AgentBuilder.tsx
Component library | Browse every registered RL component; filter by tag / source / category. | RlComponentLibrary.tsx
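The Reward tab computes its preview on the server. The arithmetic behind a per-term decomposition can still be sketched client-side. The sketch assumes that the total reward at each step is a weighted sum of independent terms. The `Step`, `RewardTerm`, and `Decomposition` shapes and the `pnl` / `turnover_penalty` terms are hypothetical. They are not the types used by RewardModelBuilder.tsx or the preview endpoint.

```typescript
// Hypothetical shapes: the real reward-term and trajectory types may differ.
interface Step {
  pnl: number;      // per-step profit and loss (illustrative field)
  turnover: number; // per-step traded volume (illustrative field)
}

interface RewardTerm {
  name: string;
  weight: number;
  fn: (step: Step) => number; // raw, unweighted term value
}

interface Decomposition {
  perStep: Record<string, number>[]; // weighted contribution of each term at each step
  totals: Record<string, number>;    // weighted contribution summed over the trajectory
  total: number;                     // sum of all weighted contributions
}

// Assumes reward(step) = sum over terms of weight * term(step).
function decomposeReward(terms: RewardTerm[], trajectory: Step[]): Decomposition {
  const totals: Record<string, number> = {};
  for (const t of terms) totals[t.name] = 0;

  const perStep = trajectory.map((step) => {
    const row: Record<string, number> = {};
    for (const t of terms) {
      const contribution = t.weight * t.fn(step);
      row[t.name] = contribution;
      totals[t.name] += contribution;
    }
    return row;
  });

  const total = Object.values(totals).reduce((a, b) => a + b, 0);
  return { perStep, totals, total };
}

// Usage: a synthetic three-step trajectory and two hypothetical terms.
const synthetic: Step[] = [
  { pnl: 1.0, turnover: 2 },
  { pnl: -0.5, turnover: 1 },
  { pnl: 2.0, turnover: 0 },
];
const terms: RewardTerm[] = [
  { name: "pnl", weight: 1.0, fn: (s) => s.pnl },
  { name: "turnover_penalty", weight: -0.5, fn: (s) => s.turnover },
];
const d = decomposeReward(terms, synthetic);
console.log(d.totals, d.total);
```

The sketch reports weighted contributions rather than raw term values. Because of this, the per-term columns sum to the total reward, which makes it easy to spot a term that dominates.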

Routes

Path | Component
/rl/lab | RlLabPage
/rl/library | RlComponentLibrary
/rl/builder/env | EnvironmentBuilder
/rl/builder/reward | RewardModelBuilder
/rl/builder/observation | ObservationBuilder
/rl/builder/agent | AgentBuilder
/rl/builder/experiment | ExperimentBuilder
/rl/runs | RlRunsPage
/rl/runs/[id] | RlRunDetailPage
/rl/runs/[id]/replay | RlReplayViewer
/rl | Legacy RlPage (quick-train, application registry browser)
/rl/zoo | RL agent zoo (/registry/agent)

The builders all use the existing WorkflowEditor + xyflow stack with domain="rl". The serializer in webui/components/rl/serialize.ts turns a FlowGraph into an RLExperimentSpec payload by bucketising nodes via their palette group (env / observation / action / reward / termination / agent / data pipeline / ensembler).
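Below is a minimal sketch of that bucketising step. It makes two assumptions. First, each node carries its palette group in a `data.group` field. Second, the spec keeps one array of node configs per group. The `FlowNode`, `BucketEntry`, and `SpecBuckets` types, the `data_pipeline` key, and the component names are hypothetical. The real FlowGraph and RLExperimentSpec types in webui/components/rl/serialize.ts may be shaped differently.

```typescript
// Hypothetical palette-group keys, mirroring the groups listed in the text.
const GROUPS = [
  "env", "observation", "action", "reward",
  "termination", "agent", "data_pipeline", "ensembler",
] as const;
type Group = (typeof GROUPS)[number];

// Hypothetical node shape: only the fields the bucketiser needs.
interface FlowNode {
  id: string;
  data: { group: string; component: string; params?: Record<string, unknown> };
}

interface BucketEntry {
  id: string;
  component: string;
  params: Record<string, unknown>;
}

type SpecBuckets = Record<Group, BucketEntry[]>;

function isGroup(g: string): g is Group {
  return (GROUPS as readonly string[]).indexOf(g) !== -1;
}

// Groups nodes by palette group; nodes from unrecognised groups are reported, not dropped silently.
function bucketise(nodes: FlowNode[]): { buckets: SpecBuckets; unmatched: string[] } {
  const buckets = {} as SpecBuckets;
  for (const g of GROUPS) buckets[g] = [];
  const unmatched: string[] = [];
  for (const node of nodes) {
    const { group, component, params } = node.data;
    if (isGroup(group)) {
      buckets[group].push({ id: node.id, component, params: params ?? {} });
    } else {
      unmatched.push(node.id);
    }
  }
  return { buckets, unmatched };
}

// Usage with a tiny hypothetical canvas.
const { buckets, unmatched } = bucketise([
  { id: "n1", data: { group: "env", component: "TradingEnv" } },
  { id: "n2", data: { group: "reward", component: "PnLReward", params: { weight: 1 } } },
  { id: "n3", data: { group: "agent", component: "PPO" } },
  { id: "n4", data: { group: "comment", component: "Note" } },
]);
console.log(buckets.reward, unmatched);
```

The sketch surfaces unmatched nodes instead of discarding them, so a stray canvas node cannot silently vanish from the saved spec. This text does not say whether serialize.ts behaves the same way.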

API surface used

The lab calls the API endpoints in alphaswarm/api/routes/rl.py:

  • GET /rl/components — kind counts.
  • GET /rl/components/{kind} — list registered components per kind.
  • POST /rl/lab/preview-reward — reward decomposition.
  • POST /rl/lab/preview-observation — observation shape + features.
  • POST /rl/lab/preview-action — action transform sample.
  • POST /rl/specs — persist a spec.
  • POST /rl/specs/{slug}/run — kick off train / evaluate / paper / replay / walk-forward via the matching Celery task.
  • GET /rl/runs / GET /rl/runs/{id} / .../equity / .../trajectories / .../reward-decomposition / .../episodes / .../actions — runs ledger + step-level data served from DuckDB views over the Iceberg tables.
  • POST /rl/runs/{id}/replay — re-roll a saved policy on a new window.
  • POST /rl/data-pipelines/preview — show first rows + array shapes.
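The sketch below is a hypothetical client helper that builds requests for two of the endpoints above. It covers POST /rl/specs/{slug}/run and the GET /rl/runs/{id}/... sub-resources. The run modes and resource names come from the list. Everything else is an assumption, not the documented request schema: the API base URL, the `mode` key in a JSON body, and the helper names.

```typescript
// Run modes listed for POST /rl/specs/{slug}/run.
type RunMode = "train" | "evaluate" | "paper" | "replay" | "walk-forward";

// Step-level resources listed under GET /rl/runs/{id}/...
type RunResource = "equity" | "trajectories" | "reward-decomposition" | "episodes" | "actions";

interface RequestSpec {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
}

// Assumption: the API is mounted at `base` and the mode travels in a JSON body.
function buildRunRequest(base: string, slug: string, mode: RunMode): RequestSpec {
  return {
    url: `${base}/rl/specs/${encodeURIComponent(slug)}/run`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ mode }),
    },
  };
}

// Builds the URL for one of the runs-ledger sub-resources.
function runResourceUrl(base: string, runId: string, resource: RunResource): string {
  return `${base}/rl/runs/${encodeURIComponent(runId)}/${resource}`;
}

// Usage with a hypothetical local API base, spec slug, and run id.
const req = buildRunRequest("http://localhost:8000", "btc-ppo-v1", "walk-forward");
const equityUrl = runResourceUrl("http://localhost:8000", "run-42", "equity");
console.log(req.url, equityUrl);
```

To issue the call, pass `req.url` and `req.init` to `fetch`. Because the builder is pure, it can be tested without a running API.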

Run replay

The replay viewer (/rl/runs/[id]/replay) loads:

  • rl.equity_curves rows for the chosen episode (the step slider's range is set from the row count).
  • rl.trajectories rows for the chosen episode (each step shows reward + info JSON).

Both come from the DuckDB views generated by alphaswarm/rl/trajectories/duckdb_views.py.
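The viewer's slider logic comes down to clamping a position into the episode's equity rows and looking up the matching trajectory step. The sketch rests on two assumptions. Both views return an integer `step` column, and trajectory rows carry `reward` plus `info` as a JSON string. The column names and the `EquityRow`, `TrajectoryRow`, and `ReplayFrame` types are hypothetical, not the actual schema produced by duckdb_views.py.

```typescript
// Hypothetical row shapes for the two DuckDB views.
interface EquityRow { step: number; equity: number }
interface TrajectoryRow { step: number; reward: number; info: string }

interface ReplayFrame {
  step: number;
  equity: number;
  reward: number | null;               // null if no trajectory row exists for this step
  info: Record<string, unknown> | null; // parsed info JSON, or null if missing/malformed
}

// The slider range comes from the equity row count: valid positions are 0..rows-1.
function sliderMax(equity: EquityRow[]): number {
  return Math.max(0, equity.length - 1);
}

function frameAt(position: number, equity: EquityRow[], traj: TrajectoryRow[]): ReplayFrame | null {
  if (equity.length === 0) return null;
  // Clamp the slider position into the valid row range.
  const i = Math.min(Math.max(0, Math.floor(position)), equity.length - 1);
  const e = equity[i];
  // Join on step rather than array index, in case the views are not row-aligned.
  const t = traj.find((row) => row.step === e.step);
  let info: Record<string, unknown> | null = null;
  if (t) {
    try {
      info = JSON.parse(t.info) as Record<string, unknown>;
    } catch {
      info = null; // malformed info JSON: show the step without it
    }
  }
  return { step: e.step, equity: e.equity, reward: t ? t.reward : null, info };
}

// Usage with hypothetical rows for one episode.
const equity: EquityRow[] = [
  { step: 0, equity: 100 },
  { step: 1, equity: 101.5 },
  { step: 2, equity: 99.0 },
];
const traj: TrajectoryRow[] = [
  { step: 0, reward: 0, info: "{}" },
  { step: 1, reward: 1.5, info: '{"position": 1}' },
];
const max = sliderMax(equity);
const frame = frameAt(1, equity, traj);
const last = frameAt(99, equity, traj);
console.log(max, frame, last);
```

Joining on `step` instead of array position is a defensive choice. If the two views are guaranteed to line up row for row, indexing directly would also work.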