RL Iceberg data plane

Per-step RL records persist to four Iceberg tables in the namespace controlled by ALPHASWARM_RL_TRAJECTORY_NAMESPACE (default rl). Writes flow through alphaswarm/rl/trajectories/iceberg_writer.py::IcebergTrajectoryStore → iceberg_catalog.append_arrow.

Tables

| Table | Columns | Written when |
| --- | --- | --- |
| rl.trajectories | run_id, episode, step, ts, reward, info (JSON) | Every env step |
| rl.equity_curves | run_id, episode, step, ts, portfolio_value, drawdown, cash | Every env step that exposes info["portfolio_value"] |
| rl.action_logs | run_id, episode, step, ts, asset_idx, action_value | Every env step (one row per action component) |
| rl.reward_decomposition | run_id, episode, step, ts, term_name, contribution | When the reward model exposes info["reward_terms"] (any CompositeReward) |
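
To make the fan-out concrete, here is a minimal sketch of how a single env step could be split into rows for the four tables. The helper name rows_for_step, the flat-list shape of action, and the assumption that drawdown and cash also arrive in info are illustrative and not taken from iceberg_writer.py.

```python
import json
from typing import Any


def rows_for_step(
    run_id: str,
    episode: int,
    step: int,
    ts: str,
    reward: float,
    action: list[float],
    info: dict[str, Any],
) -> dict[str, list[dict[str, Any]]]:
    """Split one env step into rows keyed by table name (hypothetical helper)."""
    key = {"run_id": run_id, "episode": episode, "step": step, "ts": ts}
    rows: dict[str, list[dict[str, Any]]] = {
        # Always exactly one trajectory row; info is stored as a JSON string.
        "trajectories": [
            {**key, "reward": reward, "info": json.dumps(info, default=str)}
        ],
        # One row per action component.
        "action_logs": [
            {**key, "asset_idx": i, "action_value": float(a)}
            for i, a in enumerate(action)
        ],
        "equity_curves": [],
        "reward_decomposition": [],
    }
    # Equity row only when the env exposes info["portfolio_value"].
    if "portfolio_value" in info:
        rows["equity_curves"].append(
            {
                **key,
                "portfolio_value": info["portfolio_value"],
                "drawdown": info.get("drawdown"),  # assumed to live in info
                "cash": info.get("cash"),  # assumed to live in info
            }
        )
    # One row per reward term when info["reward_terms"] is present.
    for name, value in info.get("reward_terms", {}).items():
        rows["reward_decomposition"].append(
            {**key, "term_name": name, "contribution": float(value)}
        )
    return rows
```

A step with two action components and two reward terms therefore yields one trajectory row, two action-log rows, and two reward-decomposition rows, plus one equity-curve row if portfolio_value is present.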

Settings

| Variable | Default | Purpose |
| --- | --- | --- |
| ALPHASWARM_RL_TRAJECTORY_NAMESPACE | rl | Iceberg namespace |
| ALPHASWARM_RL_TRAJECTORY_TABLE | trajectories | Per-step trajectory table name |
| ALPHASWARM_RL_EQUITY_TABLE | equity_curves | Equity-curve table name |
| ALPHASWARM_RL_ACTION_LOG_TABLE | action_logs | Action-log table name |
| ALPHASWARM_RL_REWARD_DECOMP_TABLE | reward_decomposition | Reward-decomposition table name |
| ALPHASWARM_RL_PERSIST_TRAJECTORIES | true | When false, the runtime uses an in-memory store (CI / local). |
| ALPHASWARM_RL_TRAJECTORY_FLUSH_ROWS | 1000 | Rows per buffer before partial flush. |
| ALPHASWARM_RL_REQUIRE_ICEBERG | false | Make Iceberg write failures hard-fail. |
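
The sketch below shows one way these variables could be read with their documented defaults, and how a row threshold such as ALPHASWARM_RL_TRAJECTORY_FLUSH_ROWS might drive partial flushes. RLTrajectorySettings, RowBuffer, and the accepted boolean spellings are hypothetical; the real settings loader and buffer may differ.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping


def _as_bool(raw: str) -> bool:
    # Assumed truthy spellings; the real parser may accept a different set.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RLTrajectorySettings:
    """Hypothetical container mirroring the documented variables."""

    namespace: str
    trajectory_table: str
    equity_table: str
    action_log_table: str
    reward_decomp_table: str
    persist_trajectories: bool
    flush_rows: int
    require_iceberg: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "RLTrajectorySettings":
        p = "ALPHASWARM_RL_"
        return cls(
            namespace=env.get(p + "TRAJECTORY_NAMESPACE", "rl"),
            trajectory_table=env.get(p + "TRAJECTORY_TABLE", "trajectories"),
            equity_table=env.get(p + "EQUITY_TABLE", "equity_curves"),
            action_log_table=env.get(p + "ACTION_LOG_TABLE", "action_logs"),
            reward_decomp_table=env.get(p + "REWARD_DECOMP_TABLE", "reward_decomposition"),
            persist_trajectories=_as_bool(env.get(p + "PERSIST_TRAJECTORIES", "true")),
            flush_rows=int(env.get(p + "TRAJECTORY_FLUSH_ROWS", "1000")),
            require_iceberg=_as_bool(env.get(p + "REQUIRE_ICEBERG", "false")),
        )


class RowBuffer:
    """Collects rows and hands them to `sink` once `flush_rows` is reached."""

    def __init__(self, flush_rows: int, sink: Callable[[list[dict]], None]) -> None:
        self._flush_rows = flush_rows
        self._sink = sink
        self._rows: list[dict] = []

    def add(self, row: dict) -> None:
        self._rows.append(row)
        if len(self._rows) >= self._flush_rows:
            self.flush()  # partial flush in the middle of an episode

    def flush(self) -> None:
        # Also called at the end of a run to drain the remainder.
        if self._rows:
            self._sink(self._rows)
            self._rows = []
```

In this sketch the sink is any callable that accepts a batch of rows, so the same buffer works with an in-memory list (the PERSIST_TRAJECTORIES=false case) or with a function that appends to Iceberg.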

DuckDB views

alphaswarm/rl/trajectories/duckdb_views.py exposes two helpers:

  • ensure_duckdb_views(connection) — registers rl_trajectories / rl_equity_curves / rl_action_logs / rl_reward_decomposition Arrow-backed views.
  • register_run_views(run_id, connection) — adds run-filtered views named rl_<kind>_run_<short_id>.
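
As an illustration of the run-filtered naming scheme, the following sketch builds the SQL that such views could be created from. It is not the code in duckdb_views.py: the short_id rule (hyphens dropped, first 8 characters, lowercased), the run_id validation, and the function names are assumptions.

```python
import re

KINDS = ("trajectories", "equity_curves", "action_logs", "reward_decomposition")
# View definitions cannot take bound parameters, so the run_id is inlined;
# restrict it to a conservative character set first.
_SAFE_RUN_ID = re.compile(r"[A-Za-z0-9_-]+")


def short_id(run_id: str, length: int = 8) -> str:
    # Assumption: drop hyphens, keep the first `length` characters, lowercase.
    return run_id.replace("-", "")[:length].lower()


def run_view_statements(run_id: str) -> list[str]:
    """Return one CREATE VIEW statement per table kind for a single run."""
    if not _SAFE_RUN_ID.fullmatch(run_id):
        raise ValueError(f"unsafe run_id: {run_id!r}")
    sid = short_id(run_id)
    return [
        f"CREATE OR REPLACE VIEW rl_{kind}_run_{sid} AS "
        f"SELECT * FROM rl_{kind} WHERE run_id = '{run_id}'"
        for kind in KINDS
    ]
```

Each statement could then be passed to connection.execute(...) on a DuckDB connection where the base rl_<kind> views are already registered.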

The API uses these views to serve the /rl/runs/{id}/equity, /trajectories, /reward-decomposition, and /actions endpoints without touching PyIceberg directly.

Postgres ledger

The Postgres tables in alphaswarm/persistence/models_rl.py hold the metadata layer that points at these Iceberg row ranges:

  • rl_experiment_specs / rl_experiment_versions — hash-locked spec snapshots.
  • rl_runs — one row per RLRuntime invocation.
  • rl_evaluations — rollout summary.
  • rl_trajectory_refs / rl_equity_curve_refs — pointers to the Iceberg row ranges per episode.
  • rl_component_registrations — DB mirror of the in-memory RL component registry (so /rl/components is fast).
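
To illustrate what a per-episode pointer to an Iceberg row range might contain, here is a small sketch that collapses a run's per-row episode column into one contiguous range per episode. The EpisodeRef fields (row_start, row_count) and the assumption that rows are ordered by episode and then step are hypothetical; the actual columns are defined in models_rl.py.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EpisodeRef:
    """Hypothetical shape of one row-range pointer."""

    run_id: str
    episode: int
    row_start: int  # offset of the episode's first row within the run
    row_count: int  # number of consecutive rows belonging to the episode


def episode_refs(run_id: str, episodes: Iterable[int]) -> list[EpisodeRef]:
    """Build one range per episode from rows ordered by (episode, step)."""
    refs: list[EpisodeRef] = []
    current: Optional[int] = None
    start = 0
    count = 0
    for offset, ep in enumerate(episodes):
        if ep != current:
            # Close the previous episode's range before starting a new one.
            if current is not None:
                refs.append(EpisodeRef(run_id, current, start, count))
            current, start, count = ep, offset, 0
        count += 1
    if current is not None:
        refs.append(EpisodeRef(run_id, current, start, count))
    return refs
```

With a layout like this, the ledger can locate an episode's rows from (row_start, row_count) without scanning the whole table.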