RL component reference

This page is a hand-written quick reference. The authoritative source is the live registry, exposed by GET /rl/components/{kind} and rendered in the UI at /rl/library.
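
As a rough illustration of querying the registry endpoint named above, the sketch below builds the URL for one kind and fetches it with the standard library. The base URL, the JSON response format, and the helper names components_url and fetch_components are assumptions made for this example; only the path /rl/components/{kind} and the kind names come from this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Kinds listed on this page; the live registry may expose others.
KNOWN_KINDS = (
    "rl_env", "rl_observation", "rl_action", "rl_reward", "rl_termination",
    "rl_policy", "rl_agent", "rl_data", "rl_ensembler", "rl_experiment",
    "rl_trajectory_store",
)


def components_url(base_url: str, kind: str) -> str:
    """Build the registry URL for one component kind."""
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown rl_kind: {kind!r}")
    return f"{base_url.rstrip('/')}/rl/components/{quote(kind, safe='')}"


def fetch_components(base_url: str, kind: str, timeout: float = 10.0):
    """GET the registry listing for `kind`; assumes the body is JSON."""
    with urlopen(components_url(base_url, kind), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as fetch_components("http://localhost:8000", "rl_reward") would then return whatever the registry serves for reward components; the host and port here are placeholders.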

Kinds

| rl_kind | Purpose | Base class |
|---|---|---|
| rl_env | Gymnasium env | BaseRLEnv |
| rl_observation | State featuriser | BaseObservationBuilder |
| rl_action | Action-space spec + transform | BaseActionSpace |
| rl_reward | Reward term / composite | BaseRewardModel, RewardTerm |
| rl_termination | End-of-episode predicate | BaseTerminationCondition |
| rl_policy | Frozen policy | BasePolicy |
| rl_agent | Train-aware agent | BaseRLAgent |
| rl_data | Data pipeline | BaseDataPipeline |
| rl_ensembler | Multi-member orchestrator | BaseEnsembler |
| rl_experiment | Experiment runner | BaseExperiment |
| rl_trajectory_store | Per-step persistence | BaseTrajectoryStore |

Built-in components (FinRL + AlphaSwarm)

Environments

  • StockTradingEnv — continuous portfolio (existing).
  • PortfolioAllocationEnv — softmax weights (existing).
  • StockTradingDiscreteEnv — single-stock buy/sell/hold (existing).
  • FinRLStockTradingEnv — pandas share-lots (FinRL port).
  • FinRLStockTradingNpEnv — array-backed numpy (FinRL port).
  • FinRLPortfolioCovEnv — covariance + softmax (FinRL port).
  • FinRLCryptoEnv — multi-crypto lookback stack (FinRL port).
  • OptionsTradingEnv, ExecutionEnv, MarketMakingEnv — placeholders.

Reward terms

  • PnLTerm, LogReturnTerm
  • SharpeTerm, SortinoTerm, DrawdownPenaltyTerm, VolatilityPenaltyTerm
  • TurnoverPenaltyTerm, TransactionCostTerm, SlippagePenaltyTerm
  • TurbulenceGateTerm, MarginCallTerm
  • CashIdlePenaltyTerm, BenchmarkOutperformanceTerm, RiskParityTerm
  • PotentialBasedShaping
  • CompositeReward (sum of weighted terms; emits per-term contributions to info["reward_terms"]).
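
The CompositeReward behaviour described above (a weighted sum that also reports per-term contributions under info["reward_terms"]) can be sketched as follows. This is a minimal stand-in, not the library's class: WeightedTerm, CompositeRewardSketch, and the flat state dictionary are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]  # hypothetical: flat dict of per-step quantities


@dataclass
class WeightedTerm:
    name: str
    weight: float
    fn: Callable[[State], float]  # raw (unweighted) value of the term


class CompositeRewardSketch:
    """Sum weighted terms and expose each term's contribution."""

    def __init__(self, terms: List[WeightedTerm]):
        self.terms = terms

    def __call__(self, state: State) -> Tuple[float, Dict[str, Dict[str, float]]]:
        contributions = {t.name: t.weight * t.fn(state) for t in self.terms}
        info = {"reward_terms": contributions}
        return sum(contributions.values()), info


reward = CompositeRewardSketch([
    WeightedTerm("pnl", 1.0, lambda s: s["pnl"]),
    WeightedTerm("turnover_penalty", 0.5, lambda s: -s["turnover"]),
])
total, info = reward({"pnl": 2.0, "turnover": 1.0})
```

Keeping the weighted contributions alongside the scalar reward makes it possible to see which term dominates a given step.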

Observation builders

  • PortfolioStateBuilder (cash + weights / positions)
  • TechnicalIndicatorBuilder (FinRL stockstats)
  • CovarianceBuilder (FinRL portfolio cov)
  • TurbulenceBuilder (Mahalanobis stress)
  • VIXBuilder
  • LookbackStackBuilder (FinRL crypto)
  • FundamentalBuilder (FinRobot bridge)
  • MicrostructureBuilder
  • StackedObservationBuilder (composite)
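
TurbulenceBuilder is described above as a Mahalanobis stress measure. As a rough illustration of that idea, the sketch below computes the squared Mahalanobis distance of the current return vector from the historical mean, using the historical sample covariance. It is written in plain Python for self-containment; the function names and the input layout (rows are dates, columns are assets) are assumptions, and the actual builder may differ in windowing and scaling.

```python
from typing import List, Sequence


def covariance(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (ddof=1); rows are observations, columns are assets."""
    n, k = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def turbulence(history: Sequence[Sequence[float]],
               current: Sequence[float]) -> float:
    """Squared Mahalanobis distance of `current` from the mean of `history`."""
    k = len(current)
    mu = [sum(r[j] for r in history) / len(history) for j in range(k)]
    d = [current[j] - mu[j] for j in range(k)]
    z = solve(covariance(history), d)  # z = Sigma^{-1} d without inverting
    return sum(di * zi for di, zi in zip(d, z))
```

A value near zero means the current returns look like the historical sample; large values flag unusual joint moves.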

Action spaces

  • ContinuousWeightsAction, SoftmaxWeightsAction, IntegerSharesAction, DiscreteBuySellHoldAction, MultiDiscreteAction, TargetPositionAction.
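
To illustrate the "spec + transform" role of an action space, here is a minimal sketch of the transform a softmax-weights action might apply: unbounded policy outputs are mapped to non-negative portfolio weights that sum to 1. The function name softmax_weights is hypothetical, and SoftmaxWeightsAction itself may add clipping or a cash slot that is not shown here.

```python
import math
from typing import List, Sequence


def softmax_weights(raw_action: Sequence[float]) -> List[float]:
    """Map unbounded scores to non-negative weights that sum to 1."""
    peak = max(raw_action)  # subtract the max so exp() cannot overflow
    exps = [math.exp(a - peak) for a in raw_action]
    total = sum(exps)
    return [e / total for e in exps]
```

Equal scores yield equal weights, and a score that is much larger than the rest takes nearly the whole allocation.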

Termination conditions

  • HorizonTermination, DrawdownTermination, MarginCallTermination, TurbulenceTermination.
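
A termination condition is an end-of-episode predicate. As one hedged example, a drawdown-based condition could track the running equity peak and end the episode once the loss from that peak reaches a threshold. The class name, constructor argument, and call signature below are assumptions for illustration and are not the BaseTerminationCondition interface.

```python
from typing import Optional


class DrawdownTerminationSketch:
    """End the episode when drawdown from the running peak hits a limit."""

    def __init__(self, max_drawdown: float):
        if not 0.0 < max_drawdown <= 1.0:
            raise ValueError("max_drawdown must be in (0, 1]")
        self.max_drawdown = max_drawdown
        self.peak: Optional[float] = None

    def reset(self) -> None:
        """Forget the peak at the start of a new episode."""
        self.peak = None

    def __call__(self, equity: float) -> bool:
        """Return True if the episode should end; assumes equity > 0."""
        if self.peak is None or equity > self.peak:
            self.peak = equity
        drawdown = 1.0 - equity / self.peak
        return drawdown >= self.max_drawdown
```

With a limit of 0.2, an equity path of 100, 110, 95, 85 ends the episode on the last step, where equity is about 22.7% below the peak of 110.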

Data pipelines

  • IcebergRLDataPipeline (default — AlphaSwarm catalog).
  • YahooFinanceRLDataPipeline (FinRL parity).
  • AlpacaRLDataPipeline (paper-trading bridge).
  • LiveStreamingRLDataPipeline (Kafka / Flink).
  • ReplayRLDataPipeline (offline RL from rl.trajectories).

Agents

  • SB3Adapter — PPO / A2C / DDPG / SAC / TD3 / DQN + sb3-contrib (RecurrentPPO / TRPO / QRDQN / MaskablePPO / ARS / TQC).
  • ElegantRLAdapter, RayRLlibAdapter, CleanRLAdapter.
  • LLMHybridAgent — FinRobot-style LLM advisor + RL backbone.
  • The existing classical, Q-family, actor-critic, evolutionary, and SPM agent trees are retained.
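
The algorithm families listed for SB3Adapter live in two packages: stable_baselines3 and sb3_contrib. One way an adapter could resolve an algorithm name to its class is a lazy lookup table, sketched below. The table and the resolve_algorithm helper are hypothetical and say nothing about how SB3Adapter is actually implemented; the class names are the ones exported at the top level of each package.

```python
from importlib import import_module

# Algorithm name -> package that exports a class of the same name.
SB3_ALGORITHMS = {
    "PPO": "stable_baselines3",
    "A2C": "stable_baselines3",
    "DDPG": "stable_baselines3",
    "SAC": "stable_baselines3",
    "TD3": "stable_baselines3",
    "DQN": "stable_baselines3",
    "RecurrentPPO": "sb3_contrib",
    "TRPO": "sb3_contrib",
    "QRDQN": "sb3_contrib",
    "MaskablePPO": "sb3_contrib",
    "ARS": "sb3_contrib",
    "TQC": "sb3_contrib",
}


def resolve_algorithm(name: str):
    """Import the owning package on demand and return the algorithm class."""
    try:
        module_name = SB3_ALGORITHMS[name]
    except KeyError:
        raise ValueError(f"unsupported algorithm: {name!r}") from None
    return getattr(import_module(module_name), name)
```

Importing lazily means sb3_contrib is only required when one of its algorithms is requested.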

Ensemblers / experiments

  • WalkForwardEnsembler (FinRL DRLEnsembleAgent port).
  • BestOfNRunner, CurriculumRunner, MetaEnsembleRunner.
  • BasicRLExperiment, WalkForwardRLExperiment, RewardAblationExperiment, RLAlphaBacktestExperiment.
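
Walk-forward evaluation, as used by the walk-forward ensembler and experiment above, repeatedly trains on past data, validates on the next slice, and trades the slice after that. The sketch below generates those index windows and picks the member with the best validation score for each trading window. Window sizes, the expanding-window default, and all names are assumptions for illustration; the actual components may choose windows and selection metrics differently.

```python
from typing import Dict, Iterator, NamedTuple


class Window(NamedTuple):
    train: range
    validate: range
    trade: range


def walk_forward_windows(n_steps: int, train_size: int, validate_size: int,
                         trade_size: int, expanding: bool = True
                         ) -> Iterator[Window]:
    """Yield successive (train, validate, trade) index ranges."""
    if min(train_size, validate_size, trade_size) <= 0:
        raise ValueError("window sizes must be positive")
    t = train_size + validate_size  # first index that is traded
    while t < n_steps:
        val_start = t - validate_size
        train_start = 0 if expanding else val_start - train_size
        yield Window(
            train=range(train_start, val_start),
            validate=range(val_start, t),
            trade=range(t, min(t + trade_size, n_steps)),
        )
        t += trade_size


def pick_member(validation_scores: Dict[str, float]) -> str:
    """Choose the ensemble member with the highest validation score."""
    return max(validation_scores, key=validation_scores.get)
```

For example, with 10 steps and sizes 4, 2, 2 the generator yields two windows, trading indices 6 to 7 and then 8 to 9, and each trading window would use whichever member pick_member selects from that window's validation scores.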