RL component reference
This page is a hand-written quick reference. The authoritative source is the live registry exposed by `GET /rl/components/{kind}` (and rendered in the UI at `/rl/library`).
Kinds
| `rl_kind` | Purpose | Base class |
|---|---|---|
| `rl_env` | Gymnasium env | `BaseRLEnv` |
| `rl_observation` | State featuriser | `BaseObservationBuilder` |
| `rl_action` | Action-space spec + transform | `BaseActionSpace` |
| `rl_reward` | Reward term / composite | `BaseRewardModel`, `RewardTerm` |
| `rl_termination` | End-of-episode predicate | `BaseTerminationCondition` |
| `rl_policy` | Frozen policy | `BasePolicy` |
| `rl_agent` | Train-aware agent | `BaseRLAgent` |
| `rl_data` | Data pipeline | `BaseDataPipeline` |
| `rl_ensembler` | Multi-member orchestrator | `BaseEnsembler` |
| `rl_experiment` | Experiment runner | `BaseExperiment` |
| `rl_trajectory_store` | Per-step persistence | `BaseTrajectoryStore` |
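To check this table against the live registry, a client can query one kind at a time. The sketch below only assumes the `GET /rl/components/{kind}` path stated above; the helper names `registry_url` and `fetch_components` are hypothetical, the base URL is whatever your deployment uses, and the response is assumed to be JSON of an undocumented shape, so it is returned undecoded beyond `json.load`.

```python
import json
import urllib.parse
import urllib.request

# Kinds copied from the table above; the live registry remains authoritative.
RL_KINDS = frozenset({
    "rl_env", "rl_observation", "rl_action", "rl_reward", "rl_termination",
    "rl_policy", "rl_agent", "rl_data", "rl_ensembler", "rl_experiment",
    "rl_trajectory_store",
})


def registry_url(base_url: str, kind: str) -> str:
    """Build the URL for GET /rl/components/{kind}, rejecting unknown kinds."""
    if kind not in RL_KINDS:
        raise ValueError(f"unknown rl_kind: {kind!r}")
    return f"{base_url.rstrip('/')}/rl/components/{urllib.parse.quote(kind)}"


def fetch_components(base_url: str, kind: str, timeout: float = 10.0):
    """Fetch one kind's listing. Assumes a JSON body; its shape is not documented here."""
    with urllib.request.urlopen(registry_url(base_url, kind), timeout=timeout) as resp:
        return json.load(resp)
```

Validating `kind` client-side against the table catches typos early, at the cost of rejecting any kind added to the registry after this page was written.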
Built-in components (FinRL + AlphaSwarm)
Environments
- `StockTradingEnv`: continuous portfolio (existing).
- `PortfolioAllocationEnv`: softmax weights (existing).
- `StockTradingDiscreteEnv`: single-stock buy/sell/hold (existing).
- `FinRLStockTradingEnv`: pandas share-lots (FinRL port).
- `FinRLStockTradingNpEnv`: array-backed NumPy (FinRL port).
- `FinRLPortfolioCovEnv`: covariance + softmax (FinRL port).
- `FinRLCryptoEnv`: multi-crypto lookback stack (FinRL port).
- `OptionsTradingEnv`, `ExecutionEnv`, `MarketMakingEnv`: placeholders.
Reward terms
- `PnLTerm`, `LogReturnTerm`
- `SharpeTerm`, `SortinoTerm`, `DrawdownPenaltyTerm`, `VolatilityPenaltyTerm`
- `TurnoverPenaltyTerm`, `TransactionCostTerm`, `SlippagePenaltyTerm`
- `TurbulenceGateTerm`, `MarginCallTerm`
- `CashIdlePenaltyTerm`, `BenchmarkOutperformanceTerm`, `RiskParityTerm`
- `PotentialBasedShaping`
- `CompositeReward` (sum of weighted terms; emits per-term contributions to `info["reward_terms"]`).
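The `CompositeReward` contract (a weighted sum of terms, with each term's contribution written to `info["reward_terms"]`) can be sketched as follows. `WeightedTerm` and `CompositeRewardSketch` are hypothetical stand-ins, not the real `RewardTerm` / `CompositeReward` signatures, and the step context is assumed to be a plain dict.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]  # hypothetical per-step context passed to every term


@dataclass
class WeightedTerm:
    """Hypothetical stand-in for a RewardTerm: a name, a weight and a scorer."""
    name: str
    weight: float
    fn: Callable[[Context], float]


@dataclass
class CompositeRewardSketch:
    """Weighted sum of terms; per-term contributions are recorded in info."""
    terms: List[WeightedTerm] = field(default_factory=list)

    def __call__(self, ctx: Context, info: Dict[str, Any]) -> float:
        contributions: Dict[str, float] = {}
        for term in self.terms:
            contributions[term.name] = term.weight * term.fn(ctx)
        # Same info key the built-in CompositeReward uses for per-term output.
        info["reward_terms"] = contributions
        return sum(contributions.values())


# Usage: PnL minus half the turnover.
reward = CompositeRewardSketch([
    WeightedTerm("pnl", 1.0, lambda c: c["value"] - c["prev_value"]),
    WeightedTerm("turnover_penalty", 0.5, lambda c: -c["turnover"]),
])
info: Dict[str, Any] = {}
r = reward({"value": 101.0, "prev_value": 100.0, "turnover": 0.2}, info)
# r is about 0.9: 1.0 from "pnl" plus -0.1 from "turnover_penalty"
```

Logging contributions per term, rather than only the total, is what makes reward ablations and debugging of a single dominant term practical.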
Observation builders
- `PortfolioStateBuilder` (cash + weights / positions)
- `TechnicalIndicatorBuilder` (FinRL stockstats)
- `CovarianceBuilder` (FinRL portfolio cov)
- `TurbulenceBuilder` (Mahalanobis stress)
- `VIXBuilder`
- `LookbackStackBuilder` (FinRL crypto)
- `FundamentalBuilder` (FinRobot bridge)
- `MicrostructureBuilder`
- `StackedObservationBuilder` (composite)
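The "Mahalanobis stress" feature is, in FinRL's ensemble strategy, a turbulence index: the squared Mahalanobis distance of today's per-asset returns from their historical mean and covariance. A minimal NumPy sketch, assuming a `(T, N)` history of returns; `turbulence_index` is a hypothetical helper, and `TurbulenceBuilder`'s actual window length and normalisation may differ.

```python
import numpy as np


def turbulence_index(history: np.ndarray, current: np.ndarray) -> float:
    """Squared Mahalanobis distance of `current` from the historical distribution.

    history: (T, N) array of past per-asset returns (T >= 2)
    current: (N,) array of the latest per-asset returns
    """
    mu = history.mean(axis=0)
    cov = np.atleast_2d(np.cov(history, rowvar=False))  # (N, N), also for N == 1
    diff = current - mu
    # The pseudo-inverse tolerates a singular covariance (short window, many assets).
    return float(diff @ np.linalg.pinv(cov) @ diff)
```

Large values flag days whose cross-asset return pattern is unusual relative to the window, which is what a gate such as `TurbulenceGateTerm` or `TurbulenceTermination` would presumably threshold on.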
Action spaces
`ContinuousWeightsAction`, `SoftmaxWeightsAction`, `IntegerSharesAction`, `DiscreteBuySellHoldAction`, `MultiDiscreteAction`, `TargetPositionAction`.
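As its name suggests, an action space like `SoftmaxWeightsAction` turns an unconstrained policy output into long-only portfolio weights that sum to 1. The sketch below shows only that transform; `softmax_weights` is a hypothetical helper, and whether the real component includes a cash slot is not assumed here.

```python
import numpy as np


def softmax_weights(raw_action) -> np.ndarray:
    """Map an unconstrained action vector to non-negative weights summing to 1."""
    z = np.asarray(raw_action, dtype=float)
    z = z - z.max()  # shift for numerical stability; softmax is shift-invariant
    e = np.exp(z)
    return e / e.sum()
```

Subtracting the maximum (out of place, so the caller's array is untouched) keeps `np.exp` from overflowing when the policy emits large logits.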
Termination conditions
`HorizonTermination`, `DrawdownTermination`, `MarginCallTermination`, `TurbulenceTermination`.
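A termination condition is an end-of-episode predicate evaluated every step. As an illustration of the idea behind `DrawdownTermination`, the hypothetical `DrawdownTerminationSketch` below ends the episode once the portfolio has fallen a given fraction from its running peak; it assumes strictly positive portfolio values and is not the built-in class's interface.

```python
from dataclasses import dataclass


@dataclass
class DrawdownTerminationSketch:
    """Hypothetical predicate: True once drawdown from the running peak hits the limit."""
    max_drawdown: float          # e.g. 0.2 ends the episode after a 20% fall from peak
    peak: float = float("-inf")

    def reset(self) -> None:
        """Clear the running peak at the start of each episode."""
        self.peak = float("-inf")

    def __call__(self, portfolio_value: float) -> bool:
        # Assumes portfolio_value > 0, so the ratio below is well defined.
        self.peak = max(self.peak, portfolio_value)
        drawdown = 1.0 - portfolio_value / self.peak
        return drawdown >= self.max_drawdown


term = DrawdownTerminationSketch(max_drawdown=0.2)
flags = [term(v) for v in (100.0, 110.0, 95.0, 85.0)]
```

The predicate is stateful (it tracks the peak), so `reset()` must be called alongside the environment's own reset.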
Data pipelines
- `IcebergRLDataPipeline` (default; AlphaSwarm catalog).
- `YahooFinanceRLDataPipeline` (FinRL parity).
- `AlpacaRLDataPipeline` (paper-trading bridge).
- `LiveStreamingRLDataPipeline` (Kafka / Flink).
- `ReplayRLDataPipeline` (offline RL from `rl.trajectories`).
Agents
- `SB3Adapter`: PPO / A2C / DDPG / SAC / TD3 / DQN, plus sb3-contrib (RecurrentPPO / TRPO / QRDQN / MaskablePPO / ARS / TQC).
- `ElegantRLAdapter`, `RayRLlibAdapter`, `CleanRLAdapter`.
- `LLMHybridAgent`: FinRobot-style LLM advisor + RL backbone.
- The existing classical / Q-family / actor-critic / evolutionary / SPM trees are retained.
Ensemblers / experiments
- `WalkForwardEnsembler` (port of FinRL's `DRLEnsembleAgent`).
- `BestOfNRunner`, `CurriculumRunner`, `MetaEnsembleRunner`.
- `BasicRLExperiment`, `WalkForwardRLExperiment`, `RewardAblationExperiment`, `RLAlphaBacktestExperiment`.
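In FinRL's ensemble strategy, which `DRLEnsembleAgent` implements, several agents are retrained each window, scored on a validation window, and the one with the best validation Sharpe ratio trades the next window. The sketch below shows only the windowing and selection logic; `walk_forward_windows`, `sharpe` and `pick_member` are hypothetical helpers, and it uses a fixed-length rolling training window where an expanding window is an equally common choice.

```python
import statistics
from typing import Dict, Iterator, List, Tuple

Window = Tuple[range, range, range]  # (train, validate, trade) step indices


def walk_forward_windows(n_steps: int, train: int, validate: int, trade: int) -> Iterator[Window]:
    """Yield consecutive train/validate/trade windows, rolling forward by `trade` steps."""
    start = 0
    while start + train + validate + trade <= n_steps:
        v0 = start + train
        t0 = v0 + validate
        yield range(start, v0), range(v0, t0), range(t0, t0 + trade)
        start += trade


def sharpe(returns: List[float]) -> float:
    """Unannualised Sharpe, zero risk-free rate; needs >= 2 returns. Scale is irrelevant for ranking."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def pick_member(validation_returns: Dict[str, List[float]]) -> str:
    """Return the member whose validation-window returns have the highest Sharpe."""
    return max(validation_returns, key=lambda name: sharpe(validation_returns[name]))


windows = list(walk_forward_windows(n_steps=10, train=4, validate=2, trade=2))
best = pick_member({"ppo": [0.01, 0.02, 0.015], "a2c": [0.05, -0.04, 0.01]})
```

Because each trade window follows its validation window, the selected member is always scored on data it was not trained on, which is the point of the walk-forward design.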