RL policy backbones

Transformer, RNN, Autoencoder, and PatchTST feature trunks for the AlphaSwarm RL policies. Each backbone is registered through the RLComponent metaclass with rl_kind='rl_policy_backbone'.
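As a rough illustration of metaclass-based registration, the sketch below records classes under an rl_kind and builds them by alias. Every name in it (ComponentMeta, REGISTRY, Backbone, ToyBackbone, build_from_alias) is hypothetical and chosen for illustration only; the actual RLComponent metaclass may work differently.

```python
# Hypothetical sketch: none of these names are AlphaSwarm's actual API.
REGISTRY = {}  # maps rl_kind -> {class name -> class}


class ComponentMeta(type):
    """Metaclass that records each class declaring rl_kind in its own body."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        kind = namespace.get("rl_kind")
        # Classes without their own rl_kind (such as the base below) are skipped.
        if kind is not None:
            REGISTRY.setdefault(kind, {})[name] = cls
        return cls


class Backbone(metaclass=ComponentMeta):
    """Base class; declares no rl_kind, so it is not registered."""


class ToyBackbone(Backbone):
    rl_kind = "rl_policy_backbone"

    def __init__(self, input_features, sequence_length, output_dim):
        self.input_features = input_features
        self.sequence_length = sequence_length
        self.output_dim = output_dim


def build_from_alias(alias, **kwargs):
    """Look up a registered backbone class by name and instantiate it."""
    try:
        cls = REGISTRY["rl_policy_backbone"][alias]
    except KeyError:
        raise KeyError(f"unknown backbone alias: {alias!r}") from None
    return cls(**kwargs)


trunk = build_from_alias(
    "ToyBackbone", input_features=20, sequence_length=30, output_dim=128
)
```

Registering at class-creation time means a new backbone becomes addressable by alias as soon as its module is imported, with no separate registration call.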

Backbones

| Class | Source | Use case |
| --- | --- | --- |
| TransformerBackbone | Self-attention encoder over the lookback window | Default for medium sequence (30-100 bars) |
| RecurrentBackbone | LSTM / GRU / RNN cell (configurable) | Causal, memory-efficient, anti-bidirectional default |
| AutoencoderBackbone | MLP encoder bottleneck | High-dim observation (1000+ features) compression |
| PatchTSTBackbone | Patch-tokenised Transformer (Nie 2023) | Long-horizon (252+ bars); avoids token explosion |
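To make the "token explosion" point concrete, here is a minimal sketch that compares the number of tokens, and the number of pairwise attention scores, for per-bar tokens versus patch tokens. The patch_len and stride values are illustrative assumptions, not confirmed PatchTSTBackbone defaults, and the sketch ignores any end-of-sequence padding a real implementation might add.

```python
def num_patches(sequence_length, patch_len, stride):
    """Count sliding-window patches of patch_len taken every stride steps (no padding)."""
    if sequence_length < patch_len:
        raise ValueError("sequence_length must be at least patch_len")
    return (sequence_length - patch_len) // stride + 1


def attention_pairs(n_tokens):
    """Self-attention scores every token against every token: n^2 pairs."""
    return n_tokens * n_tokens


bars = 252  # one trading year of daily bars

# One token per bar versus one token per patch (illustrative patch settings).
per_bar_tokens = bars
patch_tokens = num_patches(bars, patch_len=16, stride=8)

print(per_bar_tokens, attention_pairs(per_bar_tokens))  # 252 63504
print(patch_tokens, attention_pairs(patch_tokens))      # 30 900
```

Under these assumed settings the attention matrix shrinks by roughly a factor of 70, which is why patching suits long lookback windows.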

Wiring through SB3

agent:
  class: SB3Adapter
  module_path: alphaswarm.rl.agents.sb3_adapter
  kwargs:
    algorithm: PPO
    policy: MlpPolicy
    policy_kwargs:
      features_extractor_class: alphaswarm.rl.policies.feature_extractors.BackboneFeaturesExtractor
      features_extractor_kwargs:
        backbone_alias: TransformerBackbone
        sequence_length: 30
        input_features: 32
        features_dim: 128
        backbone_kwargs:
          n_heads: 4
          n_layers: 2
          d_ff: 256
          dropout: 0.1
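Stable-Baselines3 expects features_extractor_class in policy_kwargs to be a class object, whereas the spec above gives it as a dotted string. Presumably the adapter resolves the string before constructing the algorithm; the sketch below shows one way that could be done. The helper names resolve_dotted_path and prepare_policy_kwargs are hypothetical, and a standard-library class stands in for the extractor so the example runs without AlphaSwarm installed.

```python
import importlib


def resolve_dotted_path(path):
    """Import 'package.module.Name' and return the Name attribute."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def prepare_policy_kwargs(policy_kwargs):
    """Return a copy in which a string features_extractor_class becomes a class."""
    prepared = dict(policy_kwargs)
    extractor = prepared.get("features_extractor_class")
    if isinstance(extractor, str):
        prepared["features_extractor_class"] = resolve_dotted_path(extractor)
    return prepared


# Stand-in path: collections.OrderedDict replaces the real extractor class here.
example = prepare_policy_kwargs(
    {
        "features_extractor_class": "collections.OrderedDict",
        "features_extractor_kwargs": {"features_dim": 128},
    }
)
```

The prepared dictionary could then be passed as policy_kwargs to an SB3 algorithm such as PPO, which instantiates the extractor with the observation space plus features_extractor_kwargs.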

Wiring through CleanRL

The CleanRLAdapter wraps the backbone via build_backbone_from_alias:

from alphaswarm.rl.policies import build_backbone_from_alias

trunk = build_backbone_from_alias(
    "RecurrentBackbone",
    input_features=20,
    sequence_length=30,
    output_dim=128,
    backbone_kwargs={"cell": "lstm", "hidden_size": 128, "num_layers": 2},
)

Shipped example specs

Four reference specs live under configs/rl/policies/:

Adding a new backbone

See the cursor rule for the canonical checklist.

See also