
RL FinAgent Layered Reflection Adapter (Phase 10)

Reference documentation for the FinAgent multimodal LLM-hybrid agent, ported into alphaswarm_rl following Zhang (AAAI 24).

Five-stage cascade

| # | Stage | YAML | Purpose |
|---|-------|------|---------|
| 1 | low_intelligence | configs/agents/finagent/low_intelligence.yaml | Factual 2-3 sentence market read |
| 2 | high_intelligence | configs/agents/finagent/high_intelligence.yaml | Strategic outlook + bias |
| 3 | low_reflection | configs/agents/finagent/low_reflection.yaml | 1-bar post-mortem |
| 4 | high_reflection | configs/agents/finagent/high_reflection.yaml | k-bar strategic post-mortem |
| 5 | decision | configs/agents/finagent/decision.yaml | Final SELL/HOLD/BUY |

Each stage's LLM call routes through router_complete (hard rule 2). If the router is unavailable or any stage fails, the adapter degrades gracefully and returns HOLD.
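The cascade and its HOLD fallback can be sketched roughly as follows. This is an illustrative outline only, not the adapter's actual code: the `complete` callable is a hypothetical stand-in for router_complete (whose real signature is not documented here), and the way context is threaded between stages is an assumption.

```python
from typing import Callable, Dict, List

# Stage names and action codes are taken from this page; everything else is assumed.
STAGES: List[str] = [
    "low_intelligence",
    "high_intelligence",
    "low_reflection",
    "high_reflection",
    "decision",
]
ACTIONS: Dict[str, int] = {"SELL": 0, "HOLD": 1, "BUY": 2}


def run_cascade(complete: Callable[[str, str], str], market_summary: str) -> int:
    """Run the five stages in order; return HOLD if anything goes wrong."""
    context = market_summary
    reply = ""
    try:
        for stage in STAGES:
            # Each stage sees the accumulated context and appends its own output.
            reply = complete(stage, context)
            context = f"{context}\n[{stage}] {reply}"
        # After the loop, `reply` holds the decision stage's output.
        return ACTIONS[reply.strip().upper()]
    except Exception:
        # Router unavailable, stage error, or unparseable decision: degrade to HOLD.
        return ACTIONS["HOLD"]


def fake_router(stage: str, prompt: str) -> str:
    """Hypothetical stub that answers BUY at the decision stage."""
    return "BUY" if stage == "decision" else f"notes from {stage}"


def broken_router(stage: str, prompt: str) -> str:
    """Hypothetical stub that simulates an unavailable router."""
    raise ConnectionError("router unavailable")


print(run_cascade(fake_router, "BTC closed up on the day"))
print(run_cascade(broken_router, "BTC closed up on the day"))
```

Catching a broad `Exception` is deliberate in this sketch, since the stated behaviour is that any stage failure resolves to HOLD rather than propagating.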

Three tools

| Tool | File | Purpose |
|------|------|---------|
| KlinePlotterTool | alphaswarm/agents/tools/finagent/kline_plotter.py | Summarise bars → text |
| TradingPlotterTool | alphaswarm/agents/tools/finagent/trading_plotter.py | Summarise action history → text |
| StrategyAgentsTool | alphaswarm/agents/tools/finagent/strategy_agents_tool.py | Query another RL agent's decision |
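To give a feel for what "summarise bars → text" might involve, here is a minimal sketch. It is not KlinePlotterTool's implementation: the function name `summarise_bars`, the `Bar` fields, and the output wording are all assumptions made for illustration.

```python
from typing import Sequence, TypedDict


class Bar(TypedDict):
    """Hypothetical OHLC bar; the real tool's input schema may differ."""
    open: float
    high: float
    low: float
    close: float


def summarise_bars(bars: Sequence[Bar]) -> str:
    """Turn a window of bars into one line of text for an LLM prompt."""
    if not bars:
        # Empty input still yields a usable string instead of raising.
        return "No bars available."
    first_open = bars[0]["open"]
    last_close = bars[-1]["close"]
    # Guard against a zero opening price to avoid division by zero.
    change_pct = (last_close - first_open) / first_open * 100 if first_open else 0.0
    highest = max(bar["high"] for bar in bars)
    lowest = min(bar["low"] for bar in bars)
    return (
        f"{len(bars)} bars: open {first_open:.2f}, close {last_close:.2f} "
        f"({change_pct:+.2f}%), range {lowest:.2f}-{highest:.2f}."
    )


window: Sequence[Bar] = [
    {"open": 100.0, "high": 105.0, "low": 99.0, "close": 102.0},
    {"open": 102.0, "high": 110.0, "low": 101.0, "close": 108.0},
]
print(summarise_bars(window))
print(summarise_bars([]))
```

Returning a fixed sentence for an empty window keeps the downstream prompt well-formed, which lines up with the requirement that every tool handle both valid and empty inputs.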

Modules

| File | Class | Purpose |
|------|-------|---------|
| alphaswarm_rl/src/alphaswarm_rl/agents/llm_hybrid_layered.py | LayeredReflectionAdapter | 5-stage prompt cascade |
| alphaswarm_rl/src/alphaswarm_rl/envs/tradesim_multimodal.py | MultimodalTradingEnv | FinAgent-style dict observation |

Usage

from alphaswarm_rl.agents.llm_hybrid_layered import LayeredReflectionAdapter

adapter = LayeredReflectionAdapter(
    llm_model="ollama/llama3",
    rl_weight=0.5,  # blend 50% with RL backbone
    rl_agent={"class": "ppo_inhouse"},
)
adapter.build(env)
action, _ = adapter.predict(obs)  # int in {0=SELL, 1=HOLD, 2=BUY}

# Between predicts, update the memory so reflection stages have something
# to critique:
adapter.update_realised_pnl(realised_short=0.01, realised_k=0.02)
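How rl_weight combines the LLM decision with the RL backbone is not spelled out on this page. One plausible reading, shown here purely as an assumption, is a convex combination of the RL policy's action probabilities with a one-hot vector for the LLM's choice, followed by an argmax. The helper `blend_action` is hypothetical and not part of the adapter's API.

```python
from typing import List, Sequence


def blend_action(llm_action: int, rl_probs: Sequence[float], rl_weight: float) -> int:
    """Blend a discrete LLM action with RL action probabilities (assumed scheme)."""
    if not 0.0 <= rl_weight <= 1.0:
        raise ValueError("rl_weight must lie in [0, 1]")
    # Represent the LLM decision as a one-hot distribution over the actions.
    llm_probs: List[float] = [1.0 if i == llm_action else 0.0 for i in range(len(rl_probs))]
    # Convex combination: rl_weight on the RL policy, the remainder on the LLM.
    blended = [
        rl_weight * rl_p + (1.0 - rl_weight) * llm_p
        for rl_p, llm_p in zip(rl_probs, llm_probs)
    ]
    # Pick the highest-scoring action; ties resolve to the lowest index.
    return max(range(len(blended)), key=blended.__getitem__)


# Action codes follow this page: 0=SELL, 1=HOLD, 2=BUY.
print(blend_action(llm_action=2, rl_probs=[0.7, 0.2, 0.1], rl_weight=0.5))
print(blend_action(llm_action=2, rl_probs=[0.9, 0.05, 0.05], rl_weight=0.8))
```

Under this scheme, rl_weight=0.0 always follows the LLM and rl_weight=1.0 always follows the RL policy's most likely action.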

Hard rule alignment

  • Hard rule 2: every LLM call routes through router_complete.
  • Hard rule 12: each stage is a separate AgentRuntime invocation (see the YAMLs' model: blocks).
  • Hard rule 19: adapter registers via RLComponent metaclass under rl_alias='finagent_layered'.
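For readers unfamiliar with metaclass-based registration (hard rule 19), the general pattern looks roughly like this. The names `RLComponentMeta`, `RL_REGISTRY`, `BaseAdapter`, and `DemoLayeredAdapter` are illustrative assumptions; the real RLComponent metaclass in alphaswarm_rl may behave differently.

```python
from typing import Any, Dict

# Hypothetical registry mapping alias -> class.
RL_REGISTRY: Dict[str, type] = {}


class RLComponentMeta(type):
    """Sketch of a metaclass that registers classes declaring an rl_alias."""

    def __new__(mcls, name: str, bases: tuple, namespace: Dict[str, Any], **kwargs: Any):
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        # Only classes that define rl_alias in their own body are registered,
        # so abstract bases and subclasses that merely inherit it are skipped.
        alias = namespace.get("rl_alias")
        if alias is not None:
            if alias in RL_REGISTRY:
                raise ValueError(f"duplicate rl_alias: {alias!r}")
            RL_REGISTRY[alias] = cls
        return cls


class BaseAdapter(metaclass=RLComponentMeta):
    """Base class without an alias; not registered."""


class DemoLayeredAdapter(BaseAdapter):
    """Stand-in for an adapter registered under the alias used on this page."""
    rl_alias = "finagent_layered"


print(sorted(RL_REGISTRY))
```

Registration happens at class-definition time, so importing the module that defines the adapter is enough to make its alias resolvable.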

Acceptance

Phase 10 tests verify:

  • Each of the 5 stages invokes router_complete exactly once.
  • The decision JSON is parsed correctly into an integer action.
  • Memory updates persist between calls.
  • The cascade degrades to HOLD on LLM failure.
  • All 3 tools handle both valid and empty inputs.
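Two of the checks above, the once-per-stage call count and the decision parsing, could be exercised along these lines. This is a sketch rather than the actual Phase 10 test suite: the JSON key "action", the `parse_decision` helper, and the `CountingRouter` stub are assumptions.

```python
import json
from collections import Counter
from typing import Dict, List

ACTIONS: Dict[str, int] = {"SELL": 0, "HOLD": 1, "BUY": 2}
STAGES: List[str] = [
    "low_intelligence",
    "high_intelligence",
    "low_reflection",
    "high_reflection",
    "decision",
]


def parse_decision(raw: str) -> int:
    """Map a decision payload to an action int, falling back to HOLD."""
    try:
        payload = json.loads(raw)
        # The "action" key is an assumed schema, not confirmed by the docs.
        return ACTIONS[str(payload["action"]).strip().upper()]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing key, unknown label, or non-dict payload.
        return ACTIONS["HOLD"]


class CountingRouter:
    """Hypothetical stub router that records how often each stage calls it."""

    def __init__(self) -> None:
        self.calls: Counter = Counter()

    def __call__(self, stage: str, prompt: str) -> str:
        self.calls[stage] += 1
        return '{"action": "SELL"}' if stage == "decision" else "ok"


router = CountingRouter()
outputs = [router(stage, "prompt") for stage in STAGES]

assert all(router.calls[stage] == 1 for stage in STAGES)
assert parse_decision(outputs[-1]) == ACTIONS["SELL"]
assert parse_decision("not json") == ACTIONS["HOLD"]
```

A callable stub with a `Counter` keeps the call-count assertion independent of any mocking library.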