RL PRUDEX-Compass Evaluation (Phase 9)

Reference docs for the PRUDEX-Compass evaluation framework ported from TradeMaster into alphaswarm_rl.

Six axes, 17 measures

| Axis | Code | Measures |
|---|---|---|
| Profitability | P | total_return, annualised_return, cagr |
| Risk-control | R | volatility, max_drawdown, sortino, calmar |
| Universality | U | cross_dataset_sharpe_mean, cross_dataset_sharpe_std |
| Diversification | D | portfolio_weight_entropy, turnover |
| Explainability | E | regime_conditioned_sharpe |
| X-tra evaluation | X | performance_profile_auc, rank_score, extreme_market_score, hit_rate |

Plus a sharpe_ratio convenience field, for 17 measures in total (16 axis measures + sharpe_ratio).
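The axis table maps one-to-one onto a lookup structure. The sketch below encodes it as a plain dict so that the six-axis breakdown and the 17-measure count can be checked mechanically. PRUDEX_AXES, all_measures and by_axis are hypothetical names chosen for illustration; they are not the PrudexMetrics/PrudexReport API.

```python
# Hypothetical sketch: the axis-to-measure layout from the table above.
# Measure names come from this page; the dict and helpers are illustrative only.
PRUDEX_AXES: dict[str, tuple[str, ...]] = {
    "P": ("total_return", "annualised_return", "cagr"),
    "R": ("volatility", "max_drawdown", "sortino", "calmar"),
    "U": ("cross_dataset_sharpe_mean", "cross_dataset_sharpe_std"),
    "D": ("portfolio_weight_entropy", "turnover"),
    "E": ("regime_conditioned_sharpe",),
    "X": ("performance_profile_auc", "rank_score", "extreme_market_score", "hit_rate"),
}
CONVENIENCE_FIELDS = ("sharpe_ratio",)


def all_measures() -> list[str]:
    """Flatten the axis table and append the convenience field."""
    names = [m for measures in PRUDEX_AXES.values() for m in measures]
    return names + list(CONVENIENCE_FIELDS)


def by_axis(metrics: dict[str, float]) -> dict[str, dict[str, float]]:
    """Group a flat {measure: value} dict into the six PRUDEX axes."""
    return {
        axis: {m: metrics[m] for m in measures if m in metrics}
        for axis, measures in PRUDEX_AXES.items()
    }
```

With this layout, len(all_measures()) gives 17 and by_axis always returns exactly the six keys P, R, U, D, E and X.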

Five visualisations

| Helper | Purpose |
|---|---|
| pride_star_chart | 8-axis radar of per-agent scores |
| prudex_compass_chart | 6-axis radar chart (one axis per PRUDEX axis) |
| performance_profile_chart | CDF of per-step returns across agents |
| rank_distribution_chart | Heatmap of per-metric ranks |
| extreme_market_chart | Bar chart of extreme-market cumulative returns |

When matplotlib is unavailable, every helper degrades gracefully and returns a dict fallback instead of a Figure.
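The fallback behaviour can be illustrated with a standalone sketch of a CDF-of-returns chart. ecdf and performance_profile are hypothetical names invented here, and the dict layout ("chart", "series") is an assumption; this does not reproduce the actual performance_profile_chart signature or its return format.

```python
from typing import Sequence


def ecdf(returns: Sequence[float]) -> tuple[list[float], list[float]]:
    """Sorted returns paired with cumulative fractions (i + 1) / n for a step plot."""
    xs = sorted(returns)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def performance_profile(returns_by_agent: dict[str, Sequence[float]]) -> object:
    """Return a matplotlib Figure, or a plain dict of the same data without matplotlib."""
    series = {agent: ecdf(r) for agent, r in returns_by_agent.items()}
    try:
        import matplotlib
        matplotlib.use("Agg")  # headless backend, no display required
        import matplotlib.pyplot as plt
    except ImportError:
        # Graceful degradation: hand back the data instead of a Figure.
        return {"chart": "performance_profile", "series": series}
    fig, ax = plt.subplots()
    for agent, (xs, ys) in series.items():
        ax.step(xs, ys, where="post", label=agent)
    ax.set_xlabel("per-step return")
    ax.set_ylabel("fraction of steps")
    ax.legend()
    return fig
```

Computing the data before attempting the import keeps both code paths returning the same underlying series, so callers that only need the numbers can use either result.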

Modules

| File | Classes / functions | Purpose |
|---|---|---|
| alphaswarm_rl/src/alphaswarm_rl/evaluation/prudex_compass.py | PrudexMetrics, PrudexReport, compute_prudex_metrics | Per-agent metric computation |
| alphaswarm_rl/src/alphaswarm_rl/evaluation/visualizations.py | 5 chart helpers | Plot rendering |
| alphaswarm_rl/src/alphaswarm_rl/experiments/prudex_evaluation.py | PrudexEvaluation | Experiment aggregator |
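compute_prudex_metrics handles per-agent metric computation. The standalone sketch below shows one plausible way to derive several of the table's measures from one equity curve and weights history. All function names are hypothetical, and the formula choices (arithmetic annualisation, population standard deviation, zero-target downside deviation, turnover without a 1/2 factor, entropy in nats, max drawdown as a positive fraction) are assumptions, not confirmed from prudex_compass.py. The equity curve needs at least two points.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def step_returns(equity: Sequence[float]) -> list[float]:
    """Simple per-step returns r_t = E_t / E_{t-1} - 1."""
    return [equity[i] / equity[i - 1] - 1.0 for i in range(1, len(equity))]


def _div(a: float, b: float) -> float:
    """Division that yields NaN instead of raising when the denominator is 0."""
    return a / b if b != 0 else float("nan")


def core_metrics(equity: Sequence[float], periods_per_year: int = 252) -> dict[str, float]:
    """Profitability/risk measures plus sharpe_ratio and hit_rate for one agent."""
    r = step_returns(equity)
    n, ppy = len(r), periods_per_year
    mu, sd = fmean(r), pstdev(r)  # population std (ddof=0), an assumption
    growth = equity[-1] / equity[0]
    cagr = growth ** (ppy / n) - 1.0
    # Max drawdown as a positive fraction of the running peak.
    peak, mdd = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        mdd = max(mdd, 1.0 - v / peak)
    # Downside deviation with a 0 target: only negative returns contribute.
    downside = math.sqrt(fmean([min(x, 0.0) ** 2 for x in r]))
    return {
        "total_return": growth - 1.0,
        "annualised_return": mu * ppy,  # arithmetic annualisation (assumption)
        "cagr": cagr,
        "volatility": sd * math.sqrt(ppy),
        "max_drawdown": mdd,
        "sortino": _div(mu * ppy, downside * math.sqrt(ppy)),
        "calmar": _div(cagr, mdd),
        "sharpe_ratio": _div(mu, sd) * math.sqrt(ppy),
        "hit_rate": sum(x > 0 for x in r) / n,
    }


def weight_entropy(weights_history: Sequence[Sequence[float]]) -> float:
    """Mean Shannon entropy (nats) of the portfolio weights at each step."""
    return fmean([-sum(w * math.log(w) for w in row if w > 0) for row in weights_history])


def turnover(weights_history: Sequence[Sequence[float]]) -> float:
    """Mean L1 change in weights between consecutive steps (no 1/2 factor)."""
    if len(weights_history) < 2:
        return 0.0
    pairs = zip(weights_history[:-1], weights_history[1:])
    return fmean([sum(abs(a - b) for a, b in zip(prev, cur)) for prev, cur in pairs])


def cross_dataset_sharpe(sharpes: Sequence[float]) -> dict[str, float]:
    """Universality: spread of one agent's Sharpe ratio across datasets."""
    return {
        "cross_dataset_sharpe_mean": fmean(sharpes),
        "cross_dataset_sharpe_std": pstdev(sharpes),
    }
```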

Usage

```python
from alphaswarm_rl.experiments.prudex_evaluation import PrudexEvaluation
from alphaswarm_rl.evaluation.visualizations import (
    prudex_compass_chart, pride_star_chart, performance_profile_chart,
)

# eq_* are per-agent equity curves and w_* are per-agent portfolio-weight
# histories, prepared before this snippet runs.
exp = PrudexEvaluation(periods_per_year=252)  # 252 trading periods per year
report = exp.run(
    agent_results={
        "eiie": {"equity_curve": eq_eiie, "weights_history": w_eiie},
        "deeptrader": {"equity_curve": eq_dt, "weights_history": w_dt},
        "ppo": {"equity_curve": eq_ppo, "weights_history": w_ppo},
    },
)

# Visualise: returns a matplotlib Figure, or a dict fallback without matplotlib.
fig = prudex_compass_chart(report)
```
Hard rule alignment

  • Hard rule 19: PrudexEvaluation registers itself through the RLComponent metaclass under rl_alias='prudex_compass'.
  • Hard rule 18: the report is persisted to rl_runs.result_summary by the parent RLRuntime; this experiment makes no direct Iceberg writes.
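Hard rule 19 relies on registration happening at class-definition time. The RLComponent metaclass itself is not shown on this page, so the following is only a generic sketch of the metaclass-registry pattern; RegistryMeta, ComponentBase and PrudexEvaluationSketch are hypothetical stand-ins, not the alphaswarm_rl implementation.

```python
class RegistryMeta(type):
    """Metaclass that records every class declaring a non-None rl_alias."""

    registry: dict[str, type] = {}

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        alias = namespace.get("rl_alias")  # only aliases declared on this class
        if alias is not None:
            if alias in mcls.registry:
                raise ValueError(f"duplicate rl_alias {alias!r}")
            mcls.registry[alias] = cls
        return cls


class ComponentBase(metaclass=RegistryMeta):
    rl_alias = None  # the base class itself is not registered


class PrudexEvaluationSketch(ComponentBase):
    rl_alias = "prudex_compass"
```

Because registration happens when the class body executes, importing the module is enough to make the alias resolvable; a runtime can then look up RegistryMeta.registry["prudex_compass"] without importing the class by name.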

Acceptance

Phase 9 tests verify:

  • All 17 measures compute without error on synthetic equity series.
  • The per-axis breakdown has exactly 6 axes (P/R/U/D/E/X).
  • All 5 visualisation helpers return either a matplotlib Figure or the dict fallback.
  • Every entry of the rank matrix lies in [1, N_agents] for each metric.
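The rank-matrix check can be expressed directly. This sketch assumes rank 1 is best, that each metric's direction is supplied explicitly (for example, lower max_drawdown is better when it is stored as a positive fraction), and that ties are broken by agent name. rank_matrix is a hypothetical helper and the sample scores are illustrative values, not results from real runs.

```python
from typing import Mapping


def rank_matrix(
    scores: Mapping[str, Mapping[str, float]],
    higher_is_better: Mapping[str, bool],
) -> dict[str, dict[str, int]]:
    """Rank agents per metric: 1 = best, N_agents = worst (ties broken by agent name)."""
    agents = sorted(scores)  # alphabetical order; stable sort keeps it for ties
    ranks: dict[str, dict[str, int]] = {}
    for metric, better_high in higher_is_better.items():
        ordered = sorted(agents, key=lambda a: scores[a][metric], reverse=better_high)
        ranks[metric] = {agent: i + 1 for i, agent in enumerate(ordered)}
    return ranks


# Illustrative scores only.
scores = {
    "eiie": {"sharpe_ratio": 1.2, "max_drawdown": 0.25},
    "deeptrader": {"sharpe_ratio": 0.8, "max_drawdown": 0.15},
    "ppo": {"sharpe_ratio": 1.5, "max_drawdown": 0.30},
}
direction = {"sharpe_ratio": True, "max_drawdown": False}
ranks = rank_matrix(scores, direction)

# The acceptance property: every rank lies in [1, N_agents].
n_agents = len(scores)
assert all(1 <= r <= n_agents for per_metric in ranks.values() for r in per_metric.values())
```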