Optimal-control / HJB math layer

Audience: quants extending AlphaSwarm with optimal-execution or market-making models, plus AI agents that need to reason about the closed-form solvers.

The optimal-control package — alphaswarm/optimal_control/ — hosts the JAX-compiled implementations of two canonical Hamilton-Jacobi-Bellman problems:

Avellaneda-Stoikov 2008 market making — alphaswarm/optimal_control/avellaneda_stoikov.py.
Cartea-Jaimungal-Penalva 2015 inventory-penalised optimal liquidation — alphaswarm/optimal_control/cartea_jaimungal.py.

The convenience layer alphaswarm/optimal_control/hjb_solver.py exposes solve_avst / solve_cj / value_function_to_arrow so the analysis-flow runner can dispatch them uniformly and persist the results to alphaswarm_gold_analysis_optimal_control per AGENTS rule 21.

Where to invoke

Three call sites cover almost every use case.

1. Direct Python API

from alphaswarm.optimal_control import compute_optimal_quotes, solve_avst

# Single-point AvSt quotes — pure JIT-compiled JAX path.
res = compute_optimal_quotes(
    mid_price=100.0,
    inventory=10.0,
    gamma=0.1,
    sigma=0.02,
    k=1.5,
    T_minus_t=1.0,
)
print(res.bid, res.ask, res.half_spread)

# Inventory grid via vmap.
out = solve_avst(
    mid_price=100.0,
    inventory_grid=[-50, -25, 0, 25, 50],
    gamma=0.1, sigma=0.02, k=1.5, T_minus_t=1.0,
)

2. Analysis flows (preferred — gives you UI form + Iceberg persistence)

from alphaswarm.analysis import run_flow

result = run_flow(
    "optimal_control.avellaneda_stoikov_quotes",
    None,
    {
        "mid_price": 100.0,
        "inventory_min": -50.0,
        "inventory_max": 50.0,
        "inventory_step": 5.0,
        "gamma": 0.1, "sigma": 0.01, "k": 1.5, "T_minus_t": 1.0,
    },
)

The flow is a thin facade over solve_avst and writes its rows to the gold-tier alphaswarm_gold_analysis_optimal_control.<table> namespace when invoked through AnalysisRuntime.

3. Agent-callable DataMCPTool

# inside an AgentSpec body the tool surfaces as ``data.optimal_control.solve_hjb``
result = ctx.tools["data.optimal_control.solve_hjb"].invoke(
    ctx=mcp_ctx, model="avst", mid_price=100.0, inventory=10.0,
    gamma=0.1, sigma=0.01, k=1.5, T_minus_t=1.0,
)

The tool is registered in alphaswarm/data/mcp/tools/optimal_control.py and complies with AGENTS rule 22 — agents never read Iceberg / Postgres directly.

Avellaneda-Stoikov (single-asset)

Reservation price plus optimal half-spread:

r(s, q, t) = s − q · γ · σ² · (T − t)
δ        = ½ · γ · σ² · (T − t) + (1/γ) · ln(1 + γ/k)
bid       = r − δ
ask       = r + δ
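
As a quick sanity check on the formulas above, the following minimal sketch evaluates them with the standard library only, reusing the parameters from the direct-API example. The helper name avst_quotes_reference is hypothetical and not part of alphaswarm; it is meant only for cross-checking numbers by hand.

```python
import math

def avst_quotes_reference(mid_price, inventory, gamma, sigma, k, T_minus_t):
    """Plain-Python evaluation of r and δ (illustrative helper, not the library API)."""
    risk_term = gamma * sigma ** 2 * T_minus_t          # γ · σ² · (T − t)
    reservation = mid_price - inventory * risk_term     # r = s − q · γ · σ² · (T − t)
    half_spread = 0.5 * risk_term + math.log(1.0 + gamma / k) / gamma
    return reservation - half_spread, reservation + half_spread, reservation, half_spread

bid, ask, reservation, half_spread = avst_quotes_reference(
    mid_price=100.0, inventory=10.0, gamma=0.1, sigma=0.02, k=1.5, T_minus_t=1.0
)
print(f"r={reservation:.4f} half_spread={half_spread:.4f} bid={bid:.4f} ask={ask:.4f}")
```

With these inputs the long position moves the reservation price below the mid by q · γ · σ² · (T − t) = 0.0004, and the logarithmic term accounts for almost all of the half-spread.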

The JAX kernel _avst_kernel is JIT-compiled with @jax.jit and takes only Python floats / arrays — no I/O, no globals, no Python control flow keyed on values. vmap lets us evaluate the kernel across an inventory grid in one compiled call.
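
The JIT-plus-vmap pattern described above can be sketched as follows. This is an assumption-laden illustration, not the actual _avst_kernel: the function name avst_kernel_sketch, its argument order, and its return shape are hypothetical, and only the formulas are taken from this page.

```python
import jax
import jax.numpy as jnp

@jax.jit
def avst_kernel_sketch(mid_price, inventory, gamma, sigma, k, T_minus_t):
    """Pure function of its inputs: no I/O, no globals, no value-dependent branching."""
    risk_term = gamma * sigma ** 2 * T_minus_t
    reservation = mid_price - inventory * risk_term
    half_spread = 0.5 * risk_term + jnp.log1p(gamma / k) / gamma
    return reservation - half_spread, reservation + half_spread

# Map over the inventory argument only; every other parameter is broadcast.
grid_quotes = jax.vmap(avst_kernel_sketch, in_axes=(None, 0, None, None, None, None))

inventory_grid = jnp.array([-50.0, -25.0, 0.0, 25.0, 50.0])
bids, asks = grid_quotes(100.0, inventory_grid, 0.1, 0.02, 1.5, 1.0)
```

Because the half-spread does not depend on inventory, asks − bids is constant across the grid, while both quotes shift down as inventory grows.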

The closed-form GLFT 2013 variant (glft_closed_form) is what alphaswarm.strategies.hft.alphas.GLFTMM calls on every event. Its 2/γ · ln(1 + γ/k) term differs from the finite-horizon AvSt 1/γ · ln(1 + γ/k) term by a factor of two; that is the long-horizon limit.

Cartea-Jaimungal-Penalva (inventory-penalised liquidation)

Linear-quadratic ansatz H(t, q, S) = q·S + h₂(t)·q² + h₁(t)·q + h₀(t) reduces the HJB to a system of three coupled ODEs:

dh₂/dt = φ − h₂² / κ
dh₁/dt = −h₁ · h₂ / κ
dh₀/dt = −h₁² / (4 · κ)

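The system is integrated backwards in time from the terminal conditions h₂(T) = −α and h₁(T) = h₀(T) = 0 with fixed-step RK4. Because the h₁ equation is linear and homogeneous, these terminal conditions keep h₁ and h₀ identically zero, so only h₂ drives the policy. The optimal feedback trading rate is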

ν*(t, q) = − (h₂(t) · q + ½ · h₁(t)) / κ
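
The backward integration and the feedback rule can be sketched in plain Python as below. This is a minimal illustration, not the code in cartea_jaimungal.py: the names cj_rhs, solve_cj_sketch and trading_rate are hypothetical, and the sketch assumes the running penalty enters the objective as −φ·q², which gives dh₂/dt = φ − h₂²/κ.

```python
def cj_rhs(state, phi, kappa):
    """Right-hand side of the (h2, h1, h0) system under the -phi*q^2 penalty convention."""
    h2, h1, h0 = state
    return (phi - h2 * h2 / kappa, -h1 * h2 / kappa, -h1 * h1 / (4.0 * kappa))

def _step(state, scale, deriv):
    # state + scale * deriv, component-wise
    return tuple(s + scale * d for s, d in zip(state, deriv))

def solve_cj_sketch(phi, kappa, alpha, horizon, n_steps):
    """Fixed-step RK4 from t = T down to t = 0; returns states ordered from t = 0 to t = T."""
    dt = -horizon / n_steps            # negative step: march backwards in time
    state = (-alpha, 0.0, 0.0)         # terminal conditions h2(T) = -alpha, h1(T) = h0(T) = 0
    path = [state]
    for _ in range(n_steps):
        k1 = cj_rhs(state, phi, kappa)
        k2 = cj_rhs(_step(state, 0.5 * dt, k1), phi, kappa)
        k3 = cj_rhs(_step(state, 0.5 * dt, k2), phi, kappa)
        k4 = cj_rhs(_step(state, dt, k3), phi, kappa)
        incr = tuple((a + 2.0 * b + 2.0 * c + d) / 6.0 for a, b, c, d in zip(k1, k2, k3, k4))
        state = _step(state, dt, incr)
        path.append(state)
    path.reverse()
    return path

def trading_rate(h2, h1, inventory, kappa):
    """Feedback rate nu*(t, q) = -(h2 * q + 0.5 * h1) / kappa."""
    return -(h2 * inventory + 0.5 * h1) / kappa

path = solve_cj_sketch(phi=1.0, kappa=1.0, alpha=10.0, horizon=1.0, n_steps=1000)
h2_0, h1_0, h0_0 = path[0]
rate_0 = trading_rate(h2_0, h1_0, inventory=100.0, kappa=1.0)
```

With these illustrative parameters the initial rate for 100 units is above the TWAP rate of 100 per unit time, and setting phi=0.0 reproduces the closed form h₂(0) = −1 / (1/α + T/κ).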

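When φ > 0 the running penalty keeps the feedback gain −h₂(t)/κ bounded away from zero far from the terminal time (it tends to √(φ/κ) as T − t grows), so for a large terminal penalty α the agent front-loads: it sells (or buys) faster than TWAP early in the horizon. When φ = 0 the rule reduces to ν*(t, q) = q / ((T − t) + κ/α), a constant-rate schedule that approaches TWAP as α → ∞ (no urgency).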

Pairing with reinforcement learning

The closed forms are reference benchmarks. To learn a richer policy for non-Gaussian dynamics, drive an RL agent through:

alphaswarm.rl.envs.MarketMakingEnv — PPO/SAC over AvSt knobs.
alphaswarm.rl.envs.OptimalExecutionEnv — Cartea-Jaimungal block liquidation.

Sample experiment YAMLs ship under configs/rl/ (avellaneda_stoikov_mm.yaml, cartea_jaimungal_execution.yaml).
