Orchestration control plane refactor — rollout runbook
This is the operator-facing rollback / rollout guide for the additive
WorkflowRuntime + OrchestrationAdapter stack landed by the
seven phases described in
ALPHASWARM_REFACTOR_MASTER_PROMPT.md and
the matching cursor plan. Every change in the refactor is gated by
one of the ALPHASWARM_ORCHESTRATION_* flags defined in
alphaswarm/config/settings.py; with every boolean
flag at its default False the platform behaves identically to the
pre-refactor build. The Phase 0 regression test
tests/agents/test_orchestration_flags.py
enforces this, so run it before flipping anything.
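As a quick operator-side illustration of what "every flag at its default" means at the environment level, the sketch below lists which boolean orchestration flags are switched on in a given environment mapping. The helper name enabled_orchestration_flags and the set of accepted truthy strings are assumptions made for this example; the authoritative parsing lives in alphaswarm/config/settings.py and may differ.

```python
import os
from typing import List, Mapping

# Boolean flags from the flag inventory (names without the ALPHASWARM_ prefix).
BOOLEAN_FLAGS = (
    "ORCHESTRATION_STUDIO_ENABLED",
    "ORCHESTRATION_CREW_ADAPTER_ENABLED",
    "ORCHESTRATION_FUSION_ENABLED",
    "ORCHESTRATION_SCHEDULE_ENABLED",
    "ORCHESTRATION_WORKFLOW_VERSIONING_ENABLED",
    "ORCHESTRATION_KILL_PROPAGATION_ENABLED",
)

# Assumption: these strings count as "on"; the real settings parser may accept others.
TRUTHY = {"1", "true", "yes", "on"}


def enabled_orchestration_flags(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the boolean orchestration flags that are switched on in env."""
    return [
        name
        for name in BOOLEAN_FLAGS
        if env.get(f"ALPHASWARM_{name}", "false").strip().lower() in TRUTHY
    ]
```

An empty result means the process is running with pre-refactor behaviour as far as the boolean flags are concerned. This complements the regression test; it does not replace it.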
Flag inventory
| Flag (env var prefix ALPHASWARM_) | Default | Activates | First needed in |
|---|---|---|---|
| ORCHESTRATION_STUDIO_ENABLED | false | /workflows/* API surface, Vite studio routes, WorkflowSpec registry persistence | Phase 5 |
| ORCHESTRATION_CREW_ADAPTER_ENABLED | false | CrewProcessAdapter registration (crewai stays an optional import) | Phase 2 |
| ORCHESTRATION_FUSION_ENABLED | false | SignalFusionAdapter + WeightCentricExecutionAdapter + build_dialectical_with_fusion_graph | Phase 4 |
| ORCHESTRATION_SCHEDULE_ENABLED | false | AutomationScheduleAdapter Celery beat entry | Phase 3 |
| ORCHESTRATION_WORKFLOW_VERSIONING_ENABLED | false | Snapshots WorkflowSpec into workflow_spec_versions on first run | Phase 5 |
| ORCHESTRATION_KILL_PROPAGATION_ENABLED | false | Watchdog + KillSwitch UI fan halts into WorkflowRun rows | Phase 6 |
| ORCHESTRATION_MAX_DEBATE_ROUNDS (int) | 2 | Hard cap enforced by DialecticalDebateAdapter and the graph builder | Phase 2 |
| ORCHESTRATION_HALT_CHECK_TIMEOUT_SECONDS (float) | 1.0 | Per-transition halt-check budget in WorkflowRuntime | Phase 2 |

The two numeric knobs are read on every transition, so changing them takes effect on the next workflow step without a restart.
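The per-transition read can be pictured with the minimal sketch below. It is not the WorkflowRuntime implementation: read_knobs and transitions are hypothetical names, and a plain mapping stands in for whatever mechanism the real runtime uses to refresh its settings, which this runbook does not specify. The only point shown is that nothing is cached between steps.

```python
import os
from typing import Iterable, Iterator, Mapping, Tuple


def read_knobs(env: Mapping[str, str] = os.environ) -> Tuple[int, float]:
    """Read both numeric knobs fresh, falling back to the documented defaults."""
    max_rounds = int(env.get("ALPHASWARM_ORCHESTRATION_MAX_DEBATE_ROUNDS", "2"))
    halt_timeout = float(
        env.get("ALPHASWARM_ORCHESTRATION_HALT_CHECK_TIMEOUT_SECONDS", "1.0")
    )
    return max_rounds, halt_timeout


def transitions(
    steps: Iterable[str], env: Mapping[str, str]
) -> Iterator[Tuple[str, int, float]]:
    """Hypothetical step loop: the knobs are re-read before every transition."""
    for step in steps:
        max_rounds, halt_timeout = read_knobs(env)  # no caching across steps
        yield step, max_rounds, halt_timeout
```

Because the generator reads the mapping lazily, a value changed between two steps is picked up by the second step, which mirrors the "no restart needed" behaviour described above.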
Recommended rollout order
- Phase 0 → Phase 1: deploy with every flag at default. Run the full pytest suite plus tests/agents/test_orchestration_flags.py to confirm zero behavioural drift.
- Phase 2 (debate): flip ORCHESTRATION_CREW_ADAPTER_ENABLED if you want CrewAI-backed crew adapters to register; otherwise leave off. The bounded-debate cap is always honoured by the new graph builder kwarg regardless of this flag.
- Phase 3 (scheduler): flip ORCHESTRATION_SCHEDULE_ENABLED AFTER restarting Celery workers + beat. The flag controls whether alphaswarm.tasks.celery_app registers the beat schedule entry.
- Phase 4 (fusion): flip ORCHESTRATION_FUSION_ENABLED only after confirming the existing risk_simulator_approves predicate still routes correctly on a staging dataset; fusion adds a sibling pathway, the existing risk gate stays authoritative.
- Phase 5 (studio): flip ORCHESTRATION_STUDIO_ENABLED and ORCHESTRATION_WORKFLOW_VERSIONING_ENABLED together. Apply the alembic migration 0046_workflow_versioning.py BEFORE the flag is flipped on the API process.
- Phase 6 (halt fan-out): flip ORCHESTRATION_KILL_PROPAGATION_ENABLED last. The KillSwitch UI keeps its existing behaviour with this flag off; turning it on adds workflow-run fan-out to the existing /agents/halt, /paper/stop-all, /bots/halt-all, /rl/halt-all, and /quant-agents/halt fan-out.
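The Phase 5 constraints in the list above (two flags flipped together, migration applied first) are easy to get wrong under pressure. The sketch below encodes just those two rules as a pre-rollout check. The function name rollout_violations and its arguments are hypothetical and nothing like it is known to exist in the codebase; it applies to rollout only, not to the rollback recipes.

```python
from typing import List, Mapping


def rollout_violations(
    flags: Mapping[str, bool], migration_0046_applied: bool
) -> List[str]:
    """Return human-readable problems with a proposed flag combination."""
    problems: List[str] = []
    studio = flags.get("ORCHESTRATION_STUDIO_ENABLED", False)
    versioning = flags.get("ORCHESTRATION_WORKFLOW_VERSIONING_ENABLED", False)

    # Phase 5: the two flags are meant to be flipped together.
    if studio != versioning:
        problems.append("studio and workflow versioning should be flipped together")

    # Phase 5: the migration must be in place before either flag is on.
    if (studio or versioning) and not migration_0046_applied:
        problems.append("apply 0046_workflow_versioning before enabling Phase 5 flags")

    return problems
```

An empty list means the proposed combination does not break either rule; any other ordering advice in the list still needs a human check.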
Rollback recipes
All rollbacks are flag-flips (no migrations, no data loss):
- Disable studio + API: set ALPHASWARM_ORCHESTRATION_STUDIO_ENABLED=false and reload the API. The /workflows/* routes refuse new requests with 503 Service Unavailable while the rest of the API keeps serving.
- Disable scheduler: set ALPHASWARM_ORCHESTRATION_SCHEDULE_ENABLED=false and restart Celery beat. Already-running scheduled runs finish normally; no new ones are enqueued.
- Disable fusion: set ALPHASWARM_ORCHESTRATION_FUSION_ENABLED=false and reload. The optional build_dialectical_with_fusion_graph builder refuses to compile; existing builders are unaffected.
- Disable kill fan-out: set ALPHASWARM_ORCHESTRATION_KILL_PROPAGATION_ENABLED=false. The KillSwitch UI keeps its existing five halt buttons (agents / paper / bots / rl / quant-agents); the new "Halt workflows" button no-ops.
- Disable workflow versioning: set ALPHASWARM_ORCHESTRATION_WORKFLOW_VERSIONING_ENABLED=false. New runs refuse to snapshot a spec hash; existing workflow_spec_versions rows stay readable.
- Full revert: set every ALPHASWARM_ORCHESTRATION_* flag to false, redeploy. The platform behaves exactly like the pre-refactor build. The new tables (workflow_specs, workflow_spec_versions, workflow_runs) stay empty and add no read overhead to other routes.
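For the full-revert recipe, the overrides can be generated rather than typed by hand. The sketch below is one way to do that, assuming every boolean orchestration flag ends in _ENABLED (true for the six flags in the inventory) and that flags absent from the environment are already at their default. full_revert_overrides is a hypothetical helper, not part of the codebase.

```python
from typing import Dict, Mapping

PREFIX = "ALPHASWARM_ORCHESTRATION_"


def full_revert_overrides(env: Mapping[str, str]) -> Dict[str, str]:
    """Map every boolean orchestration flag present in env to "false".

    Numeric knobs (MAX_DEBATE_ROUNDS, HALT_CHECK_TIMEOUT_SECONDS) do not end
    in _ENABLED, so they are left untouched.
    """
    return {
        key: "false"
        for key in sorted(env)
        if key.startswith(PREFIX) and key.endswith("_ENABLED")
    }


def as_env_lines(overrides: Mapping[str, str]) -> str:
    """Render the overrides as KEY=value lines, one per flag."""
    return "\n".join(f"{key}={value}" for key, value in overrides.items())
```

The rendered lines can be pasted into whatever mechanism the deployment uses for environment overrides; how those are applied is outside the scope of this sketch.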
Migration safety
- The single new migration 0046_workflow_versioning.py is additive: it creates three new tables and adds no columns to existing tables. Downgrade returns the database to the 0045_pgvector_foundation head.
- The new alphaswarm.tasks.orchestration_tasks module appends to the Celery include list; cold installs without the module fail loudly at worker boot rather than silently dropping tasks.
- The Vite studio bundle is code-split: routes under alphaswarm_client/src/routes/workflows/* lazy-load only when the user navigates there, so disabling the flag also disables the bundle download path.
Pre-flip checklist
Run before flipping any flag in production:
- docker exec alphaswarm-api python -m pytest tests/agents/test_orchestration_flags.py -v
- docker exec alphaswarm-api python -m pytest tests/agents/test_watchdog.py -v
- docker exec alphaswarm-api alembic current (confirm head is at least 0045_pgvector_foundation; for Phase 5+ confirm 0046_workflow_versioning)
- Snapshot the Redis kill-switch key (redis-cli get $ALPHASWARM_RISK_KILL_SWITCH_KEY); the watchdog uses the same key so the new gate stays consistent.
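The alembic check in the list above can be automated against captured command output. The sketch below assumes the revision identifiers printed by alembic current carry the same four-digit prefix as the migration file names (0045_..., 0046_...); that is an assumption about this repository, not something alembic guarantees. Both function names are hypothetical.

```python
import re
from typing import Optional


def revision_number(alembic_current_output: str) -> Optional[int]:
    """Extract the numeric prefix of the first revision id found in the output."""
    match = re.search(r"^(\d{4})_\w+", alembic_current_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def head_is_sufficient(alembic_current_output: str, phase: int) -> bool:
    """Check the head against the runbook rule: 0045 in general, 0046 for Phase 5+."""
    required = 46 if phase >= 5 else 45
    current = revision_number(alembic_current_output)
    return current is not None and current >= required
```

Feed it the captured stdout of the alembic current command; an empty or unrecognised output is treated as a failed check rather than a pass.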
Where each layer lives
- Settings flags: alphaswarm/config/settings.py "Orchestration control plane" block.
- Regression test: tests/agents/test_orchestration_flags.py.
- Adapter abstraction: alphaswarm/agents/orchestration/ (Phase 1).
- Adapters: alphaswarm/agents/orchestration/adapters/ (Phases 2-4).
- DataMCP tools: alphaswarm/data/mcp/tools/orchestration.py + automation.py (Phase 3).
- Celery task: alphaswarm/tasks/orchestration_tasks.py (Phase 3).
- Persistence: alphaswarm/persistence/models_workflows.py + alembic 0046_workflow_versioning.py (Phase 5).
- API: alphaswarm/api/routes/workflows.py (Phase 5).
- Studio UI: alphaswarm_client/src/routes/workflows/* (Phase 5).
- Halt + watchdog hardening: alphaswarm/tasks/agent_watchdog_tasks.py, alphaswarm_client/src/components/common/KillSwitch.tsx (Phase 6).