The world’s first long-horizon healthcare benchmark for AI agents.
Built with the clinicians who do the work.
actAVA partners with 20+ hospitals and top universities to evaluate frontier agents across prior authorization, utilization management, and care management. The best agent today resolves 28% of tasks at pass@1; on end-to-end prior authorization automation, that rate drops to 0%.
One agent, one trial, one final scorecard.
Each animation replays an actual trajectory from the claude-code-opus-4-6 submission — stage transitions, real tool calls, real policy gates, real scorecard outcomes.
An RN drives a clinical referral from chart pull through policy lookup, evidence assembly, and a final action decision. Every stage commits — there is no retry.
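The commit-without-retry rule can be pictured with a minimal sketch. The stage names come from the referral walkthrough above; the `Trial` class, its methods, and the artifact file names are hypothetical illustrations, not the benchmark's actual harness.

```python
from dataclasses import dataclass, field


class StageAlreadyCommitted(Exception):
    """Raised when an agent tries to redo a stage that has already committed."""


@dataclass
class Trial:
    # Hypothetical stage list modeled on the referral walkthrough.
    stages: tuple = (
        "chart_pull",
        "policy_lookup",
        "evidence_assembly",
        "action_decision",
    )
    committed: dict = field(default_factory=dict)

    @property
    def done(self) -> bool:
        return len(self.committed) == len(self.stages)

    def commit(self, stage: str, artifact: str) -> None:
        # A committed stage is final: no overwrite, no second attempt.
        if stage in self.committed:
            raise StageAlreadyCommitted(stage)
        if self.done:
            raise ValueError("trial is already complete")
        # Stages must be committed in order, so state carries forward.
        expected = self.stages[len(self.committed)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        self.committed[stage] = artifact


trial = Trial()
trial.commit("chart_pull", "chart.json")
try:
    trial.commit("chart_pull", "chart_v2.json")  # second attempt is rejected
except StageAlreadyCommitted:
    pass
```

In this sketch the first artifact for `chart_pull` stays in place and the trial simply moves on to `policy_lookup`; an early mistake is carried into every later stage.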
Three capabilities underrepresented in current benchmarks.
60–80 agent steps per trial across 4–6 distinct stages, with state that carries forward and stages that can't be retried after commit.
One agent plays many seats — intake clerk, nurse, medical director, peer-to-peer reviewer, letter center — and each seat writes its own artifact.
Medical-policy criteria, site-of-service rules, consent scripts, evidence-grounding requirements — checked by deterministic rubrics plus an LLM semantic judge.
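One way to picture the two-part check is the sketch below. It assumes a rubric that looks for required terms in an artifact and a judge supplied as a plain callable. The function names, the term-matching rule, and the choice to require both checks to pass are illustrative assumptions, not the benchmark's documented scoring logic.

```python
from typing import Callable


def rubric_required_terms(artifact: str, required: list[str]) -> bool:
    """Deterministic check: every required term appears in the artifact."""
    text = artifact.lower()
    return all(term.lower() in text for term in required)


def score_artifact(
    artifact: str,
    required: list[str],
    judge: Callable[[str], bool],
) -> dict:
    # `judge` stands in for an LLM semantic judge; here it is any
    # callable returning True or False for the artifact text.
    deterministic = rubric_required_terms(artifact, required)
    semantic = judge(artifact)
    # Assumption: an artifact passes only if both checks pass.
    return {
        "deterministic": deterministic,
        "semantic": semantic,
        "passed": deterministic and semantic,
    }


# Usage with a stub judge that accepts any non-empty artifact.
letter = "Denied: outpatient site of service required per medical policy."
result = score_artifact(
    letter,
    required=["site of service", "medical policy"],
    judge=lambda text: bool(text.strip()),
)
```

Keeping the deterministic rubric separate from the judge means a failed artifact can be traced to a missing required element or to a semantic rejection.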
Run a submission. Or read the methodology.
The benchmark code, dataset, and 1,279-page operations handbook are open. The leaderboard is live.