Agentic benchmark built from real case filings.

Need 6th Circuit right-of-publicity authority on monetizing information versus identity itself.

search [6th Circuit right of publicity paid product identity distinction]

search [Ohio right of publicity database publication informational product]

read [ETW Corp. v. Jireh Publishing, Inc.]

Prioritize directories, publications, and paid informational products.

search [Ohio right of publicity informational use commercial speech case]

read [Ruffin-Steinback v. dePasse]

search [6th Circuit persona used as product draw versus factual reference]

read [Parks v. LaFace Records]

search [database service not trade on identity right of publicity 6th Circuit]

read [exhibit - plaintiff allegations summary]

edit draft [five-case short list with holdings and fit]

Real World Tasks

Tasks include drafting briefs, finding precedents, spotting errors, and predicting outcomes.

Long Horizon

Each task can take hundreds of steps of searching, reading, and drafting.

Humans work in teams for days to solve tasks like these.

Dataset-Grounded

To perform well agents have to search across our dataset of all US case law and regulations.

14 million real cases, including filings that are not publicly available.


About Midpage

14M Cases

Plus 6M statutes and regulations, with a comprehensive legal dataset and a proprietary citator showing which cases are overruled.

300+ Firms

Midpage is used by over 300 law firms directly and reaches hundreds of thousands through partner organizations.

5 Partnerships

We are the data supplier to 5 multi-billion-dollar organizations. They use our data, search, and MCP.

200,000 Visitors

Every month, 200k+ visitors read cases directly on our website.

Benchmark Results

Score vs cost

Average estimated cost per task vs. average benchmark score. Values are the baseline without MCP, measured on the completed 600-task litigation run.

Claude Opus 4.7 max

Avg cost: $7.66

Avg score: 52.6%

Avg latency: 23.8 min

[Chart: average benchmark score (15% to 75%) plotted against average estimated cost per task ($2.0 to $8.0).]

MCP off / on

MCP includes tools for searching across our case law and regulations corpus. Values are measured on the completed 600-task FrontierLaw benchmark runs.

Claude Opus 4.7 max

MCP off: 52.6% · $7.66 · 23.8 min

MCP on: 51.8% · $7.34 · 21.8 min
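The difference between the two runs can be checked with a few lines of arithmetic. The sketch below only uses the three figures quoted above for Claude Opus 4.7 max; the dictionary layout and the `delta` helper are illustrative names, not part of any Midpage tooling.

```python
# Figures copied from the "MCP off" and "MCP on" lines above.
# The dictionary keys are hypothetical names chosen for this sketch.
runs = {
    "mcp_off": {"score_pct": 52.6, "cost_usd": 7.66, "latency_min": 23.8},
    "mcp_on": {"score_pct": 51.8, "cost_usd": 7.34, "latency_min": 21.8},
}


def delta(metric: str) -> float:
    """Return MCP on minus MCP off, rounded to hide floating-point noise."""
    return round(runs["mcp_on"][metric] - runs["mcp_off"][metric], 2)


deltas = {m: delta(m) for m in ("score_pct", "cost_usd", "latency_min")}
print(deltas)
```

For this model, turning MCP on lowers the average score by 0.8 percentage points, the average cost by $0.32, and the average latency by 2.0 minutes.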

[Chart: benchmark score (15% to 75%) with MCP off and MCP on.]

Models: Opus 4.6 max, Opus 4.7 max, GPT-5.4 xhigh, Gemini 3.1 Pro high, Kimi K2.5 max, Kimi K2.6 max

Method

Midpage is building the first benchmark and RL env for agentic litigation. In the US alone, millions of cases are filed each year. Using those filings as verifiable environments, we want to help teach LLMs how to become excellent lawyers.

This is a private benchmark. To avoid leaking the questions, we do not give collaborators access to the full set of sample tasks. Submissions are run with the candidate model and harness inside our environments.

Responses are graded from 0 to 1 through rubrics hand-crafted by our attorneys. Our tasks use the Harbor format created by Laude.org.
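To make the 0-to-1 grading concrete, here is a minimal sketch of one way a rubric could be turned into a score. The actual rubric format is not described here, so the `RubricItem` fields, the weights, and the weighted-fraction formula are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    """One hypothetical rubric criterion (field names are assumptions)."""
    description: str
    weight: float
    satisfied: bool


def rubric_score(items: list[RubricItem]) -> float:
    """Weighted fraction of satisfied criteria, always between 0 and 1."""
    total = sum(item.weight for item in items)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(item.weight for item in items if item.satisfied)
    return earned / total


# Made-up example rubric, not taken from the benchmark.
example = [
    RubricItem("Cites controlling circuit authority", 2.0, True),
    RubricItem("Distinguishes adverse precedent", 1.0, False),
    RubricItem("Follows court formatting rules", 1.0, True),
]
score = rubric_score(example)
print(score)
```

With these made-up weights, the response earns 3 of 4 weighted points, so its score is 0.75.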

RL Envs

Like the benchmark, the RL environments use the Harbor format. Rollouts use the provider's own agent harness, and evaluators score from 0 to 1. This scales to tens of thousands of tasks. To avoid cheating, we use cases that are not yet in the model's training sets.
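The contamination guard described above can be pictured as a date filter. This is a sketch under assumptions: the task records, the `published` field, and the idea of a single known training cutoff date are hypothetical and do not describe Midpage's actual pipeline.

```python
from datetime import date


def eligible_tasks(tasks: list[dict], training_cutoff: date) -> list[dict]:
    """Keep only tasks whose source case was published after the cutoff."""
    return [task for task in tasks if task["published"] > training_cutoff]


# Hypothetical task records; the ids and dates are invented for the example.
tasks = [
    {"id": "task-001", "published": date(2024, 3, 1)},
    {"id": "task-002", "published": date(2025, 9, 15)},
    {"id": "task-003", "published": date(2025, 11, 2)},
]
fresh = eligible_tasks(tasks, training_cutoff=date(2025, 6, 30))
print([task["id"] for task in fresh])
```

Only the two tasks published after the assumed cutoff remain, which is the property the RL setup relies on.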

Midpage collects case data the same day courts publish it, long before it reaches training datasets. These matters include multiple motions from the parties and one or more decisions from the judge. The documents are often 10 to 50 pages long and represent days of human work. To solve them, agents have to work like human case teams: research arguments, counterarguments, and past precedent, then draft long final outputs that are consistent with the rest of the case file and compliant with court rules.

Talk to us about the benchmark, our MCP, our dataset, and our RL environments.

Request leaderboard

DM us at @ottozastrow

Or contact: benchmark@midpage.ai
