Agentic benchmark built from real case filings.

Need 6th Circuit right-of-publicity authority on monetizing information versus identity itself.
search [6th Circuit right of publicity paid product identity distinction]
search [Ohio right of publicity database publication informational product]
read [ETW Corp. v. Jireh Publishing, Inc.]
Prioritize cases involving directories, publications, and paid informational products.
search [Ohio right of publicity informational use commercial speech case]
read [Ruffin-Steinback v. dePasse]
search [6th Circuit persona used as product draw versus factual reference]
read [Parks v. LaFace Records]
search [database service not trade on identity right of publicity 6th Circuit]
read [exhibit - plaintiff allegations summary]
edit draft [five-case short list with holdings and fit]

Real World Tasks

Tasks include drafting briefs, finding precedents, spotting errors, and predicting outcomes.

Designed by former lawyers from top firms like Latham & Watkins and Greenberg Traurig.

Long Horizon

Each task can take hundreds of steps of searching, reading, and drafting.

Humans work in teams for days to solve tasks like these.

Dataset-Grounded

To perform well, agents have to search across our dataset of all US case law and regulations.

14 million real cases, including filings that are not publicly available.

About Midpage

14M Cases

Plus 6M statutes and regulations: a comprehensive legal dataset with a proprietary citator showing which cases have been overruled.

300+ Firms

Midpage is used by over 300 law firms directly and reaches hundreds of thousands through partner organizations.

5 Partnerships

We are the data supplier to 5 multi-billion-dollar organizations, which use our data, search, and MCP.

Litera · Perplexity

200,000 Visitors

Every month, 200k+ visitors read cases directly on our website.

Benchmark Results

Accuracy vs cost

X axis is average cost per task, Y axis is average accuracy.

[Chart: average accuracy (30 to 75) vs. average cost per task ($0 to $15)]

MCP off / on

MCP includes tools for searching across our case law and regulations corpus.
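As a rough illustration, an MCP client invokes a server-side tool with a JSON-RPC `tools/call` request, the request shape defined by the Model Context Protocol spec. The tool name `search_cases` and its arguments below are hypothetical examples, not Midpage's actual API:

```python
import json

def make_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical tool name and arguments for a case-law search.
request = make_tool_call("search_cases", {
    "query": "right of publicity informational product",
    "court": "6th Cir.",
})
print(request)
```

In practice an MCP client library handles the transport and framing; the point here is only that each corpus search an agent performs is one structured tool call like this.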

[Chart: average accuracy (40 to 80) with MCP off vs. on, for Opus 4.6 and GPT-5.4]

Method

Midpage is building the first benchmark and RL env for agentic litigation. In the US alone, millions of cases are filed each year. Using those filings as verifiable environments, we want to help teach LLMs how to become excellent lawyers.

This is a private benchmark. To avoid leaking the questions, we do not give collaborators access to the full set of sample tasks. Submissions are run with the candidate model and harness inside our environments.

Responses are graded from 0 to 1 through rubrics hand-crafted by our attorneys. Our tasks use the Harbor format. Shoutout to Alex at Laude.org.
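A minimal sketch of how rubric-based grading can produce a 0-to-1 score, assuming each rubric item is a weighted binary criterion. The criteria and weights shown are invented examples, not our attorneys' actual rubrics:

```python
def grade(rubric, judgments):
    """Weighted rubric score in [0, 1].

    rubric: list of (criterion, weight) pairs; weights need not sum to 1.
    judgments: dict mapping criterion -> True/False (did the response satisfy it?).
    """
    total = sum(w for _, w in rubric)
    earned = sum(w for c, w in rubric if judgments.get(c, False))
    return earned / total if total else 0.0

# Hypothetical rubric for a precedent-finding task.
rubric = [
    ("cites controlling 6th Circuit authority", 0.5),
    ("distinguishes informational from commercial use", 0.25),
    ("flags contrary precedent", 0.25),
]
score = grade(rubric, {
    "cites controlling 6th Circuit authority": True,
    "distinguishes informational from commercial use": True,
    "flags contrary precedent": False,
})
# score == 0.75
```

Partial credit falls out naturally: a response that satisfies only some criteria lands between 0 and 1, which is what makes these scores usable both for the leaderboard and as RL rewards.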

RL Envs

Like the benchmark, the RL environments use the Harbor format. Rollouts run in the provider's own agent harness, and evaluators score from 0 to 1. This scales to tens of thousands of tasks. To avoid contamination, we use cases that are not yet in models' training sets.

Midpage collects case data the same day courts publish it, long before it reaches training datasets. These matters include multiple motions from the parties and one or more decisions from the judge. The documents are often 10 to 50 pages long and represent days of human work. To solve them, agents have to work like human case teams: research arguments, counterarguments, and past precedent, then draft long final outputs that are consistent with the rest of the case file and compliant with court rules.

Contact Us

Talk to us about the benchmark, our MCP, our dataset, and our RL environments.

Request leaderboard