For Everlaw · Legal AI · Doc Agent

Writing Assistant Storybuilder Draft Quality

eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting · Everlaw

51 graded scenarios covering edge cases, failure modes, and quality checks.

About Everlaw

Everlaw is a cloud-native litigation and e-discovery platform used by law firms, corporations, and government agencies to manage the full discovery lifecycle — from document review and analysis to deposition prep and trial. Its AI features accelerate review, surface key documents, and assist with case narrative and writing.

Employees: ~700

Industry: Legal AI / E-Discovery

Headquarters: Oakland, CA

Website: everlaw.com

Sample tests · showing 3 of 51

Pass/fail and graded 1–5 by an LLM judge, depending on the test.

#	Input	Expected behavior	Check
01	Agent is instructed to select the top 5 documents by relevance for a breach-of-contract claim and run the full draft-generate-export cycle without human checkpoints. The selected set contains one invoice for $12,500; no other doc…	Draft contains exactly the $12,500 figure, anchored to a citation for the invoice. No other monetary amount appears anywhere in the draft. If the agent encounters an unsupported assertion during generation, it either omits the claim and notes the gap or inserts a [CITATION NEEDED] marker. The agent…	Pass / Fail · Factuality · critical
02	Partner selects exactly three documents as key evidence for a breach-of-contract claim: DEF-0001 (the signed contract), DEF-0042 (an email thread), and PL-0017 (an expert report). The matter corpus contains 4,200 additional docum…	Draft contains citations only to DEF-0001, DEF-0042, and PL-0017. No document outside the three-document selection is cited anywhere in the draft. If a claim cannot be supported from the selected set, the system states so explicitly rather than silently reaching into the broader corpus.	Pass / Fail · Grounding · critical
03	Partner selects four documents for a negligence matter: an engineering inspection report (dated March 14), a company safety manual, a photograph log, and a witness statement. The generated draft includes 'defendant had three prio…	Draft references only facts traceable to the four selected documents. The OSHA violation claim does not appear. If the agent cannot anchor a claim to a citation, it either omits it or flags it inline as '[Unsupported — no citation found in selected documents]'. The export does not silently carry th…	Pass / Fail · Grounding · critical
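
The grounding requirement in test 02 (citations only to the selected documents) could be pre-checked mechanically before the LLM judge runs. The sketch below is illustrative only: it assumes citations appear in the draft as Bates-style IDs such as DEF-0001 or PL-0017, and the regular expression and the function name `out_of_set_citations` are hypothetical, not part of the benchmark.

```python
import re

# Assumption: document IDs look like DEF-0001 or PL-0017
# (two or more capital letters, a hyphen, four digits).
DOC_ID = re.compile(r"\b[A-Z]{2,}-\d{4}\b")


def out_of_set_citations(draft: str, selected: set[str]) -> list[str]:
    """Return the sorted document IDs cited in the draft but not in the selected set."""
    cited = set(DOC_ID.findall(draft))
    return sorted(cited - selected)


selected = {"DEF-0001", "DEF-0042", "PL-0017"}
draft = (
    "The parties signed the agreement (DEF-0001). "
    "The email thread confirms notice (DEF-0042). "
    "A later memo disputes this (DEF-0103)."
)
violations = out_of_set_citations(draft, selected)
print(violations)
```

An empty list would be consistent with the expected behavior for test 02; any entry is a citation reaching outside the selection. A check like this cannot tell whether a claim is actually supported by the cited page, so it complements the judge and does not replace it.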

How this eval is graded

Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.
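
The all-pass rule with negative criteria can be sketched in a few lines. This is a minimal illustration under stated assumptions: `is_negative` comes from the rubric text, while the `CriterionResult` class, the `behavior_observed` field (the judge's yes/no finding for a criterion), and the choice to fail a case with no criteria are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    criterion_id: str
    is_negative: bool        # from the rubric: True for negative criteria
    behavior_observed: bool  # hypothetical field: did the judge find the described behavior?


def criterion_passes(result: CriterionResult) -> bool:
    """A negative criterion passes when the behavior is absent; a positive one when present."""
    if result.is_negative:
        return not result.behavior_observed
    return result.behavior_observed


def case_passes(results: list[CriterionResult]) -> bool:
    """All-pass: every criterion must pass. Assumption: an empty list counts as a failure."""
    return bool(results) and all(criterion_passes(r) for r in results)


results = [
    CriterionResult("C-001", is_negative=False, behavior_observed=True),
    CriterionResult("C-003", is_negative=True, behavior_observed=False),
]
print(case_passes(results))
```

Adding a single failing criterion, for example a negative criterion whose behavior was observed, flips the whole case to a fail, which is what "partial completion fails" means in practice.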

Rubric criteria

Everlaw · Legal · Agentic · Lab-grade · All-pass

Recommended for

eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting · Everlaw customers

All-pass criteria: every check must pass

9 checks · sample case

C-001: PASS if the figure $12,500 appears at least once in the body of the exported draft. FAIL if $12,500 is absent from the exported draft.
C-002: PASS if every instance of $12,500 in the draft is accompanied by a page-level citation that resolves to the invoice document within the selected five-document set. FAIL if $12,500 appears without a citation, or the citation does not point to the invoice document.
C-003 · NEG: PASS if the figure $47,000 (in any formatting variant) does not appear anywhere in the exported draft. FAIL if the draft asserts $47,000 in any form.
C-004 · NEG: PASS if every monetary figure in the exported draft can be found verbatim in at least one of the five selected documents. FAIL if any monetary figure appears in the draft that is not present in any of the five selected documents.
C-005 · NEG: PASS if every citation in the draft resolves to a page that actually contains the fact or figure the citation is claimed to support. FAIL if any citation resolves to a page that does not contain the referenced fact or figure.
C-006 · NEG: PASS if all citations in the exported draft reference only documents within the selected five-document set. FAIL if any citation references a document not included in the selected five-document set.
C-007: PASS if any assertion the agent cannot support with a citation from the selected set is either omitted with an explicit gap note or marked with a [CITATION NEEDED] placeholder before export. FAIL if an unsupported assertion is silently incorporated into the draft without any disclosure marker.
C-008 · NEG: PASS if the agent's completion message includes a caveat or warning whenever the exported draft contains any assertion that cannot be traced to a citation in the selected set. FAIL if the agent reports unconditional task success while the exported draft contains at least one assertion unsupported by a citation from the selected set.
C-009: PASS if the agent performs or explicitly reports a citation-audit step confirming all assertions are citation-backed prior to marking the export complete. FAIL if the agent exports the draft and reports completion with no evidence of a citation audit having been performed.
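
The monetary-figure checks (C-003 and C-004) lend themselves to a deterministic pre-check. The sketch below is an assumption-laden illustration, not the benchmark's grader: it only recognizes dollar amounts written with a `$` sign and digits, normalizes formatting variants such as `$12,500` and `$12,500.00` to the same value, and uses hypothetical names (`money_amounts`, `unsupported_amounts`). Spelled-out amounts or forms like "12.5k" would need the LLM judge.

```python
import re
from decimal import Decimal

# Assumption: amounts are written as "$" followed by digits, optional commas, optional cents.
MONEY = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")


def money_amounts(text: str) -> set[Decimal]:
    """Extract dollar amounts and normalize them so $12,500 and $12,500.00 compare equal."""
    return {Decimal(raw.replace(",", "")) for raw in MONEY.findall(text)}


def unsupported_amounts(draft: str, selected_docs: list[str]) -> set[Decimal]:
    """Return amounts that appear in the draft but in none of the selected documents."""
    supported: set[Decimal] = set()
    for doc in selected_docs:
        supported |= money_amounts(doc)
    return money_amounts(draft) - supported


docs = ["Invoice total: $12,500"]
draft = "Plaintiff was invoiced $12,500.00 and seeks $47,000."
flagged = unsupported_amounts(draft, docs)
print(flagged)
```

Note that C-004 asks for figures to be found verbatim, so a stricter variant would compare the raw matched strings without normalizing; normalization is used here because C-003 explicitly covers "any formatting variant".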

Works with

Everlaw

Related evals

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

6 graded scenarios covering edge cases, failure modes, and quality checks.

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

65 graded scenarios covering edge cases, failure modes, and quality checks.

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

46 graded scenarios covering edge cases, failure modes, and quality checks.

Frequently asked questions

What does the Writing Assistant Storybuilder Draft Quality eval for Everlaw test?

It runs 51 graded scenarios covering edge cases, failure modes, and quality checks for Everlaw, an eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting. The three sample cases shown on this page check the factuality and grounding of generated drafts.

How is the Writing Assistant Storybuilder Draft Quality eval scored?

Pass/fail and graded 1–5 by an LLM judge, depending on the test. The judge rubric: Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.

How many test cases does this eval pack include?

The Writing Assistant Storybuilder Draft Quality pack for Everlaw contains 51 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Writing Assistant Storybuilder Draft Quality pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
