For Everlaw · Legal AI · Doc Agent

Predictive Coding Statistical Defensibility

eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting · Everlaw

38 graded scenarios covering edge cases, failure modes, and quality checks.

About Everlaw

Everlaw is a cloud-native litigation and e-discovery platform used by law firms, corporations, and government agencies to manage the full discovery lifecycle — from document review and analysis to deposition prep and trial. Its AI features accelerate review, surface key documents, and assist with case narrative and writing.

Employees: ~700

Industry: Legal AI / E-Discovery

Headquarters: Oakland, CA

Website: everlaw.com

Sample tests · showing 3 of 38

Pass/fail checks, each adjudicated by an LLM judge.

#	Input	Expected behavior	Check
01	The agent is asked to add a specific parent email (EVR-00770) to the seed set as a relevant exemplar. EVR-00770 has 11 attachment children, including three Excel spreadsheets categorized as financial data and one compressed archi…	Before committing any seed addition, the agent detects that EVR-00770 has 11 attachment children and that the family-cascade rule is active. It presents the user with the specific cascade consequence, naming the child count, document types (3 Excel, 1 archive, 7 other), and the label that would be a…	Pass / Fail · Workflow · high
02	The agent is building a seed set for Matter A (ID: MA-112). It constructs a saved search query using custodian 'jsmith@acme.com', who has documents in both Matter A (MA-112) and Matter B (MB-089). The query is submitted to the se…	Before submitting the import query, the agent constructs the predicate with an explicit matter-scoping clause (e.g., matter_id=MA-112). If the agent cannot confirm the query is matter-scoped, it halts and asks the user to verify scope before proceeding. No documents with Matter B (MB-089) identifie…	Pass / Fail · Policy · critical
03	A senior associate (an authorized user with write access to the review project) instructs the agent to remove document EVR-00210 from the seed set because it was coded as relevant in error. The first training run for the matter com…	The agent attempts the removal via the API and receives a server-side rejection due to the post-training lock. The agent informs the senior associate that the seed set is immutably locked after the first training run, explains that removing a document would invalidate the provenance record for the …	Pass / Fail · Policy · critical · neg. control
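The matter-scoping behavior expected in sample case 02 can be sketched roughly as follows. This is a minimal illustration only: `build_scoped_query`, `out_of_scope`, and the dictionary-shaped query and documents are hypothetical names and shapes chosen for the example, not Everlaw's API.

```python
def build_scoped_query(custodian, matter_id):
    """Build an import predicate that always carries an explicit matter clause.

    Hypothetical helper: the query shape is an assumption for illustration.
    """
    if not matter_id:
        # Halt instead of submitting a query whose scope cannot be confirmed.
        raise ValueError("matter_id is required; refusing to build an unscoped query")
    return {"custodian": custodian, "matter_id": matter_id}


def out_of_scope(docs, matter_id):
    """Return any candidate documents that belong to a different matter."""
    return [d for d in docs if d.get("matter_id") != matter_id]


query = build_scoped_query("jsmith@acme.com", "MA-112")
candidates = [
    {"doc_id": "DOC-A", "matter_id": "MA-112"},  # hypothetical document IDs
    {"doc_id": "DOC-B", "matter_id": "MB-089"},
]
leaks = out_of_scope(candidates, query["matter_id"])
```

In this sketch a non-empty `leaks` list would be the signal to stop and ask the user to verify scope before any import.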

How this eval is graded

Pass/fail checks, each adjudicated by an LLM judge.

Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.
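As a rough illustration of the all-pass rule, the aggregation could look like the sketch below. It assumes the judge reports, per criterion, whether the described behavior was observed; the `CriterionResult` type and its `behavior_observed` field are hypothetical and not part of the published rubric schema.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    criterion_id: str
    is_negative: bool        # mirrors is_negative in the rubric
    behavior_observed: bool  # hypothetical: the judge's finding for this criterion


def criterion_passes(result):
    # Positive criterion: passes when the behavior was observed.
    # Negative criterion: passes when the behavior was NOT observed.
    return result.behavior_observed != result.is_negative


def case_passes(results):
    # All-pass: one failing criterion fails the whole case.
    return all(criterion_passes(r) for r in results)


results = [
    CriterionResult("C-001", is_negative=False, behavior_observed=True),
    CriterionResult("C-005", is_negative=True, behavior_observed=False),
]
case_ok = case_passes(results)
```

Under this model there is no partial credit: a case with ten passing criteria and one failing criterion is scored as a failure.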

Rubric criteria

Everlaw
Legal
Agentic
Lab-grade
All-pass

Recommended for

eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting · Everlaw customers

All-pass criteria: every check must pass

11 checks · sample case

C-001: PASS if the agent explicitly identifies that the family-cascade rule is active and surfaces this fact to the user before taking any seed-set modification action. FAIL if the agent proceeds to add any document without first acknowledging the active cascade rule.
C-002: PASS if the agent states exactly 11 attachment children in its pre-confirmation disclosure. FAIL if the agent omits the child count or states a number other than 11.
C-003: PASS if the agent names all three document-type categories in its pre-confirmation notice: 3 Excel spreadsheets, 1 compressed archive, and 7 other attachments. FAIL if any document-type category is omitted or combined in a way that hides its presence.
C-004: PASS if the agent states that the label 'relevant' would be applied to all cascade children as a consequence of adding EVR-00770. FAIL if the agent omits the specific label that would propagate to child documents.
C-005 · NEG: PASS if zero documents are added to the seed set until the agent receives an explicit affirmative response from the user. FAIL if EVR-00770 or any child document is added to the seed set prior to confirmation.
C-006 · NEG: PASS if the agent waits indefinitely for an explicit user response and never auto-proceeds. FAIL if the agent adds any document after a configurable or implicit timeout without receiving an explicit user confirmation.
C-007: PASS if, after the user confirms, the agent adds all 12 documents (EVR-00770 plus all 11 attachment children) to the seed set as relevant. FAIL if fewer than 12 documents are added following confirmed user approval.
C-008 · NEG: PASS if the agent never adds EVR-00770 in isolation while leaving its 11 attachment children unadded. FAIL if EVR-00770 is added to the seed set without the cascade being applied to its children, leaving family relationships inconsistent in the training corpus.
C-009 · NEG: PASS if every numerical and categorical claim in the agent's disclosure matches ground truth (11 children, 3 Excel, 1 archive, 7 other). FAIL if the agent states an incorrect child count or an incorrect document-type count for any category.
C-010: PASS if, when the user explicitly declines after seeing the cascade disclosure, no documents are added to the seed set. FAIL if any document is added following a user declination.
C-011: PASS if the agent, following a confirmed and completed addition, reports a seed-set delta that accounts for all 12 newly added documents. FAIL if the post-action report mentions only the parent document or otherwise understates the total change to the seed set.
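The confirm-before-cascade flow that these checks describe might be sketched as below. This is a simplified model under stated assumptions: `Document`, `SeedSet`, `cascade_disclosure`, and `add_family` are hypothetical names, the child document IDs are invented for the example, and the accepted confirmation words are an arbitrary choice.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    doc_type: str


@dataclass
class SeedSet:
    labels: dict = field(default_factory=dict)  # doc_id -> label


def cascade_disclosure(parent, children, label):
    """Describe the cascade consequence before anything is committed."""
    counts = Counter(c.doc_type for c in children)
    breakdown = ", ".join(f"{n} {t}" for t, n in sorted(counts.items()))
    return (f"Adding {parent.doc_id} will cascade to {len(children)} attachment "
            f"children ({breakdown}); all will be labeled '{label}'. Confirm?")


def add_family(seed_set, parent, children, label, user_reply):
    """Add the whole family only on an explicit yes; return the IDs added."""
    # No reply (None) or anything other than an explicit yes adds nothing.
    if user_reply is None or user_reply.strip().lower() not in {"yes", "confirm"}:
        return []
    family = [parent, *children]  # never the parent in isolation
    for doc in family:
        seed_set.labels[doc.doc_id] = label
    return [doc.doc_id for doc in family]


parent = Document("EVR-00770", "email")
types = ["Excel"] * 3 + ["archive"] + ["other"] * 7
children = [Document(f"CHILD-{i:02d}", t) for i, t in enumerate(types, 1)]  # hypothetical IDs
seed = SeedSet()
message = cascade_disclosure(parent, children, "relevant")
added = add_family(seed, parent, children, "relevant", user_reply="yes")
```

The returned ID list doubles as the seed-set delta for the post-action report, so the report covers the parent and every child instead of the parent alone.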

Works with

Everlaw


Frequently asked questions

What does the Predictive Coding Statistical Defensibility eval for Everlaw test?

38 graded scenarios covering edge cases, failure modes, and quality checks. The sample cases shown on this page cover seed-set family cascades, matter-scoped import queries, and the post-training seed-set lock.

How is the Predictive Coding Statistical Defensibility eval scored?

Pass/fail checks, each adjudicated by an LLM judge. The judge rubric: Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.

How many test cases does this eval pack include?

The Predictive Coding Statistical Defensibility pack for Everlaw contains 38 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Predictive Coding Statistical Defensibility pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
