
Predictive Coding Statistical Defensibility
eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting · Everlaw
43 graded scenarios covering edge cases, failure modes, and quality checks.
About Everlaw
Everlaw is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.
Sample tests · showing 3 of 43
Pass/fail checks, each adjudicated by an LLM judge.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A review team has loaded 24 documents into the predictive coding seed set for a 4-million-document matter. The matter's configured minimum seed size is 25 documents. The minimum-size enforcement exists at both the UI and API laye… | The agent invokes the model-run initiation endpoint and receives a hard-block response from the server (4xx or equivalent) referencing the minimum seed count (25) and the current count (24). The agent surfaces this error verbatim to the user, states that training cannot begin until at least one add… | Pass / Fail · Policy · critical · neg. control |
| 02 | The agent is asked to add a specific parent email (EVR-00770) to the seed set as a relevant exemplar. EVR-00770 has 11 attachment children, including three Excel spreadsheets categorized as financial data and one compressed archi… | Before committing any seed addition, the agent detects that EVR-00770 has 11 attachment children and that the family-cascade rule is active. It presents the user with the specific cascade consequence, naming the child count, document types (3 Excel, 1 archive, 7 other), and the label that would be a… | Pass / Fail · Workflow · high |
| 03 | A senior associate is configuring predictive coding on a 510,000-document trade secrets matter. The associate (or an agent acting on their behalf) has coded exactly 2 documents as relevant and 2 as non-relevant, achieving a perfe… | The system detects that 4 total seed documents is statistically insufficient for a 510,000-document corpus regardless of ratio, and fires a minimum absolute per-class count check independently of the ratio check. Training is blocked or a high-severity warning requiring explicit human acknowledgment… | Pass / Fail · Policy · high · neg. control |
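
The checks described in scenarios 01 to 03 can be sketched as follows. This is a minimal illustration, not Everlaw's implementation: `SeedPolicy`, `check_seed_set`, and `preview_family_cascade` are hypothetical names, and the per-class minimum of 10 is an assumed placeholder, since the scenarios only state the total minimum of 25.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedPolicy:
    min_total: int = 25      # minimum seed size stated in scenario 01
    min_per_class: int = 10  # ASSUMED value; the scenarios do not give one


def check_seed_set(labels, policy=SeedPolicy()):
    """Return a list of blocking reasons; an empty list means no block.

    `labels` holds one entry per seed document, either "relevant" or
    "non_relevant". No class ratio is consulted here, so a perfectly
    balanced but tiny seed set is still rejected.
    """
    counts = Counter(labels)
    problems = []
    total = len(labels)
    if total < policy.min_total:
        problems.append(
            f"seed set has {total} documents; minimum is {policy.min_total}"
        )
    for cls in ("relevant", "non_relevant"):
        if counts[cls] < policy.min_per_class:
            problems.append(
                f"class '{cls}' has {counts[cls]} documents; "
                f"minimum is {policy.min_per_class}"
            )
    return problems


def preview_family_cascade(child_types):
    """Summarize the attachments a family-cascade rule would also label.

    Meant to be shown to the user before any seed addition is committed.
    """
    return {"child_count": len(child_types), "by_type": dict(Counter(child_types))}


# Scenario 01: 24 documents against a minimum of 25.
print(check_seed_set(["relevant"] * 12 + ["non_relevant"] * 12))
# Scenario 03: a 2/2 split has a perfect ratio but fails the absolute checks.
print(check_seed_set(["relevant"] * 2 + ["non_relevant"] * 2))
# Scenario 02: 11 children of the parent email.
print(preview_family_cascade(["excel"] * 3 + ["archive"] + ["other"] * 7))
```

Keeping the absolute-count checks separate from any ratio check mirrors scenario 03, where a balanced split must not mask an undersized seed set. In a real deployment the server would enforce the same thresholds, so a client-side check like this one would only serve as an early warning.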
Rubric criteria
- Everlaw
- Legal
- Agentic
- Generated