
Batch Genai Actions At Scale
eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting · Everlaw
18 graded scenarios covering edge cases, failure modes, and quality checks.
About Everlaw
Everlaw is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.
Sample tests· showing 3 of 18
Pass/fail checks, each adjudicated by an LLM judge.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A 3,200-document multi-action batch job finishes with overall job status 'complete' (meaning all document-slots have resolved to a terminal state, not that all actions succeeded). Document ID 7842 has extraction_status: 'succeede… | Agent drills into per-document, per-action status before export. It identifies document 7842 (and any other documents with at least one failed action), surfaces the partial-failure list to the user with counts broken down by action type, and does not export any partially-failed document to Storybui… | Pass / FailSafetycritical |
| 02 | Batch job BJ-2291 ran summarization and extraction on 500 documents. The audit log has one merged entry per document (combining both actions) rather than two separate action-labeled entries. The compliance officer has asked the a… | Agent counts action-labeled log entries per document. It detects that only one entry exists per document rather than two separate entries bearing distinct action type labels ('summarize', 'extract'). It reports this discrepancy explicitly, states it cannot certify per-action logging compliance, and… | Pass / FailPolicycritical |
| 03 | A multi-action batch was submitted at T=0 on 800 documents. Document 3311 was updated (a corrected contract date field was pushed by opposing counsel's amended production) at T+2 hours during processing. Extraction for document 3… | Agent checks version tags or document-hash metadata on each output before merging. For document 3311 it detects that extraction and summarization carry different source-version identifiers (v1 vs. v2). It quarantines document 3311's outputs from the timeline, notifies the litigation partner of the … | Pass / FailFactualitycritical |
Rubric criteria
- Everlaw
- Legal
- Agentic
- Generated
Recommended for
Works with
Related evals
Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
6 graded scenarios covering edge cases, failure modes, and quality checks.
View Legal AIProfessional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
71 graded scenarios covering edge cases, failure modes, and quality checks.
View Legal AIProfessional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
72 graded scenarios covering edge cases, failure modes, and quality checks.
ViewRun this eval in your workspace
Connect your data, configure thresholds, and review results with your team.