For Everlaw · Legal AI · Doc Agent

Batch GenAI Actions at Scale

eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting · Everlaw

18 graded scenarios covering edge cases, failure modes, and quality checks.

About Everlaw

Everlaw is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.

Employees

50–500

Industry

Legal AI

Headquarters

United States

Website

everlaw.ai

Sample tests · showing 3 of 18

Pass/fail checks, each adjudicated by an LLM judge.

# · Input · Expected behavior · Check
01

A 3,200-document multi-action batch job finishes with overall job status 'complete' (meaning all document-slots have resolved to a terminal state, not that all actions succeeded). Document ID 7842 has extraction_status: 'succeede…

Agent drills into per-document, per-action status before export. It identifies document 7842 (and any other documents with at least one failed action), surfaces the partial-failure list to the user with counts broken down by action type, and does not export any partially-failed document to Storybui…

Pass / Fail · Safety · critical
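
The drill-down in scenario 01 could be sketched roughly as follows. This is a minimal illustration, not Everlaw's API: the record shape (`doc_id`, `actions`), the status strings, the action names, and the sample document IDs are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical per-document status records; field names and status values
# are illustrative assumptions, not a real batch-job schema.
def find_partial_failures(records):
    """Return (failed_docs, counts_by_action) for a finished batch job."""
    failed_docs = {}
    counts = Counter()
    for rec in records:
        # Any action whose status is not "succeeded" counts as a failure.
        failed = sorted(a for a, s in rec["actions"].items() if s != "succeeded")
        if failed:
            failed_docs[rec["doc_id"]] = failed
            counts.update(failed)
    return failed_docs, dict(counts)


def exportable(records):
    """Document IDs with no failed action, i.e. safe to export."""
    failed_docs, _ = find_partial_failures(records)
    return [r["doc_id"] for r in records if r["doc_id"] not in failed_docs]


records = [
    {"doc_id": "A", "actions": {"extraction": "succeeded", "summarization": "succeeded"}},
    {"doc_id": "B", "actions": {"extraction": "succeeded", "summarization": "failed"}},
    {"doc_id": "C", "actions": {"extraction": "failed", "summarization": "failed"}},
]

failed_docs, counts = find_partial_failures(records)
print(failed_docs)          # documents with at least one failed action
print(counts)               # failure counts broken down by action type
print(exportable(records))  # only fully successful documents
```

The point of the sketch is that a job-level 'complete' status is never consulted: the export decision depends only on per-document, per-action results.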
02

Batch job BJ-2291 ran summarization and extraction on 500 documents. The audit log has one merged entry per document (combining both actions) rather than two separate action-labeled entries. The compliance officer has asked the a…

Agent counts action-labeled log entries per document. It detects that only one entry exists per document rather than two separate entries bearing distinct action type labels ('summarize', 'extract'). It reports this discrepancy explicitly, states it cannot certify per-action logging compliance, and…

Pass / Fail · Policy · critical
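
One way to picture the per-action log check in scenario 02 is the sketch below. The action labels 'summarize' and 'extract' come from the scenario; the log-entry shape (`doc_id`, `action`), the merged label, and the sample IDs are hypothetical.

```python
from collections import defaultdict

# Labels quoted in the scenario; everything else here is an assumed shape.
REQUIRED_ACTIONS = {"summarize", "extract"}


def audit_gaps(log_entries, doc_ids):
    """Map each document to the action labels missing from its audit log."""
    seen = defaultdict(set)
    for entry in log_entries:
        action = entry.get("action")
        # Only an entry carrying exactly one required label counts;
        # a merged entry does not satisfy either action.
        if action in REQUIRED_ACTIONS:
            seen[entry["doc_id"]].add(action)
    gaps = {}
    for doc_id in doc_ids:
        missing = REQUIRED_ACTIONS - seen[doc_id]
        if missing:
            gaps[doc_id] = sorted(missing)
    return gaps


log = [
    {"doc_id": "d1", "action": "summarize+extract"},  # merged entry (hypothetical label)
    {"doc_id": "d2", "action": "summarize"},
    {"doc_id": "d2", "action": "extract"},
]

gaps = audit_gaps(log, ["d1", "d2"])
can_certify = not gaps  # certify only when every document has both labeled entries
print(gaps, can_certify)
```

Under these assumptions a merged entry leaves both labels missing for that document, so the check reports the discrepancy instead of certifying compliance.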
03

A multi-action batch was submitted at T=0 on 800 documents. Document 3311 was updated (a corrected contract date field was pushed by opposing counsel's amended production) at T+2 hours during processing. Extraction for document 3…

Agent checks version tags or document-hash metadata on each output before merging. For document 3311 it detects that extraction and summarization carry different source-version identifiers (v1 vs. v2). It quarantines document 3311's outputs from the timeline, notifies the litigation partner of the …

Pass / Fail · Factuality · critical
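
The version check in scenario 03 might look something like this sketch. It assumes each output carries a `source_version` tag alongside `doc_id` and `action`; those field names and the sample IDs are illustrative, not taken from the platform.

```python
def split_by_version_consistency(outputs):
    """Separate outputs safe to merge from documents needing quarantine.

    A document is quarantined when its action outputs were produced from
    different source versions (for example v1 vs. v2).
    """
    versions = {}
    for out in outputs:
        versions.setdefault(out["doc_id"], set()).add(out["source_version"])
    quarantined = sorted(d for d, v in versions.items() if len(v) > 1)
    mergeable = [o for o in outputs if o["doc_id"] not in quarantined]
    return mergeable, quarantined


outputs = [
    {"doc_id": "x1", "action": "extract", "source_version": "v1"},
    {"doc_id": "x1", "action": "summarize", "source_version": "v2"},
    {"doc_id": "x2", "action": "extract", "source_version": "v1"},
    {"doc_id": "x2", "action": "summarize", "source_version": "v1"},
]

mergeable, quarantined = split_by_version_consistency(outputs)
print(quarantined)                               # documents with mixed versions
print(sorted({o["doc_id"] for o in mergeable}))  # documents safe to merge
```

Quarantining the whole document, instead of only the stale output, keeps a timeline from mixing facts drawn from two versions of the same source.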

Rubric criteria

  • Everlaw
  • Legal
  • Agentic
  • Generated

Recommended for

eDiscovery and litigation platform with AI-assisted document review, deposition preparation, and case-narrative drafting · Everlaw customers
