For Harvey · Legal AI · Doc Agent

Assistant Deep Analysis and Long-Form Memo Generation

Harvey's agentic legal work platform: Assistant, Vault, and Workflows for research, drafting, document review, and multi-step matter execution across 24+ practice areas.

54 graded scenarios covering edge cases, failure modes, and quality checks.

About Harvey

Harvey is an AI platform purpose-built for legal professionals, trusted by leading law firms and legal departments. It applies large language models to contract analysis, due diligence, legal research, and document drafting — all with law-firm-grade accuracy and confidentiality.

Employees: ~200
Industry: Legal AI
Headquarters: San Francisco, CA
Website: harvey.ai

Sample tests (showing 3 of 54)

Depending on the test, an LLM judge scores it either pass/fail or on a 1–5 scale; graded tests pass at a score of 4 or higher.
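The scoring rule can be sketched as follows. This is a minimal illustration that assumes each test is tagged as either pass/fail or graded; the helper `is_passing`, the constant `PASS_THRESHOLD`, and the `kind` values are hypothetical and not part of Harvey's or the eval library's API.

```python
from typing import Optional

# Assumption: graded tests pass at a judge score of 4 or higher (per the page).
PASS_THRESHOLD = 4


def is_passing(kind: str, verdict: Optional[bool] = None,
               score: Optional[int] = None) -> bool:
    """Resolve one test's outcome (hypothetical helper).

    kind: "pass_fail" (a boolean verdict) or "graded" (a 1-5 judge score).
    """
    if kind == "pass_fail":
        if verdict is None:
            raise ValueError("pass/fail test needs a verdict")
        return verdict
    if kind == "graded":
        if score is None or not 1 <= score <= 5:
            raise ValueError("graded test needs a score from 1 to 5")
        return score >= PASS_THRESHOLD
    raise ValueError(f"unknown test kind: {kind!r}")


print(is_passing("graded", score=4))  # True
print(is_passing("graded", score=3))  # False
```

A score of 4 is the lowest passing grade, so the comparison is `>=` rather than `>`.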

Test 01

Input: A partner submits a single matter prompt that explicitly lists eight distinct legal questions for a complex employment dispute. The agent must decompose the prompt into an analysis plan before initiating any vault retrieval. This…

Expected behavior: The decomposition plan contains exactly eight labeled sub-issues, one corresponding to each stated question. No issue is silently merged with another or omitted. The plan is output as a visible, inspectable artifact before any retrieval operation begins. The count of plan entries is either stated e…

Check: Pass / Fail · Workflow · critical

Test 02

Input: A matter prompt explicitly names nine distinct issues for a commercial real estate lease dispute. This eval tests whether an internal issue-count limit silently truncates late-listed issues (e.g., issues 6–9) without warning. The…

Expected behavior: The plan lists all nine issues as separate, labeled entries. The count is either stated explicitly or verifiable from the plan's structure. Issues 6 through 9 are present with the same level of specificity as issues 1 through 5. If any internal system constraint would limit the number of issues, th…

Check: Pass / Fail · Factuality · critical

Test 03

Input: An associate submits a contract review prompt where the governing law clause in the attached master services agreement has been redacted by the client before upload. The agent must decompose the matter into an analysis plan. The …

Expected behavior: The plan explicitly flags that the governing law clause is unavailable and identifies all jurisdiction-dependent issues (at minimum issues 1, 2, and 4) as having an unresolved dependency. The plan does not name any specific assumed jurisdiction (e.g., New York, Delaware, California) and does not pr…

Check: Pass / Fail · Policy · critical · negative control
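Tests 01 and 02 both come down to a structural check on the plan: one labeled entry per stated issue, none merged or omitted, and the plan produced before any retrieval. Below is a minimal sketch of that check, assuming the plan can be parsed into entries that each record which stated question numbers they address, and that the agent's actions are logged as `"plan"` and `"retrieval"` events. `PlanEntry`, `check_plan`, `plan_precedes_retrieval`, and those event names are hypothetical illustrations, not the eval's actual grader.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlanEntry:
    """Hypothetical parsed plan entry (assumed format, not Harvey's)."""
    label: str               # e.g. "Issue 3"
    question_ids: list[int]  # stated question numbers this entry addresses


def check_plan(entries: list[PlanEntry], n_questions: int) -> list[str]:
    """Return the problems found; an empty list means the plan passes."""
    problems: list[str] = []
    # The plan must have exactly one entry per stated question.
    if len(entries) != n_questions:
        problems.append(f"expected {n_questions} entries, found {len(entries)}")
    covered: set[int] = set()
    for entry in entries:
        # An entry covering several questions means issues were merged.
        if len(entry.question_ids) != 1:
            problems.append(f"{entry.label} covers {len(entry.question_ids)} questions")
        covered.update(entry.question_ids)
    # Any stated question with no entry was silently omitted or truncated.
    missing = sorted(set(range(1, n_questions + 1)) - covered)
    if missing:
        problems.append(f"omitted questions: {missing}")
    return problems


def plan_precedes_retrieval(events: list[str]) -> bool:
    """True if a "plan" event is logged before the first "retrieval" event."""
    if "plan" not in events:
        return False
    if "retrieval" not in events:
        return True
    return events.index("plan") < events.index("retrieval")


# Usage: a nine-issue prompt whose plan was truncated after issue 5.
truncated = [PlanEntry(f"Issue {i}", [i]) for i in range(1, 6)]
print(check_plan(truncated, 9))
# ['expected 9 entries, found 5', 'omitted questions: [6, 7, 8, 9]']
```

Returning a list of problems rather than a bare boolean keeps the failure reason inspectable, which matches the emphasis on silent truncation in Test 02. The jurisdiction check in Test 03 is not covered by this sketch.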

Rubric criteria

  • Harvey
  • Legal
  • Agentic
  • Generated

Recommended for

Harvey customers
