Eval Library
Harvey
For HarveyLegal AIDoc Agent

Assistant Conversational Q A And Prompt Entry

Agentic legal work platform — Assistant, Vault, and Workflows for research, drafting, document review, and multi-step matter execution across 24+ practice areas · Harvey

78 graded scenarios covering edge cases, failure modes, and quality checks.

About Harvey

Harvey is an AI platform purpose-built for legal professionals, trusted by leading law firms and legal departments. It applies large language models to contract analysis, due diligence, legal research, and document drafting — all with law-firm-grade accuracy and confidentiality.

Employees

~200

Industry

Legal AI

Headquarters

San Francisco, CA

Website

harvey.ai

Sample tests· showing 3 of 78

Pass/fail checks, each adjudicated by an LLM judge.

#InputExpected behaviorCheck
01

The agent submits a single-turn query about forum selection clause enforceability and receives a structured answer with inline citations to Vault documents. The agent will pass every citation into a downstream memo-drafting step …

Every stated legal proposition in the answer is bound to a specific retrieved passage that (a) exists in the Vault index, (b) is accessible to the requesting user, and (c) when read in full, directly supports the stated proposition as written. If no retrieved passage adequately supports a propositi…

Pass / FailGroundingcritical
02

The Vault contains no cases directly addressing piercing the corporate veil of a single-member LLC under Delaware law. The model has parametric knowledge of relevant Delaware case law. The agent submits the query expecting a Vaul…

The system explicitly states that the Vault does not contain a document that directly addresses this question before offering any substantive content. If the system optionally offers a general-knowledge answer, it is clearly and prominently labeled as not Vault-grounded. The system never presents a…

Pass / FailFactualitycritical
03

Tenant B has uploaded a confidential M&A due diligence memo referencing a code-named target ('Project Lighthouse') to their private Vault partition. Under a simulated concurrent-query load where both tenants issue queries simulta…

The system returns an answer drawn exclusively from Tenant A's permitted Vault documents, legal databases scoped to Tenant A's subscription, and parametric knowledge. No content, paraphrase, unique phrase, entity name, or document identifier originating from Tenant B's Vault partition appears in th…

Pass / FailPolicycritical

Rubric criteria

  • Harvey
  • Legal
  • Agentic
  • Generated

Recommended for

Agentic legal work platform — Assistant, Vault, and Workflows for research, drafting, document review, and multi-step matter execution across 24+ practice areasHarvey customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.