For Harvey · Legal AI · Doc Agent

Assistant Conversational Q&A and Prompt Entry

Agentic legal work platform — Assistant, Vault, and Workflows for research, drafting, document review, and multi-step matter execution across 24+ practice areas · Harvey

59 graded scenarios covering edge cases, failure modes, and quality checks.

About Harvey

Harvey is an AI platform purpose-built for legal professionals, trusted by leading law firms and legal departments. It applies large language models to contract analysis, due diligence, legal research, and document drafting — all with law-firm-grade accuracy and confidentiality.

Employees: ~200
Industry: Legal AI
Headquarters: San Francisco, CA
Website: harvey.ai

Sample tests (showing 3 of 59)

Pass/fail checks, each adjudicated by an LLM judge.

01 · Check: Pass/Fail · Grounding · critical
Input: The agent submits a single-turn query about forum selection clause enforceability and receives a structured answer with inline citations to Vault documents. The agent will pass every citation into a downstream memo-drafting step …
Expected behavior: Every stated legal proposition in the answer is bound to a specific retrieved passage that (a) exists in the Vault index, (b) is accessible to the requesting user, and (c) when read in full, directly supports the stated proposition as written. If no retrieved passage adequately supports a propositi…

02 · Check: Pass/Fail · Factuality · critical
Input: The Vault contains no cases directly addressing piercing the corporate veil of a single-member LLC under Delaware law. The model has parametric knowledge of relevant Delaware case law. The agent submits the query expecting a Vaul…
Expected behavior: The system explicitly states that the Vault does not contain a document that directly addresses this question before offering any substantive content. If the system optionally offers a general-knowledge answer, it is clearly and prominently labeled as not Vault-grounded. The system never presents a…

03 · Check: Pass/Fail · Factuality · critical
Input: A federal regulation was amended after both the model's knowledge cutoff and the Vault's last document ingestion date. The agent queries the current compliance standard. The retrieved documents reflect the pre-amendment rule. The…
Expected behavior: The system answers based on retrieved documents but prominently discloses the document ingestion or publication date and explicitly notes that regulations may have changed since that date. It recommends the attorney verify against the current published regulatory release before relying on the answe…
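Cases 02 and 03 share an ordering rule: disclose the limitation first, then give substance. The sketch below shows one way an agent could enforce that order. It is a minimal illustration, not Harvey's implementation. It assumes a hypothetical `RetrievedPassage` type with an `ingested_on` date, and it assumes the caller already knows two things: whether a Vault-grounded answer exists, and whether the question is time-sensitive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RetrievedPassage:
    # Hypothetical shape for a retrieved Vault chunk; not a Harvey API type.
    doc_id: str
    text: str
    ingested_on: date


GAP_NOTICE = "The Vault does not contain a document that directly addresses this question."


def compose_answer(
    passages: List[RetrievedPassage],
    vault_answer: Optional[str],
    general_answer: Optional[str] = None,
    time_sensitive: bool = False,
) -> str:
    """Order disclosures ahead of substance, per the expectations in cases 02 and 03."""
    parts: List[str] = []

    if not passages or vault_answer is None:
        # Case 02: state the gap before any substantive content.
        parts.append(GAP_NOTICE)
        if general_answer:
            # Any parametric-knowledge answer is labeled as not Vault-grounded.
            parts.append("[General knowledge, NOT grounded in Vault documents] " + general_answer)
        return "\n\n".join(parts)

    if time_sensitive:
        # Case 03: disclose source dates and recommend verification.
        dates = sorted(p.ingested_on for p in passages)
        span = dates[0].isoformat()
        if dates[-1] != dates[0]:
            span += f" to {dates[-1].isoformat()}"
        parts.append(
            f"Source ingestion date(s): {span}. Regulations may have changed since then; "
            "verify against the current published regulatory release before relying on this answer."
        )

    parts.append(vault_answer)
    return "\n\n".join(parts)
```

The exact wording matters less than the position. In both branches, the disclosure is the first thing the reader sees.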

How this eval is graded


Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.
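The aggregation rule is short enough to sketch in code. This minimal version assumes the judge reports, for each criterion, whether the behavior that criterion describes occurred. For negative criteria, that behavior is the failure mode. `CriterionVerdict` and `behavior_observed` are hypothetical names; `is_negative` mirrors the rubric's flag.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CriterionVerdict:
    criterion_id: str        # e.g. "C-004"
    is_negative: bool        # mirrors the rubric's is_negative flag
    behavior_observed: bool  # hypothetical: did the judge see the described behavior?

    @property
    def passed(self) -> bool:
        # Negative criteria describe a failure mode, so PASS means it was NOT observed.
        return not self.behavior_observed if self.is_negative else self.behavior_observed


def grade_case(verdicts: List[CriterionVerdict]) -> Tuple[bool, Dict[str, bool]]:
    """All-pass aggregation: the case passes only if every criterion passes."""
    if not verdicts:
        raise ValueError("a case needs at least one criterion")
    per_criterion = {v.criterion_id: v.passed for v in verdicts}
    return all(per_criterion.values()), per_criterion
```

A single failed criterion sinks the case. A response that passes eight of nine checks still scores as a failure.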

Rubric criteria

Harvey · Legal · Agentic · Lab-grade · All-pass

Recommended for

Harvey customers using the agentic legal work platform (Assistant, Vault, and Workflows) for research, drafting, document review, and multi-step matter execution across 24+ practice areas

All-pass criteria: every check must pass

9 checks · sample case

C-001: PASS if every source identifier cited in the response (case name, document ID, or other reference) corresponds to a document present in the Vault index and retrievable by that identifier. FAIL if any cited source cannot be located in the Vault index.
C-002: PASS if every Vault document cited in the response is within the requesting user's access permissions at the time of the query. FAIL if any citation references a Vault document the requesting user is not authorized to view.
C-003: PASS if each cited passage, read in full and in its surrounding context within the source document, unambiguously supports the specific legal proposition to which it is attached. FAIL if any cited passage, when read in full, does not directly support the stated proposition.
C-004 (negative): PASS if no citation is drawn from a source whose actual holding or conclusion is adverse to the proposition it is offered to support. FAIL if the response cites a case or document in which the forum selection clause was found unenforceable, or whose outcome otherwise contradicts the stated proposition, as support for that proposition.
C-005 (negative): PASS if every citation identifier in the response (e.g., WL number, case citation, Vault document reference) appears verbatim in at least one retrieved chunk returned for this query. FAIL if the response contains any citation identifier, however plausible in format, that does not appear in any retrieved chunk.
C-006 (negative): PASS if every cited passage explicitly addresses forum selection clauses in its retrieved text. FAIL if any citation is sourced from a passage (e.g., a general contracts overview) whose retrieved chunk does not contain the phrase 'forum selection' or an equivalent specific reference to such clauses.
C-007 (negative): PASS if each legal proposition is calibrated to the actual strength of the language in the cited passage, preserving qualifications, circuit splits, or conditional holdings present in the source. FAIL if the response attributes a broader or stronger claim to a passage than the passage's text supports, including by extrapolating from sentences adjacent to the retrieved excerpt.
C-008: PASS if every legal proposition for which no retrieved passage provides adequate support is either omitted from the response entirely or explicitly marked with a statement that retrieved Vault sources are insufficient to support it. FAIL if the response states a legal proposition as established fact without a valid supporting citation when no adequate passage was retrieved.
C-009: PASS if every citation includes sufficient locating information (section, page, paragraph, or chunk reference) to identify the specific passage within the source document that supports the proposition. FAIL if any citation refers only to the document as a whole without indicating where within the document the supporting text appears.
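Several of these checks are mechanical, so they could run as a deterministic pre-pass before the LLM judge. C-003, C-004, C-007, and C-008 require reading comprehension and stay with the judge. This is a minimal sketch with several assumptions:

- `Citation` is a hypothetical record carrying the cited identifier, the resolved Vault document ID, the attached passage, and a locator string.
- The Vault index and the user's access list are passed in as plain sets, snapshotted at query time.
- Failures from the regex-based checks (C-006, C-009) are best treated as flags for the judge rather than final verdicts. C-006, for example, also accepts equivalent references that a literal phrase match will miss.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Literal-phrase check for C-006; "equivalent specific references" still need the judge.
FORUM_RE = re.compile(r"forum[-\s]selection", re.IGNORECASE)
# Rough locator check for C-009: section, page, paragraph, or chunk marker followed by a number.
LOCATOR_RE = re.compile(
    r"(§|¶|\bpp?\.|\bpara(?:graph)?\b|\bsection\b|\bchunk\b)\s*\d", re.IGNORECASE
)


@dataclass
class Citation:
    # Hypothetical citation record; field names are illustrative only.
    identifier: str         # identifier as it appears in the response
    doc_id: str             # Vault document the citation resolves to
    passage: str            # retrieved text the citation is attached to
    locator: Optional[str]  # e.g. "§ 3", "p. 17", "chunk 4"; None if absent


def precheck(
    citations: List[Citation],
    vault_index: Set[str],
    user_accessible: Set[str],
    retrieved_chunks: List[str],
) -> Dict[str, List[str]]:
    """Return, per criterion, the identifiers of citations that fail a mechanical check."""
    failures: Dict[str, List[str]] = {c: [] for c in ("C-001", "C-002", "C-005", "C-006", "C-009")}
    for cit in citations:
        # C-001: the cited document must exist in the Vault index.
        if cit.doc_id not in vault_index:
            failures["C-001"].append(cit.identifier)
        # C-002: the requesting user must be allowed to view it (snapshot at query time).
        if cit.doc_id not in user_accessible:
            failures["C-002"].append(cit.identifier)
        # C-005: the identifier must appear verbatim in at least one retrieved chunk.
        if not any(cit.identifier in chunk for chunk in retrieved_chunks):
            failures["C-005"].append(cit.identifier)
        # C-006: the attached passage must mention forum selection.
        if not FORUM_RE.search(cit.passage):
            failures["C-006"].append(cit.identifier)
        # C-009: the citation must point somewhere inside the document.
        if not cit.locator or not LOCATOR_RE.search(cit.locator):
            failures["C-009"].append(cit.identifier)
    return failures
```

A citation with an empty failure list for every key has cleared only the mechanical bar. It still needs the judge's verdict on C-003, C-004, C-007, and C-008.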

Works with

Harvey

Related evals

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

6 graded scenarios covering edge cases, failure modes, and quality checks.


Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

65 graded scenarios covering edge cases, failure modes, and quality checks.


Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

46 graded scenarios covering edge cases, failure modes, and quality checks.


Frequently asked questions

What does the Assistant Conversational Q&A and Prompt Entry eval for Harvey's agentic legal work platform (Assistant, Vault, and Workflows) test?

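59 graded scenarios covering edge cases, failure modes, and quality checks. The sample cases test three behaviors. First, whether every cited Vault passage actually supports the proposition it is attached to. Second, whether the system says so up front when the Vault lacks a responsive document. Third, whether it flags that retrieved regulations may predate a recent amendment.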

How is the Assistant Conversational Q&A and Prompt Entry eval scored?

Pass/fail checks, each adjudicated by an LLM judge. The judge rubric: Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.

How many test cases does this eval pack include?

The Assistant Conversational Q&A and Prompt Entry pack for Harvey's agentic legal work platform contains 59 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Assistant Conversational Q&A and Prompt Entry pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack's results in CI via the REST API.
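The REST API is not documented on this page, so the sketch below leaves it out. It shows only the gating step and assumes per-case results have already been exported to a JSON file as a list of objects, each with a boolean `passed` field. That file shape and the `gate`/`gate_file` names are hypothetical, not Corsac's actual format.

```python
import json
from typing import Dict, List


def gate(results: List[Dict], min_pass_rate: float = 1.0) -> bool:
    """Return True if the fraction of passing cases meets min_pass_rate.

    With the default of 1.0, a single failing case blocks the release.
    """
    if not results:
        # No results usually means the run itself failed; don't pass vacuously.
        return False
    passed = sum(1 for r in results if r["passed"])
    return passed / len(results) >= min_pass_rate


def gate_file(path: str, min_pass_rate: float = 1.0) -> int:
    """Load exported results and return a process exit code (0 = ship, 1 = block)."""
    with open(path, encoding="utf-8") as f:
        results = json.load(f)
    passed = sum(1 for r in results if r["passed"])
    print(f"{passed}/{len(results)} cases passed (threshold {min_pass_rate:.0%})")
    return 0 if gate(results, min_pass_rate) else 1
```

A CI step could call `sys.exit(gate_file("results.json", 0.95))` so that a non-zero exit code fails the pipeline. Lowering `min_pass_rate` is one way to express a customized threshold. Because each case already uses all-pass scoring, a case counts as passed only when all of its criteria pass.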
