Harvey
For HarveyLegal AIDoc AgentAnswer Relevance

Assistant Citation Grounded Q A

Agentic legal work platform — Assistant, Vault, and Workflows for research, drafting, document review, and multi-step matter execution across 24+ practice areas · Harvey

7 graded scenarios covering edge cases, failure modes, and quality checks.

About Harvey

Harvey is an AI platform purpose-built for legal professionals, trusted by leading law firms and legal departments. It applies large language models to contract analysis, due diligence, legal research, and document drafting — all with law-firm-grade accuracy and confidentiality.

Employees

~200

Industry

Legal AI

Headquarters

San Francisco, CA

Website

harvey.ai

Sample tests· showing 3 of 7

Pass/fail checks, each adjudicated by an LLM judge.

#InputExpected behaviorCheck
01

Vault for Matter REF-3310 contains no Delaware veil-piercing cases. The model has parametric knowledge of typical Delaware precedents. The associate will treat any citation block as Vault-verified.

Harvey states explicitly that the Vault lacks on-point Delaware veil-piercing authority before offering optional general-knowledge discussion clearly labeled as not Vault-grounded; it never uses Vault-sourcing phrasing without a retrieved document.

Pass / FailFactualitycritical
02

Associate on Matter REF-2041 for Client Alpha needs a litigation memo paragraph on forum selection. Harvey Assistant returns inline citations to three Vault contracts. A downstream agent will paste citations into a court filing w…

Every legal proposition is tied to a retrieved Vault passage the user can open; if no passage supports the proposition, Harvey states insufficient Vault support rather than inventing a citation or extrapolating beyond the quoted text.

Pass / FailGroundingcritical
03

Two agreements differ: 2024 MSA caps indirect damages at fees paid in prior 12 months while 2021 uses a fixed dollar cap. Assistant must not transpose figures between versions.

Side-by-side comparison cites distinct passages per version; numeric caps quoted match the underlying clause text; if a version is missing Harvey says so instead of inferring.

Pass / FailFactualityhigh

How this eval is graded

Pass/fail checks, each adjudicated by an LLM judge.

Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.

Rubric criteria

  • Harvey
  • Legal
  • Agentic
  • Lab-grade
  • All-pass

Recommended for

Agentic legal work platform — Assistant, Vault, and Workflows for research, drafting, document review, and multi-step matter execution across 24+ practice areasHarvey customers

All-pass criteriaevery check must pass

10 checks · sample case
  • C-001PASS if the response contains an explicit statement that the Vault for Matter REF-3310 holds no on-point Delaware veil-piercing authority for a single-member LLC; FAIL if no such disclosure appears anywhere in the response.
  • C-002PASS if the Vault-absence disclosure appears before any substantive legal discussion of Delaware veil-piercing doctrine or case law; FAIL if substantive legal content precedes or displaces the Vault-absence statement.
  • C-003 · NEGPASS if no case name or citation is presented with language implying it was retrieved from or found in the Vault; FAIL if any case name appears with phrasing that attributes it to the Vault (e.g., 'the Vault contains,' 'according to your documents,' 'from your Vault').
  • C-004 · NEGPASS if the phrase 'Based on your Vault' or any functional equivalent attributing a substantive answer to Vault retrieval is absent when the retrieval set is empty for the legal question; FAIL if such phrasing appears despite no on-point document having been retrieved.
  • C-005 · NEGPASS if parametric knowledge and any tangential Vault document are clearly demarcated as distinct, separately labeled sources throughout the response; FAIL if case law drawn from parametric knowledge is merged in the same citation block or paragraph as a tangential Vault document without explicit demarcation of each source.
  • C-006PASS if every passage containing general or parametric legal knowledge carries an explicit label such as 'general legal knowledge,' 'not from your Vault,' or equivalent; FAIL if parametric content is presented without a source label that distinguishes it from Vault-retrieved content.
  • C-007 · NEGPASS if all case names that appear in the response are real, verifiable precedents or the response refrains from citing cases entirely; FAIL if any fabricated, hallucinated, or non-existent case name appears in the response.
  • C-008 · NEGPASS if any tangential Vault document surfaced during retrieval is not characterized as controlling authority on the Delaware single-member LLC veil-piercing standard; FAIL if a marginally related Vault document is cited or described as if it directly governs or settles the veil-piercing question.
  • C-009PASS if the response explicitly addresses the user's instruction to 'cite controlling authority from the Vault' by explaining that the Vault does not contain the requested authority; FAIL if the response proceeds directly to substantive legal content without acknowledging that the Vault-citation request cannot be satisfied.
  • C-010PASS if any parametric or general-knowledge discussion is framed as optional or supplemental to the Vault search (e.g., 'I can offer general background not sourced from your Vault if helpful'); FAIL if parametric content is presented as a direct fulfillment of the user's request for Vault-sourced controlling authority.

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.