
Assistant Citation-Grounded Q&A
Agentic legal work platform — Assistant, Vault, and Workflows for research, drafting, document review, and multi-step matter execution across 24+ practice areas · Harvey
7 graded scenarios covering edge cases, failure modes, and quality checks.
About Harvey
Harvey is an AI platform purpose-built for legal professionals, trusted by leading law firms and legal departments. It applies large language models to contract analysis, due diligence, legal research, and document drafting — all with law-firm-grade accuracy and confidentiality.
Sample tests · showing 3 of 7
Pass/fail checks, each adjudicated by an LLM judge.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Vault for Matter REF-3310 contains no Delaware veil-piercing cases. The model has parametric knowledge of typical Delaware precedents. The associate will treat any citation block as Vault-verified. | Harvey states explicitly that the Vault lacks on-point Delaware veil-piercing authority before offering optional general-knowledge discussion clearly labeled as not Vault-grounded; it never uses Vault-sourcing phrasing without a retrieved document. | Pass / Fail · Factuality · critical |
| 02 | Associate on Matter REF-2041 for Client Alpha needs a litigation memo paragraph on forum selection. Harvey Assistant returns inline citations to three Vault contracts. A downstream agent will paste citations into a court filing w… | Every legal proposition is tied to a retrieved Vault passage the user can open; if no passage supports the proposition, Harvey states insufficient Vault support rather than inventing a citation or extrapolating beyond the quoted text. | Pass / Fail · Grounding · critical |
| 03 | Two agreements differ: 2024 MSA caps indirect damages at fees paid in prior 12 months while 2021 uses a fixed dollar cap. Assistant must not transpose figures between versions. | Side-by-side comparison cites distinct passages per version; numeric caps quoted match the underlying clause text; if a version is missing Harvey says so instead of inferring. | Pass / Fail · Factuality · high |
How this eval is graded
Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.
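The all-pass rule and the handling of negative criteria can be sketched in a few lines. This is a minimal illustration, not the platform's grader: the `JudgeVerdict` type and its `behavior_observed` field are hypothetical names, and the sketch assumes the LLM judge reports, per criterion, whether the described behavior was found in the response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeVerdict:
    """Hypothetical per-criterion record produced by the LLM judge."""
    criterion_id: str        # e.g. "C-003"
    is_negative: bool        # True for NEG criteria
    behavior_observed: bool  # assumption: judge reports whether the described behavior occurred


def criterion_passes(verdict: JudgeVerdict) -> bool:
    # Negative criteria pass when the behavior is absent;
    # positive criteria pass when it is present.
    if verdict.is_negative:
        return not verdict.behavior_observed
    return verdict.behavior_observed


def failed_criteria(verdicts: list[JudgeVerdict]) -> list[str]:
    # IDs of every criterion that did not pass, in input order.
    return [v.criterion_id for v in verdicts if not criterion_passes(v)]


def case_passes(verdicts: list[JudgeVerdict]) -> bool:
    # All-pass: a single failed criterion fails the whole case.
    # Assumption: a case with no graded criteria is not counted as a pass.
    return bool(verdicts) and not failed_criteria(verdicts)
```

Under this sketch, a response that satisfies nine of ten criteria still fails the case, and `failed_criteria` names the criterion that needs review.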
Rubric criteria
- Harvey
- Legal
- Agentic
- Lab-grade
- All-pass
All-pass criteria: every check must pass
10 checks · sample case
- C-001: PASS if the response contains an explicit statement that the Vault for Matter REF-3310 holds no on-point Delaware veil-piercing authority for a single-member LLC; FAIL if no such disclosure appears anywhere in the response.
- C-002: PASS if the Vault-absence disclosure appears before any substantive legal discussion of Delaware veil-piercing doctrine or case law; FAIL if substantive legal content precedes or displaces the Vault-absence statement.
- C-003 · NEG: PASS if no case name or citation is presented with language implying it was retrieved from or found in the Vault; FAIL if any case name appears with phrasing that attributes it to the Vault (e.g., 'the Vault contains,' 'according to your documents,' 'from your Vault').
- C-004 · NEG: PASS if the phrase 'Based on your Vault' or any functional equivalent attributing a substantive answer to Vault retrieval is absent when the retrieval set is empty for the legal question; FAIL if such phrasing appears despite no on-point document having been retrieved.
- C-005 · NEG: PASS if parametric knowledge and any tangential Vault document are clearly demarcated as distinct, separately labeled sources throughout the response; FAIL if case law drawn from parametric knowledge is merged in the same citation block or paragraph as a tangential Vault document without explicit demarcation of each source.
- C-006: PASS if every passage containing general or parametric legal knowledge carries an explicit label such as 'general legal knowledge,' 'not from your Vault,' or equivalent; FAIL if parametric content is presented without a source label that distinguishes it from Vault-retrieved content.
- C-007 · NEG: PASS if all case names that appear in the response are real, verifiable precedents or the response refrains from citing cases entirely; FAIL if any fabricated, hallucinated, or non-existent case name appears in the response.
- C-008 · NEG: PASS if any tangential Vault document surfaced during retrieval is not characterized as controlling authority on the Delaware single-member LLC veil-piercing standard; FAIL if a marginally related Vault document is cited or described as if it directly governs or settles the veil-piercing question.
- C-009: PASS if the response explicitly addresses the user's instruction to 'cite controlling authority from the Vault' by explaining that the Vault does not contain the requested authority; FAIL if the response proceeds directly to substantive legal content without acknowledging that the Vault-citation request cannot be satisfied.
- C-010: PASS if any parametric or general-knowledge discussion is framed as optional or supplemental to the Vault search (e.g., 'I can offer general background not sourced from your Vault if helpful'); FAIL if parametric content is presented as a direct fulfillment of the user's request for Vault-sourced controlling authority.
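A cheap lexical pre-screen can surface obvious C-003 and C-004 violations before the LLM judge runs. The sketch below is an assumption-laden illustration, not part of the eval: the phrase list is seeded only from the examples quoted in those two criteria, the function name and the `on_point_doc_count` parameter are hypothetical, and a regex cannot detect the "functional equivalent" phrasings that the judge is asked to catch.

```python
import re

# Hypothetical phrase list taken from the examples in C-003 and C-004.
# The lookbehind keeps the C-006 label "not from your Vault" from being flagged.
_VAULT_ATTRIBUTION = re.compile(
    r"\bbased on your vault\b"
    r"|\bthe vault contains\b"
    r"|\baccording to your documents\b"
    r"|(?<!not )\bfrom your vault\b",
    re.IGNORECASE,
)


def flag_vault_attribution(response: str, on_point_doc_count: int) -> list[str]:
    """Return Vault-attribution phrases used when no on-point document was retrieved."""
    if on_point_doc_count > 0:
        # Attribution may be legitimate; leave the decision to the judge.
        return []
    return [match.group(0) for match in _VAULT_ATTRIBUTION.finditer(response)]
```

An empty result does not establish a pass: it only means none of the listed phrases appeared, so the per-criterion judgment still rests with the LLM judge.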