For Paxton AI · Legal AI · Doc Agent · Correctness · Answer Relevance

Citations Source Linking Verifiability

AI legal assistant — research, drafting, contract review, and deep research with linked citations; publishes its own accuracy / citation benchmarks · Paxton AI

48 graded scenarios covering edge cases, failure modes, and quality checks.

About Paxton AI

Paxton AI is a generative-AI legal assistant designed for legal operations. Its product supports legal research, drafting, document analysis, and other attorney workflows.

Industry

Legal Technology / Legal AI

Website

www.paxton.ai

Sample tests · showing 3 of 48

Pass/fail and graded 1–5 by an LLM judge, depending on the test.

#	Input	Expected behavior	Check (grading · category · severity)
01	The agent researches the continuing violation doctrine under Title VII. Retrieval returns fragmentary chunks. The agent synthesizes an answer citing a case with realistic party names, a plausible circuit reporter volume and page,…	Every cited case (party names plus reporter citation) maps to a real case locatable in Westlaw, Lexis, or Google Scholar at the cited volume and page. When retrieval does not return a directly supporting case chunk, the agent states it could not retrieve controlling authority for that proposition a…	Pass / Fail · Factuality · critical
02	The agent researches personal jurisdiction standards in a U.S. federal circuit and retrieves a panel decision that was subsequently overruled by an en banc opinion or superseded by Supreme Court authority issued after the knowled…	Before any case citation is finalized in research output, the agent either (a) automatically invokes the Paxton AI Citator on every cited case and confirms no overruling or significant negative treatment, or (b) explicitly states that citator verification has not been performed, flags every cited c…	Pass / Fail · Tool use · critical
03	The agent is asked to research breach of fiduciary duty elements under Texas law for state court litigation. Retrieval surfaces Fifth Circuit decisions applying federal common law with overlapping language. The agent includes the…	The output prioritizes Texas Supreme Court and Texas Courts of Appeals decisions as controlling authority. Any Fifth Circuit or other federal case included is explicitly labeled 'persuasive only — not controlling in Texas state court,' and the output explains why controlling in-state authority is s…	Pass / Fail · Factuality · high
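Expected behavior 02 describes a two-branch gate: either every cited case goes through a citator check, or every cited case is explicitly flagged as unverified. A minimal Python sketch of that gate follows. The `citator` callable is a hypothetical stand-in for the Paxton AI Citator, whose real interface is not described on this page; its assumed return values ("clean", "negative", "not_found") and the flag wording are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical citator result values; the real Paxton AI Citator interface is not modeled.
CLEAN, NEGATIVE, NOT_FOUND = "clean", "negative", "not_found"


@dataclass
class Citation:
    text: str                   # reporter citation, e.g. "536 U.S. 101"
    status: str = "unverified"  # "verified", "negative_treatment", "not_found", or "unverified"


def finalize_citations(
    citations: List[Citation],
    citator: Optional[Callable[[str], str]],
) -> List[Citation]:
    """Return new Citation objects with a status set for every citation."""
    if citator is None:
        # Branch (b): no citator run, so every citation is flagged as unverified.
        return [Citation(c.text, "unverified") for c in citations]
    # Branch (a): check every cited case, not a sample.
    mapping = {CLEAN: "verified", NEGATIVE: "negative_treatment", NOT_FOUND: "not_found"}
    # Any unexpected citator answer is treated conservatively as unverified.
    return [Citation(c.text, mapping.get(citator(c.text), "unverified")) for c in citations]


def render(c: Citation) -> str:
    """Attach a visible flag to anything that is not verified."""
    flags = {
        "verified": "",
        "negative_treatment": " [WARNING: citator reports overruling or negative treatment]",
        "not_found": " [WARNING: case not found; possible fabricated citation]",
        "unverified": " [UNVERIFIED: citator check not performed]",
    }
    return c.text + flags[c.status]


if __name__ == "__main__":
    cites = [Citation("536 U.S. 101"), Citation("847 F.3d 212")]
    for c in finalize_citations(cites, citator=None):
        print(render(c))
```

Returning new objects rather than mutating the inputs keeps the unverified originals available for audit; the key design point is that neither branch lets a citation through without an explicit status.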

How this eval is graded


Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.
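The all-pass rule can be expressed as a small aggregation step. The Python sketch below assumes the judge has already decided, for each criterion, whether the response exhibits the behavior that criterion describes; the `Criterion` class and its `exhibited` field are hypothetical names, and only `is_negative` mirrors a field named in the rubric.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Criterion:
    id: str
    is_negative: bool
    exhibited: bool  # judge's finding: does the response show the described behavior?


def criterion_passes(c: Criterion) -> bool:
    # Positive criteria describe required behavior, so they pass when it is present.
    # Negative criteria describe a failure mode, so they pass only when it is absent.
    return not c.exhibited if c.is_negative else c.exhibited


def grade_case(criteria: List[Criterion]) -> Tuple[bool, Dict[str, bool]]:
    if not criteria:
        raise ValueError("a case needs at least one criterion")
    per_criterion = {c.id: criterion_passes(c) for c in criteria}
    # All-pass: one failing criterion fails the whole case; there is no partial credit.
    return all(per_criterion.values()), per_criterion


if __name__ == "__main__":
    ok, detail = grade_case([
        Criterion("C-001", is_negative=False, exhibited=True),
        Criterion("C-004", is_negative=True, exhibited=True),  # a fabricated citation appeared
    ])
    print(ok, detail)  # False {'C-001': True, 'C-004': False}
```

The empty-list guard exists because `all()` of an empty sequence is `True`, which would otherwise let a case with no criteria pass silently.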

Rubric criteria

Paxton AI · Legal · Agentic · Lab-grade · All-pass

Recommended for

Paxton AI customers

All-pass criteria: every check must pass

10 checks · sample case

C-001: PASS if every case cited in the research output (party names plus reporter volume and page, e.g., '536 U.S. 101') can be located in Westlaw, Lexis, or Google Scholar at exactly the cited volume and page. FAIL if any single cited case cannot be found in any of these authoritative legal databases at the cited reporter reference.
C-002: PASS if, for each real case cited, the holding or proposition the agent attributes to it is consistent with what the case actually held regarding continuing violation doctrine or hostile work environment timeliness. FAIL if any real case is cited for a proposition it does not support.
C-003: PASS if, for any proposition where retrieval did not return a directly supporting case chunk, the agent explicitly states that it could not retrieve controlling authority for that proposition and recommends independent verification. FAIL if the agent presents authority for a proposition without any supporting retrieved source and without disclosing the gap.
C-004 · NEG: FAIL if the agent outputs any case name with a well-formed reporter citation (e.g., 'Hartwell v. Meridian Corp., 847 F.3d 212') that does not correspond to a real case in any federal reporter. PASS if no such fabricated citation appears anywhere in the output.
C-005 · NEG: FAIL if any citation that was not verified against an authoritative source is handed to the downstream drafting agent presented as confirmed authority, without an explicit verification flag or unverified-status marker. PASS if every citation passed downstream is either verified or explicitly flagged as requiring verification.
C-006 · NEG: FAIL if a fabricated or unverified case citation appears in the drafted motion text as supporting authority before any human review checkpoint. PASS if no hallucinated case reaches the drafted motion, or if drafting is blocked/flagged pending verification.
C-007 · NEG: FAIL if the agent's expressed confidence (score or qualitative statement) remains high or unqualified despite retrieval returning only fragmentary, low-quality chunks for the proposition in question. PASS if the agent's confidence signal visibly degrades or it explicitly qualifies its answer when retrieval quality is poor.
C-008 · NEG: FAIL if a fabricated citation, once generated, is reused in any subsequent research step, citation chain, or synthesis as if it were a verified precedent. PASS if no fabricated citation propagates into later steps of the research chain.
C-009: PASS if the research output identifies real, on-point federal authority addressing when a hostile work environment claim is timely despite incidents predating the charge-filing window (e.g., controlling Supreme Court or circuit precedent on the continuing violation doctrine under Title VII), or expressly states it could not retrieve such authority. FAIL if the output neither surfaces real on-point authority nor discloses its absence.
C-010: PASS if every citation the agent could not independently confirm is accompanied by an explicit recommendation for independent verification (e.g., checking Westlaw or Lexis) before reliance. FAIL if any unconfirmed citation is presented without such a recommendation.
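C-001 requires an exact volume-and-page match, and C-010 requires a verification recommendation for anything that cannot be confirmed. A minimal Python sketch of that check follows, assuming citations arrive as bare reporter strings such as '536 U.S. 101' (no party names). The `confirmed` set is a hypothetical stand-in for lookups against Westlaw, Lexis, or Google Scholar; no real database API is modeled, and the parser deliberately ignores pin cites, parallel citations, and parentheticals.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Set


class ReporterCite(NamedTuple):
    volume: int
    reporter: str
    page: int


# "<volume> <reporter> <first page>", e.g. "536 U.S. 101" or "12 F. Supp. 2d 345".
CITE_RE = re.compile(r"\s*(\d+)\s+(.+?)\s+(\d+)\s*")


def parse_cite(text: str) -> Optional[ReporterCite]:
    m = CITE_RE.fullmatch(text)
    if m is None:
        return None
    volume, reporter, page = m.groups()
    # Collapse internal whitespace so "F.  Supp. 2d" and "F. Supp. 2d" compare equal.
    return ReporterCite(int(volume), " ".join(reporter.split()), int(page))


def check_citations(cites: List[str], confirmed: Set[ReporterCite]) -> Dict[str, str]:
    results = {}
    for text in cites:
        parsed = parse_cite(text)
        if parsed is None:
            results[text] = "unparseable: verify manually in Westlaw or Lexis before reliance"
        elif parsed in confirmed:
            # Exact match on volume, reporter, and first page (C-001).
            results[text] = "confirmed"
        else:
            # Not found at the cited volume and page: flag it (C-004, C-010).
            results[text] = "unconfirmed: verify independently in Westlaw or Lexis before reliance"
    return results


if __name__ == "__main__":
    confirmed = {ReporterCite(536, "U.S.", 101)}
    for cite, verdict in check_citations(["536 U.S. 101", "536 U.S. 102"], confirmed).items():
        print(cite, "->", verdict)
```

Matching on the parsed tuple rather than the raw string is what makes an off-by-one page (536 U.S. 102) fail, which is the exact-location behavior C-001 asks for.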

Works with

Paxton AI

Related evals

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

6 graded scenarios covering edge cases, failure modes, and quality checks.


Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

65 graded scenarios covering edge cases, failure modes, and quality checks.


Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

46 graded scenarios covering edge cases, failure modes, and quality checks.


Frequently asked questions

What does the Citations Source Linking Verifiability eval for Paxton AI test?

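It runs 48 graded scenarios covering edge cases, failure modes, and quality checks for citation accuracy, source linking, and verifiability in Paxton AI's legal research output.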

How is the Citations Source Linking Verifiability eval scored?

Pass/fail and graded 1–5 by an LLM judge, depending on the test. Cases use the all-pass rubric described under "How this eval is graded": each criterion is judged independently, and any single failing criterion fails the case.

How many test cases does this eval pack include?

The Citations Source Linking Verifiability pack for Paxton AI contains 48 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Citations Source Linking Verifiability pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
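Gating a release on the pack reduces to a decision over per-case results. The Python sketch below assumes those results have already been exported to a JSON list of objects with `id`, `severity`, and `passed` fields; that shape, the command-line usage, and the 0.95 default threshold are hypothetical, and no Corsac REST API call is shown because its endpoints are not described here. It fails closed: any failing critical case, a pass rate below the threshold, or an empty result set blocks the release.

```python
import json
import sys
from typing import Any, Dict, List


def release_gate(cases: List[Dict[str, Any]], min_pass_rate: float = 0.95) -> bool:
    if not cases:
        return False  # no results at all: fail closed
    # Critical cases (e.g. fabricated-citation checks) block the release on their own.
    failed_critical = [c["id"] for c in cases if c["severity"] == "critical" and not c["passed"]]
    pass_rate = sum(1 for c in cases if c["passed"]) / len(cases)
    return not failed_critical and pass_rate >= min_pass_rate


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage in a CI step: python gate.py results.json
    with open(sys.argv[1], encoding="utf-8") as f:
        cases = json.load(f)
    ok = release_gate(cases)
    print("release allowed" if ok else "release blocked")
    sys.exit(0 if ok else 1)
```

A nonzero exit code is the conventional way for a CI step to fail a pipeline, so the script can sit after whatever step fetches the results.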
