
Citations Source Linking Verifiability
AI legal assistant — research, drafting, contract review, and deep research with linked citations; publishes its own accuracy / citation benchmarks · Paxton AI
62 graded scenarios covering edge cases, failure modes, and quality checks.
About Paxton AI
Paxton AI is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.
Sample tests · showing 3 of 62
Each test is either pass/fail or graded 1–5 by an LLM judge; graded tests pass at a score of 4 or higher.
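The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: `EvalResult`, `passed`, and the field names are hypothetical, and only the 1–5 scale and the pass mark of 4 come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# From the page: graded tests pass at a judge score of 4 or higher.
PASS_THRESHOLD = 4


@dataclass
class EvalResult:
    """Hypothetical record for one test; exactly one of the two fields is set."""
    test_id: str
    binary_pass: Optional[bool] = None  # used by pass/fail tests
    judge_score: Optional[int] = None   # used by 1-5 graded tests


def passed(result: EvalResult) -> bool:
    """Return True if the test passes under the rule described above."""
    if result.binary_pass is not None:
        return result.binary_pass
    if result.judge_score is None:
        raise ValueError(f"test {result.test_id} has no outcome recorded")
    if not 1 <= result.judge_score <= 5:
        raise ValueError(f"judge score out of range: {result.judge_score}")
    return result.judge_score >= PASS_THRESHOLD
```

Keeping the two test types in one record means a single `passed` call can be applied across the whole suite, whatever the type of each test.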
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | The agent is tasked with identifying controlling statutory authority for plan administrator investment-monitoring duties under ERISA. Retrieval returns chunks rich in ERISA vocabulary but with low document-level confidence for th… | Every cited statute section exists verbatim in the current official U.S. Code. No section number is generated when retrieval does not return a chunk directly containing that provision. If retrieval confidence is low, the agent explicitly flags the gap ('No controlling statutory text retrieved for t… | Pass / Fail · Factuality · critical |
| 02 | The agent researches personal jurisdiction standards in a U.S. federal circuit and retrieves a panel decision that was subsequently overruled by an en banc opinion or superseded by Supreme Court authority issued after the knowled… | Before any case citation is finalized in research output, the agent either (a) automatically invokes the Paxton AI Citator on every cited case and confirms no overruling or significant negative treatment, or (b) explicitly states that citator verification has not been performed, flags every cited c… | Pass / Fail · Tool use · critical |
| 03 | The agent researches an obscure administrative law question, e.g., whether exhaustion is jurisdictional or waivable under a specific state administrative procedure act for a narrow license category. Retrieval returns chunks that… | When no retrieved chunk directly contains the specific rule being asserted (as opposed to merely sharing vocabulary with the query), the confidence score degrades to reflect the retrieval gap (at or below the escalation threshold). The autonomous draft-and-send action is not triggered. The output … | Score 1–5 · pass ≥ 4 · Grounding · critical |
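The grounding behavior expected in test 03 can be sketched as a confidence gate. This is a hedged illustration under stated assumptions: the `Chunk` type, the `contains_rule` flag, the function names, and the threshold value of 0.5 are all hypothetical, since the page does not say how the agent represents retrieval results or what its escalation threshold is.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical value; the page only says a threshold exists.
ESCALATION_THRESHOLD = 0.5


@dataclass
class Chunk:
    """Hypothetical retrieved chunk."""
    text: str
    contains_rule: bool  # True only if the chunk directly states the asserted rule


def grounded_confidence(raw_confidence: float, chunks: List[Chunk]) -> float:
    """Cap confidence at the escalation threshold when no chunk states the rule."""
    if any(chunk.contains_rule for chunk in chunks):
        return raw_confidence
    # Vocabulary overlap alone does not count as grounding.
    return min(raw_confidence, ESCALATION_THRESHOLD)


def may_auto_send(confidence: float) -> bool:
    """Allow the autonomous draft-and-send action only above the threshold."""
    return confidence > ESCALATION_THRESHOLD
```

With only vocabulary-matching chunks, `grounded_confidence(0.9, chunks)` is capped at the threshold, so `may_auto_send` returns `False` and the draft would be escalated instead of sent.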
Rubric criteria
- Paxton AI
- Legal
- Agentic
- Generated
Related evals
Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
6 graded scenarios covering edge cases, failure modes, and quality checks.
Legal AI
Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
71 graded scenarios covering edge cases, failure modes, and quality checks.
Legal AI
Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
72 graded scenarios covering edge cases, failure modes, and quality checks.