For Paxton AI · Legal AI · Doc Agent · Correctness · Answer Relevance

Legal Research Q&A Contextual Research With Jurisdictional Coverage

AI legal assistant — research, drafting, contract review, and deep research with linked citations; publishes its own accuracy / citation benchmarks · Paxton AI

47 graded scenarios covering edge cases, failure modes, and quality checks.

About Paxton AI

Paxton AI is a generative-AI legal assistant designed for legal operations. Its product supports legal research, drafting, document analysis, and other attorney workflows.

Industry

Legal Technology / Legal AI

Website

www.paxton.ai

Sample tests · showing 3 of 47

Pass/fail and graded 1–5 by an LLM judge, depending on the test.

#	Input	Expected behavior	Check
01	An agent is tasked with researching service-of-process rules. The query mentions 'the Northern District' (local shorthand used by practitioners in a specific metro area) without specifying state or the court's full name. There …	The system either (a) identifies the ambiguity (there are multiple 'Northern District' federal courts) and prompts the agent to confirm which court is intended before returning any substantive content, or (b) if contextual signals justify a specific interpretation, names the assumed court explici…	Pass / Fail · Policy · critical · neg. control
02	An employment attorney's agent receives a task to generate a client alert on notice obligations for a planned reduction in force. The attorney's client employs approximately 80 full-time workers in California and plans to elimina…	The response clearly distinguishes and separately labels two regimes: (1) California's state WARN Act (Cal. Labor Code §§ 1400–1408 or the current codification), stating the applicable employer-size threshold, covered-employee count threshold, notice period, and recipient list under state law; and…	Pass / Fail · Factuality · critical
03	An attorney's AI agent receives a drafting task containing a legal research subtask. No jurisdiction has been set in session context. The query uses 'our state' without any prior conversation establishing which state is relevant.…	The system refuses to answer the substantive question and emits a clarification prompt asking the user to specify which state's law applies. It does not silently default to federal law, a statistically common state, or any last-accessed jurisdiction. The response makes clear that jurisdiction is a …	Pass / Fail · Policy · critical · neg. control

How this eval is graded

Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.
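The all-pass rule above can be sketched in a few lines of Python. This is a minimal illustration, not the grader's actual implementation: `is_negative` comes from the rubric text, while `CriterionResult`, `verdict`, `failed_criteria`, and `case_passes` are hypothetical names chosen for the example. It assumes the judge has already applied the negative-criterion reading, so a `"PASS"` on a negative criterion means the behavior was not observed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CriterionResult:
    criterion_id: str   # e.g. "C-004"
    is_negative: bool   # True when PASS means the behavior was NOT exhibited
    verdict: str        # "PASS" or "FAIL", as returned by the judge (assumed shape)


def failed_criteria(results: List[CriterionResult]) -> List[str]:
    """Return the IDs of every criterion the judge did not mark PASS."""
    return [r.criterion_id for r in results if r.verdict != "PASS"]


def case_passes(results: List[CriterionResult]) -> bool:
    """All-pass aggregation: a single FAIL fails the whole case.

    Assumption: a case with no graded criteria is treated as not passing.
    """
    return len(results) > 0 and not failed_criteria(results)


graded = [
    CriterionResult("C-001", is_negative=False, verdict="PASS"),
    CriterionResult("C-004", is_negative=True, verdict="PASS"),
    CriterionResult("C-009", is_negative=True, verdict="FAIL"),
]
print(case_passes(graded), failed_criteria(graded))
```

Returning the failed IDs alongside the boolean keeps the case-level verdict strict while still showing which check caused the failure.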

Rubric criteria

Paxton AI
Legal
Agentic
Lab-grade
All-pass

Recommended for

AI legal assistant: research, drafting, contract review, and deep research with linked citations; publishes its own accuracy / citation benchmarks · Paxton AI customers

All-pass criteria: every check must pass

10 checks · sample case

C-001 · PASS if: the response explicitly acknowledges that 'the Northern District' is ambiguous (multiple federal districts bear that name) OR explicitly names one specific assumed court, and this disclosure appears BEFORE any substantive local-rule content. FAIL if: substantive local-rule content appears before any jurisdiction-identification disclosure, or no disclosure is made at all.
C-002 · PASS if: the response either (a) asks the user/agent to confirm which Northern District court is intended before answering substantively, or (b) names the assumed court explicitly in the first sentence of the response and invites confirmation or correction. FAIL if: the response neither asks for clarification nor both names the assumed court in the first sentence and invites correction.
C-003 · PASS if: when the response proceeds under an assumed court rather than asking for clarification, it states or implies a contextual basis for the assumption (e.g., prior conversation cues, user-provided facts), or no assumption is made because clarification was requested. FAIL if: the response proceeds under an assumed court while citing no contextual basis whatsoever for choosing that court over other Northern Districts.
C-004 · NEG · PASS if: the response does NOT present local rules of a single specific Northern District court (e.g., N.D. Cal., N.D. Ill., N.D. Tex.) as the answer without flagging the multi-district ambiguity or naming the assumption. FAIL if: local rules for one specific court are returned as if the intended court were obvious, with no ambiguity flag or named assumption.
C-005 · NEG · PASS if: any content that mixes a specific district's local rules with the Federal Rules of Civil Procedure clearly labels which provisions are local rules and which are FRCP, or no such mixing occurs. FAIL if: the response blends a Northern District court's local rules with FRCP provisions without distinguishing which is which.
C-006 · NEG · PASS if: the response does NOT present generally applicable FRCP electronic-service provisions (e.g., Rule 5 content) under a heading or framing claiming they are 'Northern District' local rules. FAIL if: generic FRCP content is labeled or framed as the local rules of a Northern District court.
C-007 · NEG · PASS if: any jurisdiction assumption or ambiguity flag appears in the primary response text where a downstream agent or attorney reading the answer would see it (e.g., the opening sentence or a prominent leading statement). FAIL if: the assumption or ambiguity flag appears only in a footer, disclaimer block at the end, citation metadata, or other field separate from the main answer body.
C-008 · PASS if: when flagging ambiguity, the response makes clear that more than one federal 'Northern District' court exists (e.g., by saying multiple districts share the name or by listing examples), rather than using vague hedging like 'rules may vary'. FAIL if: the response uses only generic hedging without conveying that 'Northern District' maps to multiple distinct courts.
C-009 · NEG · PASS if: the response does NOT assert specific local-rule numbers, deadlines, or e-service requirements as authoritative for a court whose identity has not been confirmed or explicitly assumed in the response. FAIL if: specific rule citations or requirements are asserted as the governing local rules while the court identity remains unconfirmed and unstated.
C-010 · PASS if: the response, in addition to handling the ambiguity, offers a constructive path forward (e.g., asks which state/court, lists candidate Northern Districts to choose from, or offers to proceed once confirmed). FAIL if: the response refuses or stalls without giving the agent any actionable way to resolve which court is intended.
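One way to picture how a single check from this list might reach the LLM judge is the sketch below. It is an assumption-laden illustration: the field names `match_criteria` and `is_negative` appear in the grading instructions, but the `Criterion` record shape, the `build_judge_prompt` helper, and the prompt wording are hypothetical and not taken from the product.

```python
from typing import TypedDict


class Criterion(TypedDict):
    id: str
    is_negative: bool
    match_criteria: str  # the "PASS if / FAIL if" text for this check


def build_judge_prompt(criterion: Criterion, response: str) -> str:
    """Assemble a per-criterion grading prompt (hypothetical wording)."""
    if criterion["is_negative"]:
        polarity = ("This is a negative criterion: PASS means the response "
                    "did NOT exhibit the described behavior.")
    else:
        polarity = ("This is a positive criterion: PASS means the response "
                    "satisfies the described behavior.")
    return "\n".join([
        f"Criterion {criterion['id']}",
        polarity,
        f"Match criteria: {criterion['match_criteria']}",
        "Response under review:",
        response,
        "Answer with exactly PASS or FAIL.",
    ])


c004: Criterion = {
    "id": "C-004",
    "is_negative": True,
    "match_criteria": (
        "PASS if: the response does NOT present local rules of a single "
        "specific Northern District court as the answer without flagging "
        "the multi-district ambiguity or naming the assumption."
    ),
}
prompt = build_judge_prompt(c004, "Which Northern District do you mean?")
print(prompt)
```

Grading one criterion per prompt mirrors the instruction to judge each criterion independently before the all-pass rule is applied.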

Works with

Paxton AI

Related evals

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

6 graded scenarios covering edge cases, failure modes, and quality checks.

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

65 graded scenarios covering edge cases, failure modes, and quality checks.

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

46 graded scenarios covering edge cases, failure modes, and quality checks.

Frequently asked questions

What does the Legal Research Q&A Contextual Research With Jurisdictional Coverage eval for Paxton AI test?

47 graded scenarios covering edge cases, failure modes, and quality checks.

How is the Legal Research Q&A Contextual Research With Jurisdictional Coverage eval scored?

Pass/fail and graded 1–5 by an LLM judge, depending on the test. The judge rubric: Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.

How many test cases does this eval pack include?

The Legal Research Q&A Contextual Research With Jurisdictional Coverage pack for Paxton AI contains 47 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Legal Research Q&A Contextual Research With Jurisdictional Coverage pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
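A release gate of the kind described here could look roughly like the following sketch. It makes no call to the Corsac REST API, whose endpoints and response format are not documented on this page. It assumes per-case pass/fail results have already been fetched into a plain dictionary, and the names `pass_rate`, `gate`, and `min_pass_rate` are hypothetical.

```python
from typing import Dict


def pass_rate(case_results: Dict[str, bool]) -> float:
    """Fraction of cases that passed; an empty result set counts as 0.0."""
    if not case_results:
        return 0.0
    return sum(case_results.values()) / len(case_results)


def gate(case_results: Dict[str, bool], min_pass_rate: float = 1.0) -> int:
    """Return a process exit code: 0 lets the release through, 1 blocks it."""
    return 0 if pass_rate(case_results) >= min_pass_rate else 1


# Hypothetical per-case results keyed by case number (assumed shape).
sample_results = {"01": True, "02": True, "03": False}
exit_code = gate(sample_results, min_pass_rate=0.9)
print(f"pass rate {pass_rate(sample_results):.2f}, exit code {exit_code}")
# A CI step would end with sys.exit(exit_code) so that a non-zero code fails the job.
```

Treating an empty result set as a failure is a deliberate choice in this sketch, so a run that fetched no results cannot slip through the gate.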
