
vLex Multi-Jurisdictional Research: 50-State Survey (Compare Jurisdictions)
Global legal research platform with the Vincent AI agent — multi-step research, document analysis, drafting, and workflow building over a 1B+ document global law collection · vLex (Vincent AI)
32 graded scenarios covering edge cases, failure modes, and quality checks.
Sample tests · showing 3 of 32
Pass/fail checks, each adjudicated by an LLM judge.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An attorney submits a single natural-language query containing two orthogonal legal sub-questions (statute of limitations and damage caps for medical malpractice), each anchoring different substantive doctrine and requiring inde… | The parsing layer identifies two distinct legal propositions: (1) the applicable statute of limitations for medical malpractice claims, a procedural question driven by state civil-practice statutes and tolling rules, and (2) statutory damage caps on medical malpractice recoveries, a substantive … | Pass / Fail · Workflow · critical |
| 02 | An agent executing a multi-step agentic workflow issues a 'clear all' action to reset a prior jurisdiction selection before re-selecting for a new matter. Due to a sequencing error, the agent triggers the 'Run Analysis' action wh… | The 'Run Analysis' button is visibly disabled when zero jurisdictions are selected, or a blocking inline validation fires on click with a message that explicitly names the minimum requirement (e.g., 'Please select at least 2 jurisdictions to compare'). The analysis engine is never invoked. The agen… | Pass / Fail · Workflow · high · neg. control |
| 03 | A transactional attorney uploads a contract and instructs the agent to 'compare in the states most relevant to this contract.' The contract names Delaware (state of incorporation, §12), California (governing law, §14.2), and New … | Before touching the Compare Jurisdictions selector, the agent outputs a pre-execution disclosure that names each specific jurisdiction it intends to select and the clause or rationale supporting each choice, for example: 'I plan to select Delaware [§12, state of incorporation], California [§14.2, … | Pass / Fail · Policy · critical · neg. control |
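Scenarios 02 and 03 both describe guards that should fire before the comparison engine runs. The following is a hypothetical sketch of those two checks, not vLex's implementation: `MIN_JURISDICTIONS`, `Selection`, `validate_run`, and `render_disclosure` are invented names, and the minimum of two jurisdictions is taken only from the example validation message in scenario 02.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical minimum, taken from the example validation message in scenario 02.
MIN_JURISDICTIONS = 2


@dataclass(frozen=True)
class Selection:
    """One jurisdiction the agent plans to select, with the reason for selecting it."""

    jurisdiction: str
    clause: str     # contract clause cited as support, e.g. "§12"
    rationale: str  # why the clause makes this jurisdiction relevant


def validate_run(selected: list[str]) -> str | None:
    """Return a blocking validation message, or None if the analysis may run.

    Scenario 02: after a 'clear all', zero jurisdictions are selected, so the
    comparison engine must not be invoked and the message must name the minimum.
    """
    if len(selected) < MIN_JURISDICTIONS:
        return f"Please select at least {MIN_JURISDICTIONS} jurisdictions to compare"
    return None


def render_disclosure(plan: list[Selection]) -> str:
    """Scenario 03: name every planned jurisdiction and its supporting clause up front."""
    items = [f"{s.jurisdiction} [{s.clause}, {s.rationale}]" for s in plan]
    return "I plan to select " + ", ".join(items) + "."


if __name__ == "__main__":
    # Blocked: nothing is selected after 'clear all'.
    print(validate_run([]))  # → Please select at least 2 jurisdictions to compare
    plan = [
        Selection("Delaware", "§12", "state of incorporation"),
        Selection("California", "§14.2", "governing law"),
    ]
    # The disclosure is emitted before any call to the selector.
    print(render_disclosure(plan))
```

Returning the message instead of raising lets the same check drive both behaviors scenario 02 accepts: a disabled button or an inline validation on click.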
How this eval is graded
Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.
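The all-pass rule and the inverted meaning of negative criteria fit in a few lines. This is a minimal sketch assuming each judge verdict records whether the described behavior was observed; `CriterionVerdict`, `exhibited`, `criterion_passes`, and `grade_case` are hypothetical names, and only `is_negative` and the all-pass semantics come from the grading description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionVerdict:
    criterion_id: str
    is_negative: bool
    exhibited: bool  # hypothetical: did the judge observe the described behavior?


def criterion_passes(v: CriterionVerdict) -> bool:
    # Positive criteria pass when the described behavior is present;
    # negative criteria pass when the described (undesired) behavior is absent.
    return v.exhibited != v.is_negative


def grade_case(verdicts: list[CriterionVerdict]) -> tuple[bool, list[str]]:
    """All-pass aggregation: the case passes only if every criterion passes."""
    failed = [v.criterion_id for v in verdicts if not criterion_passes(v)]
    return (not failed, failed)


if __name__ == "__main__":
    verdicts = [
        CriterionVerdict("C-001", is_negative=False, exhibited=True),
        CriterionVerdict("C-005", is_negative=True, exhibited=True),
    ]
    print(grade_case(verdicts))  # → (False, ['C-005'])
```

An empty verdict list passes vacuously in this sketch; a stricter harness might treat it as a configuration error.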
Rubric criteria
- Vlex
- Legal
- Agentic
- Lab-grade
- All-pass
All-pass criteria: every check must pass
10 checks · sample case
- C-001: PASS if the system identifies the compound query as containing two orthogonal sub-questions, namely (1) statute of limitations (SOL) for medical malpractice and (2) statutory damage caps on medical malpractice recoveries, and processes each as an independent sub-survey with its own retrieval scope. FAIL if the query is treated as a single undifferentiated retrieval unit.
- C-002: PASS if the final output contains two clearly labeled, structurally separated answer sets (e.g., two tables or two memo sections), one exclusively covering statute-of-limitations doctrine and one exclusively covering damage caps. FAIL if the output presents a single merged structure without per-sub-question separation.
- C-003: PASS if the output makes the two-way split of the original query apparent to the attorney (e.g., explicit section labels or a statement that the query was parsed into two sub-surveys), so the attorney can see how the compound query was interpreted. FAIL if the parsing decision is invisible and the attorney has no way to tell how the query was decomposed.
- C-004: PASS if, for at least three sampled states, every cell in the statute-of-limitations section answers only the SOL question and every cell in the damage-caps section answers only the damage-cap question. FAIL if any sampled cell answers the wrong sub-question for its section.
- C-005 · NEG: PASS if no individual state's answer cell conflates statute-of-limitations rules and damage-cap rules in a single undifferentiated answer. FAIL if any per-state cell mixes SOL and damage-cap content with no structural separation, such that an SOL figure could be read where a cap figure belongs.
- C-006 · NEG: PASS if every jurisdiction's entry addresses both sub-questions in a consistent, predictable structure. FAIL if states inconsistently answer only one sub-question or the other with no predictable pattern across jurisdictions, which indicates that a single query string was dispatched without decomposition.
- C-007 · NEG: PASS if both the statute-of-limitations sub-question and the damage-caps sub-question are present in the output. FAIL if either sub-question is silently absent, producing a clean-looking but incomplete 50-state survey with no acknowledgment of the omission.
- C-008 · NEG: PASS if drafted memo or narrative text, where present, attributes statute-of-limitations values and damage-cap values to the correct sub-question for each jurisdiction discussed. FAIL if drafted text asserts a specific SOL value in any jurisdiction where the underlying data is actually a damage-cap figure (or vice versa), inverting legal conclusions.
- C-009 · NEG: PASS if the system either handles the compound query itself or, if it asks the attorney to clarify or re-submit, provides concrete guidance on how to rephrase (e.g., suggesting the two separate sub-queries). FAIL if the system refuses to process the query and asks for two separate submissions without offering any rephrasing guidance.
- C-010: PASS if the statute-of-limitations section and the damage-caps section each cover the full set of requested jurisdictions (all states), with no state present in one section but missing from the other. FAIL if either section omits jurisdictions that appear in the other section.
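Some of these checks are purely structural and can be screened mechanically before the LLM judge reads the text. The sketch below assumes a hypothetical output shape (one mapping per sub-survey, keyed by jurisdiction); `SECTIONS`, `Survey`, and `structural_problems` are invented names, and the section keys are placeholders rather than anything vLex emits. It covers only the mechanical parts of C-007 and C-010; whether a cell answers the right sub-question (C-004, C-005, C-008) still needs the judge.

```python
from typing import Dict, List, Set

# Hypothetical section keys for the two sub-surveys.
SECTIONS = ("statute_of_limitations", "damage_caps")

# Hypothetical output shape: section key -> {jurisdiction: answer text}.
Survey = Dict[str, Dict[str, str]]


def structural_problems(survey: Survey, requested: Set[str]) -> List[str]:
    """Return structural problems found in a survey; an empty list means none.

    C-007: both sub-question sections must be present.
    C-010: each section must cover every requested jurisdiction, and no
    jurisdiction may appear in only one of the two sections.
    """
    problems: List[str] = []
    for section in SECTIONS:
        if section not in survey:
            problems.append(f"missing section: {section}")
            continue
        missing = sorted(requested - set(survey[section]))
        if missing:
            problems.append(f"{section} omits: {', '.join(missing)}")

    # Cross-section parity, checked only when both sections exist.
    if all(section in survey for section in SECTIONS):
        first, second = (set(survey[section]) for section in SECTIONS)
        if first != second:
            only_one = ", ".join(sorted(first ^ second))
            problems.append(f"sections cover different jurisdictions: {only_one}")
    return problems


if __name__ == "__main__":
    requested = {"CA", "NY", "TX"}  # illustrative subset, not a real 50-state request
    survey = {
        # Placeholder cells; no real SOL or cap values are implied.
        "statute_of_limitations": {"CA": "...", "NY": "...", "TX": "..."},
        "damage_caps": {"CA": "...", "NY": "..."},
    }
    print(structural_problems(survey, requested))
    # → ['damage_caps omits: TX', 'sections cover different jurisdictions: TX']
```

The parity check partly overlaps the per-section coverage check, but it also catches a jurisdiction that appears in one section without having been requested at all.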
Related evals
- Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel). 6 graded scenarios covering edge cases, failure modes, and quality checks.
- Legal AI · Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel). 65 graded scenarios covering edge cases, failure modes, and quality checks.
- Legal AI · Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel). 46 graded scenarios covering edge cases, failure modes, and quality checks.
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.