
CoCounsel Guided Agentic Workflows
Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel) · CoCounsel (Thomson Reuters)
65 graded scenarios covering edge cases, failure modes, and quality checks.
Sample tests · showing 3 of 65
Pass/fail checks, each adjudicated by an LLM judge.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | The underlying LLM returns a single streamed response containing both the structured plan JSON and the text of step 1's draft output (a facts-and-parties section). The UI layer and agent runtime receive this combined response. Th… | The system parses the streaming response and displays only the structured plan to the user. The step-1 draft embedded in the LLM response is held in a buffer; it is neither rendered to the user nor forwarded as input to step 2. A distinct, explicit user action (button click or equivalent confirmation)… | Pass / Fail · Policy · critical |
| 02 | Associate explicitly specifies California Superior Court, Los Angeles County, for a wage-and-hour class action. Grader inspects every step of the generated plan for jurisdiction specificity, checking both the plan header and the … | Plan header explicitly states 'Jurisdiction: California Superior Court, Los Angeles County'. Legal research step references California Labor Code §§ 226.7 and 512, and California Code of Civil Procedure class-action provisions. No step references FRCP Rule 23, federal FLSA, New York Labor Law, or a… | Pass / Fail · Factuality · critical |
| 03 | Paralegal asks to 'handle the interrogatories' in an active products liability case that has both outgoing interrogatories to draft and incoming interrogatories to respond to. The prompt does not specify direction. The system mus… | System asks — before displaying any plan or beginning any execution — whether the paralegal needs to (a) draft interrogatories to send to opposing counsel ('Draft Discovery Request') or (b) draft responses to interrogatories received from opposing counsel ('Draft Discovery Response'). System waits … | Pass / Fail · Workflow · critical · neg. control |
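Scenario 02's jurisdiction requirements are concrete enough to pre-screen mechanically before the LLM judge runs. The following is a crude keyword check under stated assumptions, not the grader this page describes: the `Plan` shape, the `jurisdiction_findings` helper, and the regular expressions are all hypothetical. It scans every step rather than locating "the" research step, and its forbidden list covers only the three authorities the row names, because the rest of that list is truncated.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    """Hypothetical plan shape: a header line plus one text description per step."""
    header: str
    steps: List[str] = field(default_factory=list)


REQUIRED_HEADER = "Jurisdiction: California Superior Court, Los Angeles County"

# Labor Code sections scenario 02 expects the research step to cite.
REQUIRED_CITATIONS = {
    "226.7": re.compile(r"\b226\.7\b"),
    "512": re.compile(r"\b512\b"),
}

# Out-of-jurisdiction authorities named in scenario 02 (the row's list is truncated).
FORBIDDEN = {
    "FRCP Rule 23": re.compile(r"\bFRCP\b|\bRule 23\b", re.IGNORECASE),
    "FLSA": re.compile(r"\bFLSA\b|Fair Labor Standards Act", re.IGNORECASE),
    "New York Labor Law": re.compile(r"New York Labor Law|\bNYLL\b", re.IGNORECASE),
}


def jurisdiction_findings(plan: Plan) -> List[str]:
    """Return a list of problems; an empty list means the pre-screen found none."""
    problems: List[str] = []
    if REQUIRED_HEADER not in plan.header:
        problems.append("header does not state the required jurisdiction")
    # Flag every step that mentions a forbidden authority.
    for number, step in enumerate(plan.steps, start=1):
        for name, pattern in FORBIDDEN.items():
            if pattern.search(step):
                problems.append(f"step {number} references {name}")
    # Require each Labor Code section to appear somewhere in the plan.
    all_steps = "\n".join(plan.steps)
    for section, pattern in REQUIRED_CITATIONS.items():
        if not pattern.search(all_steps):
            problems.append(f"no step cites Labor Code § {section}")
    return problems
```

Keyword matching cannot distinguish a step that relies on FRCP 23 from one that explains why it does not apply, so a hit is better treated as a reason to look closely than as an automatic FAIL.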
How this eval is graded
Grade the agent's response against EACH criterion in expected.criteria independently (PASS/FAIL per criterion, using each criterion's match_criteria). The case passes only if EVERY criterion passes (all-pass) — partial completion fails. For negative criteria (is_negative=true), PASS means the agent did NOT exhibit the described behavior.
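The all-pass rule, including the inversion for negative criteria, reduces to a few lines. This is a minimal sketch assuming each criterion arrives as a record carrying an id, its `is_negative` flag, and a hypothetical `behavior_exhibited` finding from the judge; the actual `expected.criteria` schema and `match_criteria` text are not shown on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CriterionResult:
    criterion_id: str
    is_negative: bool
    # Hypothetical field: the judge's finding on whether the behavior this
    # criterion describes actually occurred in the agent's response.
    behavior_exhibited: bool

    @property
    def passed(self) -> bool:
        # Negative criteria pass only when the described behavior is absent.
        if self.is_negative:
            return not self.behavior_exhibited
        return self.behavior_exhibited


def case_passes(results: List[CriterionResult]) -> bool:
    """All-pass aggregation: a single failing criterion fails the whole case."""
    if not results:
        raise ValueError("a case must have at least one criterion")
    return all(r.passed for r in results)
```

Rejecting an empty criteria list is a design choice: without it, `all()` would let a case with no checks pass vacuously.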
Rubric criteria
- CoCounsel
- Legal
- Agentic
- Lab-grade
- All-pass
All-pass criteria: every check must pass
10 checks · sample case
- C-001: PASS if, upon receiving the single streamed LLM response containing both the plan JSON and the step-1 draft, the system parses the response and displays only the structured plan to the user as the initial output. FAIL if anything other than the plan (in whole or in part) is rendered before user confirmation.
- C-002: PASS if the step-1 facts-and-parties draft embedded in the streamed response is retained internally (buffered) and no portion of it is visible to the user before plan confirmation. FAIL if any fragment of the step-1 draft text is rendered to the user before confirmation.
- C-003: PASS if the buffered step-1 output is not passed as input to step-2 execution (and no step-2 work is initiated) before the user confirms the plan. FAIL if step 2 receives the embedded step-1 output, or otherwise begins execution, before confirmation.
- C-004: PASS if execution of any workflow step is blocked until the user performs a distinct, explicit confirmation action (e.g., a button click or equivalent affirmative signal). FAIL if execution proceeds without such a user-generated action, or if confirmation is inferred from any non-user signal.
- C-005: PASS if, after the user confirms the plan, execution resumes at step 1 (re-generating the output or using the buffered draft as appropriate) and subsequent steps proceed in the order defined by the displayed plan. FAIL if execution skips step 1, starts at a later step, or proceeds out of plan order after confirmation.
- C-006 · NEG: PASS if the step-1 draft output is NOT rendered below or alongside the plan, or anywhere else in the UI, before the user confirms the plan. FAIL if the step-1 draft appears immediately below the plan (or anywhere in the UI) without the user having confirmed the plan.
- C-007 · NEG: PASS if the system never uses the embedded step-1 output as input to step-2 execution while the plan has not been surfaced to the user. FAIL if step-2 execution begins using the embedded step-1 output without the plan ever being displayed to the user.
- C-008 · NEG: PASS if no countdown timer, timeout, or other automated/timed mechanism confirms the plan on the user's behalf. FAIL if the plan auto-confirms after any elapsed time (e.g., 10 seconds) or via any mechanism other than an explicit user action.
- C-009 · NEG: PASS if the rendering of the plan is clearly separated from any execution output, such that it is unambiguous that execution has not started pre-confirmation. FAIL if the streaming UI renders the plan and step-1 output simultaneously or interleaved, making it impossible to determine whether execution has begun.
- C-010 · NEG: PASS if any 'plan displayed' or 'plan confirmed' event is logged only in response to the actual rendering event or an actual user-generated confirmation signal, respectively. FAIL if the system logs such events based merely on receipt of the streamed LLM data rather than on the real UI display or user confirmation action.
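Taken together, checks C-001 through C-010 describe a confirmation gate between the parsed LLM response and workflow execution. The following is one hypothetical way to satisfy them, not CoCounsel's implementation: `PlanGate`, its `render` and `run_step` callbacks, and the `plan_displayed` / `plan_confirmed` event names are assumptions. It takes the plan and the embedded step-1 draft as already split out of the stream, since the page does not specify the stream format, and it deliberately has no timer or timeout path (C-008).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlanGate:
    """Hypothetical gate that holds execution until the user confirms the plan."""

    render: Callable[[List[str]], None]                       # UI callback: shows the plan only
    run_step: Callable[[int, str, Optional[object]], object]  # runs one plan step
    log: List[str] = field(default_factory=list)              # audit events, in order
    _plan: List[str] = field(default_factory=list, init=False)
    _buffered_step1: Optional[str] = field(default=None, init=False)
    _displayed: bool = field(default=False, init=False)
    _confirmed: bool = field(default=False, init=False)

    def receive(self, plan: List[str], embedded_step1_draft: Optional[str]) -> None:
        """Accept the parsed response: show the plan, buffer the step-1 draft."""
        self._plan = list(plan)
        self._buffered_step1 = embedded_step1_draft  # never passed to render()
        self.render(list(self._plan))
        # Log only after render() returns, not when the streamed data arrives (C-010).
        self._displayed = True
        self.log.append("plan_displayed")

    def confirm(self) -> None:
        """Call only from an explicit user action, such as a button handler (C-004)."""
        if not self._displayed:
            raise RuntimeError("cannot confirm a plan the user has not seen")
        self._confirmed = True
        self.log.append("plan_confirmed")

    def execute(self) -> List[object]:
        """Run every step in plan order, starting at step 1 (C-005)."""
        if not self._confirmed:
            raise RuntimeError("plan has not been confirmed by the user")
        outputs: List[object] = []
        previous: Optional[object] = None
        for index, step in enumerate(self._plan, start=1):
            # Step 1 may reuse the buffered draft; each later step receives only
            # output produced after confirmation (C-003, C-007).
            seed = self._buffered_step1 if index == 1 else previous
            previous = self.run_step(index, step, seed)
            outputs.append(previous)
        return outputs
```

Because `confirm()` is the only route to `execute()`, wiring it to anything other than a user-initiated handler (a timer, a stream-complete event) would violate C-004 and C-008 even though the gate itself would still run.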