Workflow Orchestration Agent Loop Control

Agent-first legal operations platform — matter intake, workflow orchestration, agentic review loops for legal teams · Manifest OS

16 graded scenarios covering edge cases, failure modes, and quality checks.

About Manifest OS

Manifest OS provides an AI-native operating model for its law-firm partners, combining a unified brand, centralized back-office services, and software for client communication, legal research, drafting, billing, and reporting.

Industry

Legal Technology / Law-Firm Operating Platform

Website

manifestos.com

Sample tests · showing 3 of 16

Pass/fail checks, each adjudicated by an LLM judge.

#	Input	Expected behavior	Check
01	Matter M-8821 has a petition-drafting task (task-C-draft) blocked on two prerequisites: (A) evidence package upload and (B) intake form completion. Both prerequisites complete on separate workers within a 50ms window. The orchest…	The orchestrator enqueues task-C-draft exactly once. The dispatch event log for M-8821 contains a single DISPATCHED record for task-C-draft. The agent queue contains exactly one message for task-C-draft. The idempotency mechanism (e.g., conditional write on a dispatched flag) prevents a second enqu…	Pass / Fail · Workflow · critical
02	Evidence collection task task-evidence-M5503 for matter M-5503 was marked COMPLETE in the in-memory completion cache at T=0ms. At T=5ms, the worker rolled back the task due to an external evidence-API failure and updated the DB r…	The dispatcher performs a synchronous read of task-evidence-M5503 status from the authoritative database (not the in-memory cache) immediately before evaluating dispatch eligibility. It reads status=FAILED, does not enqueue task-draft-petition-M5503, emits a BLOCKED_ON_FAILURE event for task-draft-…	Pass / Fail · Policy · critical
03	Matter M-7712 requires an H-1B petition draft. The task type key is ai_drafting:petition:H1B. A capability registry update pushed 2 hours ago reassigned that key from the AI Case Evaluator endpoint to the AI Drafter endpoint. The…	Before enqueuing, the dispatcher calls the live agent capability registry to resolve the current canonical agent endpoint for task type ai_drafting:petition:H1B. It receives the AI Drafter endpoint, confirms it differs from the cached entry, invalidates the stale cache entry, and dispatches to the …	Pass / Fail · Tool use · critical
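
The expected behaviors in samples 01 and 02 can be illustrated with a minimal sketch. It is not Manifest OS's implementation: the `tasks` table schema, the function names, and the in-process lists standing in for the agent queue and the dispatch event log are all assumptions made for illustration. The sketch reads prerequisite status from the database rather than a cache, then uses a conditional write on a `dispatched` flag so that only one caller can enqueue the task.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory task table (schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks ("
        "task_id TEXT PRIMARY KEY, "
        "status TEXT NOT NULL, "
        "dispatched INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def try_dispatch(conn, task_id, prerequisite_ids, queue, events):
    """Enqueue task_id at most once, and only if every prerequisite is COMPLETE."""
    # Sample 02: read prerequisite status from the authoritative store
    # immediately before deciding, never from an in-memory cache.
    placeholders = ",".join("?" for _ in prerequisite_ids)
    rows = conn.execute(
        f"SELECT status FROM tasks WHERE task_id IN ({placeholders})",
        list(prerequisite_ids),
    ).fetchall()
    statuses = [row[0] for row in rows]

    if "FAILED" in statuses:
        events.append(("BLOCKED_ON_FAILURE", task_id))
        return False
    if len(statuses) != len(prerequisite_ids) or any(s != "COMPLETE" for s in statuses):
        return False  # a prerequisite is missing or still in progress

    # Sample 01: conditional write. The UPDATE matches a row only while
    # dispatched = 0, so a second caller sees rowcount == 0 and backs off.
    cur = conn.execute(
        "UPDATE tasks SET dispatched = 1 WHERE task_id = ? AND dispatched = 0",
        (task_id,),
    )
    conn.commit()
    if cur.rowcount != 1:
        return False

    queue.append(task_id)
    events.append(("DISPATCHED", task_id))
    return True
```

Calling `try_dispatch` twice for the same task, as two workers might after completing prerequisites A and B, enqueues it once; the second call returns `False` because the conditional update matches no row.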

How this eval is graded

Pass/fail checks, each adjudicated by an LLM judge.

Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.

Pass threshold: a criterion passes at a judge score of 4 or higher.
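
The threshold rule can be sketched in a few lines. This is an illustration only: the `JudgeResult` type and the `passes` and `summarize` helpers are hypothetical names, not part of any documented API, and the judge score is assumed to arrive as an integer from 1 to 5.

```python
from dataclasses import dataclass

PASS_THRESHOLD = 4  # from the rubric: a judge score of 4 or higher passes


@dataclass
class JudgeResult:
    case_id: str
    score: int  # judge score, assumed to be an integer in 1..5


def passes(result):
    """Return True when the judge score meets the pass threshold."""
    if not 1 <= result.score <= 5:
        raise ValueError(f"score out of range for case {result.case_id}: {result.score}")
    return result.score >= PASS_THRESHOLD


def summarize(results):
    """Return (number of passing cases, total number of cases)."""
    passed = sum(1 for r in results if passes(r))
    return passed, len(results)
```

For example, `summarize` over three cases scored 5, 4, and 3 counts two passes out of three.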

Rubric criteria

Manifest OS
Legal
Agentic
Generated

Recommended for

Agent-first legal operations platform (matter intake, workflow orchestration, agentic review loops for legal teams) · Manifest OS customers

Works with

Manifest OS

Related evals

Legal AI

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

6 graded scenarios covering edge cases, failure modes, and quality checks.

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

65 graded scenarios covering edge cases, failure modes, and quality checks.

Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)

46 graded scenarios covering edge cases, failure modes, and quality checks.

Frequently asked questions

What does the Workflow Orchestration Agent Loop Control eval for Manifest OS test?

16 graded scenarios covering edge cases, failure modes, and quality checks.

How is the Workflow Orchestration Agent Loop Control eval scored?

Pass/fail checks, each adjudicated by an LLM judge. The judge rubric: Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain. A criterion passes at a judge score of 4 or higher.

How many test cases does this eval pack include?

The Workflow Orchestration Agent Loop Control pack for Manifest OS contains 16 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Workflow Orchestration Agent Loop Control pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
