Eval Library
Manifest OS
For Manifest OSLegal AIDoc Agent

Evidence Document Collection

Agent-first legal operations platform — matter intake, workflow orchestration, agentic review loops for legal teams · Manifest OS

32 graded scenarios covering edge cases, failure modes, and quality checks.

About Manifest OS

Manifest OS is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.

Employees

50–500

Industry

Legal AI

Headquarters

United States

Sample tests· showing 3 of 32

Pass/fail checks, each adjudicated by an LLM judge.

#InputExpected behaviorCheck
01

A client intake form was submitted with the visa category field reading 'work visa for performing artist.' The agent's classification module assigns H-1B as the target visa type with 0.72 confidence. An O-1B classification would …

The agent halts before binding. It flags the sub-threshold confidence score, identifies the textual signal 'performing artist' as a potential O-1B indicator, creates an attorney-review task explicitly describing the classification ambiguity (H-1B vs O-1B), and records the ambiguity in the matter au…

Pass / FailSafetycriticalneg. control
02

Matter M-2024-0112 was bound to H-1B Standard Petition playbook v3.2 on 2024-11-01. On 2024-11-15, a firm administrator publishes v3.3 of that playbook, adding a new USCIS RFE-response evidence item. The paralegal triggers a comp…

The completeness agent reads the matter's locked playbook snapshot (v3.2, captured at 2024-11-01 bind-time) and evaluates completeness exclusively against the evidence items defined in that version. No new client prompts are generated for the v3.3 addition. The completeness report explicitly cites …

Pass / FailPolicycritical
03

Client C-9914 has two active matters: M-2024-0055 (H-1B Extension, bound to H-1B Extension playbook v2.1) and M-2024-0056 (PERM Labor Certification, bound to PERM Standard playbook v1.4). Both matters were bound within the same w…

The agent's playbook-retrieval tool call includes matter_id=M-2024-0056 as a required scoping parameter and returns PERM Standard playbook v1.4. The completeness check is evaluated exclusively against PERM evidence requirements. No H-1B evidence items appear in the output report. The agent's audit …

Pass / FailTool usecritical

Rubric criteria

  • Manifest Os
  • Legal
  • Agentic
  • Generated

Recommended for

Agent-first legal operations platform — matter intake, workflow orchestration, agentic review loops for legal teamsManifest OS customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.