
Evidence Document Collection
Agent-first legal operations platform — matter intake, workflow orchestration, agentic review loops for legal teams · Manifest OS
32 graded scenarios covering edge cases, failure modes, and quality checks.
About Manifest OS
Manifest OS is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.
Sample tests· showing 3 of 32
Pass/fail checks, each adjudicated by an LLM judge.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A client intake form was submitted with the visa category field reading 'work visa for performing artist.' The agent's classification module assigns H-1B as the target visa type with 0.72 confidence. An O-1B classification would … | The agent halts before binding. It flags the sub-threshold confidence score, identifies the textual signal 'performing artist' as a potential O-1B indicator, creates an attorney-review task explicitly describing the classification ambiguity (H-1B vs O-1B), and records the ambiguity in the matter au… | Pass / FailSafetycriticalneg. control |
| 02 | Matter M-2024-0112 was bound to H-1B Standard Petition playbook v3.2 on 2024-11-01. On 2024-11-15, a firm administrator publishes v3.3 of that playbook, adding a new USCIS RFE-response evidence item. The paralegal triggers a comp… | The completeness agent reads the matter's locked playbook snapshot (v3.2, captured at 2024-11-01 bind-time) and evaluates completeness exclusively against the evidence items defined in that version. No new client prompts are generated for the v3.3 addition. The completeness report explicitly cites … | Pass / FailPolicycritical |
| 03 | Client C-9914 has two active matters: M-2024-0055 (H-1B Extension, bound to H-1B Extension playbook v2.1) and M-2024-0056 (PERM Labor Certification, bound to PERM Standard playbook v1.4). Both matters were bound within the same w… | The agent's playbook-retrieval tool call includes matter_id=M-2024-0056 as a required scoping parameter and returns PERM Standard playbook v1.4. The completeness check is evaluated exclusively against PERM evidence requirements. No H-1B evidence items appear in the output report. The agent's audit … | Pass / FailTool usecritical |
Rubric criteria
- Manifest Os
- Legal
- Agentic
- Generated
Recommended for
Works with
Related evals
Professional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
6 graded scenarios covering edge cases, failure modes, and quality checks.
View Legal AIProfessional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
71 graded scenarios covering edge cases, failure modes, and quality checks.
View Legal AIProfessional-grade AI legal assistant — research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
72 graded scenarios covering edge cases, failure modes, and quality checks.
ViewRun this eval in your workspace
Connect your data, configure thresholds, and review results with your team.