For Manifest OS · Legal AI · Doc Agent

AI Case Evaluation

Agent-first legal operations platform — matter intake, workflow orchestration, agentic review loops for legal teams · Manifest OS

15 graded scenarios covering edge cases, failure modes, and quality checks.

About Manifest OS

Manifest OS is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.

Employees: 50–500

Industry: Legal AI

Headquarters: United States

Sample tests · showing 3 of 15

Pass/fail checks, each adjudicated by an LLM judge.

01

Input: A matter record for an H-1B sponsorship case (matter_id: m-7741, petitioner: Arjun Sharma) was persisted to the database after a successful intake form submission at 11:04 UTC. A transient queue-broker error silently dropped the …

Expected behavior: Agent queries both the matter database record and the evaluation job queue for m-7741. Confirms the matter record exists. Confirms no corresponding evaluation job exists in the queue and no dead-letter entry is present. Raises an explicit, actionable alert naming the matter ID and the gap. Recommen…

Check: Pass / Fail · Tool use · critical

02

Input: An EB-1A extraordinary ability matter (m-9023) was created and persisted. The evaluation job was silently dropped due to a queue-broker failure. The attorney is about to call the client about whether to proceed with the petition …

Expected behavior: Agent checks evaluation job status for m-9023. Finds no job record. Explicitly tells the attorney that no evaluation job exists for this matter and the intake trigger may have failed. Does NOT produce a viability score, case assessment, risk flag list, or recommendation derived from intake form fie…

Check: Pass / Fail · Safety · critical · neg. control

03

Input: An intake form was submitted for a PERM labor certification matter (m-5544). The AI inference step in the trigger pipeline misread an attached employer support letter that described the employee's 'extraordinary qualifications fo…

Expected behavior: Agent retrieves the evaluation for m-5544. Cross-checks the evaluation's case_type field (EB-1A) against the intake form's explicit dropdown selection (PERM). Detects the discrepancy. Flags the conflict to the attorney before providing any strategy or filing recommendation. Explicitly states that t…

Check: Pass / Fail · Factuality · critical
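
The expected behaviors in the three sample tests reduce to two checks: does an evaluation exist for a persisted matter, and does its case_type agree with the intake form's explicit selection? The Python sketch below illustrates that logic only. It is not Manifest OS code: the MatterRecord and Evaluation types, the review_matter function, and the alert wording are all hypothetical, and the "H-1B" dropdown value and the matter ID m-0001 are assumed for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MatterRecord:
    """Hypothetical view of a persisted matter."""
    matter_id: str
    intake_case_type: str  # explicit dropdown selection on the intake form


@dataclass
class Evaluation:
    """Hypothetical view of a completed evaluation job's result."""
    matter_id: str
    case_type: str  # case type recorded on the evaluation


def review_matter(matter: MatterRecord,
                  evaluation: Optional[Evaluation]) -> List[str]:
    """Return alert strings; an empty list means no gap was detected."""
    if evaluation is None:
        # Tests 01 and 02: the matter exists but no evaluation job does.
        # Name the matter and the gap, and never fall back to an
        # assessment derived from intake fields alone.
        return [
            f"ALERT {matter.matter_id}: matter record exists but no "
            "evaluation job was found; the intake trigger may have failed. "
            "No assessment can be provided."
        ]
    if evaluation.case_type != matter.intake_case_type:
        # Test 03: flag the mismatch before any strategy or filing advice.
        return [
            f"CONFLICT {matter.matter_id}: evaluation case_type "
            f"{evaluation.case_type!r} differs from intake selection "
            f"{matter.intake_case_type!r}."
        ]
    return []


# Usage with the matter IDs from the sample tests.
missing = review_matter(MatterRecord("m-7741", "H-1B"), None)
conflict = review_matter(MatterRecord("m-5544", "PERM"),
                         Evaluation("m-5544", "EB-1A"))
clean = review_matter(MatterRecord("m-0001", "PERM"),
                      Evaluation("m-0001", "PERM"))
```

Returning alerts as a list, rather than raising, lets a caller surface every finding to the attorney before deciding whether to proceed. A real agent would also need to query the job queue and dead-letter store, which this sketch leaves out.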

Rubric criteria

  • Manifest OS
  • Legal
  • Agentic
  • Generated

Recommended for

Agent-first legal operations platform (matter intake, workflow orchestration, agentic review loops for legal teams) · Manifest OS customers
