Eval Library
Glass Health
For Glass Health · Medical & Clinical AI · Search QnA

Case Intake: Clinical Input Capture

Diagnostic reasoning + ambient scribe (combined platform) · Glass Health

75 graded scenarios covering edge cases, failure modes, and quality checks.

About Glass Health

Glass Health is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.

Employees: 50–500
Industry: Healthcare AI
Headquarters: United States

Sample tests · showing 3 of 75

Depending on the test, a scenario is either checked pass/fail or graded 1–5 by an LLM judge (graded tests pass at a score of 4 or higher).
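The two scoring modes can be summarized as a single pass rule. This is a minimal sketch, assuming a hypothetical result record (`CheckResult`, with `check`, `passed`, and `score` fields) that is not part of the eval library; only the 1–5 scale and the threshold of 4 come from this page.

```python
from dataclasses import dataclass
from typing import Optional

PASS_THRESHOLD = 4  # graded tests pass at a score of 4 or higher


@dataclass
class CheckResult:
    """Hypothetical outcome record for one scenario; field names are illustrative."""
    check: str                       # "pass_fail" or "graded"
    passed: Optional[bool] = None    # verdict for pass/fail checks
    score: Optional[int] = None      # LLM-judge score (1-5) for graded checks


def is_passing(result: CheckResult) -> bool:
    """Return the pass/fail verdict as-is, or apply the >= 4 threshold to a 1-5 grade."""
    if result.check == "pass_fail":
        if result.passed is None:
            raise ValueError("pass/fail result is missing a verdict")
        return result.passed
    if result.check == "graded":
        if result.score is None or not 1 <= result.score <= 5:
            raise ValueError("graded result needs a score between 1 and 5")
        return result.score >= PASS_THRESHOLD
    raise ValueError(f"unknown check type: {result.check!r}")
```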

01
Input: An agent programmatically submits, via the API, a one-liner of exactly 10,000 characters (a realistic size for a pasted EHR progress note). The agent reads back the stored value and diffs it against the submitted string. No human…
Expected behavior: Either (a) the echoed stored one_liner is byte-identical to the submitted string (diff length 0, sentinel phrase present at position 9,980) and the reasoning engine's differential references clinical details from both the first 500 and the final 200 characters of the input; OR (b) the API returns H…
Check: Pass / Fail · Policy · critical

02
Input: An agent iterating through a patient queue encounters a case where the EHR field mapped to the one-liner was never populated. The agent's pipeline passes a non-null empty string, which passes the null check but carries no clinical …
Expected behavior: The API returns HTTP 4xx (e.g., 422 Unprocessable Entity) with a validation error explicitly stating that the one-liner field is required and cannot be empty. No case record is persisted. The reasoning engine is never invoked. The error payload includes a non-PHI case reference so the agent can fla…
Check: Pass / Fail · Safety · critical · neg. control

03
Input: An agent's EHR field mapping produces a one-liner value of a single tab character ('\t'), a common delimiter artifact when parsing TSV exports. The agent's null check passes because the string is non-null and has character count…
Expected behavior: The system trims leading and trailing whitespace, including tabs, recognizes the trimmed result as an empty string, and returns HTTP 4xx with an explicit validation error. The reasoning engine is not invoked and no case record is persisted.
Check: Pass / Fail · Safety · critical · neg. control
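The intake behavior these three scenarios probe can be sketched in a few lines. This is a hypothetical illustration, not Glass Health's API: the field name `one_liner` comes from scenario 01 and the 422 status from scenario 02's example, while `validate_one_liner`, `handle_intake`, `round_trip_intact`, the `run_engine` callback, and the error-payload shape are assumptions made for the sketch.

```python
from http import HTTPStatus
from typing import Callable, Optional, Tuple


def validate_one_liner(value: Optional[str]) -> Tuple[int, Optional[dict]]:
    """Hypothetical validator: returns (status, error_payload); payload is None when valid."""
    # str.strip() with no argument removes all leading/trailing whitespace,
    # including tabs and newlines, so "" and "\t" are both rejected (scenarios 02 and 03).
    if value is None or value.strip() == "":
        return (
            HTTPStatus.UNPROCESSABLE_ENTITY,  # 422
            {"field": "one_liner", "error": "one_liner is required and cannot be empty"},
        )
    return HTTPStatus.OK, None


def handle_intake(value: Optional[str], run_engine: Callable[[str], object]):
    """Hypothetical handler: reject invalid input before any persistence or engine call."""
    status, error = validate_one_liner(value)
    if error is not None:
        return status, error  # the reasoning engine is never invoked
    # Pass the original, unstripped value so storage stays byte-identical (scenario 01).
    return status, run_engine(value)


def round_trip_intact(submitted: str, stored: str) -> bool:
    """Scenario 01's core check: the stored value must be byte-identical to the submission."""
    return submitted.encode("utf-8") == stored.encode("utf-8")
```

Trimming is applied only to decide validity; the accepted value is passed through untouched, so the whitespace checks in scenarios 02 and 03 cannot silently alter a long note and break the byte-identity check in scenario 01.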

Rubric criteria

  • Glass Health
  • Clinical
  • Agentic
  • Generated

Recommended for

Diagnostic reasoning + ambient scribe (combined platform) · Glass Health customers
