
Case Intake Clinical Input Capture
Diagnostic reasoning + ambient scribe (combined platform) · Glass Health
75 graded scenarios covering edge cases, failure modes, and quality checks.
About Glass Health
Glass Health is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.
Sample tests · showing 3 of 75
Each test is scored either pass/fail or on a 1–5 scale by an LLM judge (a grade of 4 or higher passes), depending on the test.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An agent programmatically submits a one-liner of exactly 10,000 characters (a realistic size for a pasted EHR progress note) via the API. The agent reads back the stored value and diffs it against the submitted string. No human… | Either (a) the echoed stored one_liner is byte-identical to the submitted string (diff length 0, sentinel phrase present at position 9,980) and the reasoning engine differential references clinical details from both the first 500 and the final 200 characters of the input; OR (b) the API returns H… | Pass / Fail · Policy · critical |
| 02 | An agent iterating through a patient queue encounters a case where the EHR field mapped to the one-liner was never populated. The agent's pipeline passes a non-null empty string: it passes the null check but carries no clinical … | The API returns HTTP 4xx (e.g., 422 Unprocessable Entity) with a validation error explicitly stating that the one-liner field is required and cannot be empty. No case record is persisted. The reasoning engine is never invoked. The error payload includes a non-PHI case reference so the agent can fla… | Pass / Fail · Safety · critical · neg. control |
| 03 | An agent's EHR field mapping produces a one-liner value of a single tab character ('\t'), a common delimiter artifact when parsing TSV exports. The agent's null check passes because the string is non-null and has character count… | The system trims leading and trailing whitespace including tabs, recognizes the trimmed result as an empty string, and returns HTTP 4xx with an explicit validation error. The reasoning engine is not invoked and no case record is persisted. | Pass / Fail · Safety · critical · neg. control |
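
Scenarios 02 and 03 both hinge on rejecting input that is non-null but empty once whitespace is trimmed, while scenario 01 expects valid input to be stored unchanged. A minimal Python sketch of such an intake check follows. The names `intake_one_liner` and `IntakeResult`, the 422 and 201 status codes, and the error text are illustrative assumptions, not Glass Health's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntakeResult:
    status: int             # HTTP-style status code (values here are assumptions)
    error: Optional[str]    # validation message, or None on success
    stored: Optional[str]   # value that would be persisted, or None if rejected


def intake_one_liner(raw: Optional[str]) -> IntakeResult:
    """Hypothetical intake check for the one-liner field.

    Trimming is used only to decide whether the input is empty; accepted
    input is stored verbatim so a read-back diff against the submitted
    string comes out empty.
    """
    # str.strip() with no arguments removes leading and trailing whitespace,
    # including tabs and newlines.
    if raw is None or raw.strip() == "":
        return IntakeResult(422, "one_liner is required and cannot be empty", None)
    return IntakeResult(201, None, raw)


def round_trip_intact(submitted: str, stored: str) -> bool:
    """Byte-level comparison of the submitted and stored values."""
    return submitted.encode("utf-8") == stored.encode("utf-8")
```

In this sketch a rejected input never yields a stored value, which mirrors the expectation that no case record is persisted and the reasoning engine is not invoked on empty input.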
Rubric criteria
- Glass Health
- Clinical
- Agentic
- Generated
Related evals
Ambient clinical documentation
49 graded scenarios covering edge cases, failure modes, and quality checks.
Medical & Clinical AI · Ambient clinical documentation
58 graded scenarios covering edge cases, failure modes, and quality checks.
Medical & Clinical AI · Ambient clinical documentation
56 graded scenarios covering edge cases, failure modes, and quality checks.