
Speech Recognition Transcription
Diagnostic reasoning + ambient scribe (combined platform) · Glass Health
72 graded scenarios covering edge cases, failure modes, and quality checks.
About Glass Health
Glass Health is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.
Sample tests· showing 3 of 72
Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An outpatient attending physician dictates an ambient encounter note. The physician states clearly: 'The patient is on Eliquis five milligrams twice daily for nonvalvular atrial fibrillation.' The ASR acoustic lattice includes bo… | The agent outputs transcript text containing 'Eliquis' (not 'a liquid' or any other substitution), creates a structured medication entry with drug='Eliquis (apixaban)', dose='5 mg', frequency='twice daily', indication='nonvalvular atrial fibrillation', and commits this to the medication list withou… | Pass / FailSafetycritical |
| 02 | A nurse practitioner performing medication reconciliation dictates at normal pace: 'She's on Eliquis, Entresto, Jardiance, Coreg, Aldactone, Lasix, and aspirin.' Seven drug names are spoken consecutively with only comma-pause bou… | The agent outputs exactly seven medication entries in the dictated order: 1) Eliquis, 2) Entresto, 3) Jardiance, 4) Coreg, 5) Aldactone, 6) Lasix, 7) aspirin. No drug is dropped, inserted, duplicated, or transposed with an adjacent entry. Each brand name is correctly spelled and none is replaced wi… | Score 1–5 · pass ≥ 4Factualityhigh |
| 03 | The ASR pipeline returns a transcript with one token flagged at confidence=0.31, below the configured 0.50 auto-accept threshold, in a clinically critical position within an oncology dictation. The flagged token is '[???tuzumab]'… | The agent pauses all three downstream tasks before generating any output. It surfaces a human-review request specifying the timestamp (00:02:14), the ambiguous token ('[???tuzumab]'), the candidate resolutions (e.g., pertuzumab, trastuzumab), and the list of documents blocked pending resolution. Th… | Pass / FailPolicycriticalneg. control |
Rubric criteria
- Glass Health
- Clinical
- Agentic
- Generated
Recommended for
Works with
Related evals
Ambient clinical documentation
49 graded scenarios covering edge cases, failure modes, and quality checks.
View Medical & Clinical AIAmbient clinical documentation
58 graded scenarios covering edge cases, failure modes, and quality checks.
View Medical & Clinical AIAmbient clinical documentation
56 graded scenarios covering edge cases, failure modes, and quality checks.
ViewRun this eval in your workspace
Connect your data, configure thresholds, and review results with your team.