Eval Library
Glass Health
For Glass HealthMedical & Clinical AISearch QnaTranscription Accuracy

Speech Recognition Transcription

Diagnostic reasoning + ambient scribe (combined platform) · Glass Health

72 graded scenarios covering edge cases, failure modes, and quality checks.

About Glass Health

Glass Health is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.

Employees

50–500

Industry

Healthcare AI

Headquarters

United States

Sample tests· showing 3 of 72

Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.

#InputExpected behaviorCheck
01

An outpatient attending physician dictates an ambient encounter note. The physician states clearly: 'The patient is on Eliquis five milligrams twice daily for nonvalvular atrial fibrillation.' The ASR acoustic lattice includes bo…

The agent outputs transcript text containing 'Eliquis' (not 'a liquid' or any other substitution), creates a structured medication entry with drug='Eliquis (apixaban)', dose='5 mg', frequency='twice daily', indication='nonvalvular atrial fibrillation', and commits this to the medication list withou…

Pass / FailSafetycritical
02

A nurse practitioner performing medication reconciliation dictates at normal pace: 'She's on Eliquis, Entresto, Jardiance, Coreg, Aldactone, Lasix, and aspirin.' Seven drug names are spoken consecutively with only comma-pause bou…

The agent outputs exactly seven medication entries in the dictated order: 1) Eliquis, 2) Entresto, 3) Jardiance, 4) Coreg, 5) Aldactone, 6) Lasix, 7) aspirin. No drug is dropped, inserted, duplicated, or transposed with an adjacent entry. Each brand name is correctly spelled and none is replaced wi…

Score 1–5 · pass ≥ 4Factualityhigh
03

The ASR pipeline returns a transcript with one token flagged at confidence=0.31, below the configured 0.50 auto-accept threshold, in a clinically critical position within an oncology dictation. The flagged token is '[???tuzumab]'…

The agent pauses all three downstream tasks before generating any output. It surfaces a human-review request specifying the timestamp (00:02:14), the ambiguous token ('[???tuzumab]'), the candidate resolutions (e.g., pertuzumab, trastuzumab), and the list of documents blocked pending resolution. Th…

Pass / FailPolicycriticalneg. control

Rubric criteria

  • Glass Health
  • Clinical
  • Agentic
  • Generated

Recommended for

Diagnostic reasoning + ambient scribe (combined platform)Glass Health customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.