For Glass HealthMedical & Clinical AISearch Qna

Speaker Diarization Multi Speaker Attribution

Diagnostic reasoning + ambient scribe (combined platform) · Glass Health

58 graded scenarios covering edge cases, failure modes, and quality checks.

About Glass Health

Glass Health is an AI ambient-scribing and clinical-decision-support platform. It supports encounter documentation, clinical questions, differential diagnosis, and treatment-plan drafting using medical guidelines and literature.

Industry

Healthcare AI / Clinical Decision Support

Website

glass.health

Sample tests· showing 3 of 58

Pass/fail checks, each adjudicated by an LLM judge.

#	Input	Expected behavior	Check
01	Patient arrived 5 minutes early and began describing symptoms to the triage nurse while recording was already active. First 31 seconds of audio capture the patient saying 'I've been having chest tightness for two weeks, worse wit…	Agent assigns PATIENT to speaker_A based on first-person symptom self-report framing, subjective complaint structure, and duration/severity quantification. When speaker_B enters at 0:32 and issues an open-ended intake prompt, agent confirms speaker_B = CLINICIAN. Chest tightness with exertion appea…	Pass / FailSafetycritical
02	The system incorrectly assigned CLINICIAN to the patient at utterance 1 (patient spoke first). Over utterances 2–13, the 'patient' cluster accumulates: first-person self-report ('my pain is a 7'), lay hedging ('I'm not sure but..…	Agent fires a global retroactive role flip: reassigns all 12 prior utterances from their current (wrong) labels to corrected labels. Re-processes every SOAP field that drew from inverted utterances, moving content from Assessment back into HPI and vice versa as appropriate. Logs each reassigned utt…	Pass / FailSafetycritical
03	The patient, a registered nurse, describes her chief complaint using precise clinical language from utterance 1: 'I've had intermittent paroxysmal supraventricular tachycardia episodes — heart rate around 180 — no syncope, no dia…	Agent assigns PATIENT to speaker_A based on: (a) first-person ownership framing ('I've had'), (b) self-report symptom structure with onset and associated-symptom enumeration, (c) turn-taking position as respondent to an implicit intake prompt, not as an issuer of questions or directives. Agent does…	Pass / FailSafetycritical
Unlock full benchmark 55 more test cases Use this benchmark

How this eval is graded

Pass/fail checks, each adjudicated by an LLM judge.

Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

Glass Health
Clinical
Agentic
Generated

Recommended for

Diagnostic reasoning + ambient scribe (combined platform)Glass Health customers

Works with

Glass Health

Related evals

Medical & Clinical AI

Ambient clinical documentation

49 graded scenarios covering edge cases, failure modes, and quality checks.

View Medical & Clinical AI

Ambient clinical documentation

58 graded scenarios covering edge cases, failure modes, and quality checks.

View Medical & Clinical AI

Ambient clinical documentation

56 graded scenarios covering edge cases, failure modes, and quality checks.

View

Frequently asked questions

What does the Speaker Diarization Multi Speaker Attribution eval for Glass Health Diagnostic reasoning + ambient scribe (combined platform) test?+

58 graded scenarios covering edge cases, failure modes, and quality checks.

How is the Speaker Diarization Multi Speaker Attribution eval scored?+

Pass/fail checks, each adjudicated by an LLM judge. The judge rubric: Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain. A criterion passes at a judge score of 4 or higher.

How many test cases does this eval pack include?+

The Speaker Diarization Multi Speaker Attribution pack for Glass Health Diagnostic reasoning + ambient scribe (combined platform) contains 58 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Speaker Diarization Multi Speaker Attribution pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.