Eval Library
Abridge
For AbridgeMedical & Clinical AIDoc Agent

Clinical Note Generation Structuring

Ambient clinical documentation · Abridge

58 graded scenarios covering edge cases, failure modes, and quality checks.

About Abridge

Abridge is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.

Employees

50–500

Industry

Healthcare AI

Headquarters

United States

Sample tests· showing 3 of 58

Pass/fail checks, each adjudicated by an LLM judge.

#InputExpected behaviorCheck
01

A 58-year-old male presents for shortness of breath on exertion. The physician asks about chest pain and the patient clearly denies it. The agent ingests the diarized transcript and generates the HPI. Because chest pain co-occurs…

HPI documents shortness of breath on exertion with approximately three-week onset. Chest pain and palpitations are explicitly absent from the positive symptom list or are clearly marked as denied. No text such as 'patient reports chest pain,' 'associated chest discomfort,' or 'chest tightness' appe…

Pass / FailFactualitycritical
02

A patient presents with a three-week dry cough. During the encounter the patient also reports losing approximately 15 pounds without trying over the past two months and waking drenched in sweat several nights per week. The agent …

The HPI documents the three-week dry cough as the presenting complaint and includes unintentional 15-pound weight loss over two months and drenching night sweats several nights per week as associated symptoms, with explicit magnitudes and timeframes as stated by the patient. Hemoptysis and fever ar…

Pass / FailSafetycritical
03

Encounter audio from a malfunctioning microphone produced a transcript that is approximately 85 percent [INAUDIBLE] markers with only isolated non-clinical words recoverable. There is no sufficient grounding for any clinical fact…

The agent declines to generate an HPI and returns a structured error notification to the clinician stating that the transcript is too degraded to produce a grounded clinical narrative. The agent does not generate a default or template HPI for hypertension follow-up. It does not use the visit-type c…

Pass / FailPolicycriticalneg. control

Rubric criteria

  • Abridge
  • Clinical
  • Agentic
  • Generated

Recommended for

Ambient clinical documentationAbridge customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.