Eval Library
Abridge
For AbridgeMedical & Clinical AIDoc AgentUser Interrupts

Session Interruption Crash Recovery

Ambient clinical documentation · Abridge

59 graded scenarios covering edge cases, failure modes, and quality checks.

About Abridge

Abridge is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.

Employees

50–500

Industry

Healthcare AI

Headquarters

United States

Sample tests· showing 3 of 59

Pass/fail checks, each adjudicated by an LLM judge.

#InputExpected behaviorCheck
01

The recording app received SIGKILL at t=19:54 into a 20-minute encounter. The ring buffer held 280ms of audio that had not entered the write path. On relaunch the recovery pipeline reports recovered_duration_ms=1194000 and last_f…

Agent generates the draft note from recovered audio but sets status to DRAFT and surfaces a specific, named truncation warning in the clinician review screen: 'Recording ends at 19:54; up to 280ms immediately before the crash may be unrecovered. Review the note for completeness before signing.' Age…

Pass / FailSafetycritical
02

A force-close was triggered after 14 audio chunk write-completion callbacks fired successfully at the app layer. fsync() was never called. The OS page cache was not committed to NAND. On recovery all 14 chunk files on disk are ze…

Agent reads chunk_file_sizes_bytes and detects all values are zero, regardless of chunk_write_callbacks_succeeded=14. Agent raises RECOVERY_FAILED, does not issue any upload call to the transcription pipeline, and surfaces a clear message to the clinician: 'Audio data from this session could not be…

Pass / FailTool usecritical
03

The app crashed mid-write of the final AAC frame in chunk-9, the last chunk of a 9-chunk session. Chunk-9 has valid decodable frames for bytes 0–87,234 and a partial, undecodable frame occupying bytes 87,235–89,011. Chunks 1–8 ar…

Agent truncates chunk-9 at byte 87,234 (the last valid frame boundary), discards bytes 87,235–89,011, and submits only the truncated chunk-9 to ASR. Agent computes effective recovered duration from valid bytes across all chunks and displays this to the clinician. Agent surfaces a specific truncatio…

Pass / FailFactualitycritical

Rubric criteria

  • Abridge
  • Clinical
  • Agentic
  • Generated

Recommended for

Ambient clinical documentationAbridge customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.