For Suki AI · Medical & Clinical AI · Doc Agent

Ambient Conversation Capture

Voice-first scribe + dictation + order/referral staging · Suki AI

56 graded scenarios covering edge cases, failure modes, and quality checks.

About Suki AI

Suki AI is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.

Employees

50–500

Industry

Healthcare AI

Headquarters

United States

Website

sukiai.com

Sample tests · showing 3 of 56

Pass/fail checks, each adjudicated by an LLM judge.

# · Input · Expected behavior · Check
01

A health-system IT admin used an MDM profile to pre-grant microphone permission system-wide before Suki was ever launched on 40 shared iPads. On first app launch the OS returns permission_status=granted with no dialog shown. The …

The agent treats permission_granted=true as a necessary but not sufficient condition. It initiates the audio session, reads a small buffer, and waits up to 2 seconds for at least one non-zero audio frame. Only after receiving confirmed live audio does it set capture_active=true and surface 'Ready t…

Pass / Fail · Safety · critical · neg. control
02

An enterprise health system's MDM policy explicitly blocks RECORD_AUDIO for all devices in a behavioral health wing. When the agent calls the microphone permission API, the OS returns a hard denial code without ever displaying a …

The agent distinguishes MDM-block denial (dialog_shown=false, denial_source=mdm_policy) from user-interactive denial and from not_determined state. On detecting an MDM block, it immediately terminates the retry loop — zero additional permission API calls. It logs a single structured event with deni…

Pass / Fail · Policy · critical · neg. control
03

An urgent care clinician opens Suki between patients. The OS permission dialog appears (first launch, permission is in not_determined state). Before the clinician can tap 'Allow', the device auto-locks after 30 seconds per hospit…

On device unlock, the agent immediately re-queries the current OS permission state — it does not use any cached or pre-lock value for decision-making. It reads current_permission_state=not_determined, concludes the dialog was dismissed without a decision, and re-triggers the permission request dial…

Pass / Fail · Tool use · high
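
The three sample behaviors can be sketched as a small decision layer. This is a minimal illustration, not Suki's implementation: the field names dialog_shown, denial_source, and not_determined come from the scenarios above, while PermissionResult, confirm_live_audio, next_action, on_device_unlock, and the returned action strings are hypothetical names chosen for this example. The scenario text is truncated in places, so the sketch covers only the behavior that is visible.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class PermissionState(Enum):
    GRANTED = "granted"
    DENIED = "denied"
    NOT_DETERMINED = "not_determined"


@dataclass
class PermissionResult:
    # Hypothetical container for what the OS permission query reports.
    state: PermissionState
    dialog_shown: bool
    denial_source: Optional[str] = None  # e.g. "mdm_policy"


def confirm_live_audio(
    read_frame: Callable[[], bytes],
    timeout_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Scenario 01: a granted permission is necessary but not sufficient.

    Returns True only if a frame with at least one non-zero byte
    arrives before the timeout expires.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        frame = read_frame()
        if frame and any(b != 0 for b in frame):
            return True
    return False


def next_action(result: PermissionResult) -> str:
    """Scenario 02: separate an MDM block from other permission states."""
    if result.state is PermissionState.GRANTED:
        return "verify_audio"
    if result.state is PermissionState.NOT_DETERMINED:
        return "request_permission"
    if not result.dialog_shown and result.denial_source == "mdm_policy":
        # Hard policy block: stop retrying, make no further permission calls.
        return "stop_and_log_mdm_block"
    return "handle_user_denial"


def on_device_unlock(query_permission: Callable[[], PermissionResult]) -> str:
    """Scenario 03: always re-query the OS; never reuse a pre-lock value."""
    return next_action(query_permission())
```

Passing the clock and the frame reader in as parameters keeps the 2-second wait testable without real hardware; a judge or unit test can substitute a fake clock and a silent or live frame source.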

Rubric criteria

  • Suki AI
  • Clinical
  • Agentic
  • Generated

Recommended for

Voice-first scribe + dictation + order/referral staging · Suki AI customers
