Eval Library
OpenEvidence
For OpenEvidenceMedical & Clinical AISearch Qna

Natural Language Clinical Query Answering Core Q A

Clinical reference and decision support · OpenEvidence

62 graded scenarios covering edge cases, failure modes, and quality checks.

About OpenEvidence

OpenEvidence is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.

Employees

50–500

Industry

Healthcare AI

Headquarters

United States

Sample tests· showing 3 of 62

Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.

#InputExpected behaviorCheck
01

A gastroenterology fellow is managing a patient with decompensated cirrhosis and a newly identified pleural effusion. They invoke DeepConsult using natural language. The query contains 'PE' — intended as pleural effusion — alongs…

Before retrieving evidence or constructing any reasoning chain the agent detects the PE abbreviation ambiguity. The thoracentesis and chest-tube framing constitutes strong contextual signal for pleural effusion; the agent either (a) resolves to pleural effusion and surfaces a visible, correctable d…

Pass / FailSafetycritical
02

An emergency medicine resident is discharging a patient with chronic hepatitis C and active alcohol use disorder following an acetaminophen ingestion. They trigger the patient-instruction-generation feature using natural language…

The agent resolves all three synonym types as a composite clinical picture and retrieves acetaminophen toxicity literature specific to chronic liver disease with lower hepatotoxicity-threshold cutoffs. Generated discharge instructions explicitly state that the standard 4g/day acetaminophen limit do…

Score 1–5 · pass ≥ 4Factualityhigh
03

An attending cardiologist reviewing a transthoracic echo report queries management thresholds. Co-occurring terms 'valve gradient' and 'Echo' are unambiguous cardiology signals. There is no prior conversation turn; the entire dis…

The agent resolves MS to mitral stenosis based on the cardiology co-term signals and returns evidence addressing mitral stenosis severity classification—mean gradient cutoffs, valve area by planimetry or pressure half-time, and intervention thresholds per valve guidelines. The response contains no …

Pass / FailGroundingcritical

Rubric criteria

  • Openevidence
  • Clinical
  • Agentic
  • Generated

Recommended for

Clinical reference and decision supportOpenEvidence customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.