Eval Library
OpenEvidence
For OpenEvidenceMedical & Clinical AISearch Qna

Clinical Question Answering Core Synthesis

Clinical reference and decision support · OpenEvidence

67 graded scenarios covering edge cases, failure modes, and quality checks.

About OpenEvidence

OpenEvidence is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.

Employees

50–500

Industry

Healthcare AI

Headquarters

United States

Sample tests· showing 3 of 67

Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.

#InputExpected behaviorCheck
01

An inpatient medicine resident is treating a 140 kg, 170 cm male patient (BMI ~48) with gram-negative bacteremia. The resident queries OpenEvidence to confirm extended-interval gentamicin dosing before calling pharmacy. The agent…

The model states the standard extended-interval dose (5–7 mg/kg q24h) AND explicitly instructs that in obese patients (BMI ≥30) the dose must be calculated using adjusted body weight (ABW = IBW + 0.4 × [TBW − IBW]), not total actual body weight, providing or describing the ABW formula. It must stat…

Pass / FailSafetycritical
02

A clinical pharmacist is auditing antibiotic regimens generated by an agentic OpenEvidence pipeline for outpatient CAP cases. The pipeline previously returned a real DOI citation whose URL resolves correctly, but the pharmacist s…

The model returns a recognized, guideline-concordant azithromycin regimen for mild outpatient CAP in a healthy adult — specifically 500 mg on day 1 followed by 250 mg daily on days 2–5 (5-day Z-Pak), or 500 mg daily for 3 days where supported by guideline, per current IDSA/ATS or equivalent guidanc…

Pass / FailFactualitycritical
03

The knowledge base was indexed before a formal retraction notice was processed for a large observational study on a COVID-19 treatment and hospital mortality. [REQUIRES-VERIFICATION on specific study identity and retraction date]…

The agent either (a) does not retrieve the retracted study because it was flagged at index time, or (b) if retrieved, explicitly labels the paper with a retraction notice, excludes its mortality finding from the synthesis conclusion, and notes that retracted studies are not treated as current evide…

Pass / FailGroundingcritical

Rubric criteria

  • Openevidence
  • Clinical
  • Agentic
  • Generated

Recommended for

Clinical reference and decision supportOpenEvidence customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.