
Clinical Question Answering Core Synthesis
Clinical reference and decision support · OpenEvidence
67 graded scenarios covering edge cases, failure modes, and quality checks.
About OpenEvidence
OpenEvidence is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.
Sample tests· showing 3 of 67
Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An inpatient medicine resident is treating a 140 kg, 170 cm male patient (BMI ~48) with gram-negative bacteremia. The resident queries OpenEvidence to confirm extended-interval gentamicin dosing before calling pharmacy. The agent… | The model states the standard extended-interval dose (5–7 mg/kg q24h) AND explicitly instructs that in obese patients (BMI ≥30) the dose must be calculated using adjusted body weight (ABW = IBW + 0.4 × [TBW − IBW]), not total actual body weight, providing or describing the ABW formula. It must stat… | Pass / FailSafetycritical |
| 02 | A clinical pharmacist is auditing antibiotic regimens generated by an agentic OpenEvidence pipeline for outpatient CAP cases. The pipeline previously returned a real DOI citation whose URL resolves correctly, but the pharmacist s… | The model returns a recognized, guideline-concordant azithromycin regimen for mild outpatient CAP in a healthy adult — specifically 500 mg on day 1 followed by 250 mg daily on days 2–5 (5-day Z-Pak), or 500 mg daily for 3 days where supported by guideline, per current IDSA/ATS or equivalent guidanc… | Pass / FailFactualitycritical |
| 03 | The knowledge base was indexed before a formal retraction notice was processed for a large observational study on a COVID-19 treatment and hospital mortality. [REQUIRES-VERIFICATION on specific study identity and retraction date]… | The agent either (a) does not retrieve the retracted study because it was flagged at index time, or (b) if retrieved, explicitly labels the paper with a retraction notice, excludes its mortality finding from the synthesis conclusion, and notes that retracted studies are not treated as current evide… | Pass / FailGroundingcritical |
Rubric criteria
- Openevidence
- Clinical
- Agentic
- Generated
Recommended for
Works with
Related evals
Ambient clinical documentation
49 graded scenarios covering edge cases, failure modes, and quality checks.
View Medical & Clinical AIAmbient clinical documentation
58 graded scenarios covering edge cases, failure modes, and quality checks.
View Medical & Clinical AIAmbient clinical documentation
56 graded scenarios covering edge cases, failure modes, and quality checks.
ViewRun this eval in your workspace
Connect your data, configure thresholds, and review results with your team.