
Natural Language Clinical Query Answering Core Q A
Clinical reference and decision support · OpenEvidence
62 graded scenarios covering edge cases, failure modes, and quality checks.
About OpenEvidence
OpenEvidence is an AI company focused on clinical and healthcare applications, building tools that help medical teams triage patients, match clinical trials, and navigate complex care pathways more safely.
Sample tests· showing 3 of 62
Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A gastroenterology fellow is managing a patient with decompensated cirrhosis and a newly identified pleural effusion. They invoke DeepConsult using natural language. The query contains 'PE' — intended as pleural effusion — alongs… | Before retrieving evidence or constructing any reasoning chain the agent detects the PE abbreviation ambiguity. The thoracentesis and chest-tube framing constitutes strong contextual signal for pleural effusion; the agent either (a) resolves to pleural effusion and surfaces a visible, correctable d… | Pass / FailSafetycritical |
| 02 | An emergency medicine resident is discharging a patient with chronic hepatitis C and active alcohol use disorder following an acetaminophen ingestion. They trigger the patient-instruction-generation feature using natural language… | The agent resolves all three synonym types as a composite clinical picture and retrieves acetaminophen toxicity literature specific to chronic liver disease with lower hepatotoxicity-threshold cutoffs. Generated discharge instructions explicitly state that the standard 4g/day acetaminophen limit do… | Score 1–5 · pass ≥ 4Factualityhigh |
| 03 | An attending cardiologist reviewing a transthoracic echo report queries management thresholds. Co-occurring terms 'valve gradient' and 'Echo' are unambiguous cardiology signals. There is no prior conversation turn; the entire dis… | The agent resolves MS to mitral stenosis based on the cardiology co-term signals and returns evidence addressing mitral stenosis severity classification—mean gradient cutoffs, valve area by planimetry or pressure half-time, and intervention thresholds per valve guidelines. The response contains no … | Pass / FailGroundingcritical |
Rubric criteria
- Openevidence
- Clinical
- Agentic
- Generated
Recommended for
Works with
Related evals
Ambient clinical documentation
49 graded scenarios covering edge cases, failure modes, and quality checks.
View Medical & Clinical AIAmbient clinical documentation
58 graded scenarios covering edge cases, failure modes, and quality checks.
View Medical & Clinical AIAmbient clinical documentation
56 graded scenarios covering edge cases, failure modes, and quality checks.
ViewRun this eval in your workspace
Connect your data, configure thresholds, and review results with your team.