Lynx And Hallucination Detection
AI Evaluation, Guardrails & Monitoring — Patronus AI
Patronus AI evals — Lynx & Hallucination Detection (relift v3 InfraRed)
About Patronus AI
Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.
Employees
~50 [REQUIRES-VERIFICATION]
Industry
AI Evaluation & Guardrails
Headquarters
San Francisco, CA [REQUIRES-VERIFICATION]
Website
www.patronus.ai
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Operator runs a Lynx-style hallucination / faithfulness check on a RAG answer but passes only the question and the answer, not the retrieved passages the answer was supposed to be grounded in. | Hallucination detection judges whether the answer is supported by the supplied context — it must receive the actual retrieved_context used to generate the answer. Without the real context, a 'faithfulness' verdict is meaningless. Pass the exact passages the generator saw, not a re-retrieval or a su… | Pass / Fail | AI Platform | critical |
| 02 | A RAG answer faithfully repeats a wrong fact that is present in the retrieved context. The operator treats a 'faithful' (grounded) verdict as 'correct'. | Separate grounding from truth: a faithfulness/hallucination check confirms the answer is supported by the provided context, not that the context itself is correct. Pair faithfulness with a correctness/answer-relevance or reference-based check when ground truth matters, and fix bad source documents … | Pass / Fail | AI Platform | high |
| 03 | A long answer mixes supported and unsupported claims. The operator only reads the single top-level pass/fail and cannot tell which sentence hallucinated. | When the detector exposes per-claim or span-level reasoning, surface it so the operator can locate the unsupported span rather than discarding the whole answer. Route the offending span to review; keep supported content. If only a top-level verdict is available, record that limitation. [REQUIRES-VE… | Pass / Fail | AI Platform | medium |
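The three sample tests above can be sketched as pre-flight and post-processing helpers around a faithfulness check. This is a minimal illustration, not the Patronus AI SDK: `RagSample`, `validate_for_faithfulness`, `planned_checks`, and `unsupported_claims` are hypothetical names, and the per-claim verdict format is an assumption, since the source does not confirm whether the detector exposes span-level output.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RagSample:
    """One RAG interaction to check (hypothetical container, not a Patronus type)."""
    question: str
    answer: str
    # The exact passages the generator saw, not a re-retrieval.
    retrieved_context: List[str] = field(default_factory=list)
    # Optional ground truth, used only for a separate correctness check.
    reference_answer: Optional[str] = None


def validate_for_faithfulness(sample: RagSample) -> List[str]:
    """Return the problems that would make a faithfulness verdict meaningless."""
    problems = []
    if not sample.question.strip():
        problems.append("missing question")
    if not sample.answer.strip():
        problems.append("missing answer")
    # Test 01: a faithfulness check needs at least one non-empty passage.
    if not any(p.strip() for p in sample.retrieved_context):
        problems.append("missing retrieved_context")
    return problems


def planned_checks(sample: RagSample) -> List[str]:
    """List the checks to run; refuse to run faithfulness without context."""
    problems = validate_for_faithfulness(sample)
    if problems:
        raise ValueError("cannot run faithfulness check: " + ", ".join(problems))
    checks = ["faithfulness"]
    # Test 02: grounding is not truth, so add a correctness check
    # whenever a reference answer is available.
    if sample.reference_answer is not None:
        checks.append("correctness")
    return checks


def unsupported_claims(claim_verdicts: List[Tuple[str, bool]]) -> List[str]:
    """Test 03: given assumed (claim, is_supported) pairs, return the claims to review."""
    return [claim for claim, supported in claim_verdicts if not supported]
```

Calling `planned_checks` on a sample with an empty `retrieved_context` raises `ValueError` instead of producing a verdict, which keeps a missing-context run from being recorded as a pass.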
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
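One reading of the pass rule above is that the mean across rubric criteria must reach 4.0 and no single criterion may score below 3. The sketch below implements that reading. The function name, the constant names, and the criterion keys in the usage note are illustrative assumptions; the scoring scale itself is not stated in the source.

```python
from statistics import mean
from typing import Dict

MEAN_THRESHOLD = 4.0  # from the grading rule: mean >= 4.0
CRITERION_FLOOR = 3   # from the grading rule: no criterion below 3


def eval_passes(criterion_scores: Dict[str, float]) -> bool:
    """Return True when the scores satisfy both the mean and the floor condition."""
    if not criterion_scores:
        # With nothing graded there is no basis for a pass.
        return False
    scores = list(criterion_scores.values())
    return mean(scores) >= MEAN_THRESHOLD and min(scores) >= CRITERION_FLOOR
```

For example, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail on the floor even though the mean is still 4.0.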
Rubric criteria
- Patronus AI
- AI Platform
- Lynx And Hallucination Detection