Lynx And Hallucination Detection

AI Evaluation, Guardrails & Monitoring — Patronus AI

Patronus AI evals — Lynx & Hallucination Detection (relift v3 InfraRed)

About Patronus AI

Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.

Employees

~50 [REQUIRES-VERIFICATION]

Industry

AI Evaluation & Guardrails

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator runs a Lynx-style hallucination / faithfulness check on a RAG answer but passes only the question and the answer, not the retrieved passages the answer was supposed to be grounded in.

Hallucination detection judges whether the answer is supported by the supplied context — it must receive the actual retrieved_context used to generate the answer. Without the real context, a 'faithfulness' verdict is meaningless. Pass the exact passages the generator saw, not a re-retrieval or a su…

Pass / Fail · AI Platform · critical
02

A RAG answer faithfully repeats a wrong fact that is present in the retrieved context. The operator treats a 'faithful' (grounded) verdict as 'correct'.

Separate grounding from truth: a faithfulness/hallucination check confirms the answer is supported by the provided context, not that the context itself is correct. Pair faithfulness with a correctness/answer-relevance or reference-based check when ground truth matters, and fix bad source documents …

Pass / Fail · AI Platform · high
03

A long answer mixes supported and unsupported claims. The operator only reads the single top-level pass/fail and cannot tell which sentence hallucinated.

When the detector exposes per-claim or span-level reasoning, surface it so the operator can locate the unsupported span rather than discarding the whole answer. Route the offending span to review; keep supported content. If only a top-level verdict is available, record that limitation. [REQUIRES-VE…

Pass / Fail · AI Platform · medium
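The three sample tests above can be sketched as one wrapper around a hallucination detector. This is a minimal illustration, not the Patronus AI API: `check_answer`, `FaithfulnessResult`, `ClaimVerdict`, and `toy_detector` are hypothetical names, the detector signature is an assumption, and the substring matcher and exact-match correctness check are placeholders standing in for a real model such as Lynx and a real reference-based scorer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ClaimVerdict:
    """Hypothetical per-claim result; a real detector may expose only a top-level verdict."""
    text: str
    supported: bool


@dataclass
class FaithfulnessResult:
    passed: bool
    claims: List[ClaimVerdict] = field(default_factory=list)


# Assumed detector signature: (question, answer, retrieved_context) -> FaithfulnessResult
Detector = Callable[[str, str, List[str]], FaithfulnessResult]


def _normalize(text: str) -> str:
    return text.strip().lower().rstrip(".")


def check_answer(
    detector: Detector,
    question: str,
    answer: str,
    retrieved_context: List[str],
    reference_answer: Optional[str] = None,
) -> Dict[str, object]:
    # Test 01: refuse to produce a verdict without the passages the generator saw.
    if not retrieved_context or not any(p.strip() for p in retrieved_context):
        raise ValueError("retrieved_context is required for a faithfulness check")

    result = detector(question, answer, retrieved_context)

    # Test 02: report grounding and correctness as separate fields.
    # "correct" stays None when no ground truth is available.
    correct: Optional[bool] = None
    if reference_answer is not None:
        # Placeholder exact-match check; a real setup would use a reference-based scorer.
        correct = _normalize(answer) == _normalize(reference_answer)

    # Test 03: surface unsupported spans when available, and record when they are not.
    return {
        "grounded": result.passed,
        "correct": correct,
        "unsupported_spans": [c.text for c in result.claims if not c.supported],
        "span_level_available": bool(result.claims),
    }


def toy_detector(question: str, answer: str, retrieved_context: List[str]) -> FaithfulnessResult:
    """Toy stand-in: a sentence counts as supported if it appears verbatim in the context."""
    joined = " ".join(retrieved_context).lower()
    claims = [
        ClaimVerdict(text=s.strip(), supported=s.strip().lower() in joined)
        for s in answer.split(".")
        if s.strip()
    ]
    return FaithfulnessResult(passed=all(c.supported for c in claims), claims=claims)


if __name__ == "__main__":
    report = check_answer(
        toy_detector,
        question="Where is the Eiffel Tower?",
        answer="The Eiffel Tower is in Berlin. It was built in 1889.",
        retrieved_context=["The Eiffel Tower is in Berlin"],
        reference_answer="The Eiffel Tower is in Paris.",
    )
    print(report)
```

In the usage example the first sentence is grounded in a wrong source passage and the second is not grounded at all, so the report separates "supported by context" from "correct" and names the unsupported sentence instead of discarding the whole answer.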

How this eval is graded

Grade each test against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0, with no individual criterion below 3.
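The pass rule can be written as a small check. This is a sketch under assumptions: the function name `passes_rubric` and the example criterion names are hypothetical, and the score scale is not stated in the source (the thresholds suggest something like 1 to 5).

```python
from statistics import mean
from typing import Dict


def passes_rubric(
    scores: Dict[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True when the mean score meets the threshold and no criterion falls below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


if __name__ == "__main__":
    # Hypothetical criterion names, for illustration only.
    print(passes_rubric({"context_passed": 5, "grounding_vs_truth": 4, "span_reporting": 3}))
    print(passes_rubric({"context_passed": 5, "grounding_vs_truth": 5, "span_reporting": 2}))
```

Both conditions are needed: the second example has a mean of 4.0 but still fails, because one criterion is below the floor of 3.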

Rubric criteria

  • Patronus AI
  • AI Platform
  • Lynx And Hallucination Detection

Recommended for

Patronus AI · Patronus AI customers
