For Patronus AI · AI Platform

Custom Evaluators And Criteria

Patronus AI

AI Evaluation, Guardrails & Monitoring — Patronus AI

Patronus AI evals: Custom Evaluators & Criteria

About Patronus AI

Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.

Employees

~50 [REQUIRES-VERIFICATION]

Industry

AI Evaluation & Guardrails

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator defines a custom evaluator with the criterion 'the answer should be professional' and nothing else.

Write custom criteria as concrete, checkable statements with observable anchors (e.g. 'no profanity; addresses the user's question; cites at least one provided source when one exists') rather than abstract adjectives. Specify pass/fail boundaries so two reviewers (human or model) would agree on the…

Pass / Fail · AI Platform · high
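As an illustration of replacing an adjective with checkable statements, the sketch below encodes the example criteria from this test as separate, concretely worded items. It is plain Python, not the Patronus SDK: the Criterion class, the tiny profanity blocklist, and the "[n]" citation-marker convention are all hypothetical. Criteria that cannot be checked mechanically, such as whether the answer addresses the question, carry no check and are left to a human reviewer or judge model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical, deliberately tiny blocklist; a real setup would use a maintained
# list or a dedicated toxicity scorer.
PROFANITY = {"damn", "crap"}


def no_profanity(answer: str, sources: list[str]) -> bool:
    """Pass if no blocklisted word appears (case-insensitive, punctuation stripped)."""
    words = {w.strip(".,!?;:\"'").lower() for w in answer.split()}
    return not (words & PROFANITY)


def cites_source_when_available(answer: str, sources: list[str]) -> bool:
    """Pass if the answer cites a provided source as [n]; pass trivially if none exist."""
    if not sources:
        return True
    # Assumed convention: sources are cited by 1-based index, e.g. "[1]".
    return any(f"[{i}]" in answer for i in range(1, len(sources) + 1))


@dataclass(frozen=True)
class Criterion:
    name: str
    statement: str  # concrete, observable wording rather than an adjective
    # Optional automated check; None means a human or judge model must decide.
    check: Optional[Callable[[str, list[str]], bool]] = None


RUBRIC = [
    Criterion("no_profanity", "Contains no profanity.", no_profanity),
    Criterion("addresses_question", "Directly answers the question the user asked."),
    Criterion(
        "cites_source",
        "Cites at least one provided source, as [n], when sources were provided.",
        cites_source_when_available,
    ),
]


def apply_mechanical_checks(answer: str, sources: list[str]) -> dict[str, Optional[bool]]:
    """Run every automatable check; None marks criteria left for a judge."""
    return {c.name: (c.check(answer, sources) if c.check else None) for c in RUBRIC}
```

For example, apply_mechanical_checks("Paris is the capital [1].", ["France's capital is Paris."]) reports True for no_profanity and cites_source and None for addresses_question, which still needs a judge.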
02

Operator writes a custom evaluator and immediately puts it in production gating without checking it against any labeled examples.

Validate a new custom evaluator on a labeled set with known-good and known-bad examples (including edge cases) before trusting it for gating. Confirm it passes the goods and fails the bads, measure agreement with human labels, and iterate the criteria where it disagrees. Promote to gating only afte…

Pass / Fail · AI Platform · critical
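A minimal sketch of the validation step, assuming the evaluator's verdicts and the human labels are both booleans (True = pass) over the same labeled examples. It reports raw agreement, Cohen's kappa (agreement corrected for chance), and the two error types separately, since passing a known-bad example is usually the costlier mistake for a gate. The function names and the promotion thresholds (kappa of at least 0.8, zero false passes) are illustrative assumptions, not Patronus defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgreementReport:
    n: int
    accuracy: float      # fraction of examples where evaluator and human agree
    kappa: float         # Cohen's kappa: agreement corrected for chance
    false_passes: int    # evaluator passed a known-bad example
    false_fails: int     # evaluator failed a known-good example


def agreement(human: list[bool], evaluator: list[bool]) -> AgreementReport:
    """Compare evaluator verdicts with human labels (True = pass) on the same examples."""
    if not human or len(human) != len(evaluator):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    pairs = list(zip(human, evaluator))
    agree = sum(h == e for h, e in pairs)
    false_passes = sum(e and not h for h, e in pairs)
    false_fails = sum(h and not e for h, e in pairs)

    p_observed = agree / n
    p_human = sum(human) / n
    p_eval = sum(evaluator) / n
    # Chance agreement if both raters labeled independently at their own pass rates.
    p_chance = p_human * p_eval + (1 - p_human) * (1 - p_eval)
    # p_chance == 1 only when both raters gave every example the same single label,
    # which also means perfect agreement; kappa is formally undefined, so report 1.0.
    kappa = 1.0 if p_chance == 1 else (p_observed - p_chance) / (1 - p_chance)
    return AgreementReport(n, p_observed, kappa, false_passes, false_fails)


def ready_for_gating(report: AgreementReport, min_kappa: float = 0.8,
                     max_false_passes: int = 0) -> bool:
    """Illustrative promotion rule; both thresholds are assumptions to tune per use case."""
    return report.kappa >= min_kappa and report.false_passes <= max_false_passes
```

On four examples where the evaluator misses one known-good answer, accuracy is 0.75 but kappa is only 0.5, so this rule keeps the evaluator out of gating until its criteria are iterated.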
03

Operator writes a single custom evaluator that simultaneously judges factual accuracy, tone, formatting, and safety, returning one blended score.

Decompose distinct concerns into separate evaluators (accuracy, tone, format, safety) so each yields an actionable, independently thresholded verdict. A single blended score hides which dimension failed and prevents per-concern gating policy. Compose them at the policy layer, not inside one rubric.

Pass / Fail · AI Platform · medium
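One way to keep concerns separate and compose them at the policy layer is sketched below. Each evaluator reports its own score against its own threshold, and a separate policy function decides what a failure means. The evaluator names, the [0, 1] score range, and the block-versus-warn split are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


@dataclass(frozen=True)
class Verdict:
    evaluator: str
    score: float      # assumed to lie in [0, 1]
    threshold: float  # set per evaluator, not shared across concerns

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold


# Hypothetical policy: failures in these evaluators block; any other failure warns.
BLOCKING = {"safety", "accuracy"}


def decide(verdicts: list[Verdict]) -> tuple[Action, list[str]]:
    """Combine independent verdicts into one action, keeping the failing names visible."""
    failed = [v.evaluator for v in verdicts if not v.passed]
    if any(name in BLOCKING for name in failed):
        return Action.BLOCK, failed
    if failed:
        return Action.WARN, failed
    return Action.ALLOW, failed


verdicts = [
    Verdict("accuracy", 0.92, 0.80),
    Verdict("tone", 0.55, 0.70),
    Verdict("format", 1.00, 1.00),
    Verdict("safety", 0.99, 0.95),
]
```

Here decide(verdicts) returns a warning that names tone as the only failing dimension, information a single blended score would have averaged away.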

How this eval is graded

Responses are graded against expected.ideal_behavior and expected.rubric. A response passes when its mean score across rubric criteria is at least 4.0 and no single criterion scores below 3.
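The pass rule can be written as a small function. This sketch reads the rule as: take the mean of the per-criterion scores, require it to be at least 4.0, and separately require every individual score to be at least 3. The score scale (for example 1 to 5), the function name, and the criterion names in the usage note are assumptions.

```python
from statistics import mean


def rubric_pass(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass when the mean across criteria is >= min_mean and no criterion is below floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    # Both conditions must hold: a high average cannot hide one weak criterion.
    return mean(values) >= min_mean and min(values) >= floor
```

For example, scores of 5, 5 and 2 on three hypothetical criteria average exactly 4.0 but still fail, because one criterion falls below the floor of 3.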

Rubric criteria

  • Patronus AI
  • AI Platform
  • Custom Evaluators And Criteria

Recommended for

Patronus AI customers
