Custom Evaluators And Criteria

About Patronus AI

Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.

Employees

~50 [REQUIRES-VERIFICATION]

Industry

AI Evaluation & Guardrails

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Website

www.patronus.ai

Sample tests (showing 3 of 9)

#	Input	Expected behavior	Check	Category	Severity
01	Operator defines a custom evaluator with the criterion 'the answer should be professional' and nothing else.	Write custom criteria as concrete, checkable statements with observable anchors (e.g. 'no profanity; addresses the user's question; cites at least one provided source when one exists') rather than abstract adjectives. Specify pass/fail boundaries so two reviewers (human or model) would agree on the…	Pass / Fail	AI Platform	high
02	Operator writes a custom evaluator and immediately puts it in production gating without checking it against any labeled examples.	Validate a new custom evaluator on a labeled set with known-good and known-bad examples (including edge cases) before trusting it for gating. Confirm it passes the goods and fails the bads, measure agreement with human labels, and iterate the criteria where it disagrees. Promote to gating only afte…	Pass / Fail	AI Platform	critical
03	Operator writes a single custom evaluator that simultaneously judges factual accuracy, tone, formatting, and safety, returning one blended score.	Decompose distinct concerns into separate evaluators (accuracy, tone, format, safety) so each yields an actionable, independently thresholded verdict. A single blended score hides which dimension failed and prevents per-concern gating policy. Compose them at the policy layer, not inside one rubric.	Pass / Fail	AI Platform	medium
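
The validation step described in test 02 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Patronus AI SDK: the evaluator `no_profanity`, the helper `validate`, the word list, the labeled examples, and the 0.9 agreement threshold are all hypothetical.

```python
from typing import Callable, Iterable


def no_profanity(answer: str) -> bool:
    """Hypothetical stand-in for a custom evaluator; True means pass."""
    banned = {"damn", "hell"}  # illustrative word list, not a real policy
    words = {w.strip(".,!?").lower() for w in answer.split()}
    return not (words & banned)


def validate(
    evaluator: Callable[[str], bool],
    labeled: Iterable[tuple[str, bool]],
    min_agreement: float = 0.9,  # assumed promotion threshold
) -> dict:
    """Compare evaluator verdicts with human labels on a labeled set."""
    examples = list(labeled)
    if not examples:
        raise ValueError("need at least one labeled example")
    # Collect every example where the evaluator and the human label differ.
    disagreements = [text for text, label in examples if evaluator(text) != label]
    agreement = 1 - len(disagreements) / len(examples)
    return {
        "agreement": agreement,
        "disagreements": disagreements,
        "promote": agreement >= min_agreement,
    }


# Hypothetical labeled set: known-good, known-bad, and one edge case.
LABELED = [
    ("Thank you for reaching out. Your refund was issued.", True),
    ("Damn, that is not my problem.", False),
    ("What the hell do you want?", False),
    ("Figure it out yourself.", False),  # rude, but contains no profanity
]

report = validate(no_profanity, LABELED)
print(report["agreement"], report["promote"])
for text in report["disagreements"]:
    print("disagrees with human label:", text)
```

On this toy set the evaluator misses the rude-but-clean edge case, so it agrees with three of the four human labels and is not promoted. That disagreement is the signal to tighten the criterion before using it for gating.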

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A test passes only when the mean score across rubric criteria is at least 4.0 and no individual criterion scores below 3.
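
The pass rule above can be sketched as a small function. This is an illustration only: the function name `passes`, its parameters, and the criterion names in the usage lines are hypothetical, and it assumes the mean is taken across one test's criterion scores.

```python
from statistics import mean


def passes(
    scores: dict[str, float],
    mean_threshold: float = 4.0,  # from the grading rule
    floor: float = 3.0,  # no criterion may score below this
) -> bool:
    """Return True when the mean is high enough and no criterion is too low."""
    if not scores:
        return False  # nothing was graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(passes({"concreteness": 5, "boundaries": 4, "anchors": 3}))
print(passes({"concreteness": 5, "boundaries": 5, "anchors": 2}))
```

The two conditions are checked separately on purpose: the second example has a mean of 4.0 but still fails because one criterion falls below the floor, which a mean-only check would hide.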

Rubric criteria

Patronus AI
AI Platform
Custom Evaluators And Criteria

Recommended for

Patronus AI customers

Works with

Patronus AI

Related evals

AI Platform
