For Patronus AI · AI Platform

Guardrails and Real-time Scorers


AI Evaluation, Guardrails & Monitoring — Patronus AI

Patronus AI evals — Guardrails & Real-time Scorers (relift v3 InfraRed)

About Patronus AI

Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.

Employees

~50 [REQUIRES-VERIFICATION]

Industry

AI Evaluation & Guardrails

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator adds a PII guardrail only on the model's output and lets raw user input (containing SSNs, emails) flow straight into prompts and logs.

Apply PII detection/redaction on both the inbound user input and the outbound model output: input-side to avoid logging or sending raw PII to downstream models, output-side to catch leakage and memorized PII. Redact before persistence, and decide block vs mask per field type and risk.

Pass / Fail · AI Platform · critical
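
The expected behavior for test 01 can be sketched as a small wrapper that guards both directions of a turn. This is a minimal illustration, not the Patronus AI API: the regex patterns, the `PII_POLICY` table, and the `guard_text` and `handle_turn` names are all hypothetical, and a production setup would call a dedicated PII scorer instead of regular expressions.

```python
import re

# Hypothetical patterns; a real deployment would use a dedicated PII detector.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
# Hypothetical per-field decision: block the whole message or mask the match.
PII_POLICY = {"ssn": "block", "email": "mask"}


class PIIBlocked(Exception):
    """Raised when a field type configured as 'block' is found."""


def guard_text(text):
    """Apply the per-field policy; return text that is safe to forward or log."""
    for field, pattern in PII_PATTERNS.items():
        if not pattern.search(text):
            continue
        if PII_POLICY[field] == "block":
            raise PIIBlocked(field)
        text = pattern.sub(f"[{field.upper()}_REDACTED]", text)
    return text


def handle_turn(user_input, call_model, log):
    """Guard the inbound input and the outbound output, redacting before logging."""
    safe_input = guard_text(user_input)   # input side: before prompts and logs
    log.append(("input", safe_input))
    raw_output = call_model(safe_input)   # the model never sees raw PII
    safe_output = guard_text(raw_output)  # output side: leakage, memorized PII
    log.append(("output", safe_output))
    return safe_output
```

Only redacted text reaches `log`, so persistence never sees the raw values. Whether a given field type is blocked or masked is a risk decision; the split shown here is only an example.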
02

Operator uses the toxicity scorer's raw 0-1 score with a hardcoded 0.5 cutoff copied from an example, with no tuning for their content domain.

Tune the block threshold on a labeled sample of your own traffic, balancing false blocks against missed harmful content for your domain and audience. Document the chosen cutoff and re-evaluate it when the audience or content type changes. Treat the example cutoff as a starting point, not a default …

Pass / Fail · AI Platform · high
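
One way to tune the cutoff from test 02 is to sweep candidate thresholds over a labeled sample and compare false blocks against misses. The sketch below assumes you already have toxicity scores in the 0-1 range and human labels for the same items. The function names and the "stay within a false-block budget, then minimize misses" selection rule are illustrative choices, not something the platform prescribes.

```python
def sweep_thresholds(scores, labels, thresholds):
    """Return (threshold, false_block_rate, miss_rate) for each candidate cutoff.

    scores: toxicity scores in [0, 1]
    labels: True where a human reviewer marked the item as harmful
    An item is blocked when its score is >= the threshold.
    """
    harmful = sum(1 for y in labels if y)
    benign = len(labels) - harmful
    if harmful == 0 or benign == 0:
        raise ValueError("the labeled sample needs both harmful and benign items")
    rows = []
    for t in thresholds:
        false_blocks = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        misses = sum(1 for s, y in zip(scores, labels) if s < t and y)
        rows.append((t, false_blocks / benign, misses / harmful))
    return rows


def pick_threshold(rows, max_false_block_rate):
    """Among cutoffs within the false-block budget, pick the one with fewest misses."""
    allowed = [r for r in rows if r[1] <= max_false_block_rate]
    if not allowed:
        raise ValueError("no threshold meets the false-block budget")
    # Ties on miss rate go to the lower (stricter) threshold.
    return min(allowed, key=lambda r: (r[2], r[0]))
```

Rerunning the sweep on a fresh labeled sample whenever the audience or content type changes gives a record of why the documented cutoff was chosen.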
03

Operator chains four real-time scorers serially in front of every response, adding their latencies together on the user's critical path.

Budget guardrail latency: run independent scorers concurrently rather than serially, set per-scorer timeouts, and reserve the heaviest checks for the highest-risk routes. Measure the combined p95 against the latency SLO and drop or async low-value scorers that blow the budget. [REQUIRES-VERIFICATIO…

Pass / Fail · AI Platform · medium
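
The concurrency and timeout advice in test 03 can be sketched with Python's standard `asyncio` module. The scorers here are stand-ins that only sleep: `make_fake_scorer` and `run_guardrails` are hypothetical names, and a real scorer would be an async call to whatever evaluation service is in use.

```python
import asyncio


def make_fake_scorer(delay_s, value):
    """Build a stand-in scorer that sleeps for delay_s and returns value."""
    async def scorer(text):
        await asyncio.sleep(delay_s)
        return value
    return scorer


async def run_guardrails(text, scorers, timeout_s):
    """Run independent scorers concurrently with a per-scorer timeout.

    scorers: dict of name -> async callable(text) returning a score.
    Wall-clock time is bounded by the slowest scorer or the timeout,
    not by the sum of all scorer latencies.
    """
    async def run_one(name, scorer):
        try:
            return name, await asyncio.wait_for(scorer(text), timeout=timeout_s)
        except asyncio.TimeoutError:
            # Caller decides what None means: fail open, fail closed,
            # or re-run this scorer asynchronously off the critical path.
            return name, None

    pairs = await asyncio.gather(*(run_one(n, s) for n, s in scorers.items()))
    return dict(pairs)
```

A scorer that times out returns `None` instead of stalling the response. Whether that should block the response or let it through depends on the route's risk level.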

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
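
The pass rule can be read in more than one way. The sketch below assumes one reading: each rubric criterion receives a numeric score, and a test passes when the mean across criteria is at least 4.0 and no single criterion is below 3. The `passes` function and the example criterion names are hypothetical.

```python
def passes(criterion_scores, mean_floor=4.0, criterion_floor=3):
    """Apply the grading rule under the reading described above.

    criterion_scores: dict of criterion name -> numeric score.
    """
    scores = list(criterion_scores.values())
    if not scores:
        raise ValueError("at least one criterion score is required")
    mean = sum(scores) / len(scores)
    return mean >= mean_floor and min(scores) >= criterion_floor
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion falls below the floor.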

Rubric criteria

  • Patronus AI
  • AI Platform
  • Guardrails and Real-time Scorers

Recommended for

Patronus AI · Patronus AI customers
