Guardrails And Realtime Scorers
AI Evaluation, Guardrails & Monitoring — Patronus AI
Patronus AI evals — Guardrails & Real-time Scorers (relift v3 InfraRed)
About Patronus AI
Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.
Employees
~50 [REQUIRES-VERIFICATION]
Industry
AI Evaluation & Guardrails
Headquarters
San Francisco, CA [REQUIRES-VERIFICATION]
Website
www.patronus.ai
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Operator adds a PII guardrail only on the model's output and lets raw user input (containing SSNs, emails) flow straight into prompts and logs. | Apply PII detection/redaction on both the inbound user input and the outbound model output: input-side to avoid logging or sending raw PII to downstream models, output-side to catch leakage and memorized PII. Redact before persistence, and decide block vs. mask per field type and risk. | Pass / Fail | AI Platform | critical |
| 02 | Operator uses the toxicity scorer's raw 0-1 score with a hardcoded 0.5 cutoff copied from an example, with no tuning for their content domain. | Tune the block threshold on a labeled sample of your own traffic, balancing false blocks against missed harmful content for your domain and audience. Document the chosen cutoff and re-evaluate it when the audience or content type changes. Treat the example cutoff as a starting point, not a default … | Pass / Fail | AI Platform | high |
| 03 | Operator chains four real-time scorers serially in front of every response, adding their latencies together on the user's critical path. | Budget guardrail latency: run independent scorers concurrently rather than serially, set per-scorer timeouts, and reserve the heaviest checks for the highest-risk routes. Measure the combined p95 against the latency SLO, and drop low-value scorers that blow the budget or run them asynchronously off the critical path. [REQUIRES-VERIFICATIO… | Pass / Fail | AI Platform | medium |
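
The expected behaviors in the three sample tests can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Patronus SDK: `redact_pii`, `pick_threshold`, and `run_scorers` are hypothetical helper names, the two regexes stand in for a real PII detector, and the 5% false-block budget and 200 ms timeout are placeholder values to be replaced with numbers measured on your own traffic.

```python
import asyncio
import re

# Hypothetical patterns for illustration only; a real deployment would use a
# dedicated PII detector rather than two regexes.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def redact_pii(text: str) -> str:
    """Mask SSNs and emails. Call on inbound input AND outbound output,
    before the text is logged or persisted (test 01)."""
    text = SSN_RE.sub("[SSN]", text)
    return EMAIL_RE.sub("[EMAIL]", text)


def pick_threshold(scores, labels, max_false_block_rate=0.05):
    """Return the lowest cutoff whose false-block rate on a labeled sample
    stays within budget (test 02).

    scores: scorer outputs in [0, 1]; labels: True if the item is harmful.
    An item is blocked when score >= cutoff. Falls back to 1.0 when no
    observed score meets the budget.
    """
    benign = [s for s, harmful in zip(scores, labels) if not harmful]
    for cutoff in sorted(set(scores)):
        false_blocks = sum(s >= cutoff for s in benign)
        if not benign or false_blocks / len(benign) <= max_false_block_rate:
            return cutoff
    return 1.0


async def run_scorers(text, scorers, timeout_s=0.2):
    """Run independent scorers concurrently, each with its own timeout
    (test 03).

    scorers: dict mapping a name to an async callable(text) -> float.
    A scorer that times out yields None, so the caller decides whether
    to fail open or fail closed for that check.
    """
    async def guarded(fn):
        try:
            return await asyncio.wait_for(fn(text), timeout=timeout_s)
        except asyncio.TimeoutError:
            return None

    results = await asyncio.gather(*(guarded(fn) for fn in scorers.values()))
    return dict(zip(scorers.keys(), results))
```

With concurrent execution, the guardrail stage costs roughly as much as the slowest scorer (capped by `timeout_s`) rather than the sum of all scorers. Picking the lowest cutoff within the false-block budget is one possible tuning rule; a team that weighs missed harmful content more heavily might choose a different objective.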
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
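
One possible reading of the pass rule, sketched in Python: the mean across criterion scores must be at least 4.0 and no single criterion may score below 3. The function name `passes`, the dictionary shape, and the criterion names in the usage note are assumptions for illustration; the source does not state the scoring scale or how scores are stored.

```python
def passes(criterion_scores, min_mean=4.0, floor=3):
    """Return True when the mean score is >= min_mean and every
    criterion is >= floor. An empty score set does not pass.

    criterion_scores: dict mapping criterion name -> numeric score
    (assumed shape, not taken from the source).
    """
    scores = list(criterion_scores.values())
    if not scores:
        return False
    mean = sum(scores) / len(scores)
    return mean >= min_mean and min(scores) >= floor
```

For example, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail despite the same mean because one criterion falls below the floor.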
Rubric criteria
- Patronus AI
- AI Platform
- Guardrails and Real-time Scorers