Monitoring, Logging, and Tracing
AI Evaluation, Guardrails & Monitoring — Patronus AI
Patronus AI evals — Monitoring, Logging & Tracing (relift v3 InfraRed)
About Patronus AI
Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.
Employees
~50 [REQUIRES-VERIFICATION]
Industry
AI Evaluation & Guardrails
Headquarters
San Francisco, CA [REQUIRES-VERIFICATION]
Website
www.patronus.ai
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator logs evaluation results to Patronus but uses a fresh random id per call, so a logged evaluation cannot be tied back to the production request. | Attach a stable correlation id (request id / trace id) to each logged evaluation so a verdict can be joined back to the production request, the model version, and the user session for debugging. Propagate the id end to end rather than minting a new one at the logging boundary. | Pass / Fail · AI Platform · high |
| 02 | Operator logs full prompts and completions to the monitoring backend for debugging; those payloads contain user PII retained for the log window. | Redact or tokenize PII in prompts/completions before they are persisted to the logging/monitoring store, and treat logs as data subject to retention and access controls. Verify redaction with a synthetic PII probe end to end. Do not rely on downstream redaction after raw PII has already been stored. | Pass / Fail · AI Platform · critical |
| 03 | Operator evaluates 100% of production traffic with a heavy evaluator for monitoring, multiplying cost and adding load, when a sample would suffice. | For monitoring (as opposed to gating), evaluate a representative sample sized to detect the regressions you care about, and bias sampling toward high-risk routes rather than scoring everything. Document the sampling rate so metric confidence is understood. Reserve 100% coverage for hard real-time g… | Pass / Fail · AI Platform · medium |
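
For test 02, a minimal sketch of redacting before persistence, followed by a synthetic PII probe. The two regular expressions, the `LogStore` class, and the probe values are assumptions made for illustration; a real deployment would likely need a broader PII detector than two patterns.

```python
import re

# Deliberately simple patterns for the sketch; they do not cover all PII.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


class LogStore:
    """Hypothetical in-memory stand-in for the logging/monitoring store."""

    def __init__(self):
        self.rows = []

    def persist(self, prompt, completion):
        # Redaction happens before the write, so raw PII is never stored.
        self.rows.append(
            {"prompt": redact(prompt), "completion": redact(completion)}
        )


def probe_passes(store):
    """Send synthetic PII through the write path and check what was stored."""
    probe_email = "probe.user@example.com"
    probe_phone = "555-010-9999"
    store.persist(f"Contact me at {probe_email}", f"Calling {probe_phone} now")
    stored = repr(store.rows[-1])
    return probe_email not in stored and probe_phone not in stored
```

The probe inspects the stored row rather than the return value of `redact`, which is what makes it an end-to-end check of the write path.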
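
For test 03, one way to sample traffic for monitoring while biasing toward high-risk routes is a deterministic, hash-based decision per request. The route names and rates below are hypothetical; the point of the sketch is that the rates live in one documented table and that the same request id always yields the same decision.

```python
import hashlib

# Hypothetical routes and rates; document whatever values are actually used.
SAMPLE_RATES = {"default": 0.05, "payments": 0.50}


def should_evaluate(request_id, route, rates=SAMPLE_RATES):
    """Return True if this request falls inside the route's sample."""
    rate = rates.get(route, rates["default"])
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

Hashing the request id instead of calling a random number generator makes the decision reproducible, so a retried or replayed request is sampled the same way each time.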
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
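
The pass rule above can be sketched as a small function. This takes one reading of the rule: the mean is computed across criterion scores, and any single criterion below 3 fails the eval. The score scale and the criterion names in the usage lines are assumptions, since the source does not state them.

```python
from statistics import mean


def eval_passes(scores, mean_threshold=4.0, floor=3):
    """Apply the stated rule: mean >= 4.0 and no criterion below 3."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
ok = eval_passes({"correlation": 5, "redaction": 4, "sampling": 3})
low_floor = eval_passes({"correlation": 5, "redaction": 5, "sampling": 2})
```

In the second call the mean is still 4.0, but the score of 2 falls below the floor, so the eval fails.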
Rubric criteria
- Patronus AI
- AI Platform
- Monitoring, Logging, and Tracing