For Patronus AI · AI Platform

Monitoring, Logging, and Tracing

Patronus AI

AI Evaluation, Guardrails & Monitoring — Patronus AI

Patronus AI evals — Monitoring, Logging & Tracing (relift v3 InfraRed)

About Patronus AI

Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.

Employees

~50 [REQUIRES-VERIFICATION]

Industry

AI Evaluation & Guardrails

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator logs evaluation results to Patronus but uses a fresh random id per call, so a logged evaluation cannot be tied back to the production request.

Attach a stable correlation id (request id / trace id) to each logged evaluation so a verdict can be joined back to the production request, the model version, and the user session for debugging. Propagate the id end to end rather than minting a new one at the logging boundary.

Pass / Fail · AI Platform · high
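
The expected behavior for test 01 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the Patronus SDK: the names `start_request` and `build_eval_record` and the record schema are hypothetical. The id is adopted once at the request edge and read back at the logging boundary, which refuses to mint a new one.

```python
import contextvars
import uuid

# Holds the correlation id for the request currently being handled.
_request_id = contextvars.ContextVar("request_id", default=None)


def start_request(incoming_id=None):
    """Adopt the caller's id when present; mint one only at the request edge."""
    rid = incoming_id or uuid.uuid4().hex
    _request_id.set(rid)
    return rid


def build_eval_record(verdict, model_version, session_id):
    """Build the payload to log (hypothetical schema, not a Patronus API)."""
    rid = _request_id.get()
    if rid is None:
        # Failing loudly is safer than silently creating an unjoinable id.
        raise RuntimeError("no correlation id in context")
    return {
        "request_id": rid,
        "model_version": model_version,
        "session_id": session_id,
        "verdict": verdict,
    }
```

With this shape, a logged verdict carries the same `request_id` as the production request, so the two can be joined during debugging.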
02

Operator logs full prompts and completions to the monitoring backend for debugging; those payloads contain user PII retained for the log window.

Redact or tokenize PII in prompts/completions before they are persisted to the logging/monitoring store, and treat logs as data subject to retention and access controls. Verify redaction with a synthetic PII probe end to end. Do not rely on downstream redaction after raw PII has already been stored.

Pass / Fail · AI Platform · critical
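
A rough sketch of the redact-before-persist pattern from test 02, including the synthetic probe. The two regular expressions are illustrative assumptions and far from complete PII coverage; the in-memory list stands in for whatever logging store is actually used, and all function names are hypothetical.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each matched span with a placeholder naming its type."""
    for label, pattern in _PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def persist(store, prompt, completion):
    """Redact before the write so raw PII never reaches the store."""
    store.append({"prompt": redact(prompt), "completion": redact(completion)})


def probe_passes(store):
    """Send synthetic PII through the write path and check that none survives."""
    probes = ["probe.user@example.com", "078-05-1120"]
    persist(store, f"contact {probes[0]}", f"ssn {probes[1]}")
    stored = store[-1]["prompt"] + store[-1]["completion"]
    return not any(p in stored for p in probes)
```

Running the probe against the real write path, rather than against `redact` in isolation, is what makes the check end to end.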
03

Operator evaluates 100% of production traffic with a heavy evaluator for monitoring, multiplying cost and adding load, when a sample would suffice.

For monitoring (as opposed to gating), evaluate a representative sample sized to detect the regressions you care about, and bias sampling toward high-risk routes rather than scoring everything. Document the sampling rate so metric confidence is understood. Reserve 100% coverage for hard real-time g…

Pass / Fail · AI Platform · medium
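
One way to implement the risk-biased sampling described in test 03 is deterministic hash-based sampling, sketched below. The route names and rates are made-up placeholders, and `should_evaluate` is a hypothetical helper. Hashing the request id keeps the decision reproducible for a given request.

```python
import hashlib

# Hypothetical rates: sample high-risk routes more heavily than the default.
SAMPLE_RATES = {"default": 0.05, "/payments": 0.50}


def should_evaluate(request_id, route, rates=SAMPLE_RATES):
    """Return True when this request falls inside the route's sample."""
    rate = rates.get(route, rates["default"])
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the id to a 64-bit bucket and compare against the scaled rate.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(rate * 2**64)
```

Keeping the rates in one documented table makes it easier to state the confidence of any metric derived from the sample.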

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
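
The pass rule can be read as a small predicate. The sketch below assumes the mean is taken across the rubric criteria for a single test and that scores are numeric; the source states neither the score scale nor the aggregation unit, so both are assumptions, as is the function name `passes`.

```python
def passes(scores, mean_floor=4.0, criterion_floor=3):
    """Apply the stated thresholds to a mapping of criterion name to score."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    mean = sum(values) / len(values)
    # Both conditions must hold: a high mean cannot mask one weak criterion.
    return mean >= mean_floor and min(values) >= criterion_floor
```

For example, scores of 5, 5, and 2 have a mean of 4.0 but still fail, because one criterion is below 3.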

Rubric criteria

  • Patronus AI
  • AI Platform
  • Monitoring, Logging, and Tracing

Recommended for

Patronus AI · Patronus AI customers
