For Patronus AI · AI Platform

Evaluation API and SDK

AI Evaluation, Guardrails & Monitoring — Patronus AI

Patronus AI evals — Evaluation API & SDK (relift v3 InfraRed)

About Patronus AI

Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.

Employees

~50 [REQUIRES-VERIFICATION]

Industry

AI Evaluation & Guardrails

Headquarters

San Francisco, CA [REQUIRES-VERIFICATION]

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator calls the Patronus evaluate API with an evaluator id, the model input, the model output, and (for RAG evaluators) retrieved_context. The response carries a per-evaluator result with a boolean pass, a numeric score, and a…

Read the result as a structured object: branch on the boolean `pass` for gating, surface the numeric `score` for trend dashboards, and persist the `explanation` for human review. Do not treat the explanation free-text as the machine-readable verdict. Tie each result back to the evaluator id that pr…

Pass / Fail · AI Platform · high
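
For test 01, a minimal sketch of reading a result as a structured object may help. It is not the Patronus SDK: the payload shape (a dict with `pass`, `score`, and `explanation` keys), the `EvalResult` type, and the helper names are assumptions made for illustration. The point is only that gating branches on the boolean, while the explanation is stored and never parsed.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical typed view of one per-evaluator result."""
    evaluator_id: str   # ties the result back to the evaluator that produced it
    passed: bool        # `pass` is a Python keyword, so the field is renamed
    score: float
    explanation: str


def parse_result(evaluator_id: str, payload: Dict[str, Any]) -> EvalResult:
    """Build an EvalResult from an assumed response payload."""
    verdict = payload["pass"]
    if not isinstance(verdict, bool):
        raise TypeError("expected a boolean `pass` field")
    return EvalResult(
        evaluator_id=evaluator_id,
        passed=verdict,
        score=float(payload["score"]),
        explanation=str(payload.get("explanation", "")),
    )


def gate(result: EvalResult) -> str:
    """Decide from the boolean verdict only; the explanation is for humans."""
    return "allow" if result.passed else "block"
```

With this shape, an explanation that happens to contain the word "passes" cannot flip a failing verdict, because `gate` never reads it.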
02

Operator needs to evaluate 5,000 logged responses nightly. They loop and fire one synchronous evaluate call per row with no concurrency control.

Use the batch/async evaluation path (or bounded concurrency with backoff) for bulk scoring rather than a tight synchronous loop. Cap in-flight requests, honor rate-limit responses, and checkpoint progress so a crash mid-batch resumes instead of re-scoring (and re-billing) everything. [REQUIRES-VERI…

Pass / Fail · AI Platform · medium
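
For test 02, the bounded-concurrency alternative can be sketched as follows. This is an illustration under stated assumptions, not the platform's batch API: `evaluate_row` stands in for whatever async call performs one evaluation, `RateLimited` is a hypothetical exception representing a rate-limit response, and the JSON-lines checkpoint file is one possible way to record progress.

```python
import asyncio
import json
import random
from pathlib import Path
from typing import Awaitable, Callable, Dict, Set


class RateLimited(Exception):
    """Hypothetical signal that the service asked the caller to slow down."""


def load_done(path: Path) -> Set[str]:
    """Return the row ids already scored in a previous run."""
    if not path.exists():
        return set()
    return {json.loads(line)["id"] for line in path.read_text().splitlines() if line}


async def score_all(
    rows: Dict[str, str],
    evaluate_row: Callable[[str, str], Awaitable[dict]],
    checkpoint: Path,
    max_in_flight: int = 8,
    max_retries: int = 5,
    base_delay: float = 0.5,
) -> int:
    """Score rows not yet checkpointed; return how many were scored this run."""
    done = load_done(checkpoint)
    sem = asyncio.Semaphore(max_in_flight)  # caps concurrent requests
    scored = 0

    async def worker(row_id: str, text: str) -> None:
        nonlocal scored
        async with sem:
            for attempt in range(max_retries):
                try:
                    result = await evaluate_row(row_id, text)
                    break
                except RateLimited:
                    # Exponential backoff with jitter before retrying.
                    delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
                    await asyncio.sleep(delay)
            else:
                return  # retries exhausted; the row is picked up on the next run
            # Append immediately so a crash does not lose finished rows.
            with checkpoint.open("a") as fh:
                fh.write(json.dumps({"id": row_id, "result": result}) + "\n")
            scored += 1

    await asyncio.gather(*(worker(i, t) for i, t in rows.items() if i not in done))
    return scored
```

Rerunning `score_all` with the same checkpoint file skips every row already recorded, which is what prevents re-scoring after a crash mid-batch.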
03

Engineer initializes the Patronus Python SDK with the API key pasted as a literal in the source file to 'get it working', then commits it.

Initialize the SDK with the API key sourced from an environment variable or a secret manager, never a literal in code. Confirm the client picks up the key from the documented env var (e.g. a PATRONUS_API_KEY-style variable) rather than hardcoding. If a key is ever committed, rotate it and treat git…

Pass / Fail · AI Platform · critical
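
For test 03, a small sketch of sourcing the key from the environment instead of the source file. The variable name `PATRONUS_API_KEY` follows the "PATRONUS_API_KEY-style" wording above and is an assumption; check the SDK documentation for the name the client actually reads. `load_api_key` is a hypothetical helper, not part of any SDK.

```python
import os
from typing import Mapping


def load_api_key(
    env: Mapping[str, str] = os.environ,
    var_name: str = "PATRONUS_API_KEY",  # assumed name; confirm against the docs
) -> str:
    """Return the API key from the environment, failing loudly if it is absent."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it or inject it from a secret manager"
        )
    return key
```

Failing at startup when the variable is missing removes the temptation to paste a literal key "to get it working".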

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
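
One possible reading of this rule, sketched in code: a response passes when the mean of its criterion scores is at least 4.0 and no single criterion scores below 3. The score scale, the `passes_rubric` name, and the criterion keys in the usage note are assumptions for illustration; the grader's actual implementation is not shown in this page.

```python
from statistics import mean
from typing import Mapping


def passes_rubric(
    scores: Mapping[str, float],
    min_mean: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Apply the mean threshold and the per-criterion floor together."""
    values = list(scores.values())
    if not values:
        return False  # nothing graded, so nothing can pass
    return mean(values) >= min_mean and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, lowest 3), while 5, 5, and 2 fail on the floor even though the mean is still 4.0.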

Rubric criteria

  • Patronus AI
  • AI Platform
  • Evaluation API and SDK

Recommended for

Patronus AI · Patronus AI customers
