Evaluation API and SDK
Patronus AI evals — Evaluation API & SDK (relift v3 InfraRed)
About Patronus AI
Patronus AI is an evaluation, guardrails, and monitoring platform for LLM and GenAI applications. It provides automated hallucination detection (the Lynx model), LLM-as-judge evaluation (the Glider model), and built-in scorers for PII, toxicity, safety, answer relevance, and context faithfulness, plus Experiments, datasets, custom evaluators, and production logging and monitoring.
Employees: ~50 [REQUIRES-VERIFICATION]
Industry: AI Evaluation & Guardrails
Headquarters: San Francisco, CA [REQUIRES-VERIFICATION]
Website: www.patronus.ai
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator calls the Patronus evaluate API with an evaluator id, the model input, the model output, and (for RAG evaluators) retrieved_context. The response carries a per-evaluator result with a boolean pass, a numeric score, and a… | Read the result as a structured object: branch on the boolean `pass` for gating, surface the numeric `score` for trend dashboards, and persist the `explanation` for human review. Do not treat the explanation free-text as the machine-readable verdict. Tie each result back to the evaluator id that pr… | Pass / Fail · AI Platform · high |
| 02 | Operator needs to evaluate 5,000 logged responses nightly. They loop and fire one synchronous evaluate call per row with no concurrency control. | Use the batch/async evaluation path (or bounded concurrency with backoff) for bulk scoring rather than a tight synchronous loop. Cap in-flight requests, honor rate-limit responses, and checkpoint progress so a crash mid-batch resumes instead of re-scoring (and re-billing) everything. [REQUIRES-VERI… | Pass / Fail · AI Platform · medium |
| 03 | Engineer initializes the Patronus Python SDK with the API key pasted as a literal in the source file to 'get it working', then commits it. | Initialize the SDK with the API key sourced from an environment variable or a secret manager, never a literal in code. Confirm the client picks up the key from the documented env var (e.g. a PATRONUS_API_KEY-style variable) rather than hardcoding. If a key is ever committed, rotate it and treat git… | Pass / Fail · AI Platform · critical |
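
The expected behaviors in the three sample tests above can be sketched together in Python. This is a minimal illustration under stated assumptions, not the Patronus SDK: `EvalResult`, `RateLimited`, `load_api_key`, `should_release`, and `score_rows` are hypothetical names, the `evaluate` argument stands in for whatever async call a client actually exposes, and `PATRONUS_API_KEY` is only the example variable name from test 03.

```python
import asyncio
import json
import os
from dataclasses import asdict, dataclass
from pathlib import Path


class RateLimited(Exception):
    """Hypothetical error a client wrapper raises on a rate-limit response."""


@dataclass
class EvalResult:
    # Hypothetical shape mirroring the fields named in test 01.
    evaluator_id: str   # ties the verdict back to the evaluator that produced it
    passed: bool        # the boolean `pass` field ("pass" is a Python keyword)
    score: float        # numeric score, for trend dashboards
    explanation: str    # free text for human review, never parsed as the verdict


def load_api_key(env_var: str = "PATRONUS_API_KEY") -> str:
    """Read the key from the environment; the variable name is an assumption."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; refusing to fall back to a literal")
    return key


def should_release(result: EvalResult) -> bool:
    """Gate on the boolean verdict only, not on the score or the explanation."""
    return result.passed


async def score_rows(rows, evaluate, checkpoint,
                     max_in_flight=8, max_retries=4, base_delay=1.0):
    """Score rows with bounded concurrency, backoff, and a resumable checkpoint.

    rows: dict of row id (str) -> payload.
    evaluate: caller-supplied async callable returning an EvalResult.
    """
    checkpoint = Path(checkpoint)
    # Rows already recorded in the checkpoint are skipped on restart.
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    slots = asyncio.Semaphore(max_in_flight)

    async def score_one(row_id, payload):
        async with slots:  # caps the number of in-flight requests
            for attempt in range(max_retries):
                try:
                    result = await evaluate(payload)
                    break
                except RateLimited:
                    # Exponential backoff before the next attempt.
                    await asyncio.sleep(base_delay * 2 ** attempt)
            else:
                return  # retries exhausted; the row stays pending for the next run
        done[row_id] = asdict(result)
        checkpoint.write_text(json.dumps(done))  # persist progress after every row

    pending = [(rid, p) for rid, p in rows.items() if rid not in done]
    await asyncio.gather(*(score_one(rid, p) for rid, p in pending))
    return done
```

Rewriting the whole checkpoint file after each row is simple but scales poorly; for a nightly batch of thousands of rows, an append-only log or a database table keyed by row id would likely be a better fit. Whether a dedicated batch endpoint exists is not settled by the table above, so the sketch takes the bounded-concurrency route.
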
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
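
One reading of the pass rule above, as a short Python sketch. It assumes the mean is taken across all rubric criteria for a single test and that both thresholds are inclusive; the function name `rubric_passes` and the criterion keys in the usage lines are hypothetical.

```python
from statistics import mean


def rubric_passes(scores: dict[str, float],
                  mean_floor: float = 4.0,
                  criterion_floor: float = 3.0) -> bool:
    """Return True when the mean meets mean_floor and no criterion is below criterion_floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_floor and min(values) >= criterion_floor


# A high mean does not rescue a single criterion that falls below the floor.
print(rubric_passes({"criterion_a": 5, "criterion_b": 4, "criterion_c": 3}))
print(rubric_passes({"criterion_a": 5, "criterion_b": 5, "criterion_c": 2}))
```
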
Rubric criteria
- Patronus AI
- AI Platform
- Evaluation API and SDK