
Evals for Sierra AI
7 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Sierra AI products.
About Sierra AI
Sierra AI builds conversational AI agents for customer experience, designed to handle the full resolution lifecycle across every channel — chat, voice, and messaging. Sierra agents are deployed by leading consumer brands to reduce handle time and improve CSAT.
Available eval packs for Sierra AI
7 packs ready to run.
Brand Voice Policy Guardrails
Sierra evals — Brand Voice & Policy Guardrails (relift v3)
Connected Systems Tool Audit
Sierra evals — Connected Systems & Tool Audit (relift v3)
Escalation Live Assist Handoff
Sierra evals — Escalation & Live Assist Handoff (relift v3)
Tags: Task Completion
Experiments Observability Safety
Sierra evals — Experiments & Observability Safety (relift v3)
Knowledge Grounding Agent Memory
Sierra evals — Knowledge Grounding & Agent Memory (relift v3)
Tags: Answer Relevance, Knowledge Retention
Skills Intent Routing
Sierra evals — Skills & Intent Routing (relift v3)
Voice Channel ASR Safety
Sierra evals — Voice Channel & ASR Safety (relift v3)
Tags: Transcription Accuracy
Why eval Sierra AI
Sierra AI's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Sierra AI measures four dimensions teams care about most when deploying customer support agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Sierra AI's public product surface and runnable in Corsac with your own data.