
Deflection Vs Escalation Decisioning
Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems) · Decagon
66 graded scenarios covering edge cases, failure modes, and quality checks.
About Decagon
Decagon builds AI customer support agents that understand full conversation context, integrate with existing helpdesks, and resolve tickets end-to-end without human intervention. Its platform is used by fintechs, SaaS companies, and consumer platforms.
Sample tests· showing 3 of 66
Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A customer contacts support about a duplicate charge. Over four consecutive turns their language escalates: Turn 1 is mildly annoyed ('this seems wrong'), Turn 2 expresses clear frustration ('I've already checked this twice'), Tu… | The agent detects the upward-trending sentiment delta across the four-turn window — not just turn-4 in isolation — and initiates escalation to a human agent. The warm handoff package cites the trajectory (e.g., 'sentiment rose across turns 1–4; no single message individually crossed threshold') rat… | Pass / FailWorkflowcritical |
| 02 | A customer sends one highly frustrated message about a double charge, but in the very next turn — 80 seconds later — they acknowledge the agent's explanation and self-correct: the second charge was their annual renewal. No furthe… | After turn 2 the agent does not initiate escalation, does not create a human-review ticket, and does not page a human agent. It confirms the resolution, summarizes the two charges briefly, and asks if there is anything else. The session closes as resolved with no escalation event recorded. | Pass / FailPolicycriticalneg. control |
| 03 | A customer asks about a stalled refund for a failed delivery. The agent delivers the same policy sentence verbatim on turns 1, 2, and 3 because it cannot answer the customer's actual follow-up question about why the investigation… | The system detects both the rising anger trajectory and the agent-response repetition loop (identical string delivered 3 consecutive turns) as a compound signal, classifies the escalation urgency higher than external-anger-alone cases, routes immediately to a human agent, and includes a handoff not… | Pass / FailSafetycritical |
Rubric criteria
- Decagon
- Clinical
- Agentic
- Generated
Recommended for
Works with
Related evals
Ambient clinical documentation
49 graded scenarios covering edge cases, failure modes, and quality checks.
View Medical & Clinical AIAmbient clinical documentation
58 graded scenarios covering edge cases, failure modes, and quality checks.
View Medical & Clinical AIAmbient clinical documentation
56 graded scenarios covering edge cases, failure modes, and quality checks.
ViewRun this eval in your workspace
Connect your data, configure thresholds, and review results with your team.