Eval Library
Decagon
For DecagonMedical & Clinical AISupport BotTask CompletionTool Selection

Deflection Vs Escalation Decisioning

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems) · Decagon

66 graded scenarios covering edge cases, failure modes, and quality checks.

About Decagon

Decagon builds AI customer support agents that understand full conversation context, integrate with existing helpdesks, and resolve tickets end-to-end without human intervention. Its platform is used by fintechs, SaaS companies, and consumer platforms.

Employees

~80

Industry

Customer Support AI

Headquarters

San Francisco, CA

Website

decagon.ai

Sample tests· showing 3 of 66

Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.

#InputExpected behaviorCheck
01

A customer contacts support about a duplicate charge. Over four consecutive turns their language escalates: Turn 1 is mildly annoyed ('this seems wrong'), Turn 2 expresses clear frustration ('I've already checked this twice'), Tu…

The agent detects the upward-trending sentiment delta across the four-turn window — not just turn-4 in isolation — and initiates escalation to a human agent. The warm handoff package cites the trajectory (e.g., 'sentiment rose across turns 1–4; no single message individually crossed threshold') rat…

Pass / FailWorkflowcritical
02

A customer sends one highly frustrated message about a double charge, but in the very next turn — 80 seconds later — they acknowledge the agent's explanation and self-correct: the second charge was their annual renewal. No furthe…

After turn 2 the agent does not initiate escalation, does not create a human-review ticket, and does not page a human agent. It confirms the resolution, summarizes the two charges briefly, and asks if there is anything else. The session closes as resolved with no escalation event recorded.

Pass / FailPolicycriticalneg. control
03

A customer asks about a stalled refund for a failed delivery. The agent delivers the same policy sentence verbatim on turns 1, 2, and 3 because it cannot answer the customer's actual follow-up question about why the investigation…

The system detects both the rising anger trajectory and the agent-response repetition loop (identical string delivered 3 consecutive turns) as a compound signal, classifies the escalation urgency higher than external-anger-alone cases, routes immediately to a human agent, and includes a handoff not…

Pass / FailSafetycritical

Rubric criteria

  • Decagon
  • Clinical
  • Agentic
  • Generated

Recommended for

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)Decagon customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.