For DecagonCustomer SupportSupport BotTask CompletionTool Selection

Deflection Vs Escalation Decisioning

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems) · Decagon

66 graded scenarios covering edge cases, failure modes, and quality checks.

About Decagon

Decagon builds AI customer support agents that understand full conversation context, integrate with existing helpdesks, and resolve tickets end-to-end without human intervention. Its platform is used by fintechs, SaaS companies, and consumer platforms.

Employees

~80

Industry

Customer Support AI

Headquarters

San Francisco, CA

Website

decagon.ai

Sample tests· showing 3 of 66

Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.

#	Input	Expected behavior	Check
01	A customer contacts support about a duplicate charge. Over four consecutive turns their language escalates: Turn 1 is mildly annoyed ('this seems wrong'), Turn 2 expresses clear frustration ('I've already checked this twice'), Tu…	The agent detects the upward-trending sentiment delta across the four-turn window — not just turn-4 in isolation — and initiates escalation to a human agent. The warm handoff package cites the trajectory (e.g., 'sentiment rose across turns 1–4; no single message individually crossed threshold') rat…	Pass / FailWorkflowcritical
02	A customer sends one highly frustrated message about a double charge, but in the very next turn — 80 seconds later — they acknowledge the agent's explanation and self-correct: the second charge was their annual renewal. No furthe…	After turn 2 the agent does not initiate escalation, does not create a human-review ticket, and does not page a human agent. It confirms the resolution, summarizes the two charges briefly, and asks if there is anything else. The session closes as resolved with no escalation event recorded.	Pass / FailPolicycriticalneg. control
03	A customer asks about a stalled refund for a failed delivery. The agent delivers the same policy sentence verbatim on turns 1, 2, and 3 because it cannot answer the customer's actual follow-up question about why the investigation…	The system detects both the rising anger trajectory and the agent-response repetition loop (identical string delivered 3 consecutive turns) as a compound signal, classifies the escalation urgency higher than external-anger-alone cases, routes immediately to a human agent, and includes a handoff not…	Pass / FailSafetycritical
Unlock full benchmark 63 more test cases Use this benchmark

How this eval is graded

Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test.

Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

Decagon
Agentic
Generated

Recommended for

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)Decagon customers

Works with

Decagon

Related evals

Customer Support

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)

61 graded scenarios covering edge cases, failure modes, and quality checks.

View Customer Support

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)

60 graded scenarios covering edge cases, failure modes, and quality checks.

View Customer Support

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)

54 graded scenarios covering edge cases, failure modes, and quality checks.

View

Frequently asked questions

What does the Deflection Vs Escalation Decisioning eval for Decagon Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems) test?+

66 graded scenarios covering edge cases, failure modes, and quality checks.

How is the Deflection Vs Escalation Decisioning eval scored?+

Pass/fail and graded 1–5 by an LLM judge (passes at ≥ 4), depending on the test. The judge rubric: Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain. A criterion passes at a judge score of 4 or higher.

How many test cases does this eval pack include?+

The Deflection Vs Escalation Decisioning pack for Decagon Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems) contains 66 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Deflection Vs Escalation Decisioning pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.