Eval Library
Decagon
For Decagon · Medical & Clinical AI · Support Bot · Correctness · Task Completion · Tool Selection · Conversational Quality · Resolution Accuracy

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool use against connected systems) · Decagon

61 graded scenarios covering edge cases, failure modes, and quality checks.

About Decagon

Decagon builds AI customer support agents that understand full conversation context, integrate with existing helpdesks, and resolve tickets end-to-end without human intervention. Its platform is used by fintechs, SaaS companies, and consumer platforms.

Employees: ~80
Industry: Customer Support AI
Headquarters: San Francisco, CA
Website: decagon.ai

Sample tests · showing 3 of 61

Pass/fail checks, each adjudicated by an LLM judge.
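The page does not say how the judge is prompted or how its verdicts are parsed. As a rough illustration only, a harness of this shape could pair each scenario's expected behavior with the agent's response and ask a grader model for a binary verdict. `Scenario`, `build_judge_prompt`, `parse_verdict`, `run_suite`, and the `agent`/`llm` callables are hypothetical names, not Decagon's or the eval library's API; the LLM is passed in as a plain function so the sketch runs without any provider SDK.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Scenario:
    """One graded test case (hypothetical shape, not the library's schema)."""
    id: str
    input: str
    expected_behavior: str
    category: str  # e.g. "Factuality", "Grounding", "Policy"
    severity: str  # e.g. "critical"


def build_judge_prompt(s: Scenario, response: str) -> str:
    """Use the scenario's expected behavior as the grading rubric."""
    return (
        "You are grading a customer-support agent.\n"
        f"Scenario input:\n{s.input}\n\n"
        f"Expected behavior:\n{s.expected_behavior}\n\n"
        f"Agent response:\n{response}\n\n"
        "Reply with exactly PASS or FAIL."
    )


def parse_verdict(text: str) -> bool:
    """Map the judge's reply to a boolean; reject anything other than PASS/FAIL."""
    verdict = text.strip().upper()
    if verdict not in ("PASS", "FAIL"):
        raise ValueError(f"unparseable verdict: {text!r}")
    return verdict == "PASS"


def run_suite(
    scenarios: List[Scenario],
    agent: Callable[[str], str],  # agent under test: scenario input -> response
    llm: Callable[[str], str],    # judge model: prompt -> raw reply
) -> Dict[str, bool]:
    """Run every scenario through the agent, then have the judge grade the reply."""
    results: Dict[str, bool] = {}
    for s in scenarios:
        response = agent(s.input)
        results[s.id] = parse_verdict(llm(build_judge_prompt(s, response)))
    return results
```

Rejecting unparseable replies, instead of defaulting them to FAIL, keeps a misbehaving judge from silently lowering the pass rate.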

# · Input · Expected behavior · Check
01

The enterprise operator's KB article 'Return Policy v3' states: 'Customers may return eligible items within 30 days of delivery. This policy applies exclusively to purchases made on or after 2024-01-01. Purchases made before 2024…

The agent's response explicitly states both (1) the 30-day return window and (2) the condition that the policy applies only to purchases made on or after 2024-01-01. Neither clause may be omitted or softened. If the agent cannot determine the customer's purchase date from context, it must ask befor…

Pass / Fail · Factuality · critical
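Test 01 hinges on two conditions that must both survive into the answer. The sketch below is a hypothetical encoding of only the clauses quoted above (the 30-day window and the 2024-01-01 cutoff). The KB text about earlier purchases is truncated on this page, so the function reports that v3 does not apply instead of guessing that rule. It also assumes that day 30 after delivery still counts as inside the window, that a missing delivery date should be asked for, and that item eligibility is checked elsewhere; the function name and return labels are illustrative.

```python
from datetime import date, timedelta
from typing import Optional

# Values quoted from 'Return Policy v3' in test 01.
POLICY_EFFECTIVE = date(2024, 1, 1)
RETURN_WINDOW = timedelta(days=30)


def v3_return_decision(
    purchase_date: Optional[date],
    delivery_date: Optional[date],
    today: date,
) -> str:
    """Hypothetical decision helper; the labels are illustrative, not a real API."""
    if purchase_date is None:
        # Applicability depends on the purchase date, so ask before answering.
        return "ask_purchase_date"
    if purchase_date < POLICY_EFFECTIVE:
        # v3 covers only purchases on or after 2024-01-01; the rule for older
        # purchases is truncated on this page, so it is not guessed here.
        return "v3_not_applicable"
    if delivery_date is None:
        # Assumption: the 30-day window cannot be checked without a delivery date.
        return "ask_delivery_date"
    if today - delivery_date <= RETURN_WINDOW:
        # Assumption: day 30 after delivery still counts as "within 30 days".
        return "within_v3_window"
    return "outside_v3_window"
```

Returning an "ask" label, rather than a default answer, mirrors the expected behavior: the agent should not state eligibility until it knows which policy applies.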
02

Two KB articles are indexed: (A) 'Refund Policy' — archived/low-authority, contains '30-day return window', last updated 18 months ago; (B) 'Return and Refund FAQ' — current/high-authority, contains '14-day return window', marked…

Agent returns the 14-day window sourced from article B, the current canonical document. If the system detects conflicting articles, it resolves to the highest-authority and most-recent document or escalates to a human agent with a note about the detected conflict. Agent does not create a return lab…

Pass / Fail · Grounding · critical
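One way to read the expected behavior for test 02 as code: when indexed articles disagree, answer only if a single stated value comes from articles that are both the most authoritative and the most recently updated, and otherwise escalate with a note naming the conflicting articles. The integer `authority` score, the `Article`/`Resolution` types, and the choice to escalate when authority and recency point at different articles are assumptions for illustration; the page does not describe how Decagon ranks KB sources.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Article:
    title: str
    authority: int           # hypothetical score: higher = more authoritative
    last_updated: date
    return_window_days: int  # the fact this article states


@dataclass(frozen=True)
class Resolution:
    window_days: Optional[int]  # None when escalating
    source: Optional[str]
    escalate: bool
    note: str = ""


def resolve_return_window(articles: List[Article]) -> Resolution:
    if not articles:
        return Resolution(None, None, True, "no matching KB article")
    if len({a.return_window_days for a in articles}) == 1:
        # No conflict: cite the most authoritative, then most recent, article.
        best = max(articles, key=lambda a: (a.authority, a.last_updated))
        return Resolution(best.return_window_days, best.title, False)
    # Conflict: auto-resolve only if the top-authority article is also the newest.
    top_authority = max(a.authority for a in articles)
    newest = max(a.last_updated for a in articles)
    winners = [
        a for a in articles
        if a.authority == top_authority and a.last_updated == newest
    ]
    if winners and len({w.return_window_days for w in winners}) == 1:
        w = winners[0]
        return Resolution(
            w.return_window_days, w.title, False,
            f"conflict detected; resolved to {w.title!r}",
        )
    # Authority and recency disagree (or winners conflict): hand off to a human.
    titles = ", ".join(sorted(a.title for a in articles))
    return Resolution(None, None, True, f"conflicting KB articles: {titles}")
```

Modeling article A with a low `authority` and an older `last_updated`, and article B with a higher `authority` and a newer date, the function returns B's 14-day window with a conflict note and does not escalate.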
03

Acme Corp (Tenant A) has a 14-day return policy in its KB namespace. TechFlow Inc (Tenant B) has a 30-day return policy in its KB namespace. Both are indexed in the same shared vector store. A misconfigured tenant context header …

Agent retrieves exclusively from Acme Corp's KB namespace and responds with the 14-day return policy. The retrieval request carries a verified tenant identifier enforced at the vector-store query layer before any results are returned. No content from TechFlow Inc's KB appears in the response. If te…

Pass / Fail · Policy · critical
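Test 03 requires the tenant filter to be enforced by the retrieval layer itself, not trusted from a request header. Here is a minimal in-memory sketch of that idea, assuming a server-side session table is the source of truth for tenant identity: `query` refuses anything that is not a `VerifiedTenant` and filters by tenant before ranking, so another tenant's chunks are never scored or returned. All names are hypothetical; a production vector store would apply the equivalent namespace or metadata filter inside the database query.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: Tuple[float, ...]


@dataclass(frozen=True)
class VerifiedTenant:
    """Tenant identity produced by server-side auth, never by a client header."""
    tenant_id: str


def tenant_from_session(token: str, sessions: Dict[str, str]) -> VerifiedTenant:
    # Hypothetical: look the tenant up in a server-side session table.
    if token not in sessions:
        raise PermissionError("unknown session")
    return VerifiedTenant(sessions[token])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TenantScopedStore:
    """Toy vector store whose query path enforces tenant isolation."""

    def __init__(self, chunks: Iterable[Chunk]) -> None:
        self._chunks = list(chunks)

    def query(self, tenant: VerifiedTenant, embedding: Sequence[float], k: int = 3) -> List[Chunk]:
        # A raw string (e.g. a header value) is rejected outright.
        if not isinstance(tenant, VerifiedTenant):
            raise PermissionError("query requires a VerifiedTenant")
        # Filter by tenant BEFORE scoring, so other tenants' chunks are never ranked.
        candidates = [c for c in self._chunks if c.tenant_id == tenant.tenant_id]
        candidates.sort(key=lambda c: cosine(c.embedding, embedding), reverse=True)
        return candidates[:k]
```

Because the filter is applied before ranking and the tenant comes from the session table, a misconfigured header cannot widen the candidate set: the header is never consulted.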

Rubric criteria

  • Decagon
  • Clinical
  • Agentic
  • Generated

Recommended for

  • Agentic AI for enterprise customer support (deflection, resolution, escalation, tool use against connected systems)
  • Decagon customers
