
Tool Use Precision Recall Against Connected Systems

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems) · Decagon

60 graded scenarios covering edge cases, failure modes, and quality checks.

About Decagon

Decagon builds AI customer support agents that understand full conversation context, integrate with existing helpdesks, and resolve tickets end-to-end without human intervention. Its platform is used by fintechs, SaaS companies, and consumer platforms.

Employees: ~80

Industry: Customer Support AI

Headquarters: San Francisco, CA

Website: decagon.ai

Sample tests · showing 3 of 60

Pass/fail checks, each adjudicated by an LLM judge.

01

Input: Tenant is a D2C apparel brand running Shopify storefront with Stripe as payment processor. Both shopify_create_refund and stripe_create_refund tools are registered. Order #84231 was placed through Shopify checkout but payment was…

Expected behavior: Agent retrieves the order to obtain the Stripe payment_intent_id, then calls stripe_create_refund with that payment_intent_id and the correct amount. It does not call shopify_create_refund at any point. It confirms the refund amount and expected posting timeline to the customer.

Check: Pass / Fail · Tool use · critical

02

Input: Tenant AOP explicitly maps the intent phrase 'billing dispute' to stripe_list_charges as the entry-point tool. The Salesforce integration is marked read-only for AI in the same AOP. Without AOP, the LLM's training distribution wo…

Expected behavior: Agent invokes stripe_list_charges or stripe_retrieve_charge as the first tool call, consistent with AOP routing. It does not call salesforce_get_case or any Salesforce tool. It presents the charge details from Stripe and follows the AOP-defined resolution flow.

Check: Pass / Fail · Policy · critical

03

Input: Enterprise SaaS tenant has both Salesforce (full CRM, authoritative for account, plan, and contract data) and Zendesk (support ticketing only) integrated. Tenant configuration explicitly designates Salesforce as the system of rec…

Expected behavior: Agent calls salesforce_get_contact or a Salesforce-namespaced SOQL query targeting the Account or Contract object to retrieve plan and renewal data. It does not call zendesk_get_user for this lookup. It returns the correct plan name and renewal date sourced from Salesforce.

Check: Pass / Fail · Tool use · critical
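
The sample checks are adjudicated by an LLM judge, but the tool-selection part of each expected behavior can also be approximated as a deterministic pre-check over the agent's tool-call trace. The following is a minimal Python sketch under stated assumptions, not the eval's actual grader: the trace is assumed to be an ordered list of tool names; `ToolExpectation`, `passes`, and `precision_recall` are hypothetical helpers; `shopify_get_order` is a hypothetical name for the order-retrieval step, which the scenario does not name; and the precision/recall definition is one plausible reading of the eval title.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolExpectation:
    """Hypothetical encoding of one scenario's tool-selection rules."""
    required: frozenset = frozenset()           # every listed tool must be called
    forbidden_prefixes: tuple = ()              # no call may start with any of these
    first_call_any_of: frozenset = frozenset()  # if non-empty, constrains the first call


def passes(trace, exp):
    """Return True if the ordered list of tool names satisfies the expectation."""
    # str.startswith accepts a tuple; an empty tuple never matches.
    if any(name.startswith(exp.forbidden_prefixes) for name in trace):
        return False
    if not exp.required <= set(trace):
        return False
    if exp.first_call_any_of and (not trace or trace[0] not in exp.first_call_any_of):
        return False
    return True


def precision_recall(trace, expected):
    """Assumed definition: precision is the share of distinct called tools that
    were expected; recall is the share of expected tools that were called."""
    called = set(trace)
    hits = len(called & set(expected))
    precision = hits / len(called) if called else 0.0
    recall = hits / len(expected) if expected else 1.0
    return precision, recall


# Scenario 01: refund must go through Stripe, never through Shopify.
SCENARIO_01 = ToolExpectation(
    required=frozenset({"stripe_create_refund"}),
    forbidden_prefixes=("shopify_create_refund",),
)

# Scenario 02: first call must be a Stripe charge lookup; no Salesforce tool at all.
SCENARIO_02 = ToolExpectation(
    first_call_any_of=frozenset({"stripe_list_charges", "stripe_retrieve_charge"}),
    forbidden_prefixes=("salesforce_",),
)

if __name__ == "__main__":
    good = ["shopify_get_order", "stripe_create_refund"]  # hypothetical trace
    bad = ["shopify_get_order", "shopify_create_refund"]  # hypothetical trace
    print(passes(good, SCENARIO_01), passes(bad, SCENARIO_01))
    print(precision_recall(bad, {"shopify_get_order", "stripe_create_refund"}))
```

A pre-check like this only covers which tools were called and in what order. The remaining parts of each expected behavior, such as confirming the refund amount to the customer or returning the correct renewal date, still need the LLM judge or argument-level assertions.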

Rubric criteria

  • Decagon
  • Clinical
  • Agentic
  • Generated

Recommended for

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems) · Decagon customers
