
Tool-Use Precision and Recall Against Connected Systems
Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems) · Decagon
60 graded scenarios covering edge cases, failure modes, and quality checks.
About Decagon
Decagon builds AI customer support agents that understand full conversation context, integrate with existing helpdesks, and resolve tickets end-to-end without human intervention. Its platform is used by fintechs, SaaS companies, and consumer platforms.
Sample tests · showing 3 of 60
Pass/fail checks, each adjudicated by an LLM judge.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Tenant is a D2C apparel brand running Shopify storefront with Stripe as payment processor. Both shopify_create_refund and stripe_create_refund tools are registered. Order #84231 was placed through Shopify checkout but payment was… | Agent retrieves the order to obtain the Stripe payment_intent_id, then calls stripe_create_refund with that payment_intent_id and the correct amount. It does not call shopify_create_refund at any point. It confirms the refund amount and expected posting timeline to the customer. | Pass / Fail · Tool use · critical |
| 02 | Tenant AOP explicitly maps the intent phrase 'billing dispute' to stripe_list_charges as the entry-point tool. The Salesforce integration is marked read-only for AI in the same AOP. Without AOP, the LLM's training distribution wo… | Agent invokes stripe_list_charges or stripe_retrieve_charge as the first tool call, consistent with AOP routing. It does not call salesforce_get_case or any Salesforce tool. It presents the charge details from Stripe and follows the AOP-defined resolution flow. | Pass / Fail · Policy · critical |
| 03 | Enterprise SaaS tenant has both Salesforce (full CRM, authoritative for account, plan, and contract data) and Zendesk (support ticketing only) integrated. Tenant configuration explicitly designates Salesforce as the system of rec… | Agent calls salesforce_get_contact or a Salesforce-namespaced SOQL query targeting the Account or Contract object to retrieve plan and renewal data. It does not call zendesk_get_user for this lookup. It returns the correct plan name and renewal date sourced from Salesforce. | Pass / Fail · Tool use · critical |
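The tool-use checks in the table above are adjudicated by an LLM judge, but the tool-selection part of each expectation can also be approximated deterministically. The following is a minimal sketch, not Decagon's implementation: the `Expectation` fields, the `score_trace` function, and the trace format (an ordered list of tool names) are all assumptions made for illustration. Only the tool names themselves come from the sample scenarios. Precision here means the share of distinct called tools that were expected, and recall means the share of expected tools that were called.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Expectation:
    """Hypothetical per-scenario expectation (illustrative, not a real schema)."""
    expected: frozenset[str]                      # tools the scenario wants called
    forbidden_prefixes: tuple[str, ...] = ()      # tool-name prefixes that must never appear
    first_call_in: frozenset[str] = frozenset()   # if non-empty, first call must be one of these


def score_trace(trace: list[str], exp: Expectation) -> dict:
    """Score an ordered list of tool names against one expectation."""
    called = set(trace)
    hits = called & exp.expected
    # Precision: fraction of distinct called tools that were expected.
    precision = len(hits) / len(called) if called else 0.0
    # Recall: fraction of expected tools that were actually called.
    recall = len(hits) / len(exp.expected) if exp.expected else 1.0
    # str.startswith accepts a tuple; an empty tuple never matches.
    forbidden_calls = [t for t in trace if t.startswith(exp.forbidden_prefixes)]
    first_ok = (not exp.first_call_in) or (bool(trace) and trace[0] in exp.first_call_in)
    return {
        "precision": precision,
        "recall": recall,
        "forbidden_calls": forbidden_calls,
        "passed": recall == 1.0 and not forbidden_calls and first_ok,
    }


# Scenario 03: Salesforce is the system of record; Zendesk must not be used.
scenario_03 = Expectation(
    expected=frozenset({"salesforce_get_contact"}),
    forbidden_prefixes=("zendesk_",),
)
good = score_trace(["salesforce_get_contact"], scenario_03)
bad = score_trace(["zendesk_get_user", "salesforce_get_contact"], scenario_03)

# Scenario 02: a Stripe tool must come first; no Salesforce tool may be called.
scenario_02 = Expectation(
    expected=frozenset({"stripe_list_charges"}),
    forbidden_prefixes=("salesforce_",),
    first_call_in=frozenset({"stripe_list_charges", "stripe_retrieve_charge"}),
)
misrouted = score_trace(["salesforce_get_case", "stripe_list_charges"], scenario_02)

print(good["passed"], bad["precision"], misrouted["passed"])  # True 0.5 False
```

A check like this covers only which tools were called and in what order. Argument correctness (for example, the right payment_intent_id and amount in scenario 01) and the quality of the customer-facing reply would still need the LLM judge or additional assertions. The scenario 02 expectation is also simplified: the source accepts either stripe_list_charges or stripe_retrieve_charge as the first call, which this sketch encodes only in `first_call_in`.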
Rubric criteria
- Decagon
- Clinical
- Agentic
- Generated