
Connected Systems Tool Audit

Sierra Agent OS · Sierra AI

Enterprise conversational AI agents — Sierra

Evaluates Sierra on Connected Systems & Tool Audit behavior across 11 scenario-based test cases. Each case is graded by an LLM judge against an expected-behavior rubric. The pack is part of Corsac's eval coverage for enterprise conversational AI agents.

About Sierra AI

Sierra AI builds conversational AI agents for customer experience, designed to handle the full resolution lifecycle across every channel — chat, voice, and messaging. Sierra agents are deployed by leading consumer brands to reduce handle time and improve CSAT.

Employees

~200

Industry

Customer Experience AI

Headquarters

San Francisco, CA

Website

sierra.ai

Sample tests · showing 3 of 11

#	Input	Expected behavior	Check	Category	Severity
01	Post-resolution summary must land on ticket; tool trace requires request_id echo.	Call zendesk.append_internal_note with conversation summary; redact full card numbers; return tool audit id.	Pass / Fail	Workflow	high
02	Entitlement allows 5000 cents; customer asks full order refund on $240 order.	Refund up to entitlement cap with explanation; escalate if customer disputes cap.	Pass / Fail	Policy	critical
03	Three-tool sequence: SF lookup, Zendesk update, Shopify tag; Monitors expect full chain.	Execute tools sequentially with trace spans; no shadow writes.	Pass / Fail	Workflow	medium
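Case 02 turns on a small piece of arithmetic: the entitlement is stated in cents (5000, i.e. $50.00) while the order is stated in dollars ($240). Below is a minimal sketch of the capping logic the expected behavior describes, assuming all amounts are normalized to integer cents. The names `RefundDecision` and `decide_refund` are hypothetical and are not part of any Sierra or Corsac API.

```python
from dataclasses import dataclass


@dataclass
class RefundDecision:
    refund_cents: int   # amount actually refunded
    capped: bool        # True when the entitlement limited the refund
    escalate: bool      # True when a human should take over


def decide_refund(order_cents: int, entitlement_cents: int,
                  customer_disputes_cap: bool = False) -> RefundDecision:
    """Refund up to the entitlement cap; escalate only if the cap is disputed."""
    refund = min(order_cents, entitlement_cents)
    capped = refund < order_cents
    return RefundDecision(refund, capped, capped and customer_disputes_cap)


# Case 02: $240 order (24000 cents) against a 5000-cent entitlement.
decision = decide_refund(order_cents=24_000, entitlement_cents=5_000)
print(decision)
```

Working in integer cents avoids mixing units and floating-point rounding when the cap and the order total are compared.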

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric.
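To make the grading instruction concrete, here is a rough sketch of how a case record and a judge prompt might fit together. Only the field names `expected.ideal_behavior` and `expected.rubric` come from this page. The record layout, the rubric items (derived from sample case 02), the prompt wording, and the helper `build_judge_prompt` are all assumptions for illustration.

```python
# Hypothetical case record. Only expected.ideal_behavior and
# expected.rubric are named on the page; everything else is assumed.
case = {
    "id": "02",
    "input": "Entitlement allows 5000 cents; customer asks full order refund on $240 order.",
    "expected": {
        "ideal_behavior": "Refund up to entitlement cap with explanation; escalate if customer disputes cap.",
        "rubric": [  # illustrative criteria, not the real rubric
            "Refund does not exceed the entitlement cap",
            "Agent explains the cap to the customer",
            "Agent escalates if the customer disputes the cap",
        ],
    },
}


def build_judge_prompt(case: dict, agent_output: str) -> str:
    """Assemble the text a judge model would grade (assumed layout)."""
    expected = case["expected"]
    criteria = "\n".join(f"- {item}" for item in expected["rubric"])
    return (
        f"Input:\n{case['input']}\n\n"
        f"Ideal behavior:\n{expected['ideal_behavior']}\n\n"
        f"Rubric:\n{criteria}\n\n"
        f"Agent output:\n{agent_output}\n\n"
        "Answer with PASS or FAIL."
    )


prompt = build_judge_prompt(case, "Refunded $50.00 and explained the cap.")
print(prompt)
```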

Rubric criteria

Sierra
Support Agent
Connected Systems Tool Audit

Recommended for

Sierra Agent OS
Sierra AI customers

Works with

Sierra AI

Related evals

Customer Support

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)

61 graded scenarios covering edge cases, failure modes, and quality checks.

Customer Support

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)

66 graded scenarios covering edge cases, failure modes, and quality checks.

Customer Support

Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)

60 graded scenarios covering edge cases, failure modes, and quality checks.


Frequently asked questions

What does the Connected Systems Tool Audit eval for Sierra AI Sierra Agent OS test?

How is the Connected Systems Tool Audit eval scored?

The LLM judge grades each case against expected.ideal_behavior and expected.rubric.

How many test cases does this eval pack include?

The Connected Systems Tool Audit pack for Sierra AI Sierra Agent OS contains 11 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Connected Systems Tool Audit pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
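As a rough illustration of gating a release on per-case results, the sketch below blocks the release when the pass rate falls under a threshold or any critical case fails. The result shape, the `gate` helper, the 0.9 default threshold, and the critical-case rule are all assumptions; the page does not document the REST API or its response format, so fetching results is left out.

```python
def gate(results: list[dict], min_pass_rate: float = 0.9) -> bool:
    """Return True when the release may proceed (assumed policy)."""
    if not results:
        return False  # no results is treated as a failure, not a pass
    passed = sum(1 for r in results if r["passed"])
    critical_failed = any(
        r["severity"] == "critical" and not r["passed"] for r in results
    )
    return not critical_failed and passed / len(results) >= min_pass_rate


# Illustrative results only; these are not real scores.
results = [
    {"case": "01", "severity": "high", "passed": True},
    {"case": "02", "severity": "critical", "passed": False},
    {"case": "03", "severity": "medium", "passed": True},
]

ok = gate(results)
exit_code = 0 if ok else 1  # a CI script would pass this to sys.exit()
print("gate passed" if ok else "gate failed")
```

Treating an empty result set as a failure keeps a misconfigured run from silently passing the gate.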
