
Admin Workflow Safety V1
Admin Workflow Safety · Zendesk
Zendesk admin workflow eval focused on automation reliability, trigger safety, and configuration clarity.
About Zendesk
Zendesk is a customer service platform that helps businesses build better customer relationships. Its AI-powered products handle billions of support interactions across email, chat, voice, and messaging, giving agents the context they need to resolve issues faster.
Sample tests · showing 3 of 100
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Review this workflow signal for `Audit rule changes`: `Rule revision history allows admins to review and iterate trigger logic over time.` Should it be flagged as friction? | should_flag_painpoint: false · painpoint_label: none · severity: none | Pass / Fail |
| 02 | A support_admin reports: "A trigger update unexpectedly changes downstream trigger behavior and agents report unpredictable notifications." while using `Configure triggers and automations`. Diagnose likely root cause. | should_flag_painpoint: true · painpoint_label: trigger_order_coupling · severity: high | Pass / Fail |
| 03 | Given this observation in `Configure triggers and automations`: "After enabling a time-based automation, extra ticket updates fire and status changes loop unexpectedly.", decide whether to flag a pain point. | should_flag_painpoint: true · painpoint_label: trigger_automation_interference · severity: high | Pass / Fail |
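
The "Expected behavior" column above packs three fields into one cell (`should_flag_painpoint`, `painpoint_label`, `severity`). The following is a minimal sketch of how such a cell might be parsed and compared against a model's answer to produce the Pass / Fail check. The names `Expected`, `parse_expected`, and `check` are hypothetical, and exact matching on all three fields is an assumption; the eval's actual grader is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    """The three fields shown in the 'Expected behavior' column."""
    should_flag_painpoint: bool
    painpoint_label: str
    severity: str


def parse_expected(cell: str) -> Expected:
    """Parse a cell such as
    'should_flag_painpoint: true · painpoint_label: x · severity: high'."""
    fields = {}
    for part in cell.split("·"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return Expected(
        should_flag_painpoint=fields["should_flag_painpoint"].lower() == "true",
        painpoint_label=fields["painpoint_label"],
        severity=fields["severity"],
    )


def check(expected: Expected, actual: Expected) -> str:
    """Assumed grading rule: 'Pass' only when all three fields match exactly."""
    return "Pass" if expected == actual else "Fail"


# Expected behavior copied from sample test 02.
expected = parse_expected(
    "should_flag_painpoint: true · painpoint_label: trigger_order_coupling · severity: high"
)
# Hypothetical model answer, already structured into the same three fields.
actual = Expected(True, "trigger_order_coupling", "high")
print(check(expected, actual))
```

Under this assumed rule, an answer that flags the pain point but assigns a different label (for example `trigger_automation_interference` on test 02) would be graded Fail, which keeps label confusion visible in the results.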
Rubric criteria
- Automation Safety
- Configuration Reliability
- Operational Guardrails
Related evals
Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)
61 graded scenarios covering edge cases, failure modes, and quality checks.
Customer Support
Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)
66 graded scenarios covering edge cases, failure modes, and quality checks.
Customer Support
Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)
60 graded scenarios covering edge cases, failure modes, and quality checks.
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.