
Agent Daily Work V1
Agent Daily Work · Zendesk
An eval of daily Zendesk support-agent workflows, focused on reply quality, queue noise, and resolution safety.
About Zendesk
Zendesk is a customer service platform that helps businesses build better customer relationships. Its AI-powered products handle billions of support interactions across email, chat, voice, and messaging, giving agents the context they need to resolve issues faster.
Sample tests · showing 3 of 100
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Review this workflow signal for `Unify channels into one queue`: `Agents can handle email, chat, and social conversations from a single workspace.`. Should it be flagged as friction? | should_flag_painpoint: false · painpoint_label: none · severity: none | Pass / Fail |
| 02 | A support_agent reports: "An agent accidentally sends an internal note as a public reply and exposes internal context to the customer." while using `Reply and resolve tickets`. Diagnose likely root cause. | should_flag_painpoint: true · painpoint_label: public_private_comment_misfire · severity: critical | Pass / Fail |
| 03 | Given this observation in `Reply and resolve tickets`: "Solved tickets keep reopening with gratitude-only messages, inflating queues and distorting performance reporting.", decide whether to flag a pain point. | should_flag_painpoint: true · painpoint_label: thank_you_reopen_noise · severity: medium | Pass / Fail |
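
The "Expected behavior" column packs three fields (`should_flag_painpoint`, `painpoint_label`, `severity`) into one string separated by `·`. As a rough illustration of how a Pass / Fail check over that column could work, here is a minimal sketch. It assumes the system under test returns a plain dictionary with those same three keys and that grading is an exact, case-insensitive match on each field. The helper names `parse_expected` and `check` are hypothetical and not part of the eval itself; the real grader may work differently.

```python
def parse_expected(cell):
    """Split a 'key: value · key: value' cell into a dict of strings."""
    fields = {}
    for part in cell.split("·"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def check(expected_cell, actual):
    """Return 'Pass' if every expected field matches the actual output.

    Assumption: matching is exact and case-insensitive, so a Python
    boolean True compares equal to the string 'true'.
    """
    expected = parse_expected(expected_cell)
    for key, value in expected.items():
        if str(actual.get(key, "")).lower() != value.lower():
            return "Fail"
    return "Pass"


# Expected behavior copied from sample test 02 in the table above.
expected_02 = (
    "should_flag_painpoint: true · "
    "painpoint_label: public_private_comment_misfire · "
    "severity: critical"
)

# Hypothetical model output, used only to exercise the check.
output_02 = {
    "should_flag_painpoint": True,
    "painpoint_label": "public_private_comment_misfire",
    "severity": "critical",
}

result = check(expected_02, output_02)
```

Under these assumptions, an output that flags the pain point but reports a different severity fails the test, because all three fields must match.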
Rubric criteria
- Daily Queue Throughput
- Reply Precision
- Safety in Repetition
Related evals
- Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems). 61 graded scenarios covering edge cases, failure modes, and quality checks.
- Customer Support · Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems). 66 graded scenarios covering edge cases, failure modes, and quality checks.
- Customer Support · Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems). 60 graded scenarios covering edge cases, failure modes, and quality checks.
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.