
Ingest Painpoint Eval V2
Zendesk Support / Agent Workspace · Zendesk
Zendesk ingest eval pack, persona-balanced and source-traceable per test row.
About Zendesk
Zendesk is a customer service platform that helps businesses build better customer relationships. Its AI-powered products handle billions of support interactions across email, chat, voice, and messaging, giving agents the context they need to resolve issues faster.
Sample tests · showing 3 of 100
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Review this workflow signal for `Audit rule changes`: `Rule revision history allows admins to review and iterate trigger logic over time.`. Should it be flagged as friction? | should_flag_painpoint: false · painpoint_label: none · severity: none | Pass / Fail |
| 02 | A support_agent reports: "An agent accidentally sends an internal note as a public reply and exposes internal context to the customer." while using `Reply and resolve tickets`. Diagnose likely root cause. | should_flag_painpoint: true · painpoint_label: public_private_comment_misfire · severity: critical | Pass / Fail |
| 03 | Given this observation in `Operate at high ticket volume`: "When ticket volume is high, the interface feels cluttered and slower for agents triaging large queues.", decide whether to flag a pain point. | should_flag_painpoint: true · painpoint_label: ui_clutter_under_load · severity: medium | Pass / Fail |
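The "Expected behavior" cells above share one `key: value · key: value` layout, so a row can be compared with a model's structured answer mechanically. The following is a minimal sketch of that comparison, not the pack's actual checker: the page does not say how the Check column is computed, so exact field matching is an assumption, and `parse_expected`, `check_row`, and the `model_output` dict shape are hypothetical names introduced for illustration.

```python
def parse_expected(cell: str) -> dict:
    """Split a 'key: value · key: value' cell into a dict.

    The literals 'true' and 'false' become booleans; everything else
    (including 'none') is kept as a plain string.
    """
    fields = {}
    for part in cell.split("·"):
        key, _, value = part.partition(":")
        value = value.strip()
        if value in ("true", "false"):
            fields[key.strip()] = (value == "true")
        else:
            fields[key.strip()] = value
    return fields


def check_row(expected_cell: str, model_output: dict) -> str:
    """Return 'Pass' only if every expected field matches exactly (assumed rule)."""
    expected = parse_expected(expected_cell)
    matches = all(model_output.get(k) == v for k, v in expected.items())
    return "Pass" if matches else "Fail"


# Expected behavior copied from sample row 02.
row_02 = (
    "should_flag_painpoint: true · "
    "painpoint_label: public_private_comment_misfire · "
    "severity: critical"
)

# Hypothetical model answer, used only to exercise the check.
answer = {
    "should_flag_painpoint": True,
    "painpoint_label": "public_private_comment_misfire",
    "severity": "critical",
}
print(check_row(row_02, answer))
```

An exact-match check like this only covers the three labelled fields; the judged criteria, such as reasoning quality, still need a rubric-based score.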
How this eval is graded
Score whether the model correctly identifies support workflow pain points. Evaluate:
- Pain-point detection correctness
- Severity calibration
- Actionability of recommended fix
- Persona-aware reasoning quality
- Avoiding false positives on neutral observations
Pass threshold: a criterion passes at a judge score of 4 or higher.
Rubric criteria
- Ingest Pipeline Fault Detection
- Evidence-Linked Diagnosis
- Remediation Prioritization
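The pass threshold stated above can be applied to each rubric criterion independently. This is a minimal sketch under stated assumptions: the page gives the threshold (4 or higher) but not the score scale or how judge scores are delivered, so the integer scores and the `criterion_passes` and `grade` helpers here are hypothetical.

```python
PASS_THRESHOLD = 4  # from the page: a criterion passes at a judge score of 4 or higher


def criterion_passes(judge_score: int, threshold: int = PASS_THRESHOLD) -> bool:
    """A criterion passes when its judge score meets or exceeds the threshold."""
    return judge_score >= threshold


def grade(scores: dict) -> dict:
    """Map each rubric criterion name to a pass/fail boolean."""
    return {name: criterion_passes(score) for name, score in scores.items()}


# Illustrative scores only; these are not results from the eval.
example_scores = {
    "Ingest Pipeline Fault Detection": 5,
    "Evidence-Linked Diagnosis": 4,
    "Remediation Prioritization": 3,
}
for name, passed in grade(example_scores).items():
    print(f"{name}: {'Pass' if passed else 'Fail'}")
```

The comparison is inclusive, so a score of exactly 4 passes. The page does not say whether a test row needs every criterion to pass, so that aggregation is left out.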
Related evals
Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)
61 graded scenarios covering edge cases, failure modes, and quality checks.
Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)
66 graded scenarios covering edge cases, failure modes, and quality checks.
Agentic AI for enterprise customer support (deflection, resolution, escalation, tool-use against connected systems)
60 graded scenarios covering edge cases, failure modes, and quality checks.
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.