For Google Workspace · Document Agents · Tone · Task Completion

Gmail Triage and Reply V1

Prioritize inbox messages, draft appropriate replies, and escalate only high-risk mail with strong justification.

About Google Workspace

Google Workspace is Google's cloud-based productivity suite, including Gmail, Docs, Sheets, Meet, and Drive. Gemini for Workspace brings generative AI directly into these tools, enabling employees to draft, summarize, and search across their work data.

Employees

~182,000

Industry

Cloud Productivity & AI

Headquarters

Mountain View, CA

Sample tests · showing 3 of 6

Test 01
Input: Google Workspace support_lead received an urgent customer email about a delayed rollout and wants a reply that acknowledges the issue, confirms follow-up timing, and avoids overpromising. Draft the response.
Expected behavior: resolution_label: draft_reply · should_escalate: false · risk_level: medium
Check: Pass / Fail

Test 02
Input: Google Workspace sales_ops received an urgent customer email about a delayed rollout and wants a reply that acknowledges the issue, confirms follow-up timing, and avoids overpromising. Draft the response.
Expected behavior: resolution_label: draft_reply · should_escalate: false · risk_level: medium
Check: Pass / Fail

Test 03
Input: Google Workspace legal_coordinator received an urgent customer email about a delayed rollout and wants a reply that acknowledges the issue, confirms follow-up timing, and avoids overpromising. Draft the response.
Expected behavior: resolution_label: draft_reply · should_escalate: false · risk_level: medium
Check: Pass / Fail
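Each sample test's expected behavior is a set of `key: value` pairs separated by `·`. The following is a minimal Python sketch of how such a spec could be parsed and compared field by field against an agent's structured output. It assumes the agent emits a dict with the same three keys (`resolution_label`, `should_escalate`, `risk_level`). `ExpectedBehavior`, `parse_expected`, and `field_matches` are hypothetical names, not part of the eval's published interface, and the sketch does not model the rubric-based judge scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedBehavior:
    """Hypothetical container for the three expected-behavior fields."""
    resolution_label: str
    should_escalate: bool
    risk_level: str


def _parse_bool(value: str) -> bool:
    """Accept only the literal strings 'true' / 'false' (case-insensitive)."""
    lowered = value.lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"expected true/false, got {value!r}")
    return lowered == "true"


def parse_expected(spec: str) -> ExpectedBehavior:
    """Parse a 'key: value · key: value' string as shown in the sample tests."""
    fields = {}
    for part in spec.split("·"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return ExpectedBehavior(
        resolution_label=fields["resolution_label"],
        should_escalate=_parse_bool(fields["should_escalate"]),
        risk_level=fields["risk_level"],
    )


def field_matches(expected: ExpectedBehavior, predicted: dict) -> dict:
    """ASSUMPTION: the agent returns a dict keyed like the expected spec.

    Returns a per-field True/False map; missing keys count as mismatches.
    """
    return {
        "resolution_label": predicted.get("resolution_label") == expected.resolution_label,
        "should_escalate": predicted.get("should_escalate") == expected.should_escalate,
        "risk_level": predicted.get("risk_level") == expected.risk_level,
    }


if __name__ == "__main__":
    spec = "resolution_label: draft_reply · should_escalate: false · risk_level: medium"
    agent_output = {"resolution_label": "draft_reply", "should_escalate": False, "risk_level": "medium"}
    print(field_matches(parse_expected(spec), agent_output))
```

A per-field map, rather than a single boolean, makes it easy to see whether a failure came from the escalation decision or from the risk label.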

How this eval is graded

Evaluate whether the response is grounded, policy-aligned, and operationally useful. Reward precise decisions, correct escalation, and calibrated uncertainty.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Inbox Triage
  • Reply Drafting
  • Threat Detection
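The pass threshold above can be expressed as a per-criterion check over the three rubric criteria. This is a sketch under stated assumptions: judge scores are integers keyed by criterion name, and a test is treated as passing only when every criterion passes. The page does not say how criteria are combined or what the top of the score scale is, so `overall_pass` in particular is an assumption; `criterion_passes`, `grade`, and `overall_pass` are hypothetical helper names.

```python
# Threshold from the page: a criterion passes at a judge score of 4 or higher.
PASS_THRESHOLD = 4

# Rubric criteria as listed on the page.
RUBRIC = ("Inbox Triage", "Reply Drafting", "Threat Detection")


def criterion_passes(score: int) -> bool:
    """Return True when a single judge score meets the pass threshold."""
    return score >= PASS_THRESHOLD


def grade(scores: dict) -> dict:
    """Map each rubric criterion to pass/fail; raise if a score is missing."""
    missing = [c for c in RUBRIC if c not in scores]
    if missing:
        raise ValueError(f"missing judge scores for: {missing}")
    return {c: criterion_passes(scores[c]) for c in RUBRIC}


def overall_pass(scores: dict) -> bool:
    """ASSUMPTION: a test passes only if every criterion passes.

    The page does not state how criteria combine; this is illustrative only.
    """
    return all(grade(scores).values())


if __name__ == "__main__":
    scores = {"Inbox Triage": 5, "Reply Drafting": 4, "Threat Detection": 3}
    print(grade(scores))         # {'Inbox Triage': True, 'Reply Drafting': True, 'Threat Detection': False}
    print(overall_pass(scores))  # False
```

Raising on a missing score, instead of treating it as a failure, keeps an incomplete judge run from being silently reported as a model failure.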

Recommended for

Inbox Triage · Reply Drafting · Threat Detection
