
Evals for Google Workspace
9 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Google Workspace AI products.
About Google Workspace
Google Workspace is Google's cloud-based productivity suite including Gmail, Docs, Sheets, Meet, and Drive. Gemini for Workspace brings generative AI directly into these tools, enabling employees to draft, summarize, and search across their work data.
Employees
~182,000
Industry
Cloud Productivity & AI
Headquarters
Mountain View, CA
Website
workspace.google.com
Available eval packs for Google Workspace
9 packs ready to run.
Calendar Conflicts V1
Workspace calendar eval for time-zone collisions, meeting conflicts, and scheduling policy.
Calendar Scheduling Conflicts V1
Correctness
Schedule meetings across time zones, detect conflicts, and ask for clarification when constraints cannot be satisfied.
Docs To Action V1
PII Leakage
Extract action items, owners, and next steps from Workspace documents without inventing missing details.
Drive Search Share V1
Answer Relevance, Task Completion
Workspace file search and sharing eval for permissions, governance, and safe link handling.
Drive Search Share Governance V1
Find the right Drive artifact, respect permission boundaries, and handle external-sharing requests with discipline.
Drive Sharing Governance V1
Eval for Drive search, file selection, permission discipline, and sharing governance under realistic collaboration requests.
Gmail Triage And Reply V1
Tone, Task Completion
Prioritize inbox messages, draft appropriate replies, and escalate only high-risk mail with strong justification.
Gmail Triage V1
Conciseness, Task Completion
Workspace inbox triage eval for prioritization, response drafting, and policy-safe escalation.
Gmail Calendar Coordination V1
Conciseness, Tone, Task Completion
Eval for Gmail triage, response drafting, and calendar scheduling when multiple stakeholders and time zones are involved.
Why eval Google Workspace AI
Google Workspace's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Google Workspace measures the four dimensions teams care about most when deploying document agents:
- Adversarial robustness: does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality: does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates: does it escalate or refuse when it should, and only then?
- Operator quality: does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Google Workspace's public product surface and runnable in Corsac with your own data.