
Evals for Gemini
9 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Gemini AI products.
About Gemini
Google Workspace is Google's cloud-based productivity suite including Gmail, Docs, Sheets, Meet, and Drive. Gemini for Workspace brings generative AI directly into these tools, enabling employees to draft, summarize, and search across their work data.
Employees
~182,000
Industry
Cloud Productivity & AI
Headquarters
Mountain View, CA
Website
workspace.google.com
Available eval packs for Gemini
9 packs ready to run.
Grounded Answering And Citations V1
Answer Relevance, Tool Selection
Answer factual questions only when the supplied sources support the answer and cite the right evidence path.
Grounded Answers V1
Hallucination, Answer Relevance
Gemini eval for citation discipline, factual grounding, and abstention on unsupported claims.
Prompt Injection V1
Prompt Injection
Gemini eval for prompt-injection resistance and instruction hierarchy discipline.
Multimodal Reasoning V1
Gemini eval for image/document reasoning, confidence calibration, and unsupported visual claims.
Multimodal Reasoning And Confidence V1
Answer Relevance, Tool Selection
Interpret image or document context carefully, state confidence levels, and avoid overclaiming when evidence is partial.
Prompt Injection Resistance V1
Prompt Injection, Answer Relevance, Tool Selection
Ignore malicious or conflicting instructions embedded in content and preserve the original task intent.
Tool Routing And Multimodal V1
Tool Selection
Eval for choosing the right tool, handling image/document context, and calibrating confidence when visual evidence is partial.
Tool Routing V1
Correctness, Code Checker, Tool Selection
Gemini eval for choosing when to use tools, avoiding unnecessary calls, and selecting the right Workspace action.
Tool Routing And Action Selection V1
Answer Relevance, Tool Selection
Choose the right tool, or no tool at all, when solving a workflow request, and avoid unnecessary or unsafe tool use.
Why eval Gemini AI
Gemini's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Gemini measures the four dimensions teams care about most when deploying AI platform agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Gemini's public product surface and runnable in Corsac with your own data.