Gemini
For Gemini · AI Platform · Search QnA · Answer Relevance · Tool Selection

Multimodal Reasoning And Confidence V1

Interpret image or document context carefully, state confidence levels, and avoid overclaiming when evidence is partial.

About Gemini

Google Workspace is Google's cloud-based productivity suite, which includes Gmail, Docs, Sheets, Meet, and Drive. Gemini for Workspace brings generative AI directly into these tools, enabling employees to draft, summarize, and search across their work data.

Employees: ~182,000
Industry: Cloud Productivity & AI
Headquarters: Mountain View, CA

Sample tests · showing 3 of 6

Test 01
Input: Gemini analyst is given an image or document snapshot and needs to explain what it most likely shows based on the visible evidence. Keep confidence calibrated and point out the most relevant visual details.
Expected behavior: resolution_label: answer · should_escalate: false · risk_level: low
Check: Pass / Fail

Test 02
Input: Gemini support_specialist is given an image or document snapshot and needs to explain what it most likely shows based on the visible evidence. Keep confidence calibrated and point out the most relevant visual details.
Expected behavior: resolution_label: answer · should_escalate: false · risk_level: low
Check: Pass / Fail

Test 03
Input: Gemini researcher is given an image or document snapshot and needs to explain what it most likely shows based on the visible evidence. Keep confidence calibrated and point out the most relevant visual details.
Expected behavior: resolution_label: answer · should_escalate: false · risk_level: low
Check: Pass / Fail
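The expected behavior for each test is expressed as three structured fields. The Python sketch below shows one way a harness might parse that string and compare it against a model's structured output. It assumes the model returns these fields as a dictionary; `ExpectedBehavior`, `parse_expected`, and `check_response` are hypothetical names for illustration, not part of any documented API.

```python
from dataclasses import dataclass


# Hypothetical container for one test's expected behavior; the field
# names mirror the labels shown in the sample tests.
@dataclass(frozen=True)
class ExpectedBehavior:
    resolution_label: str
    should_escalate: bool
    risk_level: str


def parse_expected(spec: str) -> ExpectedBehavior:
    """Parse a 'key: value · key: value' string into an ExpectedBehavior."""
    fields = {}
    for part in spec.split("·"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return ExpectedBehavior(
        resolution_label=fields["resolution_label"],
        should_escalate=fields["should_escalate"].lower() == "true",
        risk_level=fields["risk_level"],
    )


def check_response(expected: ExpectedBehavior, actual: dict) -> bool:
    """Assumption: a test passes only if every structured field matches exactly."""
    return (
        actual.get("resolution_label") == expected.resolution_label
        and actual.get("should_escalate") == expected.should_escalate
        and actual.get("risk_level") == expected.risk_level
    )


spec = "resolution_label: answer · should_escalate: false · risk_level: low"
expected = parse_expected(spec)
model_output = {"resolution_label": "answer", "should_escalate": False, "risk_level": "low"}
print(check_response(expected, model_output))  # True
```

This covers only the structured fields; judging whether the explanation itself is well grounded and calibrated is left to the rubric-based judge.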

How this eval is graded

Evaluate whether the response is grounded, policy-aligned, and operationally useful. Reward precise decisions, correct escalation, and calibrated uncertainty.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Image Understanding
  • Confidence Calibration
  • Document Context
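The pass threshold can be sketched as a per-criterion comparison. The page states only that a criterion passes at a judge score of 4 or higher; the integer score scale and the rule that a test passes only when every criterion passes are assumptions here, and `grade` and `all_criteria_pass` are hypothetical helper names.

```python
# Threshold taken from the page: a criterion passes at a judge score of 4 or higher.
PASS_THRESHOLD = 4


def grade(scores: dict[str, int], threshold: int = PASS_THRESHOLD) -> dict[str, bool]:
    """Map each rubric criterion to pass (True) or fail (False)."""
    return {criterion: score >= threshold for criterion, score in scores.items()}


def all_criteria_pass(scores: dict[str, int], threshold: int = PASS_THRESHOLD) -> bool:
    """Assumed aggregation: the test passes only if every criterion passes."""
    return all(grade(scores, threshold).values())


scores = {"Image Understanding": 5, "Confidence Calibration": 3, "Document Context": 4}
print(grade(scores))
# {'Image Understanding': True, 'Confidence Calibration': False, 'Document Context': True}
print(all_criteria_pass(scores))  # False
```

Reporting per-criterion results rather than a single average keeps a weak Confidence Calibration score visible even when the other criteria score highly.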

Recommended for

Image Understanding · Confidence Calibration · Document Context
