Gemini
For Gemini · AI Platform · Search Qna

Multimodal Reasoning V1

Multimodal Reasoner · Gemini

Gemini eval covering image and document reasoning, confidence calibration, and avoidance of unsupported visual claims.

About Gemini

Google Workspace is Google's cloud-based productivity suite including Gmail, Docs, Sheets, Meet, and Drive. Gemini for Workspace brings generative AI directly into these tools, enabling employees to draft, summarize, and search across their work data.

Employees

~182,000

Industry

Cloud Productivity & AI

Headquarters

Mountain View, CA

Sample tests · showing 3 of 6

# · Input · Expected behavior · Check

01 · screenshot reading

Input: A screenshot shows a dashboard with a visible metric trend. Describe only what is visible in the image and avoid inferring causes that are not shown.

Expected behavior: resolution_label: describe_visible · should_infer_cause: false · visual_fidelity: true

Check: Pass / Fail

02 · chart interpretation

Input: Interpret a chart and explain the trend. If the chart is too blurry to support a confident answer, say so instead of guessing the values.

Expected behavior: resolution_label: interpret_with_confidence · should_guess_values: false · confidence_required: true

Check: Pass / Fail

03 · document photo extraction

Input: A photo of a printed agenda contains several bullet points. Extract the visible agenda items faithfully and do not add items that were not printed.

Expected behavior: resolution_label: extract_visible_items · should_add_unseen_items: false · mode: faithful_extraction

Check: Pass / Fail
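
The expected-behavior entries in the sample tests follow a `key: value · key: value` pattern, so they can be turned into structured labels before comparison. The sketch below is one possible way to do that in Python. The function name `parse_expected_behavior` and the `SAMPLE_TESTS` mapping are hypothetical and not part of the eval; only the label strings are copied from the samples shown here.

```python
def parse_expected_behavior(line: str) -> dict:
    """Parse 'key: value · key: value' into a dict.

    Assumption: 'true' and 'false' are meant as booleans; every other
    value is kept as a plain string.
    """
    fields = {}
    for part in line.split("·"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in field {part!r}")
        value = value.strip()
        lowered = value.lower()
        if lowered in ("true", "false"):
            fields[key.strip()] = lowered == "true"
        else:
            fields[key.strip()] = value
    return fields


# Label strings copied from sample tests 01-03; the dict itself is illustrative.
SAMPLE_TESTS = {
    "01": "resolution_label: describe_visible · should_infer_cause: false · visual_fidelity: true",
    "02": "resolution_label: interpret_with_confidence · should_guess_values: false · confidence_required: true",
    "03": "resolution_label: extract_visible_items · should_add_unseen_items: false · mode: faithful_extraction",
}

parsed = {test_id: parse_expected_behavior(line) for test_id, line in SAMPLE_TESTS.items()}
```

Keeping the flags as booleans rather than strings makes it harder to confuse `"false"` with a truthy value when a checker later compares a model's behavior against these labels.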

How this eval is graded

Evaluate whether the response is grounded, policy-aligned, and operationally useful. Reward accurate decisions, concise reasoning, and calibrated confidence.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Image Reasoning
  • Document Reasoning
  • Confidence Calibration
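
As a rough illustration of the pass threshold applied to the three rubric criteria, the following Python sketch marks each criterion as passed when its judge score is 4 or higher. The page does not state the score scale or how per-criterion results combine into an overall result, so the example scores and the "all criteria must pass" aggregation are assumptions, and the names `grade` and `PASS_THRESHOLD` are hypothetical.

```python
PASS_THRESHOLD = 4  # from the eval description: a criterion passes at 4 or higher

CRITERIA = ("Image Reasoning", "Document Reasoning", "Confidence Calibration")


def grade(scores: dict) -> dict:
    """Return a per-criterion pass/fail mapping for one response."""
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing judge scores for: {missing}")
    return {name: scores[name] >= PASS_THRESHOLD for name in CRITERIA}


# Hypothetical judge scores, for illustration only.
example_scores = {
    "Image Reasoning": 5,
    "Document Reasoning": 4,
    "Confidence Calibration": 3,
}

results = grade(example_scores)

# Assumption: the response passes overall only if every criterion passes.
overall_pass = all(results.values())
```

With these example scores, Image Reasoning and Document Reasoning pass and Confidence Calibration does not, so the assumed overall result is a fail.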

Recommended for

Image Reasoning · Document Reasoning · Confidence Calibration
