Tool Routing And Multimodal V1

Tool and Multimodal Router · Gemini

Eval for choosing the right tool, handling image/document context, and calibrating confidence when visual evidence is partial.

About Gemini

Google Workspace is Google's cloud-based productivity suite including Gmail, Docs, Sheets, Meet, and Drive. Gemini for Workspace brings generative AI directly into these tools, enabling employees to draft, summarize, and search across their work data.

Employees

~182,000

Industry

Cloud Productivity & AI

Headquarters

Mountain View, CA

Website

workspace.google.com

Sample tests · showing 3 of 4

#	Input	Expected behavior	Check
01	choose correct tool Gemini / Tool and Multimodal Router: The user wants to find the source file and then schedule the review. Decide whether to use Drive search or Calendar handling first, and explain why. Respond for persona `operations_analyst`.	outcome_label: route_tool_choice · confidence: high · required_actions: choose the correct first tool, justify the routing decision…	Pass / Fail
02	multimodal chart reading Gemini / Tool and Multimodal Router: A screenshot shows a chart with an unclear legend. Interpret only what is visible and clearly state any uncertainty. Respond for persona `data_analyst`.	outcome_label: analyze_with_uncertainty · confidence: medium · required_actions: state the visible trend, label uncertainty…	Pass / Fail
03	document plus image Gemini / Tool and Multimodal Router: A PDF and a small image together provide enough context to answer a straightforward question. Use the minimal necessary reasoning and keep the answer grounded. Respond for persona `customer_su…	outcome_label: grounded_multimodal_answer · confidence: high · required_actions: combine the document and image evidence, answer directly…	Pass / Fail
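
The "Expected behavior" cells above follow a `key: value · key: value` layout. As a minimal sketch, assuming that layout holds for every row, such a cell could be parsed into structured fields like this. The function name `parse_expected` and the `truncated` flag are hypothetical and are not part of the eval platform; a trailing "…" is recorded as truncation, not filled in.

```python
def parse_expected(cell: str) -> dict:
    """Parse one 'Expected behavior' cell into a dict (illustrative helper only)."""
    fields = {}
    # Fields are separated by a middle dot surrounded by spaces.
    for part in cell.split(" · "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()

    # The listing truncates long cells with an ellipsis; note that, don't guess the rest.
    truncated = cell.rstrip().endswith("…")

    # required_actions is a comma-separated list; drop the trailing ellipsis before splitting.
    actions = fields.get("required_actions", "").rstrip("…")
    fields["required_actions"] = [a.strip() for a in actions.split(",") if a.strip()]
    fields["truncated"] = truncated
    return fields


row_01 = (
    "outcome_label: route_tool_choice · confidence: high · "
    "required_actions: choose the correct first tool, justify the routing decision…"
)
expected = parse_expected(row_01)
print(expected["outcome_label"], expected["confidence"], expected["required_actions"])
```

Keeping the `truncated` flag makes it explicit that the visible list of required actions may be incomplete.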
How this eval is graded

Responses are evaluated on tool choice, multimodal reasoning, and confidence calibration. A passing response selects the appropriate tool first, avoids unnecessary tool calls, and does not make overconfident claims about partial visual evidence.

Pass threshold: a criterion passes at a judge score of 4 or higher.
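
The threshold rule can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the function name `grade_criteria` and the integer judge scores are hypothetical, and the sketch reports only per-criterion results because the page does not say how criteria combine into an overall verdict.

```python
PASS_THRESHOLD = 4  # from the grading note: a criterion passes at a judge score of 4 or higher

# Criterion names as listed under "Rubric criteria".
CRITERIA = ("Tool Selection", "Multimodal Reasoning", "Confidence Calibration")


def grade_criteria(judge_scores: dict, threshold: int = PASS_THRESHOLD) -> dict:
    """Return pass/fail per rubric criterion (hypothetical helper)."""
    missing = [c for c in CRITERIA if c not in judge_scores]
    if missing:
        raise ValueError(f"missing judge scores for: {', '.join(missing)}")
    # A criterion passes when its judge score meets or exceeds the threshold.
    return {c: judge_scores[c] >= threshold for c in CRITERIA}


# Example scores are made up for illustration.
results = grade_criteria(
    {"Tool Selection": 5, "Multimodal Reasoning": 4, "Confidence Calibration": 3}
)
for criterion, passed in results.items():
    print(f"{criterion}: {'Pass' if passed else 'Fail'}")
```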

Rubric criteria

Tool Selection
Multimodal Reasoning
Confidence Calibration

Recommended for

Tool Selection, Multimodal Reasoning, Confidence Calibration

Works with

Gemini

Related evals

AI Platform

