
Tool Routing And Multimodal V1

Tool and Multimodal Router · Gemini

Eval for choosing the right tool, handling image/document context, and calibrating confidence when visual evidence is partial.

About Gemini

Google Workspace is Google's cloud-based productivity suite, which includes Gmail, Docs, Sheets, Meet, and Drive. Gemini for Workspace brings generative AI directly into these tools so that employees can draft, summarize, and search across their work data.

Employees: ~182,000

Industry: Cloud Productivity & AI

Headquarters: Mountain View, CA

Sample tests (showing 3 of 4)

01 · choose correct tool

Input: Gemini / Tool and Multimodal Router: The user wants to find the source file and then schedule the review. Decide whether to use Drive search or Calendar handling first, and explain why. Respond for persona `operations_analyst`.

Expected behavior: outcome_label: route_tool_choice · confidence: high · required_actions: choose the correct first tool, justify the routing decision…

Check: Pass / Fail

02 · multimodal chart reading

Input: Gemini / Tool and Multimodal Router: A screenshot shows a chart with an unclear legend. Interpret only what is visible and clearly state any uncertainty. Respond for persona `data_analyst`.

Expected behavior: outcome_label: analyze_with_uncertainty · confidence: medium · required_actions: state the visible trend, label uncertainty…

Check: Pass / Fail

03 · document plus image

Input: Gemini / Tool and Multimodal Router: A PDF and a small image together provide enough context to answer a straightforward question. Use the minimal necessary reasoning and keep the answer grounded. Respond for persona `customer_su…

Expected behavior: outcome_label: grounded_multimodal_answer · confidence: high · required_actions: combine the document and image evidence, answer directly…

Check: Pass / Fail
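
The expected-behavior entries in the sample tests share a `key: value · key: value` shape. As a rough illustration of how such an entry could be read into a structure, here is a minimal Python sketch. The helper name `parse_expected` and the added `truncated` flag are hypothetical and not part of the eval; the field names (`outcome_label`, `confidence`, `required_actions`) are taken from the samples. A trailing `…` is kept as a truncation marker rather than guessed at.

```python
def parse_expected(line: str) -> dict:
    """Parse 'key: value · key: value' into a dict (hypothetical helper)."""
    fields = {}
    for part in line.split(" · "):
        # Split each segment on the first ': ' only.
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()

    # required_actions is shown as a comma-separated list in the samples.
    actions = fields.get("required_actions", "")
    # A trailing ellipsis means the page cut the list short; record that
    # instead of inventing the missing items.
    fields["truncated"] = actions.endswith("…")
    fields["required_actions"] = [
        a.strip() for a in actions.rstrip("…").split(",") if a.strip()
    ]
    return fields


sample = (
    "outcome_label: route_tool_choice · confidence: high · "
    "required_actions: choose the correct first tool, "
    "justify the routing decision…"
)
parsed = parse_expected(sample)
print(parsed["outcome_label"], parsed["confidence"], parsed["truncated"])
```

Because the sample rows are truncated on the page, the parsed `required_actions` list should be treated as partial whenever `truncated` is true.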

How this eval is graded

Evaluate tool choice, multimodal reasoning, and confidence calibration. Passing responses should route cleanly and avoid unnecessary tool calls or overconfident visual claims.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Tool Selection
  • Multimodal Reasoning
  • Confidence Calibration
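
To make the pass threshold concrete, the following Python sketch applies it to the three rubric criteria. The threshold of 4 and the criterion names come from this page. The `grade` function, the shape of the `scores` input, and the rule that a response passes overall only when every criterion passes are assumptions made for illustration; the page does not state how criterion results are combined or what the judge's score scale is.

```python
PASS_THRESHOLD = 4  # from the page: a criterion passes at a judge score of 4 or higher

CRITERIA = ("Tool Selection", "Multimodal Reasoning", "Confidence Calibration")


def grade(scores: dict) -> dict:
    """Apply the per-criterion threshold to judge scores (hypothetical helper)."""
    per_criterion = {name: scores[name] >= PASS_THRESHOLD for name in CRITERIA}
    # Assumption: the overall result requires every criterion to pass.
    return {"criteria": per_criterion, "passed": all(per_criterion.values())}


result = grade(
    {"Tool Selection": 5, "Multimodal Reasoning": 4, "Confidence Calibration": 3}
)
print(result)
```

In this example the response clears Tool Selection and Multimodal Reasoning but falls below the threshold on Confidence Calibration, so under the all-criteria assumption it does not pass overall.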

Recommended for

  • Tool Selection
  • Multimodal Reasoning
  • Confidence Calibration
