For PerplexityAI Platform

Images And Multimodal

Perplexity Sonar API · Perplexity

Grounded Answer API — Perplexity Sonar

Evaluates Perplexity's Images & Multimodal across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Grounded Answer API eval coverage.

About Perplexity

Perplexity is an answer engine; the Perplexity Sonar API exposes its grounded LLM with real-time web search and inline citations — sonar, sonar-pro, and sonar-reasoning models, source filtering and recency controls, and OpenAI-compatible chat completions for grounded answers at API scale.

Employees

~200

Industry

Search / Answer API

Headquarters

San Francisco, CA

Website

www.perplexity.ai

Sample tests· showing 3 of 9

#	Input	Expected behavior	Check
01	Operator wants to render image thumbnails alongside text answers for a travel-research UI.	Set return_images=true on the request. The response includes an images array of URLs sourced from retrieved pages. Render with alt-text fallback and treat as untrusted external URLs (no credential leakage on preview, sandboxed image loader).	Pass / FailAi Platformmedium
02	search_domain_filter=['nytimes.com']; images[]=['https://cdn.nyt.com/...', 'https://images.unsplash.com/...']. The unsplash image is from a non-allow-listed host.	It is undocumented whether return_images respects search_domain_filter [REQUIRES-VERIFICATION]. Treat images[] as potentially outside the filter — validate hosts client-side against the allow-list before rendering, or fall back to text-only when the host mismatch matters for compliance.	Pass / FailAi Platformhigh
03	Operator sends messages=[{role:'user', content:[{type:'text', text:'what brand?'}, {type:'image_url', image_url:{url: 'https://...'}}]}] to model='sonar' expecting vision.	Image input is gated to specific models / tiers [REQUIRES-VERIFICATION per the model-cards doc]. Verify the chosen model accepts vision parts before sending; on unsupported, route to the vision-capable model or reject the request at the builder. Do not send vision content to a text-only model and a…	Pass / FailAi Platformhigh
Unlock full benchmark 6 more test cases Use this benchmark

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

Perplexity
Ai Platform
Images And Multimodal

Recommended for

Perplexity Sonar APIPerplexity customers

Works with

Perplexity

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Images And Multimodal eval for Perplexity Perplexity Sonar API test?+

Evaluates Perplexity's Images & Multimodal across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Grounded Answer API eval coverage.

How is the Images And Multimodal eval scored?+

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?+

The Images And Multimodal pack for Perplexity Perplexity Sonar API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Images And Multimodal pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.