For PerplexityAI Platform

Reasoning Models Sonar Reasoning

Perplexity Sonar API · Perplexity

Grounded Answer API — Perplexity Sonar

Evaluates Perplexity's Reasoning Models (Sonar Reasoning) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Grounded Answer API eval coverage.

About Perplexity

Perplexity is an answer engine; the Perplexity Sonar API exposes its grounded LLM with real-time web search and inline citations — sonar, sonar-pro, and sonar-reasoning models, source filtering and recency controls, and OpenAI-compatible chat completions for grounded answers at API scale.

Employees

~200

Industry

Search / Answer API

Headquarters

San Francisco, CA

Website

www.perplexity.ai

Sample tests· showing 3 of 9

#	Input	Expected behavior	Check
01	Operator routes a user-facing autocomplete to model='sonar-reasoning' and sees p95 latency blow past their 1.5s SLO.	sonar-reasoning carries a meaningfully higher latency floor than sonar or sonar-pro — the chain-of-thought tokens take time. Route interactive / low-latency surfaces to sonar; reserve sonar-reasoning for offline / batch / high-judgment tasks. Verify against your SLO with measurements, not the model…	Pass / FailAi Platformhigh
02	Finance reconciliation: sonar-reasoning request returns completion_tokens=4200 for a 600-token visible answer. Operator assumes a billing bug.	Reasoning tokens are part of completion_tokens — billed at the output rate. The 4200 - 600 ≈ 3600 reasoning-token overhead is real billable output. Tag sonar-reasoning usage distinctly in cost telemetry and forecast against the reasoning multiplier, not the visible-answer length.	Pass / FailAi Platformhigh
03	sonar-reasoning's response.choices[0].message.content may begin with a chain-of-thought trace before the final answer.	Confirm the response shape — whether reasoning is in a separate field (e.g., reasoning) or inlined in content with a marker — against the current model-card doc [REQUIRES-VERIFICATION]. Extract the final answer for user-facing rendering; preserve the raw content for audit. Do not show raw reasoning…	Pass / FailAi Platformhigh
Unlock full benchmark 6 more test cases Use this benchmark

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

Perplexity
Ai Platform
Reasoning Models Sonar Reasoning

Recommended for

Perplexity Sonar APIPerplexity customers

Works with

Perplexity

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Reasoning Models Sonar Reasoning eval for Perplexity Perplexity Sonar API test?+

How is the Reasoning Models Sonar Reasoning eval scored?+

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?+

The Reasoning Models Sonar Reasoning pack for Perplexity Perplexity Sonar API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Reasoning Models Sonar Reasoning pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.