Eval Library
For Perplexity · AI Platform

Reasoning Models Sonar Reasoning

Perplexity Sonar API · Perplexity

Grounded Answer API — Perplexity Sonar

Perplexity evals — Reasoning Models (Sonar Reasoning) (relift v3 InfraRed)

About Perplexity

Perplexity is an answer engine. The Perplexity Sonar API exposes its grounded LLM with real-time web search and inline citations. It offers the sonar, sonar-pro, and sonar-reasoning models, source filtering and recency controls, and OpenAI-compatible chat completions for grounded answers at API scale.

Employees

~200

Industry

Search / Answer API

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator routes a user-facing autocomplete to model='sonar-reasoning' and sees p95 latency blow past their 1.5s SLO.

sonar-reasoning carries a meaningfully higher latency floor than sonar or sonar-pro — the chain-of-thought tokens take time. Route interactive / low-latency surfaces to sonar; reserve sonar-reasoning for offline / batch / high-judgment tasks. Verify against your SLO with measurements, not the model…

Pass / Fail · AI Platform · high
02

On a multi-hop research query, sonar-reasoning makes several internal reasoning steps but cites only the final consolidated set of sources.

Reasoning steps may reference intermediate retrievals that are not surfaced in the final citations[]. Treat the final citations[] as the audit anchor for user-facing claims; if the internal reasoning makes claims that are not backed by the final citations[], route them to the verification queue. Do not promote reasoning-only claims as …

Pass / Fail · AI Platform · critical
03

Streaming response from sonar-reasoning interleaves reasoning deltas with final-answer deltas in the SSE chunks.

Determine from the chunk shape whether the delta is reasoning or final answer — buffer reasoning for the audit log, surface only the final-answer portion to the user. Confirm the chunk shape from current docs [REQUIRES-VERIFICATION]; do not assume.

Pass / Fail · AI Platform · medium
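For sample test 01, the routing rule and the "verify with measurements" step can be sketched as follows. This is a minimal illustration, not Perplexity's guidance in code form: the function names, the two boolean inputs, and the nearest-rank percentile method are assumptions; only the model names and the 1.5s SLO come from the test description.

```python
# Model names are taken from the test description; everything else is hypothetical.
INTERACTIVE_MODEL = "sonar"
REASONING_MODEL = "sonar-reasoning"


def p95(samples_s):
    """Nearest-rank 95th percentile of latency samples, in seconds."""
    if not samples_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_s)
    # Integer form of ceil(0.95 * n), which avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def pick_model(interactive, needs_deep_reasoning):
    """Route interactive surfaces to the faster model, whatever the task depth."""
    if interactive:
        return INTERACTIVE_MODEL
    return REASONING_MODEL if needs_deep_reasoning else INTERACTIVE_MODEL


def meets_slo(samples_s, slo_s):
    """True when the measured p95 latency is within the SLO."""
    return p95(samples_s) <= slo_s


if __name__ == "__main__":
    # An autocomplete surface is interactive, so it is routed to sonar.
    print(pick_model(interactive=True, needs_deep_reasoning=True))
    # Made-up latency samples checked against the 1.5s SLO from the test.
    print(meets_slo([0.4, 0.5, 2.1], slo_s=1.5))
```

The point of `meets_slo` is that the routing decision is confirmed against latencies you measured on your own traffic rather than assumed from the model name.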
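For sample test 02, the citation check might look like the sketch below. It assumes claims have already been extracted from the response and tagged with the URL they rely on, and that the final citations[] is a list of URL strings. The `Claim` type, its `source_url` field, and `split_claims` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    source_url: str  # hypothetical field: the URL this claim relies on


def split_claims(claims, final_citations):
    """Partition claims into publishable ones and ones needing verification.

    A claim is publishable only if its source appears in the final citations;
    anything else goes to the verification queue instead of the user.
    """
    cited = set(final_citations)
    publishable, verification_queue = [], []
    for claim in claims:
        if claim.source_url in cited:
            publishable.append(claim)
        else:
            verification_queue.append(claim)
    return publishable, verification_queue


if __name__ == "__main__":
    citations = ["https://example.com/a", "https://example.com/b"]
    claims = [
        Claim("Backed by a final citation.", "https://example.com/a"),
        Claim("Seen only during reasoning.", "https://example.com/intermediate"),
    ]
    ok, queued = split_claims(claims, citations)
    print(len(ok), len(queued))
```

Exact URL matching is the simplest possible rule; a real pipeline would likely need URL normalization, which is left out here.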
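For sample test 03, one way to keep the "do not assume" rule honest is to make the reasoning-versus-answer decision a caller-supplied predicate, so the splitter itself never hard-codes a chunk shape. In this sketch the `{"kind": ..., "text": ...}` chunk layout and the `by_kind` predicate are purely hypothetical stand-ins; the real shape has to be confirmed from current docs.

```python
def split_stream(chunks, is_reasoning):
    """Separate reasoning deltas (audit log) from answer deltas (user-facing).

    `chunks` is an iterable of already-parsed SSE payloads (dicts).
    `is_reasoning` is supplied by the caller because the real chunk shape
    is unverified here.
    """
    audit_log, answer = [], []
    for chunk in chunks:
        text = chunk.get("text", "")
        if not text:
            continue  # skip keep-alives or empty deltas
        if is_reasoning(chunk):
            audit_log.append(text)
        else:
            answer.append(text)
    return "".join(audit_log), "".join(answer)


def by_kind(chunk):
    """Hypothetical predicate for the made-up chunk shape used below."""
    return chunk.get("kind") == "reasoning"


if __name__ == "__main__":
    stream = [
        {"kind": "reasoning", "text": "Compare sources. "},
        {"kind": "answer", "text": "The answer is "},
        {"kind": "reasoning", "text": "Check dates."},
        {"kind": "answer", "text": "42."},
    ]
    reasoning, final = split_stream(stream, by_kind)
    print(repr(reasoning))
    print(repr(final))
```

Only the second return value would be shown to the user; the first is kept for the audit log.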

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. To pass, the mean criterion score must be at least 4.0, with no criterion below 3.
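The pass rule above can be read as a mean threshold combined with a per-criterion floor. The sketch below assumes that reading (the mean is taken across criteria for one response) and uses hypothetical criterion names; the thresholds 4.0 and 3 are the only values taken from the text.

```python
def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and no score is below the floor.

    `scores` maps a criterion name to its numeric score.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= mean_threshold and min(values) >= floor


if __name__ == "__main__":
    # Criterion names here are made up for illustration.
    print(passes({"routing": 5, "citations": 4, "streaming": 3}))  # mean 4.0, min 3
    print(passes({"routing": 5, "citations": 5, "streaming": 2}))  # mean 4.0, min 2
```

The second case shows why the floor matters: a high mean does not compensate for a single criterion scored below 3.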

Rubric criteria

  • Perplexity
  • AI Platform
  • Reasoning Models Sonar Reasoning

Recommended for

Perplexity Sonar API · Perplexity customers
