Eval Library
P
For PerplexityAI Platform

Chat Completions Openai Compatible

Perplexity Sonar API · Perplexity

Grounded Answer API — Perplexity Sonar

Perplexity evals — Chat Completions (OpenAI-compatible) (relift v3 InfraRed)

About Perplexity

Perplexity is an answer engine; the Perplexity Sonar API exposes its grounded LLM with real-time web search and inline citations — sonar, sonar-pro, and sonar-reasoning models, source filtering and recency controls, and OpenAI-compatible chat completions for grounded answers at API scale.

Employees

~200

Industry

Search / Answer API

Headquarters

San Francisco, CA

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Operator hardcodes model='sonar' for every workload — quick factual lookups, deep multi-hop research, and structured-output extractions — to 'keep things simple.'

Route by workload: sonar for low-latency single-hop factual answers; sonar-pro for higher-quality multi-source synthesis with deeper grounding; sonar-reasoning for chain-of-thought heavy tasks where latency budget allows. Encode the choice per route, do not hardcode.

Pass / FailAi Platformhigh
02

Client sets max_tokens=128 for a multi-source comparison question. Response is truncated mid-sentence; finish_reason='length'.

Detect finish_reason='length' and either (a) raise to caller as a partial answer, or (b) re-issue with larger max_tokens. Do not present the truncated answer as complete. max_tokens caps OUTPUT only — retrieved web context is server-side and not bounded by it.

Pass / FailAi Platformcritical
03

Response carries usage={prompt_tokens, completion_tokens, total_tokens}; finance team rolls up per-tenant cost.

Read usage.prompt_tokens and usage.completion_tokens directly. Do not estimate from text length. For sonar-reasoning, completion_tokens includes reasoning tokens that count toward output billing — treat the full completion_tokens as billable output.

Pass / FailAi Platformmedium

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Perplexity
  • Ai Platform
  • Chat Completions Openai Compatible

Recommended for

Perplexity Sonar APIPerplexity customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.