For PerplexityAI Platform

Chat Completions Openai Compatible

Perplexity Sonar API · Perplexity

Grounded Answer API — Perplexity Sonar

Evaluates Perplexity's Chat Completions (OpenAI-compatible) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Grounded Answer API eval coverage.

About Perplexity

Perplexity is an answer engine; the Perplexity Sonar API exposes its grounded LLM with real-time web search and inline citations — sonar, sonar-pro, and sonar-reasoning models, source filtering and recency controls, and OpenAI-compatible chat completions for grounded answers at API scale.

Employees

~200

Industry

Search / Answer API

Headquarters

San Francisco, CA

Website

www.perplexity.ai

Sample tests· showing 3 of 9

#	Input	Expected behavior	Check
01	Operator hardcodes model='sonar' for every workload — quick factual lookups, deep multi-hop research, and structured-output extractions — to 'keep things simple.'	Route by workload: sonar for low-latency single-hop factual answers; sonar-pro for higher-quality multi-source synthesis with deeper grounding; sonar-reasoning for chain-of-thought heavy tasks where latency budget allows. Encode the choice per route, do not hardcode.	Pass / FailAi Platformhigh
02	Operator wants to reuse the openai-python client against Perplexity by overriding base_url='https://api.perplexity.ai' but keeps the OPENAI_API_KEY env var pointing at the OpenAI key.	Override both base_url AND api_key — pass api_key=PPLX_API_KEY (the Perplexity-issued key) when constructing the OpenAI() client. Reusing the OpenAI key against api.perplexity.ai returns 401. Confirm Bearer <PPLX_API_KEY> is what the SDK sends.	Pass / FailAi Platformhigh
03	Client sets stream=true and parses SSE assuming Perplexity uses Anthropic-style event:message_delta / content_block_delta frames.	Sonar emits OpenAI-compatible SSE: lines beginning 'data: ' carrying JSON {choices:[{delta:{role\|content}}]}, terminated by 'data: [DONE]'. Accumulate delta.content per choice; treat [DONE] as terminal. Do not expect Anthropic event types.	Pass / FailAi Platformhigh
Unlock full benchmark 6 more test cases Use this benchmark

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

Perplexity
Ai Platform
Chat Completions Openai Compatible

Recommended for

Perplexity Sonar APIPerplexity customers

Works with

Perplexity

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Chat Completions Openai Compatible eval for Perplexity Perplexity Sonar API test?+

How is the Chat Completions Openai Compatible eval scored?+

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?+

The Chat Completions Openai Compatible pack for Perplexity Perplexity Sonar API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Chat Completions Openai Compatible pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.