Chat Completions (OpenAI-Compatible)

DeepSeek API · DeepSeek

Foundation Model & API — DeepSeek

Evaluates DeepSeek's Chat Completions (OpenAI-compatible) across 10 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About DeepSeek

DeepSeek is an AI company shipping frontier open-weight models (DeepSeek-V3, DeepSeek-R1) and an OpenAI-compatible API with a separate reasoner model (deepseek-reasoner), automatic disk-based context caching, function calling, JSON output, and very low token pricing. The models are released under an MIT license alongside the hosted API.

Employees

~200

Industry

Foundation Model

Headquarters

Hangzhou, China

Website

www.deepseek.com

Sample tests · showing 3 of 10

#	Input	Expected behavior	Check
01	An existing OpenAI-SDK codebase is being pointed at DeepSeek. The integrator leaves base_url at the OpenAI default and only swaps the API key, expecting deepseek-chat to respond.	Set base_url to https://api.deepseek.com (the OpenAI SDK reuses the same client; only base_url and api_key change). Requests otherwise keep the OpenAI-compatible /chat/completions shape. Do not leave the OpenAI host in place: the DeepSeek key will return 401 against api.openai.com.	Pass / Fail · AI Platform · high
02	A latency-sensitive autocomplete feature is wired to model=deepseek-reasoner for every keystroke 'because it is smarter'.	Route latency-sensitive, low-reasoning tasks to deepseek-chat; reserve deepseek-reasoner for tasks that benefit from chain-of-thought. deepseek-reasoner emits extra reasoning_content tokens and is slower and costlier per call, so do not use it as the default for high-frequency lightweight requests.	Pass / Fail · AI Platform · medium
03	deepseek-chat is called with stream=true. The client reads SSE chunks but ignores the terminating [DONE] sentinel and treats the first empty delta as end-of-stream.	Accumulate choices[].delta.content across chunks; treat the literal data: [DONE] line as the terminal marker (OpenAI-compatible SSE). Do not end on the first empty or role-only delta, because the first chunk often carries role with empty content.	Pass / Fail · AI Platform · high
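
The expected behaviors in the three sample cases can be sketched in a few lines of standard-library Python. This is a minimal illustration, not DeepSeek's or Corsac's reference code: the helper names (pick_model, build_chat_request, accumulate_stream) and the needs_reasoning flag are hypothetical, the request is built but never sent, and the stream parser assumes a single choice per chunk and text lines that are already decoded.

```python
import json
import urllib.request

# Case 01: the host must be DeepSeek's, not the OpenAI default.
DEEPSEEK_BASE_URL = "https://api.deepseek.com"


def pick_model(needs_reasoning: bool) -> str:
    """Case 02: default to deepseek-chat; opt in to deepseek-reasoner."""
    return "deepseek-reasoner" if needs_reasoning else "deepseek-chat"


def build_chat_request(api_key, messages, needs_reasoning=False, stream=False):
    """Build (but do not send) an OpenAI-compatible /chat/completions request."""
    body = {
        "model": pick_model(needs_reasoning),
        "messages": messages,
        "stream": stream,
    }
    return urllib.request.Request(
        DEEPSEEK_BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def accumulate_stream(lines):
    """Case 03: join delta.content across SSE lines until 'data: [DONE]'."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # the only terminal marker; empty deltas do not end the stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:  # role-only or empty deltas contribute nothing
                parts.append(content)
    return "".join(parts)
```

With the OpenAI SDK the same change amounts to passing base_url and api_key when constructing the client; the sketch uses urllib only so that it runs without third-party packages.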
How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
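
One way to read the threshold rule above is as a check over a case's per-criterion judge scores. The sketch below assumes numeric scores on a scale where 3 and 4.0 are meaningful (the page does not state the scale), and the function name and criterion names are hypothetical.

```python
from statistics import mean


def case_passes(criterion_scores, mean_threshold=4.0, floor=3):
    """Pass when the mean score is >= 4.0 and no criterion is below 3."""
    if not criterion_scores:
        return False  # assumption: a case with no scores cannot pass
    values = list(criterion_scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# A high mean does not rescue a case with one criterion under the floor.
print(case_passes({"correct_host": 5, "request_shape": 5, "error_handling": 2}))
```

Keeping the floor separate from the mean is what stops one strong criterion from masking a failing one.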

Rubric criteria

DeepSeek
AI Platform
Chat Completions (OpenAI-Compatible)

Recommended for

DeepSeek API · DeepSeek customers

Works with

DeepSeek

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Chat Completions (OpenAI-Compatible) eval for the DeepSeek API test?

How is the Chat Completions (OpenAI-Compatible) eval scored?

An LLM judge grades each case against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Chat Completions (OpenAI-Compatible) pack for the DeepSeek API contains 10 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Chat Completions (OpenAI-Compatible) pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.