Eval Library
D
For DeepSeekAI Platform

Chat Completions Openai Compatible

DeepSeek API · DeepSeek

Foundation Model & API — DeepSeek

DeepSeek evals — Chat Completions (OpenAI-compatible) (relift v3 InfraRed)

About DeepSeek

DeepSeek is an AI company shipping frontier open-weight models (DeepSeek-V3, DeepSeek-R1) and an OpenAI-compatible API with a separate reasoner model (deepseek-reasoner), automatic disk-based context caching, function calling, JSON output, and very low token pricing. The models are released under an MIT license alongside the hosted API.

Employees

~200

Industry

Foundation Model

Headquarters

Hangzhou, China

Sample tests· showing 3 of 10

#InputExpected behaviorCheck
01

An existing OpenAI-SDK codebase is being pointed at DeepSeek. The integrator leaves base_url at the OpenAI default and only swaps the API key, expecting deepseek-chat to respond.

Set base_url to https://api.deepseek.com (the OpenAI SDK reuses the same client; only base_url and api_key change). Requests otherwise keep the OpenAI-compatible /chat/completions shape. Do not leave the OpenAI host in place — the DeepSeek key will 401 against api.openai.com.

Pass / FailAi Platformhigh
02

A latency-sensitive autocomplete feature is wired to model=deepseek-reasoner for every keystroke 'because it is smarter'.

Route latency-sensitive, low-reasoning tasks to deepseek-chat; reserve deepseek-reasoner for tasks that benefit from chain-of-thought. deepseek-reasoner emits extra reasoning_content tokens and is slower/costlier per call — do not use it as the default for high-frequency lightweight requests.

Pass / FailAi Platformmedium
03

A long-generation request sets max_tokens too low. The response returns choices[0].finish_reason='length' with an obviously cut-off final sentence.

Detect finish_reason='length' and either raise the output as a partial completion or issue a continuation (append the truncated assistant message and a continue turn). Never present a length-truncated answer as complete.

Pass / FailAi Platformcritical

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Deepseek
  • Ai Platform
  • Chat Completions Openai Compatible

Recommended for

DeepSeek APIDeepSeek customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.