DeepSeek evals — FIM / Completions (beta) (relift v3 InfraRed)

About DeepSeek

DeepSeek is an AI company shipping frontier open-weight models (DeepSeek-V3, DeepSeek-R1) and an OpenAI-compatible API with a separate reasoner model (deepseek-reasoner), automatic disk-based context caching, function calling, JSON output, and very low token pricing. The models are released under an MIT license alongside the hosted API.

Employees

~200

Industry

Foundation Model

Headquarters

Hangzhou, China

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

An integrator calls the FIM/completions endpoint against the standard base_url https://api.deepseek.com and gets a 404.

The FIM / completions feature is a beta endpoint — call it with base_url https://api.deepseek.com/beta (the documented beta host). Do not route beta completions through the standard chat host.

Pass / Fail · AI Platform · high
02

A FIM completion for a single function keeps generating into the next function because no stop sequence is set.

Set stop tokens appropriate to the language/context (e.g. a closing brace at the right indentation, or a newline-delimited boundary) so the completion ends at the intended span. Cap max_tokens as a backstop.

Pass / Fail · AI Platform · medium
03

A code-completion feature wants the model to fill a gap between existing before-text and after-text, but sends only the prompt (before) with no suffix.

Use the FIM shape: send the leading code as prompt and the trailing code as suffix so the model completes the middle. Omitting suffix turns FIM into a plain prefix completion and the model cannot honor the following context.

Pass / Fail · AI Platform · high
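The three sample tests above can be combined into one request-building sketch. This is a minimal illustration, not DeepSeek's client code: build_fim_request is a hypothetical helper, and the /completions path, the deepseek-chat model name, and the Bearer authorization header are assumptions based on the API being OpenAI-compatible. Only the beta base_url and the prompt, suffix, stop, and max_tokens fields come from the tests themselves. The helper builds the request without sending it.

```python
import json
import urllib.request

# Beta host named in sample test 01; the standard host is expected to 404.
BETA_BASE_URL = "https://api.deepseek.com/beta"


def build_fim_request(prompt, suffix, stop, api_key,
                      max_tokens=128,
                      base_url=BETA_BASE_URL,
                      model="deepseek-chat"):  # model name is an assumption
    """Hypothetical helper: build (but do not send) a FIM completion request."""
    # Test 01: beta completions must go through the beta base_url.
    if base_url.rstrip("/") != BETA_BASE_URL:
        raise ValueError("FIM completions are beta: use " + BETA_BASE_URL)
    # Test 03: without a suffix this is a plain prefix completion, not FIM.
    if not suffix:
        raise ValueError("send the trailing code as suffix")
    # Test 02: require a stop sequence; max_tokens is only the backstop.
    if not stop:
        raise ValueError("set at least one stop sequence")

    payload = {
        "model": model,
        "prompt": prompt,          # code before the gap
        "suffix": suffix,          # code after the gap
        "stop": list(stop),        # where the completion should end
        "max_tokens": max_tokens,  # hard cap in case no stop sequence matches
    }
    return urllib.request.Request(
        url=BETA_BASE_URL + "/completions",  # path is an assumption
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Making the stop sequence mandatory is a choice of this sketch that mirrors test 02, not something the source says the API enforces.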

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across the rubric criteria is at least 4.0 and no single criterion scores below 3.
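The pass rule can be sketched as a small check. This is an illustration under stated assumptions: the source does not give the scoring scale or the shape of the score data, so passes_rubric and its dict-of-scores input are hypothetical, and the rule is read as a mean taken across criteria combined with a per-criterion floor.

```python
def passes_rubric(scores, mean_threshold=4.0, floor=3.0):
    """Hypothetical check: mean >= 4.0 and no criterion below 3.

    scores maps a criterion name to its numeric score (scale assumed).
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    mean = sum(values) / len(values)
    # Both conditions must hold: a high mean cannot offset one weak criterion.
    return mean >= mean_threshold and min(values) >= floor
```

For example, scores of 5, 5, and 2 average to 4.0 but still fail, because one criterion falls below the floor of 3.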

Rubric criteria

  • DeepSeek
  • AI Platform
  • FIM Completions Beta

Recommended for

DeepSeek API · DeepSeek customers
