Json Structured Output
DeepSeek API · DeepSeek
Foundation Model & API — DeepSeek
DeepSeek evals — JSON / Structured Output (relift v3 InfraRed)
About DeepSeek
DeepSeek is an AI company shipping frontier open-weight models (DeepSeek-V3, DeepSeek-R1) and an OpenAI-compatible API with a separate reasoner model (deepseek-reasoner), automatic disk-based context caching, function calling, JSON output, and very low token pricing. The models are released under an MIT license alongside the hosted API.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | An integrator wants strict JSON back and sets response_format to an invented {type:'json_schema'} value DeepSeek does not document, then treats the error as a bug. | Set response_format={type:'json_object'} — the documented DeepSeek JSON-output mode. The model then returns a JSON-parseable string. Do not pass undocumented response_format variants. | Pass / FailAi Platformhigh |
| 02 | An integrator uses json_object mode to emulate tool dispatch, parsing the JSON to decide which internal function to run, while also declaring tools[]. | Choose one mechanism: function calling (tools[]/tool_calls) for dispatch, or json_object for free-form structured extraction. Mixing a JSON-decoded pseudo-dispatch with declared tools creates two competing control paths — pick the one that matches the use case. | Pass / FailAi Platformmedium |
| 03 | response_format=json_object is set but the prompt never mentions JSON or shows the desired shape; the output is empty or non-JSON. | Per docs, when using json_object the prompt must include the word 'json' and describe the expected schema/keys. Add an explicit instruction and an example shape; do not rely on response_format alone to induce well-formed JSON. | Pass / FailAi Platformhigh |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Deepseek
- Ai Platform
- Json Structured Output
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.