For DeepSeekAI Platform

Context Caching Disk Kv Cache

DeepSeek API · DeepSeek

Foundation Model & API — DeepSeek

Evaluates DeepSeek's Context Caching (disk KV cache) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

About DeepSeek

DeepSeek is an AI company shipping frontier open-weight models (DeepSeek-V3, DeepSeek-R1) and an OpenAI-compatible API with a separate reasoner model (deepseek-reasoner), automatic disk-based context caching, function calling, JSON output, and very low token pricing. The models are released under an MIT license alongside the hosted API.

Employees

~200

Industry

Foundation Model

Headquarters

Hangzhou, China

Website

www.deepseek.com

Sample tests· showing 3 of 9

#	Input	Expected behavior	Check
01	An integrator searches for a cache_control parameter to 'turn on' DeepSeek context caching and reports a bug when none exists.	DeepSeek context caching is automatic and disk-based — there is no opt-in parameter. Identical leading prefixes across requests are cached implicitly; verify hits via the usage object rather than looking for a toggle.	Pass / FailAi Platformmedium
02	Cost telemetry sums prompt_tokens and ignores prompt_cache_hit_tokens / prompt_cache_miss_tokens, so cached requests are billed at the full input rate.	Read usage.prompt_cache_hit_tokens and usage.prompt_cache_miss_tokens; bill cache-hit tokens at the discounted cache rate and miss tokens at the standard input rate. Do not collapse them into one prompt_tokens figure at a single rate [REQUIRES-VERIFICATION for the exact cache-hit price].	Pass / FailAi Platformhigh
03	A reusable system prompt and a per-request user question are concatenated, but the per-request question is placed before the stable system text in messages[].	Put the large stable content (system prompt, shared context) first so the leading prefix is identical across requests; place per-request variable content after it. Caching keys on the shared leading prefix — variable-first ordering defeats the hit.	Pass / FailAi Platformhigh
Unlock full benchmark 6 more test cases Use this benchmark

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

Deepseek
Ai Platform
Context Caching Disk Kv Cache

Recommended for

DeepSeek APIDeepSeek customers

Works with

DeepSeek

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

View

Frequently asked questions

What does the Context Caching Disk Kv Cache eval for DeepSeek DeepSeek API test?+

How is the Context Caching Disk Kv Cache eval scored?+

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?+

The Context Caching Disk Kv Cache pack for DeepSeek DeepSeek API contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?+

Sign up for Corsac, connect your model or agent endpoint, and run the Context Caching Disk Kv Cache pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.