
Prompt Caching
Anthropic evals — Prompt Caching (relift v3 InfraRed)
About Anthropic
Anthropic is an AI safety company and the maker of Claude. Its API exposes the Claude model family (Opus, Sonnet, Haiku) with tool use, prompt caching, extended thinking, batch processing, vision, the Files and Memory tools, and the Claude Agent SDK.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent reuses a 12k-token system prompt across many requests. First request adds cache_control={type:'ephemeral'} on the system content block. | First request returns usage.cache_creation_input_tokens covering the system block. Subsequent requests within TTL return usage.cache_read_input_tokens for the same span. Verify both counters before claiming cost savings. | Pass / Fail · AI Platform · high |
| 02 | Agent attempts 5 cache_control breakpoints: tools, system, messages[0], messages[2], messages[4]. | Per docs the maximum is 4 cache breakpoints in a single request. The 5th breakpoint causes invalid_request_error. Plan breakpoint placement (typically tools, system, and last 2 stable conversation prefixes). | Pass / Fail · AI Platform · medium |
| 03 | Large tools[] array (40 tools, 8k tokens of schemas) is reused. Place cache_control on the last tool entry. | cache_control on the trailing tool establishes a breakpoint covering the entire tools[] array. Tools come first in the cached prefix (tools, then system, then messages), so nothing precedes this breakpoint. Subsequent requests with an identical tools[] array hit the cache; any reordering or schema mutation invalidates it. | Pass / Fail · AI Platform · high |
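
The three sample tests above can be sketched as plain request-building and checking logic. This is a minimal sketch under stated assumptions, not the eval's actual harness: `build_request`, `count_breakpoints`, and `cache_status` are hypothetical helper names, the model ID is a placeholder, and the payload only mirrors the fields named in the table (`tools`, `system`, `messages`, `cache_control`). No network call is made.

```python
from typing import Any

MAX_BREAKPOINTS = 4  # limit stated in test 02
EPHEMERAL = {"type": "ephemeral"}


def build_request(system_text: str, tools: list[dict[str, Any]],
                  messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a Messages-API-style payload with two cache breakpoints:
    one on the last tool (test 03) and one on the system block (test 01)."""
    cached_tools = [dict(t) for t in tools]  # shallow copies; caller's list is untouched
    if cached_tools:
        cached_tools[-1]["cache_control"] = dict(EPHEMERAL)
    return {
        "model": "MODEL_ID_PLACEHOLDER",  # assumption: substitute a real model ID
        "max_tokens": 1024,
        "tools": cached_tools,
        "system": [{"type": "text", "text": system_text,
                    "cache_control": dict(EPHEMERAL)}],
        "messages": messages,
    }


def count_breakpoints(request: dict[str, Any]) -> int:
    """Count cache_control markers on tools, system blocks, and message content blocks."""
    count = sum(1 for t in request.get("tools", []) if "cache_control" in t)
    system = request.get("system", [])
    if isinstance(system, list):
        count += sum(1 for b in system if "cache_control" in b)
    for msg in request.get("messages", []):
        content = msg.get("content")
        if isinstance(content, list):
            count += sum(1 for b in content
                         if isinstance(b, dict) and "cache_control" in b)
    return count


def cache_status(usage: dict[str, int]) -> str:
    """Classify usage counters (test 01). A response that both reads and
    writes cache is reported as a hit."""
    if usage.get("cache_read_input_tokens", 0) > 0:
        return "hit"
    if usage.get("cache_creation_input_tokens", 0) > 0:
        return "write"
    return "none"
```

A caller could check `count_breakpoints(request) <= MAX_BREAKPOINTS` before sending, so the case in test 02 is caught locally, and then pass the response's `usage` fields to `cache_status` on the first and later requests to confirm a write followed by hits.
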
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across the rubric criteria, with no single criterion scoring below 3.
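
One reading of the pass rule can be written as a small check. This is a sketch only: it assumes each criterion receives a single numeric score (the source does not state the scale), and the function name `passes` and the criterion keys in the usage note are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True when the mean score meets the threshold and
    no individual criterion falls below the floor."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, `passes({"criterion_a": 5, "criterion_b": 4, "criterion_c": 3})` passes on both conditions, while replacing the last score with 2 fails on the floor even if the mean stays at 4.0 or above.
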
Rubric criteria
- Anthropic
- AI Platform
- Prompt Caching