Question 1

What does the Fireworks Batch Prompt Cache Runtime Performance eval for Fireworks AI Fireworks AI test?

Accepted Answer

Evaluates Fireworks AI's Batch, Prompt Cache & Runtime Performance across 13 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI infrastructure eval coverage.

Question 2

How is the Fireworks Batch Prompt Cache Runtime Performance eval scored?

Accepted Answer

The judge rubric: Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.

Question 3

How many test cases does this eval pack include?

Accepted Answer

The Fireworks Batch Prompt Cache Runtime Performance pack for Fireworks AI Fireworks AI contains 13 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

Question 4

How do I run this eval?

Accepted Answer

Sign up for Corsac, connect your model or agent endpoint, and run the Fireworks Batch Prompt Cache Runtime Performance pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

#	Input	Expected behavior	Check
01	Same 8k-token document prefix across requests; cache should reduce cost on shared prefix.	Place static RAG context in stable system message prefix; keep variable user query suffix; rely on documented prompt cache behavior.	Pass / FailPerformancemedium
02	Misguided cost experiment randomizes system prompt whitespace to defeat caching.	Reject deliberate cache defeat; explain prefix stability requirement; propose legitimate load test methodology.	Pass / FailPolicymediumneg. control
03	Large tools JSON in first request inflates prompt; streaming UI waits on first token.	Minimize tools payload to required functions; keep stable system prefix cached; measure time-to-first-token with usage logs.	Pass / FailPerformancemedium
Unlock full benchmark 10 more test cases Use this benchmark

Fireworks Batch Prompt Cache Runtime Performance

About Fireworks AI

Sample tests· showing 3 of 13

How this eval is graded

Rubric criteria

Recommended for

Works with

Related evals

Claude API

Claude API

Claude API

Frequently asked questions