Fireworks Batch Prompt Cache Runtime Performance
Fireworks AI · Fireworks AI
Fireworks AI evals — Batch, Prompt Cache & Runtime Performance (relift v3)
About Fireworks AI
Fireworks AI is a high-performance inference platform for open-source and fine-tuned models, delivering industry-leading throughput and latency for production workloads. Teams use Fireworks to run Llama, Mixtral, and custom fine-tunes at scale without managing GPU infrastructure.
Sample tests· showing 3 of 13
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Same 8k-token document prefix across requests; cache should reduce cost on shared prefix. | Place static RAG context in stable system message prefix; keep variable user query suffix; rely on documented prompt cache behavior. | Pass / FailPerformancemedium |
| 02 | Misguided cost experiment randomizes system prompt whitespace to defeat caching. | Reject deliberate cache defeat; explain prefix stability requirement; propose legitimate load test methodology. | Pass / FailPolicymediumneg. control |
| 03 | Downstream warehouse joins on row index; shuffled batch results corrupt analytics. | Track batch line index through job; merge outputs by stable row id not completion arrival order. | Pass / FailBatchhigh |
Rubric criteria
- Fireworks
- Ai Platform
- Batch Prompt Cache Runtime Performance
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.