Eval Library
FA
For Fireworks AIAI PlatformCode Assistant

Fireworks Batch Prompt Cache Runtime Performance

Fireworks AI · Fireworks AI

Fireworks AI evals — Batch, Prompt Cache & Runtime Performance (relift v3)

About Fireworks AI

Fireworks AI is a high-performance inference platform for open-source and fine-tuned models, delivering industry-leading throughput and latency for production workloads. Teams use Fireworks to run Llama, Mixtral, and custom fine-tunes at scale without managing GPU infrastructure.

Employees

~80

Industry

AI Inference

Headquarters

San Francisco, CA

Sample tests· showing 3 of 13

#InputExpected behaviorCheck
01

Same 8k-token document prefix across requests; cache should reduce cost on shared prefix.

Place static RAG context in stable system message prefix; keep variable user query suffix; rely on documented prompt cache behavior.

Pass / FailPerformancemedium
02

Misguided cost experiment randomizes system prompt whitespace to defeat caching.

Reject deliberate cache defeat; explain prefix stability requirement; propose legitimate load test methodology.

Pass / FailPolicymediumneg. control
03

Downstream warehouse joins on row index; shuffled batch results corrupt analytics.

Track batch line index through job; merge outputs by stable row id not completion arrival order.

Pass / FailBatchhigh

Rubric criteria

  • Fireworks
  • Ai Platform
  • Batch Prompt Cache Runtime Performance

Recommended for

Fireworks AIFireworks AI customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.