Evals for Fireworks AI
Six evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Fireworks AI products.
About Fireworks AI
Fireworks AI is a high-performance inference platform for open-source and fine-tuned models, built for high throughput and low latency in production workloads. Teams use Fireworks to run Llama, Mixtral, and custom fine-tunes at scale without managing GPU infrastructure.
Available eval packs for Fireworks AI
6 packs ready to run.
Fireworks Batch Prompt Cache Runtime Performance
Fireworks AI evals — Batch, Prompt Cache & Runtime Performance (relift v3)
Fireworks Deployment Topology Capacity
Fireworks AI evals — Deployment Topology & Capacity (relift v3)
Fireworks Fine Tuning Multi Lora Serving
Fireworks AI evals — Fine-Tuning & Multi-LoRA Serving (relift v3)
Fireworks Function Calling Tool Orchestration
Fireworks AI evals — Function Calling & Tool Orchestration (relift v3)
Fireworks Safety Moderation Observability
Fireworks AI evals — Safety, Moderation & Observability (relift v3)
Fireworks Structured Outputs Grammar Constraints
Fireworks AI evals — Structured Outputs & Grammar Constraints (relift v3)
Why eval Fireworks AI
Fireworks AI's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Fireworks AI measures the four dimensions teams care about most when deploying AI platform agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Fireworks AI's public product surface and runnable in Corsac with your own data.