
Fireworks Deployment Topology Capacity


AI infrastructure — Fireworks AI

Evaluates Fireworks AI's Deployment Topology & Capacity across 13 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI infrastructure eval coverage.

About Fireworks AI

Fireworks AI is a high-performance inference platform for open-source and fine-tuned models, delivering industry-leading throughput and latency for production workloads. Teams use Fireworks to run Llama, Mixtral, and custom fine-tunes at scale without managing GPU infrastructure.

Employees: ~80

Industry: AI Inference

Headquarters: San Francisco, CA

Website: fireworks.ai

Sample tests · showing 3 of 13

#	Input	Expected behavior	Check
01	Prototype uses shared serverless; production SLA needs predictable capacity via firectl deployment create.	Recommend on-demand or dedicated deployment via firectl when SLA requires reserved capacity; keep serverless for bursty dev traffic.	Pass / Fail · Deployment · high
02	Capacity planner needs deployment GPU type; numeric SKU catalog [REQUIRES-VERIFICATION].	Recommend sizing via Fireworks console/firectl guidance; mark specific GPU SKU counts as [REQUIRES-VERIFICATION] until confirmed in current catalog.	Pass / Fail · Deployment · medium
03	E-commerce chatbot uses firectl dedicated deployment in us-east-1. Status page shows elevated latency in that region but us-west-2 capacity exists. Operator wants automatic DNS flip without changing model weights URI.	Agent documents region-specific deployment endpoints, verifies whether Fireworks supports automatic cross-region failover or requires client-side routing, and tags [REQUIRES-VERIFICATION] on unpublished regional GPU SKU matrix before promising H100 availability in us-west-2.	Pass / Fail · Deployment · high

How this eval is graded

Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.
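The pass rule in the rubric above can be sketched in a few lines of Python. This is only an illustration of the "Score 1-5 (4+ passes)" threshold; the `CaseResult` type and its field names are hypothetical and are not taken from Corsac's actual schema.

```python
from dataclasses import dataclass

PASS_THRESHOLD = 4  # from the rubric: a score of 4 or 5 passes


@dataclass
class CaseResult:
    """Hypothetical container for one judged case (not Corsac's schema)."""
    case_id: str
    score: int  # judge score on the 1-5 scale
    is_negative_control: bool = False  # mirrors metadata.is_negative_control


def passes(result: CaseResult) -> bool:
    """Apply the 4+ pass rule after checking the score is on the 1-5 scale."""
    if not 1 <= result.score <= 5:
        raise ValueError(f"score out of range for case {result.case_id}: {result.score}")
    # Negative-control cases use the same threshold; the judge is expected to
    # score them highly only when the agent refused, asked, or abstained.
    return result.score >= PASS_THRESHOLD
```

Under these assumptions, `passes(CaseResult("01", 4))` is true and `passes(CaseResult("02", 3))` is false.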

Rubric criteria

Fireworks
AI Platform
Deployment Topology Capacity

Recommended for

Fireworks AI customers

Works with

Fireworks AI

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Fireworks Deployment Topology Capacity eval for Fireworks AI test?

How is the Fireworks Deployment Topology Capacity eval scored?

The judge rubric: Grade the agent's response for this case against the example's expected.ideal_behavior and expected.rubric. Score 1-5 (4+ passes). Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.

How many test cases does this eval pack include?

The Fireworks Deployment Topology Capacity pack for Fireworks AI contains 13 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Fireworks Deployment Topology Capacity pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
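A release gate of the kind described above might look like the following sketch. The REST API's endpoints and response format are not documented on this page, so the example starts from per-case scores that have already been fetched; the `gate` function and its `min_pass_rate` parameter are hypothetical names used for illustration only.

```python
PASS_SCORE = 4  # rubric: scores of 4 or 5 pass


def gate(scores: dict[str, int], min_pass_rate: float = 1.0) -> int:
    """Return a CI exit code: 0 if enough cases pass, 1 otherwise.

    `scores` maps a case ID to its 1-5 judge score. `min_pass_rate` is an
    assumed, configurable threshold (1.0 means every case must pass).
    """
    if not scores:
        return 1  # treat an empty result set as a failure, not a silent pass
    passed = sum(1 for score in scores.values() if score >= PASS_SCORE)
    return 0 if passed / len(scores) >= min_pass_rate else 1
```

A CI step could call `sys.exit(gate(scores))` so that the pipeline fails whenever the pack falls below the chosen pass rate.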
