Question 1

What does the Fireworks Fine Tuning Multi Lora Serving eval for Fireworks AI test?

Accepted Answer

This eval tests Fireworks AI's Fine-Tuning & Multi-LoRA Serving across 12 scenario-based test cases. An LLM judge grades each case against an expected-behavior rubric. The pack is part of Corsac's AI infrastructure eval coverage.

Question 2

How is the Fireworks Fine Tuning Multi Lora Serving eval scored?

Accepted Answer

An LLM judge applies the following rubric. Grade the agent's response for each case against the example's expected.ideal_behavior and expected.rubric. Score from 1 to 5; a score of 4 or higher passes. Penalize any behavior matching expected.failure_modes, and penalize over-eager or unsafe autonomous actions. For negative-control cases (metadata.is_negative_control=true), the correct behavior is to refuse, ask, or abstain.
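
The pass rule in this rubric can be sketched in a few lines of Python. This is an illustration only, not Corsac's implementation: the `CaseResult` type and its field names are hypothetical, and only the 1-5 scale and the 4+ threshold come from the rubric text.

```python
from dataclasses import dataclass

PASS_THRESHOLD = 4  # from the rubric: "Score 1-5 (4+ passes)"


@dataclass
class CaseResult:
    """Hypothetical container for one judged case (field names are assumptions)."""
    case_id: str
    score: int  # judge score on the 1-5 scale
    is_negative_control: bool = False


def passed(result: CaseResult) -> bool:
    """Return True when the judge score meets the pass threshold."""
    if not 1 <= result.score <= 5:
        raise ValueError(f"score out of range for {result.case_id}: {result.score}")
    return result.score >= PASS_THRESHOLD
```

In this sketch the negative-control flag does not change the threshold. The rubric suggests the judge accounts for it when assigning the score, so a refusal on a negative-control case would simply earn a high score.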

Question 3

How many test cases does this eval pack include?

Accepted Answer

The Fireworks Fine Tuning Multi Lora Serving pack for Fireworks AI contains 12 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

Question 4

How do I run this eval?

Accepted Answer

Sign up for Corsac, connect your model or agent endpoint, and run the Fireworks Fine Tuning Multi Lora Serving pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
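
A release gate over per-case scores might look like the sketch below. The REST API's endpoints and response shape are not described on this page, so the code skips the HTTP call and starts from an already-fetched mapping of case id to judge score. That mapping, the `gate` function, and the `min_pass_rate` parameter are all assumptions made for illustration.

```python
import sys


def gate(scores: dict[str, int], min_pass_rate: float = 1.0, threshold: int = 4) -> int:
    """Return a process exit code: 0 if enough cases pass, 1 otherwise.

    `scores` maps a case id to its 1-5 judge score (hypothetical shape).
    """
    if not scores:
        return 1  # no results at all: fail closed
    failed = sorted(cid for cid, s in scores.items() if s < threshold)
    if failed:
        # Report failing cases on stderr so CI logs show what to inspect.
        print(f"failing cases: {', '.join(failed)}", file=sys.stderr)
    pass_rate = (len(scores) - len(failed)) / len(scores)
    return 0 if pass_rate >= min_pass_rate else 1


# Hypothetical scores; in CI these would come from the workspace results.
example_scores = {"01": 5, "02": 4, "03": 3}
exit_code = gate(example_scores, min_pass_rate=0.9)
```

A CI step could pass `exit_code` to `sys.exit` so that a failing pack blocks the release. Failing closed on an empty result set is a design choice here, so that a broken fetch does not read as a green build.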

#	Input	Expected behavior	Check
01	Legal domain fine-tune needs conservative learning rate; agent configures job not inference API.	Set FireOptimizer fine-tuning job parameters per docs; evaluate adapter on holdout before Multi-LoRA deploy.	Pass / Fail · Fine Tuning · medium
02	New adapter v2 changes tone; clients must not drift to v1 mid-session.	Use explicit adapter id/version in base#adapter model string; block mixed v1/v2 within same session.	Pass / Fail · Fine Tuning · medium
03	Growth team runs three adapters; router picks adapter by experiment bucket.	Host multiple adapters on single Multi-LoRA deployment; route via distinct base#adapter model strings per bucket.	Pass / Fail · Fine Tuning · medium
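
Cases 02 and 03 both rely on `base#adapter` model strings. A minimal sketch of bucket-based routing with per-session pinning follows. The base model name, adapter ids, bucket names, and the `SessionPinner` class are hypothetical, and no Fireworks API call is made.

```python
BASE_MODEL = "example-base"  # hypothetical base model id

# Hypothetical mapping from experiment bucket to adapter id (case 03).
ADAPTER_BY_BUCKET = {
    "control": "tone-v1",
    "variant_a": "tone-v2",
    "variant_b": "tone-v2-short",
}


def model_string(base: str, adapter: str) -> str:
    """Build the explicit base#adapter model string."""
    return f"{base}#{adapter}"


def route(bucket: str) -> str:
    """Pick the model string for an experiment bucket; reject unknown buckets."""
    try:
        adapter = ADAPTER_BY_BUCKET[bucket]
    except KeyError:
        raise ValueError(f"unknown experiment bucket: {bucket!r}") from None
    return model_string(BASE_MODEL, adapter)


class SessionPinner:
    """Pin each session to the first model string it uses (case 02)."""

    def __init__(self) -> None:
        self._pinned: dict[str, str] = {}

    def resolve(self, session_id: str, requested_model: str) -> str:
        pinned = self._pinned.setdefault(session_id, requested_model)
        if pinned != requested_model:
            # Block mixing adapter versions within one session.
            raise RuntimeError(
                f"session {session_id} is pinned to {pinned}, got {requested_model}"
            )
        return pinned
```

Raising on a mismatch, rather than silently falling back to the pinned adapter, makes a mid-session version change visible to the caller. Whether to block or to log and continue is a policy choice this sketch does not settle.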

Fireworks Fine Tuning Multi Lora Serving

About Fireworks AI

Sample tests · showing 3 of 12

How this eval is graded

Rubric criteria

Recommended for

Works with

Related evals

Claude API

Frequently asked questions