Token Credit Economy
Bolt evals — Token / Credit Economy (relift v3 InfraRed)
About Bolt
Bolt is StackBlitz's AI app builder at bolt.new. It turns a prompt into a working web app, lets you iterate through chat-driven multi-file diffs, and runs the project in an in-browser Node runtime (WebContainer) with no server VM. Bolt wires up Supabase for database and auth, deploys to Netlify from chat, and syncs to GitHub.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | User is about to send a chat turn that will likely cost a large fraction of their daily allowance (e.g., scaffold a 60-file app). | Surface the current allowance state in the UI before send and indicate when a turn is expected to consume a major share. Do not silently exhaust the allowance mid-turn without warning. Numeric estimates are best-effort [REQUIRES-VERIFICATION on exact-cost-estimation availability]. | Pass / Fail | Ai Platform | high |
| 02 | A multi-file edit turn consumes the daily allowance partway through generation. Only 6 of the 14 file edits were emitted. | Roll back the partial diff to the pre-turn state, surface a clear 'allowance exhausted' message, and tell the user when the next refill is. Do NOT apply a half-finished diff that leaves the project in a broken intermediate state. | Pass / Fail | Ai Platform | critical |
| 03 | Bolt's edit produces invalid code that does not build; the user rolls back via the chat UI. | Document refund / credit-back behavior for failed turns: model errors that produce a non-buildable diff, rolled-back diffs, and explicit Bolt-side failures should follow a published policy (refunded, partially refunded, or non-refundable, with reasoning). [REQUIRES-VERIFICATION on current refund po… | Pass / Fail | Ai Platform | medium |
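A minimal sketch of the pre-send check in row 01, assuming the client knows the remaining and total daily token allowance and may or may not have a cost estimate for the pending turn. `AllowanceState`, `pre_send_notice`, and the 50% `MAJOR_SHARE` threshold are hypothetical illustrations, not Bolt's API or published values.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: a turn estimated to use at least half of the daily
# allowance counts as "a major share". Not a published Bolt value.
MAJOR_SHARE = 0.5


@dataclass
class AllowanceState:
    """Hypothetical snapshot of the user's token allowance."""
    remaining_tokens: int  # tokens left today
    daily_tokens: int      # full daily allowance


def pre_send_notice(state: AllowanceState, estimated_cost: Optional[int]) -> str:
    """Build the allowance message shown before a chat turn is sent."""
    notice = f"{state.remaining_tokens:,} of {state.daily_tokens:,} tokens left today."
    if estimated_cost is None:
        # Estimates are best-effort; still surface the allowance state.
        return notice + " No cost estimate is available for this turn."
    if estimated_cost >= state.remaining_tokens:
        return notice + " This turn may exhaust your remaining allowance."
    if estimated_cost >= MAJOR_SHARE * state.daily_tokens:
        return notice + " This turn is expected to use a major share of your daily allowance."
    return notice


state = AllowanceState(remaining_tokens=100_000, daily_tokens=150_000)
print(pre_send_notice(state, 80_000))
```

Checking for exhaustion before the major-share case means a turn that probably cannot finish always gets the stronger warning, and a missing estimate still shows the allowance state rather than nothing.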
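Row 02's all-or-nothing rule can be met by staging edits on a copy of the project and committing only once every edit has been generated (a stage-then-commit pattern, rather than undoing edits already applied). This sketch assumes the project is a plain path-to-content mapping and that generation signals an exhausted allowance by raising `AllowanceExhausted`; those assumptions, along with `apply_turn_atomically`, `simulated_edits`, and the refill time string, are hypothetical and do not describe how Bolt or WebContainer actually store files.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


class AllowanceExhausted(Exception):
    """Hypothetical signal that the token allowance ran out mid-generation."""


def apply_turn_atomically(
    project: Dict[str, str],
    edits: Iterable[Tuple[str, str]],
    next_refill: str,
) -> Tuple[Dict[str, str], Optional[str]]:
    """Apply all edits or none; return (project_state, error_message)."""
    staged = dict(project)  # work on a copy so a failure leaves `project` untouched
    try:
        for path, content in edits:
            staged[path] = content
    except AllowanceExhausted:
        # Discard the partial diff and tell the user when the allowance refills.
        message = f"Allowance exhausted; no changes were applied. Next refill: {next_refill}."
        return project, message
    return staged, None  # every edit arrived, so commit the staged copy


def simulated_edits(total: int, budget: int) -> Iterator[Tuple[str, str]]:
    """Hypothetical generator: yields `budget` edits, then runs out of allowance."""
    for i in range(total):
        if i == budget:
            raise AllowanceExhausted()
        yield f"src/file{i}.ts", f"// edited {i}"


project = {"src/file0.ts": "// original"}
state, error = apply_turn_atomically(project, simulated_edits(14, 6), "00:00 UTC")
print(state == project, error)
```

In a real editor the staged copy would be a snapshot or transaction over the file system, but the invariant the eval checks is the same: the user never ends up with 6 of 14 edits applied.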
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across criteria, with no single criterion scored below 3.
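Reading the rule as an overall pass condition (mean across criteria at least 4.0, and every criterion at least 3), it reduces to two comparisons. The sketch assumes numeric scores keyed by criterion name on a 1 to 5 scale; `eval_passes`, its parameters, and the criterion names in the example are hypothetical.

```python
from statistics import mean
from typing import Mapping


def eval_passes(scores: Mapping[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True if the mean score meets `min_mean` and no criterion falls below `floor`."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Mean 4.0 with a lowest score of 3 passes; the same mean with a 2 fails.
print(eval_passes({"allowance_ui": 5, "rollback": 4, "refund_policy": 3}))
print(eval_passes({"allowance_ui": 5, "rollback": 5, "refund_policy": 2}))
```

The floor matters because a high mean can hide one failing criterion, as the second call shows.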
Rubric criteria
- Bolt
- Ai Platform
- Token Credit Economy