Iterative Editing And Diff
AI App Builder — Bolt (StackBlitz)
Bolt evals — Iterative Editing & Diff (relift v3 InfraRed)
About Bolt
Bolt is StackBlitz's AI app builder at bolt.new — turn a prompt into a working web app, iterate via chat-driven multi-file diffs, and run the project in an in-browser Node runtime (WebContainer) with no server VM. Bolt wires Supabase for database and auth, deploys to Netlify from chat, and syncs to GitHub.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | User asks 'rename Button to PrimaryButton everywhere.' Bolt proposes a diff touching 14 files. | Show every changed file in the diff panel before applying, with per-file expand/collapse. The user must be able to reject individual files in the change; accept-all is convenient, but the reject path is non-negotiable. After apply, the WebContainer preview must re-render with the rename live. | Pass / Fail | AI Platform | critical |
| 02 | Rename 'OrderSummary' to 'CheckoutSummary' across the project. The component is imported in 6 files and referenced in 1 test. | Update the component definition file, every import, and the test in one diff. Verify no stale references remain (a grep on the old name should return empty after apply). Partial renames produce TypeScript errors at build time. | Pass / Fail | AI Platform | high |
| 03 | Bolt proposes a 14-file diff. User wants to accept 12 and reject the changes to two files (e.g., a custom-tweaked tailwind.config.js and a hand-edited README.md). | Apply the 12 accepted files and skip the 2 rejected ones, leaving their on-disk contents untouched. Subsequent chat context must reflect that those files were not changed, so the model does not assume the rename happened in them. | Pass / Fail | AI Platform | critical |
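The checks in tests 02 and 03 can be sketched in a few lines. The following is a minimal illustration, not Bolt's implementation: it assumes a project is modeled as an in-memory map from path to file contents, and the names `apply_selected` and `stale_references` are hypothetical helpers invented for this example.

```python
import re

def apply_selected(files, proposed, rejected):
    """Apply only the accepted per-file changes (hypothetical helper).

    files:    path -> current contents
    proposed: path -> contents proposed by the diff
    rejected: set of paths the user declined
    Returns the new file map and the sorted list of paths actually changed.
    """
    result = dict(files)  # copy, so rejected files keep their original contents
    applied = []
    for path, new_text in proposed.items():
        if path in rejected:
            continue  # skipped: on-disk contents stay untouched
        result[path] = new_text
        applied.append(path)
    return result, sorted(applied)

def stale_references(files, old_name):
    """Return paths that still mention old_name as a whole identifier."""
    pattern = re.compile(rf"\b{re.escape(old_name)}\b")
    return sorted(path for path, text in files.items() if pattern.search(text))

# Illustrative project: rename Button -> PrimaryButton, rejecting README.md.
files = {
    "src/ui.tsx": "export function Button() {}",
    "src/App.tsx": "import { Button } from './ui'",
    "README.md": "Use the Button component.",
}
proposed = {
    "src/ui.tsx": "export function PrimaryButton() {}",
    "src/App.tsx": "import { PrimaryButton } from './ui'",
    "README.md": "Use the PrimaryButton component.",
}
after, applied = apply_selected(files, proposed, rejected={"README.md"})
still_old = stale_references(after, "Button")
```

Here `applied` lists only the two accepted files, while `still_old` reports `README.md`, the rejected file that legitimately keeps the old name. A list like `applied` is the kind of record the chat context would need so the model does not assume the rename landed everywhere. The word-boundary match keeps `PrimaryButton` from counting as a stale `Button` reference.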
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
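One way to read the threshold rule is sketched below. It assumes each rubric criterion receives a numeric score (a 1 to 5 scale is assumed here, the page does not state one) and that a test passes when the mean across criteria is at least 4.0 and no single criterion falls below 3. The function name `passes` and the criterion names are hypothetical.

```python
from statistics import mean

def passes(scores, mean_threshold=4.0, floor=3):
    """Return True if the scores meet both the mean and the floor requirement.

    scores: criterion name -> numeric score (scale is an assumption).
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor

# Mean 4.0 with lowest score 3: meets both conditions.
ok = passes({"diff_visibility": 5, "reject_path": 4, "preview_refresh": 3})

# Mean 4.0 but one criterion at 2: fails the floor.
floor_fail = passes({"diff_visibility": 5, "reject_path": 5, "preview_refresh": 2})

# No score below 3, but the mean is about 3.67: fails the mean.
mean_fail = passes({"diff_visibility": 4, "reject_path": 4, "preview_refresh": 3})
```

The floor matters because a high average can otherwise hide one badly failed criterion, as the second example shows.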
Rubric criteria
- Bolt
- AI Platform
- Iterative Editing And Diff