Bolt evals — Prompt-to-App Generation (relift v3 InfraRed)
About Bolt
Bolt is StackBlitz's AI app builder at bolt.new. It turns a prompt into a working web app, iterates on it through chat-driven multi-file diffs, and runs the project in an in-browser Node runtime (WebContainer), so no server VM is needed. Bolt wires up Supabase for database and auth, deploys to Netlify from chat, and syncs to GitHub.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | User opens bolt.new and types: 'Build a marketing site with a blog and SSR; I want it deployed to Netlify.' Bolt must pick a starter framework that supports SSR and produces Netlify-friendly output. | Pick a framework whose default build output is Netlify-deployable without extra config (e.g., Next.js, Astro, SvelteKit, Remix). State the choice in chat with one sentence of rationale before scaffolding, so the user can correct it on turn 1 without burning more tokens. Do not pick a CSR-only Vite + Re… | Pass / Fail | AI Platform | high |
| 02 | First-turn scaffold for a Next.js app. The chat declares it will create app/, components/, lib/, package.json, and next.config.js, but the diff contains only app/ and package.json. | Every file the chat declares it will create must appear in the applied diff, and the WebContainer preview must boot. A missing next.config.js, or a missing tailwind.config.js when the prompt asked for Tailwind, is a scaffold failure. If a file is intentionally deferred, say so explicitly in chat. | Pass / Fail | AI Platform | critical |
| 03 | User asks for a 'Shopify-style storefront with cart and checkout.' Bolt can either start from a known StackBlitz Next.js commerce template or scaffold from scratch. | Prefer a known-good StackBlitz/WebContainer-compatible starter when one exists; state the choice and name or link the template. Scaffolding from scratch is more flexible but burns more tokens and risks framework-specific bugs the starter has already fixed. | Pass / Fail | AI Platform | medium |
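The file-coverage part of test 02 can be sketched as a small check. This is a minimal sketch, assuming the harness can already extract the paths the chat declares and the file paths in the applied diff as plain repo-relative strings; the function name, the trailing-slash convention for directories, and the `deferred` parameter are hypothetical, not part of Bolt or of this eval's published harness.

```python
from typing import Iterable


def missing_declared_paths(
    declared: Iterable[str],
    applied: Iterable[str],
    deferred: Iterable[str] = (),
) -> list[str]:
    """Return the declared paths that the applied diff does not cover.

    Hypothetical convention: an entry ending in "/" names a directory and is
    covered when at least one applied file sits under it; any other entry must
    match an applied file path exactly. Entries the chat explicitly announced
    as deferred are skipped.
    """
    applied_files = set(applied)
    skipped = set(deferred)
    missing = []
    for entry in declared:
        if entry in skipped:
            continue  # deferral was stated in chat, so it is not a failure
        if entry.endswith("/"):
            covered = any(path.startswith(entry) for path in applied_files)
        else:
            covered = entry in applied_files
        if not covered:
            missing.append(entry)  # keep the order in which the chat declared paths
    return missing


# Test 02's scenario: five declared paths, but the diff only touches app/ and package.json.
declared = ["app/", "components/", "lib/", "package.json", "next.config.js"]
applied = ["app/layout.tsx", "app/page.tsx", "package.json"]
print(missing_declared_paths(declared, applied))  # ['components/', 'lib/', 'next.config.js']
```

Under this sketch, test 02 fails whenever the returned list is non-empty. The check covers only declared-versus-applied files; whether the WebContainer preview actually boots still has to be verified by starting it.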
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 with no criterion scored below 3.
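One reading of the pass rule, written as code. This sketch assumes each criterion receives a single numeric score and that the 4.0 threshold applies to the mean across a test's criteria while 3 is a per-criterion floor; the function name and the criterion names in the example are hypothetical.

```python
from statistics import mean


def rubric_passes(
    scores: dict[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True when the mean score is at least mean_threshold and no criterion scores below floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores, for illustration only.
print(rubric_passes({"framework_choice": 5, "rationale": 4, "netlify_ready": 3}))  # True: mean 4.0, lowest 3
print(rubric_passes({"framework_choice": 5, "rationale": 5, "netlify_ready": 2}))  # False: mean 4.0, but one score is 2
```

The floor matters because a high average can hide one failing criterion: both examples average exactly 4.0, but only the first passes.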
Rubric criteria
- Bolt
- AI Platform
- Prompt To App Generation