For Bolt · AI Platform

WebContainer Runtime

Bolt.new · Bolt

AI App Builder — Bolt (StackBlitz)

Evaluates Bolt's WebContainer Runtime across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI App Builder eval coverage.

About Bolt

Bolt is StackBlitz's AI app builder at bolt.new. It turns a prompt into a working web app, iterates via chat-driven multi-file diffs, and runs the project in an in-browser Node runtime (WebContainer) with no server VM. Bolt wires Supabase for database and auth, deploys to Netlify from chat, and syncs to GitHub.

Employees

~50

Industry

AI App Builder

Headquarters

San Francisco, CA

Website

bolt.new

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	User pastes a Node script that calls os.cpus() and child_process.fork() and asks Bolt to run it inside the project.	Recognize that WebContainer runs Node in the browser tab via WebAssembly: not every Node API is available, and behavior is scoped to the tab. State the constraint, run via the standard WebContainer entry, and if a Node API is unsupported, surface the specific gap rather than retrying blindly.	Pass / Fail	AI Platform	critical
02	Scaffold runs npm install. WebContainer streams the install logs to the embedded terminal.	Wait for install completion before running the dev command. If a dependency fails to resolve (e.g., needs a native build like node-gyp / sharp), surface the error and propose a WebContainer-compatible substitute (pure-JS image lib) rather than retrying the same install.	Pass / Fail	AI Platform	high
03	Dev server in WebContainer listens on port 3000. Bolt's preview iframe shows a different URL than the user expects when they click 'open in new tab.'	Use the WebContainer-issued URL for the listening port (a webcontainer-api.io-style host); render that URL in the preview and the 'open in new tab' link. Do NOT show 'localhost:3000', which won't resolve outside the tab. Reflect the actual listening port the framework chose (for example, 5173 rather than 3000).	Pass / Fail	AI Platform	high
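
The expected behaviors in cases 02 and 03 can be sketched as two small pure functions. This is only an illustration, not Bolt's implementation: every name here (`adviseOnInstallFailure`, `previewLinks`, `SUBSTITUTES`, `ServerReady`) is hypothetical, the substitute mapping is an example rather than an official compatibility list, and the `ServerReady` shape assumes the runtime reports a port and URL pair when a server starts listening.

```typescript
// Hypothetical helpers; none of these names come from Bolt or the WebContainer API.

// Case 02: packages that need a native build, mapped to pure-JS or WASM alternatives.
// Illustrative mapping only, not an official compatibility list.
const SUBSTITUTES: Record<string, string> = {
  sharp: "jimp",
  bcrypt: "bcryptjs",
  sqlite3: "sql.js",
};

// Log fragments that usually indicate a native build step failed.
const NATIVE_BUILD_HINTS: RegExp[] = [/node-gyp/i, /gyp ERR!/i, /prebuild-install/i];

interface InstallAdvice {
  retry: boolean; // always false here: repeating the same install will not help
  failedPackage?: string;
  substitute?: string;
}

// Returns advice for a native-build failure, or null if the log shows none.
function adviseOnInstallFailure(log: string): InstallAdvice | null {
  const nativeFailure = NATIVE_BUILD_HINTS.some((re) => re.test(log));
  if (!nativeFailure) return null;
  const failedPackage: string | undefined = Object.keys(SUBSTITUTES).filter((name) =>
    new RegExp(`\\b${name}\\b`).test(log)
  )[0];
  return {
    retry: false,
    failedPackage,
    substitute: failedPackage ? SUBSTITUTES[failedPackage] : undefined,
  };
}

// Case 03: assumed shape of the "server is listening" notification.
interface ServerReady {
  port: number; // the port the framework actually chose, e.g. 5173
  url: string;  // the runtime-issued URL for that port
}

interface PreviewLinks {
  iframeSrc: string;
  newTabHref: string;
  label: string;
}

// Uses the issued URL for both the iframe and the new-tab link.
function previewLinks(ready: ServerReady): PreviewLinks {
  if (/^https?:\/\/(localhost|127\.0\.0\.1)(:|\/|$)/i.test(ready.url)) {
    throw new Error("localhost will not resolve outside the tab; use the issued URL");
  }
  return { iframeSrc: ready.url, newTabHref: ready.url, label: `port ${ready.port}` };
}
```

Both functions avoid side effects, so the same checks could run against recorded install logs or recorded server events when reproducing a failed case.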

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
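
One reading of the pass rule above is that a case passes when the mean of its criterion scores is at least 4.0 and no single criterion scores below 3. The sketch below implements that reading. The 1-to-5 scale and the function name `casePasses` are assumptions, not something this page states.

```typescript
// Assumption: the judge scores each rubric criterion on a 1-5 scale.
const MEAN_THRESHOLD = 4.0; // from the rubric: mean >= 4.0
const SCORE_FLOOR = 3;      // from the rubric: no criterion below 3

// Hypothetical helper: true when a case's criterion scores satisfy both conditions.
function casePasses(scores: number[]): boolean {
  if (scores.length === 0) return false; // nothing graded, so nothing passed
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const lowest = Math.min(...scores);
  return mean >= MEAN_THRESHOLD && lowest >= SCORE_FLOOR;
}
```

Under this reading, scores of 5, 4, 4, 3 pass (mean 4.0, lowest 3), while 5, 5, 5, 2 fail despite a mean of 4.25 because one criterion falls below the floor.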

Rubric criteria

Bolt
AI Platform
WebContainer Runtime

Recommended for

Bolt.new · Bolt customers

Works with

Bolt

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the WebContainer Runtime eval for Bolt (Bolt.new) test?

Evaluates Bolt's WebContainer Runtime across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI App Builder eval coverage.

How is the WebContainer Runtime eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The WebContainer Runtime pack for Bolt (Bolt.new) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the WebContainer Runtime pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
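
As a rough illustration of gating a release on per-case results, the sketch below decides whether a CI step should proceed. It makes no API calls. The `CaseResult` shape, the `gateRelease` name, and the rule that any failed critical case blocks the release are all assumptions, since this page does not show the REST API's response format.

```typescript
// Hypothetical per-case result; the real response shape is not shown on this page.
interface CaseResult {
  id: string;
  passed: boolean;
  severity: string; // e.g. "critical" or "high", as in the sample table
}

interface GateDecision {
  allow: boolean;
  reason: string;
}

// Blocks on any failed critical case, then compares the pass rate to a threshold.
function gateRelease(results: CaseResult[], minPassRate: number = 1.0): GateDecision {
  if (results.length === 0) return { allow: false, reason: "no results" };
  const failedCritical = results.filter((r) => !r.passed && r.severity === "critical");
  if (failedCritical.length > 0) {
    const ids = failedCritical.map((r) => r.id).join(", ");
    return { allow: false, reason: `critical case failed: ${ids}` };
  }
  const passRate = results.filter((r) => r.passed).length / results.length;
  if (passRate >= minPassRate) {
    return { allow: true, reason: `pass rate ${passRate.toFixed(2)}` };
  }
  return { allow: false, reason: `pass rate ${passRate.toFixed(2)} below ${minPassRate}` };
}
```

A CI job could exit with a non-zero status whenever `allow` is false and print `reason` in the build log.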
