Chat To App Generation
Lovable evals — Chat-to-App Generation (relift v3 InfraRed)
About Lovable
Lovable is an AI full-stack engineer that lets anyone build and deploy real web applications by chatting in natural language. Users describe what they want; Lovable writes the code, manages the database, and deploys the app, with no technical background required.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | User prompts 'build me a SaaS landing page with login.' Lovable must scaffold a new project from scratch. | Scaffold the documented default (Vite + React + TypeScript + Tailwind) unless the user explicitly asked for Next.js. Produce an initial commit on the default branch, generate a runnable preview URL, and confirm the framework choice in chat so the user can switch before further work. | Pass / Fail | AI Platform | high |
| 02 | Lovable has just scaffolded a new project, and the preview URL loads from the first build. | The first build must succeed: no missing dependencies, no type errors blocking the build, no unresolved TypeScript strict-mode errors, and the preview renders without an error overlay. If a build error occurs, the agent fixes it within the same turn before declaring the project ready. | Pass / Fail | AI Platform | critical |
| 03 | User prompts 'build a SvelteKit + Drizzle stack.' SvelteKit is not on Lovable's supported framework list per docs. | State in chat that SvelteKit is not currently supported, list the supported frameworks (Vite + React, Next.js per docs), and either propose the closest supported scaffold or pause for the user's choice. Do not silently scaffold a lookalike React project and label it SvelteKit. | Pass / Fail | AI Platform | medium |
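The framework-selection behavior that tests 01 and 03 describe can be sketched as a small decision function. This is a minimal sketch assuming simple keyword matching on the prompt; `choose_scaffold`, `ScaffoldDecision`, and the keyword table are hypothetical and are not Lovable's implementation. The supported list mirrors the table's "Vite + React, Next.js per docs" wording.

```python
from dataclasses import dataclass
from typing import Optional

# Supported stacks as listed in the table ("Vite + React, Next.js per docs").
SUPPORTED = {
    "vite-react": "Vite + React + TypeScript + Tailwind",
    "nextjs": "Next.js",
}
DEFAULT = "vite-react"

# Hypothetical keyword map used to spot explicitly requested, unsupported frameworks.
UNSUPPORTED_KEYWORDS = {"sveltekit": "SvelteKit", "nuxt": "Nuxt", "angular": "Angular"}


@dataclass
class ScaffoldDecision:
    framework: Optional[str]  # key into SUPPORTED, or None when pausing for the user
    message: str              # what the agent should say in chat


def choose_scaffold(prompt: str) -> ScaffoldDecision:
    text = prompt.lower()
    # Test 03: an unsupported framework is named, so say so and pause for a choice.
    for keyword, name in UNSUPPORTED_KEYWORDS.items():
        if keyword in text:
            options = ", ".join(SUPPORTED.values())
            return ScaffoldDecision(
                None,
                f"{name} is not currently supported. Supported frameworks: {options}. "
                "Which would you like to use?",
            )
    # An explicit Next.js request overrides the default.
    if "next.js" in text or "nextjs" in text:
        return ScaffoldDecision("nextjs", "Scaffolding a Next.js project as requested.")
    # Test 01: no explicit framework, so use the default and confirm it in chat.
    return ScaffoldDecision(
        DEFAULT,
        f"Using the default stack ({SUPPORTED[DEFAULT]}). "
        "Tell me if you would rather start with Next.js.",
    )
```

Keyword matching here only stands in for intent detection; the point is the shape of the decision: refuse and list alternatives for unsupported stacks, honor an explicit Next.js request, otherwise fall back to the default and say so.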
How this eval is graded
Grade each test against `expected.ideal_behavior` and `expected.rubric`. A pass requires a mean score of at least 4.0 across the rubric criteria, with no individual criterion scoring below 3.
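The pass rule can be expressed as a short check. This is a minimal sketch under the reading that each rubric criterion receives a numeric score and both thresholds apply to the same set of scores; `rubric_pass` and the criterion names below are hypothetical.

```python
from statistics import mean


def rubric_pass(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass when the mean rubric score is at least `min_mean`
    and no single criterion scores below `floor`."""
    if not scores:
        raise ValueError("at least one rubric score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical criterion names and scores, for illustration only.
print(rubric_pass({"framework_choice": 4, "build_health": 5, "communication": 4}))  # True
print(rubric_pass({"framework_choice": 5, "build_health": 5, "communication": 2}))  # False: 2 < 3
print(rubric_pass({"framework_choice": 5, "build_health": 3, "communication": 3}))  # False: mean < 4.0
```

The floor matters: the second example averages exactly 4.0 but still fails, because one weak criterion cannot be offset by strong ones.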
Rubric criteria
- Lovable
- AI Platform
- Chat To App Generation