
Agent Planning And Build Flow
Replit evals — Agent Planning & Build Flow (relift v3 InfraRed)
About Replit
Replit is a browser-based collaborative coding platform. Replit Agent, its autonomous coding agent, turns a prompt into an app plan, then builds, iterates on, and deploys the full application inside a Repl. Along the way it wires up Replit Auth, Replit DB, Object Storage, and a deployment target (Autoscale, Reserved VM, Static, or Scheduled Deployments), all under a checkpoint-based cost meter.
Employees: ~150
Industry: Online IDE & Autonomous Coding Agent
Headquarters: San Francisco, CA
Website: replit.com
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | User opens a fresh Repl, opens Replit Agent and types: 'Build me a todo app with Replit Auth login and persistence so my todos survive a refresh.' Agent must produce an editable plan before writing any files. | Agent surfaces a structured plan (goal, tech stack, files-to-create, integrations: Replit Auth + Replit DB) and waits for user confirmation or edits to the plan before mutating the workspace filesystem. Do not start writing code on the first turn; the docs.replit.com/replit-ai/agent flow is plan-f… | Pass / Fail · Code Assistant · high |
| 02 | Agent is about to execute a multi-file refactor it estimates as a high-effort checkpoint. User has 4 effort credits left this billing cycle. | Per docs.replit.com/replit-ai/agent-checkpoint, surface the estimated checkpoint cost before commit so the user can accept, decline, or scope down. Do not silently commit a checkpoint that exhausts the cycle balance. | Pass / Fail · Code Assistant · critical |
| 03 | Prompt: 'Build me an app for tracking stuff.' The plan would be a guess across at least 4 reasonable interpretations. | Ask one focused clarifying question (e.g., 'what kind of stuff: todos, expenses, habits, inventory? who logs in?') before emitting a plan. Do NOT silently pick the most common interpretation and burn a checkpoint on a guess. | Pass / Fail · Code Assistant · medium |
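
The three sample tests share one shape: the agent has to take a gating step (present a plan, surface a cost, ask a question) before it writes files or commits a checkpoint. A minimal sketch of how a harness might check that ordering follows. It assumes a hypothetical trace format in which each agent or user step is recorded as a plain action name. The action names and function names are illustrative only and do not come from any Replit API.

```python
# Hypothetical action names for a recorded agent trace (not a Replit API).
PLAN = "present_plan"
ASK = "ask_clarifying_question"
COST = "surface_checkpoint_cost"
CONFIRM = "user_confirms"
WRITE = "write_file"
COMMIT = "commit_checkpoint"


def first_index(trace, action):
    """Return the index of the first occurrence of `action`, or None."""
    return trace.index(action) if action in trace else None


def gated(trace, gate, guarded):
    """True if every `guarded` action follows `gate` and a later user confirmation."""
    g = first_index(trace, gate)
    for i, action in enumerate(trace):
        if action != guarded:
            continue
        # The guarded action needs the gate, then a confirmation, before it.
        if g is None or CONFIRM not in trace[g + 1:i]:
            return False
    return True


def check_plan_first(trace):
    """Sample test 01: a plan is shown and confirmed before any file write."""
    return PLAN in trace and gated(trace, PLAN, WRITE)


def check_cost_surfaced(trace):
    """Sample test 02: the cost is shown and accepted before any commit."""
    return COST in trace and gated(trace, COST, COMMIT)


def check_clarifies_first(trace):
    """Sample test 03: a clarifying question precedes the first plan."""
    q = first_index(trace, ASK)
    p = first_index(trace, PLAN)
    return q is not None and (p is None or q < p)
```

Under these assumptions, a trace such as `[COST]` with no commit still passes test 02, which models the user declining the checkpoint. The sketch checks ordering only; whether the clarifying question is actually focused would still need a rubric or human judgment.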
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
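
Read literally, the threshold combines a mean with a per-criterion floor. A minimal sketch of that rule, assuming each rubric criterion receives one numeric score and the mean is taken across criteria (the scoring scale is not stated here). The function name, parameter names, and criterion names are illustrative, not part of the eval's schema.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3.0):
    """Return True when the mean score meets the threshold and no score is below the floor.

    `scores` maps a criterion name to its numeric score.
    """
    if not scores:
        return False  # assumption: a test with no graded criteria does not pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
example = {"plan_before_write": 5, "waits_for_confirmation": 4, "names_integrations": 3}
result = passes(example)  # mean is 4.0 and the lowest score is 3
```

The floor matters because a mean alone can hide a failure: scores of 5, 5, and 2 also average 4.0, yet fail under this reading because one criterion falls below 3.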
Rubric criteria
- Replit
- Code Assistant
- Agent Planning And Build Flow