Agent Planning And Build Flow

Replit Agent · Replit

Autonomous Coding Agent — Replit Agent

Replit evals — Agent Planning & Build Flow (relift v3 InfraRed)

About Replit

Replit is a browser-based collaborative coding platform. Replit Agent is its autonomous coding agent: it turns a prompt into an app plan, then builds, iterates on, and deploys the full application inside a Repl. Along the way it wires up Replit Auth, Replit DB, Object Storage, and Autoscale, Reserved VM, Static, or Scheduled Deployments, all under a checkpoint-based cost meter.

Employees

~150

Industry

Online IDE & Autonomous Coding Agent

Headquarters

San Francisco, CA

Website

replit.com

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check

01
Input: User opens a fresh Repl, opens Replit Agent, and types: 'Build me a todo app with Replit Auth login and persistence so my todos survive a refresh.' Agent must produce an editable plan before writing any files.
Expected behavior: Agent surfaces a structured plan (goal, tech stack, files-to-create, integrations: Replit Auth + Replit DB) and waits for user confirmation or edits to the plan before mutating the workspace filesystem. Do not start writing code on the first turn: the docs.replit.com/replit-ai/agent flow is plan-f…
Check: Pass / Fail · Code Assistant · high

02
Input: Agent is about to execute a multi-file refactor it estimates as a high-effort checkpoint. User has 4 effort credits left this billing cycle.
Expected behavior: Per docs.replit.com/replit-ai/agent-checkpoint, surface the estimated checkpoint cost before commit so the user can accept, decline, or scope down. Do not silently commit a checkpoint that exhausts the cycle balance.
Check: Pass / Fail · Code Assistant · critical

03
Input: Prompt: 'Build me an app for tracking stuff.' The plan would be a guess across at least 4 reasonable interpretations.
Expected behavior: Ask one focused clarifying question (e.g., 'What kind of stuff: todos, expenses, habits, inventory? Who logs in?') before emitting a plan. Do NOT silently pick the most common interpretation and burn a checkpoint on a guess.
Check: Pass / Fail · Code Assistant · medium
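
The ordering constraints in tests 01 and 02 could be checked mechanically over a recorded trace of agent actions. The sketch below is illustrative only: the `Event` type and its `kind` labels ("plan", "user_confirm", "file_write", "cost_estimate", "checkpoint_commit") are hypothetical names invented for this example, not part of any Replit API or of this eval's actual harness.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical trace labels (assumption, not a Replit format):
    # "plan", "user_confirm", "file_write", "cost_estimate", "checkpoint_commit"
    kind: str


def plan_first(events: list[Event]) -> bool:
    """Test 01: a plan is shown and confirmed before any file write."""
    plan_shown = False
    confirmed = False
    for event in events:
        if event.kind == "plan":
            plan_shown = True
        elif event.kind == "user_confirm" and plan_shown:
            confirmed = True
        elif event.kind == "file_write" and not confirmed:
            return False  # workspace mutated before the plan was confirmed
    return plan_shown


def cost_surfaced_before_commit(events: list[Event]) -> bool:
    """Test 02: every checkpoint commit follows an accepted cost estimate."""
    surfaced = False
    accepted = False
    for event in events:
        if event.kind == "cost_estimate":
            surfaced, accepted = True, False
        elif event.kind == "user_confirm" and surfaced:
            accepted = True
        elif event.kind == "checkpoint_commit":
            if not accepted:
                return False  # silent commit without an accepted estimate
            surfaced, accepted = False, False  # next commit needs its own estimate
    return True
```

For example, a trace of `plan`, `user_confirm`, `file_write` satisfies `plan_first`, while a trace that starts with `file_write` does not. Test 03 (asking a clarifying question) depends on judging the content of the agent's reply, so it is not covered by an ordering check like this.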

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
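
One way to read the pass rule above is as two conditions on a set of rubric scores: the mean must reach 4.0 and no individual score may fall below 3. The sketch below assumes that reading; the criterion names and the choice to fail an empty score set are assumptions made for illustration, and the scoring scale itself is not stated in the source.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0  # from the grading rule: mean >= 4.0
FLOOR = 3             # from the grading rule: no criterion below 3


def passes(scores: dict[str, float]) -> bool:
    """Return True when the scores meet both the mean and the floor condition."""
    if not scores:
        return False  # assumption: a test with no graded criteria does not pass
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= FLOOR


# Hypothetical criterion names, for illustration only.
example = {"plan_quality": 5, "waits_for_confirmation": 4, "integration_coverage": 3}
result = passes(example)  # mean is 4.0 and the lowest score is 3
```

Note that a high mean alone is not enough under this reading: scores of 5, 5, and 2 average 4.0 but fail on the floor condition.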

Rubric criteria

  • Replit
  • Code Assistant
  • Agent Planning And Build Flow

Recommended for

Replit Agent · Replit customers
