Inline Edit

AI Code Editor — Cursor

This eval pack, part of Corsac's AI Code Editor coverage, evaluates Cursor's Inline Edit (Cmd-K) across 6 scenario-based test cases. An LLM judge grades each case against an expected-behavior rubric.

About Cursor

Cursor is an AI code editor built on VS Code: predictive Tab completion, inline edits, and an agent mode that plans and executes multi-file changes with terminal access, codebase indexing, project rules, and MCP integration.

Employees: ~200
Industry: AI Code Editor
Headquarters: San Francisco, CA
Website: cursor.com

Sample tests (showing 3 of 6)

#	Input	Expected behavior	Check
01	TypeScript `utils/format.ts`: the user selects the `for` loop that builds the `labels` array. Cmd-K instruction: 'convert to map'. Lines 1-11 and 19-30 must remain untouched.	The inline edit replaces only the selected loop with an equivalent `labels = items.map(...)`, preserving variable names and types; the diff preview shows a single hunk.	Pass / Fail · Code Edit · high
02	Kotlin `UserService.kt`: the selected function body lacks a null guard on `profile.email`. The user expects a Cmd-K diff preview, per the inline-edit docs, before accepting.	Cmd-K presents a diff hunk that adds `requireNotNull` or a safe call; the user must accept explicitly; no silent apply.	Pass / Fail · Code Edit · medium
03	React `components/Button.tsx`: the selection covers the default props only. The instruction explicitly excludes test files. The agent must not open `Button.test.tsx` during the Cmd-K inline flow.	Only the default variant in `Button.tsx` changes; no ancillary file writes; if tests need updating, the agent notes a follow-up without editing them.	Pass / Fail · Code Edit · high
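To make test 01 concrete, here is a minimal sketch of the kind of rewrite the rubric expects. Only `items`, `labels`, and the `items.map(...)` form come from the test description; the `Item` shape, the label format, and the function names are assumptions made for illustration, not the contents of the real `utils/format.ts`.

```typescript
// Hypothetical item shape; the test description only names `items` and `labels`.
interface Item {
  id: number;
  name: string;
}

// Before: the selected `for` loop builds `labels` by pushing one entry at a time.
function labelsWithLoop(items: Item[]): string[] {
  const labels: string[] = [];
  for (let i = 0; i < items.length; i++) {
    labels.push(`${items[i].id}: ${items[i].name}`);
  }
  return labels;
}

// After: the same result expressed with `map`, keeping the `labels` name
// and the `string[]` element type unchanged.
function labelsWithMap(items: Item[]): string[] {
  const labels: string[] = items.map((item) => `${item.id}: ${item.name}`);
  return labels;
}

// Illustrative data for comparing the two forms.
const sampleItems: Item[] = [
  { id: 1, name: "alpha" },
  { id: 2, name: "beta" },
];
```

In this sketch only the loop body changes, which is what would let the diff preview collapse to a single hunk while the surrounding lines stay untouched.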

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Penalize failure_modes.
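The judge instruction above refers to three fields: `expected.ideal_behavior`, `expected.rubric`, and `failure_modes`. The page does not publish the case schema, so the following is only a sketch of how such a record and a pass/fail decision might look. The `EvalCase` and `JudgeVerdict` shapes, the `gradeCase` function, and the strict "all rubric entries met, no failure mode hit" rule are assumptions; the example case paraphrases sample test 02.

```typescript
// Field names `expected.ideal_behavior`, `expected.rubric`, and `failure_modes`
// come from the judge instruction; every other name here is hypothetical.
interface EvalCase {
  id: string;
  input: string;
  expected: {
    ideal_behavior: string;
    rubric: string[];
  };
  failure_modes: string[];
}

// Assumed shape of what a judge reports back for one case.
interface JudgeVerdict {
  rubricMet: boolean[];      // one flag per rubric entry, in the same order
  failureModesHit: string[]; // failure modes the judge observed
}

type Grade = "pass" | "fail";

// Assumed rule: pass only when every rubric entry is met and
// none of the case's listed failure modes occurred.
function gradeCase(testCase: EvalCase, verdict: JudgeVerdict): Grade {
  const rubric = testCase.expected.rubric;
  const allMet =
    verdict.rubricMet.length === rubric.length &&
    verdict.rubricMet.every((met) => met);
  const hitKnownFailure = verdict.failureModesHit.some((mode) =>
    testCase.failure_modes.includes(mode)
  );
  return allMet && !hitKnownFailure ? "pass" : "fail";
}

// Illustrative case paraphrasing sample test 02.
const kotlinNullGuardCase: EvalCase = {
  id: "02",
  input: "Kotlin UserService.kt: selected function body lacks a null guard on profile.email",
  expected: {
    ideal_behavior: "Cmd-K presents a diff hunk that the user accepts explicitly",
    rubric: [
      "diff hunk adds requireNotNull or a safe call",
      "user must accept explicitly",
    ],
  },
  failure_modes: ["silent apply"],
};
```

A binary grade is used here because the sample table lists its check as Pass / Fail; a weighted score would be an equally plausible reading of "penalize".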

Rubric criteria

Cursor
Code Assistant
Inline Edit

Recommended for

Cursor customers

Works with

Cursor

Frequently asked questions

What does the Inline Edit eval for Cursor test?

Evaluates Cursor's Inline Edit (Cmd-K) across 6 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's AI Code Editor eval coverage.

How is the Inline Edit eval scored?

An LLM judge scores each case using this instruction: grade against expected.ideal_behavior and expected.rubric, and penalize failure_modes.

How many test cases does this eval pack include?

The Inline Edit pack for Cursor contains 6 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Inline Edit pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
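The page does not document the REST API, so no endpoint or response format is shown here. As a rough sketch of the gating step only, assuming per-case results have already been fetched by some means, a release gate could compare the pack's pass rate with a configured threshold. `CaseResult`, `gateRelease`, and `minPassRate` are hypothetical names, not part of Corsac's API.

```typescript
// Hypothetical per-case result; the real response shape is not documented on this page.
interface CaseResult {
  caseId: string;
  passed: boolean;
}

interface GateDecision {
  passRate: number;
  allowed: boolean;
}

// Allow the release only if the share of passing cases meets the threshold.
// An empty result set is treated as a blocked release (an assumed policy).
function gateRelease(results: CaseResult[], minPassRate: number): GateDecision {
  if (results.length === 0) {
    return { passRate: 0, allowed: false };
  }
  const passedCount = results.filter((result) => result.passed).length;
  const passRate = passedCount / results.length;
  return { passRate, allowed: passRate >= minPassRate };
}

// Illustrative results for a six-case pack with one failing case.
const exampleResults: CaseResult[] = [
  { caseId: "01", passed: true },
  { caseId: "02", passed: true },
  { caseId: "03", passed: false },
  { caseId: "04", passed: true },
  { caseId: "05", passed: true },
  { caseId: "06", passed: true },
];
```

In a CI job, a script built around a function like this would typically exit with a non-zero status when `allowed` is false so that the pipeline stops.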
