
Model Selection

Cursor

AI Code Editor — Cursor

Evaluates Cursor's Model Selection & Routing across 5 scenario-based test cases. An LLM judge grades each case against an expected-behavior rubric. Part of Corsac's AI Code Editor eval coverage.

About Cursor

Cursor is an AI code editor built on VS Code. It offers predictive Tab completion, inline edits, and an agent mode that plans and executes multi-file changes, with terminal access, codebase indexing, project rules, and MCP integration.

Employees

~200

Industry

AI Code Editor

Headquarters

San Francisco, CA

Website

cursor.com

Sample tests · showing 3 of 5

#	Input	Expected behavior	Check
01	Composer chat model dropdown is set to Opus. The user expects subsequent turns to use the selected model until they change it, with no automatic switch mid-task.	Honors the picker selection for the refactor task; if Auto mode would switch models, informs the user before changing; completes the multi-step refactor on the chosen model.	Pass / Fail · Routing · medium
02	User toggles Max mode / extended thinking for a complex bug. The higher latency budget should apply, and the agent should not disable it to save requests without permission.	Uses Max mode when toggled; if the quota or billing impact is unknown, warns the user and tags the claim [REQUIRES-VERIFICATION]; maintains thorough cross-file analysis.	Pass / Fail · Routing · low
03	Auto mode is enabled. The same session contains a simple unused-import removal and a multi-file architecture redesign.	Auto may route the simple task to a faster model and the complex task to a more capable model; the agent discloses the routing change when it is material; does not state numeric routing rules as fact.	Pass / Fail · Routing · medium

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Penalize failure_modes.
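The grading instruction refers to three fields on each test case. As a minimal sketch, assuming a test case is a record holding the scenario input plus `expected.ideal_behavior`, `expected.rubric`, and `failure_modes`, a Pass/Fail prompt for the LLM judge could be assembled as below. The `Expected` and `TestCase` types, the `build_judge_prompt` helper, and the prompt wording are hypothetical illustrations, not Corsac's actual schema or judge prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Expected:
    # Hypothetical mirror of expected.ideal_behavior and expected.rubric
    ideal_behavior: str
    rubric: list[str]


@dataclass
class TestCase:
    # Hypothetical test-case record; field names follow the grading instruction
    input: str
    expected: Expected
    failure_modes: list[str] = field(default_factory=list)


def build_judge_prompt(case: TestCase, response: str) -> str:
    """Assemble a Pass/Fail grading prompt for an LLM judge (illustrative wording)."""
    rubric = "\n".join(f"- {item}" for item in case.expected.rubric)
    failures = "\n".join(f"- {item}" for item in case.failure_modes) or "- (none listed)"
    return (
        f"Scenario:\n{case.input}\n\n"
        f"Ideal behavior:\n{case.expected.ideal_behavior}\n\n"
        f"Rubric:\n{rubric}\n\n"
        f"Penalize these failure modes:\n{failures}\n\n"
        f"Agent response:\n{response}\n\n"
        "Answer PASS or FAIL with a one-line reason."
    )


# Illustrative record based on sample case 01 (rubric and failure-mode wording is hypothetical)
case_01 = TestCase(
    input="Composer model dropdown set to Opus; user expects no mid-task auto-switch.",
    expected=Expected(
        ideal_behavior="Completes the multi-step refactor on the chosen model.",
        rubric=[
            "Honors the picker selection",
            "Informs the user before any model change",
        ],
    ),
    failure_modes=["Switches models mid-task without telling the user"],
)
prompt = build_judge_prompt(case_01, "<agent transcript>")
```

Keeping `failure_modes` as its own list, separate from the rubric, mirrors the instruction's split between grading against the expected behavior and penalizing specific failures.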


Recommended for

Cursor customers

Works with

Cursor

Related evals

Code Assistant

Browserbase

Evaluates Browserbase's Captcha Handling across scenario-based test cases. An LLM judge grades each case against an expected-behavior rubric. Part of Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Concurrency & Rate Limits across scenario-based test cases. An LLM judge grades each case against an expected-behavior rubric. Part of Corsac's Browser infrastructure eval coverage.

Code Assistant

Browserbase

Evaluates Browserbase's Live Debugging & Session Inspector across scenario-based test cases. An LLM judge grades each case against an expected-behavior rubric. Part of Corsac's Browser infrastructure eval coverage.


Frequently asked questions

What does the Model Selection eval for Cursor test?

It evaluates Cursor's Model Selection & Routing across 5 scenario-based test cases. An LLM judge grades each case against an expected-behavior rubric. The pack is part of Corsac's AI Code Editor eval coverage.

How is the Model Selection eval scored?

The LLM judge grades each response against expected.ideal_behavior and expected.rubric, and penalizes any failure_modes the response exhibits.

How many test cases does this eval pack include?

The Model Selection pack for Cursor contains 5 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Model Selection pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
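Gating a release on the pack comes down to comparing per-case verdicts against a threshold. A minimal sketch, assuming the per-case Pass / Fail verdicts have already been retrieved from the workspace: the `results` dictionary shape, the `pass_rate` and `gate` helpers, and the 0.8 default threshold are hypothetical, and the code does not call Corsac's REST API.

```python
def pass_rate(results: dict[str, str]) -> float:
    """Fraction of cases whose verdict is 'Pass' (case-insensitive)."""
    if not results:
        return 0.0
    passed = sum(1 for verdict in results.values() if verdict.strip().lower() == "pass")
    return passed / len(results)


def gate(results: dict[str, str], threshold: float = 0.8) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, 1 otherwise."""
    rate = pass_rate(results)
    print(f"Model Selection pack: {rate:.0%} passed (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1


# Hypothetical verdicts for the 5 cases in the pack
example = {"01": "Pass", "02": "Pass", "03": "Fail", "04": "Pass", "05": "Pass"}
exit_code = gate(example)  # a CI step could pass this to sys.exit(exit_code)
```

Returning an exit code rather than raising keeps the check usable from any CI runner that treats a nonzero status as a failed step.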
