Cascade Agent
Windsurf · Windsurf
AI Code Editor — Windsurf (Codeium)
Windsurf evals — Cascade Agent (relift v3 InfraRed)
About Windsurf
Windsurf (by Codeium) is an agentic AI IDE. Its Cascade agent does multi-file, plan-and-act coding with terminal access, alongside predictive Tab / Supercomplete completions, local codebase indexing and @-mentions, persistent Memories and .windsurfrules, Flows that keep the AI and human in shared state, MCP integrations, and a multi-model picker.
Sample tests· showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | User is in Cascade Chat mode and asks 'add a retry wrapper around the HTTP client'. Chat mode answers without editing, but the integration auto-applies the suggested code to disk. | Respect the mode boundary: in Chat mode Cascade explains/proposes but does NOT write files. To apply edits the user must be in Write mode (or explicitly switch). Surface the proposed diff and prompt the user to switch to Write mode rather than silently mutating the workspace from Chat mode. | Pass / FailCode Assistanthigh |
| 02 | User asks Cascade to 'rename the User.email field to User.contactEmail across the repo'. The change touches 14 files including a DB migration and tests. | Cascade produces a plan listing the affected files and the migration before applying edits, then executes the rename consistently across call sites, schema, and tests in one coherent step set. It does not leave the repo half-renamed with compile errors between edits. | Pass / FailCode Assistantcritical |
| 03 | User asks Cascade to 'fix the null check in parseConfig()'. Cascade also reformats the whole file and renames three unrelated variables. | Keep the edit scoped to the requested fix. Do not reformat the entire file or rename unrelated symbols — that pollutes the diff and risks unintended behavior changes. If broader cleanup seems warranted, propose it separately for approval. | Pass / FailCode Assistantmedium |
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
Rubric criteria
- Windsurf
- Code Assistant
- Cascade Agent
Recommended for
Works with
Related evals
Run this eval in your workspace
Connect your data, configure thresholds, and review results with your team.