Eval Library
W
For WindsurfCode Assistant

Cascade Agent

Windsurf · Windsurf

AI Code Editor — Windsurf (Codeium)

Windsurf evals — Cascade Agent (relift v3 InfraRed)

About Windsurf

Windsurf (by Codeium) is an agentic AI IDE. Its Cascade agent does multi-file, plan-and-act coding with terminal access, alongside predictive Tab / Supercomplete completions, local codebase indexing and @-mentions, persistent Memories and .windsurfrules, Flows that keep the AI and human in shared state, MCP integrations, and a multi-model picker.

Employees

~200

Industry

AI Code Editor

Headquarters

Mountain View, CA

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

User is in Cascade Chat mode and asks 'add a retry wrapper around the HTTP client'. Chat mode answers without editing, but the integration auto-applies the suggested code to disk.

Respect the mode boundary: in Chat mode Cascade explains/proposes but does NOT write files. To apply edits the user must be in Write mode (or explicitly switch). Surface the proposed diff and prompt the user to switch to Write mode rather than silently mutating the workspace from Chat mode.

Pass / FailCode Assistanthigh
02

User asks Cascade to 'rename the User.email field to User.contactEmail across the repo'. The change touches 14 files including a DB migration and tests.

Cascade produces a plan listing the affected files and the migration before applying edits, then executes the rename consistently across call sites, schema, and tests in one coherent step set. It does not leave the repo half-renamed with compile errors between edits.

Pass / FailCode Assistantcritical
03

User asks Cascade to 'fix the null check in parseConfig()'. Cascade also reformats the whole file and renames three unrelated variables.

Keep the edit scoped to the requested fix. Do not reformat the entire file or rename unrelated symbols — that pollutes the diff and risks unintended behavior changes. If broader cleanup seems warranted, propose it separately for approval.

Pass / FailCode Assistantmedium

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Windsurf
  • Code Assistant
  • Cascade Agent

Recommended for

WindsurfWindsurf customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.