Flows And Terminal
AI Code Editor — Windsurf (Codeium)
Windsurf evals — Flows & Terminal (relift v3 InfraRed)
About Windsurf
Windsurf (by Codeium) is an agentic AI IDE. Its Cascade agent does multi-file, plan-and-act coding with terminal access, alongside predictive Tab / Supercomplete completions, local codebase indexing and @-mentions, persistent Memories and .windsurfrules, Flows that keep the AI and human in shared state, MCP integrations, and a multi-model picker.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Cascade runs 'npm run build' in a Flow. The build takes 40 seconds and emits a type error near the end. | Wait for the command to finish and read the captured output, including the type error emitted late in the run, before proceeding. Cascade must not assume success and move on while the build is still running or before reading its exit status and stderr. | Pass / Fail · Code Assistant · high |
| 02 | In a Flow (AI + human in the same workspace state), the human manually edits config.ts while Cascade is mid-task and about to edit the same file. | Operate on the human's current on-disk version, detecting that the file changed since Cascade last read it, and reconcile rather than overwriting the human's edit. The Flow's shared state means Cascade must respect concurrent human changes, not clobber them with a stale buffer. | Pass / Fail · Code Assistant · critical |
| 03 | Cascade starts 'npm run dev' (a long-running watch server that never exits) as part of a Flow. | Recognize that a watch/dev server does not terminate and either run it in the background while continuing, or set a timeout / read until it is ready, rather than blocking the Flow forever waiting for an exit that never comes. Surface the running process state to the user. | Pass / Fail · Code Assistant · medium |
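The terminal behavior expected in tests 01 and 03 can be sketched in a few lines. This is a minimal illustration, not Windsurf's or Cascade's actual implementation: the helper names `run_to_completion` and `start_until_ready`, the readiness marker, and the timeout values are all hypothetical. It only uses Python's standard `subprocess`, `threading`, and `queue` modules.

```python
import queue
import subprocess
import threading
import time


def run_to_completion(cmd, timeout=None):
    """Test 01 style: block until a finite command exits, then return
    (exit_code, stdout, stderr) so late errors are never missed.
    Raises subprocess.TimeoutExpired if `timeout` seconds elapse first."""
    done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return done.returncode, done.stdout, done.stderr


def start_until_ready(cmd, ready_marker, timeout=30.0):
    """Test 03 style: start a process that may never exit and return as soon
    as `ready_marker` (a hypothetical readiness string) shows up in its output.
    Returns (process, state, lines_seen); state is "ready", "exited" or "timeout".
    The process is left running so the caller can report or stop it."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Keep draining output in the background so the child never
        # stalls on a full pipe, even after we stop waiting.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # sentinel: the output stream closed

    threading.Thread(target=pump, daemon=True).start()

    deadline = time.monotonic() + timeout
    seen = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return proc, "timeout", seen
        try:
            line = lines.get(timeout=remaining)
        except queue.Empty:
            return proc, "timeout", seen
        if line is None:
            return proc, "exited", seen
        seen.append(line)
        if ready_marker in line:
            return proc, "ready", seen
```

For a build, a caller would inspect both the exit code and stderr from `run_to_completion` before deciding the step succeeded. For a dev server, the returned state and `proc.poll()` give something concrete to surface to the user instead of waiting on an exit that never comes.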
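The stale-buffer check that test 02 asks for can be illustrated with a content fingerprint taken at read time and compared again just before writing. This is a sketch under assumptions: `fingerprint`, `write_if_unchanged`, and `StaleReadError` are hypothetical names, and nothing here describes how Cascade tracks file state internally.

```python
import hashlib
from pathlib import Path


class StaleReadError(Exception):
    """The file on disk no longer matches what was read earlier."""


def fingerprint(path):
    # Hash the current on-disk bytes; any edit changes the digest.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_if_unchanged(path, new_text, fingerprint_at_read):
    """Write `new_text` only if the file still matches the fingerprint
    recorded when it was last read; otherwise refuse so the caller can
    re-read and reconcile instead of overwriting someone else's edit."""
    if fingerprint(path) != fingerprint_at_read:
        raise StaleReadError(f"{path} changed since it was last read")
    Path(path).write_text(new_text)
```

The check and the write are two separate steps, so a very tightly timed edit could still slip in between them. A real editor would likely combine this with file watching or locking, but the sketch shows the core idea: compare against the on-disk version, not the buffer held in memory.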
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0 across the rubric criteria, with no single criterion scoring below 3.
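One reading of this pass rule, written out as code. The sketch assumes the mean is taken across criteria for a single response and that scores are numeric. The function name `passes` and the criterion names in the usage example are hypothetical, and the score scale is not stated in the source.

```python
def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean of all criterion scores is at least
    `mean_threshold` and no individual score falls below `floor`.
    `scores` maps criterion name -> numeric score."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    mean = sum(values) / len(values)
    return mean >= mean_threshold and min(values) >= floor
```

For example, `passes({"waits_for_exit": 5, "reads_stderr": 5, "reports_state": 2})` is false even though the mean is 4.0, because one criterion is below the floor of 3.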
Rubric criteria
- Windsurf
- Code Assistant
- Flows And Terminal