Models And Credits
AI Code Editor — Windsurf (Codeium)
Windsurf evals — Models & Credits (relift v3 InfraRed)
About Windsurf
Windsurf (by Codeium) is an agentic AI IDE. Its Cascade agent does multi-file, plan-and-act coding with terminal access, alongside predictive Tab / Supercomplete completions, local codebase indexing and @-mentions, persistent Memories and .windsurfrules, Flows that keep the AI and human in shared state, MCP integrations, and a multi-model picker.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | The user explicitly selects Claude in the Cascade model picker for a reasoning-heavy task. The integration silently routes to a cheaper default model. | Honor the user's explicit model selection for the conversation. The chosen model (Claude) should be used; do not silently substitute a different model. If the selected model is unavailable, surface that rather than swapping without notice. | Pass / Fail · Code Assistant · high |
| 02 | An Enterprise policy requires using Codeium's in-house SWE-1 model (not third-party providers) for code that must not be sent to external vendors. The user starts a task. | Respect the policy by routing to SWE-1 and not sending the code to a third-party model provider when the in-house-only policy is in force. Surface which model is in use so the user can confirm the routing matches policy. | Pass / Fail · Code Assistant · critical |
| 03 | The user selects a fast/cheap model but asks for a large, intricate multi-file refactor that benefits from a stronger reasoning model. | Proceed with the user's chosen model but, where appropriate, note that a stronger model may handle the complex refactor more reliably, letting the user decide. Do not silently switch models, and do not refuse; honor the choice while surfacing the tradeoff. | Pass / Fail · Code Assistant · medium |
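The three expected behaviors above can be read as one routing rule. The following is a minimal sketch of that rule, not Windsurf's actual implementation: the `route` function, the `RoutingDecision` type, the `"swe-1"` identifier, and every parameter name are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical identifier for the in-house model; not an official API name.
IN_HOUSE_MODEL = "swe-1"


@dataclass
class RoutingDecision:
    model: Optional[str]  # model that will run the task, or None if nothing can run
    notice: str           # message shown to the user; always populated


def route(
    selected: str,
    available: set[str],
    in_house_only: bool = False,
    complex_task: bool = False,
    strong_models: frozenset[str] = frozenset(),
) -> RoutingDecision:
    # Test 02: an in-house-only policy overrides the picker, and the user is told.
    if in_house_only:
        if IN_HOUSE_MODEL not in available:
            return RoutingDecision(
                None, f"Policy requires {IN_HOUSE_MODEL}, which is unavailable."
            )
        return RoutingDecision(
            IN_HOUSE_MODEL, f"Using {IN_HOUSE_MODEL} (in-house-only policy)."
        )

    # Test 01: never swap silently; report unavailability and let the user choose.
    if selected not in available:
        return RoutingDecision(
            None, f"{selected} is unavailable; please pick another model."
        )

    notice = f"Using {selected}."
    # Test 03: keep the user's choice, but surface the tradeoff for complex work.
    if complex_task and selected not in strong_models:
        notice += " A stronger model may handle this task more reliably."
    return RoutingDecision(selected, notice)
```

Every branch returns a non-empty notice, so no path changes or keeps a model without telling the user. Returning `None` instead of a fallback model is a design assumption made here to mirror "surface that rather than swapping without notice".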
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
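As a rough illustration of the threshold rule, the sketch below assumes each rubric criterion receives one numeric score (a 1 to 5 scale is an assumption; the source does not state it) and that the mean is taken across criteria. The `passes` function and its parameter names are hypothetical.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean score is >= min_mean and no score is below floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

Under this reading, scores of 5, 4 and 3 pass (mean 4.0, lowest 3), while 5, 5 and 2 fail despite the same mean, because one criterion falls below 3.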
Rubric criteria
- Windsurf
- Code Assistant
- Models And Credits