
Eval directory
Evals for Cursor
9 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Cursor AI products.
About Cursor
Cursor is an AI code editor built on VS Code: predictive Tab completion, inline edits, and an agent mode that plans and executes multi-file changes with terminal access, codebase indexing, project rules, and MCP integration.
Available eval packs for Cursor
9 packs ready to run.
Codebase Indexing
Cursor evals — Codebase Indexing (relift v3)
Composer Agent
Cursor evals — Composer & Agent Mode (relift v3)
Inline Edit
Cursor evals — Inline Edit (Cmd-K) (relift v3)
Mcp Integration
Cursor evals — MCP Integration (relift v3)
Model Selection
Cursor evals — Model Selection & Routing (relift v3)
Privacy Edit Safety
PII LeakageCursor evals — Privacy & Edit Safety (relift v3)
Project Rules
Cursor evals — Project Rules (relift v3)
Tab Completion
Code CheckerCursor evals — Tab Completion (relift v3)
Completion Smoke V1
CorrectnessCode CheckerCursor generates correct, maintainable code completions that satisfy task intent without unsafe patterns.
Why eval Cursor AI
Cursor's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Cursor measures four dimensions teams care about most when deploying code assistant agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality— does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Cursor's public product surface and runnable in Corsac with your own data.