
Evals for Browserbase
15 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Browserbase AI products.
About Browserbase
Browserbase provides cloud headless-browser infrastructure for AI agents — managed Chromium sessions with stealth mode, captcha handling, proxies, session persistence, live debugging, and the Stagehand SDK for act/extract/observe automation.
Available eval packs for Browserbase
15 packs ready to run.
Captcha Handling
Browserbase evals — Captcha Handling (relift v3)
Concurrency Rate Limits
Browserbase evals — Concurrency & Rate Limits (relift v3)
Live Debugging Session Inspector
Browserbase evals — Live Debugging & Session Inspector (relift v3)
Proxy Geo Routing
Browserbase evals — Proxy & Geo Routing (relift v3)
Session Lifecycle CDP Connect
Browserbase evals — Session Lifecycle & CDP Connect (relift v3)
Session Persistence Contexts
Browserbase evals — Session Persistence & Contexts (relift v3)
Stealth Fingerprinting
Browserbase evals — Stealth & Fingerprinting (relift v3)
Auth And Concurrency
Browserbase evals — Auth & Concurrency (relift v3 InfraRed)
Live View Debug And Recordings
Browserbase evals — Live View / Debug & Recordings (relift v3 InfraRed)
Page Automation Primitives
Browserbase evals — Page Automation Primitives (relift v3 InfraRed)
Safety Consent And Governance
Browserbase evals — Safety, Consent & Governance (relift v3 InfraRed)
Session Lifecycle
Browserbase evals — Session Lifecycle (relift v3 InfraRed)
Session Persistence And Recovery
Browserbase evals — Session Persistence & Recovery (relift v3 InfraRed)
Stagehand Act Extract Observe
Browserbase evals — Stagehand SDK — act / extract / observe (relift v3 InfraRed)
Stealth And Anti Bot
Browserbase evals — Stealth & Anti-bot (relift v3 InfraRed)
Why eval Browserbase AI
Browserbase's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Browserbase measures the four dimensions teams care about most when deploying browser automation agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve operator trust by surfacing the right context at the right time?
Every eval pack above is hand-authored against Browserbase's public product surface and runnable in Corsac with your own data.