Auth And Concurrency

Browserbase (cloud headless Chromium + Stagehand SDK) · Browserbase

Browser Infrastructure for AI Agents — Browserbase

Evaluates Browserbase's Auth & Concurrency across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Browser Infrastructure for AI Agents eval coverage.

About Browserbase

Browserbase provides cloud headless-browser infrastructure for AI agents — managed Chromium sessions with stealth mode, captcha handling, proxies, session persistence, live debugging, and the Stagehand SDK for act/extract/observe automation.

Employees: ~40

Industry: Browser Infrastructure

Headquarters: San Francisco, CA

Website: browserbase.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Agent's HTTP wrapper sends Authorization: Bearer <BB_KEY> instead of X-BB-API-KEY: <BB_KEY>.	Send the API key in X-BB-API-KEY as documented. Bearer is the wrong header and the API will reject with 401. Detect and surface the misconfiguration rather than retrying.	Pass / Fail · AI Platform · critical
02	Plan allows N concurrent sessions [REQUIRES-VERIFICATION on integer]. Operator bursts to N+10 sessions and gets 429 / queue.	Enforce a client-side semaphore at the documented concurrency cap (read from config). Honor 429 Retry-After if returned. Surface backpressure to the upstream queue rather than burst-creating.	Pass / Fail · AI Platform · high
03	Security rotates the staging API key. Worker pool keeps using the stale key for 30 minutes.	Pull API keys from a short-TTL secret store (Vault, SSM) on each session create or subscribe to rotation events. On 401, force a refresh before retry. Never embed keys in container env at build time without a refresh path.	Pass / Fail · AI Platform · critical
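
The expected behavior in cases 01 and 03 can be sketched together in Python. This is a minimal illustration, not Browserbase's or Stagehand's client code: `send` and `fetch_key` are hypothetical callables standing in for an HTTP transport and a secret-store lookup, and only the `X-BB-API-KEY` header name comes from the table above. The sketch assumes `fetch_key` bypasses any local cache, so a second call returns the rotated key if one exists.

```python
from typing import Callable, Dict

API_KEY_HEADER = "X-BB-API-KEY"  # header name taken from test case 01


class AuthMisconfigured(Exception):
    """Raised when a 401 cannot be fixed by refreshing the key."""


def build_headers(api_key: str) -> Dict[str, str]:
    # The key goes in X-BB-API-KEY, never in "Authorization: Bearer ...".
    return {API_KEY_HEADER: api_key, "Content-Type": "application/json"}


def create_session(
    send: Callable[[Dict[str, str]], int],  # hypothetical: headers -> HTTP status
    fetch_key: Callable[[], str],           # hypothetical: secret-store lookup
) -> int:
    key = fetch_key()  # read the key at call time, not at image build time
    status = send(build_headers(key))
    if status != 401:
        return status

    fresh = fetch_key()  # 401: force one refresh before any retry
    if fresh == key:
        # The store has no newer key, so a retry would fail the same way.
        raise AuthMisconfigured("401 with the current key; check header and key")

    status = send(build_headers(fresh))
    if status == 401:
        raise AuthMisconfigured("401 persists after key refresh")
    return status
```

The retry is capped at one refresh on purpose: a 401 that survives a fresh key points to a configuration problem, which should be surfaced to the operator rather than retried in a loop.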
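
For case 02, one way to keep a worker pool under the plan's concurrency cap is a client-side semaphore that fails fast when full. The sketch below is an assumption-laden illustration: `SessionLimiter`, `Backpressure`, and `create_with_limit` are hypothetical names, `create` stands in for whatever call creates a session, and the cap must come from your own config because the plan's actual limit is not stated here. Only the delta-seconds form of `Retry-After` is parsed.

```python
import threading
import time
from typing import Callable, Mapping, Tuple


class Backpressure(Exception):
    """Raised so the upstream queue can hold work instead of burst-creating."""


class SessionLimiter:
    def __init__(self, max_concurrent: int):
        # max_concurrent is read from config by the caller.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def acquire(self) -> None:
        # Non-blocking: a full pool is reported, not silently queued.
        if not self._slots.acquire(blocking=False):
            raise Backpressure("client-side concurrency cap reached")

    def release(self) -> None:
        self._slots.release()


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    # HTTP-date values are not handled in this sketch and fall back to default.
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value)) if value is not None else default
    except ValueError:
        return default


def create_with_limit(
    limiter: SessionLimiter,
    create: Callable[[], Tuple[int, Mapping[str, str]]],  # -> (status, headers)
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    limiter.acquire()
    keep_slot = False
    try:
        for _ in range(max_attempts):
            status, headers = create()
            if status != 429:
                keep_slot = status < 400  # hold the slot only for a live session
                return status
            sleep(retry_after_seconds(headers))  # honor the server's hint
        raise Backpressure("still rate limited after retries")
    finally:
        if not keep_slot:
            limiter.release()
```

On success the slot stays held, so the caller is expected to call `limiter.release()` when the session closes. Catching `Backpressure` at the queue consumer lets it pause or requeue instead of creating more sessions.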

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
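
The pass threshold can be read as a simple check over the judge's per-criterion scores. The sketch below is one plausible interpretation, not Corsac's implementation: it assumes each case receives one numeric score per rubric criterion, and the function name and criterion names are hypothetical.

```python
from statistics import mean
from typing import Mapping


def case_passes(
    scores: Mapping[str, float],  # hypothetical: criterion name -> judge score
    min_mean: float = 4.0,
    floor: float = 3.0,
) -> bool:
    # A case with no scores cannot pass.
    if not scores:
        return False
    values = list(scores.values())
    # Both conditions must hold: mean at least 4.0 and no criterion below 3.
    return mean(values) >= min_mean and min(values) >= floor
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion falls below the floor.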

Rubric criteria

Browserbase
AI Platform
Auth And Concurrency

Recommended for

Browserbase (cloud headless Chromium + Stagehand SDK)
Browserbase customers

Works with

Browserbase

Frequently asked questions

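What does the Auth And Concurrency eval for Browserbase (cloud headless Chromium + Stagehand SDK) test?

It covers 9 scenario-based test cases on API-key authentication and session concurrency, such as sending the key in the wrong header, bursting past the plan's concurrent-session cap, and continuing to use a stale key after rotation. Each case is graded against an expected-behavior rubric by an LLM judge.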

How is the Auth And Concurrency eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Auth And Concurrency pack for Browserbase (cloud headless Chromium + Stagehand SDK) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Auth And Concurrency pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
