For Browserbase · AI Platform

Session Lifecycle

Browserbase (cloud headless Chromium + Stagehand SDK) · Browserbase

Browser Infrastructure for AI Agents — Browserbase

Browserbase evals — Session Lifecycle (relift v3 InfraRed)

About Browserbase

Browserbase provides cloud headless-browser infrastructure for AI agents — managed Chromium sessions with stealth mode, captcha handling, proxies, session persistence, live debugging, and the Stagehand SDK for act/extract/observe automation.

Employees: ~40

Industry: Browser Infrastructure

Headquarters: San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Input: Agent creates a new browser session via POST https://api.browserbase.com/v1/sessions with the header X-BB-API-KEY and the JSON body {projectId, region: 'us-west-2'}.

Expected behavior: Send the POST with the X-BB-API-KEY and Content-Type: application/json headers and a JSON body that includes projectId. Parse the 201 response for id, status, connectUrl, and expiresAt. Use connectUrl as the Playwright connect_over_cdp or Puppeteer browserWSEndpoint target. Do not invent undocumented REST paths.

Check: Pass / Fail · AI Platform · critical
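
The request in test 01 could be sketched with the Python standard library roughly as follows. The endpoint, header name, body fields, and response fields are taken from the test description; the helper names (build_create_session_request, parse_session, create_session) are hypothetical and exist only for illustration. This is a minimal sketch under those assumptions, not a reference client.

```python
import json
import urllib.request

SESSIONS_URL = "https://api.browserbase.com/v1/sessions"  # endpoint named in test 01


def build_create_session_request(api_key, project_id, region="us-west-2"):
    """Build (but do not send) the POST request described in test 01."""
    body = json.dumps({"projectId": project_id, "region": region}).encode("utf-8")
    return urllib.request.Request(
        SESSIONS_URL,
        data=body,
        method="POST",
        headers={"X-BB-API-KEY": api_key, "Content-Type": "application/json"},
    )


def parse_session(raw_body):
    """Extract the four fields the test expects; fail loudly if one is missing."""
    payload = json.loads(raw_body)
    wanted = ("id", "status", "connectUrl", "expiresAt")
    missing = [key for key in wanted if key not in payload]
    if missing:
        raise KeyError(f"session response is missing fields: {missing}")
    return {key: payload[key] for key in wanted}


def create_session(api_key, project_id, region="us-west-2"):
    """Send the request and return the parsed session (performs network I/O)."""
    request = build_create_session_request(api_key, project_id, region)
    with urllib.request.urlopen(request) as response:
        if response.status != 201:
            raise RuntimeError(f"expected 201 Created, got {response.status}")
        return parse_session(response.read())
```

The connectUrl value returned by create_session is what would then be handed to Playwright's chromium.connect_over_cdp(...). Keeping request construction separate from the network call makes the headers and body easy to check without a live API key.
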
02

Input: A newly created session returns status=PENDING. The agent attaches CDP and runs a flow. GET /v1/sessions/{id} returns RUNNING, then COMPLETED.

Expected behavior: Poll GET /v1/sessions/{id} until status=RUNNING before attaching CDP. After the flow completes (or sessions.end is called), expect the terminal status COMPLETED. Treat ERROR and TIMED_OUT as distinct terminal states with different remediation.

Check: Pass / Fail · AI Platform · high
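
The polling step in test 02 might look like the sketch below. The status names come from the test description; the function wait_until_running, the SessionEndedEarly exception, and the timeout and interval defaults are hypothetical choices. The fetch_status argument is assumed to be any callable that performs GET /v1/sessions/{id} and returns the status string, which keeps the loop independent of a particular HTTP client.

```python
import time

# Terminal states named in test 02; reaching one before RUNNING means the
# session can no longer be attached to.
TERMINAL_STATES = frozenset({"COMPLETED", "ERROR", "TIMED_OUT"})


class SessionEndedEarly(Exception):
    """Raised when the session hits a terminal status before RUNNING."""

    def __init__(self, status):
        super().__init__(f"session reached terminal status {status} before RUNNING")
        self.status = status  # lets callers branch on ERROR vs TIMED_OUT


def wait_until_running(fetch_status, timeout_s=30.0, interval_s=0.5, sleep=time.sleep):
    """Poll fetch_status() until it returns RUNNING, or raise."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "RUNNING":
            return status
        if status in TERMINAL_STATES:
            raise SessionEndedEarly(status)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session still {status} after {timeout_s}s")
        sleep(interval_s)
```

Because the exception carries the status, a caller can handle ERROR and TIMED_OUT separately instead of collapsing them into one failure path.
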
03

Input: The target site geofences to the EU and serves a US-region session a captcha wall. The operator wants the next session in eu-central-1.

Expected behavior: Set region in the POST /v1/sessions body to one of the documented enum values (e.g., us-west-2, us-east-1, eu-central-1, ap-southeast-1). Verify that GET /v1/sessions/{id} reports the requested region. Do not silently fall back to a default region on a mistyped value; surface the validation error.

Check: Pass / Fail · AI Platform · medium
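
The region handling in test 03 can be illustrated with a small sketch. The region set below contains only the values listed as examples in the test description and may not be exhaustive; the helper names session_body and check_reported_region are hypothetical, and the assumption that the GET response carries a region field is taken from the expected behavior above. A client-side check like this would complement, not replace, surfacing the API's own validation error.

```python
# Regions given as examples in test 03 (assumed, possibly incomplete).
KNOWN_REGIONS = frozenset({"us-west-2", "us-east-1", "eu-central-1", "ap-southeast-1"})


def session_body(project_id, region):
    """Build the POST body, rejecting unknown regions instead of defaulting."""
    if region not in KNOWN_REGIONS:
        raise ValueError(
            f"unknown region {region!r}; expected one of {sorted(KNOWN_REGIONS)}"
        )
    return {"projectId": project_id, "region": region}


def check_reported_region(session, requested):
    """Confirm the fetched session reports the region that was requested."""
    reported = session.get("region")
    if reported != requested:
        raise RuntimeError(f"requested region {requested!r} but session reports {reported!r}")
    return reported
```

For example, session_body("proj", "eu-centrl-1") raises ValueError rather than quietly creating a session in a default region.
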

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Passing requires a mean criterion score of at least 4.0, with no criterion scoring below 3.
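
The pass rule can be expressed as a short function. This sketch assumes the mean is taken across the rubric criteria of a single test and that scores are numeric; the function name passes and the dictionary shape are hypothetical.

```python
from statistics import mean


def passes(scores, mean_threshold=4.0, floor=3):
    """Return True when the mean score meets the threshold and no criterion is below the floor.

    `scores` maps criterion name -> numeric score (assumed shape).
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

Under this reading, scores of 5, 4, and 3 pass (mean 4.0, minimum 3), while 5, 5, and 2 fail because one criterion falls below 3 even though the mean is 4.0.
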

Rubric criteria

  • Browserbase
  • AI Platform
  • Session Lifecycle

Recommended for

Browserbase (cloud headless Chromium + Stagehand SDK) · Browserbase customers
