For Browserbase · AI Platform

Page Automation Primitives

Browserbase (cloud headless Chromium + Stagehand SDK) · Browserbase

Browser Infrastructure for AI Agents — Browserbase

Evaluates Browserbase's Page Automation Primitives across 9 scenario-based test cases, each graded by an LLM judge against an expected-behavior rubric. Part of Corsac's Browser Infrastructure for AI Agents eval coverage.

About Browserbase

Browserbase provides cloud headless-browser infrastructure for AI agents — managed Chromium sessions with stealth mode, captcha handling, proxies, session persistence, live debugging, and the Stagehand SDK for act/extract/observe automation.

Employees

~40

Industry

Browser Infrastructure

Headquarters

San Francisco, CA

Website

browserbase.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Agent calls page.goto('https://shop.example/cart') and immediately calls page.click('#checkout').	Pass wait_until='networkidle' or 'domcontentloaded' explicitly per the target's loading pattern. SPAs may need an explicit wait_for_selector on a post-mount anchor; do not rely on the default 'load' for a JS-heavy cart.	Pass / Fail · AI Platform · high
02	Form has input event listeners that validate on each keystroke (e.g., card-number spacing). Agent uses page.fill() and validation never fires.	Use page.type() (per-keystroke events) when the form depends on input events. Use fill() for non-event-driven fields (faster, atomic). Decide per field by inspecting the listener model via observe() or DOM inspection.	Pass / Fail · AI Platform · medium
03	Operator uses page.click('button.btn-primary') on a Tailwind site where the class is shared by 12 buttons.	Prefer Playwright's role/text-based locators (getByRole('button', {name:'Checkout'})) or Stagehand act() for resilience to class refactors. Avoid brittle class selectors as the primary target.	Pass / Fail · AI Platform · high
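The expected behaviors in the three sample cases can be sketched as small helpers. This is a minimal illustration, not Browserbase's or Corsac's reference solution. It assumes `page` is a Playwright sync-API `Page` for Python (or an object exposing the same methods); the helper names, the `#checkout` default, and the `per_keystroke` flag are hypothetical. Case 02 uses `locator.press_sequentially()`, which current Playwright releases recommend in place of the deprecated `page.type()` named in the table.

```python
from typing import Any


def open_cart_and_checkout(page: Any, url: str, anchor: str = "#checkout") -> None:
    """Case 01: wait explicitly instead of clicking right after goto()."""
    # Choose wait_until per target; 'domcontentloaded' is only an example here.
    page.goto(url, wait_until="domcontentloaded")
    # For an SPA, wait for an element that only exists once the app has mounted.
    page.wait_for_selector(anchor, state="visible")
    page.click(anchor)


def enter_field(page: Any, selector: str, value: str, per_keystroke: bool) -> None:
    """Case 02: pick per-keystroke typing or an atomic fill for each field."""
    field = page.locator(selector)
    if per_keystroke:
        # Sends key events character by character, so keystroke listeners run.
        field.press_sequentially(value)
    else:
        # Sets the whole value in one step; faster for fields without such listeners.
        field.fill(value)


def click_checkout(page: Any) -> None:
    """Case 03: target the button by role and accessible name, not a shared class."""
    page.get_by_role("button", name="Checkout").click()
```

Whether a given field needs `per_keystroke=True` is the caller's decision, made after inspecting the page as the expected behavior for case 02 describes.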

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
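One way to read the pass rule is sketched below. It assumes the judge returns one numeric score per rubric criterion for a case, and that the mean is taken across those criteria; the page does not spell out the scale or the aggregation, so both are assumptions, and the function and criterion names are hypothetical.

```python
from statistics import mean


def case_passes(scores: dict[str, float],
                mean_threshold: float = 4.0,
                floor: float = 3.0) -> bool:
    """Return True when the mean score is >= mean_threshold and no score is below floor.

    `scores` maps a criterion name to the judge's score for one test case.
    An empty mapping is treated as a failure (an assumption, not stated by the rubric).
    """
    if not scores:
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(case_passes({"wait_strategy": 5, "selector_choice": 4, "explanation": 3}))  # True
print(case_passes({"wait_strategy": 5, "selector_choice": 5, "explanation": 2}))  # False
```

The second example fails even though its mean is 4.0, because one criterion falls below the floor of 3.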

Rubric criteria

Browserbase
AI Platform
Page Automation Primitives

Recommended for

Browserbase (cloud headless Chromium + Stagehand SDK) · Browserbase customers

Works with

Browserbase

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Page Automation Primitives eval for Browserbase (cloud headless Chromium + Stagehand SDK) test?

How is the Page Automation Primitives eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Page Automation Primitives pack for Browserbase (cloud headless Chromium + Stagehand SDK) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Page Automation Primitives pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
