Page Automation Primitives

Browserbase (cloud headless Chromium + Stagehand SDK) · Browserbase

Browser Infrastructure for AI Agents — Browserbase

Browserbase evals — Page Automation Primitives (relift v3 InfraRed)

About Browserbase

Browserbase provides cloud headless-browser infrastructure for AI agents — managed Chromium sessions with stealth mode, captcha handling, proxies, session persistence, live debugging, and the Stagehand SDK for act/extract/observe automation.

Employees

~40

Industry

Browser Infrastructure

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check

01

Input: Agent calls page.goto('https://shop.example/cart') and immediately calls page.click('#checkout').

Expected behavior: Pass wait_until='networkidle' or 'domcontentloaded' explicitly per the target's loading pattern. SPAs may need an explicit wait_for_selector on a post-mount anchor; do not rely on the default 'load' for a JS-heavy cart.

Check: Pass / Fail · AI Platform · high

02

Input: Form has input event listeners that validate on each keystroke (e.g., card-number spacing). Agent uses page.fill() and validation never fires.

Expected behavior: Use page.type() (per-keystroke events) when the form depends on input events. Use fill() for non-event-driven fields (faster, atomic). Decide per field by inspecting the listener model via observe() or DOM inspection.

Check: Pass / Fail · AI Platform · medium

03

Input: Operator uses page.click('button.btn-primary') on a Tailwind site where the class is shared by 12 buttons.

Expected behavior: Prefer Playwright's role/text-based locators (getByRole('button', {name:'Checkout'})) or Stagehand act() for resilience to class refactors. Avoid brittle class selectors as the primary target.

Check: Pass / Fail · AI Platform · high
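
The three expected behaviors above can be sketched as small helpers. This is a minimal illustration, not Browserbase's or Stagehand's implementation. It assumes a Playwright sync-API Page object in Python (where the JavaScript options waitUntil, waitForSelector and getByRole are spelled wait_until, wait_for_selector and get_by_role). The helper names and the per_keystroke flag are hypothetical, introduced only for this example.

```python
from typing import Any, Optional


def goto_then_click(
    page: Any,
    url: str,
    selector: str,
    wait_until: str = "domcontentloaded",
    anchor: Optional[str] = None,
) -> None:
    """Test 01: navigate with an explicit wait strategy before clicking.

    `anchor` is an optional selector that only exists once the app has
    mounted; waiting for it guards against clicking too early on an SPA.
    """
    page.goto(url, wait_until=wait_until)
    if anchor is not None:
        page.wait_for_selector(anchor)
    page.click(selector)


def enter_text(page: Any, selector: str, value: str, per_keystroke: bool) -> None:
    """Test 02: choose the input primitive per field.

    per_keystroke=True  -> page.type(), which sends key events one by one.
    per_keystroke=False -> page.fill(), which sets the value in one step.
    """
    if per_keystroke:
        page.type(selector, value)
    else:
        page.fill(selector, value)


def click_button_by_name(page: Any, name: str) -> None:
    """Test 03: target a button by role and accessible name, not by CSS class."""
    page.get_by_role("button", name=name).click()
```

With a real Playwright page, the calls would look like goto_then_click(page, 'https://shop.example/cart', '#checkout', anchor='#checkout'), enter_text(page, '#card-number', '4242 4242 4242 4242', per_keystroke=True) and click_button_by_name(page, 'Checkout'); the '#card-number' selector and the card value are placeholders. Recent Playwright releases mark page.type() as deprecated in favor of locator.press_sequentially(), so the per-keystroke branch may need adjusting depending on the installed version.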

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes when the mean score across rubric criteria is >= 4.0 and no single criterion scores below 3.
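
The pass rule above can be written as a short check. This is a sketch under assumptions: it treats each rubric criterion as having one numeric score, and the function name, parameter names, and example criterion keys are hypothetical rather than part of the eval definition.

```python
from statistics import fmean
from typing import Mapping


def passes_rubric(
    scores: Mapping[str, float],
    min_mean: float = 4.0,
    min_score: float = 3.0,
) -> bool:
    """Return True when the mean is >= min_mean and no score is below min_score."""
    if not scores:
        # Nothing was graded, so there is nothing to pass.
        return False
    values = list(scores.values())
    return fmean(values) >= min_mean and min(values) >= min_score


# Mean is 4.0 and the lowest score is 3, so both conditions hold.
print(passes_rubric({"waits": 5, "input_method": 4, "locators": 3}))
# Mean is 4.0, but one criterion scored 2, so the floor is violated.
print(passes_rubric({"waits": 5, "input_method": 5, "locators": 2}))
```

The floor on individual criteria keeps one strong score from masking a failing one, which the mean alone would allow.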

Rubric criteria

  • Browserbase
  • AI Platform
  • Page Automation Primitives

Recommended for

Browserbase (cloud headless Chromium + Stagehand SDK) · Browserbase customers
