Eval Library
Browserbase
For BrowserbaseAI Platform

Stagehand Act Extract Observe

Browserbase (cloud headless Chromium + Stagehand SDK) · Browserbase

Browser Infrastructure for AI Agents — Browserbase

Browserbase evals — Stagehand SDK — act / extract / observe (relift v3 InfraRed)

About Browserbase

Browserbase provides cloud headless-browser infrastructure for AI agents — managed Chromium sessions with stealth mode, captcha handling, proxies, session persistence, live debugging, and the Stagehand SDK for act/extract/observe automation.

Employees

~40

Industry

Browser Infrastructure

Headquarters

San Francisco, CA

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Stagehand init with env=BROWSERBASE. Agent calls await stagehand.page.act('click the Sign in button').

act() resolves a single high-level action against the live DOM via the model. Verify the resulting action via a post-click anchor (URL change, observe() result, or DOM presence of a known signed-in element). Do not chain multiple imperative steps in one act() string.

Pass / FailAi Platformhigh
02

Agent extracts product details: await stagehand.page.extract({instruction:'price and title', schema:z.object({price:z.string(), title:z.string()})}).

Pass a strict Zod schema so the model returns typed JSON. Validate the result at the boundary and route schema-violation back as a retry signal. Persist raw model output alongside the typed object for audit when downstream uses it for decisions.

Pass / FailAi Platformcritical
03

observe() returns a candidate selector for 'the buy button'. The agent reuses it 30 seconds later after a partial DOM update.

observe() selectors are tied to the snapshot at observation time. Re-observe before reuse on dynamic SPAs. If reuse is needed for speed, validate with a presence check and fall back to a fresh observe() on miss.

Pass / FailAi Platformmedium

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Browserbase
  • Ai Platform
  • Stagehand Act Extract Observe

Recommended for

Browserbase (cloud headless Chromium + Stagehand SDK)Browserbase customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.