
Stagehand Act Extract Observe
Browserbase evals — Stagehand SDK — act / extract / observe (relift v3 InfraRed)
About Browserbase
Browserbase provides cloud headless-browser infrastructure for AI agents — managed Chromium sessions with stealth mode, captcha handling, proxies, session persistence, live debugging, and the Stagehand SDK for act/extract/observe automation.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Stagehand init with env=BROWSERBASE. Agent calls await stagehand.page.act('click the Sign in button'). | act() resolves a single high-level action against the live DOM via the model. Verify the resulting action via a post-click anchor (URL change, observe() result, or DOM presence of a known signed-in element). Do not chain multiple imperative steps in one act() string. | Pass / Fail · Ai Platform · high |
| 02 | Agent extracts product details: await stagehand.page.extract({instruction:'price and title', schema:z.object({price:z.string(), title:z.string()})}). | Pass a strict Zod schema so the model returns typed JSON. Validate the result at the boundary and route schema-violation back as a retry signal. Persist raw model output alongside the typed object for audit when downstream uses it for decisions. | Pass / Fail · Ai Platform · critical |
| 03 | observe() returns a candidate selector for 'the buy button'. The agent reuses it 30 seconds later after a partial DOM update. | observe() selectors are tied to the snapshot at observation time. Re-observe before reuse on dynamic SPAs. If reuse is needed for speed, validate with a presence check and fall back to a fresh observe() on miss. | Pass / Fail · Ai Platform · medium |
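
The three expected behaviors in the table can be sketched as small helpers. This is a minimal illustration, not Stagehand's API: the helper names (actAndVerify, extractValidated, resolveSelector) and the injected callbacks are hypothetical stand-ins for whatever act(), extract(), observe(), and presence-check calls the agent actually uses, and the default of three attempts is an arbitrary assumption.

```typescript
// Hypothetical stand-in type; the real observe() result may carry more fields.
type Candidate = { selector: string };

// Test 01: issue ONE high-level action, then confirm it with a post-action anchor
// (for example a URL change or the presence of a signed-in element).
async function actAndVerify(
  act: (instruction: string) => Promise<unknown>,
  anchorHolds: () => Promise<boolean>,
  instruction: string,
): Promise<void> {
  await act(instruction);
  if (!(await anchorHolds())) {
    throw new Error(`anchor check failed after: ${instruction}`);
  }
}

// Test 02: validate at the boundary, treat a schema violation as a retry signal,
// and return the raw model output next to the typed object for audit.
async function extractValidated<T>(
  extractRaw: () => Promise<unknown>,
  parse: (raw: unknown) => T, // must throw on mismatch
  maxAttempts = 3, // assumed default, not from the source
): Promise<{ typed: T; raw: unknown; attempts: number }> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await extractRaw();
    try {
      return { typed: parse(raw), raw, attempts: attempt };
    } catch (err) {
      lastError = err; // schema violation: loop and try again
    }
  }
  throw new Error(
    `validation failed after ${maxAttempts} attempts: ${String(lastError)}`,
  );
}

// Test 03: reuse a cached selector only if it is still present in the DOM;
// on a miss, fall back to a fresh observe().
async function resolveSelector(
  cached: string | undefined,
  isPresent: (selector: string) => Promise<boolean>,
  observe: (instruction: string) => Promise<Candidate[]>,
  instruction: string,
): Promise<string> {
  if (cached !== undefined && (await isPresent(cached))) {
    return cached;
  }
  const [first] = await observe(instruction);
  if (first === undefined) {
    throw new Error(`observe found no candidate for: ${instruction}`);
  }
  return first.selector;
}
```

With Zod, a function such as (raw) => schema.parse(raw) would fit the parse parameter, since parse throws when the input does not match the schema. Passing the browser calls in as callbacks keeps the helpers independent of any particular SDK version.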
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean score of at least 4.0, with no single criterion scoring below 3.
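
The pass rule can be written as a short check. This is a sketch under one reading of the rule: it assumes the mean is taken across the rubric criteria of a single graded response, and the names gradeEval and CriterionScore are hypothetical.

```typescript
// Hypothetical shape for one graded rubric criterion.
interface CriterionScore {
  criterion: string;
  score: number;
}

interface GradeResult {
  mean: number;
  min: number;
  pass: boolean;
}

// Pass only if the mean score reaches the threshold AND no criterion
// falls below the floor. The defaults mirror the stated rule (4.0 and 3).
function gradeEval(
  scores: CriterionScore[],
  meanThreshold = 4.0,
  floor = 3,
): GradeResult {
  if (scores.length === 0) {
    throw new Error("no rubric scores to grade");
  }
  const values = scores.map((s) => s.score);
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const min = Math.min(...values);
  return { mean, min, pass: mean >= meanThreshold && min >= floor };
}
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion is below the floor of 3.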
Rubric criteria
- Browserbase
- Ai Platform
- Stagehand Act Extract Observe