Crawl Whole Site
Firecrawl evals — Crawl (whole site) (relift v3 InfraRed)
About Firecrawl
Firecrawl is a web-data API for AI — it turns websites into clean, LLM-ready markdown or structured data via scrape, crawl, map, search, and LLM-powered extract endpoints, with JS rendering, browser actions, and proxies. Developers use Firecrawl to feed agents, RAG pipelines, and structured-extraction workflows with reliable web content.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent calls POST /v1/crawl and immediately tries to read pages from the response body, but the response only contains a job id and a status URL. | Treat crawl as asynchronous: capture the returned job id, then poll GET /v1/crawl/{id} until status=completed (or failed/cancelled) before consuming data[]. Use exponential backoff on polling rather than a tight loop. | Pass / Fail · Ai Platform · high |
| 02 | A crawl seeded at /docs/getting-started unexpectedly pulls in /pricing and /blog because those are linked from the docs. | Understand that allowBackwardLinks controls whether the crawler follows links that go 'up' or outside the seed path. Keep it false (default) to stay within the seed subtree; enable it deliberately only when broader coverage is intended. | Pass / Fail · Ai Platform · medium |
| 03 | Agent kicks off a crawl on a large e-commerce site with no limit or maxDepth, intending to grab only the top-level category pages. | Bound the crawl with limit (max pages) and maxDepth (link hops from the seed) matched to the intent. An unbounded crawl on a large site burns credits and may never terminate within budget. Start small, inspect, then widen. | Pass / Fail · Ai Platform · medium-to-high per row above: high |
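The three expected behaviors above can be sketched together in Python. This is a minimal illustration, not Firecrawl's SDK: `build_crawl_request`, `backoff_delays`, and `poll_crawl` are hypothetical helper names, the `url` body field is an assumption (the table only names `limit`, `maxDepth`, and `allowBackwardLinks`), and `get_status` stands in for whatever function performs GET /v1/crawl/{id} and returns the parsed JSON.

```python
import time
from typing import Callable, Iterator

# Statuses named in test 01 as the points where polling should stop.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def build_crawl_request(seed_url: str, limit: int = 25, max_depth: int = 1,
                        allow_backward_links: bool = False) -> dict:
    """Build a bounded body for POST /v1/crawl (the "url" key is an assumption)."""
    if limit < 1 or max_depth < 0:
        raise ValueError("limit must be >= 1 and max_depth must be >= 0")
    return {
        "url": seed_url,
        "limit": limit,                              # max pages (test 03)
        "maxDepth": max_depth,                       # link hops from the seed (test 03)
        "allowBackwardLinks": allow_backward_links,  # stay in the seed subtree (test 02)
    }


def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0) -> Iterator[float]:
    """Yield exponentially growing delays in seconds, capped at `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay *= factor


def poll_crawl(get_status: Callable[[], dict], max_attempts: int = 10,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call get_status() until the job reaches a terminal status (test 01).

    `get_status` is a hypothetical callable wrapping GET /v1/crawl/{id}.
    """
    delays = backoff_delays()
    for _ in range(max_attempts):
        job = get_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(next(delays))  # back off instead of polling in a tight loop
    raise TimeoutError("crawl did not reach a terminal status in time")
```

A caller would read `data` from the returned job only when its status is `completed`, and would start with a small `limit` and `max_depth`, inspect the result, and widen from there.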
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
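One possible reading of the pass rule, sketched in Python under stated assumptions: scores are numeric, the mean is taken across criteria, and "no criterion below 3" is a floor on each individual score. The function name `passes` and the criterion keys are hypothetical, and the scoring scale is not stated in the source.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float], mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True if the mean score meets the threshold and no score is below the floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(passes({"async_polling": 5, "bounded_crawl": 4, "scope_control": 3}))
print(passes({"async_polling": 5, "bounded_crawl": 5, "scope_control": 2}))
```

The second call shows why the floor matters: the mean still reaches 4.0, but a single score of 2 fails the eval.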
Rubric criteria
- Firecrawl
- Ai Platform
- Crawl Whole Site