Eval Library
For Firecrawl · AI Platform

Search

Web Data for AI — Firecrawl

Firecrawl evals — Search (relift v3 InfraRed)

About Firecrawl

Firecrawl is a web-data API for AI — it turns websites into clean, LLM-ready markdown or structured data via scrape, crawl, map, search, and LLM-powered extract endpoints, with JS rendering, browser actions, and proxies. Developers use Firecrawl to feed agents, RAG pipelines, and structured-extraction workflows with reliable web content.

Employees

~30

Industry

Web Data / Scraping

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

01
Input: Agent wants the top web results for a query AND their cleaned content, and currently runs /v1/search then a separate /v1/scrape per result URL.
Expected behavior: Pass scrapeOptions on the /v1/search call so each result is scraped to markdown in a single request, returning query results with content inline. Only fall back to per-URL scrape when a result needs different scrape settings.
Check: Pass / Fail · AI Platform · medium

02
Input: Agent needs results from a specific source type (e.g. web vs news) but accepts whatever the default sources return.
Expected behavior: Use the sources parameter to select the intended source set for the query so results match the task (e.g. news for current events). Confirm the returned results come from the selected sources before downstream use.
Check: Pass / Fail · AI Platform · low

03
Input: Agent needs only the top 5 results for a RAG step but leaves limit unset and ingests the full default result set, scraping each one.
Expected behavior: Set the search limit (result count) to what the task needs (e.g. 5). With scrapeOptions enabled, every extra result scraped costs credits, so bound the count deliberately rather than over-fetching.
Check: Pass / Fail · AI Platform · medium
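
The three sample tests above can be folded into a single request body. The Python sketch below is illustrative only. build_search_payload is a hypothetical helper, and the field names query and formats, along with the list shape of sources, are assumptions that this page does not confirm. Only scrapeOptions, sources, and limit are named in the tests. The sketch builds the JSON body and does not send it.

```python
import json


def build_search_payload(query, limit=5, sources=None, scrape_markdown=True):
    """Assemble a JSON body for one search call (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")

    # Test 03: always set an explicit result count instead of relying on a default.
    payload = {"query": query, "limit": limit}

    if sources:
        # Test 02: assumption - sources is sent as a list of source names.
        payload["sources"] = list(sources)

    if scrape_markdown:
        # Test 01: assumption - scrapeOptions accepts a "formats" list.
        payload["scrapeOptions"] = {"formats": ["markdown"]}

    return payload


payload = build_search_payload(
    "latest LLM benchmark results", limit=5, sources=["news"]
)
print(json.dumps(payload, indent=2))
```

Sending a body like this to /v1/search in one call would replace the search-then-scrape loop from test 01, and the explicit limit keeps the number of scraped results, and so the credit cost, bounded as test 03 describes.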

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. To pass, the mean score across rubric criteria must be >= 4.0, and no single criterion may score below 3.
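
The pass rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each rubric criterion receives one numeric score. The function name passes_rubric and the example scores are hypothetical, and the score scale itself is not stated on this page.

```python
from statistics import mean


def passes_rubric(scores, min_mean=4.0, floor=3):
    """Return True when the mean is >= min_mean and no score is below floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    return mean(scores) >= min_mean and min(scores) >= floor


# Hypothetical score sets, one score per criterion.
print(passes_rubric([5, 4, 4]))  # mean above 4.0, lowest score is 4
print(passes_rubric([5, 5, 2]))  # mean is exactly 4.0, but one score is below 3
print(passes_rubric([4, 4, 3]))  # no score below 3, but the mean is under 4.0
```

Both conditions are checked together, so a high average cannot mask one weak criterion.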

Rubric criteria

  • Firecrawl
  • AI Platform
  • Search

Recommended for

Firecrawl · Firecrawl customers
