For Firecrawl · AI Platform

Extract LLM Structured

Firecrawl

Web Data for AI — Firecrawl

Evaluates Firecrawl's Extract (LLM structured) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Web Data for AI eval coverage.

About Firecrawl

Firecrawl is a web-data API for AI — it turns websites into clean, LLM-ready markdown or structured data via scrape, crawl, map, search, and LLM-powered extract endpoints, with JS rendering, browser actions, and proxies. Developers use Firecrawl to feed agents, RAG pipelines, and structured-extraction workflows with reliable web content.

Employees

~30

Industry

Web Data / Scraping

Headquarters

San Francisco, CA

Website

firecrawl.dev

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Agent calls /v1/extract to pull product name, price, and SKU but passes a vague prompt with no schema, getting back free-form text it must re-parse.	Provide an explicit JSON Schema (fields, types, required) so /v1/extract returns validated structured output keyed to the schema. The schema is the contract; the prompt clarifies intent. Validate the result against the schema before use.	Pass / Fail · AI Platform · high
02	The schema asks for 'founded_year' but the prompt says 'extract the company's age in years', producing inconsistent outputs.	Keep prompt and schema aligned: the prompt should describe how to populate the exact schema fields, not a different derived value. Ambiguity between prompt and schema yields unreliable extraction; reconcile them to one contract.	Pass / Fail · AI Platform · medium
03	Agent needs the same schema applied across 20 product pages and calls /v1/extract once per URL serially.	Pass multiple URLs (or a wildcard like example.com/products/*) to a single /v1/extract job so the schema is applied across pages, then poll for the structured result set. Map each extracted record back to its source URL.	Pass / Fail · AI Platform · medium
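
The expected behaviors in cases 01 and 03 can be illustrated with a minimal Python sketch. It builds one request body that carries an explicit JSON Schema and several URLs, then checks a returned record against that schema before use. The body field names (urls, prompt, schema), the PRODUCT_SCHEMA fields, and both helper functions are assumptions made for illustration; confirm the actual request shape against the Firecrawl API reference. The validator covers only required fields and top-level primitive types, so it is a stand-in for a full JSON Schema validator, not a replacement.

```python
import json

# Hypothetical schema for the product example in case 01.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "sku": {"type": "string"},
    },
    "required": ["name", "price", "sku"],
}

# Python types accepted for each JSON Schema primitive handled here.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def build_extract_request(urls, prompt, schema):
    """Build one request body covering every URL (field names are assumed)."""
    if not urls:
        raise ValueError("at least one URL is required")
    return {"urls": list(urls), "prompt": prompt, "schema": schema}


def validate_record(record, schema):
    """Return a list of problems; an empty list means the record conforms.

    Only checks required fields and top-level primitive types.
    """
    problems = []
    for field in schema.get("required", []):
        if field not in record:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field not in record:
            continue
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # type not covered by this sketch
        value = record[field]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_bool = isinstance(value, bool)
        ok = isinstance(value, expected) and (type_name == "boolean" or not is_bool)
        if not ok:
            problems.append(
                f"{field}: expected {type_name}, got {type(value).__name__}"
            )
    return problems


if __name__ == "__main__":
    body = build_extract_request(
        ["https://example.com/products/*"],
        "Fill name, price, and sku exactly as defined in the schema.",
        PRODUCT_SCHEMA,
    )
    print(json.dumps(body, indent=2))
    print(validate_record({"name": "Widget", "price": "19.99"}, PRODUCT_SCHEMA))
```

Note that the prompt in the usage example describes how to populate the schema fields and nothing else, which is the alignment case 02 asks for.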

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
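
One reading of this rule is that a test case passes when the mean of its criterion scores is at least 4.0 and no single criterion scores below 3. The sketch below encodes that reading in Python. The function name, the criterion names in the usage lines, and the interpretation itself are assumptions; the page does not state the score scale or how the judge aggregates scores.

```python
from statistics import mean

MEAN_THRESHOLD = 4.0  # minimum mean across criteria, from the grading rule
CRITERION_FLOOR = 3   # no individual criterion may score below this


def case_passes(scores):
    """Return True when a case meets both the mean and the floor condition.

    `scores` maps a criterion name to its numeric judge score.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= CRITERION_FLOOR


if __name__ == "__main__":
    # Hypothetical criterion names, for illustration only.
    print(case_passes({"schema_provided": 5, "prompt_aligned": 4, "validated": 3}))
    print(case_passes({"schema_provided": 5, "prompt_aligned": 5, "validated": 2}))
```

The floor check is what stops a strong average from hiding one badly failed criterion: the second usage line has a mean of 4.0 but still fails because one score is 2.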

Recommended for

Firecrawl customers

Works with

Firecrawl

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.


Frequently asked questions

What does the Extract LLM Structured eval for Firecrawl test?

Evaluates Firecrawl's Extract (LLM structured) across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Web Data for AI eval coverage.

How is the Extract LLM Structured eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Extract LLM Structured pack for Firecrawl contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Extract LLM Structured pack as-is or after customizing thresholds. Results appear in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
