Firecrawl evals: Extract (LLM structured) (relift v3 InfraRed)
About Firecrawl
Firecrawl is a web-data API for AI. It turns websites into clean, LLM-ready markdown or structured data through its scrape, crawl, map, search, and LLM-powered extract endpoints, and it supports JavaScript rendering, browser actions, and proxies. Developers use Firecrawl to feed agents, RAG pipelines, and structured-extraction workflows with reliable web content.
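Extraction is more reliable when the request carries an explicit schema. The sketch below is minimal and only builds the request without sending it. It assumes three things you should confirm against the Firecrawl API reference: the v1 extract body takes `urls`, `prompt`, and `schema`; the endpoint is `https://api.firecrawl.dev/v1/extract`; and authentication uses a bearer token. `PRODUCT_SCHEMA`, `PRODUCT_PROMPT`, and `build_extract_request` are hypothetical names used for illustration.

```python
import json
import urllib.request

# Assumed endpoint URL; confirm against the Firecrawl API reference.
FIRECRAWL_EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"

# Hypothetical schema: explicit field names, types, and required keys.
# "price" and "sku" also accept null, so the model can report a missing
# value instead of inventing one.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": ["number", "null"]},
        "sku": {"type": ["string", "null"]},
    },
    "required": ["product_name", "price", "sku"],
}

# Hypothetical prompt: it explains how to fill the exact schema fields
# and does not ask for a different or derived value.
PRODUCT_PROMPT = (
    "Extract product_name, price (numeric, no currency symbol), and sku "
    "exactly as shown on the page. Use null for any field that is not present."
)


def build_extract_request(urls, prompt, schema, api_key):
    """Build (but do not send) a POST request for the extract endpoint.

    The body keys `urls`, `prompt`, and `schema` are assumptions; verify them.
    """
    body = {"urls": list(urls), "prompt": prompt, "schema": schema}
    return urllib.request.Request(
        FIRECRAWL_EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_extract_request(
    ["https://example.com/product/123"], PRODUCT_PROMPT, PRODUCT_SCHEMA, "YOUR_API_KEY"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. The sketch stops before that step so it runs without a key or network access.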
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | Agent calls `/v1/extract` to pull product name, price, and SKU but passes a vague prompt with no schema, getting back free-form text it must re-parse. | Provide an explicit JSON Schema (fields, types, required) so `/v1/extract` returns validated structured output keyed to the schema. The schema is the contract; the prompt clarifies intent. Validate the result against the schema before use. | Pass / Fail | AI Platform | high |
| 02 | The schema asks for `founded_year` but the prompt says "extract the company's age in years", producing inconsistent outputs. | Keep prompt and schema aligned: the prompt should describe how to populate the exact schema fields, not a different derived value. Ambiguity between prompt and schema yields unreliable extraction; reconcile them into one contract. | Pass / Fail | AI Platform | medium |
| 03 | A required schema field has no corresponding value on the page. The model fills it with a plausible but invented value. | Prefer null or empty values over fabricated ones for fields absent from the source; validate against the schema and flag low-confidence or unsupported fields for review. A confidently wrong value is worse than an explicit gap. | Pass / Fail | AI Platform | critical |
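The post-call checks in rows 01 and 03 (validate against the schema; prefer explicit gaps over invented values) can be sketched as a small review step. This is a minimal, dependency-free sketch and not Firecrawl's own validation. It hand-checks only a subset of JSON Schema (`type` and `required`); a full validator such as the `jsonschema` package would be the usual choice. It reports null fields as explicit gaps and flags non-null values that do not appear verbatim in the page text. That substring test is an assumed, crude heuristic for "unsupported", and `review_extraction` and the sample page are hypothetical.

```python
from typing import Any

# Subset of JSON Schema type names mapped to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def _matches(value: Any, type_spec: Any) -> bool:
    """True if value satisfies a JSON Schema "type" (a name or list of names)."""
    if type_spec is None:  # no type constraint declared for this field
        return True
    names = type_spec if isinstance(type_spec, list) else [type_spec]
    for name in names:
        # bool subclasses int in Python; don't accept True/False as numbers.
        if isinstance(value, bool) and name in ("number", "integer"):
            continue
        if isinstance(value, _JSON_TYPES[name]):
            return True
    return False


def review_extraction(result: dict, schema: dict, page_text: str) -> dict:
    """Validate an extraction result against a schema and flag weak fields.

    errors:  missing required fields, unexpected fields, wrong types
             (rejecting extra fields is stricter than JSON Schema's default)
    gaps:    fields returned as null, i.e. explicitly absent from the source
    flagged: non-null values not found verbatim in page_text
    """
    errors, gaps, flagged = [], [], []
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in result:
            errors.append(f"missing required field: {field}")
    for field, value in result.items():
        if field not in props:
            errors.append(f"unexpected field: {field}")
        elif not _matches(value, props[field].get("type")):
            errors.append(f"wrong type for {field}: {value!r}")
        elif value is None:
            gaps.append(field)
        elif str(value) not in page_text:
            # Crude grounding heuristic: a value the page never mentions
            # goes to human review instead of being trusted.
            flagged.append(field)
    return {"errors": errors, "gaps": gaps, "flagged": flagged}


schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": ["number", "null"]},
        "sku": {"type": ["string", "null"]},
    },
    "required": ["product_name", "price", "sku"],
}
page = "Acme Kettle, 1.7 L. Price: $19.99. Free shipping."
result = {"product_name": "Acme Kettle", "price": 19.99, "sku": "AK-1000"}
print(review_extraction(result, schema, page))
# {'errors': [], 'gaps': [], 'flagged': ['sku']}
```

Here `sku` is flagged because "AK-1000" never appears on the page. If the model had returned `null`, the field would appear under `gaps` instead, which is the explicit-gap outcome row 03 prefers. Numbers formatted differently from the page (for example `20.0` against `$20`) would also be flagged, so `flagged` works best as a review queue rather than a verdict.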
How this eval is graded
Responses are graded against `expected.ideal_behavior` and `expected.rubric`. A pass requires a mean score of at least 4.0, with no criterion scored below 3.
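The pass rule combines two conditions, and a high mean alone is not enough. The minimal sketch below reads the rule as "mean of the per-criterion scores is at least 4.0 and every criterion scores at least 3". That interpretation, the 1 to 5 score scale, the `rubric_pass` helper, and the criterion keys are all assumptions for illustration.

```python
from statistics import mean
from typing import Mapping


def rubric_pass(scores: Mapping[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass iff the mean score is >= min_mean and no criterion is below floor."""
    if not scores:
        raise ValueError("at least one rubric score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical per-criterion scores on an assumed 1-5 scale.
print(rubric_pass({"firecrawl": 5, "ai_platform": 4, "extract": 4}))  # True
print(rubric_pass({"firecrawl": 5, "ai_platform": 5, "extract": 2}))  # False
```

The second call reaches a mean of exactly 4.0 but still fails because one criterion scores 2. The floor keeps a single bad dimension, such as a fabricated field, from being averaged away.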
Rubric criteria
- Firecrawl
- AI Platform
- Extract (LLM structured)