Evals for Firecrawl
8 evaluation packs covering adversarial robustness, safety gates, workflow quality, and operator-level checks for Firecrawl AI products.
About Firecrawl
Firecrawl is a web-data API for AI — it turns websites into clean, LLM-ready markdown or structured data via scrape, crawl, map, search, and LLM-powered extract endpoints, with JS rendering, browser actions, and proxies. Developers use Firecrawl to feed agents, RAG pipelines, and structured-extraction workflows with reliable web content.
Available eval packs for Firecrawl
8 packs ready to run.
Actions And Dynamic Pages
Firecrawl evals — Actions & Dynamic Pages (relift v3 InfraRed)
Auth Rate Limits Credits Webhooks
Firecrawl evals — Auth, Rate Limits, Credits & Webhooks (relift v3 InfraRed)
Crawl Whole Site
Firecrawl evals — Crawl (whole site) (relift v3 InfraRed)
Extract LLM Structured
Firecrawl evals — Extract (LLM structured) (relift v3 InfraRed)
Map Url Discovery
Firecrawl evals — Map (URL discovery) (relift v3 InfraRed)
Safety Legality And Governance
Firecrawl evals — Safety, Legality & Governance (relift v3 InfraRed)
Scrape Single Url
Firecrawl evals — Scrape (single URL) (relift v3 InfraRed)
Search
Firecrawl evals — Search (relift v3 InfraRed)
Why eval Firecrawl AI
Firecrawl's AI features ship behind brand promises about accuracy, safety, and reliability. Buyers and integrators need to know those promises hold up under adversarial prompts, edge-case workflows, and the long tail of real customer inputs — not just the demo path.
The Corsac eval library for Firecrawl measures the four dimensions teams care about most when deploying AI platform agents:
- Adversarial robustness — does the agent resist prompt injection, jailbreaks, and social-engineering attempts?
- Workflow quality — does it complete the task buyers were shown in the demo, on inputs that don't look like the demo?
- Safety gates — does it escalate or refuse when it should, and only then?
- Operator quality — does it preserve analyst trust by surfacing the right context at the right time?
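To make the safety-gate question concrete, here is a minimal sketch of one way to score it. This is an illustrative assumption, not Corsac's actual scoring method. "When it should" is read as recall over cases that warrant a refusal or escalation, and "only then" as precision over the refusals the agent actually issued. The names `GateCase` and `gate_scores` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GateCase:
    """One labeled test case (hypothetical structure, for illustration only)."""
    should_refuse: bool  # ground-truth label: the agent ought to refuse or escalate
    did_refuse: bool     # observed behaviour: the agent actually refused or escalated


def gate_scores(cases):
    """Return recall ("refuses when it should") and precision ("and only then")."""
    tp = sum(c.should_refuse and c.did_refuse for c in cases)      # correct refusals
    fn = sum(c.should_refuse and not c.did_refuse for c in cases)  # missed refusals
    fp = sum(not c.should_refuse and c.did_refuse for c in cases)  # needless refusals
    # Report None rather than dividing by zero when a denominator is empty.
    recall = tp / (tp + fn) if (tp + fn) else None
    precision = tp / (tp + fp) if (tp + fp) else None
    return {"recall": recall, "precision": precision}


cases = [
    GateCase(should_refuse=True, did_refuse=True),    # correct refusal
    GateCase(should_refuse=True, did_refuse=False),   # missed refusal
    GateCase(should_refuse=False, did_refuse=False),  # correct completion
    GateCase(should_refuse=False, did_refuse=True),   # over-refusal
]
scores = gate_scores(cases)
print(scores)
```

Reporting the two numbers separately keeps over-refusal visible: an agent that refuses everything scores perfect recall but poor precision, which fails the "only then" half of the safety-gate question.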
Every eval pack above is hand-authored against Firecrawl's public product surface and runnable in Corsac with your own data.