
LlamaParse and LlamaCloud

LlamaIndex (+ LlamaCloud) · LlamaIndex

RAG / Data Framework — LlamaIndex

Evaluates LlamaIndex's LlamaParse and LlamaCloud with 9 scenario-based test cases, each graded by an LLM judge against an expected-behavior rubric. Part of Corsac's RAG / Data Framework eval coverage.

About LlamaIndex

LlamaIndex is a data framework for building RAG and agent applications over private data. Its building blocks include documents and nodes, indexes such as VectorStoreIndex, retrievers and query engines, and the IngestionPipeline; LlamaParse and LlamaCloud add managed document parsing and retrieval.

Employees: ~50
Industry: RAG Framework
Headquarters: San Francisco, CA
Website: www.llamaindex.ai

Sample tests (showing 3 of 9)

#	Input	Expected behavior	Check	Category	Severity
01	An integrator uses LlamaParse fast mode on complex financial PDFs with nested tables, then complains that the tables come out flattened and misaligned in the parsed markdown.	Select the LlamaParse parse mode to match document complexity: higher-fidelity modes (e.g. accurate, premium, multimodal) for complex tables and layouts, fast mode for simple text, trading cost and latency for fidelity. Validate parsed output on representative documents before bulk processing.	Pass / Fail	AI Platform	medium
02	LlamaParse parsing is asynchronous (submit, then poll). The integrator reads the result immediately after submitting and treats a not-yet-ready job as a parse failure.	Treat parsing as an async job: submit, then poll job status with backoff until it reaches a terminal SUCCESS or ERROR state (or use the SDK's blocking helper) before reading results. Distinguish "still processing" from "failed", and handle ERROR jobs explicitly.	Pass / Fail	AI Platform	high
03	The integrator sets LlamaParse result_type='text', then builds nodes with a MarkdownNodeParser, which finds no headings because the markdown structure was stripped during parsing.	Match result_type to the downstream parser: request markdown when you intend to split on markdown structure (headings, tables), and pair it with a markdown-aware node parser. Verify that the parsed format and the splitter agree before indexing.	Pass / Fail	AI Platform	medium
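
Case 01 asks for validating parsed output on representative documents before bulk processing. One cheap check, sketched here under stated assumptions, flags markdown tables whose rows disagree on cell count, a common symptom of the misalignment the integrator reports. The helper names are hypothetical and nothing below calls LlamaParse; it runs on markdown you have already parsed. It treats any run of lines starting with `|` as a table and ignores escaped pipes, so it is a smoke test rather than a markdown parser, and it cannot detect a table that was flattened into plain prose.

```python
def split_row(line: str) -> list[str]:
    """Split a pipe-delimited markdown table row into cells, ignoring outer pipes."""
    stripped = line.strip()
    if stripped.startswith("|"):
        stripped = stripped[1:]
    if stripped.endswith("|"):
        stripped = stripped[:-1]
    return [cell.strip() for cell in stripped.split("|")]


def table_alignment_issues(markdown: str) -> list[str]:
    """Report markdown tables whose rows do not all have the same number of cells.

    A table is any run of consecutive lines starting with '|'. Escaped pipes
    inside cells are not handled: this is a quick smoke test, not a parser.
    """
    issues: list[str] = []
    rows: list[str] = []
    start = 0
    # The extra "" flushes a table that ends on the document's last line.
    for lineno, line in enumerate(markdown.splitlines() + [""], start=1):
        if line.strip().startswith("|"):
            if not rows:
                start = lineno
            rows.append(line)
            continue
        if rows:
            counts = sorted({len(split_row(r)) for r in rows})
            if len(counts) > 1:
                issues.append(f"table starting at line {start}: cell counts {counts}")
            rows = []
    return issues
```

Running a check like this over a sample of parsed PDFs, and escalating to a higher-fidelity parse mode when it reports problems, is one way to apply the case's cost-versus-fidelity trade-off before committing to a bulk run.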
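
For case 02, the expected behavior is a poll loop that separates "still processing" from "failed". The sketch below assumes a hypothetical `get_status` callable that you would wire to your LlamaParse or LlamaCloud client's job-status lookup; the exact endpoint and the full set of non-terminal status strings are assumptions, and only SUCCESS and ERROR come from the case text. If your SDK version offers a blocking helper, that may be simpler than hand-rolled polling.

```python
import time
from collections.abc import Callable


class ParseJobFailed(Exception):
    """The job reached the terminal ERROR state."""


class ParseJobTimeout(Exception):
    """The job was still processing when the polling budget ran out."""


def wait_for_job(
    get_status: Callable[[], str],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_status() with exponential backoff until the job is terminal.

    Returns on SUCCESS, raises ParseJobFailed on ERROR, and raises
    ParseJobTimeout if the job is still running when the deadline passes.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status == "SUCCESS":
            return
        if status == "ERROR":
            raise ParseJobFailed("parse job reported ERROR")
        # Any other status means "still processing": wait, do not fail.
        if clock() + delay > deadline:
            raise ParseJobTimeout(f"job still {status!r} after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting, and the separate timeout exception lets callers retry or re-queue a slow job instead of reporting it as a parse failure.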

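For case 03, a pre-indexing guard can catch the format mismatch before any nodes are built. This is a hypothetical sketch: the function name is made up, it does not call MarkdownNodeParser or LlamaParse, and the "markdown" and "text" values are taken from the result_type settings named in the case. It assumes headings are ATX-style (`# Title`); setext headings underlined with `===` or `---` would be missed.

```python
import re

# ATX headings: one to six '#' characters and a space at the start of a line.
HEADING = re.compile(r"^#{1,6} ", re.MULTILINE)


def check_parser_input(result_type: str, sample_text: str, markdown_aware: bool) -> None:
    """Raise ValueError if a markdown-aware splitter would get unstructured input.

    result_type is the value requested from the parser (e.g. "markdown" or
    "text"); sample_text is the parsed output of a representative document.
    """
    if not markdown_aware:
        return  # plain-text splitters do not depend on headings
    if result_type != "markdown":
        raise ValueError(
            f"result_type={result_type!r} drops markdown structure; request 'markdown'"
        )
    if not HEADING.search(sample_text):
        raise ValueError("no markdown headings found; the splitter would see one section")
```

Checking both the configured result_type and an actual parsed sample covers the two ways the case can fail: asking for the wrong format, or getting markdown that happens to contain no headings.
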
How this eval is graded

An LLM judge grades each response against the case's expected.ideal_behavior and expected.rubric. A case passes when its mean criterion score is at least 4.0 and no single criterion scores below 3.
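
The pass rule combines an average threshold with a per-criterion floor, so one weak criterion fails a case even when the average clears 4.0. A minimal sketch, assuming each case yields one numeric score per rubric criterion and reading the rule as applying across a case's criteria; the score scale is not stated (1 to 5 is a guess), while the thresholds 4.0 and 3 come from the text.

```python
from statistics import mean


def case_passes(
    scores: dict[str, float], mean_threshold: float = 4.0, floor: float = 3.0
) -> bool:
    """Pass if the mean criterion score is >= mean_threshold and none is below floor."""
    if not scores:
        raise ValueError("no criterion scores")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor
```

For example, scores of 5, 5 and 2 average 4.0 but still fail because of the floor; the criterion names in any scores dict you pass are up to you and purely illustrative.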

Rubric criteria

LlamaIndex
AI Platform
LlamaParse and LlamaCloud

Recommended for

LlamaIndex (+ LlamaCloud) · LlamaIndex customers

Works with

LlamaIndex

Related evals

Claude API (AI Platform): Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Claude API (AI Platform): Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Claude API (AI Platform): Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the LlamaParse and LlamaCloud eval for LlamaIndex (+ LlamaCloud) test?

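It runs 9 scenario-based cases on integrating LlamaParse and LlamaCloud, such as choosing a parse mode that matches document complexity, handling asynchronous parse jobs, and matching result_type to the downstream node parser. An LLM judge grades each response against an expected-behavior rubric.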

How is the LlamaParse and LlamaCloud eval scored?

An LLM judge scores each response against the case's expected.ideal_behavior and expected.rubric. A case passes only if its mean criterion score is at least 4.0 and no criterion falls below 3.

How many test cases does this eval pack include?

The LlamaParse and LlamaCloud pack for LlamaIndex (+ LlamaCloud) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Llamaparse And Llamacloud pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
