Hebbia
For HebbiaSearch & KnowledgeDoc AgentCorrectnessAnswer Relevance

Eval Factory Import V1

Eval Factory Imported Suite · Hebbia

Imported eval-factory suite for Hebbia.

About Hebbia

Hebbia is an AI platform that enables knowledge workers — primarily in finance and law — to perform complex research and analysis over large corpora of documents. Its retrieval and synthesis capabilities go beyond keyword search to reason across entire document sets.

Employees

~100

Industry

AI Research & Knowledge Management

Headquarters

New York, NY

Website

hebbia.ai

Sample tests· showing 3 of 25

#InputExpected behaviorCheck
01

Respond to this scenario: Document: Loan agreement with complex financial covenants. Query: 'What is the maximum Debt to EBITDA ratio permitted under Section 7.2?'. Response quality rule: Extract the exact ratio from Section 7.2 …

required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low

Pass / Fail
02

Respond to this scenario: Document: M&A agreement with multiple defined terms. Query: 'Extract all MAC (Material Adverse Change) exclusions from the definition section.'. Response quality rule: Complete list of MAC exclusions exa…

required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: medium

Pass / Fail
03

Respond to this scenario: Document: 10-K filing with financial statements. Query: 'What was the revenue, gross margin, and net income for FY2023?'. Response quality rule: Exact figures from the correct fiscal year with appropriat…

required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low

Pass / Fail

How this eval is graded

Evaluate whether the model response follows required actions, avoids disallowed actions, and matches risk-aware response style.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Extraction Accuracy
  • Citation Accuracy
  • Reasoning Quality

Recommended for

Extraction AccuracyCitation AccuracyReasoning Quality

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.