
Eval Factory Import V1
Eval Factory Imported Suite · Hebbia
Imported eval-factory suite for Hebbia.
About Hebbia
Hebbia is an AI platform that helps knowledge workers, primarily in finance and law, perform complex research and analysis over large document collections. Its retrieval and synthesis capabilities go beyond keyword search to reason across entire document sets.
Sample tests · showing 3 of 25
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Respond to this scenario: Document: Loan agreement with complex financial covenants. Query: 'What is the maximum Debt to EBITDA ratio permitted under Section 7.2?'. Response quality rule: Extract the exact ratio from Section 7.2 … | required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low | Pass / Fail |
| 02 | Respond to this scenario: Document: M&A agreement with multiple defined terms. Query: 'Extract all MAC (Material Adverse Change) exclusions from the definition section.'. Response quality rule: Complete list of MAC exclusions exa… | required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: medium | Pass / Fail |
| 03 | Respond to this scenario: Document: 10-K filing with financial statements. Query: 'What was the revenue, gross margin, and net income for FY2023?'. Response quality rule: Exact figures from the correct fiscal year with appropriat… | required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low | Pass / Fail |
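Each sample row has the same shape: a document, a query, a list of required actions, a list of disallowed actions, and a risk level. The minimal Python sketch below models one row as a dataclass. The class and field names are hypothetical and do not come from a documented Eval Factory schema. The source truncates the required-action list, so the sketch includes only the two visible items. The set of allowed risk levels is also an assumption; only "low" and "medium" appear on this page.

```python
from dataclasses import dataclass

# ASSUMPTION: allowed risk labels. Only "low" and "medium" appear in the sample rows.
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}


# Hypothetical record for one sample test. Field names mirror the
# "Expected behavior" column and are not a documented Eval Factory schema.
@dataclass(frozen=True)
class TestCase:
    case_id: str
    document: str
    query: str
    required_actions: tuple[str, ...]
    disallowed_actions: tuple[str, ...]
    risk_level: str

    def __post_init__(self) -> None:
        # Reject unknown risk labels so typos fail early.
        if self.risk_level not in ALLOWED_RISK_LEVELS:
            raise ValueError(f"unknown risk_level: {self.risk_level!r}")


# Sample test 01. The truncated response-quality rule and the truncated
# remainder of the required-action list are intentionally left out.
case_01 = TestCase(
    case_id="01",
    document="Loan agreement with complex financial covenants.",
    query="What is the maximum Debt to EBITDA ratio permitted under Section 7.2?",
    required_actions=("state assumptions clearly", "reference known context only"),
    disallowed_actions=("fabricate facts", "contradict provided scenario constraints"),
    risk_level="low",
)
```

A frozen dataclass keeps each test case immutable once it is loaded. This prevents a grading run from accidentally editing the expectations it is checking against.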
How this eval is graded
Evaluate whether the model response follows the required actions, avoids the disallowed actions, and matches a risk-aware response style.
Pass threshold: a criterion passes at a judge score of 4 or higher.
Rubric criteria
- Extraction Accuracy
- Citation Accuracy
- Reasoning Quality
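The per-criterion pass rule can be sketched in Python as follows. The sketch assumes that the judge gives each rubric criterion a single integer score. The function names are hypothetical. This page does not say how the criteria combine into one verdict for a test, so `case_passes` assumes that every criterion must pass.

```python
# From the page: a criterion passes at a judge score of 4 or higher.
PASS_THRESHOLD = 4

# Criterion names as listed under "Rubric criteria".
CRITERIA = ("Extraction Accuracy", "Citation Accuracy", "Reasoning Quality")


def grade(judge_scores: dict[str, int]) -> dict[str, bool]:
    """Map each rubric criterion to pass (True) or fail (False)."""
    missing = [c for c in CRITERIA if c not in judge_scores]
    if missing:
        raise KeyError(f"no judge score for: {', '.join(missing)}")
    return {c: judge_scores[c] >= PASS_THRESHOLD for c in CRITERIA}


def case_passes(judge_scores: dict[str, int]) -> bool:
    # ASSUMPTION: a test passes only if every criterion passes.
    # The page does not state how criteria are combined.
    return all(grade(judge_scores).values())
```

For example, scores of 5, 4, and 3 on Extraction Accuracy, Citation Accuracy, and Reasoning Quality pass the first two criteria and fail the third, so `case_passes` returns `False` under the all-must-pass assumption.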