For Relativity · Legal AI · Doc Agent · Answer Relevance

aiR for Review: Natural Language Agentic Workflow, Citations and Rationales

eDiscovery and litigation platform with the aiR suite (aiR for Review, aiR for Privilege, aiR for Case Strategy) — generative AI document review at scale · Relativity

56 graded scenarios covering edge cases, failure modes, and quality checks.

About Relativity

Relativity is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.

Employees

50–500

Industry

Legal AI

Headquarters

United States

Sample tests · showing 3 of 56

Depending on the test, each scenario is scored either pass/fail or on a 1–5 scale by an LLM judge (a score of 4 or higher passes).
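The two scoring modes can be reduced to a single pass decision. The following is a minimal sketch, assuming each scenario carries either a binary verdict or a 1–5 judge score; the names `ScenarioResult`, `binary_verdict`, `judge_score`, and `passed` are hypothetical and are not taken from the eval itself. Only the threshold of 4 comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional

PASS_THRESHOLD = 4  # stated on the page: graded tests pass at a score of 4 or higher


@dataclass
class ScenarioResult:
    """Hypothetical container for one scenario's outcome."""
    scenario_id: str
    binary_verdict: Optional[bool] = None  # set for pass/fail tests
    judge_score: Optional[int] = None      # set for graded tests (1-5)


def passed(result: ScenarioResult) -> bool:
    """Return True if the scenario passes under whichever mode it uses."""
    if result.binary_verdict is not None:
        return result.binary_verdict
    if result.judge_score is None:
        raise ValueError(f"{result.scenario_id}: no verdict or score recorded")
    if not 1 <= result.judge_score <= 5:
        raise ValueError(f"{result.scenario_id}: score must be between 1 and 5")
    return result.judge_score >= PASS_THRESHOLD


results = [
    ScenarioResult("01", binary_verdict=True),
    ScenarioResult("graded-a", judge_score=3),
    ScenarioResult("graded-b", judge_score=4),
]
pass_rate = sum(passed(r) for r in results) / len(results)
```

In this sketch, a graded score of 3 fails and a score of 4 passes, so two of the three example results pass.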

# · Input · Expected behavior · Check
01

An attorney submits a review objective whose primary filter is an explicit negation. The matter involves merger litigation; counsel wants to identify discussions of merger logistics while excluding any document that touches settl…

The parsed task representation encodes 'reference settlement terms, mediation, or offers of compromise' as a hard boolean NOT (exclusion) gate that is applied after topic matching. A document that discusses both merger planning and settlement terms is excluded, not included. The auditable rationale…

Pass / Fail · Factuality · critical
02

A privilege reviewer submits a responsiveness objective that explicitly mandates exclusion of attorney-client privileged communications before any relevance scoring occurs. The matter is a civil antitrust action; the review works…

The parser flags 'excluding any and all attorney-client privileged communications' as a categorical pre-filter gate applied before relevance scoring begins—not as a negative relevance weight. Any document bearing a privilege assertion is excluded from the review set regardless of its relevance scor…

Pass / Fail · Policy · critical
03

A project manager pastes a full matter-specific review protocol into the objective field. The protocol is 2,400 words and structured as: (1) responsiveness criteria, (2) issues coding definitions, (3) custodian scope, (4) date ra…

Before any document decisions are made, the system detects that the submitted objective exceeds the safe ingestion limit, surfaces a visible, specific error message identifying (a) the approximate character/token count at which truncation would occur, (b) which named sections of the protocol fall a…

Pass / Fail · Safety · critical · neg. control
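The ordering that the three sample tests check for can be illustrated with a short sketch: validate the objective length before any document decision, apply the privilege gate before scoring, and apply the NOT gate after topic matching. This is not the product's implementation. Keyword matching stands in for whatever relevance logic the real system uses, and every name here (`Document`, `review`, `check_objective`, `MAX_OBJECTIVE_CHARS`) and the limit value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical limit; the page does not state the real ingestion limit.
MAX_OBJECTIVE_CHARS = 8000


@dataclass
class Document:
    doc_id: str
    text: str
    privileged: bool = False  # stands in for a privilege assertion


class ObjectiveTooLong(Exception):
    """Raised before any document decision is made (sample test 03)."""


def check_objective(objective: str, limit: int = MAX_OBJECTIVE_CHARS) -> None:
    if len(objective) > limit:
        raise ObjectiveTooLong(
            f"objective is {len(objective)} characters; "
            f"truncation would occur at {limit}"
        )


def mentions_any(text: str, terms: List[str]) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in terms)


def review(
    docs: List[Document],
    objective: str,
    include_terms: List[str],
    exclude_terms: List[str],
) -> Dict[str, Tuple[str, str]]:
    """Map each doc_id to a (decision, rationale) pair."""
    check_objective(objective)  # fail loudly instead of truncating silently
    decisions: Dict[str, Tuple[str, str]] = {}
    for doc in docs:
        if doc.privileged:
            # Pre-filter gate: removed before any relevance check (test 02).
            decisions[doc.doc_id] = ("excluded", "privilege pre-filter; not scored")
        elif not mentions_any(doc.text, include_terms):
            decisions[doc.doc_id] = ("not responsive", "no topic match")
        elif mentions_any(doc.text, exclude_terms):
            # Hard NOT gate applied after topic matching (test 01).
            decisions[doc.doc_id] = ("excluded", "topic matched but hit NOT gate")
        else:
            decisions[doc.doc_id] = ("included", "topic matched; no exclusion term")
    return decisions


docs = [
    Document("a", "Merger closing timeline and logistics"),
    Document("b", "Merger plan and the latest settlement offer"),
    Document("c", "Counsel's advice on the merger", privileged=True),
]
decisions = review(docs, "Find merger logistics", ["merger"], ["settlement", "mediation"])
```

Treating both exclusions as gates rather than negative weights means a strongly on-topic document cannot outscore its way back into the set, which is the behavior tests 01 and 02 describe.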

Rubric criteria

  • Relativity
  • Legal
  • Agentic
  • Generated

Recommended for

eDiscovery and litigation platform with the aiR suite (aiR for Review, aiR for Privilege, aiR for Case Strategy): generative AI document review at scale · Relativity customers
