
Review Redline Risk Spotting And Tracked Changes Generation
Generative AI for transactional lawyers in Microsoft Word — contract drafting, review, redlining, and the agentic Spellbook Associate workflow · Spellbook
64 graded scenarios covering edge cases, failure modes, and quality checks.
About Spellbook
Spellbook is an AI platform serving legal professionals, helping law firms and legal departments automate research, drafting, and review workflows with greater accuracy and speed than manual processes.
Sample tests · showing 3 of 64
Depending on the test, results are either pass/fail or graded 1–5 by an LLM judge; a graded test passes at ≥ 4.
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | The 40-page MSA contains two numerically phrased liability provisions: Section 11.2 (liquidated damages for late delivery: 'not to exceed 5% of the affected monthly fees per week of delay') and Section 12.1 (LOL cap: 'Vendor's ag… | Agent produces a risk item explicitly labeled 'Limitation of Liability' (or equivalent) that: (1) quotes or paraphrases the 12-month trailing-fees formula from Section 12.1, (2) identifies it as an LOL cap applied to Vendor's aggregate liability, (3) surfaces it as a distinct risk item separate fro… | Pass / Fail · Factuality · critical |
| 02 | The 60-page MSA contains Section 11.3 (liquidated damages for delayed delivery: 'Vendor's liability for delayed delivery of Deliverables shall not exceed five percent (5%) of the affected monthly fees per week of delay, up to a m… | The tracked change modifying or commenting on the LOL cap risk is placed in Section 12.1, not Section 11.3. Any Word comment inserted references Section 12.1 by section number or quotes language unique to that section ('six (6) months preceding the claim'). Section 11.3 is either left unmodified or … | Pass / Fail · Tool use · critical |
| 03 | The finalized agreement contains: Section 12.1: 'Each party's aggregate liability to the other party under this Agreement shall not exceed the total fees paid or payable in the twelve (12) months preceding the claim. This limitat… | Agent reports the LOL cap as mutual and equally applicable to both parties; no adverse LOL flag is generated for this provision, or if mentioned it is explicitly characterized as balanced and not adverse to Client. Agent reports Section 9.1 indemnification as mutual and limited to third-party claims… | Pass / Fail · Factuality · high · neg. control |
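The scoring rule stated above the table (pass/fail for some tests, a 1–5 LLM-judge grade passing at 4 or higher for others) can be sketched as a small helper. This is only an illustration under stated assumptions: the names `TestResult`, `binary_pass`, `grade`, and `passed` are hypothetical and are not part of any Spellbook or eval-platform API; only the 1–5 scale and the threshold of 4 come from the page.

```python
from dataclasses import dataclass
from typing import Optional

# From the page: graded tests pass at a judge grade of 4 or higher.
PASS_THRESHOLD = 4


@dataclass
class TestResult:
    """Hypothetical record for one scenario; field names are assumptions."""
    test_id: str
    binary_pass: Optional[bool] = None  # set for pass/fail tests
    grade: Optional[int] = None         # set for tests graded 1-5


def passed(result: TestResult) -> bool:
    """Return True if the scenario counts as a pass under the stated rule."""
    # Pass/fail tests carry their verdict directly.
    if result.binary_pass is not None:
        return result.binary_pass
    # Graded tests need a valid 1-5 grade from the judge.
    if result.grade is None:
        raise ValueError(f"test {result.test_id} has no verdict or grade")
    if not 1 <= result.grade <= 5:
        raise ValueError(f"grade {result.grade} is outside the 1-5 scale")
    return result.grade >= PASS_THRESHOLD


if __name__ == "__main__":
    results = [
        TestResult("01", binary_pass=True),
        TestResult("02", binary_pass=False),
        TestResult("graded-example", grade=4),
    ]
    for r in results:
        print(r.test_id, "pass" if passed(r) else "fail")
```

Keeping the two verdict types in one record means a summary over all 64 scenarios can count passes the same way regardless of how each test is scored.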
Rubric criteria
- Spellbook
- Legal
- Agentic
- Generated
Related evals
Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
6 graded scenarios covering edge cases, failure modes, and quality checks.
Legal AI · Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
71 graded scenarios covering edge cases, failure modes, and quality checks.
Legal AI · Professional-grade AI legal assistant: research, document review, drafting, deposition prep, and agentic skills grounded in Westlaw / Practical Law authoritative content (formerly Casetext CoCounsel)
72 graded scenarios covering edge cases, failure modes, and quality checks.