11x
For 11x · Document Agents · Correctness

Eval Factory Import V1

Eval Factory Imported Suite · 11x

Imported eval-factory suite for 11x.

Sample tests · showing 3 of 26

# · Input · Expected behavior · Check

01
Input: Respond to this scenario: Target prospect: Jane Smith, VP Engineering at TechCorp (Series B SaaS company, 150 employees). Recent LinkedIn activity shows she posted about struggling with engineering team velocity. TechCorp recentl…
Expected behavior: required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low
Check: Pass / Fail

02
Input: Respond to this scenario: Target prospect: Mike Johnson, CEO at a 20-person startup. Company website shows they just launched 2 weeks ago. Founder's LinkedIn shows this is his first startup after 10 years at Google. Response qual…
Expected behavior: required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low
Check: Pass / Fail

03
Input: Respond to this scenario: Research reveals the target company had layoffs last quarter. Current prospect is the new Head of Sales hired 1 month ago. Response quality rule: Agent should recognize sensitive context (layoffs) and th…
Expected behavior: required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low
Check: Pass / Fail
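
Each sample test carries the same four pieces of information: an input scenario, required actions, disallowed actions, and a risk level. A minimal sketch of how one such record might be represented follows. The field names come from the listing; the `SampleTest` class and its layout are assumptions for illustration, not the suite's actual schema. Truncated text ("…") is kept exactly as listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleTest:
    """Hypothetical record for one sample test (not the suite's real schema)."""
    test_id: str
    input_text: str                      # scenario prompt, truncated as in the listing
    required_actions: tuple[str, ...]    # the listing truncates this list after two items
    disallowed_actions: tuple[str, ...]
    risk_level: str


# Sample test 03, copied from the listing; elided text is left elided.
test_03 = SampleTest(
    test_id="03",
    input_text=(
        "Respond to this scenario: Research reveals the target company had "
        "layoffs last quarter. Current prospect is the new Head of Sales "
        "hired 1 month ago. Response quality rule: Agent should recognize "
        "sensitive context (layoffs) and th…"
    ),
    required_actions=(
        "state assumptions clearly",
        "reference known context only",
    ),
    disallowed_actions=(
        "fabricate facts",
        "contradict provided scenario constraints",
    ),
    risk_level="low",
)
```

Freezing the dataclass keeps a test case immutable once loaded, so a grader cannot alter the expected behavior by accident.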

How this eval is graded

Evaluate whether the model response follows the required actions, avoids the disallowed actions, and matches a risk-aware response style.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Research & Personalization Quality
  • Response Handling
  • Lead Qualification Accuracy
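
The pass threshold and rubric criteria can be sketched in a few lines of Python. The threshold of 4 and the three criterion names come from the listing. The function names, the dictionary input format, and the rule that a test passes only when every criterion passes are assumptions, since the listing does not say how criteria are combined.

```python
PASS_THRESHOLD = 4  # from the listing: a criterion passes at a judge score of 4 or higher

RUBRIC_CRITERIA = (
    "Research & Personalization Quality",
    "Response Handling",
    "Lead Qualification Accuracy",
)


def grade(judge_scores: dict[str, int]) -> dict[str, bool]:
    """Map each rubric criterion to pass (True) or fail (False)."""
    missing = [c for c in RUBRIC_CRITERIA if c not in judge_scores]
    if missing:
        raise ValueError(f"missing judge scores for: {missing}")
    return {c: judge_scores[c] >= PASS_THRESHOLD for c in RUBRIC_CRITERIA}


def overall_pass(judge_scores: dict[str, int]) -> bool:
    """Assumption: the test passes only if every criterion passes."""
    return all(grade(judge_scores).values())


# Hypothetical judge scores for one response (illustrative values only).
example_scores = {
    "Research & Personalization Quality": 5,
    "Response Handling": 4,
    "Lead Qualification Accuracy": 3,
}
results = grade(example_scores)
```

With these illustrative scores, the first two criteria pass and "Lead Qualification Accuracy" fails, so `overall_pass(example_scores)` returns `False` under the all-criteria assumption.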

Recommended for

Research & Personalization Quality · Response Handling · Lead Qualification Accuracy
