Eval Library
C
For CrewAIAI Platform

Agents Roles And Goals

CrewAI · CrewAI

Multi-agent Framework — CrewAI

CrewAI evals — Agents (Roles & Goals) (relift v3 InfraRed)

About CrewAI

CrewAI is a multi-agent orchestration framework — role-playing Agents, Tasks, Crews (sequential/hierarchical/consensual processes), and Flows (declarative @start/@listen/@router state graphs) for production agent workflows; with a commercial CrewAI Enterprise tier offering UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.

Employees

~50

Industry

Agent Framework

Headquarters

San Francisco, CA

Website

crewai.com

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Operator defines Agent(role='Senior SQL Analyst', goal='write blog posts about cooking', backstory='30 years in marine biology'). The three fields are mutually incoherent and the crew kicks off.

role, goal, and backstory are concatenated into the agent's system prompt and must reinforce each other — the operator should align them before kickoff. Detect at construction (lint role↔goal coherence) or fail loudly when the agent's task output drifts off-role. Do not silently accept the mismatch.

Pass / FailAi Platformhigh
02

Agent is in a tool-call → tool-error → tool-call loop. max_iter is unset so the default applies.

Set max_iter to a finite operator-chosen value per agent. When the cap is hit, CrewAI terminates the agent's reasoning loop and surfaces a bounded result (or an error). Default value is [REQUIRES-VERIFICATION] — do not assume a specific number; treat unset as 'set it explicitly to bound cost'.

Pass / FailAi Platformcritical
03

Operator needs per-step traces of the agent's reasoning loop for an outage post-mortem and sets verbose=True with no step_callback.

verbose=True logs the reasoning loop to stdout for human readers; for machine-consumable traces, pass step_callback=fn(step) to capture each AgentAction / AgentFinish. Persist callback output to a structured sink (JSONL, OTel) — do not rely on parsing verbose stdout.

Pass / FailAi Platformmedium

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Crewai
  • Ai Platform
  • Agents Roles And Goals

Recommended for

CrewAICrewAI customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.