Agents Roles And Goals
Multi-agent Framework — CrewAI
CrewAI evals — Agents (Roles & Goals) (relift v3 InfraRed)
About CrewAI
CrewAI is a multi-agent orchestration framework for production agent workflows, built around role-playing Agents, Tasks, Crews (sequential, hierarchical, or consensual processes), and Flows (declarative @start/@listen/@router state graphs). A commercial CrewAI Enterprise tier adds UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator defines Agent(role='Senior SQL Analyst', goal='write blog posts about cooking', backstory='30 years in marine biology'). The three fields are mutually incoherent and the crew kicks off. | role, goal, and backstory are concatenated into the agent's system prompt and must reinforce each other, so the operator should align them before kickoff. Detect the mismatch at construction (lint role↔goal coherence) or fail loudly when the agent's task output drifts off-role. Do not silently accept the mismatch. | Pass / Fail · AI Platform · high |
| 02 | Agent is in a tool-call → tool-error → tool-call loop. max_iter is unset so the default applies. | Set max_iter to a finite operator-chosen value per agent. When the cap is hit, CrewAI terminates the agent's reasoning loop and surfaces a bounded result (or an error). The default value is [REQUIRES-VERIFICATION]; do not assume a specific number, and treat unset as 'set it explicitly to bound cost'. | Pass / Fail · AI Platform · critical |
| 03 | Operator needs per-step traces of the agent's reasoning loop for an outage post-mortem and sets verbose=True with no step_callback. | verbose=True logs the reasoning loop to stdout for human readers; for machine-consumable traces, pass step_callback=fn(step) to capture each AgentAction / AgentFinish. Persist callback output to a structured sink (JSONL, OTel); do not rely on parsing verbose stdout. | Pass / Fail · AI Platform · medium |
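
The expected behavior in test 03 (capture each step through step_callback and persist it to a structured sink) can be sketched as below. This is a minimal illustration, not CrewAI's own implementation: make_jsonl_step_callback is a hypothetical helper, the attribute names read from each step (tool, tool_input, output) are assumptions about the step objects, and the AgentAction / AgentFinish classes here are stand-ins so the sketch runs without CrewAI installed.

```python
import io
import json
import time
from typing import Any, Callable, TextIO


def make_jsonl_step_callback(sink: TextIO) -> Callable[[Any], None]:
    """Return a callback that appends one JSON object per agent step to `sink`."""

    def on_step(step: Any) -> None:
        record = {
            "ts": time.time(),
            # The class name distinguishes e.g. AgentAction from AgentFinish.
            "step_type": type(step).__name__,
            # Assumed attribute names; any that are missing are recorded as None.
            "tool": getattr(step, "tool", None),
            "tool_input": getattr(step, "tool_input", None),
            "output": getattr(step, "output", None),
        }
        # default=str keeps values that are not JSON-serializable from raising.
        sink.write(json.dumps(record, default=str) + "\n")
        sink.flush()

    return on_step


# Stand-in step types (hypothetical shapes, defined only for this demo).
class AgentAction:
    def __init__(self, tool: str, tool_input: str) -> None:
        self.tool = tool
        self.tool_input = tool_input


class AgentFinish:
    def __init__(self, output: str) -> None:
        self.output = output


# Demo: write two steps to an in-memory sink and parse them back.
buffer = io.StringIO()
callback = make_jsonl_step_callback(buffer)
callback(AgentAction("sql_query", "SELECT 1"))
callback(AgentFinish("done"))
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

In a real crew the callback would be passed at construction, for example Agent(role=..., goal=..., backstory=..., max_iter=10, step_callback=make_jsonl_step_callback(open("steps.jsonl", "a"))). The value 10 is an arbitrary operator-chosen cap in the spirit of test 02, not a claim about CrewAI's default.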
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
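
One way to read the pass rule is sketched below, assuming it means the mean score across rubric criteria must be at least 4.0 and no single criterion may score below 3. The function name, the dictionary shape, and the criterion keys are hypothetical; the scoring scale is not stated in the source and is not assumed here.

```python
from statistics import mean
from typing import Mapping

MEAN_THRESHOLD = 4.0  # minimum mean score across criteria
FLOOR = 3             # minimum score allowed for any single criterion


def passes(scores: Mapping[str, float]) -> bool:
    """Apply the assumed reading: mean >= 4.0 and no criterion below 3."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= MEAN_THRESHOLD and min(values) >= FLOOR


# Hypothetical score sets keyed by placeholder criterion names.
at_threshold = {"criterion_a": 5, "criterion_b": 4, "criterion_c": 3}
below_floor = {"criterion_a": 5, "criterion_b": 5, "criterion_c": 2}
low_mean = {"criterion_a": 4, "criterion_b": 4, "criterion_c": 3}
```

Under this reading, below_floor fails even though its mean is 4.0, because one criterion sits under the floor.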
Rubric criteria
- CrewAI
- AI Platform
- Agents Roles and Goals