For CrewAI · AI Platform

CrewAI evals — Tasks (relift v3 InfraRed)

About CrewAI

CrewAI is a multi-agent orchestration framework for production agent workflows. It provides role-playing Agents, Tasks, Crews (sequential, hierarchical, or consensual processes), and Flows (declarative @start/@listen/@router state graphs). A commercial CrewAI Enterprise tier adds UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.

Employees: ~50
Industry: Agent Framework
Headquarters: San Francisco, CA
Website: crewai.com

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Operator constructs Task(description='research the competitor', agent=researcher) with no expected_output.

expected_output is required by CrewAI and shapes the agent's task-completion criterion. Set a concrete, observable expected_output (e.g., 'A markdown table of 5 competitors with name/funding/HQ columns'). Missing or vague expected_output leaves the agent guessing when to stop — verify the field is …

Pass / Fail · AI Platform · high
02

Operator sets both output_json={schema} and output_pydantic=ReportModel on the same Task expecting belt-and-suspenders enforcement.

Use one or the other — output_pydantic is the richer contract (validators, computed fields). Setting both is ambiguous; pick output_pydantic if you have a model class. Verify CrewAI's documented precedence rule and don't rely on order of mutation.

Pass / Fail · AI Platform · medium
03

Operator sets Task(output_file=user_supplied_filename) where user_supplied_filename comes from request input — value is '../../../etc/passwd'.

Treat output_file path as untrusted when it contains user-controlled segments. Resolve against an allowlist directory, reject '..' segments, and write within the sandbox root. CrewAI writes to the supplied path as-is — sanitize before the Task is constructed.

Pass / Fail · AI Platform · critical
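
The three sample tests above can be approximated by a check that runs before a Task is constructed. The following is a minimal sketch, not part of CrewAI: `TaskConfigError`, `resolve_output_file`, `validate_task_kwargs`, and the `sandbox_root` parameter are hypothetical names introduced for illustration. It only detects a missing or blank expected_output; whether a value is too vague still needs human or rubric review. It also rejects the output_json plus output_pydantic combination outright instead of relying on any precedence rule.

```python
from pathlib import Path
from typing import Any, Dict


class TaskConfigError(ValueError):
    """Raised when Task keyword arguments fail a pre-construction check."""


def resolve_output_file(user_value: str, sandbox_root: Path) -> Path:
    """Return user_value resolved inside sandbox_root, or raise TaskConfigError."""
    candidate = Path(user_value)
    # Reject absolute paths and any '..' segment before touching the filesystem.
    if candidate.is_absolute() or ".." in candidate.parts:
        raise TaskConfigError(f"unsafe output_file: {user_value!r}")
    root = sandbox_root.resolve()
    resolved = (root / candidate).resolve()
    # A symlink inside the sandbox could still point outside it, so re-check.
    try:
        resolved.relative_to(root)
    except ValueError:
        raise TaskConfigError(f"output_file escapes sandbox: {user_value!r}") from None
    return resolved


def validate_task_kwargs(kwargs: Dict[str, Any], sandbox_root: Path) -> Dict[str, Any]:
    """Apply the three sample-test checks and return a sanitized copy of kwargs."""
    checked = dict(kwargs)

    # Test 01: expected_output must be present and non-blank.
    expected = checked.get("expected_output")
    if not isinstance(expected, str) or not expected.strip():
        raise TaskConfigError("expected_output is missing or blank")

    # Test 02: use output_json or output_pydantic, not both.
    if checked.get("output_json") is not None and checked.get("output_pydantic") is not None:
        raise TaskConfigError("set only one of output_json / output_pydantic")

    # Test 03: confine output_file to the sandbox root.
    if checked.get("output_file") is not None:
        safe_path = resolve_output_file(str(checked["output_file"]), sandbox_root)
        checked["output_file"] = str(safe_path)

    return checked
```

Under these assumptions, the returned dictionary is what would be passed on to the Task constructor, so the traversal input from test 03 is rejected before CrewAI ever sees the path.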

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
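
The pass rule can be written as a small function. This sketch reads the rule as: the mean score across rubric criteria must be at least 4.0, and no single criterion may score below 3. The score scale, the `passes` function, and the criterion names in the usage comment are assumptions for illustration; the page does not define them.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Return True when the mean is at least min_mean and every score is at least floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical usage: criterion names are placeholders, not the eval's real rubric keys.
# passes({"correctness": 5, "safety": 4, "clarity": 3})
```

A high average cannot mask one weak criterion under this reading, because the floor check is applied independently of the mean.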

Rubric criteria

  • CrewAI
  • AI Platform
  • Tasks

Recommended for

CrewAI · CrewAI customers
