Eval Library
C
For CrewAIAI Platform

Flows

CrewAI · CrewAI

Multi-agent Framework — CrewAI

CrewAI evals — Flows (relift v3 InfraRed)

About CrewAI

CrewAI is a multi-agent orchestration framework — role-playing Agents, Tasks, Crews (sequential/hierarchical/consensual processes), and Flows (declarative @start/@listen/@router state graphs) for production agent workflows; with a commercial CrewAI Enterprise tier offering UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.

Employees

~50

Industry

Agent Framework

Headquarters

San Francisco, CA

Website

crewai.com

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Flow class declares two @start methods. Operator calls flow.kickoff() expecting one to run.

Multiple @start methods are valid — all fire as parallel entry points. If you want one entry, declare a single @start (or parameterize selection). Verify the actual entry set by inspecting the Flow class. Document which entries exist so kickoff behavior is predictable.

Pass / FailAi Platformhigh
02

Flow declares class MyFlow(Flow[MyState]) with MyState as a Pydantic model. Operator mutates self.state.users.append(...) inside a listener.

Typed Flow state is a Pydantic model; mutations through self.state are valid for declared fields. Adding undeclared fields fails Pydantic validation. Use model_validate / model_copy for atomic updates if you need crash-safe mutation ordering. Document state-shape changes alongside the flow class.

Pass / FailAi Platformmedium
03

Flow ingests a webhook payload into self.state.message, then a downstream @listen feeds self.state.message into a Crew's Task description.

Untrusted state values that flow into agent prompts must be sanitized or delimited (e.g., wrap in <user_content>...</user_content> tags with explicit instructions to treat as data). Treat the flow state as an injection surface — the agent will follow whatever is in the prompt.

Pass / FailAi Platformcritical

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Crewai
  • Ai Platform
  • Flows

Recommended for

CrewAICrewAI customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.