Crews And Process Types
CrewAI evals — Crews & Process Types (relift v3 InfraRed)
About CrewAI
CrewAI is a multi-agent orchestration framework for production agent workflows. Its building blocks are role-playing Agents, Tasks, Crews (run under a sequential or hierarchical process; a consensual process is listed as planned rather than implemented), and Flows (declarative state graphs built from @start, @listen, and @router). A commercial CrewAI Enterprise tier adds UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Crew has tasks=[t1, t2, t3] under Process.sequential. t2 declares context=[t3] (referencing a downstream task). | Under sequential process, tasks execute in tasks[] order: t1, t2, t3. A context dependency on a not-yet-run task means t2 sees None / empty for t3. Reject this misconfiguration at construction or move t3 ahead of t2. Do not auto-reorder. | Pass / Fail · AI Platform · high |
| 02 | Operator sets Crew(process=Process.hierarchical) without manager_llm or manager_agent. | Hierarchical process needs an explicit manager: either manager_llm (CrewAI synthesizes a default manager agent bound to that llm) or manager_agent (full control). Without one, kickoff should raise. Do not pick a worker agent as implicit manager. | Pass / Fail · AI Platform · critical |
| 03 | Operator enables Crew(planning=True), which adds a planner pre-pass that creates a step-by-step plan before agents execute. | Planning adds a pre-pass that injects a plan into each task's context. Verify the plan is observable in the task prompt and budget the additional tokens. Disable planning for simple sequential crews where it's pure overhead. Measure before/after token cost on a representative run. | Pass / Fail · AI Platform · medium |
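
The construction-time checks described in tests 01 and 02 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CrewAI's own validation code: `TaskSpec` and `validate_crew` are hypothetical stand-ins, and the process is passed as a plain string rather than a `Process` enum member.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskSpec:
    """Hypothetical stand-in for a task; not the CrewAI Task class."""
    name: str
    context: List["TaskSpec"] = field(default_factory=list)


def validate_crew(
    tasks: List[TaskSpec],
    process: str,
    manager_llm: Optional[str] = None,
    manager_agent: Optional[object] = None,
) -> List[str]:
    """Return configuration errors; an empty list means the config passes."""
    errors: List[str] = []

    # Test 02: a hierarchical process needs an explicit manager.
    if process == "hierarchical" and manager_llm is None and manager_agent is None:
        errors.append("hierarchical process requires manager_llm or manager_agent")

    # Test 01: under a sequential process, context may only point at earlier tasks.
    if process == "sequential":
        position = {id(task): index for index, task in enumerate(tasks)}
        for task in tasks:
            for dep in task.context:
                dep_pos = position.get(id(dep))
                if dep_pos is not None and dep_pos >= position[id(task)]:
                    errors.append(
                        f"{task.name} has context on {dep.name}, "
                        "which does not run before it"
                    )
    return errors


# Usage: the misconfiguration from test 01 is reported, never silently reordered.
t1, t3 = TaskSpec("t1"), TaskSpec("t3")
t2 = TaskSpec("t2", context=[t3])
for problem in validate_crew([t1, t2, t3], "sequential"):
    print(problem)
```

The sketch only reports problems and leaves the task list untouched, in line with the "do not auto-reorder" and "do not pick a worker agent as implicit manager" expectations.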
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
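
One way to read the pass rule is sketched below. It assumes one numeric score per rubric criterion on a 1 to 5 scale, with the mean taken across criteria; the function name `passes` and the dictionary shape are illustrative assumptions, not part of the eval's actual grader.

```python
from statistics import mean
from typing import Dict


def passes(
    scores: Dict[str, float],
    mean_threshold: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Pass when the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        # Nothing was graded, so there is nothing to pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Usage: a high mean does not compensate for a single criterion below the floor.
print(passes({"CrewAI": 5, "AI Platform": 4, "Crews And Process Types": 3}))
print(passes({"CrewAI": 5, "AI Platform": 5, "Crews And Process Types": 2}))
```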
Rubric criteria
- CrewAI
- AI Platform
- Crews And Process Types