Eval Library
For CrewAI · AI Platform

CrewAI Enterprise and Deployment

Multi-agent Framework — CrewAI

CrewAI evals — CrewAI Enterprise & Deployment (relift v3 InfraRed)

About CrewAI

CrewAI is a multi-agent orchestration framework for production agent workflows. It provides role-playing Agents, Tasks, Crews (sequential, hierarchical, or consensual processes), and Flows (declarative @start/@listen/@router state graphs). A commercial CrewAI Enterprise tier adds UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.

Employees

~50

Industry

Agent Framework

Headquarters

San Francisco, CA

Website

crewai.com

Sample tests · showing 3 of 10

# · Input · Expected behavior · Check
01

Operator builds a Crew in code, then redeploys to CrewAI Enterprise UI Studio. The Studio reads from a connected git repo.

Deployment binds to a specific git ref (commit SHA preferred over branch). Verify the deployed crew runs the SHA you expect by inspecting the deployment metadata. Pin the ref for prod — branch tracking drifts under concurrent merges.

Pass / Fail · AI Platform · high
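A minimal sketch of the check in test 01, assuming the deployment metadata can be read into a dict. The `git_ref` key and both function names are hypothetical and stand in for whatever the deployment metadata actually exposes; nothing here is a CrewAI Enterprise API.

```python
import re

# A full Git commit SHA-1 is 40 hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_ref(ref: str) -> bool:
    """Return True only when ref looks like a full commit SHA, not a branch or tag."""
    return _FULL_SHA.fullmatch(ref.lower()) is not None


def check_deployment(metadata: dict, expected_sha: str) -> list[str]:
    """Return a list of problems; an empty list means the check passes.

    `metadata` is a hypothetical dict with a "git_ref" key (an assumption,
    not a documented schema).
    """
    problems = []
    ref = metadata.get("git_ref", "")
    if not is_pinned_ref(ref):
        problems.append(f"ref {ref!r} is not a full commit SHA")
    elif ref.lower() != expected_sha.lower():
        problems.append(f"deployed {ref} but expected {expected_sha}")
    return problems
```

A deployment that tracks `main` fails the first condition, which matches the test's guidance to pin a SHA for prod.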
02

Operator stores OPENAI_API_KEY and SERPER_API_KEY in CrewAI Enterprise's secrets manager.

Secrets are encrypted at rest and injected into the crew's environment at runtime, and they are never logged in run output. Verify by reading a run log and checking that the secret value does not appear. Rotate via UI Studio and invalidate the old key at the provider in lockstep.

Pass / Fail · AI Platform · critical
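The log check in test 02 could be sketched as below, assuming the run log is available as text and the secret values can be read from the environment of the machine running the check. The function name and the `env` parameter are illustrative assumptions.

```python
import os


def find_leaked_secrets(log_text: str, secret_names: list[str], env=None) -> list[str]:
    """Return the names of secrets whose values appear verbatim in log_text.

    Only names are returned, so the check itself never prints a secret value.
    """
    env = os.environ if env is None else env
    leaked = []
    for name in secret_names:
        value = env.get(name)
        # Skip unset or empty values: an empty string would match every log.
        if value and value in log_text:
            leaked.append(name)
    return leaked
```

This only catches verbatim matches. An encoded or truncated copy of a key would pass unnoticed, so an empty result is necessary but not sufficient evidence.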
03

Enterprise dashboards show runs, costs, and traces. Operator wants to export traces to their own OTel collector.

Span schema for downstream exporters is [REQUIRES-VERIFICATION]: confirm the current export shape before pinning consumers. Configure the exporter in the deployment config; tag spans with run_id, crew_id, agent role, and task id. Don't assume the vendor schema is stable across releases.

Pass / Fail · AI Platform · medium
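One way to keep the tagging in test 03 independent of the vendor schema is to build and validate the four tags yourself. The sketch below uses only the standard library; the `crew.` key prefix and the function names are assumptions, not an exported CrewAI schema.

```python
REQUIRED_TAGS = ("run_id", "crew_id", "agent_role", "task_id")


def span_attributes(run_id: str, crew_id: str, agent_role: str, task_id: str,
                    prefix: str = "crew") -> dict[str, str]:
    """Build a flat attribute dict with one namespaced key per required tag."""
    values = {
        "run_id": run_id,
        "crew_id": crew_id,
        "agent_role": agent_role,
        "task_id": task_id,
    }
    return {f"{prefix}.{key}": value for key, value in values.items()}


def missing_tags(attributes: dict, prefix: str = "crew") -> list[str]:
    """Return the required tags that are absent or empty in a span's attributes."""
    return [tag for tag in REQUIRED_TAGS if not attributes.get(f"{prefix}.{tag}")]
```

With the OpenTelemetry Python API, a dict like this can be passed to a span's `set_attributes` method. `missing_tags` can then run against spans received at the collector to confirm the tags survive export.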

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
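Reading the rule above as "the mean across rubric criteria is at least 4.0 and no single criterion scores below 3", the pass decision can be sketched as follows. The function name, the numeric score scale, and the treatment of an empty score set as a failure are assumptions.

```python
from statistics import mean


def passes(scores: dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Apply the threshold rule to a mapping of criterion name to score."""
    if not scores:
        # Assumption: nothing graded means nothing passed.
        return False
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

The floor matters because a high mean can hide one failing criterion: scores of 5, 5, and 2 average 4.0 but still fail.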

Rubric criteria

  • CrewAI
  • AI Platform
  • CrewAI Enterprise and Deployment

Recommended for

CrewAI · CrewAI customers
