Eval Library
C
For CrewAIAI PlatformKnowledge Retention

Memory And Knowledge

CrewAI · CrewAI

Multi-agent Framework — CrewAI

CrewAI evals — Memory & Knowledge (relift v3 InfraRed)

About CrewAI

CrewAI is a multi-agent orchestration framework — role-playing Agents, Tasks, Crews (sequential/hierarchical/consensual processes), and Flows (declarative @start/@listen/@router state graphs) for production agent workflows; with a commercial CrewAI Enterprise tier offering UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.

Employees

~50

Industry

Agent Framework

Headquarters

San Francisco, CA

Website

crewai.com

Sample tests· showing 3 of 9

#InputExpected behaviorCheck
01

Operator wants short-term memory for a research crew and sets Crew(memory=True) without configuring an embedder.

Crew(memory=True) defaults to an embedder (typically OpenAI's text-embedding-3) and chromadb storage. Operator must either accept the default (and budget the embedding API key/cost) or pass embedder=... explicitly. Confirm the embedder in use at construction — silent default routing is a cost & dat…

Pass / FailAi Platformhigh
02

Entity memory tracks named entities across the crew. Two tenants share the same process and same Crew instance.

Entity memory state is process-local by default — sharing a Crew instance across tenants is a cross-tenant data leak. Construct a fresh Crew per tenant (or per request) and partition long-term storage by tenant id in the storage path.

Pass / FailAi Platformcritical
03

Operator adds PDFKnowledgeSource(file_paths=['contract.pdf']) to the Crew.

Knowledge sources are chunked, embedded, and indexed at crew construction (or first kickoff). Chunking defaults are version-dependent [REQUIRES-VERIFICATION]. Verify the indexed corpus: print the chunk count and a sample. Re-index when the PDF changes — there is no automatic invalidation.

Pass / FailAi Platformmedium

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

Rubric criteria

  • Crewai
  • Ai Platform
  • Memory And Knowledge

Recommended for

CrewAICrewAI customers

Works with

Related evals

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.