Memory And Knowledge
CrewAI evals — Memory & Knowledge (relift v3 InfraRed)
About CrewAI
CrewAI is a multi-agent orchestration framework for production agent workflows. It provides role-playing Agents, Tasks, Crews (sequential, hierarchical, or consensual processes), and Flows (declarative @start/@listen/@router state graphs). A commercial CrewAI Enterprise tier adds UI Studio, deployment, secrets/RBAC, observability, and an on-prem option.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Operator wants short-term memory for a research crew and sets Crew(memory=True) without configuring an embedder. | Crew(memory=True) defaults to an embedder (typically OpenAI's text-embedding-3) and chromadb storage. Operator must either accept the default (and budget the embedding API key/cost) or pass embedder=... explicitly. Confirm the embedder in use at construction; silent default routing is a cost & dat… | Pass / Fail · AI Platform · high |
| 02 | Entity memory tracks named entities across the crew. Two tenants share the same process and same Crew instance. | Entity memory state is process-local by default, so sharing a Crew instance across tenants is a cross-tenant data leak. Construct a fresh Crew per tenant (or per request) and partition long-term storage by tenant id in the storage path. | Pass / Fail · AI Platform · critical |
| 03 | Operator adds PDFKnowledgeSource(file_paths=['contract.pdf']) to the Crew. | Knowledge sources are chunked, embedded, and indexed at crew construction (or first kickoff). Chunking defaults are version-dependent [REQUIRES-VERIFICATION]. Verify the indexed corpus: print the chunk count and a sample. Re-index when the PDF changes: there is no automatic invalidation. | Pass / Fail · AI Platform · medium |
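
Tests 02 and 03 both ask the operator to manage state that the framework does not manage for them: storage partitioned by tenant id, and re-indexing when a source file changes. The sketch below illustrates both using only the Python standard library. The helpers tenant_storage_dir, needs_reindex, and record_indexed are hypothetical names, not part of CrewAI. How the resulting directory is handed to CrewAI's storage is deliberately left out, because that wiring depends on the installed version.

```python
import hashlib
import re
from pathlib import Path


def tenant_storage_dir(base_dir: Path, tenant_id: str) -> Path:
    """Return a per-tenant storage directory under base_dir (hypothetical helper)."""
    # Accept only a conservative character set so a tenant id such as
    # "../other" cannot escape base_dir or collide with another tenant.
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,64}", tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    path = base_dir / tenant_id
    path.mkdir(parents=True, exist_ok=True)
    return path


def file_fingerprint(path: Path) -> str:
    """SHA-256 of the file contents, used to detect that a source changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_reindex(source: Path, marker: Path) -> bool:
    """True when the source differs from the version recorded in marker."""
    previous = marker.read_text().strip() if marker.exists() else None
    return file_fingerprint(source) != previous


def record_indexed(source: Path, marker: Path) -> None:
    """Call after a successful re-index to remember which version was indexed."""
    marker.write_text(file_fingerprint(source))
```

A caller would construct a fresh Crew per tenant, point its storage at tenant_storage_dir(base, tenant_id), and rebuild the knowledge index whenever needs_reindex(pdf, marker) returns True, calling record_indexed afterwards. Hashing the file contents rather than comparing modification times is a design choice here: it avoids spurious re-indexing when a file is copied or touched without changing.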
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
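
The pass rule above can be sketched as a small function. This is one reading of the rule, not the grader's actual implementation: it assumes each rubric criterion receives a single numeric score, that the mean is taken across those criterion scores, and that "below 3" is a strict floor. The function name passes and the example criterion labels are illustrative.

```python
from statistics import mean


def passes(scores: dict[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Apply the rule: mean >= 4.0 and no single criterion below 3.

    Assumption: `scores` maps each rubric criterion to one numeric score.
    """
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# A high mean does not rescue a single criterion that falls under the floor.
example = {"CrewAI": 5, "AI Platform": 5, "Memory And Knowledge": 2}
result = passes(example)
```

In this example the mean is exactly 4.0, yet result is False because one criterion scores 2, which is the case the "no criterion below 3" clause exists to catch.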
Rubric criteria
- CrewAI
- AI Platform
- Memory And Knowledge