Memory Extraction And Consolidation
Mem0 evals — Memory Extraction & Consolidation (relift v3 InfraRed)
About Mem0
Mem0 is a memory layer for AI agents and assistants — it extracts, stores, and retrieves long-term facts across sessions via an add/search API, with user/agent/run scoping and optional graph memory, available as a managed Platform and open source.
Sample tests · showing 3 of 9
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | A turn contains 'Haha thanks! Anyway, I'm gluten-free and I have two kids.' with infer=True. | Extraction should persist the durable facts ('gluten-free', 'has two children') and drop conversational filler ('Haha thanks'). Verify the stored memories are the salient facts; do not expect pleasantries to be remembered or treat their absence as data loss. | Pass / Fail · AI Platform · high |
| 02 | Stored: 'Budget is $50/month'. New turn: 'Actually my budget is $100/month'. | Consolidation should UPDATE the budget memory to $100/month (or DELETE+ADD) so a single current value remains. Read the event to confirm the old value did not linger as a competing memory that could be retrieved later. | Pass / Fail · AI Platform · critical |
| 03 | A travel app sets custom_categories (e.g., 'destinations', 'dietary', 'loyalty programs') so extracted memories are tagged into the operator's taxonomy. | Configure custom categories per the documented Platform/project setting so extracted memories are classified into the operator's taxonomy and become filterable by category. Verify categories appear on stored memories; do not rely on default categories if a custom taxonomy is required. | Pass / Fail · AI Platform · medium |
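The checks in tests 01 and 02 can be sketched as plain assertions over the memory texts left in the store after a turn. This is a minimal sketch, not Mem0's API: the helper names (`check_extraction`, `check_consolidation`) are hypothetical, and it assumes you have already fetched the stored memories as a list of strings by whatever means your setup provides.

```python
from typing import Iterable, List


def _contains(memories: Iterable[str], needle: str) -> bool:
    """Case-insensitive substring match over stored memory texts."""
    needle = needle.lower()
    return any(needle in m.lower() for m in memories)


def check_extraction(memories: List[str], required: List[str], filler: List[str]) -> bool:
    """Test 01: every durable fact is stored and no filler phrase was persisted."""
    has_facts = all(_contains(memories, fact) for fact in required)
    has_filler = any(_contains(memories, phrase) for phrase in filler)
    return has_facts and not has_filler


def check_consolidation(memories: List[str], topic: str, current: str, stale: str) -> bool:
    """Test 02: exactly one memory on the topic remains, holding the current value."""
    on_topic = [m for m in memories if topic.lower() in m.lower()]
    return (
        len(on_topic) == 1
        and current in on_topic[0]
        and stale not in on_topic[0]
    )


# Hypothetical store contents after each test turn.
print(check_extraction(["Is gluten-free", "Has two children"],
                       required=["gluten-free", "two children"],
                       filler=["haha", "thanks"]))
print(check_consolidation(["Budget is $100/month"],
                          topic="budget", current="$100", stale="$50"))
```

Substring matching is brittle because an extractor may rephrase facts (for example "two kids" versus "two children"), so a real harness would likely need a semantic comparison or a rubric-based judge in place of `_contains`.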
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0, with no single criterion scoring below 3.
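One reading of the pass rule above, written out as code. This is a sketch under assumptions: the function name `passes` and the criterion names in the example are hypothetical, and the score scale is not stated in the source, so the function only applies the two thresholds.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True if the mean score meets the threshold and no criterion is below the floor."""
    if not scores:
        # Nothing was graded, so there is nothing to pass.
        return False
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(passes({"salience": 5, "consolidation": 4, "taxonomy": 3}))  # mean 4.0, min 3
print(passes({"salience": 5, "consolidation": 5, "taxonomy": 2}))  # mean 4.0, min 2
```

The floor check matters because a high mean can hide one failing criterion: the second example has the same mean as the first but fails on its lowest score.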
Rubric criteria
- Mem0
- AI Platform
- Memory Extraction And Consolidation