Add Memory
Mem0 evals — Add Memory (relift v3 InfraRed)
About Mem0
Mem0 is a memory layer for AI agents and assistants. It extracts, stores, and retrieves long-term facts across sessions through an add/search API, supports user, agent, and run scoping plus optional graph memory, and is available both as a managed Platform and as open source.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Agent calls m.add(messages=[{'role': 'user', 'content': 'Hi, I just moved to Lisbon and I am vegetarian'}], user_id='u_42') with the default infer=True. | With infer=True (the default), Mem0 sends the messages to the extraction LLM and stores distilled facts (e.g., 'Lives in Lisbon', 'Is vegetarian'), not the raw chat turn. Inspect the returned results[] and confirm each entry has event=ADD; do not assume the verbatim message text was stored. | Pass / Fail · AI Platform · high |
| 02 | Operator needs to persist exact assistant tool outputs (a JSON config blob) without LLM distillation and calls m.add(messages, user_id='u_1', infer=False). | Set infer=False so Mem0 stores the message content verbatim, with no fact extraction or deduplication pass. Use this only when raw fidelity matters; expect no ADD/UPDATE/DELETE consolidation events and no LLM cost for extraction. | Pass / Fail · AI Platform · medium |
| 03 | User previously stored 'Works at Acme'. In a new session the user says 'I just started at Globex', which is passed to add() under the same user_id. | Mem0 should detect the contradiction and emit an UPDATE (or DELETE followed by ADD) so the employer fact reflects Globex only, not both employers. The integrator must read the event and must not leave the stale 'Works at Acme' memory active, because it would corrupt future retrieval. | Pass / Fail · AI Platform · critical |
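The event handling that tests 01 and 03 ask the integrator to perform can be sketched as follows. This is a minimal illustration, not Mem0's own code: it assumes the add() response is a dict with a `results` list whose items carry `id`, `memory`, and `event` keys (only `results[]` and the event names appear in the tests above; the `id` and `memory` keys are assumptions), and `apply_events` is a hypothetical helper that keeps a local mirror of active memories.

```python
from typing import Any


def apply_events(active: dict[str, str], response: dict[str, Any]) -> dict[str, str]:
    """Return a new id -> memory-text mapping after applying add() events.

    Assumes each item in response["results"] has "id", "memory", and "event"
    keys; this shape is an assumption made for illustration.
    """
    updated = dict(active)  # copy so the caller's mapping is not mutated
    for item in response.get("results", []):
        event = item.get("event")
        memory_id = item.get("id")
        if event in ("ADD", "UPDATE"):
            # ADD creates the entry, UPDATE overwrites the stale text.
            updated[memory_id] = item.get("memory", "")
        elif event == "DELETE":
            updated.pop(memory_id, None)
        # Any other event value leaves the mirror unchanged.
    return updated


# Hypothetical response for test 03: the employer fact is updated in place.
before = {"m1": "Works at Acme"}
response = {"results": [{"id": "m1", "memory": "Works at Globex", "event": "UPDATE"}]}
after = apply_events(before, response)
```

After the call, `after` holds only the Globex fact under `m1`, which is the state test 03 expects. A DELETE followed by an ADD under a new id reaches the same end state through the other two branches.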
How this eval is graded
Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
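One way to read the pass rule is sketched below. It assumes the mean is taken across the criterion scores of a single test and that scores are numeric; the scale, the criterion names, and the `passes` helper are all hypothetical and are not part of the eval definition.

```python
from statistics import mean


def passes(scores: dict[str, float], mean_threshold: float = 4.0, floor: float = 3.0) -> bool:
    """Pass when the mean score meets the threshold and no score is below the floor."""
    if not scores:
        return False  # nothing graded, so nothing can pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names and scores, for illustration only.
ok = passes({"extraction": 5, "event_handling": 4, "scoping": 3})
blocked_by_floor = passes({"extraction": 5, "event_handling": 5, "scoping": 2})
```

Both examples have a mean of exactly 4.0; the second fails only because one criterion falls below 3, which shows why the floor is checked separately from the mean.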
Rubric criteria
- Mem0
- AI Platform
- Add Memory