For Mem0 · AI Platform · Knowledge Retention

Search Memory

Mem0 (Platform + OSS) · Mem0

Agent Memory — Mem0

Mem0 evals — Search Memory (relift v3 InfraRed)

About Mem0

Mem0 is a memory layer for AI agents and assistants. It extracts, stores, and retrieves long-term facts across sessions through an add/search API, with user/agent/run scoping and optional graph memory. It is available as a managed Platform and as open source.

Employees: ~30

Industry: Agent Memory

Headquarters: San Francisco, CA

Website: mem0.ai

Sample tests · showing 3 of 9

01

Input: Agent answers a question for end-user u_alpha and calls m.search('dietary restrictions', user_id='u_alpha').

Expected behavior: Always pass the requesting end-user's user_id so retrieval is scoped to that subject's memories. Never run an unscoped search in a multi-tenant app — that risks surfacing another user's memories. Use the returned score to gate relevance.

Check: Pass / Fail · AI Platform · critical

02

Input: search() returns 5 rows; the bottom two have score ~0.18 and are off-topic, but the agent injects all 5 into the prompt.

Expected behavior: Apply a relevance threshold (via the threshold parameter or by filtering on the returned score) so low-relevance memories are not injected as context. Tune the threshold empirically; do not blindly inject top_k results regardless of score.

Check: Pass / Fail · AI Platform · high

03

Input: Agent calls m.search(query, user_id='u_2') with no top_k and the user has 4000 stored memories.

Expected behavior: Set an explicit top_k appropriate to the prompt budget so retrieval returns a bounded, ranked set. Do not assume a tiny default or pull thousands of memories; size top_k against the model's context window and the relevance threshold.

Check: Pass / Fail · AI Platform · medium
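
The three expected behaviors above (scoped search, relevance gating, explicit top_k) can be combined in one small wrapper. The following is a minimal sketch, not the Mem0 client: `fake_search`, `retrieve_context`, `TOP_K`, `MIN_SCORE`, the row shape, and the stored memories are all hypothetical and exist only for illustration. It also assumes that a higher score means a more relevant memory. The real client's parameter names and return shape should be checked against the Mem0 documentation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical values for illustration; tune both empirically.
TOP_K = 5        # upper bound on rows requested, sized to the prompt budget
MIN_SCORE = 0.5  # relevance cutoff; assumes higher score = more relevant

Row = Dict[str, Any]


def fake_search(query: str, user_id: str, top_k: int) -> List[Row]:
    """Stand-in for a memory search call. NOT the Mem0 client."""
    store: Dict[str, List[Row]] = {
        "u_alpha": [
            {"memory": "Is vegetarian", "score": 0.91},
            {"memory": "Allergic to peanuts", "score": 0.84},
            {"memory": "Avoids dairy", "score": 0.62},
            {"memory": "Likes hiking", "score": 0.18},
            {"memory": "Owns a bike", "score": 0.17},
        ],
        "u_beta": [{"memory": "Eats everything", "score": 0.95}],
    }
    # Only the requested user's rows are considered, ranked by score.
    rows = sorted(store.get(user_id, []), key=lambda r: r["score"], reverse=True)
    return rows[:top_k]


def retrieve_context(
    search: Callable[..., List[Row]],
    query: str,
    user_id: str,
    top_k: int = TOP_K,
    min_score: float = MIN_SCORE,
) -> List[Row]:
    """Run a scoped, bounded search and drop low-relevance rows."""
    if not user_id:
        # Test 01: refuse to run an unscoped search.
        raise ValueError("user_id is required; refusing an unscoped search")
    # Test 03: always pass an explicit top_k.
    rows = search(query, user_id=user_id, top_k=top_k)
    # Test 02: gate on the returned score before building the prompt.
    kept = [r for r in rows if r["score"] >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


context = retrieve_context(fake_search, "dietary restrictions", user_id="u_alpha")
for row in context:
    print(f'{row["score"]:.2f}  {row["memory"]}')
```

Filtering client-side keeps the cutoff visible in application code. If the search call accepts a threshold parameter, as the expected behavior for test 02 suggests, the same cutoff could be passed there instead.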

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
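
One way to read the pass rule above is as a check over a set of per-criterion scores: the mean must be at least 4.0 and no single score may fall below 3. The sketch below encodes that reading. The function name `rubric_passes`, its parameters, and the sample scores are hypothetical, and the scoring scale is an assumption since the page does not state one.

```python
from statistics import mean
from typing import Mapping


def rubric_passes(
    scores: Mapping[str, float],
    min_mean: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True when the mean meets min_mean and no score is below floor."""
    if not scores:
        # Nothing was graded, so there is nothing to pass.
        return False
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Illustrative scores only, keyed by the rubric criteria listed on this page.
print(rubric_passes({"Mem0": 5, "Ai Platform": 4, "Search Memory": 3}))
print(rubric_passes({"Mem0": 5, "Ai Platform": 5, "Search Memory": 2}))
```

The second call shows why the floor matters: a high mean does not compensate for one criterion that falls below 3.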

Rubric criteria

  • Mem0
  • AI Platform
  • Search Memory

Recommended for

  • Mem0 (Platform + OSS)
  • Mem0 customers
