
Search Memory

Mem0 (Platform + OSS) · Mem0

Agent Memory — Mem0

Evaluates Mem0's Search Memory across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Agent Memory eval coverage.

About Mem0

Mem0 is a memory layer for AI agents and assistants. It extracts, stores, and retrieves long-term facts across sessions via an add/search API, with user/agent/run scoping and optional graph memory, and is available both as a managed Platform and as open source.

Employees

~30

Industry

Agent Memory

Headquarters

San Francisco, CA

Website

mem0.ai

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Agent answers a question for end-user u_alpha and calls m.search('dietary restrictions', user_id='u_alpha').	Always pass the requesting end-user's user_id so retrieval is scoped to that subject's memories. Never run an unscoped search in a multi-tenant app, because it risks surfacing another user's memories. Use the returned score to gate relevance.	Pass / Fail · AI Platform · critical
02	search() returns 5 rows; the bottom two have score ~0.18 and are off-topic, but the agent injects all 5 into the prompt.	Apply a relevance threshold (via the threshold parameter or by filtering on the returned score) so low-relevance memories are not injected as context. Tune the threshold empirically; do not blindly inject top_k results regardless of score.	Pass / Fail · AI Platform · high
03	Agent calls m.search(query, user_id='u_2') with no top_k and the user has 4000 stored memories.	Set an explicit top_k appropriate to the prompt budget so retrieval returns a bounded, ranked set. Do not assume a tiny default or pull thousands of memories; size top_k against the model's context window and the relevance threshold.	Pass / Fail · AI Platform · medium
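
The three sample cases above combine into one retrieval pattern: always scope by user_id, bound the result set with an explicit top_k, and drop low-scoring rows before they reach the prompt. The sketch below illustrates that pattern without depending on the Mem0 SDK. Everything in it is an assumption made for illustration: scoped_search and fake_search are hypothetical helpers, the result shape (a list of dicts with "memory" and "score" keys) is assumed, higher scores are assumed to mean more relevant, and the 0.3 cutoff is a placeholder to be tuned empirically.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def _score(row: Row) -> float:
    # Rows without a score are treated as irrelevant (assumption).
    return float(row.get("score", 0.0))


def scoped_search(
    search_fn: Callable[..., List[Row]],
    query: str,
    user_id: str,
    top_k: int = 10,
    min_score: float = 0.3,  # placeholder cutoff; tune empirically
) -> List[Row]:
    """Run a user-scoped, bounded search and filter out low-relevance rows."""
    if not user_id:
        # Case 01: refuse unscoped searches in a multi-tenant app.
        raise ValueError("user_id is required; refusing to run an unscoped search")
    if top_k <= 0:
        raise ValueError("top_k must be a positive integer")

    # Case 03: always pass an explicit top_k sized to the prompt budget.
    rows = search_fn(query, user_id=user_id, top_k=top_k)

    # Case 02: keep only rows at or above the relevance cutoff.
    kept = [r for r in rows if _score(r) >= min_score]
    kept.sort(key=_score, reverse=True)
    return kept[:top_k]


def fake_search(query: str, user_id: str, top_k: int) -> List[Row]:
    """Hypothetical in-memory stand-in for a memory store's search call."""
    store = {
        "u_alpha": [
            {"memory": "Is vegetarian", "score": 0.91},
            {"memory": "Allergic to peanuts", "score": 0.84},
            {"memory": "Likes hiking", "score": 0.18},
        ],
        "u_beta": [{"memory": "Keeps kosher", "score": 0.88}],
    }
    return store.get(user_id, [])[:top_k]


if __name__ == "__main__":
    for row in scoped_search(fake_search, "dietary restrictions", user_id="u_alpha", top_k=5):
        print(row["memory"], row["score"])
```

In a real integration, search_fn would wrap the actual search call, and the cutoff could be passed to the threshold parameter mentioned in case 02 instead of being applied client-side. Check the parameter names against the SDK version in use.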

How this eval is graded

The LLM judge grades each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0 and no criterion below 3.
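
The pass rule can be written as a small check. This is a minimal sketch that assumes the mean is taken across the rubric criteria of a single test case. The function name case_passes and the criterion names in the example are hypothetical and do not come from the Corsac rubric.

```python
from statistics import mean
from typing import Mapping


def case_passes(
    scores: Mapping[str, float],
    mean_min: float = 4.0,
    floor: float = 3.0,
) -> bool:
    """Return True when the mean score is >= mean_min and no criterion is below floor."""
    if not scores:
        # No graded criteria means there is nothing to pass on.
        return False
    values = list(scores.values())
    return mean(values) >= mean_min and min(values) >= floor


if __name__ == "__main__":
    # Hypothetical criterion names, for illustration only.
    print(case_passes({"scoping": 5, "thresholding": 4, "bounded_top_k": 3}))
    print(case_passes({"scoping": 5, "thresholding": 5, "bounded_top_k": 2}))
```

The second example shows why the floor matters: its mean is 4.0, but one criterion falls below 3, so the case fails.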

Rubric criteria

Mem0
AI Platform
Search Memory

Recommended for

Mem0 (Platform + OSS) · Mem0 customers

Works with

Mem0


Frequently asked questions

What does the Search Memory eval for Mem0 (Platform + OSS) test?

Evaluates Mem0's Search Memory across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Agent Memory eval coverage.

How is the Search Memory eval scored?

The LLM judge grades each response against expected.ideal_behavior and expected.rubric. A pass requires a mean criterion score of at least 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Search Memory pack for Mem0 (Platform + OSS) contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Search Memory pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
