Mem0 evals — Scoping & Identity (relift v3 InfraRed)

About Mem0

Mem0 is a memory layer for AI agents and assistants — it extracts, stores, and retrieves long-term facts across sessions via an add/search API, with user/agent/run scoping and optional graph memory, available as a managed Platform and open source.

Employees: ~30

Industry: Agent Memory

Headquarters: San Francisco, CA

Website: mem0.ai

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

A SaaS app maps each end-user to a stable user_id and uses it for all add/search calls across sessions.

Use a stable, unique user_id (e.g., your auth subject id) as the long-term memory key so memories persist across sessions and devices for that person. Do not derive user_id from a per-session token or display name that can change or collide.

Pass / Fail · AI Platform · high
02

A bug path calls search() with user_id accidentally set to None/empty in a multi-tenant deployment.

Fail closed: validate that a non-empty subject scope is present before issuing add/search, and reject or error rather than running an unscoped query that could return or write memories across tenants. Treat a missing scope as a defect, not a default-to-all.

Pass / Fail · AI Platform · critical
03

At the end of a conversation the agent decides which run_id-scoped facts deserve promotion to the user's long-term (user_id) memory.

Promote only durable, user-confirmed facts from session scope to long-term user_id scope; leave transient context in the session. Make promotion explicit (re-add under user_id) rather than assuming run_id memories silently persist forever.

Pass / Fail · AI Platform · medium
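The three sample tests can be illustrated with a minimal sketch. It uses a hypothetical in-memory stand-in, not the Mem0 SDK: the names InMemoryStore, require_scope, MissingScopeError, and promote are assumptions made for illustration, and only the user_id and run_id keywords come from the tests themselves. In this stand-in, a search with run_id omitted looks at long-term rows only, which is a simplification chosen so that promotion is visible.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


class MissingScopeError(ValueError):
    """Raised when add/search is attempted without a subject scope."""


def require_scope(user_id: Optional[str]) -> str:
    """Fail closed: return a cleaned user_id or raise. Never default to all users."""
    if user_id is None or not str(user_id).strip():
        raise MissingScopeError("user_id is required for add/search")
    return str(user_id).strip()


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a memory client (assumption, not the Mem0 SDK)."""
    rows: List[dict] = field(default_factory=list)

    def add(self, text: str, *, user_id: Optional[str],
            run_id: Optional[str] = None) -> None:
        uid = require_scope(user_id)  # test 02: reject unscoped writes
        self.rows.append({"text": text, "user_id": uid, "run_id": run_id})

    def search(self, query: str, *, user_id: Optional[str],
               run_id: Optional[str] = None) -> List[str]:
        uid = require_scope(user_id)  # test 02: reject unscoped reads
        # Simplification: run_id=None matches long-term rows only.
        return [
            r["text"] for r in self.rows
            if r["user_id"] == uid
            and r["run_id"] == run_id
            and query.lower() in r["text"].lower()
        ]

    def promote(self, *, user_id: Optional[str], run_id: str,
                confirmed: Set[str]) -> int:
        """Test 03: explicitly re-add confirmed session facts under user_id."""
        uid = require_scope(user_id)
        session = [r for r in self.rows
                   if r["user_id"] == uid and r["run_id"] == run_id]
        promoted = 0
        for r in session:
            if r["text"] in confirmed:
                self.add(r["text"], user_id=uid)  # long-term copy, no run_id
                promoted += 1
        return promoted


# Test 01: a stable id (here a made-up auth subject id) keys every call.
store = InMemoryStore()
store.add("prefers dark mode", user_id="auth|123", run_id="run-1")
store.add("is asking about invoice 42", user_id="auth|123", run_id="run-1")
store.promote(user_id="auth|123", run_id="run-1", confirmed={"prefers dark mode"})
print(store.search("dark", user_id="auth|123"))
```

Validating the scope inside add and search, rather than at each call site, means a bug path that passes None or an empty string raises instead of silently widening the query.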

How this eval is graded

Grade each response against expected.ideal_behavior and expected.rubric. A response passes only if its mean score across the rubric criteria is at least 4.0 and no individual criterion scores below 3.
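As a rough sketch of the pass rule, assuming it means "mean across criteria of at least 4.0, with every criterion at 3 or above": the function name passes and the criterion keys below are hypothetical, and the scoring scale is not stated in the source.

```python
from statistics import mean
from typing import Mapping


def passes(scores: Mapping[str, float],
           mean_threshold: float = 4.0,
           floor: float = 3.0) -> bool:
    """Return True when the mean meets the threshold and no score is under the floor."""
    if not scores:
        return False  # nothing graded: treat as a fail rather than a pass
    values = list(scores.values())
    return mean(values) >= mean_threshold and min(values) >= floor


# Hypothetical criterion names, for illustration only.
print(passes({"scoping": 5, "identity": 4, "isolation": 3}))
print(passes({"scoping": 5, "identity": 5, "isolation": 2}))
```

The second call has the same mean as the first but fails on the floor, which is the case the "no criterion below 3" clause exists to catch.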

Rubric criteria

  • Mem0
  • AI Platform
  • Scoping and Identity

Recommended for

Mem0 (Platform + OSS) · Mem0 customers
