Entities and Multi-Tenancy
Agent Tooling & Integrations — Composio
Composio evals — Entities & Multi-tenancy (relift v3 InfraRed)
About Composio
Composio is a tool-integration layer for AI agents — 250+ managed tool integrations (Gmail, GitHub, Slack, and more) with built-in OAuth/auth, per-end-user entities for multi-tenant isolation, triggers and webhooks, framework adapters (OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI), custom tools and schema processors, and an MCP server that exposes tools to MCP clients.
Sample tests (showing 3 of 9)
| # | Input | Expected behavior | Check | Category | Severity |
|---|---|---|---|---|---|
| 01 | A SaaS app has thousands of end users. The team wires the agent so every user's actions run under one shared entity. | Map each end user to a distinct, stable entity_id (e.g., the app's user id) so connections and executions are isolated per user. A single shared entity collapses every user's connections into one identity and breaks isolation. | Pass / Fail | AI Platform | critical |
| 02 | An execute call is issued without specifying entity_id. Composio applies a default entity. | Treat omission of entity_id as a bug in any multi-tenant flow: the call falls back to a default/shared entity. Require entity_id explicitly on every per-user execute, and reserve the default entity for single-tenant/admin tooling only. | Pass / Fail | AI Platform | high |
| 03 | A single user connects Gmail, GitHub, and Slack. The agent must pick the right connection per action. | Within one entity, resolve the connection matching the action's app/toolkit (GITHUB_* -> the GitHub connection). Do not assume one connection per entity; select by app and confirm it is ACTIVE before executing. | Pass / Fail | AI Platform | medium |
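The three expected behaviors in the table can be illustrated as a thin guard layer placed in front of whatever actually executes tools. This is a minimal sketch under stated assumptions, not Composio's SDK: `Connection`, `entity_id_for_user`, `select_connection`, and `execute_for_user` are hypothetical names, action names are assumed to follow an `<APP>_<VERB>` pattern, and the real execution call is injected as a plain callable rather than tied to any specific SDK method.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Connection:
    """Hypothetical record for one connected account owned by a single entity."""
    connection_id: str
    app: str     # toolkit prefix used in action names, e.g. "GITHUB"
    status: str  # connection state, e.g. "ACTIVE"


def entity_id_for_user(app_user_id: str) -> str:
    """Test 01: one distinct, stable entity_id per end user (here, the app's own user id)."""
    if not app_user_id:
        # Never fall back to a shared entity when the user id is missing.
        raise ValueError("app_user_id is required")
    return app_user_id


def select_connection(connections: Sequence[Connection], action: str) -> Connection:
    """Test 03: pick the entity's connection whose app prefixes the action name."""
    # Assumes action names look like "<APP>_<VERB>", e.g. GITHUB_CREATE_ISSUE.
    matches = [c for c in connections if action.startswith(c.app + "_")]
    if not matches:
        raise LookupError(f"no connection for action {action!r}")
    active = [c for c in matches if c.status == "ACTIVE"]
    if not active:
        raise LookupError(f"connection for {action!r} exists but is not ACTIVE")
    return active[0]


def execute_for_user(
    executor: Callable[..., Any],
    action: str,
    params: Mapping[str, Any],
    entity_id: Optional[str],
    connections: Sequence[Connection],
) -> Any:
    """Test 02: refuse any per-user execute that omits entity_id.

    `connections` are the connected accounts already loaded for this entity_id;
    `executor` stands in for the real tool-execution call (hypothetical here).
    """
    if not entity_id:
        raise ValueError("entity_id is required for per-user execution")
    conn = select_connection(connections, action)
    return executor(
        action=action,
        params=dict(params),
        entity_id=entity_id,
        connection_id=conn.connection_id,
    )
```

Raising instead of defaulting is the deliberate choice here: a missing entity_id or an inactive connection surfaces as an error during development rather than as an action silently run under the wrong identity.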
How this eval is graded
Grade each response against expected.ideal_behavior and expected.rubric. A test passes only when the mean score is at least 4.0 and no individual criterion scores below 3.
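The pass rule combines an average threshold with a per-criterion floor, so one weak criterion cannot be hidden by strong ones. The sketch below assumes each rubric criterion receives a single numeric score and that the mean is taken across criteria for one test; the score scale and the name `grade_passes` are assumptions, not part of the published grader.

```python
from statistics import mean
from typing import Mapping

PASS_MEAN = 4.0      # grading rule: mean score must be at least 4.0
MIN_CRITERION = 3.0  # grading rule: no criterion may score below 3


def grade_passes(scores: Mapping[str, float]) -> bool:
    """Return True when the mean criterion score is >= 4.0 and no criterion is below 3."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= PASS_MEAN and min(values) >= MIN_CRITERION
```

For example, criterion scores of 5, 5, and 2 average exactly 4.0 but still fail, because one criterion is below 3; scores of 5, 4, and 3 pass.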
Rubric criteria
- Composio
- AI Platform
- Entities and Multi-Tenancy