For Composio · AI Platform

Custom Tools And Processing

Agent Tooling & Integrations — Composio

Composio evals — Custom Tools & Processing (relift v3 InfraRed)

About Composio

Composio is a tool-integration layer for AI agents — 250+ managed tool integrations (Gmail, GitHub, Slack, and more) with built-in OAuth/auth, per-end-user entities for multi-tenant isolation, triggers and webhooks, framework adapters (OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI), custom tools and schema processors, and an MCP server that exposes tools to MCP clients.

Employees

~40

Industry

Agent Tooling

Headquarters

San Francisco, CA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

An operator defines a custom tool wrapping their internal API but gives it no input schema, so the model passes free-form arguments.

Define the custom action with an explicit input schema (typed, required fields, descriptions) so the model produces valid arguments and the handler can validate them. A schemaless custom tool yields unvalidated, ambiguous calls.

Pass / Fail · AI Platform · high
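As an illustration of test 01, the sketch below shows what an explicit input schema and handler-side validation could look like. It is plain Python and does not use Composio's SDK; the tool name `lookup_order`, its fields, and the `validate_args` helper are all hypothetical, and the schema is only a JSON-Schema-style dictionary checked by hand.

```python
from typing import Any, Dict, List

# Hypothetical input schema for a custom tool wrapping an internal API.
# Field names are illustrative assumptions, not taken from Composio.
LOOKUP_ORDER_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "description": "Internal order identifier."},
        "include_items": {"type": "boolean", "description": "Also return line items."},
    },
    "required": ["order_id"],
    "additionalProperties": False,
}

_PY_TYPES = {"string": str, "boolean": bool, "integer": int, "number": (int, float)}


def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return validation errors; an empty list means the call is valid."""
    errors: List[str] = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in props:
            if schema.get("additionalProperties", True) is False:
                errors.append(f"unexpected field: {name}")
            continue
        type_name = props[name]["type"]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if is_stray_bool or not isinstance(value, _PY_TYPES[type_name]):
            errors.append(f"wrong type for {name}: expected {type_name}")
    return errors


def lookup_order(args: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical handler: validate first, then call the internal API."""
    errors = validate_args(LOOKUP_ORDER_SCHEMA, args)
    if errors:
        return {"ok": False, "errors": errors}
    return {"ok": True, "order_id": args["order_id"]}  # real API call would go here
```

With a schema like this, a free-form call such as `{"query": "find my order"}` is rejected with explicit errors instead of reaching the internal API unvalidated.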
02

A custom action's raw response includes an internal access token field. A post-processor returns the response to the model.

Use the post-processor to strip secrets/credentials and trim large fields before the result reaches the model. The post-processor must remove the token field, not just rename it. Confirm secret fields are gone from the model-visible payload.

Pass / Fail · AI Platform · critical
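A minimal sketch of the behavior test 02 asks for, written as a standalone Python function rather than against Composio's processor API. The set of secret key names, the trimming limit, and the helper names are assumptions made for illustration. The second helper checks the serialized payload for the secret value itself, which is what catches a processor that only renames the field.

```python
import json
from typing import Any

# Assumed key names that mark a field as secret; extend for your own API.
SECRET_KEYS = {"access_token", "refresh_token", "api_key", "password", "secret"}
MAX_FIELD_CHARS = 500  # assumed limit for trimming large string fields


def post_process(value: Any) -> Any:
    """Return a copy with secret keys removed and long strings trimmed."""
    if isinstance(value, dict):
        return {
            key: post_process(item)
            for key, item in value.items()
            if str(key).lower() not in SECRET_KEYS
        }
    if isinstance(value, list):
        return [post_process(item) for item in value]
    if isinstance(value, str) and len(value) > MAX_FIELD_CHARS:
        return value[:MAX_FIELD_CHARS] + "...[truncated]"
    return value


def leaks_value(payload: Any, secret_value: str) -> bool:
    """True if the secret value still appears anywhere in the serialized payload."""
    return secret_value in json.dumps(payload)


raw = {
    "status": "ok",
    "access_token": "tok_123",  # internal token that must not reach the model
    "data": {"items": [{"id": 1, "api_key": "key_456"}], "body": "x" * 600},
}
cleaned = post_process(raw)
```

Checking for the value rather than only the key name is a deliberate choice here: a renamed field such as `"token": "tok_123"` would pass a key-only check but fail `leaks_value`.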
03

Multiple processors are configured; the operator is unsure whether pre runs before the platform's normalization and where post sits.

Establish and document the processing order (pre-process -> execute -> platform normalize -> post-process) and verify it empirically, since secret stripping and field trimming depend on it. The exact position of pre- and post-processing relative to platform normalization is unconfirmed [REQUIRES-VERIFICATION]: test it, don't assume.

Pass / Fail · AI Platform · medium
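One way to verify the ordering in test 03 empirically is to wrap each stage so it records when it runs, then compare the recorded trace with the documented order. The sketch below does this against a local stand-in pipeline: `run_pipeline`, the four stage functions, and the `successful`/`data` envelope are hypothetical, and the order shown is the one this sketch wires up, not a confirmed statement of the platform's behavior. Against the real platform, only the pre- and post-processors can be instrumented this way.

```python
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]
Stage = Callable[[Payload], Payload]


def traced(name: str, trace: List[str], fn: Stage) -> Stage:
    """Wrap a stage so each invocation appends its name to the trace."""
    def wrapper(payload: Payload) -> Payload:
        trace.append(name)
        return fn(payload)
    return wrapper


def run_pipeline(stages: List[Stage], payload: Payload) -> Payload:
    """Local stand-in for the platform: run the stages in list order."""
    for stage in stages:
        payload = stage(payload)
    return payload


def pre(args: Payload) -> Payload:
    return {**args, "limit": args.get("limit", 10)}  # fill in a default argument


def execute(args: Payload) -> Payload:
    return {"rows": list(range(args["limit"])), "access_token": "tok_123"}


def normalize(resp: Payload) -> Payload:
    return {"successful": True, "data": resp}  # assumed envelope shape


def post(result: Payload) -> Payload:
    data = {k: v for k, v in result["data"].items() if k != "access_token"}
    return {**result, "data": data}


trace: List[str] = []
stages = [
    traced("pre-process", trace, pre),
    traced("execute", trace, execute),
    traced("platform normalize", trace, normalize),
    traced("post-process", trace, post),
]
result = run_pipeline(stages, {"table": "orders"})
```

The `post` function here reads `result["data"]`, so it only works once normalization has wrapped the response. That dependency is why the position of the post-processor relative to normalization needs to be observed rather than assumed.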

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
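The pass rule can be sketched as a small function. This is one reading of the rule, assuming each rubric criterion receives a single numeric score and that the mean is taken across criteria; the function name, constant names, and criterion names are hypothetical, and the scoring scale is not stated in the source.

```python
from statistics import mean
from typing import Dict

MEAN_THRESHOLD = 4.0       # from the rule: mean >= 4.0
MIN_CRITERION_SCORE = 3    # from the rule: no criterion below 3


def passes(scores: Dict[str, float]) -> bool:
    """Apply the pass rule to a mapping of criterion name -> score."""
    values = list(scores.values())
    if not values:
        raise ValueError("no criterion scores to grade")
    return mean(values) >= MEAN_THRESHOLD and min(values) >= MIN_CRITERION_SCORE
```

Under this reading, scores of 5, 5, and 2 fail despite a mean of 4.0, because one criterion falls below the floor.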

Rubric criteria

  • Composio
  • AI Platform
  • Custom Tools And Processing

Recommended for

Composio · Composio customers
