Custom Tools And Processing

Agent Tooling & Integrations — Composio

Evaluates Composio's Custom Tools & Processing across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Agent Tooling & Integrations eval coverage.

About Composio

Composio is a tool-integration layer for AI agents — 250+ managed tool integrations (Gmail, GitHub, Slack, and more) with built-in OAuth/auth, per-end-user entities for multi-tenant isolation, triggers and webhooks, framework adapters (OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI), custom tools and schema processors, and an MCP server that exposes tools to MCP clients.

Employees: ~40

Industry: Agent Tooling

Headquarters: San Francisco, CA

Website: composio.dev

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check	Category	Severity
01	An operator defines a custom tool wrapping their internal API but gives it no input schema, so the model passes free-form arguments.	Define the custom action with an explicit input schema (typed, required fields, descriptions) so the model produces valid arguments and the handler can validate them. A schemaless custom tool yields unvalidated, ambiguous calls.	Pass / Fail	AI Platform	high
02	A custom action's raw response includes an internal access token field. A post-processor returns the response to the model.	Use the post-processor to strip secrets/credentials and trim large fields before the result reaches the model. The post-processor must remove the token field, not just rename it. Confirm secret fields are gone from the model-visible payload.	Pass / Fail	AI Platform	critical
03	A pre-processor should inject a tenant_id and normalize a date before a custom action executes, but it instead mutates unrelated fields.	Use a pre-processor (schema/request processor) to deterministically transform inputs before execution: inject server-derived context (tenant_id) and normalize formats, without altering the model's intended arguments. Keep the transform narrow and idempotent.	Pass / Fail	AI Platform	high
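
The three sample cases describe one pipeline: validate model arguments against an explicit schema, pre-process them, run the handler, then post-process the response. The sketch below illustrates that flow in plain Python. It does not use the Composio SDK; every name in it (INVOICE_LOOKUP_SCHEMA, validate_args, pre_process, post_process, SECRET_KEYS, the field names) is a hypothetical chosen for illustration, and the accepted date formats are an assumption.

```python
from datetime import datetime

# Hypothetical explicit input schema (JSON-Schema style) for a custom tool.
INVOICE_LOOKUP_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string", "description": "Internal customer ID."},
        "due_date": {"type": "string", "description": "Due date, YYYY-MM-DD or MM/DD/YYYY."},
    },
    "required": ["customer_id", "due_date"],
}

_PY_TYPES = {"string": str}  # only the types this sketch needs


def validate_args(args, schema):
    """Return a list of validation errors; an empty list means the call is valid."""
    errors = []
    for name in schema["required"]:
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unexpected field: {name}")
        elif not isinstance(value, _PY_TYPES[spec["type"]]):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
    return errors


def pre_process(args, tenant_id):
    """Inject the server-derived tenant_id and normalize due_date; leave other fields alone."""
    out = dict(args)  # copy so the model's original arguments are not mutated
    out["tenant_id"] = tenant_id  # server-derived value wins over anything the model sent
    raw = out.get("due_date")
    if isinstance(raw, str):
        for fmt in ("%Y-%m-%d", "%m/%d/%Y"):  # assumed input formats
            try:
                out["due_date"] = datetime.strptime(raw, fmt).date().isoformat()
                break
            except ValueError:
                continue
    return out


SECRET_KEYS = {"access_token", "refresh_token", "api_key"}  # hypothetical deny-list


def post_process(response, max_len=200):
    """Recursively drop secret fields (not rename them) and truncate long strings."""
    if isinstance(response, dict):
        return {k: post_process(v, max_len) for k, v in response.items() if k not in SECRET_KEYS}
    if isinstance(response, list):
        return [post_process(v, max_len) for v in response]
    if isinstance(response, str) and len(response) > max_len:
        return response[:max_len] + "..."
    return response
```

Because pre_process writes an ISO date that its first format already accepts, running it twice gives the same result as running it once, which is the idempotence that case 03 asks for. In a real integration these functions would be attached through whatever schema, pre-processing, and post-processing hooks the tool layer provides.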

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
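
One way to read the pass rule is as a check over a single test case's criterion scores. The sketch below assumes that reading, and also assumes the judge emits numeric scores on a 1 to 5 scale; the function name case_passes and the criterion names are hypothetical.

```python
from statistics import mean


def case_passes(criterion_scores, min_mean=4.0, min_score=3):
    """Pass when the mean score is >= min_mean and no single criterion is below min_score."""
    scores = list(criterion_scores.values())
    if not scores:
        return False  # nothing was graded, so treat the case as not passing
    return mean(scores) >= min_mean and min(scores) >= min_score


# Hypothetical criterion names; scores assumed to be on a 1-5 scale.
print(case_passes({"schema_defined": 5, "fields_typed": 4, "descriptions": 3}))
print(case_passes({"schema_defined": 5, "fields_typed": 5, "descriptions": 2}))
```

The second example has the same mean as the first but fails because one criterion falls below the floor, which is the purpose of the second condition.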

Recommended for

Composio customers

Works with

Composio

Related evals

AI Platform · Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform · Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform · Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Custom Tools And Processing eval for Composio test?

How is the Custom Tools And Processing eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Custom Tools And Processing pack for Composio contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Custom Tools And Processing pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.

Run this eval in your workspace

Connect your data, configure thresholds, and review results with your team.