For n8n · AI Platform

Workflows And Nodes

n8n (self-host + Cloud) · n8n

Workflow Automation — n8n

Evaluates n8n's Workflows & Nodes across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Workflow Automation eval coverage.

About n8n

n8n is an open-source workflow automation platform. Workflows are composed visually from a library of 1000+ nodes, including AI/LangChain nodes (AI Agent, vector stores, memory, tools). The platform provides triggers (webhook/schedule/poll/form/chat), credentials encrypted at rest, queue-mode execution (Redis-backed workers), self-hosted (Docker/Kubernetes) and n8n Cloud deployment options, and source control and embedding for teams.

Employees

~100

Industry

Workflow Automation

Headquarters

Berlin, Germany

Website

n8n.io

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	A Code node ends with `return {json: {foo: 'bar'}}` instead of returning an array. The downstream Set node receives only one item even when upstream sent 50.	Per the docs, every node must emit `items` as an array of {json, binary} entries; the Code node should use `return items` or `return [{json: ...}, ...]`. When a single object is returned, treat it as a one-item array, but the operator should fix the node to preserve fan-out semantics.	Pass / Fail · AI Platform · critical
02	An operator writes `={{ $json.id }}` in a node that runs after a Merge node, expecting it to reference the original HTTP node's response.	`$json` always refers to the current input item, which after a Merge node is the merged item. To reach back to a specific upstream node, use `={{ $node['HTTP Request'].json.id }}` or `={{ $('HTTP Request').item.json.id }}`. Document the scope explicitly so the workflow survives refactors.	Pass / Fail · AI Platform · high
03	An HTTP Request node may return a 5xx; the operator enables `continueOnFail` AND wires the error output to a Slack alert node.	Pick one: with `continueOnFail`, failed items flow to the main output with `.error` set; with the error output (enabled via node settings → 'On Error: Continue (using error output)'), they flow on the error branch. Mixing both causes the same failure to fan out twice or be silently swallowed.	Pass / Fail · AI Platform · high
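
Case 01 turns on the shape of the value a Code node returns. As a rough illustration only, the sketch below mimics the normalization described in the expected behavior: a lone object becomes a one-item array, and array entries are wrapped in `{json}` when needed. `normalizeItems` is a hypothetical helper written for this example, not n8n's internal implementation.

```javascript
// Hypothetical helper: coerce a Code-node-style return value into an
// array of items, each carrying a `json` key. Not n8n's engine code.
function normalizeItems(result) {
  // A single object is treated as a one-item array.
  const list = Array.isArray(result) ? result : [result];
  return list.map((entry) => {
    const isItem =
      entry !== null && typeof entry === 'object' && 'json' in entry;
    return isItem ? entry : { json: entry };
  });
}

// Pretend the upstream node sent 50 items.
const upstreamItems = Array.from({ length: 50 }, (_, i) => ({ json: { id: i } }));

// Buggy node body: returns one object, so fan-out collapses to a single item.
const buggyOutput = normalizeItems({ json: { foo: 'bar' } });

// Fixed node body: one output item per input item.
const fixedOutput = normalizeItems(
  upstreamItems.map((item) => ({ json: { ...item.json, foo: 'bar' } }))
);

console.log(buggyOutput.length, fixedOutput.length); // prints "1 50"
```

The buggy variant still runs, which is why the failure is easy to miss: the workflow succeeds, but 49 of the 50 upstream items never reach the Set node.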
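
For case 03, the difference between the two error-handling modes can be pictured with a small simulation. The function below is a hypothetical stand-in for the routing described in the expected behavior; the mode names `'continueOnFail'` and `'errorOutput'` and the `{ ok, json, error }` record shape are assumptions made for this sketch, not n8n's API.

```javascript
// Hypothetical simulation of the two routing modes from case 03.
// `results` is a list of { ok, json, error } records from a pretend HTTP call.
function routeResults(results, mode) {
  if (mode !== 'continueOnFail' && mode !== 'errorOutput') {
    throw new Error(`unknown mode: ${mode}`);
  }
  const main = [];
  const errorBranch = [];
  for (const r of results) {
    if (r.ok) {
      main.push({ json: r.json });
    } else if (mode === 'continueOnFail') {
      // Failed items stay on the main output, flagged with an error field.
      main.push({ json: { error: r.error } });
    } else {
      // Failed items leave through the dedicated error branch instead.
      errorBranch.push({ json: { error: r.error } });
    }
  }
  return { main, errorBranch };
}

const httpResults = [
  { ok: true, json: { id: 1 } },
  { ok: false, error: '503 Service Unavailable' },
];

const viaMain = routeResults(httpResults, 'continueOnFail');
const viaBranch = routeResults(httpResults, 'errorOutput');

console.log(viaMain.main.length, viaMain.errorBranch.length);     // prints "2 0"
console.log(viaBranch.main.length, viaBranch.errorBranch.length); // prints "1 1"
```

In either mode each failure shows up on exactly one path, so an alert node wired to that path fires once per failure.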

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.
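
One reading of the pass rule above can be sketched as code. It assumes that each rubric criterion receives a numeric score (a 1 to 5 scale is assumed here; the page does not state the scale) and that a case passes when the mean score is at least 4.0 and no criterion scores below 3. `casePasses` and its parameter names are hypothetical.

```javascript
// Hypothetical pass check: mean of criterion scores must reach `minMean`
// and every individual score must reach `floor`.
function casePasses(scores, minMean = 4.0, floor = 3) {
  if (scores.length === 0) return false; // nothing graded, nothing passes
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return mean >= minMean && Math.min(...scores) >= floor;
}

console.log(casePasses([5, 4, 4])); // true: mean above 4.0, lowest score is 4
console.log(casePasses([5, 5, 2])); // false: mean is 4.0 but one score is below 3
console.log(casePasses([4, 4, 3])); // false: mean is below 4.0
```

The floor matters because a high mean can hide one badly failed criterion, as the second call shows.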

Rubric criteria

n8n
AI Platform
Workflows And Nodes

Recommended for

n8n (self-host + Cloud) · n8n customers

Works with

n8n

Related evals

AI Platform

Claude API

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

AI Platform

Claude API

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

What does the Workflows And Nodes eval for n8n (self-host + Cloud) test?

Evaluates n8n's Workflows & Nodes across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Workflow Automation eval coverage.

How is the Workflows And Nodes eval scored?

The judge rubric: Grade against expected.ideal_behavior and expected.rubric. Per-criterion pass requires mean >= 4.0 and no criterion below 3.

How many test cases does this eval pack include?

The Workflows And Nodes pack for n8n (self-host + Cloud) contains 9 test cases. 3 sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Workflows And Nodes pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.