For Microsoft AutoGen · AI Platform

Autogen Termination Conditions

Multi-agent Framework — Microsoft AutoGen

Evaluates how Microsoft AutoGen's termination conditions behave across 9 scenario-based test cases from Corsac's Multi-agent Framework eval coverage. An LLM judge grades each case against an expected-behavior rubric.

About Microsoft

Microsoft is a global technology company and a leading cloud and AI provider. Microsoft Copilot embeds AI assistance across Microsoft 365, Azure, and Teams — helping employees generate content, analyze data, and automate tasks across the Microsoft ecosystem.

Employees: ~221,000
Industry: Enterprise Software & Cloud
Headquarters: Redmond, WA
Website: microsoft.com

Sample tests · showing 3 of 9

#	Input	Expected behavior	Check
01	Team uses TextMentionTermination('DONE'). Agent emits 'we are done with the task' (lowercase 'done'). Team does not stop.	TextMentionTermination matches the configured substring exactly as written: the check is case-sensitive substring containment, so lowercase 'done' never matches 'DONE'. Choose an unambiguous sentinel (e.g. '<<TERMINATE>>') that won't appear in normal prose, and prime the agent's system_message to emit that exact token when the task is complete.	Pass / Fail · AI Platform · medium
02	Team uses TokenUsageTermination(max_total_token=20000). One model_client doesn't report usage, so the counter stays at 0.	TokenUsageTermination relies on the model_client surfacing per-call usage. Verify that your model_client populates the usage field on every response (some custom clients may not). Without a usage signal the counter never advances and the condition never fires, so pair it with MaxMessageTermination as a backstop.	Pass / Fail · AI Platform · high
03	Team has MaxMessageTermination(max_messages=10). Operator expects 10 assistant messages, but the team stops after ~7.	MaxMessageTermination counts every message in the team's transcript: tool messages, user proxy turns, and system messages count too. Set the cap based on total transcript length, not on the number of assistant turns, and inspect the transcript shape (text + tool call + tool result) when calibrating it.	Pass / Fail · AI Platform · medium
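
Sample case 01 turns on a single string comparison. The sketch below models that rule in plain Python so the failure is easy to reproduce. `mentions_sentinel` is a hypothetical helper, not part of AutoGen, and it assumes the rule is exactly Python's case-sensitive `in` operator, as the expected behavior describes. In AutoGen itself the recommended configuration would be `TextMentionTermination("<<TERMINATE>>")`.

```python
# Hypothetical helper that mirrors the matching rule described in case 01
# (case-sensitive substring containment). It is an illustration, not
# AutoGen's own implementation.


def mentions_sentinel(message: str, sentinel: str) -> bool:
    """Return True if `sentinel` appears verbatim, with matching case, in `message`."""
    return sentinel in message


# Lowercase prose does not trip an uppercase sentinel, so the team keeps going.
print(mentions_sentinel("we are done with the task", "DONE"))  # False

# A bare-word sentinel can also fire by accident inside ordinary prose.
print(mentions_sentinel("I'm not DONE yet, still checking", "DONE"))  # True

# A bracketed token is unlikely to appear unless the agent emits it on purpose.
print(mentions_sentinel("All steps complete. <<TERMINATE>>", "<<TERMINATE>>"))  # True
print(mentions_sentinel("Should I terminate the job?", "<<TERMINATE>>"))  # False
```

The last two calls show why a bracketed token is the safer choice: it only matches when the agent was instructed to emit it.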
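
Sample case 02 is about a counter that never receives input. The sketch below is a plain-Python model, not AutoGen code: `Usage`, `Message`, `TokenBudget`, `MessageCap`, and `run` are all hypothetical names, and the "stop when any condition fires" loop is an assumed stand-in for the backstop pairing the expected behavior recommends. In AutoGen's AgentChat API that pairing is written by combining conditions with the `|` operator, e.g. `TokenUsageTermination(max_total_token=20000) | MaxMessageTermination(max_messages=40)`.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int


@dataclass
class Message:
    source: str
    content: str
    # Hypothetical field: None models a client that reports no usage at all.
    usage: Optional[Usage] = None


class StopCondition(Protocol):
    def should_stop(self, msg: Message) -> bool: ...


class TokenBudget:
    """Hypothetical stand-in for a token-usage stop condition."""

    def __init__(self, max_total_tokens: int) -> None:
        self.max_total_tokens = max_total_tokens
        self.total = 0

    def should_stop(self, msg: Message) -> bool:
        # Messages without usage add nothing, so the budget can never be reached.
        if msg.usage is not None:
            self.total += msg.usage.prompt_tokens + msg.usage.completion_tokens
        return self.total >= self.max_total_tokens


class MessageCap:
    """Hypothetical stand-in for a message-count stop condition."""

    def __init__(self, max_messages: int) -> None:
        self.max_messages = max_messages
        self.count = 0

    def should_stop(self, msg: Message) -> bool:
        self.count += 1
        return self.count >= self.max_messages


def run(messages: List[Message], conditions: List[StopCondition]) -> int:
    """Stop as soon as ANY condition fires (OR semantics); return messages consumed."""
    for i, msg in enumerate(messages, start=1):
        # Evaluate every condition so each counter includes every consumed message.
        fired = [c.should_stop(msg) for c in conditions]
        if any(fired):
            return i
    return len(messages)


# 50 replies from a client that never fills in usage.
silent = [Message("assistant", f"step {n}") for n in range(50)]

budget = TokenBudget(max_total_tokens=20_000)
print(run(silent, [budget]), budget.total)  # 50 0  (budget never fires)

print(run(silent, [TokenBudget(20_000), MessageCap(10)]))  # 10  (backstop fires)
```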
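
The arithmetic behind sample case 03 is easy to check against a sample transcript. The snippet below uses a hypothetical message sequence (the labels and the mix of tool traffic are assumptions, not benchmark data) and two hypothetical helpers: one counts how many assistant replies fit under a cap on total messages, the other finds the cap needed for a target number of replies.

```python
from typing import List, Optional

# Illustrative transcript shape (hypothetical, not taken from the benchmark):
# the user's task, assistant text replies, and tool call / tool result pairs,
# each recorded as its own message.
TRANSCRIPT: List[str] = [
    "user_task",
    "assistant_text", "assistant_text",
    "tool_call", "tool_result",
    "assistant_text", "assistant_text", "assistant_text",
    "assistant_text", "assistant_text",
    "tool_call", "tool_result",
    "assistant_text", "assistant_text", "assistant_text",
]


def assistant_texts_within(transcript: List[str], max_messages: int) -> int:
    """How many assistant text replies fit under a cap on TOTAL messages."""
    return transcript[:max_messages].count("assistant_text")


def cap_for(transcript: List[str], wanted_texts: int) -> Optional[int]:
    """Smallest total-message cap that still lets `wanted_texts` assistant replies through."""
    seen = 0
    for position, kind in enumerate(transcript, start=1):
        if kind == "assistant_text":
            seen += 1
            if seen == wanted_texts:
                return position
    return None  # the sample transcript is too short to answer


print(assistant_texts_within(TRANSCRIPT, 10))  # 7
print(cap_for(TRANSCRIPT, 10))  # 15
```

With this shape, a cap of 10 admits only 7 assistant replies, and getting 10 replies needs a cap of 15, which is the calibration the expected behavior asks for.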

How this eval is graded

The LLM judge grades each response against expected.ideal_behavior and expected.rubric. A case passes when the mean of its criterion scores is at least 4.0 and no single criterion scores below 3.
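
One reading of the pass rule, sketched in Python: the judge's criterion scores for a case are averaged, and the case passes only when that mean reaches 4.0 and every individual score is at least 3. The function name, the criterion names, and the assumption that scores are plain numbers are all hypothetical; the page does not state the scoring scale.

```python
from statistics import mean
from typing import Dict


def case_passes(scores: Dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass only if the average criterion score meets `min_mean` and none is below `floor`."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor


# Hypothetical criterion names and scores.
print(case_passes({"sentinel_choice": 5, "backstop": 4, "diagnosis": 3}))  # True  (mean 4.0, min 3)
print(case_passes({"sentinel_choice": 5, "backstop": 5, "diagnosis": 2}))  # False (one score below 3)
print(case_passes({"sentinel_choice": 5, "backstop": 3, "diagnosis": 3}))  # False (mean about 3.67)
```

The floor matters: the second case averages exactly 4.0 but still fails because one criterion drops below 3.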

Rubric criteria

Autogen
AI Platform
Termination Conditions

Recommended for

Microsoft AutoGen customers

Works with

Microsoft AutoGen

Related evals

Claude API · AI Platform

Evaluates Anthropic's Batch API across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Claude API · AI Platform

Evaluates Anthropic's Extended Thinking across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Claude API · AI Platform

Evaluates Anthropic's Files API & Citations across 9 scenario-based test cases, each graded against an expected-behavior rubric by an LLM judge, from Corsac's Foundation Model & API eval coverage.

Frequently asked questions

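What does the Autogen Termination Conditions eval for Microsoft AutoGen test?

It tests how Microsoft AutoGen's termination conditions behave in 9 scenario-based cases, for example case-sensitive matching in TextMentionTermination, TokenUsageTermination's dependence on usage reported by the model_client, and MaxMessageTermination counting every message in the transcript.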

How is the Autogen Termination Conditions eval scored?

An LLM judge grades each case against expected.ideal_behavior and expected.rubric, applying the pass rule described under "How this eval is graded".

How many test cases does this eval pack include?

The Autogen Termination Conditions pack for Microsoft AutoGen contains 9 test cases. Three sample cases are shown free on this page; the full set runs in a Corsac workspace.

How do I run this eval?

Sign up for Corsac, connect your model or agent endpoint, and run the Autogen Termination Conditions pack as-is or after customizing thresholds. Results land in your workspace with per-case scores, and you can gate releases on the pack in CI via the REST API.
