Eval Library
For Microsoft AutoGen · AI Platform

Autogen Termination Conditions

AutoGen · Microsoft AutoGen

Multi-agent Framework — Microsoft AutoGen

Microsoft AutoGen evals — Termination Conditions (relift v3 InfraRed)

About Microsoft AutoGen

Microsoft is a global technology company and a leading cloud and AI provider. Microsoft Copilot embeds AI assistance across Microsoft 365, Azure, and Teams — helping employees generate content, analyze data, and automate tasks across the Microsoft ecosystem.

Employees

~221,000

Industry

Enterprise Software & Cloud

Headquarters

Redmond, WA

Sample tests · showing 3 of 9

# · Input · Expected behavior · Check
01

Team uses TextMentionTermination('DONE'). Agent emits 'we are done with the task' (lowercase 'done'). Team does not stop.

TextMentionTermination matches the configured string exactly as written: the check is case-sensitive substring containment, so lowercase 'done' never matches 'DONE' (and, conversely, 'DONE' would match inside a word such as 'UNDONE'). Choose an unambiguous sentinel (e.g. '<<TERMINATE>>') that won't appear in normal prose, and prime the agent's system_message to emit that exact token when the task is complete.

Pass / Fail · AI Platform · medium
02

Team uses TokenUsageTermination(max_total_token=20000). One model_client doesn't report usage and the counter stays at 0.

TokenUsageTermination relies on the model_client reporting per-call usage. Verify that your model_client populates the usage field on every response (some custom clients may not). Without a usage signal the counter never advances and the condition never fires, so pair it with MaxMessageTermination as a backstop.

Pass / Fail · AI Platform · high
03

Two AssistantAgents in a RoundRobinGroupChat keep asking each other clarifying questions; no agent ever emits 'TERMINATE' or a HandoffMessage. There is no MaxMessageTermination configured.

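Every team MUST have at least one termination condition that is guaranteed to fire (MaxMessageTermination or TimeoutTermination; TokenUsageTermination qualifies only if every model_client reports usage, per test 02). Without one, two agents asking each other clarifying questions loop indefinitely and keep burning upstream model spend. Treat a team with no such condition as a CI failure.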

Pass / Fail · AI Platform · critical
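The matching rule in test 01 is easy to check in isolation. The sketch below is a hypothetical stand-in (`mentions_sentinel` is an illustrative name, not AutoGen's code) that applies the same case-sensitive substring containment the expected behavior describes, showing both the missed lowercase match and an early match inside a longer word.

```python
# Hypothetical helper modeling the check described in test 01:
# plain, case-sensitive substring containment (no lowercasing, no word boundaries).
def mentions_sentinel(message: str, sentinel: str) -> bool:
    """Return True if `sentinel` appears verbatim anywhere in `message`."""
    return sentinel in message


if __name__ == "__main__":
    # Lowercase 'done' does not match the uppercase sentinel, so the team keeps running.
    print(mentions_sentinel("we are done with the task", "DONE"))  # False
    # Substring matching can also fire too early: 'DONE' is inside 'UNDONE'.
    print(mentions_sentinel("Step 3 is still UNDONE", "DONE"))  # True
    # A distinctive sentinel avoids both problems.
    print(mentions_sentinel("All steps complete. <<TERMINATE>>", "<<TERMINATE>>"))  # True
```

Both failure modes disappear once the sentinel is a token that never occurs in ordinary prose.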
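Test 02 can be reproduced without a live model. The sketch below is a hypothetical, dependency-free model of the counting behavior, not AutoGen's TokenUsageTermination: `Usage`, `Reply`, `TokenBudget`, `MessageCap`, and `run` are illustrative names, and a reply whose `usage` is `None` stands in for a model_client that does not report usage. In AutoGen itself, the backstop would be attached by combining the two conditions with the `|` operator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int


@dataclass
class Reply:
    content: str
    usage: Optional[Usage]  # None models a client that does not report usage


class TokenBudget:
    """Hypothetical model of a total-token termination check."""

    def __init__(self, max_total_tokens: int) -> None:
        self.max_total_tokens = max_total_tokens
        self.total = 0

    def observe(self, reply: Reply) -> bool:
        # Replies without usage add nothing, so the counter can stall at 0 forever.
        if reply.usage is not None:
            self.total += reply.usage.prompt_tokens + reply.usage.completion_tokens
        return self.total >= self.max_total_tokens


class MessageCap:
    """Hypothetical model of a max-message backstop; it fires regardless of usage."""

    def __init__(self, max_messages: int) -> None:
        self.max_messages = max_messages
        self.count = 0

    def observe(self, reply: Reply) -> bool:
        self.count += 1
        return self.count >= self.max_messages


def run(replies: List[Reply], conditions: list) -> int:
    """Stop at the first reply where any condition fires (OR semantics).

    Returns the number of replies consumed."""
    for i, reply in enumerate(replies, start=1):
        # Evaluate every condition (no short-circuit) so each keeps its state current.
        fired = [c.observe(reply) for c in conditions]
        if any(fired):
            return i
    return len(replies)
```

With 100 silent replies, a 20,000-token budget alone consumes all 100 and its total stays at 0; adding a 10-message cap stops the run after 10.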
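The CI rule in test 03 can be enforced with a static check over team definitions. This is a minimal sketch, assuming teams are described by a hypothetical config dict that lists termination-condition class names; `BACKSTOPS`, `has_backstop`, and `lint_teams` are illustrative and not part of AutoGen. Following test 02, it accepts TokenUsageTermination as a backstop only when usage reporting has been confirmed.

```python
from typing import Dict, List

# Conditions assumed to fire eventually no matter what the agents say.
BACKSTOPS = {"MaxMessageTermination", "TimeoutTermination"}


def has_backstop(conditions: List[str], usage_reported: bool = False) -> bool:
    """Return True if at least one condition is guaranteed to fire eventually.

    TokenUsageTermination counts only when the caller confirms that every
    model_client reports usage (see test 02)."""
    names = set(conditions)
    if names & BACKSTOPS:
        return True
    return usage_reported and "TokenUsageTermination" in names


def lint_teams(teams: Dict[str, dict]) -> List[str]:
    """Return the names of teams that should fail CI, in config order."""
    return [
        name
        for name, cfg in teams.items()
        if not has_backstop(cfg.get("termination", []), cfg.get("usage_reported", False))
    ]
```

A team that relies only on TextMentionTermination, like the clarifying pair in test 03, is flagged; adding MaxMessageTermination clears it.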

How this eval is graded

Grade against expected.ideal_behavior and expected.rubric. A test passes only if the mean score across the rubric criteria is >= 4.0 and no single criterion scores below 3.
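Read as a single pass/fail rule over the rubric criteria, the grading threshold reduces to two checks. The sketch below assumes each criterion receives a numeric score (the scale is not stated here) and that the mean is taken across criteria; `passes` is a hypothetical helper, not part of any grading tool.

```python
from statistics import mean
from typing import Dict


def passes(scores: Dict[str, float], min_mean: float = 4.0, floor: float = 3.0) -> bool:
    """Pass iff the mean criterion score is >= min_mean and no criterion is below floor."""
    if not scores:
        raise ValueError("at least one criterion score is required")
    values = list(scores.values())
    return mean(values) >= min_mean and min(values) >= floor
```

For example, scores of 5, 4, and 3 pass (mean 4.0, nothing below 3), while 5, 5, and 2 fail despite the same mean.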

Rubric criteria

  • AutoGen
  • AI Platform
  • Termination Conditions

Recommended for

AutoGen · Microsoft AutoGen customers
