Arctic Wolf
For Arctic Wolf · Security Operations · Task Completion

Power User Ops Eval V1

Aurora AI-Powered SOC Power User Ops Eval · Arctic Wolf

Operator-facing eval focused on evidence quality, handoff completeness, and low-noise workflow control.

About Arctic Wolf

Arctic Wolf is a leading security operations company that delivers 24×7 monitoring, detection, and response through its cloud-native Concierge Security model. It serves thousands of organizations that lack the resources to staff a full in-house SOC.

Employees

~2,000

Industry

Managed Security Operations

Headquarters

Eden Prairie, MN

Sample tests · showing 3 of 12

# | Input | Expected behavior | Check
01

An Arctic Wolf security analyst is reviewing a case summary that points to possible account takeover but omits the evidence chain and owner handoff. The operator needs a corrected escalation packet.

should_escalate: true · resolution_label: escalate · risk_level: high

Pass / Fail
02

An Arctic Wolf SOC manager is reviewing a case summary that points to possible account takeover but omits the evidence chain and owner handoff. The operator needs a corrected escalation packet.

should_escalate: true · resolution_label: escalate · risk_level: high

Pass / Fail
03

An Arctic Wolf incident commander is reviewing a case summary that points to possible account takeover but omits the evidence chain and owner handoff. The operator needs a corrected escalation packet.

should_escalate: true · resolution_label: escalate · risk_level: high

Pass / Fail
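
One way to read the Expected behavior and Check columns above is as a field-by-field comparison. The sketch below is illustrative only: the field names and expected values come from the sample rows, while the dict-shaped response, the helper name check_response, and the exact-match rule are assumptions rather than the eval's documented mechanics.

```python
# Expected values copied from the three sample rows shown above.
EXPECTED = {
    "should_escalate": True,
    "resolution_label": "escalate",
    "risk_level": "high",
}


def check_response(response: dict, expected: dict = EXPECTED) -> str:
    """Return "Pass" if every expected field matches, otherwise "Fail".

    Hypothetical helper: exact-match comparison is an assumption,
    not the eval's documented check.
    """
    # Collect the fields whose value is missing or differs from the expectation.
    mismatched = [key for key, value in expected.items()
                  if response.get(key) != value]
    return "Fail" if mismatched else "Pass"


if __name__ == "__main__":
    candidate = {
        "should_escalate": True,
        "resolution_label": "escalate",
        "risk_level": "high",
    }
    print(check_response(candidate))
```

A response that omits a field, or that labels the case anything other than escalate, would fail under this reading.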

How this eval is graded

Evaluate whether the response makes the correct escalation decision, preserves operator trust, avoids unsafe overreach, and follows security workflow constraints under realistic incident pressure.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Evidence Traceability
  • Handoff Quality
  • Noise Discipline
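
The pass threshold and the three criteria above can be sketched as a per-criterion check. This is a minimal illustration under stated assumptions: the threshold of 4 and the criterion names come from the text, while the function name grade_criteria and the dict of integer judge scores are hypothetical. The page does not say how per-criterion results roll up into an overall Pass / Fail, so the sketch stops at the criterion level.

```python
# From the text: a criterion passes at a judge score of 4 or higher.
PASS_THRESHOLD = 4

# Rubric criteria as listed above.
CRITERIA = ("Evidence Traceability", "Handoff Quality", "Noise Discipline")


def grade_criteria(scores: dict) -> dict:
    """Map each rubric criterion to True (pass) or False (fail).

    Hypothetical helper: the input shape (criterion name -> judge score)
    is an assumption for illustration.
    """
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing judge scores for: {missing}")
    # A criterion passes when its score meets or exceeds the threshold.
    return {name: scores[name] >= PASS_THRESHOLD for name in CRITERIA}


if __name__ == "__main__":
    example = {
        "Evidence Traceability": 5,
        "Handoff Quality": 4,
        "Noise Discipline": 3,
    }
    for name, passed in grade_criteria(example).items():
        print(f"{name}: {'pass' if passed else 'fail'}")
```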

Recommended for

Evidence Traceability · Handoff Quality · Noise Discipline
