Abnormal AI
For Abnormal AI · Security Operations

Power User Ops Eval V1

Abnormal AI Email Security Power User Ops Eval · Abnormal AI

An operational workflow eval for SOC analysts and admins, covering account takeover investigation, email posture management, threat dashboard analytics, and integration operations.

Sample tests · showing 3 of 25

# · Input · Expected behavior · Check
01 · ato detection login anomaly

An Abnormal AI SOC analyst receives an account takeover alert for an employee whose account shows a login from an unfamiliar location, followed by unusual email rule creation and the setup of bulk mail forwarding.

should_escalate: true · resolution_label: security · risk_level: critical

Pass / Fail
02 · ato slow burn compromise

An Abnormal AI SOC analyst investigates a suspected slow-burn account compromise in which an attacker has gradually shifted the user's behavior patterns over several weeks to stay under anomaly detection thresholds.

should_escalate: true · resolution_label: escalate · risk_level: high

Pass / Fail
03 · ato remediation completeness

An Abnormal AI SOC analyst is remediating a confirmed account takeover. After revoking sessions and resetting passwords, the analyst must verify that all persistence mechanisms, including mail rules and delegated access, have been removed.

should_escalate: false · resolution_label: remediate · risk_level: high

Pass / Fail
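Each sample test pairs its input with three labelled expected-behavior fields. The page does not say how a response's answer is compared against these fields, so the following is only a minimal sketch, assuming the predicted values can be extracted into the same three keys and compared by exact match. The `ExpectedBehavior` type and `matches_expected` helper are hypothetical and not part of the eval itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedBehavior:
    # Hypothetical record mirroring the three fields shown for each sample test.
    should_escalate: bool
    resolution_label: str
    risk_level: str


# Expected behavior for the three sample tests, copied from the table above.
SAMPLES = {
    "ato detection login anomaly": ExpectedBehavior(True, "security", "critical"),
    "ato slow burn compromise": ExpectedBehavior(True, "escalate", "high"),
    "ato remediation completeness": ExpectedBehavior(False, "remediate", "high"),
}


def matches_expected(test_name: str, predicted: ExpectedBehavior) -> bool:
    """Hypothetical exact-match check: every field must equal its expected value."""
    return SAMPLES[test_name] == predicted


if __name__ == "__main__":
    # A response that escalates correctly but rates the login-anomaly case as
    # "high" instead of "critical" fails this strict comparison.
    guess = ExpectedBehavior(should_escalate=True, resolution_label="security", risk_level="high")
    print(matches_expected("ato detection login anomaly", guess))
```

Exact match is the strictest possible reading; a judge-based grader, as described in the grading section, may weigh these fields differently.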

How this eval is graded

The judge evaluates whether the response handles the operational workflow correctly, preserves the integrity of the investigation and remediation, provides actionable insights for security operations, and follows email security operational best practices.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Account Takeover Investigation
  • Email Platform Posture Management
  • Threat Dashboard Analytics
  • Integration Operations
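The pass-threshold rule above can be sketched per rubric criterion. This is a minimal illustration, assuming the judge returns one integer score for each of the four criteria listed and that a criterion with no score counts as a fail. The `criterion_passes` and `grade` helpers are hypothetical. The page does not say how criterion results combine into an overall test result, so none is computed here.

```python
# Pass threshold stated on this page: a criterion passes at a judge score of 4 or higher.
PASS_THRESHOLD = 4

RUBRIC_CRITERIA = (
    "Account Takeover Investigation",
    "Email Platform Posture Management",
    "Threat Dashboard Analytics",
    "Integration Operations",
)


def criterion_passes(score: int) -> bool:
    """A criterion passes when its judge score is at or above the threshold."""
    return score >= PASS_THRESHOLD


def grade(scores: dict[str, int]) -> dict[str, bool]:
    """Map each rubric criterion to pass (True) or fail (False).

    Hypothetical: assumes one integer judge score per criterion; a criterion
    missing from `scores` is treated as a fail.
    """
    return {
        name: name in scores and criterion_passes(scores[name])
        for name in RUBRIC_CRITERIA
    }


if __name__ == "__main__":
    example = {
        "Account Takeover Investigation": 5,
        "Email Platform Posture Management": 4,
        "Threat Dashboard Analytics": 3,
    }
    print(grade(example))
```

Treating a missing score as a fail is a conservative choice for this sketch; a real harness might instead flag the gap as a grading error.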

Recommended for

Account Takeover Investigation · Email Platform Posture Management · Threat Dashboard Analytics · Integration Operations
