Abnormal AI
For Abnormal AI · Security Operations · Correctness · Task Completion

Workflow Painpoint Eval V1

Abnormal AI Email Security Workflow Painpoint Eval

A buyer-facing workflow eval covering pain points in business email compromise (BEC) detection, vendor email compromise (VEC) identification, user-reported phishing triage, and behavioral AI detection accuracy.

Sample tests (showing 3 of 25)

# · Input · Expected behavior · Check
01 · BEC detection: CEO fraud

An Abnormal AI SOC analyst receives an alert for a suspected CEO fraud email in which an attacker impersonates the company CEO and requests an urgent wire transfer to a new vendor. The behavioral AI flagged unusual sender patterns.

should_escalate: true · resolution_label: security · risk_level: critical

Pass / Fail
02 · BEC detection: payment redirect

An Abnormal AI SOC analyst investigates an email that claims to be from the accounts payable department and requests updated banking details for an upcoming payment. The Attune model detected anomalous communication patterns.

should_escalate: true · resolution_label: security · risk_level: critical

Pass / Fail
03 · VEC detection: new vendor

An Abnormal AI email security admin reviews a VEC alert for an email from a newly onboarded vendor's domain. VendorBase has limited historical data, and the email contains an invoice with different payment instructions than previous…

should_escalate: true · resolution_label: security · risk_level: high

Pass / Fail
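Each expected behavior above is a set of structured labels. A minimal sketch of a per-field check, assuming the system under test returns a dictionary with the same three keys used in the sample tests (`should_escalate`, `resolution_label`, `risk_level`). The function name `check_expected` and the exact-match rule are hypothetical illustrations, not the eval's documented grader.

```python
from typing import Any

# Expected behavior for sample test 01, copied from the table above.
EXPECTED_01: dict[str, Any] = {
    "should_escalate": True,
    "resolution_label": "security",
    "risk_level": "critical",
}


def check_expected(response: dict[str, Any], expected: dict[str, Any]) -> tuple[bool, list[str]]:
    """Hypothetical exact-match check.

    Returns (passed, mismatched_fields). A field missing from the response
    counts as a mismatch because response.get() returns None.
    """
    mismatches = [key for key, want in expected.items() if response.get(key) != want]
    return (not mismatches, mismatches)


if __name__ == "__main__":
    correct = {"should_escalate": True, "resolution_label": "security", "risk_level": "critical"}
    under_rated = {"should_escalate": True, "resolution_label": "security", "risk_level": "high"}
    print(check_expected(correct, EXPECTED_01))      # (True, [])
    print(check_expected(under_rated, EXPECTED_01))  # (False, ['risk_level'])
```

Returning the list of mismatched fields, rather than a bare boolean, makes it easy to see why a case failed, for example a CEO fraud email that was escalated but rated `high` instead of `critical`.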

How this eval is graded

Each response is evaluated on whether it correctly addresses the workflow pain point, maintains appropriate detection and response integrity, balances security controls against operational efficiency, and follows email security best practices.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • BEC Detection Accuracy
  • VEC Identification
  • Phishing Triage Automation
  • False Positive Management
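The per-criterion pass threshold can be sketched as follows. This assumes the judge returns one integer score per rubric criterion. The eval does not specify the score scale, how a missing score is handled, or how criterion results roll up into an overall verdict, so treating a missing score as a failure (and the name `grade_criteria`) is a hypothetical choice.

```python
# Threshold from the eval description: a criterion passes at a judge score of 4 or higher.
PASS_THRESHOLD = 4

RUBRIC_CRITERIA = (
    "BEC Detection Accuracy",
    "VEC Identification",
    "Phishing Triage Automation",
    "False Positive Management",
)


def grade_criteria(judge_scores: dict[str, int]) -> dict[str, bool]:
    """Map each rubric criterion to pass (True) or fail (False).

    Assumption: a criterion with no judge score is treated as failing.
    """
    return {
        criterion: judge_scores.get(criterion, 0) >= PASS_THRESHOLD
        for criterion in RUBRIC_CRITERIA
    }


if __name__ == "__main__":
    scores = {
        "BEC Detection Accuracy": 5,
        "VEC Identification": 4,
        "Phishing Triage Automation": 3,
    }
    for criterion, passed in grade_criteria(scores).items():
        print(f"{criterion}: {'pass' if passed else 'fail'}")
```

With these sample scores, BEC Detection Accuracy and VEC Identification pass, Phishing Triage Automation fails at 3, and False Positive Management fails because no score was supplied.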

Recommended for

BEC Detection Accuracy · VEC Identification · Phishing Triage Automation · False Positive Management
