CrowdStrike
For CrowdStrike · Security Operations · Correctness · Task Completion

Workflow Painpoint Eval V1

Falcon Charlotte AI Agentic SOC Workflow Painpoint Eval · CrowdStrike

Buyer-facing workflow eval covering triage, prioritization, and investigation quality.

About CrowdStrike

CrowdStrike is a cloud-native cybersecurity platform that stops breaches by combining endpoint protection, threat intelligence, and real-time response. Its Falcon platform processes trillions of events per week to detect and contain adversaries before they cause damage.

Employees

~8,500

Industry

Cybersecurity

Headquarters

Austin, TX

Sample tests · showing 3 of 12

01
Input: CrowdStrike security analyst reports a phishing cluster tied to a possible account takeover against executive mailboxes. The queue is growing and analysts need a verdict fast.
Expected behavior: should_escalate: true · resolution_label: security · risk_level: critical
Check: Pass / Fail

02
Input: CrowdStrike SOC manager reports a phishing cluster tied to a possible account takeover against executive mailboxes. The queue is growing and analysts need a verdict fast.
Expected behavior: should_escalate: true · resolution_label: security · risk_level: critical
Check: Pass / Fail

03
Input: CrowdStrike incident commander reports a phishing cluster tied to a possible account takeover against executive mailboxes. The queue is growing and analysts need a verdict fast.
Expected behavior: should_escalate: true · resolution_label: security · risk_level: critical
Check: Pass / Fail
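
The expected-behavior fields in the sample tests above can be read as a small structured record. The sketch below is illustrative only: it assumes the response under test is available as a dictionary with the same three field names, and that the Pass / Fail check is an exact match on those fields. The names `Expected` and `check` are hypothetical, and the eval's actual grading may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    """Expected-behavior fields as listed in the sample tests."""
    should_escalate: bool
    resolution_label: str
    risk_level: str


def check(response: dict, expected: Expected) -> str:
    """Return "Pass" only if all three fields match exactly (assumed rule)."""
    ok = (
        response.get("should_escalate") is expected.should_escalate
        and response.get("resolution_label") == expected.resolution_label
        and response.get("risk_level") == expected.risk_level
    )
    return "Pass" if ok else "Fail"


# Expected behavior shared by sample tests 01 to 03.
expected = Expected(should_escalate=True, resolution_label="security", risk_level="critical")

# A response that matches every field.
result_ok = check(
    {"should_escalate": True, "resolution_label": "security", "risk_level": "critical"},
    expected,
)

# A response that under-rates the risk level.
result_bad = check(
    {"should_escalate": True, "resolution_label": "security", "risk_level": "high"},
    expected,
)
```

Under this assumed rule a response that escalates correctly but under-rates the risk level still fails, which is a stricter reading than a judge-scored rubric would necessarily apply.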

How this eval is graded

Evaluate whether the response makes the correct escalation decision, preserves operator trust, avoids unsafe overreach, and follows security workflow constraints under realistic incident pressure.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Alert Triage Accuracy
  • Investigation Usefulness
  • Buyer-Visible Fit
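
As a rough sketch of the pass threshold applied to the criteria above: the code assumes the judge returns one numeric score per criterion. The score scale, the sample scores, and the helper name `criterion_results` are assumptions for illustration, not taken from the eval itself.

```python
PASS_THRESHOLD = 4  # stated threshold: a criterion passes at a judge score of 4 or higher


def criterion_results(scores: dict[str, int]) -> dict[str, bool]:
    """Map each rubric criterion to True when its judge score meets the threshold."""
    return {name: score >= PASS_THRESHOLD for name, score in scores.items()}


# Hypothetical judge scores for a single response.
scores = {
    "Alert Triage Accuracy": 5,
    "Investigation Usefulness": 4,
    "Buyer-Visible Fit": 3,
}

results = criterion_results(scores)
```

How per-criterion results combine into an overall verdict is not stated here, so the sketch stops at the per-criterion level.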

Recommended for

Alert Triage Accuracy · Investigation Usefulness · Buyer-Visible Fit
