Dropzone AI
For Dropzone AI · Security Operations · Correctness

Eval Factory Import V1

Eval Factory Imported Suite · Dropzone AI

Imported eval-factory suite for Dropzone AI.

About Dropzone AI

Dropzone AI automates the triage and investigation of security alerts, acting as a tireless AI analyst that processes every alert with the same rigor as a senior human analyst. It integrates with existing SIEM and SOAR platforms to reduce analyst fatigue and dwell time.

Employees

~80

Industry

AI Security Operations

Headquarters

Seattle, WA

Sample tests · showing 3 of 25

01
Input: Respond to this scenario: Alert: Multiple failed login attempts from single IP against admin accounts, followed by one successful login, followed by creation of new admin user. After hours (3 AM local time). Response quality rule…
Expected behavior: required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: medium
Check: Pass / Fail

02
Input: Respond to this scenario: Alert: Antivirus quarantined 'resume.pdf.exe' from user's download folder. User is in HR department. File was from email attachment. Response quality rule: Medium-high priority - likely phishing attempt,…
Expected behavior: required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low
Check: Pass / Fail

03
Input: Respond to this scenario: Alert: Large outbound data transfer (15GB) to cloud storage service during business hours from finance department user. User has used this service before. Response quality rule: Requires context-aware tr…
Expected behavior: required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low
Check: Pass / Fail
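
Each sample test pairs a scenario with required actions, disallowed actions, and a risk level. The sketch below shows one way such a record might be represented. It is a hypothetical illustration, not the suite's actual schema: the class name `SampleTest`, the `test_id` and `scenario` fields, and the `high` risk level are assumptions (the samples shown use only `low` and `medium`). Text truncated in the samples stays truncated here.

```python
from dataclasses import dataclass

# Assumption: "high" is included for illustration; the samples show only low and medium.
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}


@dataclass(frozen=True)
class SampleTest:
    """Hypothetical record for one row of the sample tests."""
    test_id: str
    scenario: str
    required_actions: tuple[str, ...]
    disallowed_actions: tuple[str, ...]
    risk_level: str

    def __post_init__(self) -> None:
        # Reject risk levels outside the assumed set.
        if self.risk_level not in ALLOWED_RISK_LEVELS:
            raise ValueError(f"unknown risk_level: {self.risk_level!r}")


# Sample 01, copied from the listing; the input and the list of
# required actions are truncated there, so they are incomplete here too.
test_01 = SampleTest(
    test_id="01",
    scenario=(
        "Alert: Multiple failed login attempts from single IP against admin "
        "accounts, followed by one successful login, followed by creation of "
        "new admin user. After hours (3 AM local time). Response quality rule…"
    ),
    required_actions=("state assumptions clearly", "reference known context only"),
    disallowed_actions=("fabricate facts", "contradict provided scenario constraints"),
    risk_level="medium",
)
```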

How this eval is graded

Evaluate whether the model response follows the required actions, avoids the disallowed actions, and matches a risk-aware response style.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Alert Triage Accuracy
  • Investigation Thoroughness
  • Verdict Accuracy
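
The pass threshold and the rubric criteria above can be sketched as a small grading helper. This is a minimal illustration under stated assumptions, not the platform's implementation: the function names and the example scores are hypothetical, and requiring every criterion to pass for the response as a whole to pass is an assumption, since the page only defines the per-criterion threshold.

```python
PASS_THRESHOLD = 4  # from the page: a criterion passes at a judge score of 4 or higher

RUBRIC_CRITERIA = (
    "Alert Triage Accuracy",
    "Investigation Thoroughness",
    "Verdict Accuracy",
)


def grade_criteria(scores: dict[str, float], threshold: float = PASS_THRESHOLD) -> dict[str, bool]:
    """Map each rubric criterion to True if its judge score meets the threshold."""
    missing = [name for name in RUBRIC_CRITERIA if name not in scores]
    if missing:
        raise KeyError(f"missing judge scores for: {missing}")
    return {name: scores[name] >= threshold for name in RUBRIC_CRITERIA}


def response_passes(scores: dict[str, float]) -> bool:
    """Assumption: a response passes only if every criterion passes."""
    return all(grade_criteria(scores).values())


# Hypothetical judge scores for one model response.
example_scores = {
    "Alert Triage Accuracy": 5,
    "Investigation Thoroughness": 4,
    "Verdict Accuracy": 3,
}
results = grade_criteria(example_scores)
```

With these example scores, the first two criteria meet the threshold and Verdict Accuracy does not, so `response_passes(example_scores)` is false under the all-criteria assumption.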

Recommended for

Alert Triage Accuracy · Investigation Thoroughness · Verdict Accuracy
