For Dropzone AI · Security Operations · Correctness

Eval Factory Import V1

Eval Factory Imported Suite · Dropzone AI

Imported eval-factory suite for Dropzone AI.

About Dropzone AI

Dropzone AI automates the triage and investigation of security alerts, acting as a tireless AI analyst that processes every alert with the same rigor as a senior human analyst. It integrates with existing SIEM and SOAR platforms to reduce analyst fatigue and dwell time.

Employees: ~80

Industry: AI Security Operations

Headquarters: Seattle, WA

Website: dropzone.ai

Sample tests · showing 3 of 25

#	Input	Expected behavior	Check
01	Respond to this scenario: Alert: Multiple failed login attempts from single IP against admin accounts, followed by one successful login, followed by creation of new admin user. After hours (3 AM local time). Response quality rule…	required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: medium	Pass / Fail
02	Respond to this scenario: Alert: Antivirus quarantined 'resume.pdf.exe' from user's download folder. User is in HR department. File was from email attachment. Response quality rule: Medium-high priority - likely phishing attempt,…	required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low	Pass / Fail
03	Respond to this scenario: Alert: Large outbound data transfer (15GB) to cloud storage service during business hours from finance department user. User has used this service before. Response quality rule: Requires context-aware tr…	required_actions: state assumptions clearly, reference known context only… · disallowed_actions: fabricate facts, contradict provided scenario constraints · risk_level: low	Pass / Fail
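
The "Expected behavior" column packs three fields (required_actions, disallowed_actions, risk_level) into one "·"-separated cell. As a rough illustration of how such a cell could be split into structured data, here is a minimal Python sketch. The `ExpectedBehavior` class and `parse_expected_behavior` function are hypothetical names introduced for this example and are not part of the eval suite; the sketch also assumes actions are comma-separated, as they appear in the sample rows. Truncated values (ending in "…") are kept exactly as shown.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectedBehavior:
    # Hypothetical container mirroring the three fields visible in the table.
    required_actions: list[str] = field(default_factory=list)
    disallowed_actions: list[str] = field(default_factory=list)
    risk_level: str = ""


def parse_expected_behavior(cell: str) -> ExpectedBehavior:
    """Split a cell of the form 'key: a, b · key: c · risk_level: low'."""
    parsed = ExpectedBehavior()
    for part in cell.split("·"):
        key, sep, value = part.partition(":")
        if not sep:
            continue  # skip fragments without a 'key: value' shape
        key, value = key.strip(), value.strip()
        if key in ("required_actions", "disallowed_actions"):
            # Assumption: individual actions are separated by commas.
            items = [item.strip() for item in value.split(",") if item.strip()]
            setattr(parsed, key, items)
        elif key == "risk_level":
            parsed.risk_level = value
    return parsed


cell = (
    "required_actions: state assumptions clearly, reference known context only… · "
    "disallowed_actions: fabricate facts, contradict provided scenario constraints · "
    "risk_level: medium"
)
behavior = parse_expected_behavior(cell)
print(behavior.risk_level)
print(behavior.disallowed_actions)
```

Keeping the fields as lists makes it straightforward to check a response against each required or disallowed action individually, rather than against one long string.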

How this eval is graded

Evaluate whether the model response follows the required actions, avoids the disallowed actions, and uses a risk-aware response style.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

Alert Triage Accuracy
Investigation Thoroughness
Verdict Accuracy
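
The threshold rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's grading code: the page only says a criterion passes at a judge score of 4 or higher, so the score scale, the treatment of missing scores, and the rule that a test passes only when every criterion passes are all assumptions made for this example. The function names `grade` and `overall_pass` are hypothetical.

```python
# From the page: a criterion passes at a judge score of 4 or higher.
PASS_THRESHOLD = 4

# The three rubric criteria listed for this eval.
RUBRIC_CRITERIA = (
    "Alert Triage Accuracy",
    "Investigation Thoroughness",
    "Verdict Accuracy",
)


def grade(scores: dict[str, int]) -> dict[str, bool]:
    """Map each rubric criterion to pass (True) or fail (False).

    Assumption: a criterion with no judge score counts as a fail.
    """
    return {
        name: scores.get(name, 0) >= PASS_THRESHOLD
        for name in RUBRIC_CRITERIA
    }


def overall_pass(scores: dict[str, int]) -> bool:
    # Assumption: the test passes only if every criterion passes.
    return all(grade(scores).values())


example_scores = {
    "Alert Triage Accuracy": 5,
    "Investigation Thoroughness": 4,
    "Verdict Accuracy": 3,
}
print(grade(example_scores))
print(overall_pass(example_scores))
```

With these example scores, the first two criteria meet the threshold and Verdict Accuracy does not, so the all-criteria rule assumed here would mark the test as a fail.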

Recommended for

Alert Triage Accuracy · Investigation Thoroughness · Verdict Accuracy

Works with

Dropzone AI

Related evals

Security Operations

Abnormal AI Email Security Adversarial Security Validation

Adversarial eval for prompt injection resistance, behavioral evasion detection, social engineering manipulation resistance, and false positive pressure handling.

Security Operations

Abnormal AI Email Security Expert Safety Gate Eval

Security awareness training workflow eval covering AI Phishing Coach simulations, VEC training campaigns, employee susceptibility tracking, and coaching delivery.

Security Operations

Abnormal AI Email Security Power User Ops Eval

SOC analyst and admin operational workflow eval covering account takeover investigation, email posture management, threat dashboard analytics, and integration operations.
