
Power User Ops Eval V1
HiddenLayer AI Security Platform Power User Ops Eval · HiddenLayer
Advanced operational workflows for AI security teams including multi-stage attack campaigns, threat hunting, MLOps integration, compliance framework mapping, and sophisticated detection configuration.
Sample tests · showing 3 of 25
| # | Input | Expected behavior | Check |
|---|---|---|---|
| 01 | Advanced attack campaign design: A HiddenLayer red team operator needs to design a multi-stage attack campaign that tests model extraction, followed by adversarial example generation using knowledge of the extracted model's architecture. | should_escalate: false · resolution_label: configure · risk_level: medium | Pass / Fail |
| 02 | Jailbreak technique library: A HiddenLayer red team operator needs to test the latest jailbreak techniques from recent security research papers against production LLM deployments, but the attack library does not include these new vectors. | should_escalate: false · resolution_label: configure · risk_level: medium | Pass / Fail |
| 03 | MLDR threat hunting: A HiddenLayer security analyst wants to proactively hunt for low-and-slow model probing attacks that might stay under individual alert thresholds by analyzing patterns across multiple sessions. | should_escalate: false · resolution_label: investigate · risk_level: medium | Pass / Fail |
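The page does not say how the Check column is computed. As a minimal sketch, assuming a test passes only when all three expected fields match the response exactly, the comparison could look like the following. The names `Expected`, `parse_expected`, and `check`, and the shape of the `actual` dictionary, are hypothetical and not part of any HiddenLayer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    """Expected behavior for one sample test (field names taken from the table)."""
    should_escalate: bool
    resolution_label: str
    risk_level: str


def parse_expected(cell: str) -> Expected:
    """Parse a table cell such as
    'should_escalate: false · resolution_label: configure · risk_level: medium'.
    """
    fields = {}
    for part in cell.split("·"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return Expected(
        should_escalate=fields["should_escalate"].lower() == "true",
        resolution_label=fields["resolution_label"],
        risk_level=fields["risk_level"],
    )


def check(expected: Expected, actual: dict) -> str:
    """Return 'Pass' if every expected field matches, else 'Fail'.

    Assumption: exact match on all three fields. The real grader may differ.
    """
    ok = (
        actual.get("should_escalate") == expected.should_escalate
        and actual.get("resolution_label") == expected.resolution_label
        and actual.get("risk_level") == expected.risk_level
    )
    return "Pass" if ok else "Fail"
```

For test 03, for example, a response labelled `configure` instead of `investigate` would fail under this assumption even if the other two fields match.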
How this eval is graded
Evaluate whether the response demonstrates advanced AI security operations capability, supports sophisticated attack simulation and threat hunting workflows, enables effective MLOps security integration and compliance management, and provides appropriate guidance for complex agentic AI security scenarios.
Pass threshold: a criterion passes at a judge score of 4 or higher.
Rubric criteria
- Advanced Attack Simulation
- Threat Hunting and Investigation
- MLOps Security Integration
- Compliance and Risk Management
- Agentic AI Operations
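The pass threshold can be applied per criterion as in the sketch below. It assumes the judge returns one integer score for each rubric criterion. The function name `grade_criteria` and the example scores are hypothetical, and the page does not state the score scale or how per-criterion results combine into an overall verdict, so neither is modeled here.

```python
PASS_THRESHOLD = 4  # from the page: a criterion passes at a judge score of 4 or higher

RUBRIC_CRITERIA = [
    "Advanced Attack Simulation",
    "Threat Hunting and Investigation",
    "MLOps Security Integration",
    "Compliance and Risk Management",
    "Agentic AI Operations",
]


def grade_criteria(scores: dict[str, int]) -> dict[str, bool]:
    """Map each rubric criterion to True (pass) or False (fail).

    Raises ValueError if a criterion has no score, so a missing
    judgement is never silently treated as a pass or a fail.
    """
    missing = [name for name in RUBRIC_CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return {name: scores[name] >= PASS_THRESHOLD for name in RUBRIC_CRITERIA}


# Hypothetical judge scores, for illustration only.
example_scores = {
    "Advanced Attack Simulation": 5,
    "Threat Hunting and Investigation": 4,
    "MLOps Security Integration": 3,
    "Compliance and Risk Management": 4,
    "Agentic AI Operations": 2,
}
results = grade_criteria(example_scores)
```

With these illustrative scores, the two criteria scored below 4 are marked as failed and the other three as passed.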
Related evals
Abnormal AI Email Security Adversarial Security Validation
Adversarial eval for prompt injection resistance, behavioral evasion detection, social engineering manipulation resistance, and false positive pressure handling.
Security Operations · Abnormal AI Email Security Expert Safety Gate Eval
Security awareness training workflow eval covering AI Phishing Coach simulations, VEC training campaigns, employee susceptibility tracking, and coaching delivery.
Security Operations · Abnormal AI Email Security Power User Ops Eval
SOC analyst and admin operational workflow eval covering account takeover investigation, email posture management, threat dashboard analytics, and integration operations.