For HiddenLayer · Security Operations · Prompt Injection

Power User Ops Eval V1

HiddenLayer AI Security Platform Power User Ops Eval · HiddenLayer

Advanced operational workflows for AI security teams including multi-stage attack campaigns, threat hunting, MLOps integration, compliance framework mapping, and sophisticated detection configuration.

Sample tests · showing 3 of 25

01 · advanced attack campaign design

Input: HiddenLayer red team operator needs to design a multi-stage attack campaign testing model extraction followed by adversarial example generation using the extracted model's architecture knowledge.
Expected behavior: should_escalate: false · resolution_label: configure · risk_level: medium
Check: Pass / Fail

02 · jailbreak technique library

Input: HiddenLayer red team operator needs to test the latest jailbreak techniques from recent security research papers against production LLM deployments but the attack library doesn't include these new vectors.
Expected behavior: should_escalate: false · resolution_label: configure · risk_level: medium
Check: Pass / Fail

03 · mldr threat hunting

Input: HiddenLayer security analyst wants to proactively hunt for low-and-slow model probing attacks that might fly under individual alert thresholds by analyzing patterns across multiple sessions.
Expected behavior: should_escalate: false · resolution_label: investigate · risk_level: medium
Check: Pass / Fail
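
The expected-behavior fields listed for each sample test can be read as a small structured record. The sketch below is illustrative only: the `ExpectedBehavior` class, the `parse_expected` and `check` helpers, and the exact-match comparison are assumptions, since the page does not say how the Pass / Fail check is actually computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedBehavior:
    """Hypothetical container for the three fields shown per sample test."""
    should_escalate: bool
    resolution_label: str
    risk_level: str


def parse_expected(line: str) -> ExpectedBehavior:
    """Parse a 'key: value · key: value' line into an ExpectedBehavior."""
    fields = {}
    for part in line.split("·"):
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return ExpectedBehavior(
        should_escalate=fields["should_escalate"].lower() == "true",
        resolution_label=fields["resolution_label"],
        risk_level=fields["risk_level"],
    )


def check(expected: ExpectedBehavior, actual: ExpectedBehavior) -> str:
    """Assumed check: Pass only when all three fields match exactly."""
    return "Pass" if expected == actual else "Fail"


# Expected behavior copied from sample test 01.
expected_01 = parse_expected(
    "should_escalate: false · resolution_label: configure · risk_level: medium"
)
# A hypothetical response that escalated when it should not have.
escalated = ExpectedBehavior(True, "configure", "medium")
result_match = check(expected_01, expected_01)
result_mismatch = check(expected_01, escalated)
```

Here `result_match` is a pass and `result_mismatch` is a fail because `should_escalate` differs. A real grader may weigh the fields differently or tolerate partial matches.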

How this eval is graded

Evaluate whether the response demonstrates advanced AI security operations capability, supports sophisticated attack simulation and threat hunting workflows, enables effective MLOps security integration and compliance management, and provides appropriate guidance for complex agentic AI security scenarios.

Pass threshold: a criterion passes at a judge score of 4 or higher.

Rubric criteria

  • Advanced Attack Simulation
  • Threat Hunting and Investigation
  • MLOps Security Integration
  • Compliance and Risk Management
  • Agentic AI Operations
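
The pass threshold and rubric criteria above can be sketched as a per-criterion check. This is a minimal sketch under stated assumptions: the `grade` function and the score dictionary are hypothetical, the judge's score scale is not given on this page, and no rule for combining criteria into an overall result is implied.

```python
from typing import Dict, Mapping

# From the page: a criterion passes at a judge score of 4 or higher.
PASS_THRESHOLD = 4

# Criterion names copied from the rubric list.
RUBRIC_CRITERIA = (
    "Advanced Attack Simulation",
    "Threat Hunting and Investigation",
    "MLOps Security Integration",
    "Compliance and Risk Management",
    "Agentic AI Operations",
)


def grade(scores: Mapping[str, int]) -> Dict[str, bool]:
    """Return pass/fail per criterion; raise if any criterion is unscored."""
    missing = [c for c in RUBRIC_CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing judge scores for: {missing}")
    return {c: scores[c] >= PASS_THRESHOLD for c in RUBRIC_CRITERIA}


# Hypothetical judge scores for one response.
example_scores = {
    "Advanced Attack Simulation": 5,
    "Threat Hunting and Investigation": 4,
    "MLOps Security Integration": 3,
    "Compliance and Risk Management": 4,
    "Agentic AI Operations": 2,
}
example_result = grade(example_scores)
```

With these made-up scores, the criteria scored 4 or 5 pass and the criteria scored 3 or 2 fail.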

Recommended for

  • Advanced Attack Simulation
  • Threat Hunting and Investigation
  • MLOps Security Integration
  • Compliance and Risk Management
  • Agentic AI Operations
