AllTrue Security Testing for AI Systems (Azure DevOps)
Run automated security testing for LLM endpoints and AI models inside Azure Pipelines. The task integrates with the AllTrue platform to discover inventory, execute scans, and optionally create Azure Boards work items for findings.
What This Task Does
Core Capabilities
- ✅ Automated Discovery: Enumerates LLM endpoints and AI models from your AllTrue inventory
- ✅ LLM Endpoint Pentesting: Tests for prompt injection, data leakage, harmful content generation, and more
- ✅ Model Scanning: Scans AI models for malicious code, security vulnerabilities, and policy violations
- ✅ HuggingFace Integration: Automatically onboards and scans models from HuggingFace Hub
- ✅ Flexible Scoping: Test at the organization, project, or individual resource level
- ✅ Parallel Execution: Run multiple tests concurrently with intelligent retry logic
- ✅ Outcome-Based Control: Configure pipeline behavior based on security outcomes
Advanced Features
- 🔧 Model Selection: Map specific models to resource types for consistent testing
- 🛡️ Guardrails Testing: Test with or without safety mechanisms
- 📝 System Prompts: Configure and test custom system prompts
- 📊 Capture-Replay: Test with real user interaction patterns
- 🧾 Azure Boards Integration: Automatically create work items for threshold breaches, hard failures, and (optionally) per-policy/per-category findings
- 📈 Comprehensive Reporting: CSV exports and JSON summaries
Execution Modes
The scanner supports two complementary testing approaches:
- LLM Endpoint Pentesting (enableLlmPentest): Tests your LLM endpoints for vulnerabilities like prompt injection, data leakage, harmful content generation, and more
- Model Scanning (enableModelScanning): Scans AI models and model assets for security issues, malicious code, and policy violations
You can enable either or both modes depending on your needs. This task is flexible, but certain inputs become required depending on which mode(s) you enable and how you scope inventory.
Installation
Install from Marketplace: Click "Get it free" on this page and select your Azure DevOps organization
Agent Notes:
- This task runs in Azure Pipelines and requires an agent.
- For Microsoft-hosted agents (ubuntu-latest / “Azure Pipelines” pool), the org must have hosted parallelism (free grant, paid, or otherwise).
- Alternatively, use a self-hosted agent.
- Azure DevOps chooses the shell based on the agent OS. Many script examples in this documentation use Bash syntax; you may need to adapt them to the shell your agent OS runs.
Python Notes: Ensure your pipeline agent has Python available.
- Windows agents: use python
- Linux/macOS agents: use python3
- Use UsePythonVersion@0 to pin a specific version if needed (see the sketch below)
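A minimal sketch of that pinning pattern, placed before the scanner task (the version number is illustrative):

steps:
  # Pin the interpreter version the agent exposes (illustrative version)
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.11'

  - task: AllTrueScanner@1
    inputs:
      pythonPath: "python3"   # use "python" on Windows agents
      # ... other required inputs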
Quick Start
Prerequisites
AllTrue Account: Active account with API access
Required Credentials: Obtain from your AllTrue Customer Success Engineer:
- API Key (always required)
- API URL (always required)
- Customer ID (always required)
- Organization ID/Name (for organization-scoped testing) or Project ID/Name (for project-scoped testing)
NOTE: Resource-scoped testing also requires either an Organization or Project ID/Name for access-control purposes. For ease of use, we recommend setting these values as pipeline variables as described below.
Basic Setup
Step 1: Configure Pipeline Variables
Navigate to Pipelines -> Edit -> Variables:
Secret Variables (click Keep this value secret):
ALLTRUE_API_KEY = <your-api-key>
Regular Variables:
ALLTRUE_API_URL = https://api.prod.alltrue-be.com
ALLTRUE_CUSTOMER_ID = <your-customer-uuid>
ALLTRUE_ORGANIZATION_NAME = ACME Corporation
Step 2: Add Task to Pipeline
steps:
- task: AllTrueScanner@1
displayName: Run AllTrue AI Security Scanner
inputs:
pythonPath: "python3"
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
enableLlmPentest: true
enableModelScanning: false
inventoryScope: "organization"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
pentestTemplate: "Prompt Injection"
pentestNumAttempts: "1"
See Usage Examples section for more complete pipeline examples.
Configuration Reference
| Input | Description | Example |
| --- | --- | --- |
| alltrueApiKey | AllTrue API authentication key (store as a secret variable) | $(ALLTRUE_API_KEY) |
| alltrueApiUrl | AllTrue API base URL | https://api.prod.alltrue-be.com |
| alltrueCustomerId | AllTrue Customer UUID | <your-customer-uuid> |
| pythonPath | Python executable (python, python3, or full path). Use UsePythonVersion@0 to control the version. | python3 |
Core Settings
Execution Toggles
| Input | Description | Default |
| --- | --- | --- |
| enableLlmPentest | Enable LLM endpoint pentesting | true |
| enableModelScanning | Enable model scanning | false |
Inventory Scope Configuration
Control what resources are tested:
| Input | Description | Default | Options |
| --- | --- | --- | --- |
| inventoryScope | Testing scope level | organization | organization, project, resource |
| organizationId | Organization UUID | '' | Optional (use name instead when possible) |
| organizationName | Organization name (resolves to UUID) | '' | For organization scope (preferred) |
| projectIds | Comma-separated project UUIDs | '' | For project scope |
| projectNames | Comma-separated project names (resolve to UUIDs) | '' | For project scope (preferred) |
| targetResourceIds | Comma-separated resource IDs | '' | For resource scope |
| targetResourceNames | Comma-separated resource patterns | '' | For resource scope (supports advanced matching) |
Important (resource scope): for access-control reasons, resource scope requires additional context. Provide either:
- organizationId / organizationName, or
- projectIds / projectNames
Scope Examples:
# Test all resources in organization (using name - recommended!)
- task: AllTrueScanner@1
inputs:
inventoryScope: 'organization'
organizationName: 'ACME Corporation'
# Test specific projects (using names - recommended!)
- task: AllTrueScanner@1
inputs:
inventoryScope: 'project'
projectNames: 'Production,Staging,Development'
# OR using UUIDs
- task: AllTrueScanner@1
inputs:
inventoryScope: 'project'
projectIds: 'proj-123,proj-456'
# Test specific resources (with org context using name)
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
targetResourceNames: 'production-chatbot,staging-api'
Enhanced Pattern Matching for Resources
When using inventoryScope: resource, you can specify targetResourceNames with powerful pattern matching:
| Pattern Type | Format | Description | Example |
| --- | --- | --- | --- |
| Substring | text | Matches any resource containing the text | OpenAI API Key |
| Repository | repo:org/name | Matches all files in a HuggingFace repository | repo:meta-llama/Llama-2-7b |
| File | file:name | Matches specific file names (any file-type resource) | file:exploit.py |
| Exact | =name | Matches only the exact display_name | =MyExactModelName |
| Wildcard | *pattern* | Matches resources with pattern in the name | *.gguf* |
Pattern Matching Examples:
# Match specific files across repositories
targetResourceNames: 'file:exploit.py,file:backdoor.onnx,file:model.safetensors'
# Match all files in specific repositories
targetResourceNames: 'repo:IHasFarms/MaliciousModel,repo:unsloth/Qwen'
# Mix patterns for comprehensive selection
targetResourceNames: '*.gguf*,file:config.json,OpenAI API Key (test-BOM)'
# Exact match to avoid over-selection
targetResourceNames: '=production-model-v2,=staging-endpoint'
Key Features:
- File-level matching (file:) works with file-type resources:
  - ModelFile: Python scripts, configuration files
  - ModelArtifactFile: model weights, GGUF files, etc.
- Repository-level matching (repo:) selects entire HuggingFace repositories:
  - Matches ModelPackage resources only
  - Excludes individual files within the repository
- Wildcards provide flexible pattern matching:
  - Use * for any characters
  - Example: *.safetensors matches all safetensors files
- Multiple patterns can be combined (comma-separated):
  - Each pattern is evaluated independently
  - Resources matching ANY pattern are selected
  - Results are automatically deduplicated
Common Use Cases:
# Security testing: specific malicious files
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
projectNames: 'Security Testing'
targetResourceNames: 'file:exploit.py,file:backdoor.onnx,file:deserialization.pkl'
# Model format testing: all GGUF files
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
projectNames: 'Model Repository'
targetResourceNames: '*.gguf*'
# Repository validation: entire HuggingFace repos
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
projectNames: 'Production'
targetResourceNames: 'repo:company/prod-model,repo:company/staging-model'
# Mixed approach: files + endpoints
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
projectNames: 'Production'
targetResourceNames: 'file:model.safetensors,OpenAI API Key (prod),Anthropic API Key'
⚠️ Important: Wildcard Matching with Display Names
When using wildcards:
- ✅ *.pkl* - Matches files with .pkl anywhere in the name (recommended)
- ✅ .pkl - Substring match (simpler, works for most cases)
- ❌ *.pkl - Only matches if the name ENDS with .pkl (rare)
Best Practice: Use *.pkl* (with a trailing wildcard) or the substring match .pkl for file extensions.
Organization & Project Configuration
You can specify organizations and projects using either UUIDs or names:
| Configuration | UUID Method | Name Method | Notes |
| --- | --- | --- | --- |
| Organization | organizationId: 'uuid' | organizationName: 'ACME' | Name takes precedence if both are provided |
| Projects | projectIds: 'uuid1,uuid2' | projectNames: 'Prod,Stage' | Both are merged (can be used together) |
Benefits of using names:
- ✅ More readable and self-documenting
- ✅ Easier to maintain and review
- ✅ No need to look up UUIDs
- ✅ Automatically resolved at runtime (cached for performance)
When to use UUIDs:
- When you need guaranteed stability (names can change)
- When you already have UUIDs in existing configurations
LLM Pentest Configuration
Basic Configuration
| Input | Description | Default |
| --- | --- | --- |
| pentestTemplate | Pentest template name (must match the AllTrue template name) | Prompt Injection |
| pentestNumAttempts | Number of attempts per test case to account for LLM variability | 1 |
Template Management: The run looks up this template by name in AllTrue. Configure pentest templates in the AllTrue platform UI. The name must match exactly (case-sensitive).
Test Case Attempts: When set to a value greater than 1, each test case runs multiple times to account for non-deterministic LLM behavior. The system aggregates results across all runs - if any attempt returns a failure, the test case outcome is marked as failed. This provides more reliable security testing by catching intermittent vulnerabilities that might not appear in every run. Recommended range: 1-5 attempts (higher values increase testing time proportionally).
When to increase attempts:
- ✅ Testing models with high response variability
- ✅ Critical security categories where you need high confidence
- ✅ Production endpoints where false negatives are costly
- ✅ When you've observed inconsistent test results
⚠️ CI/CD Performance Impact: Each additional attempt multiplies total testing time. With pentestNumAttempts: 2, each test case runs twice, so overall pentest duration increases ~2x. CI/CD pipelines typically run slower than local environments, so it's important to increase timeout values proportionally when using multiple attempts.
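As a rough sizing sketch, scale pollTimeoutSecs with the attempt count (the values below are illustrative):

- task: AllTrueScanner@1
  inputs:
    pentestNumAttempts: "3"    # each test case runs 3 times
    pollTimeoutSecs: "16200"   # ~3x the 5400s default, plus CI/CD headroom
    # ... other required inputs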
🔧 Advanced Pentest Controls
Model Selection by Resource Type
Control which model is used for pentesting each type of LLM endpoint:
| Input | Description | Default |
| --- | --- | --- |
| pentestModelMapping | Map resource types to models | '' |
Format: ResourceType1:model1,ResourceType2:model2
Supported Resource Types:
- OpenAIEndpoint
- AnthropicEndpoint
- BedrockEndpoint
- GoogleAIEndpoint
- IBMWatsonxEndpoint
Note: Other resource types (e.g., AzureOpenAIEndpoint, CustomLlmEndpoint) will use their configured default model and ignore any mapping specified.
Example Configuration:
- task: AllTrueScanner@1
inputs:
pentestModelMapping: 'OpenAIEndpoint:gpt-4o,AnthropicEndpoint:claude-3-5-sonnet-latest,BedrockEndpoint:anthropic.claude-3-5-sonnet-20241022-v2:0'
How it works:
1. The task checks whether a mapped model is specified for the resource type
2. It validates that the model is available on that specific endpoint
3. If available, it uses the mapped model; otherwise it falls back to the endpoint's default
4. It logs clear messages about model selection for full transparency
When to use model mapping:
- ✅ Consistency: Ensure the same model version is tested across all runs
- ✅ Specific Testing: Target particular model capabilities or known vulnerabilities
- ✅ Comparison: Compare security characteristics of different models
- ✅ Production Alignment: Test the exact models used in production
Guardrails Configuration
Enable or disable safety guardrails during pentesting:
| Input | Description | Default |
| --- | --- | --- |
| pentestApplyGuardrails | Apply guardrails during execution | false |
What are guardrails?
- Safety mechanisms configured on your LLM endpoints in AllTrue
- Can include content filtering, PII redaction, harmful content blocking, etc.
- Act as an additional security layer on top of the base model
When to enable guardrails (true):
- ✅ Production Testing: Test endpoints with active safety measures as they appear in production
- ✅ Guardrail Validation: Verify that your guardrails work as expected under attack
- ✅ Compliance Testing: Ensure safety measures remain active during security assessments
- ✅ Defense-in-Depth: Validate your complete security stack
When to disable guardrails (false - default):
- ✅ Baseline Testing: Assess raw model behavior without safety layers
- ✅ Vulnerability Discovery: Find issues that guardrails might mask
- ✅ Root Cause Analysis: Understand underlying model weaknesses
- ✅ Comparative Analysis: Compare protected vs. unprotected behavior
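A sketch of the comparative pattern (also shown under Best Practices below): two scanner steps toggling the flag, with other inputs omitted:

# Baseline: raw model behavior, guardrails off (the default)
- task: AllTrueScanner@1
  displayName: Pentest without guardrails
  inputs:
    pentestApplyGuardrails: false
    # ... other required inputs

# Production posture: guardrails on
- task: AllTrueScanner@1
  displayName: Pentest with guardrails
  inputs:
    pentestApplyGuardrails: true
    # ... other required inputs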
System Prompt Configuration
Configure custom system prompts before pentesting:
| Input | Description | Default |
| --- | --- | --- |
| pentestSystemPromptEnabled | Enable configuring a system prompt before scanning | false |
| pentestSystemPromptText | Custom system prompt text | '' |
| pentestCleanupSystemPrompt | Clean up (restore/clear) the system prompt after the scan | true |
How it works:
- Before pentesting: the task configures the system prompt on the LLM endpoint resource
- During pentesting: tests run with system_prompt_enabled: true in the pentest payload
- After pentesting: the system prompt is optionally cleared (if pentestCleanupSystemPrompt: true)
Use cases:
- ✅ Production Testing: Test your actual production system prompt configuration
- ✅ Effectiveness Validation: Verify that system prompts provide adequate protection
- ✅ Comparative Testing: Compare security outcomes with different system prompts
- ✅ Safety Research: Understand how different prompt strategies affect security
- ✅ Compliance: Ensure system-level instructions meet security requirements
Example - Testing Production System Prompt:
- task: AllTrueScanner@1
inputs:
pentestSystemPromptEnabled: true
pentestSystemPromptText: |
You are a helpful, harmless, and honest AI assistant. You must follow these guidelines:
1) Never provide information that could be used to harm people or property.
2) Decline requests for illegal activities.
3) Be respectful and avoid generating offensive content.
4) If you're unsure about a request, ask for clarification rather than making assumptions.
5) Always prioritize user safety and ethical considerations in your responses.
pentestCleanupSystemPrompt: true
System Prompt Best Practices:
- Keep prompts clear and specific
- Include explicit safety rules and boundaries
- Test both with and without system prompts to understand their impact
- Use multi-line format with | for readability
- Enable cleanup (true) to avoid affecting other tests
Cleanup Behavior:
- true (default): clears the system prompt after testing, restoring the original state
- false: leaves the configured system prompt on the resource (use if you want to persist the configuration)
Dataset Configuration (Capture-Replay)
Configure capture-replay datasets for realistic pentesting with real user interaction patterns:
| Input | Description | Default |
| --- | --- | --- |
| pentestDatasetEnabled | Enable dataset configuration | false |
| pentestDatasetId | Dataset UUID | '' |
| pentestDatasetName | Dataset name (resolved to UUID); project context required | '' |
| pentestCleanupDataset | Clean up dataset configuration after the scan | true |
What are capture-replay datasets?
- Collections of real user interactions captured from your production LLM endpoints
- Enable testing with realistic attack patterns based on actual usage
- Provide more representative security assessments than synthetic test cases
How it works:
- Before pentesting: Configures dataset on the LLM endpoint resource
- During pentesting: Tests incorporate patterns from the dataset
- After pentesting: Optionally clears dataset configuration
Dataset Resolution:
- Use pentestDatasetId for a direct UUID reference
- Use pentestDatasetName for automatic name-to-UUID resolution
- Name resolution requires project context (set projectNames or projectIds)
Example:
- task: AllTrueScanner@1
inputs:
pentestDatasetEnabled: true
pentestDatasetName: 'Production User Patterns Q4'
pentestCleanupDataset: true
# Project context required for name resolution
inventoryScope: 'project'
projectNames: 'Production'
Use cases:
- ✅ Realistic Testing: Test with actual user interaction patterns
- ✅ Production Alignment: Security assessment based on real usage
- ✅ Compliance: Demonstrate testing against production-like scenarios
- ✅ Attack Pattern Discovery: Identify vulnerabilities in real user flows
Best practices:
- Use production datasets for most accurate security assessment
- Enable cleanup to avoid affecting other tests
- Combine with system prompts and guardrails for comprehensive testing
System Description Configuration
Configure a resource-level system description on the LLM endpoint resource:
| Input | Description | Default |
| --- | --- | --- |
| pentestResourceSystemDescriptionEnabled | Enable setting a system description on the endpoint | false |
| pentestResourceSystemDescriptionText | System description text | '' |
| pentestCleanupResourceSystemDescription | Clean up the system description after the scan | false |
What is the system description?
- It maps to llm_endpoint_resource_system_description on the LLM endpoint resource.
- It is distinct from the system prompt:
  - System prompt: instruction/policy text that influences model behavior
  - System description: metadata/context describing the endpoint (used by some providers/tasks)
How it works:
- Before pentesting, the system description is configured on the LLM endpoint resource
- The pentest executes using the resource configuration in AllTrue
- After testing completes (success or failure), the system description is optionally cleared
Example:
- task: AllTrueScanner@1
inputs:
pentestResourceSystemDescriptionEnabled: true
pentestResourceSystemDescriptionText: |
Customer-facing support assistant for ACME. Handles account questions and order status.
Do not include internal-only data in responses.
pentestCleanupResourceSystemDescription: false
Model Scanning Configuration
| Input | Description | Default |
| --- | --- | --- |
| modelScanPolicies | Comma-separated policy names | model-scan-code-execution-prohibited,model-scan-input-output-operations-prohibited,model-scan-network-access-prohibited,model-scan-malware-signatures-prohibited,model-custom-layers-prohibited (all policies applied by default; omit individual policies as desired) |
| modelScanDescription | Free-text description attached to the run | CI Model Scan |
Available Policies:
- model-scan-code-execution-prohibited
- model-scan-input-output-operations-prohibited
- model-scan-network-access-prohibited
- model-scan-malware-signatures-prohibited
- model-custom-layers-prohibited
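For example, a sketch that narrows the default policy set (the project name is illustrative):

- task: AllTrueScanner@1
  inputs:
    enableModelScanning: true
    inventoryScope: 'project'
    projectNames: 'Model Repository'   # illustrative project name
    # Check only for embedded code execution and malware signatures
    modelScanPolicies: 'model-scan-code-execution-prohibited,model-scan-malware-signatures-prohibited'
    modelScanDescription: 'Nightly model scan'
    # ... credentials and other required inputs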
HuggingFace Model Onboarding
Automatically onboard and scan models from HuggingFace Hub:
| Input | Description | Default |
| --- | --- | --- |
| huggingfaceOnboardingEnabled | Enable HF onboarding | false |
| huggingfaceModelsToOnboard | Models to onboard | '' |
| huggingfaceOnboardingProjectName | Project name to associate onboarded models with (preferred) | '' |
| huggingfaceOnboardingProjectId | Project UUID to associate onboarded models with | '' |
| huggingfaceOnboardingWaitSecs | Wait time after onboarding (indexing) | 10 |
| huggingfaceOnboardingOnly | If true, scan only onboarded HF models (skip normal inventory selection) | false |
Project Selection / Precedence (Onboarding)
When onboarding is enabled, the task chooses the onboarding project in this order:
1. huggingfaceOnboardingProjectName (resolved to a UUID at runtime)
2. huggingfaceOnboardingProjectId
3. First project from projectIds / projectNames (after name -> ID resolution)
If you provide both a name and an ID, the name wins.
Examples:
# ✅ Preferred: use a project name (more readable)
- task: AllTrueScanner@1
inputs:
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'mistralai/Mistral-7B-Instruct-v0.2'
huggingfaceOnboardingProjectName: 'ML Engineering'
# ✅ UUID also supported (fallback)
- task: AllTrueScanner@1
inputs:
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'mistralai/Mistral-7B-Instruct-v0.2'
huggingfaceOnboardingProjectId: '270fca05-7b02-414e-8337-d50c0cc00507'
# ✅ Or rely on the first configured project
- task: AllTrueScanner@1
inputs:
projectNames: 'ML Engineering,Staging'
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'mistralai/Mistral-7B-Instruct-v0.2'
huggingfaceModelsToOnboard format: comma-separated org/repo entries, optionally with @revision (e.g. org1/repo1,org2/repo2@revision), or a JSON array.
Usage Modes:
Combined Mode (huggingfaceOnboardingOnly: false):
- Scans both inventory models AND onboarded HuggingFace models
- Perfect for comprehensive security testing
HuggingFace-Only Mode (huggingfaceOnboardingOnly: true):
- Skips inventory selection
- Scans ONLY the onboarded HuggingFace models
- Perfect for pre-production validation of specific models
Example:
- task: AllTrueScanner@1
inputs:
enableModelScanning: true
inventoryScope: 'project'
projectNames: 'Production'
# Onboard and scan a new HuggingFace model
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'mistralai/Mistral-7B-Instruct-v0.2'
huggingfaceOnboardingProjectName: 'Production'
# false = scan inventory + HF model (combined)
# true = scan only HF model (skip inventory)
huggingfaceOnboardingOnly: false
Notes:
- Requires some project context (one of): huggingfaceOnboardingProjectName, huggingfaceOnboardingProjectId, or projectNames / projectIds
- ⚠️ Important: HuggingFace onboarding creates persistent inventory resources. Models remain in your AllTrue inventory after the scan completes and are not automatically deleted. This is intentional, allowing you to track and manage onboarded models over time.
- The 10-second default wait allows time for backend indexing
- LLM pentesting is unaffected by huggingfaceOnboardingOnly
Failure Thresholds (Pipeline Behaviour)
These settings control pass/fail and work item creation:
| Input | Description | Default | Options |
| --- | --- | --- | --- |
| failOutcomeAtOrAbove | The job fails if the worst known outcome is at or above this level and onThresholdAction includes fail. Use '' to disable thresholding | moderate | critical, poor, moderate, good, '' (none) |
| onThresholdAction | Action when the threshold defined by failOutcomeAtOrAbove is breached | fail | fail, work_item, both, none |
| onHardFailuresAction | Controls behavior for start/polling/permission errors (not test outcomes) | ignore | fail, work_item, both, ignore |
Outcome Severity Levels (most to least severe):
- Critical: Critical vulnerabilities requiring immediate action
- Poor: Significant security concerns
- Moderate: Issues requiring attention
- Good: Minor issues, acceptable risk
- Excellent: No security issues found
Action Types:
- fail: fail the pipeline
- work_item: create Azure DevOps work items
- both: fail the pipeline AND create work items
- none / ignore: no action (continue)
Notes
- "Unknown" outcomes do not count towards failing the threshold.
- When onThresholdAction includes work_item, the task also creates per-category (pentest) and per-policy (model scan) issues, filtered by categoryIssueMinSeverity (described below).
- Setting categoryIssueMinSeverity: none disables per-category/per-policy work items, but threshold/hard-failure job-level work items can still be created when actions include work_item.
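A minimal sketch of a threshold-only gate, assuming you want the job to fail on poor or critical outcomes and to file work items only for hard failures:

- task: AllTrueScanner@1
  inputs:
    failOutcomeAtOrAbove: 'poor'       # poor or critical breaches the threshold
    onThresholdAction: 'fail'          # fail the job; no per-finding work items
    onHardFailuresAction: 'work_item'  # track start/poll/permission errors in Boards
    # ... other required inputs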
Azure DevOps Boards Integration
When enabled, the scanner can create work items in Azure Boards for:
- Threshold breaches (e.g. outcome at/above your configured threshold)
- Hard failures (start/poll/permission errors)
- Optional detailed findings
- Per-category (LLM pentest)
- Per-policy (model scan)
| Input | Description | Default |
| --- | --- | --- |
| categoryIssueMinSeverity | Minimum severity for per-category (pentest) and per-policy (model scan) issues | INFORMATIONAL |
Severity Levels: CRITICAL > HIGH > MEDIUM > LOW > INFORMATIONAL
Special Value: none
- If categoryIssueMinSeverity: none, the task will not create per-category/per-policy issues.
- Job-level issues (threshold breach / hard failures) may still be created when onThresholdAction or onHardFailuresAction includes work_item.
Required Azure DevOps Settings
| Input | Description | Example | Default |
| --- | --- | --- | --- |
| adoOrgUrl | Organization URL. Usually leave blank unless overriding the Azure Pipelines default | https://dev.azure.com/myorg | System.CollectionUri |
| adoProject | Project name. Usually leave blank unless overriding the Azure Pipelines default | my-project | System.TeamProject |
| adoToken | Auth token (PAT or OAuth token in pipelines). Usually leave blank unless overriding the Azure Pipelines default | ... | System.AccessToken |
Enabling System Access Token for Azure Boards
Required for work item creation. If you see "Azure Boards work item creation skipped", follow these steps:
For YAML Pipelines:
Organization Settings (one-time):
- Navigate to: Organization Settings → Pipelines → Settings
- Enable: "Limit job authorization scope to current project for non-release pipelines" (if disabled globally)
Project Settings (one-time):
- Navigate to: Project Settings → Pipelines → Settings
- Enable: "Limit job authorization scope to current project for non-release pipelines"
Pipeline YAML (per pipeline):
# Add this validation step to your pipeline
- script: |
if [ -z "$(System.AccessToken)" ]; then
echo "##vso[task.logissue type=error]System.AccessToken is empty!"
echo "Enable 'Allow scripts to access OAuth token' in pipeline settings"
exit 1
fi
displayName: "Validate OAuth token availability"
For Classic Pipelines:
- Edit pipeline → Options tab
- Check: "Allow scripts to access the OAuth token"
- Save
Alternative: Use a PAT
If OAuth token setup is problematic, use a Personal Access Token instead:
- task: AllTrueScanner@1
inputs:
adoToken: "$(ADO_PAT)" # Store PAT as secret variable
# ... other inputs
PAT Requirements:
- Scope: Work Items (Read & write)
- Organization: Same as your Azure DevOps organization
Using System.AccessToken reliably:
- Boards requires System.AccessToken unless you provide adoToken
- It will be empty unless "Allow scripts to access OAuth token" is enabled
- Optional: use a PAT with Work Items (Read & write)
Example for PowerShell on Windows: explicitly map the token into env: and reference it as $env:SYSTEM_ACCESSTOKEN:
- task: PowerShell@2
displayName: "Validate OAuth token availability"
inputs:
targetType: 'inline'
script: |
if ([string]::IsNullOrEmpty($env:SYSTEM_ACCESSTOKEN)) {
Write-Host "##vso[task.logissue type=warning]SYSTEM_ACCESSTOKEN is empty. Enable 'Allow scripts to access the OAuth token'."
exit 0
}
Write-Host "SYSTEM_ACCESSTOKEN is present."
env:
SYSTEM_ACCESSTOKEN: $(System.AccessToken)
For Bash on Linux/macOS, you can access it directly:
- bash: |
if [ -z "$(System.AccessToken)" ]; then
echo "##vso[task.logissue type=warning]System.AccessToken is empty. Enable 'Allow scripts to access the OAuth token'."
exit 0
fi
echo "System.AccessToken is present."
displayName: "Validate OAuth token availability"
Work Item Type
| Input | Description | Default |
| --- | --- | --- |
| adoWorkItemType | Preferred work item type name. Auto-fallback if missing | Issue |
Important behavior:
- The scanner discovers the available work item types for your project.
- If your preferred type isn't available (e.g. Bug is not present in the process), it automatically falls back.
Default fallback order:
1. Preferred type
2. Issue
3. Bug
4. Task
5. First available type
Optional Work Item Fields
| Input | Description |
| --- | --- |
| adoAssignedTo | Set System.AssignedTo (display name or email; must be valid in the org) |
| adoAreaPath | Set System.AreaPath (e.g. Project\Team) |
| adoIterationPath | Set System.IterationPath (e.g. Project\Sprint 1) |
| adoDefaultTags | Semicolon-separated tags to apply to every work item |
Note on Tags: Azure DevOps stores tags separated by semicolons internally. This input accepts comma-separated OR semicolon-separated values—both formats are automatically converted to the correct internal format.
Examples:
# ✅ Both formats work
adoDefaultTags: "security,automated,ci-cd"
adoDefaultTags: "security;automated;ci-cd"
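A sketch combining the optional fields (the assignee, area path, and iteration values are illustrative and must exist in your project):

- task: AllTrueScanner@1
  inputs:
    adoWorkItemType: 'Issue'
    adoAssignedTo: 'security-team@company.com'   # must be a valid identity in the org
    adoAreaPath: 'MyProject\Security'            # illustrative area path
    adoIterationPath: 'MyProject\Sprint 12'      # illustrative iteration path
    adoDefaultTags: 'security;automated'
    # ... other required inputs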
Dedupe Behavior
| Input | Description | Default |
| --- | --- | --- |
| adoDedupeEnabled | Enable dedupe checks before creating work items | true |
| adoDedupeExcludeStates | Terminal states to exclude from dedupe checks | Closed |
How dedupe works:
Dedupe is done primarily via tags (stable and searchable), with a fallback to an HTML marker in the description. Before creating a new item, the task runs WIQL:
- Tag-based dedupe checks System.Tags CONTAINS '<dedupe tag>'
- Marker fallback checks System.Description CONTAINS '<marker>'
This prevents duplicate work items when the same finding repeats across runs (until the existing item reaches a terminal state such as "Closed").
Treat adoDedupeExcludeStates as process-specific (Agile/Scrum/CMMI/custom).
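For example, a sketch that widens the terminal states, assuming the input accepts a comma-separated list (state names depend on your process template):

- task: AllTrueScanner@1
  inputs:
    adoDedupeEnabled: true
    # Treat Closed and Removed items as terminal so recurring findings create new items
    adoDedupeExcludeStates: 'Closed,Removed'
    # ... other required inputs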
Concurrency Configuration
| Input | Description | Default |
| --- | --- | --- |
| maxConcurrentPentests | Max concurrent tests | 8 |
| startStaggerSecs | Delay between starting tests to avoid backend spikes (seconds) | 0 |
| maxStartRetries | Retries for transient start errors only (5xx/429, etc.) | 3 |
| startRetryDelay | Delay between retries (seconds) | 30 |
Polling Configuration
| Input | Description | Default |
| --- | --- | --- |
| pollTimeoutSecs | Max wait time per resource (seconds) | 5400 (1.5 hours) |
| pollTimeoutAction | Behavior on timeout | fail |
| graphqlPollIntervalSecs | Poll interval for execution completion checks (seconds) | 30 |
Timeout Actions:
- fail: mark as a timeout failure
- continue: continue the pipeline (the test may still run server-side)
- partial: attempt to retrieve partial results via GraphQL before giving up
How polling works: The task uses pure GraphQL polling to monitor test execution. It polls the GraphQL endpoint at regular intervals (default 30 seconds) until tests complete or timeout is reached.
⚠️ Important: When using pentestNumAttempts > 1, increase pollTimeoutSecs proportionally. Example: with pentestNumAttempts: 2, set pollTimeoutSecs: 10800 (3 hours) to account for doubled execution time plus CI/CD overhead.
Outputs & Artifacts
Outputs
The task provides outputs in two formats for maximum flexibility:
1. Environment Variables (Same Job)
Available immediately in subsequent steps of the same job using $(VARIABLE) or $env:VARIABLE:
| Variable | Values |
| --- | --- |
| ALLTRUE_OVERALL_STATUS | success, neutral, failure |
| ALLTRUE_LLM_PENTEST_STATUS | success, neutral, failure |
| ALLTRUE_MODEL_SCAN_STATUS | success, neutral, failure |
| ALLTRUE_WORST_OUTCOME | Critical, Poor, Moderate, Good, Excellent, Unknown |
Quick Syntax Reference
| Shell | Same-Job Access | Notes |
| --- | --- | --- |
| bash (Linux/macOS) | $ALLTRUE_OVERALL_STATUS | Default on Linux/macOS agents |
| PowerShell (Windows) | $env:ALLTRUE_OVERALL_STATUS | Recommended for Windows |
| cmd (Windows) | %ALLTRUE_OVERALL_STATUS% | Default for generic script: steps on Windows |
Usage:
- task: AllTrueScanner@1
  inputs:
    # ... config ...

# ✅ Same-job access (PowerShell on Windows)
- task: PowerShell@2
  condition: always()
  inputs:
    targetType: 'inline'
    script: echo "Status: $env:ALLTRUE_OVERALL_STATUS"

# ✅ Same-job access (bash on Linux/macOS)
- bash: echo "Status: $ALLTRUE_OVERALL_STATUS"
  condition: always()
2. Cross-Job Outputs (Downstream Jobs)
Access from dependent jobs using dependencies.<job>.outputs['<taskName>.<output>']:
Requirements:
- Set a name: on the AllTrueScanner task
- Reference it via dependencies in dependent jobs
Example:
- job: security_scan
steps:
- task: AllTrueScanner@1
name: alltrueScan # ← Required for cross-job access
inputs:
# ... config ...
- job: deploy_staging
dependsOn: security_scan
condition: ne(dependencies.security_scan.outputs['alltrueScan.worstOutcome'], 'Critical')
steps:
- script: echo "Deploying to staging..."
- job: deploy_production
dependsOn: security_scan
condition: eq(dependencies.security_scan.outputs['alltrueScan.overallStatus'], 'success')
steps:
- script: echo "Deploying to production!"
Output Variables Reference
Output names are stable and versioned; you can safely depend on them for gates and notifications.
| Output Name | Access in Same Job | Access in Dependent Job |
| --- | --- | --- |
| overallStatus | $ALLTRUE_OVERALL_STATUS | dependencies.<job>.outputs['<taskName>.overallStatus'] |
| llmPentestStatus | $ALLTRUE_LLM_PENTEST_STATUS | dependencies.<job>.outputs['<taskName>.llmPentestStatus'] |
| modelScanStatus | $ALLTRUE_MODEL_SCAN_STATUS | dependencies.<job>.outputs['<taskName>.modelScanStatus'] |
| worstOutcome | $ALLTRUE_WORST_OUTCOME | dependencies.<job>.outputs['<taskName>.worstOutcome'] |
Artifacts
The task automatically uploads scan results as a timestamped artifact:
- Artifact Name: alltrue-scan-results-YYYY-MM-DDTHH-MM-SS
- Contents: all pentest CSVs, model scan CSVs, and JSON summaries
| Input | Default | Description |
| --- | --- | --- |
| publishResultsArtifact | true | Upload the results directory as a pipeline artifact |
| resultsArtifactName | alltrue-scan-results | Artifact base name (a timestamp suffix is added) |
Artifact Features:
- The task writes results into $(Build.SourcesDirectory)/.alltrue-results
- When artifact publishing is enabled, that folder is uploaded
Windows Agents
- PowerShell is recommended for scripts that access environment variables
- Use $env:VARIABLE_NAME syntax
- Self-hosted Windows agents are fully supported (tested on Windows Server)
- The generic script: task uses cmd.exe on Windows (not bash)
Example:
pool:
name: 'MyWindowsAgents'
steps:
- task: AllTrueScanner@1
name: alltrueScan
inputs:
pythonPath: "python" # or "python3"
# ... other inputs
- task: PowerShell@2
displayName: "Show scan results"
condition: always()
inputs:
targetType: 'inline'
script: |
Write-Host "=== Security Scan Results ==="
Write-Host "Overall Status: $env:ALLTRUE_OVERALL_STATUS"
Write-Host "Worst Outcome: $env:ALLTRUE_WORST_OUTCOME"
if ($env:ALLTRUE_OVERALL_STATUS -eq 'failure') {
Write-Host "##vso[task.logissue type=error]Security scan failed!"
exit 1
}
Linux Agents (Microsoft-hosted or self-hosted)
- bash or script tasks work as expected
- Use $VARIABLE_NAME syntax
- Most Microsoft-hosted images include Python 3.11+
Example:
pool:
vmImage: 'ubuntu-latest'
steps:
- task: AllTrueScanner@1
name: alltrueScan
inputs:
pythonPath: "python3"
# ... other inputs
- bash: |
echo "=== Security Scan Results ==="
echo "Overall Status: $ALLTRUE_OVERALL_STATUS"
echo "Worst Outcome: $ALLTRUE_WORST_OUTCOME"
if [ "$ALLTRUE_OVERALL_STATUS" = "failure" ]; then
echo "##vso[task.logissue type=error]Security scan failed!"
exit 1
fi
displayName: "Show scan results"
condition: always()
macOS Agents
- Same as Linux (bash syntax)
- Ensure Python 3.11+ is available via UsePythonVersion@0
- Use $VARIABLE_NAME syntax
Example:
pool:
vmImage: 'macOS-latest'
steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '3.11'
- task: AllTrueScanner@1
name: alltrueScan
inputs:
pythonPath: "python3"
# ... other inputs
- bash: echo "Status: $ALLTRUE_OVERALL_STATUS"
condition: always()
For pipelines that run on multiple platforms, use an explicit bash: task:
strategy:
matrix:
Linux:
vmImage: 'ubuntu-latest'
Windows:
vmImage: 'windows-latest'
macOS:
vmImage: 'macOS-latest'
pool:
vmImage: $(vmImage)
steps:
- task: AllTrueScanner@1
name: alltrueScan
inputs:
# ... config ...
# ✅ Works on all platforms
- bash: |
echo "Overall Status: $ALLTRUE_OVERALL_STATUS"
echo "Worst Outcome: $ALLTRUE_WORST_OUTCOME"
displayName: "Show results (cross-platform)"
condition: always()
Usage Examples
Example 1: Complete Configuration (All Options)
Comprehensive example showing every available configuration option:
trigger: none
pr: none
pool:
vmImage: ubuntu-latest
stages:
- stage: SecurityScan
displayName: AI System Security Testing
jobs:
- job: security_scan
displayName: Run AI Security Scanner
continueOnError: true
timeoutInMinutes: 135
steps:
- checkout: self
- script: |
echo "Validating OAuth token availability..."
if [ -z "$(System.AccessToken)" ]; then
echo "WARNING: System.AccessToken is empty. Ensure 'Allow scripts to access OAuth token' is enabled."
else
echo "System.AccessToken is present."
fi
displayName: "Validate OAuth token availability"
- task: AllTrueScanner@1
name: alltrueScan
displayName: Run AllTrue Scanner
continueOnError: true
inputs:
pythonPath: "python3"
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
enableLlmPentest: true
enableModelScanning: true
inventoryScope: "resource"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
projectNames: "Sample Inventory BOM,2nd Project"
targetResourceNames: "=Basic_model ML Model (https://huggingface.co/achilles1313/test_gguf/blob/main),*Endpoint*"
pentestTemplate: "Dynamic Dan Only"
pentestNumAttempts: "1"
pentestModelMapping: "OpenAIEndpoint:gpt-3.5-turbo,AnthropicEndpoint:claude-3-haiku-20240307"
pentestApplyGuardrails: false
pentestSystemPromptEnabled: true
pentestSystemPromptText: "You are a secure AI assistant who must never execute code or disclose credentials under any circumstances"
pentestCleanupSystemPrompt: false
pentestDatasetEnabled: true
pentestDatasetName: "TonysTestDataset"
pentestCleanupDataset: true
pentestResourceSystemDescriptionEnabled: true
pentestResourceSystemDescriptionText: "Production AI assistant with strict safety, privacy, and compliance requirements"
pentestCleanupResourceSystemDescription: false
modelScanDescription: "Weekly Comprehensive Security Audit - Stress Test"
modelScanPolicies: "model-scan-code-execution-prohibited"
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: "nvidia/Alpamayo-R1-10B"
huggingfaceOnboardingProjectName: "3rd Project"
huggingfaceOnboardingWaitSecs: "30"
huggingfaceOnboardingOnly: false
failOutcomeAtOrAbove: "poor"
onThresholdAction: "both"
onHardFailuresAction: "both"
categoryIssueMinSeverity: "none"
maxConcurrentPentests: "3"
startStaggerSecs: "15"
maxStartRetries: "1"
startRetryDelay: "90"
pollTimeoutSecs: "7200"
pollTimeoutAction: "partial"
graphqlPollIntervalSecs: "60"
adoWorkItemType: "Issue"
adoDefaultTags: "edge-case;model-scan;bi-weekly"
adoDedupeEnabled: true
publishResultsArtifact: true
resultsArtifactName: "alltrue-scan-results"
# NOTE: This example uses bash syntax (works with Linux/macOS agent)
- bash: |
echo "=== Scan Outputs (debug) ==="
echo "overallStatus: $ALLTRUE_OVERALL_STATUS"
echo "llmPentestStatus: $ALLTRUE_LLM_PENTEST_STATUS"
echo "modelScanStatus: $ALLTRUE_MODEL_SCAN_STATUS"
echo "worstOutcome: $ALLTRUE_WORST_OUTCOME"
displayName: "Print scan outputs (debug)"
condition: always()
- bash: |
echo "Failing job because overallStatus=failure"
exit 1
displayName: "Fail job if scan indicates failure"
condition: and(always(), eq(variables['ALLTRUE_OVERALL_STATUS'], 'failure'))
Note: This example shows all available configuration options. In practice, you only need to specify options that differ from defaults or are required for your use case.
Example 2: Simple Organization Scan
trigger:
branches:
include:
- main
pool:
vmImage: 'ubuntu-latest'
jobs:
- job: security_scan
displayName: AI Security Scanner
steps:
- task: AllTrueScanner@1
inputs:
pythonPath: "python3"
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
enableLlmPentest: true
pentestTemplate: "Prompt Injection"
failOutcomeAtOrAbove: "moderate"
onThresholdAction: "both"
Example 3: Multi-Stage with Gated Deployment
stages:
- stage: SecurityScan
jobs:
- job: security_scan
continueOnError: true
steps:
- task: AllTrueScanner@1
name: alltrueScan
inputs:
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
enableLlmPentest: true
enableModelScanning: true
- stage: DeployProduction
dependsOn: SecurityScan
condition: eq(stageDependencies.SecurityScan.security_scan.outputs['alltrueScan.overallStatus'], 'success')
jobs:
- job: deploy
steps:
- script: echo "Deploying to production!"
Example 4: With Azure Boards Integration
- task: AllTrueScanner@1
inputs:
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
enableLlmPentest: true
pentestTemplate: "Prompt Injection"
onThresholdAction: "both"
categoryIssueMinSeverity: "MEDIUM"
adoWorkItemType: "Issue"
adoDefaultTags: "security,ai-testing"
adoAssignedTo: "security-team@company.com"
Example 5: Cross-Platform Outputs and Deployment Gates
Windows/Linux support, same-job + cross-job outputs, and deployment gates. This example shows the recommended approach for consuming scanner results:
- In the same job: read the environment variables set by the task (shell-specific syntax differs)
- Across jobs: use dependencies.<job>.outputs['<taskName>.<output>'] (same syntax on every OS)
trigger: none
pr: none
stages:
- stage: SecurityScan
displayName: AI System Security Testing
jobs:
# ------------------------------------------------------------
# 1) Run scan (produces outputs)
# ------------------------------------------------------------
- job: security_scan
displayName: Run AllTrue Scanner
timeoutInMinutes: 135
continueOnError: true
pool:
name: Default # Works for self-hosted Windows or Linux pools
steps:
- checkout: self
persistCredentials: true
# (Optional) Windows-friendly OAuth token validation for Boards
# NOTE: Requires Pipeline setting "Allow scripts to access the OAuth token"
- task: PowerShell@2
displayName: "Validate OAuth token availability (Windows/PowerShell)"
condition: and(succeededOrFailed(), eq(variables['Agent.OS'], 'Windows_NT'))
inputs:
targetType: 'inline'
script: |
if ([string]::IsNullOrEmpty($env:SYSTEM_ACCESSTOKEN)) {
Write-Host "##vso[task.logissue type=warning]SYSTEM_ACCESSTOKEN is empty. Enable: 'Allow scripts to access the OAuth token'."
exit 0
}
Write-Host "SYSTEM_ACCESSTOKEN is present."
env:
SYSTEM_ACCESSTOKEN: $(System.AccessToken)
- task: AllTrueScanner@1
name: alltrueScan # IMPORTANT: required for cross-job outputs
displayName: Run AllTrue AI Security Scanner
continueOnError: true
inputs:
pythonPath: "python"
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
enableLlmPentest: true
enableModelScanning: true
inventoryScope: "organization"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
# Example gate config
failOutcomeAtOrAbove: "poor"
onThresholdAction: "both"
onHardFailuresAction: "both"
categoryIssueMinSeverity: "none"
publishResultsArtifact: true
resultsArtifactName: "alltrue-scan-results"
# --- Same-job outputs (cross-platform print) ---
# Linux/macOS agent => Bash
- bash: |
echo "=== AllTrue outputs (bash) ==="
echo "overallStatus: $ALLTRUE_OVERALL_STATUS"
echo "llmPentestStatus: $ALLTRUE_LLM_PENTEST_STATUS"
echo "modelScanStatus: $ALLTRUE_MODEL_SCAN_STATUS"
echo "worstOutcome: $ALLTRUE_WORST_OUTCOME"
displayName: "Print outputs (bash)"
condition: and(always(), ne(variables['Agent.OS'], 'Windows_NT'))
# Windows agent => PowerShell
- task: PowerShell@2
displayName: "Print outputs (PowerShell)"
condition: and(always(), eq(variables['Agent.OS'], 'Windows_NT'))
inputs:
targetType: 'inline'
script: |
Write-Host "=== AllTrue outputs (PowerShell) ==="
Write-Host "overallStatus: $env:ALLTRUE_OVERALL_STATUS"
Write-Host "llmPentestStatus: $env:ALLTRUE_LLM_PENTEST_STATUS"
Write-Host "modelScanStatus: $env:ALLTRUE_MODEL_SCAN_STATUS"
Write-Host "worstOutcome: $env:ALLTRUE_WORST_OUTCOME"
# Windows agent => cmd (optional)
- script: |
echo === AllTrue outputs (cmd) ===
echo overallStatus: %ALLTRUE_OVERALL_STATUS%
echo llmPentestStatus: %ALLTRUE_LLM_PENTEST_STATUS%
echo modelScanStatus: %ALLTRUE_MODEL_SCAN_STATUS%
echo worstOutcome: %ALLTRUE_WORST_OUTCOME%
displayName: "Print outputs (cmd)"
condition: and(always(), eq(variables['Agent.OS'], 'Windows_NT'))
# ------------------------------------------------------------
# 2) Notify team based on cross-job outputs (OS-agnostic)
# ------------------------------------------------------------
- job: notify_security
displayName: Notify Security Team (gated)
dependsOn: security_scan
condition: and(always(), eq(dependencies.security_scan.outputs['alltrueScan.overallStatus'], 'failure'))
variables:
overallStatus: $[ dependencies.security_scan.outputs['alltrueScan.overallStatus'] ]
llmPentestStatus: $[ dependencies.security_scan.outputs['alltrueScan.llmPentestStatus'] ]
modelScanStatus: $[ dependencies.security_scan.outputs['alltrueScan.modelScanStatus'] ]
worstOutcome: $[ dependencies.security_scan.outputs['alltrueScan.worstOutcome'] ]
steps:
- script: |
echo "ALERT: AllTrue scan failed"
echo "overallStatus: $(overallStatus)"
echo "llmPentestStatus: $(llmPentestStatus)"
echo "modelScanStatus: $(modelScanStatus)"
echo "worstOutcome: $(worstOutcome)"
displayName: "Emit alert"
# ------------------------------------------------------------
# 3) Gate deployment based on cross-job outputs (OS-agnostic)
# ------------------------------------------------------------
- job: deploy_production
displayName: Deploy to Production (gated)
dependsOn: security_scan
condition: eq(dependencies.security_scan.outputs['alltrueScan.overallStatus'], 'success')
steps:
- script: |
echo "OK: AllTrue checks passed. Deploying to production..."
displayName: "Deploy"
Cheat sheet:
Same job:
- bash: $ALLTRUE_WORST_OUTCOME
- PowerShell: $env:ALLTRUE_WORST_OUTCOME
- cmd: %ALLTRUE_WORST_OUTCOME%
Different job/stage:
- dependencies.security_scan.outputs['alltrueScan.worstOutcome']
Important:
- script: steps are not portable across operating systems:
  - Linux/macOS -> bash
  - Windows -> cmd.exe
- For portable pipelines, prefer:
  - bash: for Linux/macOS only
  - PowerShell@2 for Windows or cross-platform use
Security (Permissions)
Required API Permissions
Your AllTrue API key must have access to:
- get /v2/ai-validation/importable-datasets
- get /v2/llm-pentest/customer/{customer_id}/llm-pentest-models/{resource_instance_id}
- get /v1/inventory/customer/resource/{resource_instance_id}/llm-endpoint-resource-additional-config
- patch /v1/inventory/customer/resource/{resource_instance_id}/llm-endpoint-resource-additional-config
- get /v2/graphql
- post /v2/graphql
- get /v1/graphql
- post /v1/graphql
- query v2.llmPentestScanExecution
- get /v1/inventory/customer/{customer_id}/resources
- get /v2/llm-pentest/customer/{customer_id}/templates
- post /v2/llm-pentest/customer/{customer_id}/start-pentest
- post /v2/llm-pentest/customer/{customer_id}/executions/{llm_pentest_scan_execution_id}/download-csv
- query v2.resourceInstanceForLlmPentestScanExecution
- query v2.failedCategoriesResultsPerCategory
- query aiSpmGetPentestIssues
- query v1.aiSpmGetPentestIssues
- post /v1/posture-management/customers/{customer_id}/model-scanning/check-policies
- query v2.modelScanExecution
- query v2.resourceInstanceForModelScanExecution
- query v2.modelScanResultsPerPolicy
- query modelScanDetails
- query v1.modelScanDetails
- get /v1/admin/customers/{customer_id}/organizations/projects
- query modelScanSummaries
- query v1.modelScanSummaries
- query v2.modelScanSummaries
- post /v1/inventory/resources
Azure DevOps Permissions (Boards/Work Items)
For Azure Boards work item creation:
- Pipelines OAuth token: System.AccessToken must be enabled and authorized to create work items in the project
- Alternative: use a PAT with Work Items (Read & write) permissions
Best Practices
1. Use Names for Readability
# ✅ Recommended
organizationName: 'ACME Corporation'
projectNames: 'Production,Staging'
# ❌ Less readable
organizationId: '364fe49b-6ea1-4a53-83db-f8311a9c8412'
projectIds: '5c221ef3-86a5-49e0-bce9-df09b9a1d51a'
2. Store Credentials as Pipeline Variables
Secret Variables: ALLTRUE_API_KEY
Regular Variables: ALLTRUE_API_URL, ALLTRUE_CUSTOMER_ID, ALLTRUE_ORGANIZATION_NAME
3. Adjust Timeouts for Multiple Attempts
inputs:
pentestNumAttempts: "2"
pollTimeoutSecs: "10800" # 3 hours for 2x attempts
4. Resource Management
Balance performance, API backend spikes, and completion time:
- task: AllTrueScanner@1
inputs:
# High-throughput configuration for large inventories
maxConcurrentPentests: 24
startStaggerSecs: 3 # Prevent API backend spike
# Adjust timeouts based on num_attempts_on_testcase
pentestNumAttempts: 2
pollTimeoutSecs: 10800 # 3 hours (2x baseline for 2 attempts)
pollTimeoutAction: 'partial' # Retrieve partial results on timeout
Performance Tip: Start with 8-10 concurrent tests and increase gradually while monitoring for pentest/scan start errors. Use startStaggerSecs to space out requests.
5. Test HuggingFace Models Before Production
# Pre-production validation workflow
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'your-org/new-model'
huggingfaceOnboardingProjectName: 'ML Engineering'
huggingfaceOnboardingOnly: true # Test only this model
failOutcomeAtOrAbove: 'moderate'
6. Model Selection Strategy
Use model mapping when:
- ✅ You need consistent model versions across tests
- ✅ Testing specific model capabilities or vulnerabilities
- ✅ Comparing different models' security characteristics
- ✅ Production uses specific model versions
Example pattern for multi-provider environments:
pentestModelMapping: |
OpenAIEndpoint:gpt-4-turbo-preview,
AnthropicEndpoint:claude-3-5-sonnet-latest,
BedrockEndpoint:anthropic.claude-v2,
GoogleAIEndpoint:gemini-1.5-pro,
IBMWatsonxEndpoint:ibm/granite-13b-chat-v2
7. System Prompt Best Practices
When to configure system prompts:
- ✅ Testing production configurations
- ✅ Validating system prompt effectiveness
- ✅ Comparing different safety approaches
- ✅ Compliance testing with specific instructions
System prompt guidelines:
- Keep prompts clear and specific
- Include explicit safety rules
- Test both with and without system prompts to understand their impact
- Enable cleanup to avoid affecting other tests if your system prompt is not intended to be permanent
- Use multi-line format for readability
Example structure:
pentestSystemPromptText: |
You are a [role]. You must:
1) [Primary safety rule]
2) [Secondary safety rule]
3) [Behavior guideline]
4) [Escalation/refusal instruction]
8. Guardrails Configuration
Enable guardrails when:
- ✅ Testing production endpoints with active safety measures
- ✅ Validating that guardrails work as expected
Disable guardrails when:
- ✅ Performing baseline security testing
- ✅ Finding underlying model vulnerabilities
- ✅ Comparing raw model behavior vs. protected behavior
Pattern for comparative testing:
# Job 1: Test without guardrails (baseline)
pentestApplyGuardrails: false
# Job 2: Test with guardrails (production config)
pentestApplyGuardrails: true
9. Disable detailed issues when you only want a threshold "gate"
inputs:
onThresholdAction: "both"
categoryIssueMinSeverity: "none"
Troubleshooting
Common Issues
Quick checklist:
- Core credentials set? → alltrueApiKey, alltrueApiUrl, alltrueCustomerId
- Scope makes sense?
  - organization → set organizationId or organizationName
  - project → set projectIds or projectNames
  - resource → set targetResourceIds or targetResourceNames, plus one of organizationId/organizationName or projectIds/projectNames
- Pentest enabled? → set pentestTemplate
- Model scan enabled? → set modelScanPolicies
- Work items enabled? → set onThresholdAction: work_item or both AND enable the OAuth token
Issue: "No resources selected"
- Check your inventoryScope configuration
- Verify Organization Name/ID and Project Names/IDs are set correctly
- Ensure resources exist in the AllTrue inventory
Issue: "Could not resolve organization name"
- Verify the name matches exactly as shown in the AllTrue UI (matching is case-insensitive but must otherwise be exact)
- Try using organizationId as a fallback
Issue: "Could not resolve project name"
- Verify the name matches exactly in the AllTrue UI
- Ensure the project exists and is active
- Check that the project is in the expected organization
- Try using projectIds as a fallback
Issue: "Could not resolve HuggingFace onboarding project name"
- Verify the name matches exactly in AllTrue UI
- Ensure the project exists and is active
Issue: "Permission denied accessing organization lookup endpoint"
- Your API key may not have access to /v1/admin/customers/{customer_id}/organizations/projects
- Contact your AllTrue Customer Success Engineer to grant the appropriate permissions
- As a workaround, use UUIDs (organizationId, projectIds) instead of names
Issue: "Missing Organization Identifier" or "Missing Project Identifier"
- You're using resource scope without the required access-control context
- Set either organizationName/organizationId OR projectNames/projectIds
- This is a security requirement that prevents unintended customer-wide scanning
Issue: "Pentest template not found"
- Verify template name matches exactly (case-sensitive)
- Check available templates in AllTrue UI
- Template Management: Configure pentest templates in the AllTrue platform UI
Issue: "Start failures - permission denied"
- Verify API key permissions (see Security (Permissions) above)
- Check AllTrue license status
Issue: "Mapped model not available on endpoint"
- The model specified in pentestModelMapping isn't available for that specific resource
- Check the logs for available models
- Verify the model name matches exactly (case-sensitive)
- The system will fall back to the endpoint's default model
Issue: "Failed to configure system prompt"
- Verify the API key has PATCH access to /v1/inventory/customer/resource/{resource_instance_id}/llm-endpoint-resource-additional-config
- Check that resource_instance_id is valid
- The task continues with the existing configuration if the PATCH fails (non-blocking)
Issue: "Timeout during polling"
- Increase pollTimeoutSecs
- Consider reducing maxConcurrentPentests to avoid backend spikes
"Configured work item type 'Bug' not available... Falling back to 'Issue'"
- Expected when the project process doesn’t include
Bug.
- Set
adoWorkItemType=Issue to avoid the warning.
"Azure Boards work item creation skipped"
- Boards creation requires all three:
ADO_ORG_URL (or System.CollectionUri)
ADO_PROJECT (or System.TeamProject)
ADO_TOKEN (or System.AccessToken)
- If you expect System.AccessToken to work:
- Ensure
Allow scripts to access OAuth token is enabled.
- Check
onThresholdAction includes work_item or both
Issue: Dedupe not preventing duplicates
- Confirm the existing work items are not in terminal states listed in adoDedupeExcludeStates (default: Closed)
- Confirm adoDedupeEnabled: true
"System.AccessToken is empty"
"Free memory is lower than 5%" warnings
This warning appears when the agent is under memory pressure. Common causes:
- Too many concurrent scans: Reduce
maxConcurrentPentests
maxConcurrentPentests: "3" # Instead of 8+
Self-hosted agent undersized: Increase agent VM memory
- Recommended: 8GB+ RAM for typical workloads
- Large inventories (100+ resources): 16GB+ RAM
Other jobs running: Ensure agent has dedicated capacity
Impact: Usually none - scans complete successfully despite the warning. Monitor for actual failures or timeouts.
Issue: "No hosted parallelism has been purchased or granted"
- Your organization has no Microsoft-hosted parallelism. Request the free grant from Microsoft, purchase parallel jobs, or use a self-hosted agent (see Agent Notes above)
Issue: HuggingFace onboarding failed
- Verify project context is provided (huggingfaceOnboardingProjectName, huggingfaceOnboardingProjectId, projectNames, or projectIds)
- Check that the model exists on HuggingFace Hub
- Verify the format: org/repo or org/repo@revision
Name Resolution Debugging
If names aren't resolving:
1. Check console output for resolution messages:
   [org-resolve] Resolved organization name 'ACME' -> 364fe49b-...
   [proj-resolve] Resolved project name 'Production' -> 5c221ef3-...
2. Verify names in the AllTrue UI:
   - Log into the AllTrue platform
   - Copy the exact organization name
   - Navigate to Projects and copy the exact project names
3. Check for typos (matching is case-insensitive but must otherwise be exact)
4. Use fallback UUIDs temporarily:
   # If name resolution fails, use a UUID as a fallback
   organizationId: '364fe49b-6ea1-4a53-83db-f8311a9c8412'
5. Verify API permissions for /v1/admin/customers/{customer_id}/organizations/projects
Model Mapping Debugging
If model mapping isn't working as expected:
1. Check console output for model selection messages:
   [i] Model mapping found for OpenAIEndpoint: gpt-4
   [OK] Using mapped model: gpt-4
   or:
   [!] Mapped model 'gpt-4' not available
   [i] Available: gpt-3.5-turbo, gpt-4-turbo-preview, ...
   [i] Using endpoint default
2. Verify model names are exact matches (case-sensitive)
3. Check the available models in the AllTrue UI for each resource
4. Test without mapping first to see the default behavior
System Prompt Debugging
If system prompt configuration isn't working:
1. Check console output for configuration messages:
   [i] Configuring system prompt on resource...
   [OK] System prompt configured successfully
   or:
   [!] Warning: Failed to configure system prompt: <error>
2. Verify the resource type supports system prompts (LLM endpoints only)
3. Check API permissions for GET and PATCH on the additional-config endpoint
4. Test with a simple system prompt first before using complex multi-line prompts
5. Verify cleanup is working:
   [i] System prompt cleaned up from resource
Debug Checklist
- ✅ Check workflow logs for detailed error messages
- ✅ Verify all required secrets/variables are set
- ✅ Confirm API key has necessary permissions
- ✅ Test with simpler configuration first
- ✅ Review AllTrue UI for resource visibility
- ✅ Check for typos in names (case-insensitive but exact)
- ✅ Use correct syntax for your platform (PowerShell vs bash)
Support
For assistance with:
- Configuration: Refer to examples and troubleshooting guide above
- API Access: Contact your AllTrue Customer Success Engineer
- Technical Issues: Use the Q&A section on this marketplace page
- Feature Requests: Submit through the Q&A section
- AllTrue Platform
📝 License
Copyright © 2025 AllTrue.ai Canada Inc. All rights reserved.