AllTrue Security Testing for AI Systems (Azure DevOps)
Run automated security testing for LLM endpoints and AI models inside Azure Pipelines. The task integrates with the AllTrue platform to discover inventory, execute scans, and optionally create Azure Boards work items for findings.
What This Task Does
Core Capabilities
- ✅ Automated Discovery: Enumerates LLM endpoints and AI models from your AllTrue inventory
- ✅ LLM Endpoint Pentesting: Tests for prompt injection, data leakage, harmful content generation, and more
- ✅ Model Scanning: Scans AI models for malicious code, security vulnerabilities, and policy violations
- ✅ HuggingFace Integration: Automatically onboards and scans models from HuggingFace Hub
- ✅ Flexible Scoping: Test at the organization, project, or individual resource level
- ✅ Parallel Execution: Run multiple tests concurrently with intelligent retry logic
- ✅ Outcome-Based Control: Configure pipeline behavior based on security outcomes
Advanced Features
- 🔧 Model Selection: Map specific models to resource types for consistent testing
- 🛡️ Guardrails Testing: Test with or without safety mechanisms
- 📝 System Prompts: Configure and test custom system prompts
- 📊 Capture-Replay: Test with real user interaction patterns
- 🧾 Azure Boards Integration: Automatically create work items for threshold breaches, hard failures, and (optionally) per-policy/per-category findings
- 📈 Comprehensive Reporting: CSV exports and JSON summaries
Execution Modes
The scanner supports two complementary testing approaches:
- LLM Endpoint Pentesting (enableLlmPentest): Tests your LLM endpoints for vulnerabilities like prompt injection, data leakage, harmful content generation, and more
- Model Scanning (enableModelScanning): Scans AI models and model assets for security issues, malicious code, and policy violations
You can enable either or both modes depending on your needs. This task is flexible, but certain inputs become required depending on which mode(s) you enable and how you scope inventory.
Installation
Install from Marketplace: Click "Get it free" on this page and select your Azure DevOps organization
Agent Notes:
- This task runs in Azure Pipelines and requires an agent.
- For Microsoft-hosted agents (ubuntu-latest / “Azure Pipelines” pool), the org must have hosted parallelism (free grant, paid, or otherwise).
- Alternatively, use a self-hosted agent.
- Azure DevOps chooses the shell based on the agent OS. Many script examples in this documentation use Bash syntax; you may need to adapt them to the shell your agent OS runs.
Python Notes: Ensure your pipeline agent has Python available.
- Windows agents: use python
- Linux/macOS agents: use python3
- Use UsePythonVersion@0 to pin a specific version if needed (see the sketch below)
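A minimal sketch of that pinning pattern, placed before the scanner task (the version number is illustrative):

steps:
  # Pin the interpreter version the agent exposes (illustrative version)
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.11'

  - task: AllTrueScanner@1
    inputs:
      pythonPath: "python3"   # use "python" on Windows agents
      # ... other required inputs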
Quick Start
Prerequisites
AllTrue Account: Active account with API access
Required Credentials: Obtain from your AllTrue Customer Success Engineer:
- API Key (always required)
- API URL (always required)
- Customer ID (always required)
- Organization ID/Name (for organization-scoped testing) or Project ID/Name (for project-scoped testing)
NOTE: Resource-scoped testing also requires either an Organization or Project ID/Name for access-control purposes. For ease of use, we recommend setting these values as pipeline variables as described below.
Basic Setup
Step 1: Configure Pipeline Variables
Navigate to Pipelines -> Edit -> Variables:
Secret Variables (click Keep this value secret):
ALLTRUE_API_KEY = <your-api-key>
Regular Variables:
ALLTRUE_API_URL = https://api.prod.alltrue-be.com
ALLTRUE_CUSTOMER_ID = <your-customer-uuid>
ALLTRUE_ORGANIZATION_NAME = ACME Corporation
Step 2: Add Task to Pipeline
steps:
- task: AllTrueScanner@1
displayName: Run AllTrue AI Security Scanner
inputs:
pythonPath: "python3"
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
enableLlmPentest: true
enableModelScanning: false
inventoryScope: "organization"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
pentestTemplate: "Prompt Injection"
pentestNumAttempts: "1"
See Usage Examples section for more complete pipeline examples.
Configuration Reference
| Input | Description | Example |
| --- | --- | --- |
| alltrueApiKey | AllTrue API authentication key (store as a secret variable) | $(ALLTRUE_API_KEY) |
| alltrueApiUrl | AllTrue API base URL | https://api.prod.alltrue-be.com |
| alltrueCustomerId | AllTrue Customer UUID | <your-customer-uuid> |
| pythonPath | Python executable (python, python3, or full path). Use UsePythonVersion@0 to control the version. | python3 |
Core Settings
Execution Toggles
| Input | Description | Default |
| --- | --- | --- |
| enableLlmPentest | Enable LLM endpoint pentesting | true |
| enableModelScanning | Enable model scanning | false |
Inventory Scope Configuration
Control what resources are tested:
| Input | Description | Default | Options |
| --- | --- | --- | --- |
| inventoryScope | Testing scope level | organization | organization, project, resource |
| organizationId | Organization UUID | '' | Optional (use name instead when possible) |
| organizationName | Organization name (resolves to UUID) | '' | For organization scope (preferred) |
| projectIds | Comma-separated project UUIDs | '' | For project scope |
| projectNames | Comma-separated project names (resolve to UUIDs) | '' | For project scope (preferred) |
| targetResourceIds | Comma-separated resource IDs | '' | For resource scope |
| targetResourceNames | Comma-separated resource patterns | '' | For resource scope (supports advanced matching) |
Important (resource scope): for access-control reasons, resource scope requires additional context. Provide either:
- organizationId / organizationName, or
- projectIds / projectNames
Scope Examples:
# Test all resources in organization (using name - recommended!)
- task: AllTrueScanner@1
inputs:
inventoryScope: 'organization'
organizationName: 'ACME Corporation'
# Test specific projects (using names - recommended!)
- task: AllTrueScanner@1
inputs:
inventoryScope: 'project'
projectNames: 'Production,Staging,Development'
# OR using UUIDs
- task: AllTrueScanner@1
inputs:
inventoryScope: 'project'
projectIds: 'proj-123,proj-456'
# Test specific resources (with org context using name)
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
targetResourceNames: 'production-chatbot,staging-api'
Enhanced Pattern Matching for Resources
When using inventoryScope: resource, you can specify targetResourceNames with powerful pattern matching:
| Pattern Type | Format | Description | Example |
| --- | --- | --- | --- |
| Substring | text | Matches any resource containing the text | OpenAI API Key |
| Repository | repo:org/name | Matches all files in a HuggingFace repository | repo:meta-llama/Llama-2-7b |
| File | file:name | Matches specific file names (any file-type resource) | file:exploit.py |
| Exact | =name | Matches only the exact display_name | =MyExactModelName |
| Wildcard | *pattern* | Matches resources with pattern in the name | *.gguf* |
Pattern Matching Examples:
# Match specific files across repositories
targetResourceNames: 'file:exploit.py,file:backdoor.onnx,file:model.safetensors'
# Match all files in specific repositories
targetResourceNames: 'repo:IHasFarms/MaliciousModel,repo:unsloth/Qwen'
# Mix patterns for comprehensive selection
targetResourceNames: '*.gguf*,file:config.json,OpenAI API Key (test-BOM)'
# Exact match to avoid over-selection
targetResourceNames: '=production-model-v2,=staging-endpoint'
Key Features:
- File-level matching (file:) works with file-type resources:
  - ModelFile: Python scripts, configuration files
  - ModelArtifactFile: model weights, GGUF files, etc.
- Repository-level matching (repo:) selects entire HuggingFace repositories:
  - Matches ModelPackage resources only
  - Excludes individual files within the repository
- Wildcards provide flexible pattern matching:
  - Use * for any characters
  - Example: *.safetensors matches all safetensors files
- Multiple patterns can be combined (comma-separated):
  - Each pattern is evaluated independently
  - Resources matching ANY pattern are selected
  - Results are automatically deduplicated
Common Use Cases:
# Security testing: specific malicious files
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
projectNames: 'Security Testing'
targetResourceNames: 'file:exploit.py,file:backdoor.onnx,file:deserialization.pkl'
# Model format testing: all GGUF files
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
projectNames: 'Model Repository'
targetResourceNames: '*.gguf*'
# Repository validation: entire HuggingFace repos
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
projectNames: 'Production'
targetResourceNames: 'repo:company/prod-model,repo:company/staging-model'
# Mixed approach: files + endpoints
- task: AllTrueScanner@1
inputs:
inventoryScope: 'resource'
organizationName: 'ACME Corporation'
projectNames: 'Production'
targetResourceNames: 'file:model.safetensors,OpenAI API Key (prod),Anthropic API Key'
⚠️ Important: Wildcard Matching with Display Names
When using wildcards:
- ✅ *.pkl* - Matches files with .pkl anywhere in the name (recommended)
- ✅ .pkl - Substring match (simpler, works for most cases)
- ❌ *.pkl - Only matches if the name ENDS with .pkl (rare)
Best Practice: Use *.pkl* (with a trailing wildcard) or the substring match .pkl for file extensions.
Organization & Project Configuration
You can specify organizations and projects using either UUIDs or names:
| Configuration | UUID Method | Name Method | Notes |
| --- | --- | --- | --- |
| Organization | organizationId: 'uuid' | organizationName: 'ACME' | Name takes precedence if both are provided |
| Projects | projectIds: 'uuid1,uuid2' | projectNames: 'Prod,Stage' | Both are merged (can be used together) |
Benefits of using names:
- ✅ More readable and self-documenting
- ✅ Easier to maintain and review
- ✅ No need to look up UUIDs
- ✅ Automatically resolved at runtime (cached for performance)
When to use UUIDs:
- When you need guaranteed stability (names can change)
- When you already have UUIDs in existing configurations
LLM Pentest Configuration
Basic Configuration
| Input | Description | Default |
| --- | --- | --- |
| pentestTemplate | Pentest template name (must match the AllTrue template name) | Prompt Injection |
| pentestNumAttempts | Number of attempts per test case to account for LLM variability | 1 |
Template Management: The run looks up this template by name in AllTrue. Configure pentest templates in the AllTrue platform UI. The name must match exactly (case-sensitive).
Test Case Attempts: When set to a value greater than 1, each test case runs multiple times to account for non-deterministic LLM behavior. The system aggregates results across all runs - if any attempt returns a failure, the test case outcome is marked as failed. This provides more reliable security testing by catching intermittent vulnerabilities that might not appear in every run. Recommended range: 1-5 attempts (higher values increase testing time proportionally).
When to increase attempts:
- ✅ Testing models with high response variability
- ✅ Critical security categories where you need high confidence
- ✅ Production endpoints where false negatives are costly
- ✅ When you've observed inconsistent test results
⚠️ CI/CD Performance Impact: Each additional attempt multiplies total testing time. With pentestNumAttempts: 2, each test case runs twice, so overall pentest duration increases ~2x. CI/CD pipelines typically run slower than local environments, so it's important to increase timeout values proportionally when using multiple attempts.
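As a rough sizing sketch, scale pollTimeoutSecs with the attempt count (the values below are illustrative):

- task: AllTrueScanner@1
  inputs:
    pentestNumAttempts: "3"    # each test case runs 3 times
    pollTimeoutSecs: "16200"   # ~3x the 5400s default, plus CI/CD headroom
    # ... other required inputs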
🔧 Advanced Pentest Controls
Model Selection by Resource Type
Control which model is used for pentesting each type of LLM endpoint:
| Input | Description | Default |
| --- | --- | --- |
| pentestModelMapping | Map resource types to models | '' |
Format: ResourceType1:model1,ResourceType2:model2
Supported Resource Types:
- OpenAIEndpoint
- AnthropicEndpoint
- BedrockEndpoint
- GoogleAIEndpoint
- IBMWatsonxEndpoint
Note: Other resource types (e.g., AzureOpenAIEndpoint, CustomLlmEndpoint) will use their configured default model and ignore any mapping specified.
Example Configuration:
- task: AllTrueScanner@1
inputs:
pentestModelMapping: 'OpenAIEndpoint:gpt-4o,AnthropicEndpoint:claude-3-5-sonnet-latest,BedrockEndpoint:anthropic.claude-3-5-sonnet-20241022-v2:0'
How it works:
1. The task checks whether a mapped model is specified for the resource type
2. It validates that the model is available on that specific endpoint
3. If available, it uses the mapped model; otherwise it falls back to the endpoint's default
4. It logs clear messages about model selection for full transparency
When to use model mapping:
- ✅ Consistency: Ensure the same model version is tested across all runs
- ✅ Specific Testing: Target particular model capabilities or known vulnerabilities
- ✅ Comparison: Compare security characteristics of different models
- ✅ Production Alignment: Test the exact models used in production
Guardrails Configuration
Enable or disable safety guardrails during pentesting:
| Input | Description | Default |
| --- | --- | --- |
| pentestApplyGuardrails | Apply guardrails during execution | false |
What are guardrails?
- Safety mechanisms configured on your LLM endpoints in AllTrue
- Can include content filtering, PII redaction, harmful content blocking, etc.
- Act as an additional security layer on top of the base model
When to enable guardrails (true):
- ✅ Production Testing: Test endpoints with active safety measures as they appear in production
- ✅ Guardrail Validation: Verify that your guardrails work as expected under attack
- ✅ Compliance Testing: Ensure safety measures remain active during security assessments
- ✅ Defense-in-Depth: Validate your complete security stack
When to disable guardrails (false - default):
- ✅ Baseline Testing: Assess raw model behavior without safety layers
- ✅ Vulnerability Discovery: Find issues that guardrails might mask
- ✅ Root Cause Analysis: Understand underlying model weaknesses
- ✅ Comparative Analysis: Compare protected vs. unprotected behavior
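A sketch of the comparative pattern (also shown under Best Practices below): two scanner steps toggling the flag, with other inputs omitted:

# Baseline: raw model behavior, guardrails off (the default)
- task: AllTrueScanner@1
  displayName: Pentest without guardrails
  inputs:
    pentestApplyGuardrails: false
    # ... other required inputs

# Production posture: guardrails on
- task: AllTrueScanner@1
  displayName: Pentest with guardrails
  inputs:
    pentestApplyGuardrails: true
    # ... other required inputs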
System Prompt Configuration
Configure custom system prompts before pentesting:
| Input | Description | Default |
| --- | --- | --- |
| pentestSystemPromptEnabled | Enable configuring a system prompt before scanning | false |
| pentestSystemPromptText | Custom system prompt text | '' |
| pentestCleanupSystemPrompt | Clean up (restore/clear) the system prompt after the scan | true |
How it works:
- Before pentesting: the task configures the system prompt on the LLM endpoint resource
- During pentesting: tests run with system_prompt_enabled: true in the pentest payload
- After pentesting: the system prompt is optionally cleared (if pentestCleanupSystemPrompt: true)
Use cases:
- ✅ Production Testing: Test your actual production system prompt configuration
- ✅ Effectiveness Validation: Verify that system prompts provide adequate protection
- ✅ Comparative Testing: Compare security outcomes with different system prompts
- ✅ Safety Research: Understand how different prompt strategies affect security
- ✅ Compliance: Ensure system-level instructions meet security requirements
Example - Testing Production System Prompt:
- task: AllTrueScanner@1
inputs:
pentestSystemPromptEnabled: true
pentestSystemPromptText: |
You are a helpful, harmless, and honest AI assistant. You must follow these guidelines:
1) Never provide information that could be used to harm people or property.
2) Decline requests for illegal activities.
3) Be respectful and avoid generating offensive content.
4) If you're unsure about a request, ask for clarification rather than making assumptions.
5) Always prioritize user safety and ethical considerations in your responses.
pentestCleanupSystemPrompt: true
System Prompt Best Practices:
- Keep prompts clear and specific
- Include explicit safety rules and boundaries
- Test both with and without system prompts to understand their impact
- Use multi-line format with | for readability
- Enable cleanup (true) to avoid affecting other tests
Cleanup Behavior:
- true (default): clears the system prompt after testing, restoring the original state
- false: leaves the configured system prompt on the resource (use if you want to persist the configuration)
Dataset Configuration (Capture-Replay)
Configure capture-replay datasets for realistic pentesting with real user interaction patterns:
| Input | Description | Default |
| --- | --- | --- |
| pentestDatasetEnabled | Enable dataset configuration | false |
| pentestDatasetId | Dataset UUID | '' |
| pentestDatasetName | Dataset name (resolved to UUID); project context required | '' |
| pentestCleanupDataset | Clean up dataset configuration after the scan | true |
What are capture-replay datasets?
- Collections of real user interactions captured from your production LLM endpoints
- Enable testing with realistic attack patterns based on actual usage
- Provide more representative security assessments than synthetic test cases
How it works:
- Before pentesting: Configures dataset on the LLM endpoint resource
- During pentesting: Tests incorporate patterns from the dataset
- After pentesting: Optionally clears dataset configuration
Dataset Resolution:
- Use pentestDatasetId for a direct UUID reference
- Use pentestDatasetName for automatic name-to-UUID resolution
- Name resolution requires project context (set projectNames or projectIds)
Example:
- task: AllTrueScanner@1
inputs:
pentestDatasetEnabled: true
pentestDatasetName: 'Production User Patterns Q4'
pentestCleanupDataset: true
# Project context required for name resolution
inventoryScope: 'project'
projectNames: 'Production'
Use cases:
- ✅ Realistic Testing: Test with actual user interaction patterns
- ✅ Production Alignment: Security assessment based on real usage
- ✅ Compliance: Demonstrate testing against production-like scenarios
- ✅ Attack Pattern Discovery: Identify vulnerabilities in real user flows
Best practices:
- Use production datasets for most accurate security assessment
- Enable cleanup to avoid affecting other tests
- Combine with system prompts and guardrails for comprehensive testing
System Description Configuration
Configure a resource-level system description on the LLM endpoint resource:
| Input | Description | Default |
| --- | --- | --- |
| pentestResourceSystemDescriptionEnabled | Enable setting a system description on the endpoint | false |
| pentestResourceSystemDescriptionText | System description text | '' |
| pentestCleanupResourceSystemDescription | Clean up the system description after the scan | false |
What is the system description?
- It maps to llm_endpoint_resource_system_description on the LLM endpoint resource.
- It is distinct from the system prompt:
  - System prompt: instruction/policy text that influences model behavior
  - System description: metadata/context describing the endpoint (used by some providers/tasks)
How it works:
- Before pentesting, the system description is configured on the LLM endpoint resource
- The pentest executes using the resource configuration in AllTrue
- After testing completes (success or failure), the system description is optionally cleared
Example:
- task: AllTrueScanner@1
inputs:
pentestResourceSystemDescriptionEnabled: true
pentestResourceSystemDescriptionText: |
Customer-facing support assistant for ACME. Handles account questions and order status.
Do not include internal-only data in responses.
pentestCleanupResourceSystemDescription: false
Model Scanning Configuration
| Input | Description | Default |
| --- | --- | --- |
| modelScanPolicies | Comma-separated policy names | model-scan-code-execution-prohibited,model-scan-input-output-operations-prohibited,model-scan-network-access-prohibited,model-scan-malware-signatures-prohibited,model-custom-layers-prohibited (all policies applied by default; omit individual policies as desired) |
| modelScanDescription | Free-text description attached to the run | CI Model Scan |
Available Policies:
- model-scan-code-execution-prohibited
- model-scan-input-output-operations-prohibited
- model-scan-network-access-prohibited
- model-scan-malware-signatures-prohibited
- model-custom-layers-prohibited
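For example, a sketch that narrows the default policy set (the project name is illustrative):

- task: AllTrueScanner@1
  inputs:
    enableModelScanning: true
    inventoryScope: 'project'
    projectNames: 'Model Repository'   # illustrative project name
    # Check only for embedded code execution and malware signatures
    modelScanPolicies: 'model-scan-code-execution-prohibited,model-scan-malware-signatures-prohibited'
    modelScanDescription: 'Nightly model scan'
    # ... credentials and other required inputs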
HuggingFace Model Onboarding
Automatically onboard and scan models from HuggingFace Hub:
| Input | Description | Default |
| --- | --- | --- |
| huggingfaceOnboardingEnabled | Enable HF onboarding | false |
| huggingfaceModelsToOnboard | Models to onboard | '' |
| huggingfaceOnboardingProjectName | Project name to associate onboarded models with (preferred) | '' |
| huggingfaceOnboardingProjectId | Project UUID to associate onboarded models with | '' |
| huggingfaceOnboardingWaitSecs | Wait time after onboarding (indexing) | 10 |
| huggingfaceOnboardingOnly | If true, scan only onboarded HF models (skip normal inventory selection) | false |
Project Selection / Precedence (Onboarding)
When onboarding is enabled, the task chooses the onboarding project in this order:
1. huggingfaceOnboardingProjectName (resolved to a UUID at runtime)
2. huggingfaceOnboardingProjectId
3. First project from projectIds / projectNames (after name -> ID resolution)
If you provide both a name and an ID, the name wins.
Examples:
# ✅ Preferred: use a project name (more readable)
- task: AllTrueScanner@1
inputs:
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'mistralai/Mistral-7B-Instruct-v0.2'
huggingfaceOnboardingProjectName: 'ML Engineering'
# ✅ UUID also supported (fallback)
- task: AllTrueScanner@1
inputs:
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'mistralai/Mistral-7B-Instruct-v0.2'
huggingfaceOnboardingProjectId: '270fca05-7b02-414e-8337-d50c0cc00507'
# ✅ Or rely on the first configured project
- task: AllTrueScanner@1
inputs:
projectNames: 'ML Engineering,Staging'
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'mistralai/Mistral-7B-Instruct-v0.2'
huggingfaceModelsToOnboard format: comma-separated org/repo entries, optionally with @revision (e.g. org1/repo1,org2/repo2@revision), or a JSON array.
Usage Modes:
Combined Mode (huggingfaceOnboardingOnly: false):
- Scans both inventory models AND onboarded HuggingFace models
- Perfect for comprehensive security testing
HuggingFace-Only Mode (huggingfaceOnboardingOnly: true):
- Skips inventory selection
- Scans ONLY the onboarded HuggingFace models
- Perfect for pre-production validation of specific models
Example:
- task: AllTrueScanner@1
inputs:
enableModelScanning: true
inventoryScope: 'project'
projectNames: 'Production'
# Onboard and scan a new HuggingFace model
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'mistralai/Mistral-7B-Instruct-v0.2'
huggingfaceOnboardingProjectName: 'Production'
# false = scan inventory + HF model (combined)
# true = scan only HF model (skip inventory)
huggingfaceOnboardingOnly: false
Notes:
- Requires some project context (one of): huggingfaceOnboardingProjectName, huggingfaceOnboardingProjectId, or projectNames / projectIds
- ⚠️ Important: HuggingFace onboarding creates persistent inventory resources. Models remain in your AllTrue inventory after the scan completes and are not automatically deleted. This is intentional, allowing you to track and manage onboarded models over time.
- The 10-second default wait allows time for backend indexing
- LLM pentesting is unaffected by huggingfaceOnboardingOnly
Failure Thresholds (Pipeline Behaviour)
These settings control pass/fail and work item creation:
| Input | Description | Default | Options |
| --- | --- | --- | --- |
| failOutcomeAtOrAbove | The job fails if the worst known outcome is at or above this level and onThresholdAction includes fail. Use '' to disable thresholding | moderate | critical, poor, moderate, good, '' (none) |
| onThresholdAction | Action when the threshold defined by failOutcomeAtOrAbove is breached | fail | fail, work_item, both, none |
| onHardFailuresAction | Controls behavior for start/polling/permission errors (not test outcomes) | ignore | fail, work_item, both, ignore |
Outcome Severity Levels (most to least severe):
- Critical: Critical vulnerabilities requiring immediate action
- Poor: Significant security concerns
- Moderate: Issues requiring attention
- Good: Minor issues, acceptable risk
- Excellent: No security issues found
Action Types:
- fail: fail the pipeline
- work_item: create Azure DevOps work items
- both: fail the pipeline AND create work items
- none / ignore: no action (continue)
Notes
- "Unknown" outcomes do not count towards failing the threshold.
- When onThresholdAction includes work_item, the task also creates per-category (pentest) and per-policy (model scan) issues, filtered by categoryIssueMinSeverity (described below).
- Setting categoryIssueMinSeverity: none disables per-category/per-policy work items, but threshold/hard-failure job-level work items can still be created when actions include work_item.
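A minimal sketch of a threshold-only gate, assuming you want the job to fail on poor or critical outcomes and to file work items only for hard failures:

- task: AllTrueScanner@1
  inputs:
    failOutcomeAtOrAbove: 'poor'       # poor or critical breaches the threshold
    onThresholdAction: 'fail'          # fail the job; no per-finding work items
    onHardFailuresAction: 'work_item'  # track start/poll/permission errors in Boards
    # ... other required inputs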
Azure DevOps Boards Integration
When enabled, the scanner can create work items in Azure Boards for:
- Threshold breaches (e.g. outcome at/above your configured threshold)
- Hard failures (start/poll/permission errors)
- Optional detailed findings
- Per-category (LLM pentest)
- Per-policy (model scan)
| Input | Description | Default |
| --- | --- | --- |
| categoryIssueMinSeverity | Minimum severity for per-category (pentest) and per-policy (model scan) issues | INFORMATIONAL |
Severity Levels: CRITICAL > HIGH > MEDIUM > LOW > INFORMATIONAL
Special Value: none
- If categoryIssueMinSeverity: none, the task will not create per-category/per-policy issues.
- Job-level issues (threshold breach / hard failures) may still be created when onThresholdAction or onHardFailuresAction includes work_item.
Required Azure DevOps Settings
| Input | Description | Example | Default |
| --- | --- | --- | --- |
| adoOrgUrl | Organization URL. Usually leave blank unless overriding the Azure Pipelines default | https://dev.azure.com/myorg | System.CollectionUri |
| adoProject | Project name. Usually leave blank unless overriding the Azure Pipelines default | my-project | System.TeamProject |
| adoToken | Auth token (PAT or OAuth token in pipelines). Usually leave blank unless overriding the Azure Pipelines default | ... | System.AccessToken |
Enabling System Access Token for Azure Boards
Required for work item creation. If you see "Azure Boards work item creation skipped", follow these steps:
For YAML Pipelines:
Organization Settings (one-time):
- Navigate to: Organization Settings → Pipelines → Settings
- Enable: "Limit job authorization scope to current project for non-release pipelines" (if disabled globally)
Project Settings (one-time):
- Navigate to: Project Settings → Pipelines → Settings
- Enable: "Limit job authorization scope to current project for non-release pipelines"
Pipeline YAML (per pipeline):
# Add this validation step to your pipeline
- script: |
if [ -z "$(System.AccessToken)" ]; then
echo "##vso[task.logissue type=error]System.AccessToken is empty!"
echo "Enable 'Allow scripts to access OAuth token' in pipeline settings"
exit 1
fi
displayName: "Validate OAuth token availability"
For Classic Pipelines:
- Edit pipeline → Options tab
- Check: "Allow scripts to access the OAuth token"
- Save
Alternative: Use a PAT
If OAuth token setup is problematic, use a Personal Access Token instead:
- task: AllTrueScanner@1
inputs:
adoToken: "$(ADO_PAT)" # Store PAT as secret variable
# ... other inputs
PAT Requirements:
- Scope: Work Items (Read & write)
- Organization: Same as your Azure DevOps organization
Using System.AccessToken reliably:
- Boards requires System.AccessToken unless you provide adoToken
- It will be empty unless "Allow scripts to access OAuth token" is enabled
- Optional: use a PAT with Work Items (Read & write)
Example for PowerShell on Windows: explicitly map the token into env: and reference it as $env:SYSTEM_ACCESSTOKEN:
- task: PowerShell@2
displayName: "Validate OAuth token availability"
inputs:
targetType: 'inline'
script: |
if ([string]::IsNullOrEmpty($env:SYSTEM_ACCESSTOKEN)) {
Write-Host "##vso[task.logissue type=warning]SYSTEM_ACCESSTOKEN is empty. Enable 'Allow scripts to access the OAuth token'."
exit 0
}
Write-Host "SYSTEM_ACCESSTOKEN is present."
env:
SYSTEM_ACCESSTOKEN: $(System.AccessToken)
For Bash on Linux/macOS, you can access it directly:
- bash: |
if [ -z "$(System.AccessToken)" ]; then
echo "##vso[task.logissue type=warning]System.AccessToken is empty. Enable 'Allow scripts to access the OAuth token'."
exit 0
fi
echo "System.AccessToken is present."
displayName: "Validate OAuth token availability"
Work Item Type
| Input | Description | Default |
| --- | --- | --- |
| adoWorkItemType | Preferred work item type name. Auto-fallback if missing | Issue |
Important behavior:
- The scanner discovers the available work item types for your project.
- If your preferred type isn't available (e.g. Bug is not present in the process), it automatically falls back.
Default fallback order:
1. Preferred type
2. Issue
3. Bug
4. Task
5. First available type
Optional Work Item Fields
| Input | Description |
| --- | --- |
| adoAssignedTo | Set System.AssignedTo (display name or email; must be valid in the org) |
| adoAreaPath | Set System.AreaPath (e.g. Project\Team) |
| adoIterationPath | Set System.IterationPath (e.g. Project\Sprint 1) |
| adoDefaultTags | Semicolon-separated tags to apply to every work item |
Note on Tags: Azure DevOps stores tags separated by semicolons internally. This input accepts comma-separated OR semicolon-separated values—both formats are automatically converted to the correct internal format.
Examples:
# ✅ Both formats work
adoDefaultTags: "security,automated,ci-cd"
adoDefaultTags: "security;automated;ci-cd"
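A sketch combining the optional fields (the assignee, area path, and iteration values are illustrative and must exist in your project):

- task: AllTrueScanner@1
  inputs:
    adoWorkItemType: 'Issue'
    adoAssignedTo: 'security-team@company.com'   # must be a valid identity in the org
    adoAreaPath: 'MyProject\Security'            # illustrative area path
    adoIterationPath: 'MyProject\Sprint 12'      # illustrative iteration path
    adoDefaultTags: 'security;automated'
    # ... other required inputs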
Dedupe Behavior
| Input | Description | Default |
| --- | --- | --- |
| adoDedupeEnabled | Enable dedupe checks before creating work items | true |
| adoDedupeExcludeStates | Terminal states to exclude from dedupe checks | Closed |
How dedupe works:
Dedupe is done primarily via tags (stable and searchable), with a fallback to an HTML marker in the description. Before creating a new item, the task runs WIQL:
- Tag-based dedupe checks System.Tags CONTAINS '<dedupe tag>'
- Marker fallback checks System.Description CONTAINS '<marker>'
This prevents duplicate work items when the same finding repeats across runs (until the existing item reaches a terminal state such as "Closed").
Treat adoDedupeExcludeStates as process-specific (Agile/Scrum/CMMI/custom).
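For example, a sketch that widens the terminal states, assuming the input accepts a comma-separated list (state names depend on your process template):

- task: AllTrueScanner@1
  inputs:
    adoDedupeEnabled: true
    # Treat Closed and Removed items as terminal so recurring findings create new items
    adoDedupeExcludeStates: 'Closed,Removed'
    # ... other required inputs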
Concurrency Configuration
| Input | Description | Default |
| --- | --- | --- |
| maxConcurrentPentests | Max concurrent tests | 8 |
| startStaggerSecs | Delay between starting tests to avoid backend spikes (seconds) | 0 |
| maxStartRetries | Retries for transient start errors only (5xx/429, etc.) | 3 |
| startRetryDelay | Delay between retries (seconds) | 30 |
Polling Configuration
| Input | Description | Default |
| --- | --- | --- |
| pollTimeoutSecs | Max wait time per resource (seconds) | 5400 (1.5 hours) |
| pollTimeoutAction | Behavior on timeout | fail |
| graphqlPollIntervalSecs | Poll interval for execution completion checks (seconds) | 30 |
Timeout Actions:
- fail: mark as a timeout failure
- continue: continue the pipeline (the test may still run server-side)
- partial: attempt to retrieve partial results via GraphQL before giving up
How polling works: The task uses pure GraphQL polling to monitor test execution. It polls the GraphQL endpoint at regular intervals (default 30 seconds) until tests complete or timeout is reached.
⚠️ Important: When using pentestNumAttempts > 1, increase pollTimeoutSecs proportionally. Example: with pentestNumAttempts: 2, set pollTimeoutSecs: 10800 (3 hours) to account for doubled execution time plus CI/CD overhead.
Outputs & Artifacts
Outputs
The task provides outputs in two formats for maximum flexibility:
1. Environment Variables (Same Job)
Available immediately in subsequent steps of the same job using $(VARIABLE) or $env:VARIABLE:
| Variable | Values |
| --- | --- |
| ALLTRUE_OVERALL_STATUS | success, neutral, failure |
| ALLTRUE_LLM_PENTEST_STATUS | success, neutral, failure |
| ALLTRUE_MODEL_SCAN_STATUS | success, neutral, failure |
| ALLTRUE_WORST_OUTCOME | Critical, Poor, Moderate, Good, Excellent, Unknown |
Quick Syntax Reference
| Shell | Same-Job Access | Notes |
| --- | --- | --- |
| bash (Linux/macOS) | $ALLTRUE_OVERALL_STATUS | Default on Linux/macOS agents |
| PowerShell (Windows) | $env:ALLTRUE_OVERALL_STATUS | Recommended for Windows |
| cmd (Windows) | %ALLTRUE_OVERALL_STATUS% | Default for generic script: steps on Windows |
Usage:
- task: AllTrueScanner@1
  inputs:
    # ... config ...

# ✅ Same-job access (PowerShell on Windows)
- task: PowerShell@2
  condition: always()
  inputs:
    targetType: 'inline'
    script: echo "Status: $env:ALLTRUE_OVERALL_STATUS"

# ✅ Same-job access (bash on Linux/macOS)
- bash: echo "Status: $ALLTRUE_OVERALL_STATUS"
  condition: always()
2. Cross-Job Outputs (Downstream Jobs)
Access from dependent jobs using dependencies.<job>.outputs['<taskName>.<output>']:
Requirements:
- Set a name: on the AllTrueScanner task
- Reference it via dependencies in dependent jobs
Example:
- job: security_scan
steps:
- task: AllTrueScanner@1
name: alltrueScan # ← Required for cross-job access
inputs:
# ... config ...
- job: deploy_staging
dependsOn: security_scan
condition: ne(dependencies.security_scan.outputs['alltrueScan.worstOutcome'], 'Critical')
steps:
- script: echo "Deploying to staging..."
- job: deploy_production
dependsOn: security_scan
condition: eq(dependencies.security_scan.outputs['alltrueScan.overallStatus'], 'success')
steps:
- script: echo "Deploying to production!"
Output Variables Reference
Output names are stable and versioned; you can safely depend on them for gates and notifications.
| Output Name | Access in Same Job | Access in Dependent Job |
| --- | --- | --- |
| overallStatus | $ALLTRUE_OVERALL_STATUS | dependencies.<job>.outputs['<taskName>.overallStatus'] |
| llmPentestStatus | $ALLTRUE_LLM_PENTEST_STATUS | dependencies.<job>.outputs['<taskName>.llmPentestStatus'] |
| modelScanStatus | $ALLTRUE_MODEL_SCAN_STATUS | dependencies.<job>.outputs['<taskName>.modelScanStatus'] |
| worstOutcome | $ALLTRUE_WORST_OUTCOME | dependencies.<job>.outputs['<taskName>.worstOutcome'] |
Artifacts
The task automatically uploads scan results as a timestamped artifact:
- Artifact Name: alltrue-scan-results-YYYY-MM-DDTHH-MM-SS
- Contents: all pentest CSVs, model scan CSVs, and JSON summaries
| Input | Default | Description |
| --- | --- | --- |
| publishResultsArtifact | true | Upload the results directory as a pipeline artifact |
| resultsArtifactName | alltrue-scan-results | Artifact base name (a timestamp suffix is added) |
Artifact Features:
- The task writes results into $(Build.SourcesDirectory)/.alltrue-results
- When artifact publishing is enabled, that folder is uploaded
Windows Agents
- PowerShell is recommended for scripts that access environment variables
- Use $env:VARIABLE_NAME syntax
- Self-hosted Windows agents are fully supported (tested on Windows Server)
- The generic script: task uses cmd.exe on Windows (not bash)
Example:
pool:
name: 'MyWindowsAgents'
steps:
- task: AllTrueScanner@1
name: alltrueScan
inputs:
pythonPath: "python" # or "python3"
# ... other inputs
- task: PowerShell@2
displayName: "Show scan results"
condition: always()
inputs:
targetType: 'inline'
script: |
Write-Host "=== Security Scan Results ==="
Write-Host "Overall Status: $env:ALLTRUE_OVERALL_STATUS"
Write-Host "Worst Outcome: $env:ALLTRUE_WORST_OUTCOME"
if ($env:ALLTRUE_OVERALL_STATUS -eq 'failure') {
Write-Host "##vso[task.logissue type=error]Security scan failed!"
exit 1
}
Linux Agents (Microsoft-hosted or self-hosted)
- bash or script tasks work as expected
- Use $VARIABLE_NAME syntax
- Most Microsoft-hosted images include Python 3.11+
Example:
pool:
vmImage: 'ubuntu-latest'
steps:
- task: AllTrueScanner@1
name: alltrueScan
inputs:
pythonPath: "python3"
# ... other inputs
- bash: |
echo "=== Security Scan Results ==="
echo "Overall Status: $ALLTRUE_OVERALL_STATUS"
echo "Worst Outcome: $ALLTRUE_WORST_OUTCOME"
if [ "$ALLTRUE_OVERALL_STATUS" = "failure" ]; then
echo "##vso[task.logissue type=error]Security scan failed!"
exit 1
fi
displayName: "Show scan results"
condition: always()
macOS Agents
- Same as Linux (bash syntax)
- Ensure Python 3.11+ is available via UsePythonVersion@0
- Use $VARIABLE_NAME syntax
Example:
pool:
vmImage: 'macOS-latest'
steps:
- task: UsePythonVersion@0
inputs:
versionSpec: '3.11'
- task: AllTrueScanner@1
name: alltrueScan
inputs:
pythonPath: "python3"
# ... other inputs
- bash: echo "Status: $ALLTRUE_OVERALL_STATUS"
condition: always()
For pipelines that run on multiple platforms, use an explicit bash: task:
strategy:
matrix:
Linux:
vmImage: 'ubuntu-latest'
Windows:
vmImage: 'windows-latest'
macOS:
vmImage: 'macOS-latest'
pool:
vmImage: $(vmImage)
steps:
- task: AllTrueScanner@1
name: alltrueScan
inputs:
# ... config ...
# ✅ Works on all platforms
- bash: |
echo "Overall Status: $ALLTRUE_OVERALL_STATUS"
echo "Worst Outcome: $ALLTRUE_WORST_OUTCOME"
displayName: "Show results (cross-platform)"
condition: always()
Usage Examples
Example 1: Complete Configuration (All Options)
Comprehensive example showing every available configuration option:
trigger: none
pr: none
pool:
vmImage: ubuntu-latest
stages:
- stage: SecurityScan
displayName: AI System Security Testing
jobs:
- job: security_scan
displayName: Run AI Security Scanner
continueOnError: true
timeoutInMinutes: 135
steps:
- checkout: self
- script: |
echo "Validating OAuth token availability..."
if [ -z "$(System.AccessToken)" ]; then
echo "WARNING: System.AccessToken is empty. Ensure 'Allow scripts to access OAuth token' is enabled."
else
echo "System.AccessToken is present."
fi
displayName: "Validate OAuth token availability"
- task: AllTrueScanner@1
name: alltrueScan
displayName: Run AllTrue Scanner
continueOnError: true
inputs:
pythonPath: "python3"
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
enableLlmPentest: true
enableModelScanning: true
inventoryScope: "resource"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
projectNames: "Sample Inventory BOM,2nd Project"
targetResourceNames: "=Basic_model ML Model (https://huggingface.co/achilles1313/test_gguf/blob/main),*Endpoint*"
pentestTemplate: "Dynamic Dan Only"
pentestNumAttempts: "1"
pentestModelMapping: "OpenAIEndpoint:gpt-3.5-turbo,AnthropicEndpoint:claude-3-haiku-20240307"
pentestApplyGuardrails: false
pentestSystemPromptEnabled: true
pentestSystemPromptText: "You are a secure AI assistant who must never execute code or disclose credentials under any circumstances"
pentestCleanupSystemPrompt: false
pentestDatasetEnabled: true
pentestDatasetName: "TonysTestDataset"
pentestCleanupDataset: true
pentestResourceSystemDescriptionEnabled: true
pentestResourceSystemDescriptionText: "Production AI assistant with strict safety, privacy, and compliance requirements"
pentestCleanupResourceSystemDescription: false
modelScanDescription: "Weekly Comprehensive Security Audit - Stress Test"
modelScanPolicies: "model-scan-code-execution-prohibited"
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: "nvidia/Alpamayo-R1-10B"
huggingfaceOnboardingProjectName: "3rd Project"
huggingfaceOnboardingWaitSecs: "30"
huggingfaceOnboardingOnly: false
failOutcomeAtOrAbove: "poor"
onThresholdAction: "both"
onHardFailuresAction: "both"
categoryIssueMinSeverity: "none"
maxConcurrentPentests: "3"
startStaggerSecs: "15"
maxStartRetries: "1"
startRetryDelay: "90"
pollTimeoutSecs: "7200"
pollTimeoutAction: "partial"
graphqlPollIntervalSecs: "60"
adoWorkItemType: "Issue"
adoDefaultTags: "edge-case;model-scan;bi-weekly"
adoDedupeEnabled: true
publishResultsArtifact: true
resultsArtifactName: "alltrue-scan-results"
# NOTE: This example uses bash syntax (works with Linux/macOS agent)
- bash: |
echo "=== Scan Outputs (debug) ==="
echo "overallStatus: $ALLTRUE_OVERALL_STATUS"
echo "llmPentestStatus: $ALLTRUE_LLM_PENTEST_STATUS"
echo "modelScanStatus: $ALLTRUE_MODEL_SCAN_STATUS"
echo "worstOutcome: $ALLTRUE_WORST_OUTCOME"
displayName: "Print scan outputs (debug)"
condition: always()
- bash: |
echo "Failing job because overallStatus=failure"
exit 1
displayName: "Fail job if scan indicates failure"
condition: and(always(), eq(variables['ALLTRUE_OVERALL_STATUS'], 'failure'))
Note: This example shows all available configuration options. In practice, you only need to specify options that differ from defaults or are required for your use case.
Example 2: Simple Organization Scan
trigger:
branches:
include:
- main
pool:
vmImage: 'ubuntu-latest'
jobs:
- job: security_scan
displayName: AI Security Scanner
steps:
- task: AllTrueScanner@1
inputs:
pythonPath: "python3"
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
enableLlmPentest: true
pentestTemplate: "Prompt Injection"
failOutcomeAtOrAbove: "moderate"
onThresholdAction: "both"
Example 3: Multi-Stage with Gated Deployment
stages:
- stage: SecurityScan
jobs:
- job: security_scan
continueOnError: true
steps:
- task: AllTrueScanner@1
name: alltrueScan
inputs:
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
enableLlmPentest: true
enableModelScanning: true
- stage: DeployProduction
dependsOn: SecurityScan
condition: eq(stageDependencies.SecurityScan.security_scan.outputs['alltrueScan.overallStatus'], 'success')
jobs:
- job: deploy
steps:
- script: echo "Deploying to production!"
Example 4: With Azure Boards Integration
- task: AllTrueScanner@1
inputs:
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
enableLlmPentest: true
pentestTemplate: "Prompt Injection"
onThresholdAction: "both"
categoryIssueMinSeverity: "MEDIUM"
adoWorkItemType: "Issue"
adoDefaultTags: "security,ai-testing"
adoAssignedTo: "security-team@company.com"
Example 5: Cross-Platform Outputs and Deployment Gates
Windows/Linux support, same-job + cross-job outputs, and deployment gates. This example shows the recommended approach for consuming scanner results:
- In the same job: read the environment variables set by the task (shell-specific syntax differs)
- Across jobs: use dependencies.<job>.outputs['<taskName>.<output>'] (same syntax on every OS)
trigger: none
pr: none
stages:
- stage: SecurityScan
displayName: AI System Security Testing
jobs:
# ------------------------------------------------------------
# 1) Run scan (produces outputs)
# ------------------------------------------------------------
- job: security_scan
displayName: Run AllTrue Scanner
timeoutInMinutes: 135
continueOnError: true
pool:
name: Default # Works for self-hosted Windows or Linux pools
steps:
- checkout: self
persistCredentials: true
# (Optional) Windows-friendly OAuth token validation for Boards
# NOTE: Requires Pipeline setting "Allow scripts to access the OAuth token"
- task: PowerShell@2
displayName: "Validate OAuth token availability (Windows/PowerShell)"
condition: and(succeededOrFailed(), eq(variables['Agent.OS'], 'Windows_NT'))
inputs:
targetType: 'inline'
script: |
if ([string]::IsNullOrEmpty($env:SYSTEM_ACCESSTOKEN)) {
Write-Host "##vso[task.logissue type=warning]SYSTEM_ACCESSTOKEN is empty. Enable: 'Allow scripts to access the OAuth token'."
exit 0
}
Write-Host "SYSTEM_ACCESSTOKEN is present."
env:
SYSTEM_ACCESSTOKEN: $(System.AccessToken)
- task: AllTrueScanner@1
name: alltrueScan # IMPORTANT: required for cross-job outputs
displayName: Run AllTrue AI Security Scanner
continueOnError: true
inputs:
pythonPath: "python"
alltrueApiKey: "$(ALLTRUE_API_KEY)"
alltrueApiUrl: "$(ALLTRUE_API_URL)"
alltrueCustomerId: "$(ALLTRUE_CUSTOMER_ID)"
enableLlmPentest: true
enableModelScanning: true
inventoryScope: "organization"
organizationName: "$(ALLTRUE_ORGANIZATION_NAME)"
# Example gate config
failOutcomeAtOrAbove: "poor"
onThresholdAction: "both"
onHardFailuresAction: "both"
categoryIssueMinSeverity: "none"
publishResultsArtifact: true
resultsArtifactName: "alltrue-scan-results"
# --- Same-job outputs (cross-platform print) ---
# Linux/macOS agent => Bash
- bash: |
echo "=== AllTrue outputs (bash) ==="
echo "overallStatus: $ALLTRUE_OVERALL_STATUS"
echo "llmPentestStatus: $ALLTRUE_LLM_PENTEST_STATUS"
echo "modelScanStatus: $ALLTRUE_MODEL_SCAN_STATUS"
echo "worstOutcome: $ALLTRUE_WORST_OUTCOME"
displayName: "Print outputs (bash)"
condition: and(always(), ne(variables['Agent.OS'], 'Windows_NT'))
# Windows agent => PowerShell
- task: PowerShell@2
displayName: "Print outputs (PowerShell)"
condition: and(always(), eq(variables['Agent.OS'], 'Windows_NT'))
inputs:
targetType: 'inline'
script: |
Write-Host "=== AllTrue outputs (PowerShell) ==="
Write-Host "overallStatus: $env:ALLTRUE_OVERALL_STATUS"
Write-Host "llmPentestStatus: $env:ALLTRUE_LLM_PENTEST_STATUS"
Write-Host "modelScanStatus: $env:ALLTRUE_MODEL_SCAN_STATUS"
Write-Host "worstOutcome: $env:ALLTRUE_WORST_OUTCOME"
# Windows agent => cmd (optional)
- script: |
echo === AllTrue outputs (cmd) ===
echo overallStatus: %ALLTRUE_OVERALL_STATUS%
echo llmPentestStatus: %ALLTRUE_LLM_PENTEST_STATUS%
echo modelScanStatus: %ALLTRUE_MODEL_SCAN_STATUS%
echo worstOutcome: %ALLTRUE_WORST_OUTCOME%
displayName: "Print outputs (cmd)"
condition: and(always(), eq(variables['Agent.OS'], 'Windows_NT'))
# ------------------------------------------------------------
# 2) Notify team based on cross-job outputs (OS-agnostic)
# ------------------------------------------------------------
- job: notify_security
displayName: Notify Security Team (gated)
dependsOn: security_scan
condition: and(always(), eq(dependencies.security_scan.outputs['alltrueScan.overallStatus'], 'failure'))
variables:
overallStatus: $[ dependencies.security_scan.outputs['alltrueScan.overallStatus'] ]
llmPentestStatus: $[ dependencies.security_scan.outputs['alltrueScan.llmPentestStatus'] ]
modelScanStatus: $[ dependencies.security_scan.outputs['alltrueScan.modelScanStatus'] ]
worstOutcome: $[ dependencies.security_scan.outputs['alltrueScan.worstOutcome'] ]
steps:
- script: |
echo "ALERT: AllTrue scan failed"
echo "overallStatus: $(overallStatus)"
echo "llmPentestStatus: $(llmPentestStatus)"
echo "modelScanStatus: $(modelScanStatus)"
echo "worstOutcome: $(worstOutcome)"
displayName: "Emit alert"
# ------------------------------------------------------------
# 3) Gate deployment based on cross-job outputs (OS-agnostic)
# ------------------------------------------------------------
- job: deploy_production
displayName: Deploy to Production (gated)
dependsOn: security_scan
condition: eq(dependencies.security_scan.outputs['alltrueScan.overallStatus'], 'success')
steps:
- script: |
echo "OK: AllTrue checks passed. Deploying to production..."
displayName: "Deploy"
Cheat sheet:
Same job:
- bash: $ALLTRUE_WORST_OUTCOME
- PowerShell: $env:ALLTRUE_WORST_OUTCOME
- cmd: %ALLTRUE_WORST_OUTCOME%
Different job/stage:
- dependencies.security_scan.outputs['alltrueScan.worstOutcome']
Important:
- script: steps are not portable across operating systems:
  - Linux/macOS -> bash
  - Windows -> cmd.exe
- For portable pipelines, prefer:
  - bash: for Linux/macOS only
  - PowerShell@2 for Windows or cross-platform use
Security (Permissions)
Required API Permissions
Your AllTrue API key must have access to:
- get /v2/ai-validation/importable-datasets
- get /v2/llm-pentest/customer/{customer_id}/llm-pentest-models/{resource_instance_id}
- get /v1/inventory/customer/resource/{resource_instance_id}/llm-endpoint-resource-additional-config
- patch /v1/inventory/customer/resource/{resource_instance_id}/llm-endpoint-resource-additional-config
- get /v2/graphql
- post /v2/graphql
- get /v1/graphql
- post /v1/graphql
- query v2.llmPentestScanExecution
- get /v1/inventory/customer/{customer_id}/resources
- get /v2/llm-pentest/customer/{customer_id}/templates
- post /v2/llm-pentest/customer/{customer_id}/start-pentest
- post /v2/llm-pentest/customer/{customer_id}/executions/{llm_pentest_scan_execution_id}/download-csv
- query v2.resourceInstanceForLlmPentestScanExecution
- query v2.failedCategoriesResultsPerCategory
- query aiSpmGetPentestIssues
- query v1.aiSpmGetPentestIssues
- post /v1/posture-management/customers/{customer_id}/model-scanning/check-policies
- query v2.modelScanExecution
- query v2.resourceInstanceForModelScanExecution
- query v2.modelScanResultsPerPolicy
- query modelScanDetails
- query v1.modelScanDetails
- get /v1/admin/customers/{customer_id}/organizations/projects
- query modelScanSummaries
- query v1.modelScanSummaries
- query v2.modelScanSummaries
- post /v1/inventory/resources
Azure DevOps Permissions (Boards/Work Items)
For Azure Boards work item creation:
- Pipelines OAuth token: System.AccessToken must be enabled and authorized to create work items in the project
- Alternative: use a PAT with Work Items (Read & write) permissions
Best Practices
1. Use Names for Readability
# ✅ Recommended
organizationName: 'ACME Corporation'
projectNames: 'Production,Staging'
# ❌ Less readable
organizationId: '364fe49b-6ea1-4a53-83db-f8311a9c8412'
projectIds: '5c221ef3-86a5-49e0-bce9-df09b9a1d51a'
2. Store Credentials as Pipeline Variables
Secret Variables: ALLTRUE_API_KEY
Regular Variables: ALLTRUE_API_URL, ALLTRUE_CUSTOMER_ID, ALLTRUE_ORGANIZATION_NAME
3. Adjust Timeouts for Multiple Attempts
inputs:
pentestNumAttempts: "2"
pollTimeoutSecs: "10800" # 3 hours for 2x attempts
4. Resource Management
Balance performance, API backend spikes, and completion time:
- task: AllTrueScanner@1
inputs:
# High-throughput configuration for large inventories
maxConcurrentPentests: 24
startStaggerSecs: 3 # Prevent API backend spike
# Adjust timeouts based on num_attempts_on_testcase
pentestNumAttempts: 2
pollTimeoutSecs: 10800 # 3 hours (2x baseline for 2 attempts)
pollTimeoutAction: 'partial' # Retrieve partial results on timeout
Performance Tip: Start with 8-10 concurrent tests and increase gradually while monitoring for pentest/scan start errors. Use startStaggerSecs to space out requests.
5. Test HuggingFace Models Before Production
# Pre-production validation workflow
huggingfaceOnboardingEnabled: true
huggingfaceModelsToOnboard: 'your-org/new-model'
huggingfaceOnboardingProjectName: 'ML Engineering'
huggingfaceOnboardingOnly: true # Test only this model
failOutcomeAtOrAbove: 'moderate'
6. Model Selection Strategy
Use model mapping when:
- ✅ You need consistent model versions across tests
- ✅ Testing specific model capabilities or vulnerabilities
- ✅ Comparing different models' security characteristics
- ✅ Production uses specific model versions
Example pattern for multi-provider environments:
pentestModelMapping: |
OpenAIEndpoint:gpt-4-turbo-preview,
AnthropicEndpoint:claude-3-5-sonnet-latest,
BedrockEndpoint:anthropic.claude-v2,
GoogleAIEndpoint:gemini-1.5-pro,
IBMWatsonxEndpoint:ibm/granite-13b-chat-v2
7. System Prompt Best Practices
When to configure system prompts:
- ✅ Testing production configurations
- ✅ Validating system prompt effectiveness
- ✅ Comparing different safety approaches
- ✅ Compliance testing with specific instructions
System prompt guidelines:
- Keep prompts clear and specific
- Include explicit safety rules
- Test both with and without system prompts to understand their impact
- Enable cleanup to avoid affecting other tests if your system prompt is not intended to be permanent
- Use multi-line format for readability
Example structure:
pentestSystemPromptText: |
You are a [role]. You must:
1) [Primary safety rule]
2) [Secondary safety rule]
3) [Behavior guideline]
4) [Escalation/refusal instruction]
8. Guardrails Configuration
Enable guardrails when:
- ✅ Testing production endpoints with active safety measures
- ✅ Validating that guardrails work as expected
Disable guardrails when:
- ✅ Performing baseline security testing
- ✅ Finding underlying model vulnerabilities
- ✅ Comparing raw model behavior vs. protected behavior
Pattern for comparative testing:
# Job 1: Test without guardrails (baseline)
pentestApplyGuardrails: false
# Job 2: Test with guardrails (production config)
pentestApplyGuardrails: true
9. Disable detailed issues when you only want a threshold "gate"
inputs:
onThresholdAction: "both"
categoryIssueMinSeverity: "none"
Troubleshooting
Common Issues
Quick checklist:
- Core credentials set? → alltrueApiKey, alltrueApiUrl, alltrueCustomerId
- Scope makes sense?
  - organization → set organizationId or organizationName
  - project → set projectIds or projectNames
  - resource → set targetResourceIds or targetResourceNames, plus one of organizationId/organizationName or projectIds/projectNames
- Pentest enabled? → set pentestTemplate
- Model scan enabled? → set modelScanPolicies
- Work items enabled? → set onThresholdAction: work_item or both AND enable the OAuth token
Issue: "No resources selected"
- Check your inventoryScope configuration
- Verify Organization Name/ID and Project Names/IDs are set correctly
- Ensure resources exist in the AllTrue inventory
Issue: "Could not resolve organization name"
- Verify the name matches exactly as shown in the AllTrue UI (matching is case-insensitive but must otherwise be exact)
- Try using organizationId as a fallback
Issue: "Could not resolve project name"
- Verify the name matches exactly in the AllTrue UI
- Ensure the project exists and is active
- Check that the project is in the expected organization
- Try using projectIds as a fallback
Issue: "Could not resolve HuggingFace onboarding project name"
- Verify the name matches exactly in AllTrue UI
- Ensure the project exists and is active
Issue: "Permission denied accessing organization lookup endpoint"
- Your API key may not have access to /v1/admin/customers/{customer_id}/organizations/projects
- Contact your AllTrue Customer Success Engineer to grant the appropriate permissions
- As a workaround, use UUIDs (organizationId, projectIds) instead of names
Issue: "Missing Organization Identifier" or "Missing Project Identifier"
- You're using resource scope without the required access-control context
- Set either organizationName/organizationId OR projectNames/projectIds
- This is a security requirement that prevents unintended customer-wide scanning
Issue: "Pentest template not found"
- Verify template name matches exactly (case-sensitive)
- Check available templates in AllTrue UI
- Template Management: Configure pentest templates in the AllTrue platform UI
Issue: "Start failures - permission denied"
- Verify API key permissions (see Security (Permissions) above)
- Check AllTrue license status
Issue: "Mapped model not available on endpoint"
- The model specified in pentestModelMapping isn't available for that specific resource
- Check the logs for available models
- Verify the model name matches exactly (case-sensitive)
- The system will fall back to the endpoint's default model
Issue: "Failed to configure system prompt"
- Verify the API key has PATCH access to /v1/inventory/customer/resource/{resource_instance_id}/llm-endpoint-resource-additional-config
- Check that resource_instance_id is valid
- The task continues with the existing configuration if the PATCH fails (non-blocking)
Issue: "Timeout during polling"
- Increase pollTimeoutSecs
- Consider reducing maxConcurrentPentests to avoid backend spikes
"Configured work item type 'Bug' not available... Falling back to 'Issue'"
- Expected when the project process doesn’t include
Bug.
- Set
adoWorkItemType=Issue to avoid the warning.
"Azure Boards work item creation skipped"
- Boards creation requires all three:
ADO_ORG_URL (or System.CollectionUri)
ADO_PROJECT (or System.TeamProject)
ADO_TOKEN (or System.AccessToken)
- If you expect System.AccessToken to work:
- Ensure
Allow scripts to access OAuth token is enabled.
- Check
onThresholdAction includes work_item or both
Issue: Dedupe not preventing duplicates
- Confirm the existing work items are not in terminal states listed in adoDedupeExcludeStates (default: Closed)
- Confirm adoDedupeEnabled: true
"System.AccessToken is empty"
"Free memory is lower than 5%" warnings
This warning appears when the agent is under memory pressure. Common causes:
- Too many concurrent scans: Reduce
maxConcurrentPentests
maxConcurrentPentests: "3" # Instead of 8+
Self-hosted agent undersized: Increase agent VM memory
- Recommended: 8GB+ RAM for typical workloads
- Large inventories (100+ resources): 16GB+ RAM
Other jobs running: Ensure agent has dedicated capacity
Impact: Usually none - scans complete successfully despite the warning. Monitor for actual failures or timeouts.
Issue: "No hosted parallelism has been purchased or granted"
- Your organization has no Microsoft-hosted parallelism. Request the free grant from Microsoft, purchase parallel jobs, or use a self-hosted agent (see Agent Notes above)
Issue: HuggingFace onboarding failed
- Verify project context is provided (huggingfaceOnboardingProjectName, huggingfaceOnboardingProjectId, projectNames, or projectIds)
- Check that the model exists on HuggingFace Hub
- Verify the format: org/repo or org/repo@revision
Name Resolution Debugging
If names aren't resolving:
1. Check console output for resolution messages:
   [org-resolve] Resolved organization name 'ACME' -> 364fe49b-...
   [proj-resolve] Resolved project name 'Production' -> 5c221ef3-...
2. Verify names in the AllTrue UI:
   - Log into the AllTrue platform
   - Copy the exact organization name
   - Navigate to Projects and copy the exact project names
3. Check for typos (matching is case-insensitive but must otherwise be exact)
4. Use fallback UUIDs temporarily:
   # If name resolution fails, use a UUID as a fallback
   organizationId: '364fe49b-6ea1-4a53-83db-f8311a9c8412'
5. Verify API permissions for /v1/admin/customers/{customer_id}/organizations/projects
Model Mapping Debugging
If model mapping isn't working as expected:
1. Check console output for model selection messages:
   [i] Model mapping found for OpenAIEndpoint: gpt-4
   [OK] Using mapped model: gpt-4
   or:
   [!] Mapped model 'gpt-4' not available
   [i] Available: gpt-3.5-turbo, gpt-4-turbo-preview, ...
   [i] Using endpoint default
2. Verify model names are exact matches (case-sensitive)
3. Check the available models in the AllTrue UI for each resource
4. Test without mapping first to see the default behavior
System Prompt Debugging
If system prompt configuration isn't working:
1. Check console output for configuration messages:
   [i] Configuring system prompt on resource...
   [OK] System prompt configured successfully
   or:
   [!] Warning: Failed to configure system prompt: <error>
2. Verify the resource type supports system prompts (LLM endpoints only)
3. Check API permissions for GET and PATCH on the additional-config endpoint
4. Test with a simple system prompt first before using complex multi-line prompts
5. Verify cleanup is working:
   [i] System prompt cleaned up from resource
Debug Checklist
- ✅ Check workflow logs for detailed error messages
- ✅ Verify all required secrets/variables are set
- ✅ Confirm API key has necessary permissions
- ✅ Test with simpler configuration first
- ✅ Review AllTrue UI for resource visibility
- ✅ Check for typos in names (case-insensitive but exact)
- ✅ Use correct syntax for your platform (PowerShell vs bash)
Support
For assistance with:
- Configuration: Refer to examples and troubleshooting guide above
- API Access: Contact your AllTrue Customer Success Engineer
- Technical Issues: Use the Q&A section on this marketplace page
- Feature Requests: Submit through the Q&A section
- AllTrue Platform
📝 License
Copyright © 2025 AllTrue.ai Canada Inc. All rights reserved.