AI Agent Evaluation

Microsoft (microsoft.com) | 12 installs | Free

This extension enables offline evaluation of Azure AI Agents within your Azure DevOps pipelines. It streamlines the evaluation process, allowing you to identify potential issues and make improvements before releasing updates to production.

To use this extension, provide a dataset of test queries and a list of evaluators. The extension invokes your agent(s) with the queries, collects performance data such as latency and token counts, runs the evaluations, and generates a summary report.
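The flow described above (invoke the agent, record latency and token counts, score each response, summarize) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the extension's implementation: `evaluate_agent`, `fake_agent`, and `fake_relevance` are hypothetical stand-ins for your agent client and the Azure AI Evaluation SDK evaluators, and the agent is assumed to return its reply together with prompt and completion token counts.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical agent interface: query -> (response, prompt_tokens, completion_tokens).
AgentFn = Callable[[str], tuple[str, int, int]]
# Hypothetical evaluator interface: (query, response) -> numeric score.
EvaluatorFn = Callable[[str, str], float]


@dataclass
class RunSummary:
    rows: list[dict] = field(default_factory=list)

    def mean(self, key: str) -> float:
        """Average of one metric across all evaluated queries."""
        return sum(row[key] for row in self.rows) / len(self.rows)


def evaluate_agent(call_agent: AgentFn, queries: list[str],
                   evaluators: dict[str, EvaluatorFn]) -> RunSummary:
    summary = RunSummary()
    for query in queries:
        start = time.perf_counter()
        response, prompt_tokens, completion_tokens = call_agent(query)
        # Client-side duration only; server-side timing is not modeled here.
        latency_s = time.perf_counter() - start
        row = {"query": query, "latency_s": latency_s,
               "prompt_tokens": prompt_tokens,
               "completion_tokens": completion_tokens}
        # Score the response with every requested evaluator.
        for name, evaluator in evaluators.items():
            row[name] = evaluator(query, response)
        summary.rows.append(row)
    return summary


# Toy stand-ins (hypothetical) so the sketch runs without any Azure service.
def fake_agent(query: str) -> tuple[str, int, int]:
    return f"Answer: {query}", len(query.split()), 4


def fake_relevance(query: str, response: str) -> float:
    return 5.0 if query in response else 1.0


summary = evaluate_agent(
    fake_agent,
    ["Tell me about Smart eyeware", "How do I rebase my branch in git?"],
    {"FakeRelevance": fake_relevance},
)
print(summary.mean("FakeRelevance"), summary.mean("prompt_tokens"))
```

Keeping the agent and evaluators as plain callables makes it easy to swap the toy stand-ins for real clients without changing the loop.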

Features

  • Automated Agent Evaluation: Integrate offline evaluation into your CI/CD workflows to automate the pre-production assessment of Azure AI Agents.
  • Built-in Evaluators: Leverage existing evaluators provided by the Azure AI Evaluation SDK.
  • Seamless Integration: Easily integrate with existing Azure Pipelines to run evaluations based on rules that you specify in your pipelines (e.g., when changes are committed to agent versions, prompt templates, or feature flag configuration).
  • Statistical Analysis: Evaluation results include confidence intervals and tests for statistical significance to determine whether changes are meaningful rather than due to random variation.
  • Out-of-the-Box Operational Metrics: Automatically generates operational metrics for each evaluation run (client run duration, server run duration, completion tokens, and prompt tokens).
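This page does not say which statistical test the extension applies. As one illustration of the idea, the sketch below compares a candidate agent's pass rate against a baseline using a two-proportion z-test and a 95% Wald confidence interval for the difference. `compare_pass_rates` and the counts in the example are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def compare_pass_rates(base_pass: int, base_n: int,
                       cand_pass: int, cand_n: int,
                       z_crit: float = 1.96):
    """Return (difference in pass rate, 95% CI of the difference, two-sided p-value)."""
    p_base = base_pass / base_n
    p_cand = cand_pass / cand_n
    diff = p_cand - p_base
    # Unpooled (Wald) standard error for the confidence interval.
    se = math.sqrt(p_base * (1 - p_base) / base_n + p_cand * (1 - p_cand) / cand_n)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Pooled standard error for the z-test of H0: both pass rates are equal.
    pooled = (base_pass + cand_pass) / (base_n + cand_n)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / cand_n))
    z = diff / se_pooled if se_pooled > 0 else 0.0
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return diff, ci, p_value


# Hypothetical counts: baseline passes 60/100 queries, candidate passes 75/100.
diff, ci, p_value = compare_pass_rates(60, 100, 75, 100)
print(f"diff={diff:.2f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f}), p={p_value:.4f}")
# With these counts the interval excludes 0 and p < 0.05.
```

A normal approximation like this is only reasonable with enough queries per agent; small datasets produce wide intervals, which is exactly the signal that a difference may be noise.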

Supported AI Evaluators

| Type | Evaluators |
|---|---|
| AI quality (AI-assisted) | IntentResolutionEvaluator, TaskAdherenceEvaluator, RelevanceEvaluator, CoherenceEvaluator, FluencyEvaluator |
| Risk and safety | ViolenceEvaluator, SexualEvaluator, SelfHarmEvaluator, HateUnfairnessEvaluator, IndirectAttackEvaluator, ProtectedMaterialEvaluator, CodeVulnerabilityEvaluator |
| Composite | ContentSafetyEvaluator |

Getting Started

Task Parameters

| Name | Required | Description |
|---|---|---|
| azure-ai-project-endpoint | Yes | Endpoint of your Azure AI Project. |
| deployment-name | Yes | Name of the Azure AI model deployment to use for evaluation. |
| data-path | Yes | Path to the data file that contains the evaluators and input queries for evaluations. |
| agent-ids | Yes | ID of the agent(s) to evaluate. If multiple are provided, all agents are evaluated and compared against the baseline with statistical test results. |
| baseline-agent-id | No | ID of the baseline agent to compare against when evaluating multiple agents. If not provided, the first agent is used. |
| evaluation-result-view | No | Format of the evaluation results. Options are "default" (boolean scores such as passing and defect rates), "all-scores" (includes all evaluation scores), and "raw-scores-only" (non-boolean scores only). Defaults to "default" if omitted. |
| api-version | No | API version to use when connecting to the model deployment. |
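The defaulting rules in the table can be expressed as a small resolver. This is a hypothetical helper, not part of the task: `resolve_inputs` is an illustrative name, and it assumes `agent-ids` arrives as one comma-separated string, which this page does not specify.

```python
from typing import Optional

# Options listed for evaluation-result-view in the table above.
VALID_VIEWS = ("default", "all-scores", "raw-scores-only")


def resolve_inputs(agent_ids: str,
                   baseline_agent_id: Optional[str] = None,
                   evaluation_result_view: Optional[str] = None):
    """Return (baseline, candidates, view) following the documented defaults."""
    # Assumption: multiple agent IDs are passed as a comma-separated string.
    ids = [a.strip() for a in agent_ids.split(",") if a.strip()]
    if not ids:
        raise ValueError("agent-ids is required")
    # Documented default: without baseline-agent-id, the first agent is the baseline.
    baseline = baseline_agent_id or ids[0]
    # Documented default: the "default" view is used when the input is omitted.
    view = evaluation_result_view or "default"
    if view not in VALID_VIEWS:
        raise ValueError(f"unknown evaluation-result-view: {view!r}")
    # Every other agent is compared against the baseline.
    candidates = [a for a in ids if a != baseline]
    return baseline, candidates, view


print(resolve_inputs("agent-1, agent-2, agent-3"))
```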

Data File Format

The input data file should be a JSON file with the following structure:

{
  "name": "test-data",
  "evaluators": ["IntentResolutionEvaluator", "FluencyEvaluator"],
  "data": [
    {
      "query": "Tell me about Smart eyeware"
    },
    {
      "query": "How do I rebase my branch in git?"
    }
  ]
}
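Before committing a dataset, a quick structural check can catch typos in evaluator names. The sketch below is a hypothetical pre-flight validator, not something the extension ships: `validate_dataset` checks only the fields shown above (`name`, `evaluators`, and a `query` per data row) against the Supported AI Evaluators list, and its non-empty checks are this sketch's own choice. Whether the task accepts additional fields is not covered here.

```python
import json

# Evaluator names taken from the "Supported AI Evaluators" table.
SUPPORTED_EVALUATORS = {
    "IntentResolutionEvaluator", "TaskAdherenceEvaluator", "RelevanceEvaluator",
    "CoherenceEvaluator", "FluencyEvaluator", "ViolenceEvaluator",
    "SexualEvaluator", "SelfHarmEvaluator", "HateUnfairnessEvaluator",
    "IndirectAttackEvaluator", "ProtectedMaterialEvaluator",
    "CodeVulnerabilityEvaluator", "ContentSafetyEvaluator",
}


def validate_dataset(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the structure looks valid."""
    problems = []
    if not isinstance(doc.get("name"), str):
        problems.append('"name" must be a string')
    evaluators = doc.get("evaluators")
    if not isinstance(evaluators, list) or not evaluators:
        problems.append('"evaluators" must be a non-empty list')
    else:
        for name in evaluators:
            if name not in SUPPORTED_EVALUATORS:
                problems.append(f"unsupported evaluator: {name}")
    data = doc.get("data")
    if not isinstance(data, list) or not data:
        problems.append('"data" must be a non-empty list')
    else:
        for i, row in enumerate(data):
            if not isinstance(row, dict) or not isinstance(row.get("query"), str):
                problems.append(f'data[{i}] needs a string "query"')
    return problems


# Round-trip the sample file through JSON text, as if read from data-path.
text = json.dumps({
    "name": "test-data",
    "evaluators": ["IntentResolutionEvaluator", "FluencyEvaluator"],
    "data": [{"query": "Tell me about Smart eyeware"},
             {"query": "How do I rebase my branch in git?"}],
}, indent=2)
dataset = json.loads(text)
print(validate_dataset(dataset))
```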

Azure Pipeline Example

steps:
  - task: AIAgentEvaluation@2
    displayName: "Evaluate AI Agents"
    inputs:
      azure-ai-project-endpoint: "$(AzureAIProjectEndpoint)"
      deployment-name: "$(DeploymentName)"
      data-path: "$(System.DefaultWorkingDirectory)/path/to/your/dataset.json"
      agent-ids: "$(AgentIds)"

Evaluation Results

Evaluation results appear in the Azure DevOps pipeline summary, with detailed metrics and, when multiple agents are evaluated, comparisons between them.

[Image: Sample evaluation results showing agent comparisons]
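To illustrate how boolean pass rates can be compared across agents, the sketch below turns per-query scores into a pass rate and reports each agent's change relative to the baseline. The pass threshold of 3 on a 1 to 5 scale is an assumption for illustration; the cutoffs the extension actually applies, and the exact layout of its summary, are not documented on this page. `comparison_table` and the scores are hypothetical.

```python
from statistics import mean

PASS_THRESHOLD = 3.0  # Assumption: 1-to-5 quality scale, passing at >= 3.


def pass_rate(scores: list[float], threshold: float = PASS_THRESHOLD) -> float:
    """Fraction of queries whose score meets the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def comparison_table(results: dict[str, list[float]], baseline: str):
    """Rows of (agent, mean score, pass rate, pass-rate change vs baseline)."""
    base_rate = pass_rate(results[baseline])
    rows = []
    for agent, scores in results.items():
        rate = pass_rate(scores)
        rows.append((agent, mean(scores), rate, rate - base_rate))
    return rows


# Hypothetical FluencyEvaluator scores for two agents on the same four queries.
results = {"agent-1": [4.0, 2.0, 3.0, 5.0], "agent-2": [5.0, 4.0, 3.0, 4.0]}
rows = comparison_table(results, baseline="agent-1")
for agent, avg, rate, delta in rows:
    print(f"{agent}: mean={avg:.2f} pass_rate={rate:.0%} delta={delta:+.0%}")
```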

© 2025 Microsoft