Evaluator

Agentic development flow inside GitHub Copilot Chat — auto-scans your project, generates PRDs, and orchestrates implementation with quality gates.

Features

Evaluator lives inside Copilot Chat as @evaluator and guides you through a structured development pipeline:

Project Scanning — Automatically detects your stack (Java/Maven, Java/Gradle, Python, React, Angular, TypeScript/Node) and runs preflight checks.
PRD Generation — Generates a Product Requirements Document from your prompt, leveraging project context.
Spec & Task Generation — Breaks the approved PRD into technical specifications and granular implementation tasks.
Autonomous Implementation — Implements tasks with a validation loop (lint, tests, code review) and a scoring system that ensures quality before opening a PR.
Definition of Done Validation — Validates the repository against a configurable Definition of Done checklist.
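
The README does not say how project scanning recognizes a stack. As an illustration only, the sketch below assumes detection is driven by well-known marker files and npm dependency names. The function `detectStacks`, the `Stack` type, and the specific markers checked are hypothetical and are not taken from Evaluator itself.

```typescript
// Hypothetical sketch: infer stacks from marker files in the project root.
// The stack names mirror the README; the detection rules are assumptions.
type Stack =
  | "Java/Maven"
  | "Java/Gradle"
  | "Python"
  | "React"
  | "Angular"
  | "TypeScript/Node";

function detectStacks(files: string[], npmDependencies: string[] = []): Stack[] {
  const hasFile = (name: string): boolean => files.includes(name);
  const hasDep = (name: string): boolean => npmDependencies.includes(name);
  const stacks: Stack[] = [];

  if (hasFile("pom.xml")) stacks.push("Java/Maven");
  if (hasFile("build.gradle") || hasFile("build.gradle.kts")) stacks.push("Java/Gradle");
  if (hasFile("pyproject.toml") || hasFile("requirements.txt") || hasFile("setup.py")) {
    stacks.push("Python");
  }

  // For JavaScript projects, prefer the most specific match.
  if (hasFile("package.json")) {
    if (hasFile("angular.json") || hasDep("@angular/core")) {
      stacks.push("Angular");
    } else if (hasDep("react")) {
      stacks.push("React");
    } else if (hasFile("tsconfig.json")) {
      stacks.push("TypeScript/Node");
    }
  }
  return stacks;
}
```

A monorepo with both a `pom.xml` and a React `package.json` would yield two entries here, which is one reason the sketch returns a list rather than a single stack.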

Requirements

VS Code 1.99+
Active GitHub Copilot Chat subscription

Usage

Open Copilot Chat (Ctrl+L / Cmd+L) and type:

Command	Description
`@evaluator /start <prompt>`	Start the pipeline — scans project, runs preflight, generates PRD
`@evaluator /status`	Show current orchestrator state and progress
`@evaluator /approve`	Approve the PRD and generate SPECs + TASKs
`@evaluator /implement`	Approve SPECs/TASKs and start autonomous implementation
`@evaluator /reject <feedback>`	Reject with feedback — regenerates the current artifact
`@evaluator /dod`	Validate repository against the Definition of Done

Typical Workflow

@evaluator /start Build a user settings page with dark mode toggle
  → review the generated PRD
@evaluator /approve
  → review the generated SPECs and TASKs
@evaluator /implement
  → autonomous implementation with quality gates
  → PR opened automatically when score >= 80%
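
One way to read this workflow is as a small state machine in which each command is only valid at a particular stage. The sketch below is a minimal model under that assumption. The state names, the `TRANSITIONS` table, and `nextState` are hypothetical and do not describe Evaluator's actual orchestrator. `/status` and `/dod` are left out because they do not appear to advance the pipeline.

```typescript
// Hypothetical model of the pipeline stages implied by the commands above.
type State = "idle" | "prd_review" | "spec_review" | "implementing";
type Command = "start" | "approve" | "implement" | "reject";

// For each state, the commands it accepts and the state they lead to.
// /reject regenerates the current artifact, so it stays in the same state.
const TRANSITIONS: Record<State, Partial<Record<Command, State>>> = {
  idle: { start: "prd_review" },
  prd_review: { approve: "spec_review", reject: "prd_review" },
  spec_review: { implement: "implementing", reject: "spec_review" },
  implementing: {},
};

function nextState(state: State, command: Command): State {
  const next = TRANSITIONS[state][command];
  if (next === undefined) {
    throw new Error(`/${command} is not valid in state "${state}"`);
  }
  return next;
}

// Replaying the typical workflow: start, approve, implement.
const commands: Command[] = ["start", "approve", "implement"];
let state: State = "idle";
for (const command of commands) {
  state = nextState(state, command);
}
console.log(state);
```

Keeping the transitions in a table makes out-of-order commands, such as `/implement` before `/approve`, fail with a clear message instead of silently doing nothing.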

Scoring System

Evaluator scores implementations across weighted pillars, with a separate set of pillars for backend and frontend work:

Backend — Test coverage (25%), Contract adherence (25%), Security/best practices (25%), Performance (25%)

Frontend — Figma fidelity (25%), Accessibility (20%), E2E tests (25%), Responsiveness/UX (15%), Performance (15%)

Threshold: score >= 80% opens a PR automatically. Below that, Evaluator enters a self-correction loop (up to 3 attempts).
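
The weights and threshold above can be combined into a weighted average followed by a bounded retry loop. The sketch below is one possible reading, not Evaluator's implementation. It assumes each pillar is scored from 0 to 100, that "up to 3 attempts" means three correction passes after the initial evaluation, and that a run still below the threshold afterwards is handed back to a human. The names `weightedScore`, `runQualityGate`, and the `needs_human` status are hypothetical.

```typescript
// Weights (in percent) are taken from the README; everything else is assumed.
type Weights = Record<string, number>;

const BACKEND_WEIGHTS: Weights = {
  testCoverage: 25,
  contractAdherence: 25,
  security: 25,
  performance: 25,
};

const FRONTEND_WEIGHTS: Weights = {
  figmaFidelity: 25,
  accessibility: 20,
  e2eTests: 25,
  responsivenessUx: 15,
  performance: 15,
};

const PR_THRESHOLD = 80;
const MAX_CORRECTION_ATTEMPTS = 3;

// Weighted average of per-pillar scores (0-100); a missing pillar counts as 0.
function weightedScore(weights: Weights, pillarScores: Record<string, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const [pillar, weight] of Object.entries(weights)) {
    total += weight * (pillarScores[pillar] ?? 0);
    weightSum += weight;
  }
  return total / weightSum;
}

type Outcome = {
  status: "pr_opened" | "needs_human";
  score: number;
  corrections: number;
};

// `evaluate` returns the overall score after the given number of corrections.
function runQualityGate(evaluate: (corrections: number) => number): Outcome {
  let corrections = 0;
  let score = evaluate(corrections);
  while (score < PR_THRESHOLD && corrections < MAX_CORRECTION_ATTEMPTS) {
    corrections += 1;
    score = evaluate(corrections);
  }
  const status = score >= PR_THRESHOLD ? "pr_opened" : "needs_human";
  return { status, score, corrections };
}
```

For example, frontend pillar scores of 80 (Figma fidelity), 90 (accessibility), 70 (E2E tests), 100 (responsiveness/UX), and 60 (performance) give (25·80 + 20·90 + 25·70 + 15·100 + 15·60) / 100 = 79.5, just under the threshold, so this model would enter the self-correction loop.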

Supported Stacks

Java (Maven & Gradle)
Python
React
Angular
TypeScript / Node.js

License

MIT
