Evaluator
Agentic development flow inside GitHub Copilot Chat — auto-scans your project, generates PRDs, and orchestrates implementation with quality gates.
Features
Evaluator lives inside Copilot Chat as @evaluator and guides you through a structured development pipeline:
- Project Scanning — Automatically detects your stack (Java/Maven, Java/Gradle, Python, React, Angular, TypeScript/Node) and runs preflight checks.
- PRD Generation — Generates a Product Requirements Document from your prompt, leveraging project context.
- Spec & Task Generation — Breaks the approved PRD into technical specifications and granular implementation tasks.
- Autonomous Implementation — Implements tasks with a validation loop (lint, tests, code review) and a scoring system that ensures quality before opening a PR.
- Definition of Done Validation — Validates the repository against a configurable Definition of Done checklist.
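As a rough illustration of the project-scanning step, stack detection can be sketched as a lookup of conventional marker files in the project root. This is not Evaluator's actual detection logic: the `Stack` labels, the `MARKERS` table, and `detectStacks` are hypothetical names chosen for this example, and the marker files are the usual ecosystem conventions, not a list taken from Evaluator.

```typescript
// Hypothetical sketch: none of these names come from Evaluator itself.
type Stack =
  | "java-maven"
  | "java-gradle"
  | "python"
  | "react"
  | "angular"
  | "typescript-node";

// Marker files conventionally found at the root of each kind of project.
const MARKERS: ReadonlyArray<[string, Stack]> = [
  ["pom.xml", "java-maven"],
  ["build.gradle", "java-gradle"],
  ["build.gradle.kts", "java-gradle"],
  ["pyproject.toml", "python"],
  ["requirements.txt", "python"],
  ["angular.json", "angular"],
  ["tsconfig.json", "typescript-node"],
];

// Returns every stack whose marker is present, in first-seen order.
function detectStacks(rootFiles: string[], npmDependencies: string[] = []): Stack[] {
  const found: Stack[] = [];
  const add = (stack: Stack): void => {
    if (found.indexOf(stack) === -1) found.push(stack);
  };
  for (const [file, stack] of MARKERS) {
    if (rootFiles.indexOf(file) !== -1) add(stack);
  }
  // React has no dedicated config file, so check the npm dependencies instead.
  if (npmDependencies.indexOf("react") !== -1) add("react");
  return found;
}
```

A project can match more than one stack (for example a React front end written in TypeScript), which is why the sketch returns a list instead of a single value.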
Requirements
- VS Code 1.99+
- An active GitHub Copilot Chat subscription
Usage
Open Copilot Chat (Ctrl+L / Cmd+L) and type:
| Command | Description |
| --- | --- |
| `@evaluator /start <prompt>` | Start the pipeline: scans project, runs preflight, generates PRD |
| `@evaluator /status` | Show current orchestrator state and progress |
| `@evaluator /approve` | Approve the PRD and generate SPECs + TASKs |
| `@evaluator /implement` | Approve SPECs/TASKs and start autonomous implementation |
| `@evaluator /reject <feedback>` | Reject with feedback: regenerates the current artifact |
| `@evaluator /dod` | Validate repository against the Definition of Done |
Typical Workflow
@evaluator /start Build a user settings page with dark mode toggle
→ reviews PRD
@evaluator /approve
→ reviews SPECs and TASKs
@evaluator /implement
→ autonomous implementation with quality gates
→ PR opened automatically when score >= 80%
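The workflow above can be read as a small state machine driven by the chat commands. The sketch below is one possible model, not the extension's real orchestrator: the state names (`idle`, `prdReview`, `specReview`, `implementing`), the `TRANSITIONS` table, and `next` are assumptions made for illustration. `/status` and `/dod` are left out because they only report on state and do not change it.

```typescript
// Hypothetical model of the pipeline; state names are illustrative only.
type State = "idle" | "prdReview" | "specReview" | "implementing";
type Command = "start" | "approve" | "implement" | "reject";

// Which commands are accepted in each state, and where they lead.
const TRANSITIONS: Record<State, Partial<Record<Command, State>>> = {
  idle: { start: "prdReview" },
  // Rejecting regenerates the current artifact, so the state does not change.
  prdReview: { approve: "specReview", reject: "prdReview" },
  specReview: { implement: "implementing", reject: "specReview" },
  implementing: {},
};

// Returns the next state, or throws if the command is not valid right now.
function next(state: State, command: Command): State {
  const target = TRANSITIONS[state][command];
  if (target === undefined) {
    throw new Error(`/${command} is not valid in state "${state}"`);
  }
  return target;
}
```

Under this model, `/start`, `/approve`, `/implement` in sequence moves from `idle` to `implementing`, while calling `/implement` before the PRD is approved is rejected.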
Scoring System
Evaluator scores implementations across multiple pillars:
Backend — Test coverage (25%), Contract adherence (25%), Security/best practices (25%), Performance (25%)
Frontend — Figma fidelity (25%), Accessibility (20%), E2E tests (25%), Responsiveness/UX (15%), Performance (15%)
Threshold: score >= 80% opens a PR automatically. Below that, Evaluator enters a self-correction loop (up to 3 attempts).
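To make the arithmetic concrete, here is a minimal sketch of a weighted score and the retry loop, using the pillar weights listed above. It rests on several assumptions: each pillar is scored from 0 to 100, the "up to 3 attempts" are counted as corrections after the initial implementation, and all names (`Pillar`, `weightedScore`, `evaluateWithRetries`, `scoreAttempt`) are hypothetical and not part of Evaluator's API.

```typescript
// Hypothetical sketch; weights are percentages taken from the pillars above.
interface Pillar {
  name: string;
  weight: number; // percent of the total score
}

const BACKEND: Pillar[] = [
  { name: "Test coverage", weight: 25 },
  { name: "Contract adherence", weight: 25 },
  { name: "Security/best practices", weight: 25 },
  { name: "Performance", weight: 25 },
];

const FRONTEND: Pillar[] = [
  { name: "Figma fidelity", weight: 25 },
  { name: "Accessibility", weight: 20 },
  { name: "E2E tests", weight: 25 },
  { name: "Responsiveness/UX", weight: 15 },
  { name: "Performance", weight: 15 },
];

const PR_THRESHOLD = 80; // score >= 80% opens a PR
const MAX_CORRECTIONS = 3; // assumed meaning of "up to 3 attempts"

// Weighted average of per-pillar scores (each assumed to be 0..100).
// A pillar with no score counts as 0.
function weightedScore(pillars: Pillar[], scores: Record<string, number>): number {
  let total = 0;
  for (const pillar of pillars) {
    total += pillar.weight * (scores[pillar.name] ?? 0);
  }
  return total / 100;
}

interface Outcome {
  openPr: boolean;
  corrections: number;
  score: number;
}

// scoreAttempt(n) returns the score after n self-corrections (n = 0 is the
// initial implementation). Stops as soon as the threshold is reached.
function evaluateWithRetries(scoreAttempt: (corrections: number) => number): Outcome {
  let score = scoreAttempt(0);
  let corrections = 0;
  while (score < PR_THRESHOLD && corrections < MAX_CORRECTIONS) {
    corrections += 1;
    score = scoreAttempt(corrections);
  }
  return { openPr: score >= PR_THRESHOLD, corrections, score };
}
```

For example, frontend pillar scores of 90, 70, 80, 60, and 100 (in the order listed) give (25·90 + 20·70 + 25·80 + 15·60 + 15·100) / 100 = 80.5, which clears the threshold.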
Supported Stacks
- Java (Maven & Gradle)
- Python
- React
- Angular
- TypeScript / Node.js
License
MIT