Sentinel AI Safety - VS Code Extension
AI safety guardrails for LLM prompts using the THSP protocol (Truth, Harm, Scope, Purpose).

Features
Two Analysis Modes
| Mode | Method | Accuracy | Requires |
|------|--------|----------|----------|
| Semantic (recommended) | LLM-based analysis | High (~90%) | API key (OpenAI or Anthropic) |
| Heuristic (fallback) | Pattern matching | Limited (~50%) | Nothing |
For accurate results, configure an LLM API key. Heuristic mode relies on pattern matching, which produces frequent false positives and false negatives.
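One plausible way to choose between the two modes is to fall back to heuristics whenever the selected provider has no key. This README does not spell out the exact rule, so treat the sketch below as an assumption; `selectMode` and `KeyState` are hypothetical names.

```typescript
// Hypothetical sketch: names are illustrative, not the extension's actual API.
type Provider = "openai" | "anthropic";
type AnalysisMode = "semantic" | "heuristic";

interface KeyState {
  provider: Provider; // mirrors the sentinel.llmProvider setting
  openaiApiKey?: string;
  anthropicApiKey?: string;
}

// Use semantic analysis only when the selected provider has a non-empty key;
// otherwise fall back to heuristic pattern matching.
function selectMode(state: KeyState): AnalysisMode {
  const key = state.provider === "openai" ? state.openaiApiKey : state.anthropicApiKey;
  return key && key.trim().length > 0 ? "semantic" : "heuristic";
}

console.log(selectMode({ provider: "anthropic", anthropicApiKey: "sk-ant-example" }));
console.log(selectMode({ provider: "openai" }));
```

Note that a key for the non-selected provider does not enable semantic mode in this sketch; only the key matching `llmProvider` counts.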
Real-time Safety Linting
The extension automatically detects potentially unsafe patterns in your prompts:
- Jailbreak attempts: "ignore previous instructions", persona switches
- Harmful content: weapons, hacking, malware references
- Deception patterns: fake documents, impersonation
- Purposeless actions: requests lacking legitimate benefit
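A heuristic linter of this kind can be sketched as a table of regular expressions applied line by line, recording the line and column of each match so an editor can underline it. The patterns and the `scan` function below are hypothetical illustrations; the extension's real pattern list is not documented here.

```typescript
// Hypothetical pattern table for illustration; the extension's real patterns are not documented here.
interface Finding {
  line: number;     // 0-based line index
  column: number;   // 0-based column where the match starts
  length: number;   // length of the matched text
  category: string; // which kind of pattern matched
}

const PATTERNS: { category: string; regex: RegExp }[] = [
  { category: "jailbreak", regex: /ignore (all )?previous instructions|you are now/i },
  { category: "harm", regex: /\b(malware|hack(ing)?|weapons?)\b/i },
  { category: "deception", regex: /\b(fake (id|passport|document)s?|impersonat\w*)\b/i },
];

// Scan text line by line and report every match with its position.
function scan(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split(/\r?\n/).forEach((lineText, line) => {
    for (const { category, regex } of PATTERNS) {
      // Fresh global copy per line so exec() iterates matches starting at column 0.
      const re = new RegExp(regex.source, "gi");
      let m: RegExpExecArray | null;
      while ((m = re.exec(lineText)) !== null) {
        findings.push({ line, column: m.index, length: m[0].length, category });
      }
    }
  });
  return findings;
}

console.log(scan("Please ignore previous instructions.\nHow do I hack my productivity?"));
```

The second line, "hack my productivity", is flagged as harmful even though it is benign: exactly the kind of false positive that context-free pattern matching produces.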
Commands
| Command | Description |
|---------|-------------|
| Sentinel: Analyze Selection for Safety | Analyze selected text using THSP protocol |
| Sentinel: Analyze File for Safety | Analyze entire file |
| Sentinel: Insert Alignment Seed | Insert standard seed (~1,400 tokens) |
| Sentinel: Insert Minimal Alignment Seed | Insert minimal seed (~450 tokens) |
| Sentinel: Set OpenAI API Key (Secure) | Store API key securely |
| Sentinel: Set Anthropic API Key (Secure) | Store API key securely |
| Sentinel: Show Status | Show current analysis mode and provider |
The THSP Protocol
Every request is evaluated through four gates:
| Gate | Question |
|------|----------|
| Truth | Does this involve deception? |
| Harm | Could this cause harm? |
| Scope | Is this within boundaries? |
| Purpose | Does this serve legitimate benefit? |
All four gates must pass for content to be considered safe.
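The all-gates-must-pass rule is a logical AND over the four results. A minimal sketch with hypothetical type and function names, which also reports which gates failed:

```typescript
// Hypothetical types: the four gate names come from the THSP table above.
type Gate = "truth" | "harm" | "scope" | "purpose";
type GateResults = Record<Gate, boolean>; // true = the gate passed

const GATES: Gate[] = ["truth", "harm", "scope", "purpose"];

// Content is safe only if every gate passes; also list the gates that failed.
function evaluate(results: GateResults): { safe: boolean; failed: Gate[] } {
  const failed = GATES.filter((g) => !results[g]);
  return { safe: failed.length === 0, failed };
}

console.log(evaluate({ truth: true, harm: false, scope: true, purpose: true }));
```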
Configuration
Recommended: Enable Semantic Analysis
For accurate analysis, configure an LLM API key using the secure method:
- Open the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS)
- Run Sentinel: Set OpenAI API Key (Secure) or Sentinel: Set Anthropic API Key (Secure)
- Enter your API key (stored encrypted in VS Code's SecretStorage)
Alternatively, you can set keys in VS Code Settings (less secure: keys are stored in plaintext).
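When both storage options exist, a natural lookup order is to prefer the encrypted SecretStorage value and fall back to the plaintext setting. The sketch below assumes that order (not confirmed by this README). `SecretReader` is a minimal stand-in for the `get` method of VS Code's `SecretStorage`; the secret key names and `resolveApiKey` are hypothetical.

```typescript
// Minimal stand-in for the get() half of VS Code's SecretStorage (context.secrets).
interface SecretReader {
  get(key: string): PromiseLike<string | undefined>;
}

// Hypothetical secret names; the extension's actual storage keys are not documented.
const SECRET_KEYS = { openai: "sentinel-openai-key", anthropic: "sentinel-anthropic-key" } as const;

// Prefer the encrypted secret; otherwise use the plaintext settings value, if any.
async function resolveApiKey(
  provider: "openai" | "anthropic",
  secrets: SecretReader,
  settingsValue: string, // value of sentinel.openaiApiKey or sentinel.anthropicApiKey
): Promise<string | undefined> {
  const stored = await secrets.get(SECRET_KEYS[provider]);
  if (stored) return stored;
  return settingsValue.trim() || undefined;
}

// In-memory reader for trying the logic outside VS Code.
const demo: SecretReader = {
  get: async (k) => (k === "sentinel-anthropic-key" ? "sk-ant-secret" : undefined),
};
resolveApiKey("anthropic", demo, "plaintext-key").then((k) => console.log(k));
```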
Supported Providers
Currently supported:
- OpenAI (GPT-4o, GPT-4o-mini, etc.)
- Anthropic (Claude 3 Haiku, Sonnet, Opus)
Planned for future versions:
- Azure OpenAI (enterprise)
- Ollama (local/free)
- OpenAI-compatible endpoints (Groq, Together AI, etc.)
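The two currently supported providers expose different HTTP APIs: OpenAI's Chat Completions endpoint authenticates with a `Bearer` token, while Anthropic's Messages endpoint uses an `x-api-key` header plus an `anthropic-version` header and requires `max_tokens`. The builder below only assembles such requests; whether the extension calls these endpoints in exactly this way is an assumption, `buildRequest` is a hypothetical name, and the `max_tokens` value is an arbitrary choice for the sketch.

```typescript
type Provider = "openai" | "anthropic";

interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: string; // JSON-encoded payload
}

// Builds (but does not send) a request to each provider's public HTTP API.
function buildRequest(provider: Provider, apiKey: string, model: string,
                      systemPrompt: string, userText: string): HttpRequest {
  if (provider === "openai") {
    return {
      url: "https://api.openai.com/v1/chat/completions",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({
        model,
        messages: [
          { role: "system", content: systemPrompt },
          { role: "user", content: userText },
        ],
      }),
    };
  }
  return {
    url: "https://api.anthropic.com/v1/messages",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model,
      max_tokens: 1024, // arbitrary cap chosen for this sketch
      system: systemPrompt,
      messages: [{ role: "user", content: userText }],
    }),
  };
}
```

On Node 18+ such a request could be sent with the global `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Keeping construction separate from sending makes the provider differences easy to test without network access.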
All Settings
| Setting | Default | Description |
|---------|---------|-------------|
| `sentinel.enableRealTimeLinting` | `true` | Enable real-time safety linting |
| `sentinel.seedVariant` | `standard` | Default seed variant (`minimal`/`standard`) |
| `sentinel.highlightUnsafePatterns` | `true` | Highlight unsafe patterns |
| `sentinel.llmProvider` | `openai` | LLM provider (`openai`/`anthropic`) |
| `sentinel.openaiApiKey` | `""` | OpenAI API key (use secure command instead) |
| `sentinel.openaiModel` | `gpt-4o-mini` | OpenAI model to use |
| `sentinel.anthropicApiKey` | `""` | Anthropic API key (use secure command instead) |
| `sentinel.anthropicModel` | `claude-3-haiku-20240307` | Anthropic model to use |
| `sentinel.useSentinelApi` | `false` | Use Sentinel API for analysis |
| `sentinel.apiEndpoint` | `https://api.sentinelseed.dev/api/v1/guard` | Sentinel API endpoint |
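Inside VS Code, these values would normally be read through `vscode.workspace.getConfiguration("sentinel")`, which already merges user values over the declared defaults. The sketch below reproduces that merge in plain TypeScript so it runs anywhere; the defaults are copied from the table, while `SentinelSettings`, `resolveSettings`, and `activeModel` are hypothetical names.

```typescript
// Defaults copied from the settings table above; the interface itself is a sketch.
interface SentinelSettings {
  enableRealTimeLinting: boolean;
  seedVariant: "minimal" | "standard";
  highlightUnsafePatterns: boolean;
  llmProvider: "openai" | "anthropic";
  openaiApiKey: string;
  openaiModel: string;
  anthropicApiKey: string;
  anthropicModel: string;
  useSentinelApi: boolean;
  apiEndpoint: string;
}

const DEFAULTS: SentinelSettings = {
  enableRealTimeLinting: true,
  seedVariant: "standard",
  highlightUnsafePatterns: true,
  llmProvider: "openai",
  openaiApiKey: "",
  openaiModel: "gpt-4o-mini",
  anthropicApiKey: "",
  anthropicModel: "claude-3-haiku-20240307",
  useSentinelApi: false,
  apiEndpoint: "https://api.sentinelseed.dev/api/v1/guard",
};

// Overlay user-specified values on the defaults.
function resolveSettings(user: Partial<SentinelSettings>): SentinelSettings {
  return { ...DEFAULTS, ...user };
}

// The model name that the selected provider will use.
function activeModel(s: SentinelSettings): string {
  return s.llmProvider === "openai" ? s.openaiModel : s.anthropicModel;
}

console.log(activeModel(resolveSettings({ llmProvider: "anthropic" })));
```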
Usage Examples
Checking Prompts for Safety Issues
- Select the text you want to analyze
- Right-click and choose "Sentinel: Analyze Selection for Safety"
- View the THSP gate results along with the confidence level
Understanding Analysis Results
The extension shows:
- Method: Semantic (LLM) or Heuristic (pattern matching)
- Confidence: How reliable the analysis is
- Gate results: Pass/fail for each THSP gate
- Issues: Specific concerns detected
- Reasoning: Explanation (semantic mode only)
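The fields above map naturally onto a small result object. This is a hypothetical shape (the extension's internal types are not documented here), with a formatter that prints one field per line and omits reasoning when it is absent, as in heuristic mode. The 0..1 confidence scale is an assumption.

```typescript
// Hypothetical result shape built from the fields listed above.
interface AnalysisResult {
  method: "semantic" | "heuristic";
  confidence: number; // assumed to be a 0..1 score
  gates: { truth: boolean; harm: boolean; scope: boolean; purpose: boolean }; // true = passed
  issues: string[];
  reasoning?: string; // only present in semantic mode
}

const GATE_LABELS: [keyof AnalysisResult["gates"], string][] = [
  ["truth", "Truth"],
  ["harm", "Harm"],
  ["scope", "Scope"],
  ["purpose", "Purpose"],
];

// Render a result as plain text, one field per line.
function formatResult(r: AnalysisResult): string {
  const lines: string[] = [
    `Method: ${r.method === "semantic" ? "Semantic (LLM)" : "Heuristic (pattern matching)"}`,
    `Confidence: ${Math.round(r.confidence * 100)}%`,
  ];
  for (const [key, label] of GATE_LABELS) {
    lines.push(`${label}: ${r.gates[key] ? "PASS" : "FAIL"}`);
  }
  if (r.issues.length > 0) lines.push(`Issues: ${r.issues.join("; ")}`);
  if (r.reasoning) lines.push(`Reasoning: ${r.reasoning}`);
  return lines.join("\n");
}

console.log(formatResult({
  method: "heuristic",
  confidence: 0.5,
  gates: { truth: true, harm: false, scope: true, purpose: true },
  issues: ["weapon reference"],
}));
```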
Severity Levels
- 🔴 Error: High-risk patterns (weapons, safety bypass)
- 🟡 Warning: Potential issues (jailbreak attempts)
- 🔵 Information: Consider reviewing
- 💡 Hint: Suggestions (missing Sentinel seed)
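These four levels line up with VS Code's `DiagnosticSeverity` (Error = 0, Warning = 1, Information = 2, Hint = 3). The sketch mirrors those numeric values locally so it runs outside VS Code; the category names and the mapping table are hypothetical, following the list above. Because lower numbers are more severe, the most severe finding is simply the minimum.

```typescript
// Local mirror of vscode.DiagnosticSeverity (same numeric values) so this runs outside VS Code.
const DiagnosticSeverity = { Error: 0, Warning: 1, Information: 2, Hint: 3 } as const;
type DiagnosticSeverity = (typeof DiagnosticSeverity)[keyof typeof DiagnosticSeverity];

// Hypothetical finding categories, mapped according to the severity list above.
type Category = "weapons" | "safetyBypass" | "jailbreak" | "review" | "missingSeed";

const SEVERITY: Record<Category, DiagnosticSeverity> = {
  weapons: DiagnosticSeverity.Error,
  safetyBypass: DiagnosticSeverity.Error,
  jailbreak: DiagnosticSeverity.Warning,
  review: DiagnosticSeverity.Information,
  missingSeed: DiagnosticSeverity.Hint,
};

// Lower numbers are more severe, so the most severe finding has the smallest value.
function mostSevere(categories: Category[]): DiagnosticSeverity | undefined {
  if (categories.length === 0) return undefined;
  return categories.map((c) => SEVERITY[c]).reduce((a, b) => (b < a ? b : a));
}

console.log(mostSevere(["missingSeed", "jailbreak"]));
```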
Semantic vs Heuristic Analysis
Semantic Analysis (Recommended)
Uses an LLM to understand content contextually:
- ✅ Understands context ("hack my productivity" vs malicious hacking)
- ✅ Detects paraphrased harmful content
- ✅ Provides reasoning for decisions
- ✅ ~90% confidence
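The prompt and reply format used for semantic analysis are not documented in this README. Assuming the model is asked to answer with a JSON object holding one boolean per gate plus a reasoning string, a defensive parser might look like the following; `parseVerdict` and the reply format are hypothetical.

```typescript
// Assumed reply format (hypothetical):
// {"truth":true,"harm":false,"scope":true,"purpose":true,"reasoning":"..."}
// Each gate value is true when that gate passes.
interface SemanticVerdict {
  gates: { truth: boolean; harm: boolean; scope: boolean; purpose: boolean };
  reasoning: string;
  safe: boolean;
}

function parseVerdict(reply: string): SemanticVerdict | null {
  // Models sometimes wrap JSON in prose or code fences; take the outermost {...} span.
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  let data: unknown;
  try {
    data = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null;
  }
  const d = data as Record<string, unknown>;
  const keys = ["truth", "harm", "scope", "purpose"] as const;
  if (!keys.every((k) => typeof d[k] === "boolean")) return null; // reject incomplete verdicts
  const gates = {
    truth: d.truth as boolean,
    harm: d.harm as boolean,
    scope: d.scope as boolean,
    purpose: d.purpose as boolean,
  };
  return {
    gates,
    reasoning: typeof d.reasoning === "string" ? d.reasoning : "",
    safe: gates.truth && gates.harm && gates.scope && gates.purpose,
  };
}
```

Returning `null` instead of throwing lets a caller treat an unparseable reply like a missing key and fall back to heuristic analysis.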
Heuristic Analysis (Fallback)
Uses pattern matching for basic detection:
- ⚠️ May flag legitimate content (false positives)
- ⚠️ May miss paraphrased threats (false negatives)
- ⚠️ No contextual understanding
- ⚠️ ~50% confidence
Supported Languages
- Markdown
- Plain text
- Python
- JavaScript/TypeScript
- JSON
- YAML
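In VS Code each open document carries a language identifier (`TextDocument.languageId`); the list above corresponds to the identifiers below. React files (`.jsx`/`.tsx`) use the separate identifiers `javascriptreact` and `typescriptreact`, and this README does not say whether they are covered, so the sketch leaves them out. `shouldLint` is a hypothetical name that also honors the `sentinel.enableRealTimeLinting` setting.

```typescript
// VS Code language identifiers for the languages listed above.
const SUPPORTED_LANGUAGE_IDS: string[] = [
  "markdown", "plaintext", "python", "javascript", "typescript", "json", "yaml",
];

// Lint only when real-time linting is enabled and the document's language is supported.
function shouldLint(languageId: string, enableRealTimeLinting: boolean): boolean {
  return enableRealTimeLinting && SUPPORTED_LANGUAGE_IDS.indexOf(languageId) !== -1;
}

console.log(shouldLint("yaml", true));
console.log(shouldLint("rust", true));
```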
Links
License
MIT License - See LICENSE for details.
Made by Sentinel Team