DataGuard

AI-Assisted Dataset Analysis & Cleaning Inside Visual Studio Code
Inspect • Visualize • Clean • Generate Insights
Stay inside your editor. Explore datasets faster.
📘 Overview
DataGuard brings dataset analysis directly into Visual Studio Code.
Open a supported dataset to get an interactive workspace for inspection, visualization, cleaning, and optional AI-assisted insights, all without leaving your editor.
All core processing runs locally.
AI capabilities remain optional and disabled until configured.
✨ Features
Dataset Analysis
- Automatic dataset profiling
- Dataset overview and metadata
- Row and column inspection
- Column type detection
- Missing value analysis
- Duplicate discovery
- Statistical summaries
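The profiling items above correspond to a handful of standard pandas calls. The sketch below is only an illustration of that idea, not DataGuard's actual implementation; the `profile` function and the sample data are hypothetical.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Collect basic profile figures for a DataFrame (hypothetical helper)."""
    return {
        "rows": len(df),
        "columns": len(df.columns),
        # Detected column types, as strings such as "int64" or "float64".
        "dtypes": {name: str(dtype) for name, dtype in df.dtypes.items()},
        # Count of missing values in each column.
        "missing_per_column": {name: int(n) for name, n in df.isna().sum().items()},
        # Rows that repeat an earlier row exactly.
        "duplicate_rows": int(df.duplicated().sum()),
        # count / mean / std / min / quartiles / max for numeric columns.
        "numeric_summary": df.describe().to_dict(),
    }


# Made-up sample data, used only to exercise the helper.
sample = pd.DataFrame(
    {
        "app": ["A", "B", "B", "C"],
        "rating": [4.5, 4.0, 4.0, None],
        "installs": [100, 50, 50, 10],
    }
)
report = profile(sample)
print(report["missing_per_column"])
print(report["duplicate_rows"])
```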
Interactive Visualizations
- Dataset composition
- Numerical distributions
- Missing value breakdown
- Column exploration
- Interactive charts
Data Cleaning
Perform cleaning directly inside VS Code:
- Remove duplicates
- Fill missing values
- Convert data types
- Column operations
- Safe save workflow
Changes remain local.
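For readers who want to see what these operations look like in plain pandas, here is a minimal sketch. It assumes median filling for numeric columns and a write-to-temporary-file-then-rename approach for the save step; both are illustrative choices, and the `clean` and `safe_save_csv` helpers are hypothetical rather than part of DataGuard.

```python
import os
import tempfile

import pandas as pd


def clean(df: pd.DataFrame, numeric_columns: list[str]) -> pd.DataFrame:
    """Convert types, drop duplicate rows, and fill missing numeric values."""
    out = df.copy()
    for name in numeric_columns:
        # Values that cannot be parsed as numbers become NaN.
        out[name] = pd.to_numeric(out[name], errors="coerce")
    out = out.drop_duplicates().reset_index(drop=True)
    for name in numeric_columns:
        # Assumption: the column median is an acceptable fill value.
        out[name] = out[name].fillna(out[name].median())
    return out


def safe_save_csv(df: pd.DataFrame, path: str) -> None:
    """Write to a temporary file first, then replace the target in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(fd)
    try:
        df.to_csv(tmp_path, index=False)
        os.replace(tmp_path, path)  # the original is untouched if writing fails
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Made-up sample data with a duplicate row and an unparseable value.
raw = pd.DataFrame({"app": ["A", "B", "B", "C"], "installs": ["100", "50", "50", "n/a"]})
cleaned = clean(raw, ["installs"])

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "cleaned.csv")
    safe_save_csv(cleaned, target)
    reloaded = pd.read_csv(target)

print(cleaned)
```

Writing to a temporary file in the same directory keeps the rename on one filesystem, so a failed write leaves the existing file as it was.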
Optional AI Insights
Configure an AI provider to generate:
- Dataset summaries
- Cleaning suggestions
- Pattern discovery
- High-level observations
Supported AI providers:
- OpenAI
- Anthropic
- Google Gemini
- Groq
- Cohere
AI providers are optional and require user configuration.
| Format | Supported |
| --- | --- |
| CSV | ✅ |
| TSV | ✅ |
| JSON | ✅ |
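Each of the formats listed above can be read with pandas. The sketch below dispatches on the file extension; it is an assumption about how such loading could work, not a description of DataGuard's internals, and `load_dataset` is a hypothetical name.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dataset(path) -> pd.DataFrame:
    """Load a CSV, TSV, or JSON file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".tsv":
        # TSV is read with the CSV parser and a tab separator.
        return pd.read_csv(path, sep="\t")
    if suffix == ".json":
        return pd.read_json(path)
    raise ValueError(f"Unsupported format: {suffix or '(no extension)'}")


# Write a small made-up TSV file and load it back.
with tempfile.TemporaryDirectory() as folder:
    tsv_path = Path(folder) / "sample.tsv"
    tsv_path.write_text("app\tinstalls\nA\t100\nB\t50\n", encoding="utf-8")
    frame = load_dataset(tsv_path)

print(frame)
```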
📸 Screenshots
Profile Dataset
Explore Dashboard
Clean Data
AI Insights (Optional)
🎥 Demonstration
Watch the demo on YouTube:
🗂 Dataset Attribution
Screenshots, demonstrations, and promotional materials shown in this repository may include examples generated using the googleplaystore.csv dataset.
Dataset source:
- L. Gupta, "Google Play Store Apps," Feb 2019. [Online]. Available: Kaggle
Usage purpose:
- Product demonstration
- Dashboard showcase
- Visualization examples
- Documentation screenshots
DataGuard is not affiliated with, endorsed by, or associated with the dataset maintainers.
The extension itself is dataset-agnostic and supports analysis of user-provided datasets in supported formats.
🚀 Quick Start
- Install DataGuard
- Open a supported dataset
- DataGuard activates automatically
- Explore visualizations and statistics
- Apply cleaning operations
- Save changes
⚙️ Requirements
| Requirement | Version |
| --- | --- |
| VS Code | Latest Stable |
| Python | 3.10+ |
DataGuard automatically detects required Python dependencies.
If manual installation is needed:
pip install pandas numpy
If Python path detection does not work:
Open Command Palette → DataGuard: Set Python Path
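To check an interpreter against the requirements above before pointing DataGuard at it, a short script like the following can help. This is a sketch only: `check_environment` is a hypothetical helper and does not reflect how DataGuard itself performs dependency detection.

```python
import importlib.util
import sys

REQUIRED_PACKAGES = ("pandas", "numpy")


def check_environment(min_version=(3, 10), packages=REQUIRED_PACKAGES) -> list[str]:
    """Return a list of problems; an empty list means the interpreter looks usable."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ required, found {found}")
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


issues = check_environment()
print(issues if issues else "Environment looks OK")
print("Interpreter path:", sys.executable)
```

The printed interpreter path is the value to supply if the Python path has to be set manually.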
🔄 Workflow
Dataset
↓
Profile
↓
Visualize
↓
Clean
↓
Save
🔒 Privacy
DataGuard processes datasets locally.
AI features require explicit configuration.
No data leaves your machine unless an AI provider is enabled.
Performance depends on:
- Dataset size
- Available memory
- Python environment
DataGuard is designed to support analysis across datasets of varying sizes.
🔗 Resources