Skip to content
| Marketplace
Sign in
Visual Studio Code>Other>Data Quality Studio — GE & Soda CoreNew to Visual Studio Code? Get it now.
Data Quality Studio — GE & Soda Core

Data Quality Studio — GE & Soda Core

Ayoade Adegbite

| (0) | Free
Visual sidebar builder that generates Great Expectations and Soda Core data quality tests from your live database schema — no code required.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

DQ Test Builder — Great Expectations & Soda

Build production-ready data quality tests visually, no memorising SodaCL or Great Expectations syntax.

Connect to your live database, browse schemas and tables directly in the VS Code sidebar, pick checks from a full catalog filtered to each column's data type, and generate ready-to-run SodaCL YAML or Great Expectations Python in one click.


How It Works

1. Connect and browse your database

The extension auto-detects your credentials from ~/.dbt/profiles.yml or a workspace .env file on startup. Your schemas and tables appear immediately in the Data Quality sidebar, no configuration required if you already have dbt set up.

Sidebar tree and framework picker

Click any table to open the test builder panel.


2. Pick your framework — once per session

Choose Soda Core or Great Expectations the first time you open a table. The choice is remembered across VS Code and you will never be asked again unless you explicitly switch.

Framework picker

Checks are tailored to the framework you pick. There is no mixing: Soda checks produce SodaCL YAML, GE checks produce Python.


3. Add checks per column

Every column is shown with its data type badge. Click a column to expand it and use + Add check to open VS Code's native command palette, searchable, keyboard-navigable, and filtered to checks that apply to that column's type.

Check builder with checks added per column

Numeric columns get range checks (min, max, avg, sum, percentile). Text columns get length and regex checks. Timestamp columns get freshness and date-range checks. Booleans and categoricals get value-set checks. Every column gets null and uniqueness checks.


4. Generate — opens in a new editor tab

Click Generate Tests. The output opens in a new editor tab with the correct language mode already set (YAML for Soda, Python for GE). Save the file wherever your project expects it, there is no fixed output path.

Generated GE Python alongside the builder

Full generated GE Python file

Soda Core YAML with configuration hint


Features

  • Zero-config auto-connect: Reads ~/.dbt/profiles.yml (resolves {{ env_var('...') }} templates), workspace .env, or a custom path you set once in VS Code settings
  • Live schema browser: schemas → tables loaded directly from your database; refresh any time with the ↺ button
  • One-time framework choice: Pick Soda Core or Great Expectations once; the choice persists across sessions and across tables
  • Full check catalog: 18 Soda checks + 17 GE checks, each filtered to the column data types they apply to:
    • Presence & validity — null, uniqueness, value sets, allowed values
    • Numeric statistics — min, max, average, sum, std dev, percentile, unique proportion
    • String patterns — regex match, length bounds, date format strings
    • Timestamps & freshness — freshness window, date min/max, parseable dates
  • VS Code-native check picker: + Add check opens the command palette at the top of the editor, not a clipped inline dropdown
  • Custom checks: Write any check the catalog doesn't cover: give it a name and a condition expression; generated in the correct syntax for whichever framework you picked
  • Generated output uses your real connection details: host, port, database, user, and schema are filled in; only the password stays as an environment variable placeholder
  • Framework-specific configuration hints: Soda output includes a ready-to-fill configuration.yml block; GE output includes the correct pip install line and DATABASE_URL export command

Supported Databases

Database Auto-detect Manual
PostgreSQL ~/.dbt/profiles.yml · .env Connection string
Redshift ~/.dbt/profiles.yml Connection string
Snowflake ~/.dbt/profiles.yml Browse for profiles.yml
BigQuery ~/.dbt/profiles.yml Browse for service account JSON

Custom credentials path: Set dq-studio.credentialsPath in VS Code Settings to point at any profiles.yml, BigQuery service account JSON, or .env file. The extension uses that path instead of the default lookup. Supports ~ for the home directory.


Full Check Catalog

Soda Core (SodaCL)

Check Applies to What it generates
Missing (null) count All missing_count(col) = 0
Duplicate count All duplicate_count(col) = 0
Missing % threshold All missing_percent(col) < N
Duplicate % threshold All duplicate_percent(col) < N
Values in set Text, Boolean invalid_count(col) = 0: valid values: [...]
Min value ≥ Numeric min(col) >= N
Max value ≤ Numeric max(col) <= N
Average between Numeric avg(col) between A and B
Sum between Numeric sum(col) between A and B
Std dev between Numeric stddev(col) between A and B
Percentile value Numeric percentile(col, P) between A and B
Min string length Text min_length(col) >= N
Max string length Text max_length(col) <= N
Avg string length Text avg_length(col) between A and B
Freshness window Timestamp freshness(col) < Xh
Date minimum Timestamp min(col) >= 'YYYY-MM-DD'
Date maximum Timestamp max(col) <= 'YYYY-MM-DD'
Custom check All failed rows: name: ..., fail condition: ...

Great Expectations

Check Applies to What it generates
Not Null All expect_column_values_to_not_be_null
Unique All expect_column_values_to_be_unique
Mostly not null (%) All expect_column_values_to_not_be_null(mostly=N)
Values in set Text, Boolean expect_column_values_to_be_in_set
Values not in set Text, Boolean expect_column_values_to_not_be_in_set
Unique proportion All expect_column_proportion_of_unique_values_to_be_between
Min value between Numeric expect_column_min_to_be_between
Max value between Numeric expect_column_max_to_be_between
Mean between Numeric expect_column_mean_to_be_between
Sum between Numeric expect_column_sum_to_be_between
Std dev between Numeric expect_column_stdev_to_be_between
Quantile value Numeric expect_column_quantile_values_to_be_between
String length between Text expect_column_value_lengths_to_be_between
Regex match Text expect_column_values_to_match_regex
Date format (strftime) Text, Timestamp expect_column_values_to_match_strftime_format
Date parseable Timestamp expect_column_values_to_be_dateutil_parseable
Date range Timestamp expect_column_values_to_be_between
Custom check All expect_column_values_to_satisfy with meta={"name": ...}

Generated Output

Soda Core

# Soda Core — checks file
# Generated by DQ Test Builder
#
# 1. Install:  pip install soda-core-postgres
#
# 2. Save the following as configuration.yml:
#
#   data_sources:
#     raw_data:
#       type: postgres
#       connection:
#         host: prod.db.example.com
#         port: 5432
#         database: warehouse
#       username: analytics_user
#       password: ${DB_PASSWORD}
#       schema: raw_data
#
# 3. Run:
#      soda scan -d raw_data -c configuration.yml checks.yml

checks for raw_data.order_items:
  # order_item_id
  - duplicate_count(order_item_id) = 0
  - missing_count(order_item_id) = 0
  # amount_usd
  - min(amount_usd) >= 0
  - max(amount_usd) <= 99999
  - missing_count(amount_usd) = 0
  # status
  - invalid_count(status) = 0:
      valid values: ['pending', 'shipped', 'delivered', 'returned']
  # custom check
  - failed rows:
      name: Amount cannot be negative
      fail condition: amount_usd < 0

Great Expectations

# Great Expectations — raw_data.order_items
# Generated by DQ Test Builder
#
# pip install great_expectations sqlalchemy psycopg2-binary
#
# Usage:
#   export DATABASE_URL="postgresql+psycopg2://analytics_user:YOUR_PASSWORD@prod.db.example.com:5432/warehouse"
#   python <this_file>.py

import os
import great_expectations as gx

context = gx.get_context()

# ── Datasource ──────────────────────────────────────────────────────────────
# Replace YOUR_PASSWORD, or export DATABASE_URL to skip this line entirely
CONNECTION_STRING = os.environ.get("DATABASE_URL", "postgresql+psycopg2://analytics_user:YOUR_PASSWORD@prod.db.example.com:5432/warehouse")

datasource = context.sources.add_or_update_sql(name="raw_data_datasource", connection_string=CONNECTION_STRING)
asset = datasource.add_table_asset("order_items", table_name="order_items", schema_name="raw_data")
batch_request = asset.build_batch_request()

# ── Expectation suite ────────────────────────────────────────────────────────
suite_name = "order_items_suite"
suite = context.add_or_update_expectation_suite(expectation_suite_name=suite_name)
validator = context.get_validator(batch_request=batch_request, expectation_suite_name=suite_name)

# ── Checks ───────────────────────────────────────────────────────────────────
# order_item_id
validator.expect_column_values_to_be_unique("order_item_id")
validator.expect_column_values_to_not_be_null("order_item_id")

# amount_usd
validator.expect_column_min_to_be_between("amount_usd", min_value=0, max_value=0)
validator.expect_column_max_to_be_between("amount_usd", min_value=0, max_value=99999)

validator.save_expectation_suite(discard_failed_expectations=False)

# ── Run validation ────────────────────────────────────────────────────────────
checkpoint = context.add_or_update_checkpoint(name="order_items_checkpoint", validator=validator)
result = checkpoint.run()
print(result)

Connection Setup

Auto-detect (no configuration needed)

On startup the extension tries the following in order:

  1. The path set in dq-studio.credentialsPath (if configured)
  2. ~/.dbt/profiles.yml — reads your active target and resolves {{ env_var('VAR') }} templates against your shell environment
  3. A .env file in the open workspace root (looks for DB_HOST, DB_USER, DB_NAME and variants)

If a connection is found, the sidebar tree populates automatically.

Manual connection

Click the ⚙ icon in the Data Quality sidebar title bar. You can:

  • Use an auto-detected connection if one was found, confirm or browse for a different file
  • Browse for a credentials file: opens a file picker defaulting to ~/.dbt/
    • profiles.yml or .yaml — dbt profile format, all targets supported
    • .env — key=value format with standard DB_HOST / POSTGRES_* / DATABASE_* variable names
  • Paste a connection string: postgresql://user:password@host:5432/database
  • Select a BigQuery service account JSON: standard {"type": "service_account", "project_id": ...} format

Persistent credentials path

Set dq-studio.credentialsPath in VS Code Settings (or settings.json) to avoid browsing each time:

{
  "dq-studio.credentialsPath": "~/.dbt/profiles.yml"
}

Supports ~ for the home directory. Accepts profiles.yml, service account JSON, or .env files.


Requirements

  • VS Code 1.85 or later
  • A running PostgreSQL, Redshift, Snowflake, or BigQuery database accessible from your machine
  • Python with great_expectations or soda-core-* installed, only needed to run the generated tests, not to use the extension itself

Privacy

The extension connects only to the database you configure. No data, queries, or schema information is sent anywhere other than your own database server.


License

MIT — see LICENSE

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft