Skip to content
| Marketplace
Sign in
Visual Studio Code>Formatters>KF SQL Formatter (Databricks/Spark)New to Visual Studio Code? Get it now.
KF SQL Formatter (Databricks/Spark)

KF SQL Formatter (Databricks/Spark)

Kyle Fring

|
173 installs
| (0) | Free
SQL formatter for Databricks/Spark SQL. Mostly AI Slop.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

KF SQL Formatter (Databricks/Spark)

Version License

A custom token-based SQL formatter for VS Code that formats Databricks/Spark SQL with 100% idempotent output.

✨ Features

  • 🎯 100% Idempotent - Format twice = same result (381/381 tests passing)
  • 📏 Width-Aware Alignment - Automatic 80/120 character width preference per statement
  • 📐 Content-Aware Separators - Comment separators snap to 80 or 120 dashes
  • 📍 Leading Commas - SELECT list items formatted with leading commas (,col)
  • 🔤 Smart Keyword Casing - Configurable keyword case (lower, upper, or preserve)
  • 🔗 Multi-word JOIN Handling - Keeps INNER JOIN, LEFT JOIN, etc. together
  • 📦 CTE Indentation - Perfect context tracking for WHERE/GROUP BY in CTEs
  • ✨ Trailing Whitespace Removal - Clean, consistent output
  • ⚡ Spark SQL Support - Arrays, maps, structs, named_struct, explode, lateral view
  • 🧪 SQL Test Generator - Generate test queries to debug WHERE/JOIN filtering logic

📦 Installation

Option 1: VS Code Marketplace (Recommended)

Search for "KF SQL Formatter" in VS Code Extensions or install from: VS Code Marketplace - KF SQL Formatter

Option 2: Direct Download

⬇️ Download Latest Release (v0.5.1)

After downloading:

  1. Open VS Code
  2. Press Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (Mac)
  3. Type "Extensions: Install from VSIX..."
  4. Select the downloaded .vsix file

Option 3: Build from Source

git clone https://github.com/k-f-/kf.sql.formatter.git
cd kf.sql.formatter
npm install
npm run compile
npm run package

🚀 Usage

  1. Open any .sql, .hql, or .spark.sql file
  2. Run Format Document (Shift+Alt+F on Windows/Linux, Shift+Option+F on Mac)
  3. Your SQL will be formatted with consistent style!

🧪 SQL Test Generator

Generate test queries to debug WHERE and JOIN filtering logic. Useful for identifying why data disappears between pipeline layers.

How to Use

  1. Open a SQL file with a SELECT query
  2. Use one of these methods:
    • Keyboard: Ctrl+Shift+T (Windows/Linux) or Cmd+Shift+T (Mac)
    • Right-click: Select KF.SQL: Generate SQL Tests from context menu
    • Command Palette: Ctrl+Shift+P → KF.SQL: Generate SQL Tests
  3. A new tab opens with your test query

What It Does

  • Extracts WHERE conditions → Creates TEST columns showing PASS/FAIL for each filter
  • Extracts CTE conditions → Creates CTE columns (CTE1, CTE2) for conditions inside WITH clauses
  • Extracts JOIN conditions → Creates JOIN columns showing PASS/FAIL/N/A for each join
  • Converts INNER JOIN → LEFT JOIN → Preserves unmatched rows for debugging
  • Comments out WHERE clause → Returns ALL rows so you can see what would be filtered
  • Adds dual-comment pattern → --[TEST1] markers link results to original conditions

Example

Input:

SELECT * FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active' AND o.amount > 100

Output (automatically formatted using your workspace settings):

select
     *
     -- [Test Indicators] ----------
    ,CASE
        WHEN (u.status = 'active' and o.amount > 100) THEN 'Y'
        ELSE 'N'
    END               as TEST1                                                 --[TEST1]
    ,CASE
        WHEN u.status = 'active' THEN 'Y'
        ELSE 'N'
    END               as TEST1_1                                               --[TEST1.1]
    ,CASE
        WHEN o.amount > 100 THEN 'Y'
        ELSE 'N'
    END               as TEST1_2                                               --[TEST1.2]
    ,CASE
        WHEN o.user_id is null THEN 'N/A'
        WHEN u.id = o.user_id THEN 'Y'
        ELSE 'N'
    END               as JOIN1                                                 --[JOIN1]
from users            as u
left join orders      as o
    on u.id = o.user_id                                                        --[JOIN1]
-- WHERE clause commented for testing:
-- WHERE u.status = 'active' --[TEST1.1] AND o.amount > 100 --[TEST1.2]

Key features shown:

  • TEST1: Combined parent test (both conditions AND'd)
  • TEST1_1, TEST1_2: Individual condition tests
  • JOIN1: JOIN condition with N/A for unmatched rows
  • Dual-comment pattern: --[TEST1] markers link results to original conditions
  • Auto-formatted: Output uses your workspace formatter settings

Test Generator Settings

Setting Default Description
testGenerator.convertInnerJoins true Convert INNER → LEFT JOIN for non-destructive testing
testGenerator.addJoinNullHandling true Add N/A for unmatched JOIN rows
testGenerator.showMetadata true Show notification with test counts

Supported Features

Feature Test ID Pattern Example
WHERE conditions TEST1, TEST2, etc. WHERE status = 'active' → TEST1
JOIN ON conditions JOIN1, JOIN2, etc. ON u.id = o.user_id → JOIN1
CTE WHERE conditions CTE1, CTE2, etc. WITH cte AS (... WHERE x > 10) → CTE1
HAVING conditions HAVING1, HAVING2, etc. HAVING COUNT(*) > 5 → HAVING1
Subquery WHERE SUB1, SUB2, etc. FROM (SELECT ... WHERE x > 10) subq → SUB1
AND decomposition Parent + children WHERE a AND b → TEST1, TEST1_1, TEST1_2
BETWEEN decomposition Parent + >= / <= BETWEEN 100 AND 500 → TEST1, TEST1_1 (>=), TEST1_2 (<=)
IN list decomposition Parent + each value IN ('a', 'b') → TEST1, TEST1_1, TEST1_2
CTE internal JOINs CTE_JOIN1, CTE_JOIN2 WITH cte AS (... INNER JOIN ...) → CTE_JOIN1
Subquery JOINs SUB_JOIN1 FROM (SELECT ... INNER JOIN ...) subq → SUB_JOIN1
Complex ON clauses JOIN1 ON LOWER(a.email) = LOWER(b.email) → JOIN1
Multi-statement SQL S1_TEST1, S2_TEST1 Multiple SELECTs → prefixed test IDs

Note: Output is automatically formatted using your workspace formatter settings (keywordCase, indent, leadingCommas, etc.).

⚙️ Configuration

Configure the formatter in VS Code settings (settings.json):

{
  "databricksSqlFormatter.keywordCase": "lower",
  "databricksSqlFormatter.functionCase": "lower",
  "databricksSqlFormatter.indent": 4,
  "databricksSqlFormatter.leadingCommas": true,
  "databricksSqlFormatter.dialect": "spark",
  "databricksSqlFormatter.aliasAlignmentScope": "file",
  "databricksSqlFormatter.aliasMinGap": 8,
  "databricksSqlFormatter.aliasMaxColumnCap": 120,
  "databricksSqlFormatter.trimTrailingWhitespace": true
}

Quick Settings Access

  • Command Palette: Cmd+Shift+P → "KF.SQL: Open Settings"
  • Settings UI: Cmd+, → Search "databricks"

Available Settings

Core Formatting

Setting Type Default Description
keywordCase "lower" | "upper" | "preserve" "lower" SQL keyword letter case
functionCase "lower" | "upper" | "preserve" "lower" Function name letter case
indent number 4 Spaces per indent level (2-8)
leadingCommas boolean true Use leading commas in SELECT lists
dialect "spark" | "hive" | "ansi" "spark" SQL dialect (for future use)
trimTrailingWhitespace boolean true Remove trailing whitespace from lines

Alias Alignment

Setting Type Default Description
aliasAlignmentScope "file" | "statement" | "none" "statement" Scope for alias and comment alignment. "statement" aligns within each SQL statement, "file" aligns across entire file
aliasMinGap number 8 Minimum spaces between expression and AS (1-20)
aliasMaxColumnCap number 120 Maximum column for alias alignment (40-200)
aliasColumnCapMode "fixed" | "adaptive" "adaptive" Column cap behavior. "adaptive" uses 80 or 120 based on expression length, "fixed" always uses aliasMaxColumnCap
aliasAdaptiveThreshold number 80 When aliasColumnCapMode: "adaptive", expressions >= this length use 120 column cap, shorter use 80
addExplicitAs boolean true Add explicit AS keyword to all aliases

Auto-Generate Aliases

Setting Type Default Description
autoGenerateAliases boolean true Master switch for auto-generating aliases (must be true to enable sub-options)
autoGenerateTableAliases boolean true Auto-generate table aliases from table names (e.g., customer_orders → co)
autoGenerateSelectAliases boolean false Auto-generate SELECT expression aliases for complex expressions

📐 Formatting Examples

Before Formatting

WITH active_users AS (SELECT user_id,user_name,account_type FROM users WHERE status='active' AND created_date>='2024-01-01'),order_stats AS(SELECT user_id,COUNT(*) as order_count,SUM(amount) as total_spent,AVG(amount) as avg_order FROM orders WHERE order_date>='2024-01-01' GROUP BY user_id)SELECT u.user_id,u.user_name as customer_name,u.account_type,CASE WHEN o.order_count>10 THEN 'high' WHEN o.order_count>5 THEN 'medium' ELSE 'low' END as engagement_level,o.total_spent,o.avg_order FROM active_users u INNER JOIN order_stats o ON u.user_id=o.user_id WHERE o.total_spent>100 ORDER BY o.total_spent DESC

After Formatting

Configuration Used: This example uses aliasAlignmentScope: 'statement' with all formatting features enabled:

{
  "keywordCase": "lower",
  "functionCase": "lower",
  "indent": 4,
  "leadingCommas": true,
  "aliasAlignmentScope": "statement",
  "aliasMinGap": 8,
  "aliasMaxColumnCap": 120,
  "addExplicitAs": true,
  "trimTrailingWhitespace": true
}
with active_users as (
    select
         user_id
        ,user_name
        ,account_type
    from users
    where status = 'active'
        and created_date >= '2024-01-01'
)
,order_stats as (
    select
         user_id
        ,count(*)           as order_count
        ,sum(amount)        as total_spent
        ,avg(amount)        as avg_order
    from orders
    where order_date >= '2024-01-01'
    group by user_id
)
select
     u.user_id
    ,u.user_name            as customer_name
    ,u.account_type
    ,case
        when o.order_count > 10 then 'high'
        when o.order_count > 5 then 'medium'
        else 'low'
    end                     as engagement_level
    ,o.total_spent
    ,o.avg_order
from active_users           as u
inner join order_stats      as o
    on u.user_id = o.user_id
where o.total_spent > 100
order by o.total_spent desc

Key Features Demonstrated:

  • 📍 Leading commas for SELECT columns (leadingCommas: true)
  • 📏 Smart alias alignment - all AS keywords align within statement scope (aliasAlignmentScope: 'statement', aliasMinGap: 8)
  • 🔤 Consistent keyword casing (keywordCase: 'lower', functionCase: 'lower')
  • 📦 Perfect CTE indentation with proper WHERE clause alignment (indent: 4)
  • 🔗 Multi-word keywords kept together (INNER JOIN, ORDER BY)
  • ⚡ Clean, readable structure with explicit AS keywords (addExplicitAs: true)

Understanding aliasAlignmentScope

The aliasAlignmentScope option controls how AS keywords align across your SQL file. Here's a visual comparison:

With scope: 'statement' (each statement aligns independently):

select
     customer_id
    ,very_long_customer_name        as name
    ,account_balance                as balance
from customers;
select
     id          as user_id
    ,name        as user_name
from users

Notice how:

  • Statement 1: AS keywords align at column 36 (due to long column name)
  • Statement 2: AS keywords align at column 17 (shorter expressions)
  • Each statement uses optimal spacing for its own expressions

With scope: 'file' (all statements align together):

select
     customer_id
    ,very_long_customer_name        as name
    ,account_balance                as balance
from customers;
select
     id                             as user_id
    ,name                           as user_name
from users

Notice how:

  • All AS keywords align at column 36 across both statements
  • Statement 2 has extra spacing to match statement 1's longest expression
  • Provides consistent alignment across the entire file

Recommendation: Use 'statement' for optimal readability in files with multiple queries, or 'file' when you want uniform alignment across all statements.

🏗️ Architecture

The formatter uses a token-based architecture with three stages:

  1. Tokenize - SQL → Tokens (preserves 100% of original text)
  2. Transform - Apply transformations:
    • Keyword case normalization (configurable: lower, upper, or preserve)
    • Line breaks insertion
    • Leading commas conversion
    • Context-aware indentation (CTEs, subqueries, CASE statements)
    • Explicit AS keyword insertion
    • Unified alias alignment (column, table, and subquery aliases align together)
    • Comment separator adjustment (80/120 dashes)
    • Trailing whitespace removal
  3. Generate - Tokens → Formatted SQL

Split-Merge Pipeline (Experimental)

A new split-merge pipeline is available behind the useSplitMerge feature flag (default: false). This pipeline improves handling of complex multi-statement files by processing each statement independently before merging them back together.

📊 Test Results

381/381 tests passing ✅

  • Unit Tests: Tokenizer, Generator
  • Pipeline Tests: normalize, split, indent, merge, align, cosmetics, alias-generator, validation
  • Options Tests: 70+ tests for all configuration options
  • Golden Files: Idempotence tests + Similarity metrics (informational)
  • Test Cases: Idempotent formatting verification
  • Edge Cases: Complex SQL patterns
  • Test Generator: AST condition extraction, comment injection, join conversion, named_struct, detail level
  • Integration: End-to-end formatter tests

📁 Project Structure

kf.sql.formatter/
├── package.json                   # Extension manifest
├── tsconfig.json                  # TypeScript config
├── esbuild.js                     # Production bundler
├── vitest.config.ts               # Test configuration
├── src/                           # Source code
│   ├── extension.ts               # VS Code extension entry point
│   ├── formatter-custom.ts        # Main formatter orchestration
│   ├── tokenizer/                 # SQL tokenizer
│   │   ├── tokenize.ts
│   │   └── token-types.ts
│   ├── transformers/              # Token transformers (15 files)
│   │   ├── keywords.ts
│   │   ├── function-case.ts
│   │   ├── operator-spacing.ts
│   │   ├── line-breaks.ts
│   │   ├── leading-commas.ts
│   │   ├── cleanup-whitespace.ts
│   │   ├── indentation.ts
│   │   ├── in-list-indentation.ts
│   │   ├── auto-generate-aliases.ts
│   │   ├── add-explicit-as.ts
│   │   ├── alias-alignment.ts
│   │   ├── comment-alignment.ts
│   │   ├── comment-separators.ts
│   │   ├── trim-trailing-whitespace.ts
│   │   └── symbol-table-builder.ts
│   ├── pipeline/                  # Split-merge pipeline (experimental)
│   │   ├── normalize.ts
│   │   ├── split.ts
│   │   ├── indent.ts
│   │   ├── merge.ts
│   │   ├── align.ts
│   │   └── cosmetics.ts
│   ├── test-generator/            # SQL test query generator
│   │   ├── index.ts
│   │   ├── types.ts
│   │   ├── ast-condition-extractor.ts
│   │   ├── comment-injector.ts
│   │   ├── join-converter.ts
│   │   └── test-case-generator.ts
│   ├── utils/                     # Utilities
│   │   └── alias-generator.ts
│   └── generator/                 # SQL generator
│       └── generate-sql.ts
├── dist/                          # Compiled bundle (gitignored)
│   └── extension.js               # esbuild bundle (only VSIX file)
├── tests/                         # Test suite (381 tests)
│   ├── unit/                      # Tokenizer, generator tests
│   ├── pipeline/                  # Pipeline stage tests
│   ├── options/                   # 70+ configuration option tests
│   ├── golden/                    # Idempotence + similarity tests
│   ├── test-generator/            # Test generator tests (149 tests)
│   ├── integration/               # End-to-end formatter tests
│   └── framework/                 # Test framework utilities
└── examples/                      # SQL examples
    ├── test-cases/                # Test case files (8 files)
    ├── golden-references/         # Reference SQL (12 files)
    └── edge-cases/               # Complex SQL patterns (7 files)

🔗 Recommended Companions

For a complete SQL development workflow, we recommend using this formatter alongside:

  • SQLFluff - The dialect-flexible SQL linter for modern data stacks.

While this extension handles the visual formatting and layout of your code, SQLFluff excels at linting and enforcing style rules. Using them together ensures your SQL is both beautiful and compliant with organizational standards.

🤝 Contributing

Contributions are welcome! Please see the guidelines below:

Development Workflow

Before Pushing

The repository includes a pre-push hook that automatically validates:

  • ✅ TypeScript compilation (npm run compile)
  • ✅ Test suite (381 tests via vitest)
  • ✅ Markdown linting (npm run lint)

If validation fails, you'll see helpful error messages with fix instructions.

The hook runs automatically before every git push and takes ~5-10 seconds.

Manual Validation

To check your changes before pushing:

npm run verify

This runs the same checks as the pre-push hook.

Fixing Markdown Linting Errors

Most markdown errors can be auto-fixed:

npm run lint:fix

Some errors (like MD040 - missing code fence languages) require manual fixes.

Bypassing Pre-Push Hook

⚠️ Use only in emergencies:

git push --no-verify

Warning: This will likely cause CI failures on GitHub Actions.

📝 License

MIT License - see LICENSE file for details

🔗 Links

  • VS Code Marketplace
  • GitHub Repository
  • Issue Tracker
  • Release Notes

📈 Version History

For full version history, see CHANGELOG.md.


Made with ❤️ for the Databricks community

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2026 Microsoft