Skip to content
| Marketplace
Sign in
Visual Studio Code>Formatters>KF SQL Formatter (Databricks/Spark)New to Visual Studio Code? Get it now.
KF SQL Formatter (Databricks/Spark)

KF SQL Formatter (Databricks/Spark)

Kyle Fring

|
58 installs
| (0) | Free
SQL formatter for Databricks/Spark SQL. Mostly AI Slop.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

KF SQL Formatter (Databricks/Spark)

Version License

A custom token-based SQL formatter for VS Code that formats Databricks/Spark SQL with 100% idempotent output.

✨ Features

  • 🎯 100% Idempotent - Format twice = same result (140/140 tests passing)
  • 📏 Width-Aware Alignment - Automatic 80/120 character width preference per statement
  • 📐 Content-Aware Separators - Comment separators snap to 80 or 120 dashes
  • 📍 Leading Commas - SELECT list items formatted with leading commas (,col)
  • 🔤 Smart Keyword Casing - Configurable keyword case (lower, upper, or preserve)
  • 🔗 Multi-word JOIN Handling - Keeps INNER JOIN, LEFT JOIN, etc. together
  • 📦 CTE Indentation - Perfect context tracking for WHERE/GROUP BY in CTEs
  • ✨ Trailing Whitespace Removal - Clean, consistent output
  • ⚡ Spark SQL Support - Arrays, maps, structs, named_struct, explode, lateral view
  • 🧪 SQL Test Generator - Generate test queries to debug WHERE/JOIN filtering logic

📦 Installation

Option 1: VS Code Marketplace (Recommended)

Search for "KF SQL Formatter" in VS Code Extensions or install from: VS Code Marketplace - KF SQL Formatter

Option 2: Direct Download

⬇️ Download Latest Release (v0.4.0)

After downloading:

  1. Open VS Code
  2. Press Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (Mac)
  3. Type "Extensions: Install from VSIX..."
  4. Select the downloaded .vsix file

Option 3: Build from Source

git clone https://github.com/k-f-/kf.sql.formatter.git
cd kf.sql.formatter
npm install
npm run compile
npm run package

🚀 Usage

  1. Open any .sql, .hql, or .spark.sql file
  2. Run Format Document (Shift+Alt+F on Windows/Linux, Shift+Option+F on Mac)
  3. Your SQL will be formatted with consistent style!

🧪 SQL Test Generator

Generate test queries to debug WHERE and JOIN filtering logic. Useful for identifying why data disappears between pipeline layers.

How to Use

  1. Open a SQL file with a SELECT query
  2. Use one of these methods:
    • Keyboard: Ctrl+Shift+T (Windows/Linux) or Cmd+Shift+T (Mac)
    • Right-click: Select KF.SQL: Generate SQL Tests from context menu
    • Command Palette: Ctrl+Shift+P → KF.SQL: Generate SQL Tests
  3. A new tab opens with your test query

What It Does

  • Extracts WHERE conditions → Creates TEST columns showing PASS/FAIL for each filter
  • Extracts CTE conditions → Creates CTE columns (CTE1, CTE2) for conditions inside WITH clauses
  • Extracts JOIN conditions → Creates JOIN columns showing PASS/FAIL/N/A for each join
  • Converts INNER JOIN → LEFT JOIN → Preserves unmatched rows for debugging
  • Comments out WHERE clause → Returns ALL rows so you can see what would be filtered
  • Adds dual-comment pattern → --[TEST1] markers link results to original conditions

Example

Input:

SELECT * FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active' AND o.amount > 100

Output (automatically formatted using your workspace settings):

select
     *
     -- [Test Indicators] ----------
    ,CASE
        WHEN (u.status = 'active' and o.amount > 100) THEN 'Y'
        ELSE 'N'
    END               as TEST1                                                 --[TEST1]
    ,CASE
        WHEN u.status = 'active' THEN 'Y'
        ELSE 'N'
    END               as TEST1_1                                               --[TEST1.1]
    ,CASE
        WHEN o.amount > 100 THEN 'Y'
        ELSE 'N'
    END               as TEST1_2                                               --[TEST1.2]
    ,CASE
        WHEN o.user_id is null THEN 'N/A'
        WHEN u.id = o.user_id THEN 'Y'
        ELSE 'N'
    END               as JOIN1                                                 --[JOIN1]
from users            as u
left join orders      as o
    on u.id = o.user_id                                                        --[JOIN1]
-- WHERE clause commented for testing:
-- WHERE u.status = 'active' --[TEST1.1] AND o.amount > 100 --[TEST1.2]

Key features shown:

  • TEST1: Combined parent test (both conditions AND'd)
  • TEST1_1, TEST1_2: Individual condition tests
  • JOIN1: JOIN condition with N/A for unmatched rows
  • Dual-comment pattern: --[TEST1] markers link results to original conditions
  • Auto-formatted: Output uses your workspace formatter settings

Test Generator Settings

Setting Default Description
testGenerator.convertInnerJoins true Convert INNER → LEFT JOIN for non-destructive testing
testGenerator.addJoinNullHandling true Add N/A for unmatched JOIN rows
testGenerator.showMetadata true Show notification with test counts

Supported Features

Feature Test ID Pattern Example
WHERE conditions TEST1, TEST2, etc. WHERE status = 'active' → TEST1
JOIN ON conditions JOIN1, JOIN2, etc. ON u.id = o.user_id → JOIN1
CTE WHERE conditions CTE1, CTE2, etc. WITH cte AS (... WHERE x > 10) → CTE1
HAVING conditions HAVING1, HAVING2, etc. HAVING COUNT(*) > 5 → HAVING1
Subquery WHERE SUB1, SUB2, etc. FROM (SELECT ... WHERE x > 10) subq → SUB1
AND decomposition Parent + children WHERE a AND b → TEST1, TEST1_1, TEST1_2
BETWEEN decomposition Parent + >= / <= BETWEEN 100 AND 500 → TEST1, TEST1_1 (>=), TEST1_2 (<=)
IN list decomposition Parent + each value IN ('a', 'b') → TEST1, TEST1_1, TEST1_2

Note: Output is automatically formatted using your workspace formatter settings (keywordCase, indent, leadingCommas, etc.).

⚙️ Configuration

Configure the formatter in VS Code settings (settings.json):

{
  "databricksSqlFormatter.keywordCase": "lower",
  "databricksSqlFormatter.functionCase": "lower",
  "databricksSqlFormatter.indent": 4,
  "databricksSqlFormatter.leadingCommas": true,
  "databricksSqlFormatter.dialect": "spark",
  "databricksSqlFormatter.aliasAlignmentScope": "file",
  "databricksSqlFormatter.aliasMinGap": 8,
  "databricksSqlFormatter.aliasMaxColumnCap": 120,
  "databricksSqlFormatter.trimTrailingWhitespace": true
}

📖 Complete Settings Guide - Detailed documentation for all 16 configuration options with examples, recommendations, and configuration recipes.

Quick Settings Access

  • Command Palette: Cmd+Shift+P → "KF.SQL: Open Settings"
  • Settings UI: Cmd+, → Search "databricks"
  • Documentation: See VSCode Settings Guide

Available Settings

Core Formatting

Setting Type Default Description
keywordCase "lower" | "upper" | "preserve" "lower" SQL keyword letter case
functionCase "lower" | "upper" | "preserve" "lower" Function name letter case
indent number 4 Spaces per indent level (2-8)
leadingCommas boolean true Use leading commas in SELECT lists
dialect "spark" | "hive" | "ansi" "spark" SQL dialect (for future use)
trimTrailingWhitespace boolean true Remove trailing whitespace from lines

Alias Alignment

Setting Type Default Description
aliasAlignmentScope "file" | "statement" | "none" "statement" Scope for alias and comment alignment. "statement" aligns within each SQL statement, "file" aligns across entire file
aliasMinGap number 8 Minimum spaces between expression and AS (1-20)
aliasMaxColumnCap number 120 Maximum column for alias alignment (40-200)
aliasColumnCapMode "fixed" | "adaptive" "adaptive" Column cap behavior. "adaptive" uses 80 or 120 based on expression length, "fixed" always uses aliasMaxColumnCap
aliasAdaptiveThreshold number 75 When aliasColumnCapMode: "adaptive", expressions >= this length use 120 column cap, shorter use 80
addExplicitAs boolean true Add explicit AS keyword to all aliases

Auto-Generate Aliases

Setting Type Default Description
autoGenerateAliases boolean true Master switch for auto-generating aliases (must be true to enable sub-options)
autoGenerateTableAliases boolean true Auto-generate table aliases from table names (e.g., customer_orders → co)
autoGenerateSelectAliases boolean false Auto-generate SELECT expression aliases for complex expressions

📐 Formatting Examples

Before Formatting

WITH active_users AS (SELECT user_id,user_name,account_type FROM users WHERE status='active' AND created_date>='2024-01-01'),order_stats AS(SELECT user_id,COUNT(*) as order_count,SUM(amount) as total_spent,AVG(amount) as avg_order FROM orders WHERE order_date>='2024-01-01' GROUP BY user_id)SELECT u.user_id,u.user_name as customer_name,u.account_type,CASE WHEN o.order_count>10 THEN 'high' WHEN o.order_count>5 THEN 'medium' ELSE 'low' END as engagement_level,o.total_spent,o.avg_order FROM active_users u INNER JOIN order_stats o ON u.user_id=o.user_id WHERE o.total_spent>100 ORDER BY o.total_spent DESC

After Formatting

Configuration Used: This example uses aliasAlignmentScope: 'statement' with all formatting features enabled:

{
  "keywordCase": "lower",
  "functionCase": "lower",
  "indent": 4,
  "leadingCommas": true,
  "aliasAlignmentScope": "statement",
  "aliasMinGap": 8,
  "aliasMaxColumnCap": 120,
  "addExplicitAs": true,
  "trimTrailingWhitespace": true
}
with active_users as (
    select
         user_id
        ,user_name
        ,account_type
    from users
    where status = 'active'
        and created_date >= '2024-01-01'
)
,order_stats as (
    select
         user_id
        ,count(*)           as order_count
        ,sum(amount)        as total_spent
        ,avg(amount)        as avg_order
    from orders
    where order_date >= '2024-01-01'
    group by user_id
)
select
     u.user_id
    ,u.user_name            as customer_name
    ,u.account_type
    ,case
        when o.order_count > 10 then 'high'
        when o.order_count > 5 then 'medium'
        else 'low'
    end                     as engagement_level
    ,o.total_spent
    ,o.avg_order
from active_users           as u
inner join order_stats      as o
    on u.user_id = o.user_id
where o.total_spent > 100
order by o.total_spent desc

Key Features Demonstrated:

  • 📍 Leading commas for SELECT columns (leadingCommas: true)
  • 📏 Smart alias alignment - all AS keywords align within statement scope (aliasAlignmentScope: 'statement', aliasMinGap: 8)
  • 🔤 Consistent keyword casing (keywordCase: 'lower', functionCase: 'lower')
  • 📦 Perfect CTE indentation with proper WHERE clause alignment (indent: 4)
  • 🔗 Multi-word keywords kept together (INNER JOIN, ORDER BY)
  • ⚡ Clean, readable structure with explicit AS keywords (addExplicitAs: true)

Understanding aliasAlignmentScope

The aliasAlignmentScope option controls how AS keywords align across your SQL file. Here's a visual comparison:

With scope: 'statement' (each statement aligns independently):

select
     customer_id
    ,very_long_customer_name        as name
    ,account_balance                as balance
from customers;
select
     id          as user_id
    ,name        as user_name
from users

Notice how:

  • Statement 1: AS keywords align at column 36 (due to long column name)
  • Statement 2: AS keywords align at column 17 (shorter expressions)
  • Each statement uses optimal spacing for its own expressions

With scope: 'file' (all statements align together):

select
     customer_id
    ,very_long_customer_name        as name
    ,account_balance                as balance
from customers;
select
     id                             as user_id
    ,name                           as user_name
from users

Notice how:

  • All AS keywords align at column 36 across both statements
  • Statement 2 has extra spacing to match statement 1's longest expression
  • Provides consistent alignment across the entire file

Recommendation: Use 'statement' for optimal readability in files with multiple queries, or 'file' when you want uniform alignment across all statements.

🏗️ Architecture

The formatter uses a token-based architecture with three stages:

  1. Tokenize - SQL → Tokens (preserves 100% of original text)
  2. Transform - Apply transformations:
    • Keyword case normalization (configurable: lower, upper, or preserve)
    • Line breaks insertion
    • Leading commas conversion
    • Context-aware indentation (CTEs, subqueries, CASE statements)
    • Explicit AS keyword insertion
    • Unified alias alignment (column, table, and subquery aliases align together)
    • Comment separator adjustment (80/120 dashes)
    • Trailing whitespace removal
  3. Generate - Tokens → Formatted SQL

📊 Test Results

140/140 tests passing ✅

  • Unit Tests: Tokenizer (4/4), Generator (2/2), Integration (6/6)
  • Golden Files: 5 files - Idempotence tests + Similarity metrics (informational)
  • Test Cases: 8 files - Idempotent formatting verification
  • Edge Cases: 7 files covering complex SQL patterns
  • Minified Path: Unformatted → Formatted idempotence testing
  • Options Tests: 70+ tests for all configuration options
  • VSCode Integration: Extension and settings tests

📁 Project Structure

kf.sql.formatter/
├── package.json                   # Extension manifest
├── tsconfig.json                  # TypeScript config
├── src/                          # Source code
│   ├── extension.ts              # VS Code extension entry point
│   ├── formatter-custom.ts       # Main formatter
│   ├── tokenizer/               # SQL tokenizer
│   │   ├── tokenize.ts
│   │   └── token-types.ts
│   ├── transformers/            # Token transformers
│   │   ├── keywords.ts
│   │   ├── line-breaks.ts
│   │   ├── leading-commas.ts
│   │   ├── indentation.ts
│   │   ├── add-explicit-as.ts
│   │   ├── alias-alignment.ts
│   │   ├── comment-separators.ts
│   │   └── trim-trailing-whitespace.ts
│   └── generator/               # SQL generator
│       └── generate-sql.ts
├── dist/                        # Compiled JavaScript (gitignored)
├── tests/                       # Test suite
│   └── test-suite.js           # Comprehensive tests
├── examples/                    # Test SQL files
│   ├── manual/                 # Manual test cases (7 files)
│   ├── golden-references/      # Reference SQL (5 files)
│   └── edge-cases/            # Complex edge cases (7 files)
└── docs/                       # Documentation
    ├── development/           # Developer guides
    └── releases/             # Release notes

🔮 Future Features

The following features are planned but not yet implemented:

  • Semantic Auto-Aliasing - Automatically add meaningful aliases to expressions
  • Advanced Comment Handling - Comment alignment and wrapping
  • More Dialects - Full support for Hive and ANSI SQL variations

🤝 Contributing

Contributions are welcome! Please see the development documentation for:

  • Architecture overview
  • Testing guidelines
  • Development workflow

Development Workflow

Before Pushing

The repository includes a pre-push hook that automatically validates:

  • ✅ TypeScript compilation (npm run compile)
  • ✅ Test suite (24 tests via test-suite.js)
  • ✅ Markdown linting (npm run lint)

If validation fails, you'll see helpful error messages with fix instructions.

The hook runs automatically before every git push and takes ~5-10 seconds.

Manual Validation

To check your changes before pushing:

npm run verify

This runs the same checks as the pre-push hook.

Fixing Markdown Linting Errors

Most markdown errors can be auto-fixed:

npm run lint:fix

Some errors (like MD040 - missing code fence languages) require manual fixes.

Bypassing Pre-Push Hook

⚠️ Use only in emergencies:

git push --no-verify

Warning: This will likely cause CI failures on GitHub Actions.

📝 License

MIT License - see LICENSE file for details

🔗 Links

  • VS Code Marketplace
  • GitHub Repository
  • Issue Tracker
  • Release Notes

📈 Version History

  • v0.0.9 (2025-11-13) - Table alias alignment fix, IN list indentation, examples reorganization (issue #18)
  • v0.0.8 (2025-11-12) - Width-aware formatting, comment separators, CTE indentation fixes, 6/7 golden files at 100%
  • v0.0.7 (2025-11-11) - Alias alignment improvements, CASE keyword casing
  • v0.0.5 (2025-11-01) - Custom token-based formatter, 100% idempotent, 31 tests passing
  • v0.0.4 (2025-10-12) - Security fixes, marketplace publishing
  • v0.0.3 (2025-10-02) - Comment handling fixes, idempotence improvements
  • v0.0.2 (2025-10-02) - Initial multi-pass formatter
  • v0.0.1 (2025-10-02) - First release

Made with ❤️ for the Databricks community

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2025 Microsoft