Skip to content
| Marketplace
Sign in
Visual Studio Code>Formatters>KF SQL Formatter (Databricks/Spark)New to Visual Studio Code? Get it now.
KF SQL Formatter (Databricks/Spark)

KF SQL Formatter (Databricks/Spark)

Kyle Fring

|
24 installs
| (0) | Free
SQL formatter for Databricks/Spark SQL. Mostly AI Slop.
Installation
Launch VS Code Quick Open (Ctrl+P), paste the following command, and press enter.
Copied to clipboard
More Info

KF SQL Formatter (Databricks/Spark)

Version License

A custom token-based SQL formatter for VS Code that formats Databricks/Spark SQL with 100% idempotent output.

✨ Features

  • 🎯 100% Idempotent - Format twice = same result (31/31 tests passing)
  • 📏 File-wide Alias Alignment - All AS keywords align to a single column
  • 📍 Leading Commas - SELECT list items formatted with leading commas (,col)
  • 🔤 Lowercase Keywords - All SQL keywords normalized to lowercase
  • 🔗 Multi-word JOIN Handling - Keeps INNER JOIN, LEFT JOIN, etc. together
  • ✨ Trailing Whitespace Removal - Clean, consistent output
  • ⚡ Spark SQL Support - Arrays, maps, structs, named_struct, explode, lateral view

📦 Installation

Option 1: VS Code Marketplace (Recommended)

Search for "KF SQL Formatter" in VS Code Extensions or install from: VS Code Marketplace - KF SQL Formatter

Option 2: Direct Download

⬇️ Download Latest Release (v0.0.7)

After downloading:

  1. Open VS Code
  2. Press Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (Mac)
  3. Type "Extensions: Install from VSIX..."
  4. Select the downloaded .vsix file

Option 3: Build from Source

git clone https://github.com/k-f-/kf.sql.formatter.git
cd kf.sql.formatter
npm install
npm run compile
npm run package

🚀 Usage

  1. Open any .sql, .hql, or .spark.sql file
  2. Run Format Document (Shift+Alt+F on Windows/Linux, Shift+Option+F on Mac)
  3. Your SQL will be formatted with consistent style!

⚙️ Configuration

Configure the formatter in VS Code settings (settings.json):

{
  "databricksSqlFormatter.keywordCase": "lower",
  "databricksSqlFormatter.functionCase": "lower",
  "databricksSqlFormatter.indent": 4,
  "databricksSqlFormatter.leadingCommas": true,
  "databricksSqlFormatter.dialect": "spark",
  "databricksSqlFormatter.aliasAlignmentScope": "file",
  "databricksSqlFormatter.aliasMinGap": 8,
  "databricksSqlFormatter.aliasMaxColumnCap": 120,
  "databricksSqlFormatter.trimTrailingWhitespace": true
}

Available Settings

Setting Type Default Description
keywordCase "lower" | "upper" | "preserve" "lower" SQL keyword letter case
functionCase "lower" | "upper" | "preserve" "lower" Function name letter case
indent number 4 Spaces per indent level (2-8)
leadingCommas boolean true Use leading commas in SELECT lists
dialect "spark" | "hive" | "ansi" "spark" SQL dialect (for future use)
aliasAlignmentScope "file" | "select" | "none" "file" Scope for alias alignment
aliasMinGap number 8 Minimum spaces between expression and AS (1-20)
aliasMaxColumnCap number 120 Maximum column for alias alignment (40-200)
trimTrailingWhitespace boolean true Remove trailing whitespace from lines

📐 Formatting Examples

Before Formatting

SELECT customer_id,customer_name as name,SUM(order_amount) as total FROM orders WHERE status='completed' GROUP BY customer_id,customer_name

After Formatting

select
    customer_id
    ,customer_name      as name
    ,sum(order_amount)  as total
from orders
where status = 'completed'
group by
    customer_id
    ,customer_name

🏗️ Architecture

The formatter uses a token-based architecture with three stages:

  1. Tokenize - SQL → Tokens (preserves 100% of original text)
  2. Transform - Apply transformations:
    • Keywords normalization
    • Line breaks insertion
    • Leading commas conversion
    • Indentation
    • Alias alignment
    • Trailing whitespace removal
  3. Generate - Tokens → Formatted SQL

📊 Test Results

31/31 tests passing ✅

  • Unit Tests: Tokenizer (4/4), Generator (2/2)
  • Integration Tests: 6/6 formatting tests
  • Golden Files: 5/5 files, 90%+ similarity to hand-coded reference
  • Manual Examples: 7/7 files, 100% idempotent
  • Edge Cases: 7/7 files covering complex SQL patterns
    • Window functions (PARTITION BY, ORDER BY, frame specifications)
    • Nested CTEs and recursive queries
    • Complex CASE statements (3+ levels deep)
    • Multiple JOINs with complex conditions
    • Spark SQL features (arrays, maps, structs)
    • Complex expressions and nested functions
    • Complex WHERE clauses with AND/OR logic

📁 Project Structure

kf.sql.formatter/
├── package.json                   # Extension manifest
├── tsconfig.json                  # TypeScript config
├── src/                          # Source code
│   ├── extension.ts              # VS Code extension entry point
│   ├── formatter-custom.ts       # Main formatter
│   ├── tokenizer/               # SQL tokenizer
│   │   ├── tokenize.ts
│   │   └── token-types.ts
│   ├── transformers/            # Token transformers
│   │   ├── keywords.ts
│   │   ├── line-breaks.ts
│   │   ├── leading-commas.ts
│   │   ├── indentation.ts
│   │   ├── alias-alignment.ts
│   │   └── trim-trailing-whitespace.ts
│   └── generator/               # SQL generator
│       └── generate-sql.ts
├── dist/                        # Compiled JavaScript (gitignored)
├── tests/                       # Test suite
│   └── test-suite.js           # Comprehensive tests
├── examples/                    # Test SQL files
│   ├── manual/                 # Manual test cases (7 files)
│   ├── golden-references/      # Reference SQL (5 files)
│   └── edge-cases/            # Complex edge cases (7 files)
└── docs/                       # Documentation
    ├── development/           # Developer guides
    └── releases/             # Release notes

🔮 Future Features

The following features are planned but not yet implemented:

  • Semantic Auto-Aliasing - Automatically add meaningful aliases to expressions
  • Advanced Comment Handling - Comment alignment and wrapping
  • More Dialects - Full support for Hive and ANSI SQL variations

🤝 Contributing

Contributions are welcome! Please see the development documentation for:

  • Architecture overview
  • Testing guidelines
  • Development workflow

📝 License

MIT License - see LICENSE file for details

🔗 Links

  • VS Code Marketplace
  • GitHub Repository
  • Issue Tracker
  • Release Notes

📈 Version History

  • v0.0.5 (2025-11-01) - Custom token-based formatter, 100% idempotent, 31 tests passing
  • v0.0.4 (2025-10-12) - Security fixes, marketplace publishing
  • v0.0.3 (2025-10-02) - Comment handling fixes, idempotence improvements
  • v0.0.2 (2025-10-02) - Initial multi-pass formatter
  • v0.0.1 (2025-10-02) - First release

Made with ❤️ for the Databricks community

  • Contact us
  • Jobs
  • Privacy
  • Manage cookies
  • Terms of use
  • Trademarks
© 2025 Microsoft