
# KF SQL Formatter

A custom token-based SQL formatter for VS Code that formats Databricks/Spark SQL with 100% idempotent output.
## ✨ Features
- 🎯 **100% Idempotent** - Format twice = same result (31/31 tests passing)
- 📏 **File-wide Alias Alignment** - All `AS` keywords align to a single column
- 📍 **Leading Commas** - SELECT list items formatted with leading commas (`,col`)
- 🔤 **Lowercase Keywords** - All SQL keywords normalized to lowercase
- 🔗 **Multi-word JOIN Handling** - Keeps `INNER JOIN`, `LEFT JOIN`, etc. together
- ✨ **Trailing Whitespace Removal** - Clean, consistent output
- ⚡ **Spark SQL Support** - Arrays, maps, structs, `named_struct`, `explode`, `lateral view`
## 📦 Installation

### Option 1: VS Code Marketplace (Recommended)

Search for "KF SQL Formatter" in VS Code Extensions or install from:
VS Code Marketplace - KF SQL Formatter

### Option 2: Direct Download

⬇️ Download Latest Release (v0.0.7)

After downloading:

1. Open VS Code
2. Press `Ctrl+Shift+P` (Windows/Linux) or `Cmd+Shift+P` (Mac)
3. Type "Extensions: Install from VSIX..."
4. Select the downloaded `.vsix` file
### Option 3: Build from Source

```bash
git clone https://github.com/k-f-/kf.sql.formatter.git
cd kf.sql.formatter
npm install
npm run compile
npm run package
```
## 🚀 Usage

1. Open any `.sql`, `.hql`, or `.spark.sql` file
2. Run **Format Document** (`Shift+Alt+F` on Windows/Linux, `Shift+Option+F` on Mac)
3. Your SQL will be formatted with consistent style!
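To make this extension the default SQL formatter (and format on save), you can add a language-scoped entry to your `settings.json`. Note the extension ID below is a placeholder — substitute the actual ID shown on the extension's Marketplace page:

```json
{
    "[sql]": {
        // "publisher.extension-name" — replace with the real extension ID
        "editor.defaultFormatter": "k-f.kf-sql-formatter",
        "editor.formatOnSave": true
    }
}
```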
## ⚙️ Configuration

Configure the formatter in VS Code settings (`settings.json`):

```json
{
    "databricksSqlFormatter.keywordCase": "lower",
    "databricksSqlFormatter.functionCase": "lower",
    "databricksSqlFormatter.indent": 4,
    "databricksSqlFormatter.leadingCommas": true,
    "databricksSqlFormatter.dialect": "spark",
    "databricksSqlFormatter.aliasAlignmentScope": "file",
    "databricksSqlFormatter.aliasMinGap": 8,
    "databricksSqlFormatter.aliasMaxColumnCap": 120,
    "databricksSqlFormatter.trimTrailingWhitespace": true
}
```
### Available Settings

| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `keywordCase` | `"lower"` \| `"upper"` \| `"preserve"` | `"lower"` | SQL keyword letter case |
| `functionCase` | `"lower"` \| `"upper"` \| `"preserve"` | `"lower"` | Function name letter case |
| `indent` | `number` | `4` | Spaces per indent level (2-8) |
| `leadingCommas` | `boolean` | `true` | Use leading commas in SELECT lists |
| `dialect` | `"spark"` \| `"hive"` \| `"ansi"` | `"spark"` | SQL dialect (for future use) |
| `aliasAlignmentScope` | `"file"` \| `"select"` \| `"none"` | `"file"` | Scope for alias alignment |
| `aliasMinGap` | `number` | `8` | Minimum spaces between expression and AS (1-20) |
| `aliasMaxColumnCap` | `number` | `120` | Maximum column for alias alignment (40-200) |
| `trimTrailingWhitespace` | `boolean` | `true` | Remove trailing whitespace from lines |
## 📝 Example

Before:

```sql
SELECT customer_id,customer_name as name,SUM(order_amount) as total FROM orders WHERE status='completed' GROUP BY customer_id,customer_name
```

After:

```sql
select
    customer_id
    ,customer_name as name
    ,sum(order_amount) as total
from orders
where status = 'completed'
group by
    customer_id
    ,customer_name
```
## 🏗️ Architecture

The formatter uses a token-based architecture with three stages:

1. **Tokenize** - SQL → tokens (preserves 100% of the original text)
2. **Transform** - Apply transformations:
   - Keyword normalization
   - Line-break insertion
   - Leading-comma conversion
   - Indentation
   - Alias alignment
   - Trailing-whitespace removal
3. **Generate** - Tokens → formatted SQL
## 📊 Test Results

**31/31 tests passing** ✅

- **Unit Tests**: Tokenizer (4/4), Generator (2/2)
- **Integration Tests**: 6/6 formatting tests
- **Golden Files**: 5/5 files, 90%+ similarity to hand-coded reference
- **Manual Examples**: 7/7 files, 100% idempotent
- **Edge Cases**: 7/7 files covering complex SQL patterns:
  - Window functions (PARTITION BY, ORDER BY, frame specifications)
  - Nested CTEs and recursive queries
  - Complex CASE statements (3+ levels deep)
  - Multiple JOINs with complex conditions
  - Spark SQL features (arrays, maps, structs)
  - Complex expressions and nested functions
  - Complex WHERE clauses with AND/OR logic
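The idempotence guarantee is straightforward to verify mechanically: formatting an already-formatted file must change nothing. The sketch below uses a stand-in `formatSql` (lowercase keywords plus trailing-whitespace trim, not the real formatter) to show the shape of such a check:

```typescript
// Stand-in formatter: lowercases a few keywords and trims trailing whitespace.
// The real formatter applies the full transformer pipeline instead.
const SQL_KEYWORDS = /\b(SELECT|FROM|WHERE|GROUP|BY|AS|SUM)\b/gi;

function formatSql(sql: string): string {
  return sql
    .split("\n")
    .map((line) =>
      line.replace(SQL_KEYWORDS, (kw) => kw.toLowerCase()).replace(/\s+$/, "")
    )
    .join("\n");
}

// Idempotence check: format(format(x)) must equal format(x).
function isIdempotent(sql: string): boolean {
  const once = formatSql(sql);
  return formatSql(once) === once;
}

console.log(isIdempotent("SELECT a   \nFROM t  ")); // true
```

Running this check over every example file is how a "100% idempotent" claim can be kept honest as the formatter evolves.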
## 📁 Project Structure

```
kf.sql.formatter/
├── package.json                      # Extension manifest
├── tsconfig.json                     # TypeScript config
├── src/                              # Source code
│   ├── extension.ts                  # VS Code extension entry point
│   ├── formatter-custom.ts           # Main formatter
│   ├── tokenizer/                    # SQL tokenizer
│   │   ├── tokenize.ts
│   │   └── token-types.ts
│   ├── transformers/                 # Token transformers
│   │   ├── keywords.ts
│   │   ├── line-breaks.ts
│   │   ├── leading-commas.ts
│   │   ├── indentation.ts
│   │   ├── alias-alignment.ts
│   │   └── trim-trailing-whitespace.ts
│   └── generator/                    # SQL generator
│       └── generate-sql.ts
├── dist/                             # Compiled JavaScript (gitignored)
├── tests/                            # Test suite
│   └── test-suite.js                 # Comprehensive tests
├── examples/                         # Test SQL files
│   ├── manual/                       # Manual test cases (7 files)
│   ├── golden-references/            # Reference SQL (5 files)
│   └── edge-cases/                   # Complex edge cases (7 files)
└── docs/                             # Documentation
    ├── development/                  # Developer guides
    └── releases/                     # Release notes
```
## 🔮 Future Features

The following features are planned but not yet implemented:

- **Semantic Auto-Aliasing** - Automatically add meaningful aliases to expressions
- **Advanced Comment Handling** - Comment alignment and wrapping
- **More Dialects** - Full support for Hive and ANSI SQL variations
## 🤝 Contributing

Contributions are welcome! Please see the development documentation for:

- Architecture overview
- Testing guidelines
- Development workflow
## 📝 License

MIT License - see LICENSE file for details.
## 📈 Version History

- **v0.0.5** (2025-11-01) - Custom token-based formatter, 100% idempotent, 31 tests passing
- **v0.0.4** (2025-10-12) - Security fixes, marketplace publishing
- **v0.0.3** (2025-10-02) - Comment handling fixes, idempotence improvements
- **v0.0.2** (2025-10-02) - Initial multi-pass formatter
- **v0.0.1** (2025-10-02) - First release
Made with ❤️ for the Databricks community