# PySpark DataFrame Column Analyzer

A static analyzer for PySpark DataFrame column usage in Python files.
## Features
- Detects invalid column references in PySpark DataFrame operations:
  - `select()`
  - `withColumn()`
  - `withColumnRenamed()`
  - `drop()`
  - `join()`
  - `union()` / `unionByName()`
- Infers DataFrame schemas from:
  - Palantir Foundry dataset metadata for the transform: `spark.read.table("table_name")` uses the schema derived from Foundry metadata
  - the first `select()` on a DataFrame when no schema is known
- Incremental analysis on document changes.
- Nice-to-have features:
  - Column completion inside `df.select("...")`.
  - Hover over a DataFrame variable name to see its inferred schema.
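The invalid-reference check above can be sketched with Python's `ast` module. The function below is a simplified, hypothetical illustration that only handles string literals passed to `select()` against an already-known schema; the analyzer itself also tracks schemas across assignments and covers the other listed operations.

```python
import ast

def find_invalid_select_columns(source: str, schema: set[str]) -> list[str]:
    """Return column names passed to .select(...) that are not in `schema`."""
    invalid = []
    for node in ast.walk(ast.parse(source)):
        # Match calls of the form <expr>.select(...)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "select"):
            for arg in node.args:
                # Only string-literal column names are checked in this sketch
                if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                    if arg.value not in schema:
                        invalid.append(arg.value)
    return invalid

code = 'result = df.select("id", "name", "missing_col")'
print(find_invalid_select_columns(code, {"id", "name"}))  # ['missing_col']
```

A real implementation would also resolve `F.col(...)` expressions and propagate inferred schemas between variables, but the AST-walking core is the same idea.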
## Configuration
Schemas can come from:

- a Palantir Foundry transform metadata file referenced by `pysparkAnalyzer.foundryMetadataPath` (an extension-side preloaded map), or
- live Foundry API calls from the Rust sidecar when `FOUNDRY_TOKEN` and `FOUNDRY_API_URL` are set.
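As an illustration, a metadata file mapping table names to their columns might look like the following. This shape is hypothetical; consult the actual file referenced by `pysparkAnalyzer.foundryMetadataPath` for the real format.

```json
{
  "tables": {
    "customers": {
      "columns": ["id", "name", "email"]
    }
  }
}
```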
For live sidecar fetches, set:

- `FOUNDRY_TOKEN` (required): bearer token used for authorization.
- `FOUNDRY_API_URL` (required): Foundry stack base URL, e.g. `https://my-stack.palantirfoundry.com`.
- `FOUNDRY_SCHEMA_CACHE_TTL_SECONDS` (optional): in-memory schema cache TTL in seconds; defaults to 300.
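For example, a shell setup enabling live fetches might look like this (all values are placeholders):

```shell
# Placeholder token and stack URL; substitute your own.
export FOUNDRY_TOKEN="<your-bearer-token>"
export FOUNDRY_API_URL="https://my-stack.palantirfoundry.com"
# Optional: cache fetched schemas for 10 minutes instead of the default 300s.
export FOUNDRY_SCHEMA_CACHE_TTL_SECONDS=600
```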