Paxzas CUDA Analyzer

CudaPaxZas

Fast local PTX/SASS analysis inside VS Code (no Python, no network).

Static analysis of NVIDIA CUDA kernels directly inside VS Code — no Python, no network, no GPU required. Point it at a PTX or SASS file and get occupancy, memory posture, bottleneck diagnosis, and instruction mix instantly, across every GPU architecture from Volta to Blackwell.


Features

Multi-Architecture What-if Analysis

Every analysis runs against all supported GPU presets in parallel. A dropdown in the panel lets you switch results between architectures instantly — no re-running needed. Presets derived from actual SASS binary targets are marked as native; all others are what-if estimates using the same instruction profile with different SM limits.

Supported architectures:

Preset                        Architecture             SM
a100                          Ampere A100              sm_80
ampere-like-default           Ampere generic           sm_80
rtx-4090                      Ada Lovelace             sm_89
h100-sxm                      Hopper H100 SXM          sm_90
h100-pcie                     Hopper H100 PCIe         sm_90
h200                          Hopper H200              sm_90
b200                          Blackwell B200           sm_100
gb200                         Blackwell GB200 NVL      sm_100
blackwell-consumer-default    Blackwell consumer       sm_120

Occupancy Model

  • Computes warp occupancy, blocks per SM, and the limiting factor (registers, shared memory, threads, or block limit) for any block size
  • Sweep charts — occupancy, blocks/SM, and per-constraint resource limits plotted across all valid block sizes
  • Waste metrics — unused threads, warps, registers, and shared bytes per SM at the current configuration
  • Register what-if: how many fewer registers per thread are needed to reach the next occupancy tier
  • Launch parameter inference from PTX hints and optional SASS register index
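
The occupancy and register what-if calculations can be sketched roughly as follows. This is a simplified model, not the extension's actual implementation: the `SmLimits` shape, the `SM80_LIKE` values, and both function names are assumptions made for illustration, and the model ignores details such as per-block shared memory overhead and the per-thread register cap.

```typescript
// Illustrative per-SM limits for an sm_80-class GPU (assumed values, not the extension's presets).
interface SmLimits {
  maxThreads: number;   // resident threads per SM
  maxWarps: number;     // resident warps per SM
  maxBlocks: number;    // resident blocks per SM
  registers: number;    // 32-bit registers per SM
  sharedBytes: number;  // shared memory per SM, in bytes
  regAllocUnit: number; // registers are allocated per warp in chunks of this size
  warpSize: number;
}

const SM80_LIKE: SmLimits = {
  maxThreads: 2048, maxWarps: 64, maxBlocks: 32,
  registers: 65536, sharedBytes: 164 * 1024, regAllocUnit: 256, warpSize: 32,
};

type Limiter = "threads" | "registers" | "shared memory" | "block limit";

interface Occupancy { blocksPerSm: number; occupancy: number; limiter: Limiter; }

function estimateOccupancy(
  threadsPerBlock: number, regsPerThread: number, sharedPerBlock: number,
  sm: SmLimits = SM80_LIKE,
): Occupancy {
  const warpsPerBlock = Math.ceil(threadsPerBlock / sm.warpSize);
  // Round the per-warp register footprint up to the allocation unit.
  const regsPerWarp =
    Math.ceil((regsPerThread * sm.warpSize) / sm.regAllocUnit) * sm.regAllocUnit;
  // Each constraint yields the maximum number of resident blocks it allows.
  const candidates: [Limiter, number][] = [
    ["threads", Math.min(Math.floor(sm.maxThreads / threadsPerBlock),
                         Math.floor(sm.maxWarps / warpsPerBlock))],
    ["registers", regsPerThread > 0
      ? Math.floor(sm.registers / (regsPerWarp * warpsPerBlock)) : Infinity],
    ["shared memory", sharedPerBlock > 0
      ? Math.floor(sm.sharedBytes / sharedPerBlock) : Infinity],
    ["block limit", sm.maxBlocks],
  ];
  // The tightest constraint wins; on ties the earlier entry is reported.
  const best = candidates.reduce((a, b) => (b[1] < a[1] ? b : a));
  const blocksPerSm = best[1];
  return {
    blocksPerSm,
    occupancy: (blocksPerSm * warpsPerBlock) / sm.maxWarps,
    limiter: best[0],
  };
}

// Register what-if: how many registers per thread must be saved to fit one more block per SM.
function registersToNextTier(
  threadsPerBlock: number, regsPerThread: number, sharedPerBlock: number,
  sm: SmLimits = SM80_LIKE,
): number | null {
  const base = estimateOccupancy(threadsPerBlock, regsPerThread, sharedPerBlock, sm).blocksPerSm;
  for (let r = regsPerThread - 1; r >= 1; r--) {
    if (estimateOccupancy(threadsPerBlock, r, sharedPerBlock, sm).blocksPerSm > base) {
      return regsPerThread - r;
    }
  }
  return null; // registers are not what holds occupancy back
}
```

Under these assumed limits, a 256-thread block using 64 registers per thread and no shared memory fits 4 blocks per SM (50% occupancy, register-limited), and dropping to 48 registers per thread would admit a fifth block.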

Bottleneck Diagnosis

  • Fuses memory posture, stall profile, and pattern class into a primary and secondary bottleneck with the firing rule that triggered it
  • Four stall profile flags: memory dependency, memory throttle, local memory (register spill), sync overhead
  • Per-bottleneck optimization suggestions ranked by impact

Memory Model

  • Classifies the kernel as memory-bound or compute-friendly
  • Arithmetic intensity (ops/byte), reuse ratio, cache policy, load/store balance and vectorization score
  • Distinguishes global, shared, and local (spill) traffic from SASS; falls back to PTX heuristics when SASS is absent

Pattern Model

  • Classifies as tiled, streaming, reduction, compute_heavy, or mixed
  • Detects archetypes: GEMM, CONV, ELEMENTWISE, STENCIL, and others
  • 15 micro-flags: register spill, uncoalesced loads/stores, atomic contention, missing tensor cores, SFU-heavy, over-synchronized, FP16 scalar, warp divergence, and more
  • Warp primitive detection: shuffle, vote, and reduction patterns

Instruction Mix (SASS)

  • Category breakdown: arithmetic, tensor, SFU, global mem, shared mem, local mem, control/sync
  • Vectorized load/store sub-counts (LDG.128 / LDG.64 / LDG.32, STG equivalents)
  • Tensor core op counts (WMMA / HMMA)
  • Atomic and warp-primitive counts
  • Productive instruction fraction and tensor utilization fraction
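
As an illustration of the category breakdown, the sketch below buckets SASS mnemonics from a `cuobjdump`-style listing. It is a rough approximation under stated assumptions: the `RULES` table covers only a handful of common opcodes, the category names are shortened, and `instructionMix` is a hypothetical helper, not the extension's parser.

```typescript
type Category =
  | "arithmetic" | "tensor" | "sfu" | "global" | "shared" | "local" | "control" | "other";

// A deliberately small opcode table (assumption: a real classifier covers far more mnemonics).
const RULES: [RegExp, Category][] = [
  [/^(HMMA|IMMA|DMMA)$/, "tensor"],
  [/^MUFU$/, "sfu"],
  [/^(LDG|STG)$/, "global"],
  [/^(LDS|STS)$/, "shared"],
  [/^(LDL|STL)$/, "local"],
  [/^(BAR|BRA|EXIT|RET)$/, "control"],
  [/^(FADD|FMUL|FFMA|IADD3|IMAD|HADD2|HFMA2|DADD|DMUL|DFMA)$/, "arithmetic"],
];

function instructionMix(sass: string): { mix: Record<Category, number>; wide128: number } {
  const mix: Record<Category, number> = {
    arithmetic: 0, tensor: 0, sfu: 0, global: 0, shared: 0, local: 0, control: 0, other: 0,
  };
  let wide128 = 0; // 128-bit global loads/stores (e.g. LDG.E.128)
  for (const raw of sass.split("\n")) {
    // Drop /*address*/ and /*encoding*/ comments, then keep only instruction lines.
    const line = raw.replace(/\/\*.*?\*\//g, "").trim();
    if (!line.endsWith(";")) continue;
    // Optional predicate (@P0, @!P1, @UP0), then the base mnemonic before any ".modifier".
    const m = /^(?:@!?U?P\w+\s+)?([A-Z][A-Z0-9_]*)(?=[.\s;]|$)/.exec(line);
    if (!m) continue;
    const base = m[1] ?? "";
    const rule = RULES.find(([re]) => re.test(base));
    const category: Category = rule ? rule[1] : "other";
    mix[category] += 1;
    if (category === "global" && /\.128\b/.test(line)) wide128 += 1;
  }
  return { mix, wide128 };
}
```

Feeding it a listing that contains one `LDG.E.128`, one predicated `FFMA`, one `MUFU.RCP`, one `STL`, a `BAR.SYNC`, and an `EXIT` yields one global, one arithmetic, one SFU, one local, and two control instructions, with a single 128-bit access.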

Roofline Chart

  • Plots the kernel's arithmetic intensity against the FP32 roof and bandwidth slope for the selected GPU
  • Region classification: memory-bound or compute-bound with a ridge-point marker
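
The roofline classification reduces to comparing the kernel's arithmetic intensity with the ridge point, where the bandwidth slope meets the compute roof. A minimal sketch, assuming a hypothetical `Roof` type and `roofline` helper (the peak numbers in the usage note are round illustrative values, not the specifications of any preset):

```typescript
interface Roof {
  peakFlops: number; // FP32 roof, in FLOP/s
  bandwidth: number; // memory bandwidth, in bytes/s
}

interface RooflinePoint {
  intensity: number;  // FLOPs per byte moved
  ridge: number;      // intensity at which the bandwidth slope meets the compute roof
  attainable: number; // upper bound on FLOP/s at this intensity
  region: "memory-bound" | "compute-bound";
}

function roofline(flops: number, bytes: number, roof: Roof): RooflinePoint {
  const intensity = flops / bytes;
  const ridge = roof.peakFlops / roof.bandwidth;
  // Below the ridge the bandwidth slope caps performance; above it the FP32 roof does.
  const attainable = Math.min(roof.peakFlops, intensity * roof.bandwidth);
  return {
    intensity,
    ridge,
    attainable,
    region: intensity < ridge ? "memory-bound" : "compute-bound",
  };
}
```

With an assumed roof of 20 TFLOP/s and 2 TB/s, the ridge point sits at 10 FLOPs/byte, so a kernel at 2 FLOPs/byte is memory-bound with an attainable ceiling of 4 TFLOP/s.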

Raw Feature Inspection

  • Side-by-side PTX vs SASS instruction counts for every extracted feature
  • Rows with differing values highlighted; column headers adapt to the available data source

Commands

Command                                       Description
Paxzas: Kernel Analysis                       Opens the full 10-tab analysis panel
Paxzas: Analyze CUDA File with Launch Spec    Same analysis with optional threads=…,shared=…,regs=… overrides

Both commands are available from the Command Palette, editor title bar, editor right-click, and Explorer right-click on .ptx, .cu, and .sass files.
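
For illustration, a launch spec string in the `threads=…,shared=…,regs=…` form could be parsed along these lines. This is a sketch only: `parseLaunchSpec` and `LaunchSpec` are hypothetical names, the values in the usage note are made up, and treating `shared` as a byte count is an assumption; the extension's actual validation rules are not documented here.

```typescript
interface LaunchSpec {
  threads?: number; // threads per block
  shared?: number;  // shared memory per block (assumed to be bytes)
  regs?: number;    // registers per thread
}

function parseLaunchSpec(spec: string): LaunchSpec {
  const out: LaunchSpec = {};
  for (const part of spec.split(",")) {
    const [key, value] = part.split("=").map((s) => s.trim());
    // Unknown keys are ignored so that every override stays optional.
    if (key !== "threads" && key !== "shared" && key !== "regs") continue;
    const n = Number(value);
    // Accept only non-negative integers; anything else leaves the field unset.
    if (value !== undefined && value !== "" && Number.isInteger(n) && n >= 0) {
      out[key] = n;
    }
  }
  return out;
}
```

For example, `parseLaunchSpec("threads=256,regs=64")` returns an object with `threads` and `regs` set and `shared` left undefined, which a caller could then fill from inferred launch parameters.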


Settings

paxzas.gpuPreset — default GPU for the capability dropdown.

auto detects the local GPU via nvidia-smi when available; named presets force a specific architecture. Regardless of this setting, all presets are always shown in the panel dropdown.


Requirements

  • VS Code ≥ 1.85
  • Optional: nvidia-smi on PATH for automatic GPU detection in auto mode

Development

npm install
npm run compile   # or: npm run watch
npm test

F5 in VS Code (with this folder open) launches an Extension Development Host.

Build a .vsix

./build-vsix.sh
# or
npm run vsix

Install locally: Extensions → ··· → Install from VSIX…


Repository

github.com/CudaPaxZas/PaxZas
