Leakage Detector

A Visual Studio Code extension that detects instances of data leakage in Jupyter Notebooks.

Features

Data leakage is a common problem in machine learning (ML) code in which a model is trained using data that isn't in the training dataset. This skews the model's results and produces an overly optimistic estimate of its performance. To avoid it, ML developers should separate data into three sets (training, evaluation, and a single-use test set), a practice that many model makers overlook (Yang et al.). This extension detects data leakage in Jupyter Notebooks (.ipynb) and suggests ways to fix it. Leakage comes in three types:
The extension creates two tables in the bottom panel. "Leakage Summary" shows how many instances of each leakage type were found. "Leakage Instances" lists each instance along with the line it's on and the variable that caused it. Clicking a row opens the corresponding file and jumps to the line in question.

Requirements
Known Issues