Data Science Snippets for VSCode
This VSCode extension provides a collection of useful code snippets for data science tasks. These snippets cover a wide range of common operations including data manipulation, visualization, and machine learning using popular libraries like Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.
Features
Import Libraries
Prefix: importlibs
Description: Import common data science libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Load CSV with Pandas
Prefix: loadcsv
Description: Load a CSV file into a Pandas DataFrame
df = pd.read_csv('filename.csv')
df.drop(['Index'], axis=1, inplace=True)
print(df.head(), '\n')
print(df.isnull().sum())
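To try these steps without a file on disk, the sketch below reads a small in-memory CSV instead. The sample data is made up for illustration, and dropping the `Index` column with `errors='ignore'` is an assumption that makes the line safe for files that have no such column.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'filename.csv'.
csv_text = """Index,age,city
0,34,Lisbon
1,,Porto
2,29,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Drop the exported index column only if it is present,
# so the call does not raise KeyError on files without it.
df = df.drop(columns=['Index'], errors='ignore')

print(df.head(), '\n')

# Count missing values per column.
missing = df.isnull().sum()
print(missing)
```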
Convert Data Types
Prefix: convert_dtypes
Description: Convert data types of DataFrame columns
df['column'] = df['column'].astype('data_type')
print(df.dtypes)
Remove Duplicates
Prefix: remove_duplicates
Description: Remove duplicate rows from the DataFrame
df = df.drop_duplicates()
print(df.head())
Basic Plot with Matplotlib
Prefix: plotbasic
Description: Create a basic plot with Matplotlib
plt.figure(figsize=(10, 6))
plt.plot(x, y)
plt.title('Title')
plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')
plt.show()
Subplots with Matplotlib
Prefix: subplots
Description: Create subplots with Matplotlib
fig, axes = plt.subplots(nrows, ncols, figsize=(15, 10))
axes[0, 0].plot(x1, y1)
axes[0, 0].set_title('Title 1')
axes[0, 1].plot(x2, y2)
axes[0, 1].set_title('Title 2')
# Add more subplots as needed
plt.tight_layout()
plt.show()
Correlation Heatmap with Seaborn
Prefix: heatmap
Description: Create a correlation heatmap with Seaborn
plt.figure(figsize=(10, 8))
numeric_df = df.select_dtypes(include=[np.number])
sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
Histogram with Seaborn
Prefix: histplot
Description: Create a histogram with Seaborn
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='column', bins=30, kde=True)
plt.title('Histogram')
plt.xlabel('Column')
plt.ylabel('Frequency')
plt.show()
Box Plot with Seaborn
Prefix: boxplot
Description: Create a box plot with Seaborn
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='category', y='values')
plt.title('Box Plot')
plt.xlabel('Category')
plt.ylabel('Values')
plt.show()
Pair Plot with Seaborn
Prefix: pairplot
Description: Create a pair plot with Seaborn
sns.pairplot(df)
plt.show()
Violin Plot with Seaborn
Prefix: violinplot
Description: Create a violin plot with Seaborn
plt.figure(figsize=(10, 6))
sns.violinplot(x='category', y='values', data=df)
plt.title('Violin Plot')
plt.show()
Formatted Correlation Heatmap with Seaborn
Prefix: formatted_heatmap
Description: Create a formatted correlation heatmap with Seaborn
plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(df.select_dtypes(include=[np.number]).corr(), annot=True, fmt='.2f', cmap='coolwarm')
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 18}, pad=16)
plt.show()
One-Hot Encoding with Scikit-learn
Prefix: onehotencoder
Description: Apply one-hot encoding to a categorical column using Scikit-learn
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)
encoded_data = encoder.fit_transform(df[['category_column']])
encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(['category_column']), index=df.index)
df = pd.concat([df, encoded_df], axis=1).drop(columns=['category_column'])
print(df.head())
Train-Test Split with Scikit-learn
Prefix: train_test_split
Description: Split data into training and testing sets using Scikit-learn
from sklearn.model_selection import train_test_split
X = df.drop(columns=['target_column'])
y = df['target_column']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
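As a quick sanity check of the 80/20 split, the sketch below runs the same steps on synthetic data. The feature names and the random data are assumptions for illustration, and `stratify=y` is an optional addition that keeps class proportions similar in both splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, two features, and a binary target (illustrative only).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'feature_a': rng.normal(size=100),
    'feature_b': rng.normal(size=100),
    'target_column': rng.integers(0, 2, size=100),
})

X = df.drop(columns=['target_column'])
y = df['target_column']

# test_size=0.2 holds out 20 of the 100 rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```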
Linear Regression with Scikit-learn
Prefix: linear_regression
Description: Perform linear regression using Scikit-learn
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
plt.scatter(y_test, predictions)
plt.xlabel('True Values')
plt.ylabel('Predictions')
plt.title('Linear Regression')
plt.show()
Logistic Regression with Scikit-learn
Prefix: logistic_regression
Description: Perform logistic regression using Scikit-learn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
K-Nearest Neighbors with Scikit-learn
Prefix: knn
Description: Perform K-Nearest Neighbors classification using Scikit-learn
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
Support Vector Classifier with Scikit-learn
Prefix: svc
Description: Perform Support Vector Classification using Scikit-learn
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = SVC(kernel='linear')
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
Decision Tree Classifier with Scikit-learn
Prefix: decision_tree
Description: Perform Decision Tree Classification using Scikit-learn
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
Random Forest Classifier with Scikit-learn
Prefix: random_forest
Description: Perform Random Forest Classification using Scikit-learn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
class_report = classification_report(y_test, predictions)
print('Accuracy:', accuracy)
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
K-Means Clustering with Scikit-learn
Prefix: kmeans
Description: Perform K-Means Clustering using Scikit-learn
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3, random_state=42)
model.fit(X)
labels = model.predict(X)
df['Cluster'] = labels
print(df.head())
Impute Missing Values
Prefix: impute_missing
Description: Impute missing values using SimpleImputer
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
df['column'] = imputer.fit_transform(df[['column']])
print(df.head())
Confusion Matrix Plot
Prefix: confusion_matrix_plot
Description: Plot a confusion matrix
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
cm = confusion_matrix(y_test, predictions)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()
ROC Curve Plot
Prefix: roc_curve_plot
Description: Plot an ROC curve
from sklearn.metrics import roc_curve, roc_auc_score
y_scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class
fpr, tpr, _ = roc_curve(y_test, y_scores)
plt.figure(figsize=(10, 6))
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc_score(y_test, y_scores))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc='lower right')
plt.show()
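When an AUC value is reported next to an ROC curve, it generally makes sense to compute both from the same predicted probabilities rather than from hard class labels. The sketch below illustrates the difference on synthetic data; the dataset and the choice of logistic regression are assumptions for illustration, and plotting is omitted to keep it short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Scores for the positive class drive both the curve and the AUC.
y_scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
auc_from_scores = roc_auc_score(y_test, y_scores)

# The area under the plotted curve matches the score-based AUC.
auc_from_curve = auc(fpr, tpr)

# AUC from hard 0/1 labels reflects a single operating point,
# so it generally differs from the value above.
auc_from_labels = roc_auc_score(y_test, model.predict(X_test))

print(f'AUC from probabilities: {auc_from_scores:.3f}')
print(f'AUC from hard labels:   {auc_from_labels:.3f}')
```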
Cross-Validation Score
Prefix: cross_val_score
Description: Calculate cross-validation scores
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print('Cross-validation scores:', scores)
print('Mean score:', scores.mean())
Standard Scaling with Scikit-learn
Prefix: standard_scaling
Description: Apply standard scaling to a column
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df[['column']])
df['column_scaled'] = df_scaled
print(df.head())
Min-Max Scaling with Scikit-learn
Prefix: minmax_scaling
Description: Apply min-max scaling to a column
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[['column']])
df['column_scaled'] = df_scaled
print(df.head())
Normalize Data
Prefix: normalize_data
Description: Normalize data using MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[['column']] = scaler.fit_transform(df[['column']])
print(df.head())
Standardize Data
Prefix: standardize_data
Description: Standardize data using StandardScaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[['column']] = scaler.fit_transform(df[['column']])
print(df.head())
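To make the difference between the two scalers concrete, the sketch below applies both to a tiny made-up column and compares the results with the formulas written out by hand. The column values and the `column_minmax` / `column_standard` names are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

df = pd.DataFrame({'column': [1.0, 2.0, 3.0, 4.0, 5.0]})

# Min-max scaling: (x - min) / (max - min), giving values in [0, 1].
df['column_minmax'] = MinMaxScaler().fit_transform(df[['column']]).ravel()

# Standard scaling: (x - mean) / std, using the population std (ddof=0).
df['column_standard'] = StandardScaler().fit_transform(df[['column']]).ravel()

# The same formulas written out by hand for comparison.
col = df['column'].to_numpy()
manual_minmax = (col - col.min()) / (col.max() - col.min())
manual_standard = (col - col.mean()) / col.std()  # NumPy's std defaults to ddof=0

print(df)
```

Min-max scaling preserves the original range's shape within fixed bounds, while standard scaling centers the data at zero with unit variance; which one fits depends on the downstream model.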
Usage
Open a Python file in VSCode.
Type the prefix of the snippet (e.g., importlibs) and select the snippet from the IntelliSense suggestions.
Customize the placeholder values as needed.
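For readers curious how a prefix, description, and placeholders fit together: VS Code snippets are JSON objects with `prefix`, `body`, and `description` keys, and placeholders are written as `${1:default}`. The sketch below builds one such entry in Python. The entry name and the placeholder positions are assumptions for illustration and may not match this extension's actual snippet file.

```python
import json

# Hypothetical snippet entry in VS Code's snippet JSON format.
# ${1:...} and ${2:...} are tab stops with default text; $0 is the final cursor position.
snippet = {
    "Load CSV with Pandas": {
        "prefix": "loadcsv",
        "description": "Load a CSV file into a Pandas DataFrame",
        "body": [
            "df = pd.read_csv('${1:filename.csv}')",
            "df.drop(['${2:Index}'], axis=1, inplace=True)",
            "print(df.head())",
            "print(df.isnull().sum())",
            "$0",
        ],
    }
}

snippet_json = json.dumps(snippet, indent=2)
print(snippet_json)
```

After the snippet is inserted, pressing Tab moves through the numbered placeholders in order, which is where the values get customized.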
Contributors
Leonardo A. B. Noman