Zatch Bell 23 - MCA AIML Practical Journal

Project Overview
This is a minimal Visual Studio Code extension that presents an MCA Semester 2 AIML Practical Journal on the VS Code Marketplace and on the extension's details page inside VS Code.
The extension is intentionally simple. Its main purpose is documentation presentation, not complex extension functionality.
Objective
- Publish a simple VS Code extension successfully.
- Display AIML practical assignments cleanly in the Marketplace README.
- Keep all practical questions and Python code in one professional document.
- Help students revise AIML practical programs from one place.
Features
- Clean README-based assignment presentation.
- MCA Semester 2 AIML practical topics included.
- Python practical programs for AI search, ML algorithms, preprocessing, and visualization.
- Basic command available in VS Code Command Palette.
- Beginner-friendly extension structure.
Technologies Used
- Visual Studio Code Extension API
- Node.js
- JavaScript
- Python practical programs
- Jupyter Notebook compatible code
- Libraries used in practicals:
  - NumPy
  - Pandas
  - Matplotlib
  - Scikit-learn
Practical Index
| Practical Topic | Questions |
|---|---|
| 1. Water Jug Problem | 7 |
| 2. Tic Tac Toe | 7 |
| 3. Graph Search Algorithms - BFS | 8 |
| 4. DFS | 6 |
| 5. Informed Search - A* Algorithm | 9 |
| 6. Regression - Linear Regression | 9 |
| 7. Classification - Logistic Regression | 8 |
| 8. Decision Tree | 9 |
| 9. KNN | 9 |
| 10. Naive Bayes | 7 |
| 11. SVM | 9 |
| 12. K-Means Clustering | 9 |
| 13. Missing Value Handling | 5 |
| 14. Label Encoding | 8 |
| 15. Data Visualization | 10 |
How It Works
After installation, this extension appears in the VS Code Extensions panel. The content shown on the Marketplace and in VS Code comes from this README.md file automatically.
A simple command is also available:
Zatch Bell 23: Show AIML Practical Info
Run it from the Command Palette to display a welcome message.
Screenshots
Add screenshots here before publishing:
images/screenshot-1.png
images/screenshot-2.png
Recommended screenshots:
- Extension details page in VS Code.
- Practical journal README preview.
- Example Python practical output in Jupyter Notebook.
Installation
npm install -g @vscode/vsce
npm install
vsce package
To test locally:
- Open this folder in VS Code.
- Press F5.
- A new Extension Development Host window opens.
- Open the Command Palette.
- Run Zatch Bell 23: Show AIML Practical Info.
Publishing
vsce login <your-publisher-name>
vsce publish
Author
Armaan Bimalpati
MCA Semester 2
AIML Practical Journal
Conclusion
This is a simple, documentation-based VS Code extension that showcases AIML practical assignments professionally on the VS Code Marketplace.
Complete AIML Practical Content
Water Jug Problem
1. Solve the Water Jug Problem for jugs of capacity 5L and 3L to obtain exactly 4L.
a = 0 # 5L jug
b = 0 # 3L jug
print("Initial State:", (a, b))
# Fill 5L jug
a = 5
print((a, b))
# Pour from 5L to 3L
a = 2
b = 3
print((a, b))
# Empty 3L jug
b = 0
print((a, b))
# Pour remaining 2L into 3L jug
a = 0
b = 2
print((a, b))
# Fill 5L jug again
a = 5
print((a, b))
# Pour from 5L to 3L until 3L jug is full
a = 4
b = 3
print((a, b))
print("Goal Achieved: 4 Liters in 5L Jug")
2. Implement Water Jug Problem using BFS.
from collections import deque
def water_jug_bfs():
    queue = deque([(0, 0)])  # Initial state
    visited = []
    while queue:
        state = queue.popleft()
        if state in visited:
            continue
        visited.append(state)
        print(state)
        a, b = state
        # Goal: 4L in 5L jug
        if a == 4:
            print("Goal Achieved!")
            break
        # Possible moves
        next_states = [
            (5, b),  # Fill 5L jug
            (a, 3),  # Fill 3L jug
            (0, b),  # Empty 5L jug
            (a, 0),  # Empty 3L jug
            (a - min(a, 3 - b), b + min(a, 3 - b)),  # Pour 5L → 3L
            (a + min(b, 5 - a), b - min(b, 5 - a))   # Pour 3L → 5L
        ]
        for s in next_states:
            if s not in visited:
                queue.append(s)
water_jug_bfs()
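The BFS above prints every state it visits, but it does not print the sequence of moves that reaches the goal. The following is a minimal sketch of one way to recover that sequence. It assumes a `parent` dictionary that records the state from which each new state was first reached. The function name `water_jug_path` and its parameters are illustrative and do not come from the practical.

```python
from collections import deque

def water_jug_path(cap_a=5, cap_b=3, target=4):
    """Return the shortest list of states from (0, 0) to a state with
    `target` litres in the first jug, or None if it is unreachable."""
    start = (0, 0)
    parent = {start: None}          # state -> state it was first reached from
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a == target:
            # Walk the parent links back to the start, then reverse them
            path = []
            state = (a, b)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)  # amount poured from jug A into jug B
        pour_ba = min(b, cap_a - a)  # amount poured from jug B into jug A
        for s in [(cap_a, b), (a, cap_b), (0, b), (a, 0),
                  (a - pour_ab, b + pour_ab), (a + pour_ba, b - pour_ba)]:
            if s not in parent:
                parent[s] = (a, b)
                queue.append(s)
    return None

print(water_jug_path())
```

Because BFS expands states level by level, the first goal state it reaches lies at the end of a path with the fewest moves. For the 5L/3L case this is the same six-move sequence that question 1 traces by hand.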
3. Implement Water Jug Problem using DFS.
def water_jug_dfs():
    stack = [(0, 0)]  # Initial state
    visited = []
    while stack:
        state = stack.pop()
        if state in visited:
            continue
        visited.append(state)
        print(state)
        a, b = state
        # Goal: 4L in 5L jug
        if a == 4:
            print("Goal Achieved!")
            break
        # Possible moves
        next_states = [
            (5, b),  # Fill 5L jug
            (a, 3),  # Fill 3L jug
            (0, b),  # Empty 5L jug
            (a, 0),  # Empty 3L jug
            (a - min(a, 3 - b), b + min(a, 3 - b)),  # Pour 5L → 3L
            (a + min(b, 5 - a), b - min(b, 5 - a))   # Pour 3L → 5L
        ]
        for s in next_states:
            if s not in visited:
                stack.append(s)
water_jug_dfs()
4. Display the complete state-space tree for the Water Jug Problem.
from collections import deque
visited = set()
queue = deque([(0, 0)])
print("State Space Tree:")
while queue:
    state = queue.popleft()
    if state in visited:
        continue
    visited.add(state)
    print(state)
    a, b = state
    next_states = [
        (5, b),  # Fill 5L
        (a, 3),  # Fill 3L
        (0, b),  # Empty 5L
        (a, 0),  # Empty 3L
        (a - min(a, 3 - b), b + min(a, 3 - b)),  # Pour 5L -> 3L
        (a + min(b, 5 - a), b - min(b, 5 - a))   # Pour 3L -> 5L
    ]
    for s in next_states:
        if s not in visited:
            queue.append(s)
5. Write a program to solve the Water Jug Problem using State Space Search.
from collections import deque
def water_jug():
    queue = deque([(0, 0)])  # Initial state
    visited = set()
    while queue:
        state = queue.popleft()
        if state in visited:
            continue
        visited.add(state)
        print(state)
        a, b = state
        # Goal State: 4L in 5L jug
        if a == 4:
            print("Goal Achieved!")
            return
        # Generate next states
        next_states = [
            (5, b),  # Fill 5L jug
            (a, 3),  # Fill 3L jug
            (0, b),  # Empty 5L jug
            (a, 0),  # Empty 3L jug
            (a - min(a, 3 - b), b + min(a, 3 - b)),  # Pour 5L -> 3L
            (a + min(b, 5 - a), b - min(b, 5 - a))   # Pour 3L -> 5L
        ]
        for s in next_states:
            if s not in visited:
                queue.append(s)
water_jug()
6. Obtain exactly 2 liters using 4L and 3L jugs.
a = 0 # 4L jug
b = 0 # 3L jug
print((a, b))
# Fill 3L jug
b = 3
print((a, b))
# Pour into 4L jug
a = 3
b = 0
print((a, b))
# Fill 3L jug again
b = 3
print((a, b))
# Pour until 4L jug is full
a = 4
b = 2
print((a, b))
print("Goal Achieved: 2 Liters in 3L Jug")
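You can check whether a target amount is measurable at all before running any search. For two jugs with capacities a and b, a well-known result that follows from Bézout's identity says that some jug can hold exactly d litres if and only if d ≤ max(a, b) and d is a multiple of gcd(a, b). The following is a small sketch of that check. The function name `is_measurable` is illustrative.

```python
from math import gcd

def is_measurable(cap_a, cap_b, target):
    """True if some jug can end up holding exactly `target` litres."""
    if target > max(cap_a, cap_b):
        return False          # no single jug can hold that much
    return target % gcd(cap_a, cap_b) == 0

print(is_measurable(5, 3, 4))  # True: 4L with 5L and 3L jugs
print(is_measurable(4, 3, 2))  # True: 2L with 4L and 3L jugs
print(is_measurable(6, 3, 4))  # False: gcd is 3, and 4 is not a multiple of 3
```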
7. Display all intermediate states of the Water Jug Problem.
states = [
    (0, 0),
    (5, 0),
    (2, 3),
    (2, 0),
    (0, 2),
    (5, 2),
    (4, 3)
]
print("Intermediate States:")
for state in states:
    print(state)
print("Goal Achieved!")
-----------------------------------------------------------------------------------------------
Tic Tac Toe
1. Implement Tic Tac Toe with Minimax Algorithm.
import math
board = [' ' for _ in range(9)]
def print_board():
    for i in range(0, 9, 3):
        print(board[i], '|', board[i+1], '|', board[i+2])
        if i < 6:
            print("---------")
def check_winner(player):
    win_pos = [
        [0,1,2],[3,4,5],[6,7,8],
        [0,3,6],[1,4,7],[2,5,8],
        [0,4,8],[2,4,6]
    ]
    for pos in win_pos:
        if board[pos[0]] == board[pos[1]] == board[pos[2]] == player:
            return True
    return False
def is_draw():
    return ' ' not in board
def minimax(is_max):
    if check_winner('O'):
        return 1
    if check_winner('X'):
        return -1
    if is_draw():
        return 0
    if is_max:
        best = -math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'O'
                best = max(best, minimax(False))
                board[i] = ' '
        return best
    else:
        best = math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                best = min(best, minimax(True))
                board[i] = ' '
        return best
def computer_move():
    best_score = -math.inf
    move = -1
    for i in range(9):
        if board[i] == ' ':
            board[i] = 'O'
            score = minimax(False)
            board[i] = ' '
            if score > best_score:
                best_score = score
                move = i
    board[move] = 'O'
while True:
    print_board()
    pos = int(input("Enter position (1-9): ")) - 1
    # Reject positions outside 1-9 (an input of 0 would become index -1) and occupied cells
    if 0 <= pos <= 8 and board[pos] == ' ':
        board[pos] = 'X'
    else:
        print("Invalid Move")
        continue
    if check_winner('X'):
        print_board()
        print("You Win!")
        break
    if is_draw():
        print_board()
        print("Match Draw!")
        break
    computer_move()
    if check_winner('O'):
        print_board()
        print("Computer Wins!")
        break
    if is_draw():
        print_board()
        print("Match Draw!")
        break
2. Modify Tic Tac Toe to allow Human vs Computer mode.
import random
board = [' '] * 9
def show_board():
    print()
    print(board[0], '|', board[1], '|', board[2])
    print("---------")
    print(board[3], '|', board[4], '|', board[5])
    print("---------")
    print(board[6], '|', board[7], '|', board[8])
    print()
def check_win(player):
    wins = [
        [0,1,2], [3,4,5], [6,7,8],
        [0,3,6], [1,4,7], [2,5,8],
        [0,4,8], [2,4,6]
    ]
    for w in wins:
        if board[w[0]] == board[w[1]] == board[w[2]] == player:
            return True
    return False
while True:
    show_board()
    # Human Move
    pos = int(input("Enter position (1-9): ")) - 1
    if pos < 0 or pos > 8:
        print("Enter a number between 1 and 9")
        continue
    if board[pos] != ' ':
        print("Position already occupied!")
        continue
    board[pos] = 'X'
    if check_win('X'):
        show_board()
        print("Human Wins!")
        break
    if ' ' not in board:
        show_board()
        print("Draw!")
        break
    # Computer Move (random empty cell)
    empty = [i for i in range(9) if board[i] == ' ']
    move = random.choice(empty)
    board[move] = 'O'
    print("Computer Played")
    if check_win('O'):
        show_board()
        print("Computer Wins!")
        break
    if ' ' not in board:
        show_board()
        print("Draw!")
        break
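The computer above picks a random empty cell, so it is easy to beat. A common middle ground between random play and full Minimax is a rule-based move:

1. Take a winning cell if one exists.
2. Otherwise, block the opponent's winning cell.
3. Otherwise, take the centre.
4. Otherwise, take any empty cell.

The following sketch is an illustrative heuristic, not Minimax. The helper names `winning_cell` and `smart_move` are hypothetical.

```python
import random

WINS = [[0,1,2],[3,4,5],[6,7,8],[0,3,6],[1,4,7],[2,5,8],[0,4,8],[2,4,6]]

def winning_cell(board, player):
    """Return an empty cell that completes a line for `player`, or None."""
    for a, b, c in WINS:
        line = [board[a], board[b], board[c]]
        if line.count(player) == 2 and line.count(' ') == 1:
            return [a, b, c][line.index(' ')]
    return None

def smart_move(board, me='O', opponent='X'):
    move = winning_cell(board, me)            # 1. win if possible
    if move is None:
        move = winning_cell(board, opponent)  # 2. otherwise block
    if move is None and board[4] == ' ':
        move = 4                              # 3. otherwise take the centre
    if move is None:                          # 4. otherwise any empty cell
        move = random.choice([i for i in range(9) if board[i] == ' '])
    return move
```

In the game loop, `move = smart_move(board)` would replace `move = random.choice(empty)`.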
3. Implement Alpha-Beta Pruning for Tic Tac Toe.
import math
board = [' '] * 9
def print_board():
    for i in range(0, 9, 3):
        print(board[i], '|', board[i+1], '|', board[i+2])
        if i < 6:
            print("---------")
def winner(player):
    wins = [
        [0,1,2], [3,4,5], [6,7,8],
        [0,3,6], [1,4,7], [2,5,8],
        [0,4,8], [2,4,6]
    ]
    for w in wins:
        if board[w[0]] == board[w[1]] == board[w[2]] == player:
            return True
    return False
def draw():
    return ' ' not in board
def alphabeta(is_max, alpha, beta):
    if winner('O'):
        return 1
    if winner('X'):
        return -1
    if draw():
        return 0
    if is_max:
        best = -math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'O'
                score = alphabeta(False, alpha, beta)
                board[i] = ' '
                best = max(best, score)
                alpha = max(alpha, best)
                if beta <= alpha:
                    break  # prune: MIN will never allow this branch
        return best
    else:
        best = math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                score = alphabeta(True, alpha, beta)
                board[i] = ' '
                best = min(best, score)
                beta = min(beta, best)
                if beta <= alpha:
                    break  # prune: MAX will never allow this branch
        return best
def computer_move():
    best_score = -math.inf
    move = -1
    for i in range(9):
        if board[i] == ' ':
            board[i] = 'O'
            score = alphabeta(False, -math.inf, math.inf)
            board[i] = ' '
            if score > best_score:
                best_score = score
                move = i
    board[move] = 'O'
while True:
    print_board()
    pos = int(input("Enter position (1-9): ")) - 1
    if pos < 0 or pos > 8 or board[pos] != ' ':
        print("Invalid Move")
        continue
    board[pos] = 'X'
    if winner('X'):
        print_board()
        print("You Win!")
        break
    if draw():
        print_board()
        print("Match Draw!")
        break
    computer_move()
    if winner('O'):
        print_board()
        print("Computer Wins!")
        break
    if draw():
        print_board()
        print("Match Draw!")
        break
4. Display evaluation scores for all possible moves.
import math
board = [' '] * 9
def print_board():
    print()
    print(board[0], '|', board[1], '|', board[2])
    print("---------")
    print(board[3], '|', board[4], '|', board[5])
    print("---------")
    print(board[6], '|', board[7], '|', board[8])
    print()
def winner(player):
    wins = [
        [0,1,2], [3,4,5], [6,7,8],
        [0,3,6], [1,4,7], [2,5,8],
        [0,4,8], [2,4,6]
    ]
    for w in wins:
        if board[w[0]] == board[w[1]] == board[w[2]] == player:
            return True
    return False
def draw():
    return ' ' not in board
def minimax(is_max):
    if winner('O'):
        return 1
    if winner('X'):
        return -1
    if draw():
        return 0
    if is_max:
        best = -math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'O'
                best = max(best, minimax(False))
                board[i] = ' '
        return best
    else:
        best = math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                best = min(best, minimax(True))
                board[i] = ' '
        return best
def computer_move():
    best_score = -math.inf
    best_move = -1
    print("\nEvaluation Scores:")
    for i in range(9):
        if board[i] == ' ':
            board[i] = 'O'
            score = minimax(False)
            board[i] = ' '
            print("Position", i + 1, "-> Score =", score)
            if score > best_score:
                best_score = score
                best_move = i
    board[best_move] = 'O'
    print("Computer chooses position", best_move + 1)
while True:
    print_board()
    try:
        pos = int(input("Enter position (1-9): ")) - 1
        if pos < 0 or pos > 8:
            print("Enter a number between 1 and 9")
            continue
        if board[pos] != ' ':
            print("Position already occupied")
            continue
    except ValueError:
        print("Please enter only numbers")
        continue
    board[pos] = 'X'
    if winner('X'):
        print_board()
        print("You Win!")
        break
    if draw():
        print_board()
        print("Match Draw!")
        break
    computer_move()
    if winner('O'):
        print_board()
        print("Computer Wins!")
        break
    if draw():
        print_board()
        print("Match Draw!")
        break
5. Compare Minimax and Alpha-Beta Pruning results.
import math
board = [' '] * 9
minimax_nodes = 0
alphabeta_nodes = 0
def winner(player):
    wins = [
        [0,1,2],[3,4,5],[6,7,8],
        [0,3,6],[1,4,7],[2,5,8],
        [0,4,8],[2,4,6]
    ]
    for w in wins:
        if board[w[0]] == board[w[1]] == board[w[2]] == player:
            return True
    return False
def draw():
    return ' ' not in board
def minimax(is_max):
    global minimax_nodes
    minimax_nodes += 1
    if winner('O'):
        return 1
    if winner('X'):
        return -1
    if draw():
        return 0
    if is_max:
        best = -math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'O'
                best = max(best, minimax(False))
                board[i] = ' '
        return best
    else:
        best = math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                best = min(best, minimax(True))
                board[i] = ' '
        return best
def alphabeta(is_max, alpha, beta):
    global alphabeta_nodes
    alphabeta_nodes += 1
    if winner('O'):
        return 1
    if winner('X'):
        return -1
    if draw():
        return 0
    if is_max:
        best = -math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'O'
                best = max(best, alphabeta(False, alpha, beta))
                board[i] = ' '
                alpha = max(alpha, best)
                if beta <= alpha:
                    break
        return best
    else:
        best = math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                best = min(best, alphabeta(True, alpha, beta))
                board[i] = ' '
                beta = min(beta, best)
                if beta <= alpha:
                    break
        return best
# Compare Results
minimax_score = minimax(True)
alphabeta_score = alphabeta(True, -math.inf, math.inf)
print("Minimax Score:", minimax_score)
print("Minimax Nodes Explored:", minimax_nodes)
print()
print("Alpha-Beta Score:", alphabeta_score)
print("Alpha-Beta Nodes Explored:", alphabeta_nodes)
7. Write a program where AI never loses in Tic Tac Toe.
import math
board = [' '] * 9
def print_board():
    print()
    print(board[0], '|', board[1], '|', board[2])
    print("---------")
    print(board[3], '|', board[4], '|', board[5])
    print("---------")
    print(board[6], '|', board[7], '|', board[8])
    print()
def winner(player):
    wins = [
        [0,1,2],[3,4,5],[6,7,8],
        [0,3,6],[1,4,7],[2,5,8],
        [0,4,8],[2,4,6]
    ]
    for w in wins:
        if board[w[0]] == board[w[1]] == board[w[2]] == player:
            return True
    return False
def draw():
    return ' ' not in board
def minimax(is_max):
    if winner('O'):
        return 1
    if winner('X'):
        return -1
    if draw():
        return 0
    if is_max:
        best = -math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'O'
                best = max(best, minimax(False))
                board[i] = ' '
        return best
    else:
        best = math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                best = min(best, minimax(True))
                board[i] = ' '
        return best
def ai_move():
    best_score = -math.inf
    move = -1
    for i in range(9):
        if board[i] == ' ':
            board[i] = 'O'
            score = minimax(False)
            board[i] = ' '
            if score > best_score:
                best_score = score
                move = i
    board[move] = 'O'
while True:
    print_board()
    try:
        pos = int(input("Enter position (1-9): ")) - 1
        if pos < 0 or pos > 8:
            print("Enter a number between 1 and 9")
            continue
        if board[pos] != ' ':
            print("Position already occupied")
            continue
    except ValueError:
        print("Please enter numbers only")
        continue
    board[pos] = 'X'
    if winner('X'):
        print_board()
        print("You Win!")
        break
    if draw():
        print_board()
        print("Match Draw!")
        break
    ai_move()
    if winner('O'):
        print_board()
        print("AI Wins!")
        break
    if draw():
        print_board()
        print("Match Draw!")
        break
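The claim that the AI never loses can be checked without playing by hand. The following sketch plays every possible sequence of human (X) moves against the Minimax reply and reports whether any game ends in an X win. It makes these assumptions:

- The board is stored as an immutable tuple, so `functools.lru_cache` can memoise `minimax`.
- It uses the same +1 / -1 / 0 scoring as the program above.
- The names `WIN_LINES`, `best_o_move` and `ai_can_lose` are hypothetical.

```python
from functools import lru_cache

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, is_max):
    """+1 if O wins, -1 if X wins, 0 for a draw, with O maximising."""
    w = winner(board)
    if w == 'O':
        return 1
    if w == 'X':
        return -1
    if ' ' not in board:
        return 0
    mark = 'O' if is_max else 'X'
    scores = [minimax(board[:i] + (mark,) + board[i + 1:], not is_max)
              for i in range(9) if board[i] == ' ']
    return max(scores) if is_max else min(scores)

def best_o_move(board):
    """Index of the empty cell with the highest Minimax score for O."""
    empty = [i for i in range(9) if board[i] == ' ']
    return max(empty, key=lambda i: minimax(board[:i] + ('O',) + board[i + 1:], False))

def ai_can_lose(board=(' ',) * 9):
    """Try every human (X) move; the AI (O) always answers with best_o_move."""
    for i in range(9):
        if board[i] != ' ':
            continue
        after_x = board[:i] + ('X',) + board[i + 1:]
        if winner(after_x) == 'X':
            return True                 # the human found a win
        if ' ' not in after_x:
            continue                    # board full: draw
        j = best_o_move(after_x)
        after_o = after_x[:j] + ('O',) + after_x[j + 1:]
        if winner(after_o) == 'O' or ' ' not in after_o:
            continue                    # AI won, or the game is drawn
        if ai_can_lose(after_o):
            return True
    return False

print("Can the AI ever lose?", ai_can_lose())
```

Memoisation keeps this fast because Tic Tac Toe has only a few thousand distinct reachable positions. The full game tree explored by the uncached Minimax above is far larger.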
8. Evaluate best move using Minimax.
import math
board = [
    'X', 'O', 'X',
    ' ', 'O', ' ',
    ' ', ' ', 'X'
]
def winner(player):
    wins = [
        [0,1,2],[3,4,5],[6,7,8],
        [0,3,6],[1,4,7],[2,5,8],
        [0,4,8],[2,4,6]
    ]
    for w in wins:
        if board[w[0]] == board[w[1]] == board[w[2]] == player:
            return True
    return False
def draw():
    return ' ' not in board
def minimax(is_max):
    if winner('O'):
        return 1
    if winner('X'):
        return -1
    if draw():
        return 0
    if is_max:
        best = -math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'O'
                best = max(best, minimax(False))
                board[i] = ' '
        return best
    else:
        best = math.inf
        for i in range(9):
            if board[i] == ' ':
                board[i] = 'X'
                best = min(best, minimax(True))
                board[i] = ' '
        return best
best_score = -math.inf
best_move = -1
print("Evaluation Scores:")
for i in range(9):
    if board[i] == ' ':
        board[i] = 'O'
        score = minimax(False)
        board[i] = ' '
        print("Position", i + 1, "Score =", score)
        if score > best_score:
            best_score = score
            best_move = i
print("\nBest Move =", best_move + 1)
print("Best Score =", best_score)
-----------------------------------------------------------------------------------------------
Unit 2: Graph Search Algorithms
BFS
1. Implement BFS for a graph with at least 8 nodes.
from collections import deque
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': ['H'],
    'F': [],
    'G': [],
    'H': []
}
visited = []
queue = deque()
start = 'A'
visited.append(start)
queue.append(start)
print("BFS Traversal:")
while queue:
    node = queue.popleft()
    print(node, end=" ")
    for neighbor in graph[node]:
        if neighbor not in visited:
            visited.append(neighbor)
            queue.append(neighbor)
2. Find the shortest path between two given nodes using BFS.
from collections import deque
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': ['H'],
    'F': [],
    'G': [],
    'H': []
}
def bfs_shortest_path(start, goal):
    queue = deque([[start]])
    visited = []
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if node not in visited:
            visited.append(node)
            for neighbor in graph[node]:
                new_path = list(path)
                new_path.append(neighbor)
                queue.append(new_path)
    return None
start = 'A'
goal = 'H'
path = bfs_shortest_path(start, goal)
print("Shortest Path:")
print(" -> ".join(path))
3. Traverse a tree using BFS and display level-wise nodes.
from collections import deque
tree = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': [],
    'F': ['H'],
    'G': [],
    'H': []
}
queue = deque([('A', 0)])
current_level = -1
while queue:
    node, level = queue.popleft()
    if level != current_level:
        current_level = level
        print("\nLevel", level, ":", end=" ")
    print(node, end=" ")
    for child in tree[node]:
        queue.append((child, level + 1))
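The program above stores the depth alongside each node. A common alternative keeps only nodes in the queue and uses the queue length at the start of each round: every node waiting in the queue at that moment belongs to the same level. The following sketch returns the levels as lists instead of printing them. The function name `levels` is illustrative.

```python
from collections import deque

tree = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [], 'E': [], 'F': ['H'], 'G': [], 'H': []
}

def levels(tree, root):
    """Group the nodes of a tree by depth using BFS."""
    result = []
    queue = deque([root])
    while queue:
        level_size = len(queue)        # all nodes now queued share one level
        level = []
        for _ in range(level_size):
            node = queue.popleft()
            level.append(node)
            queue.extend(tree[node])   # children form the next level
        result.append(level)
    return result

for depth, nodes in enumerate(levels(tree, 'A')):
    print("Level", depth, ":", " ".join(nodes))
```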
4. Compare BFS traversal using adjacency matrix and adjacency list.
BFS Using Adjacency Matrix
from collections import deque
nodes = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
graph = [
    [0,1,1,0,0,0,0],  # A
    [1,0,0,1,1,0,0],  # B
    [1,0,0,0,0,1,0],  # C
    [0,1,0,0,0,0,0],  # D
    [0,1,0,0,0,0,1],  # E
    [0,0,1,0,0,0,0],  # F
    [0,0,0,0,1,0,0]   # G
]
visited = [False] * len(nodes)
queue = deque([0])
visited[0] = True
print("BFS using Adjacency Matrix:")
while queue:
    v = queue.popleft()
    print(nodes[v], end=" ")
    for i in range(len(nodes)):
        if graph[v][i] == 1 and not visited[i]:
            visited[i] = True
            queue.append(i)
BFS Using Adjacency List
from collections import deque
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F'],
    'D': [],
    'E': ['G'],
    'F': [],
    'G': []
}
visited = []
queue = deque(['A'])
visited.append('A')
print("BFS using Adjacency List:")
while queue:
    node = queue.popleft()
    print(node, end=" ")
    for neighbor in graph[node]:
        if neighbor not in visited:
            visited.append(neighbor)
            queue.append(neighbor)
5. Implement BFS for a social network graph.
from collections import deque
social_network = {
    'Alice': ['Bob', 'Charlie'],
    'Bob': ['Alice', 'David', 'Eve'],
    'Charlie': ['Alice', 'Frank'],
    'David': ['Bob'],
    'Eve': ['Bob', 'Grace'],
    'Frank': ['Charlie'],
    'Grace': ['Eve']
}
visited = []
queue = deque()
start = 'Alice'
visited.append(start)
queue.append(start)
print("BFS Traversal of Social Network:")
while queue:
    person = queue.popleft()
    print(person, end=" ")
    for friend in social_network[person]:
        if friend not in visited:
            visited.append(friend)
            queue.append(friend)
6. Implement BFS traversal of a graph.
from collections import deque
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': [],
    'F': [],
    'G': []
}
visited = []
queue = deque()
start = 'A'
visited.append(start)
queue.append(start)
print("BFS Traversal:")
while queue:
    node = queue.popleft()
    print(node, end=" ")
    for neighbor in graph[node]:
        if neighbor not in visited:
            visited.append(neighbor)
            queue.append(neighbor)
7. Find BFS traversal starting from node A.
from collections import deque
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': [],
    'F': [],
    'G': []
}
visited = []
queue = deque()
start = 'A'
visited.append(start)
queue.append(start)
print("BFS Traversal starting from A:")
while queue:
    node = queue.popleft()
    print(node, end=" ")
    for neighbor in graph[node]:
        if neighbor not in visited:
            visited.append(neighbor)
            queue.append(neighbor)
8. Implement BFS using Queue.
from collections import deque
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': [],
    'F': [],
    'G': []
}
visited = []
queue = deque()
start = 'A'
visited.append(start)
queue.append(start)
print("BFS Traversal:")
while queue:
    node = queue.popleft()  # Remove from front
    print(node, end=" ")
    for neighbor in graph[node]:
        if neighbor not in visited:
            visited.append(neighbor)
            queue.append(neighbor)  # Insert at rear
-----------------------------------------------------------------------------------------------
DFS
1. Implement recursive DFS.
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': [],
    'F': [],
    'G': []
}
visited = []
def dfs(node):
    if node not in visited:
        print(node, end=" ")
        visited.append(node)
        for neighbor in graph[node]:
            dfs(neighbor)
print("DFS Traversal:")
dfs('A')
2. Implement iterative DFS using stack.
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': [],
    'F': [],
    'G': []
}
visited = []
stack = ['A']
print("DFS Traversal:")
while stack:
    node = stack.pop()
    if node not in visited:
        print(node, end=" ")
        visited.append(node)
        # Add neighbors in reverse order so the first neighbor is visited first
        for neighbor in reversed(graph[node]):
            stack.append(neighbor)
3. Detect cycles in a graph using DFS.
graph = {
    'A': ['B', 'D'],
    'B': ['A', 'C'],
    'C': ['B', 'D'],
    'D': ['A', 'C']
}
visited = set()
def dfs(node, parent):
    visited.add(node)
    for neighbor in graph[node]:
        if neighbor not in visited:
            if dfs(neighbor, node):
                return True
        elif neighbor != parent:
            return True  # visited neighbor that is not our parent: cycle
    return False
if dfs('A', None):
    print("Cycle Detected")
else:
    print("No Cycle Found")
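The parent check above is correct for undirected graphs. In a directed graph it can report false cycles. For example, with A→B→D and A→C→D, the search reaches D a second time through C, and D is not C's parent, so the check flags a cycle even though there is none. The usual directed-graph variant colours each node: white (unvisited), gray (on the current DFS path) or black (finished). A cycle exists exactly when DFS meets a gray node. The following is a sketch with the hypothetical name `has_cycle_directed`.

```python
def has_cycle_directed(graph):
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {node: WHITE for node in graph}

    def dfs(node):
        color[node] = GRAY
        for neighbor in graph[node]:
            if color[neighbor] == GRAY:    # back edge to an ancestor: cycle
                return True
            if color[neighbor] == WHITE and dfs(neighbor):
                return True
        color[node] = BLACK                # every path from here is explored
        return False

    # Start a DFS from every node not already finished
    return any(color[n] == WHITE and dfs(n) for n in graph)

dag = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
cyclic = {'A': ['B'], 'B': ['C'], 'C': ['A']}
print(has_cycle_directed(dag))     # False
print(has_cycle_directed(cyclic))  # True
```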
4. Find connected components using DFS.
graph = {
    'A': ['B', 'C'],
    'B': ['A'],
    'C': ['A'],
    'D': ['E'],
    'E': ['D'],
    'F': []
}
visited = set()
def dfs(node):
    visited.add(node)
    print(node, end=" ")
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs(neighbor)
print("Connected Components:")
for node in graph:
    if node not in visited:
        print("\nComponent:", end=" ")
        dfs(node)
5. Compare DFS and BFS traversal orders.
from collections import deque
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': [],
    'F': [],
    'G': []
}
# DFS
visited = []
def dfs(node):
    if node not in visited:
        print(node, end=" ")
        visited.append(node)
        for neighbor in graph[node]:
            dfs(neighbor)
print("DFS Traversal:")
dfs('A')
print("\n")
# BFS
visited = []
queue = deque(['A'])
visited.append('A')
print("BFS Traversal:")
while queue:
    node = queue.popleft()
    print(node, end=" ")
    for neighbor in graph[node]:
        if neighbor not in visited:
            visited.append(neighbor)
            queue.append(neighbor)
6. Perform DFS using stack.
graph = {
    'A': ['B', 'C'],
    'B': ['D', 'E'],
    'C': ['F', 'G'],
    'D': [],
    'E': [],
    'F': [],
    'G': []
}
visited = []
stack = ['A']
print("DFS Traversal using Stack:")
while stack:
    node = stack.pop()
    if node not in visited:
        print(node, end=" ")
        visited.append(node)
        for neighbor in reversed(graph[node]):
            stack.append(neighbor)
-----------------------------------------------------------------------------------------------
Informed Search
A* Algorithm
1. Implement A* for route finding between cities.
graph = {
    'A': {'B': 4, 'C': 3},
    'B': {'D': 5, 'E': 12},
    'C': {'F': 7},
    'D': {'G': 3},
    'E': {'G': 2},
    'F': {'G': 2},
    'G': {}
}
heuristic = {
    'A': 10,
    'B': 8,
    'C': 7,
    'D': 3,
    'E': 2,
    'F': 1,
    'G': 0
}
def astar(start, goal):
    # Each entry is (f, g, node): g is the real cost so far, f = g + h orders the list
    open_list = [(heuristic[start], 0, start)]
    visited = {}
    while open_list:
        open_list.sort()
        f, cost, current = open_list.pop(0)
        if current == goal:
            return cost
        if current in visited:
            continue
        visited[current] = True
        for neighbor, distance in graph[current].items():
            if neighbor not in visited:
                g = cost + distance
                open_list.append((g + heuristic[neighbor], g, neighbor))
    return None
start = 'A'
goal = 'G'
result = astar(start, goal)
print("Shortest path cost from", start, "to", goal, "=", result)
2. Solve the 8-Puzzle Problem using A*.
from queue import PriorityQueue
goal = [1, 2, 3,
        4, 5, 6,
        7, 8, 0]
start = [1, 2, 3,
         4, 0, 6,
         7, 5, 8]
def heuristic(state):
    # h(n): number of misplaced tiles (the blank 0 is not counted)
    count = 0
    for i in range(9):
        if state[i] != 0 and state[i] != goal[i]:
            count += 1
    return count
def get_neighbors(state):
    neighbors = []
    pos = state.index(0)
    moves = {
        0:[1,3], 1:[0,2,4], 2:[1,5],
        3:[0,4,6], 4:[1,3,5,7], 5:[2,4,8],
        6:[3,7], 7:[4,6,8], 8:[5,7]
    }
    for move in moves[pos]:
        new_state = state[:]
        new_state[pos], new_state[move] = new_state[move], new_state[pos]
        neighbors.append(new_state)
    return neighbors
pq = PriorityQueue()
# Queue entries are (f, g, state) with g = moves made so far and f = g + h
pq.put((heuristic(start), 0, start))
visited = []
while not pq.empty():
    f, g, state = pq.get()
    if state == goal:
        print("Goal Reached in", g, "moves!")
        print(state[0:3])
        print(state[3:6])
        print(state[6:9])
        break
    if state in visited:
        continue
    visited.append(state)
    for neighbor in get_neighbors(state):
        if neighbor not in visited:
            f = (g + 1) + heuristic(neighbor)
            pq.put((f, g + 1, neighbor))
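Not every 8-puzzle arrangement can reach the goal. On a 3×3 board, a standard result is that a state can reach the goal 1..8 followed by the blank exactly when its inversion count is even. The inversion count is the number of tile pairs that appear in the wrong relative order, ignoring the blank. Checking this first avoids running A* on an unsolvable start. The following sketch uses the illustrative name `is_solvable` and the same flattened list format as above.

```python
def is_solvable(state):
    """state: flattened 3x3 puzzle, 0 = blank. True if the goal
    [1, 2, 3, 4, 5, 6, 7, 8, 0] is reachable."""
    tiles = [t for t in state if t != 0]
    inversions = 0
    for i in range(len(tiles)):
        for j in range(i + 1, len(tiles)):
            if tiles[i] > tiles[j]:
                inversions += 1
    return inversions % 2 == 0

print(is_solvable([1, 2, 3, 4, 0, 6, 7, 5, 8]))  # True: inversions (6,5) and (7,5)
print(is_solvable([1, 2, 3, 4, 5, 6, 8, 7, 0]))  # False: one inversion (8,7)
```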
3. Compare A* and BFS path lengths.
from collections import deque
graph = {
    'A': {'B': 4, 'C': 3},
    'B': {'D': 5},
    'C': {'F': 7},
    'D': {'G': 3},
    'F': {'G': 2},
    'G': {}
}
heuristic = {
    'A': 7,
    'B': 5,
    'C': 4,
    'D': 2,
    'F': 1,
    'G': 0
}
# BFS minimises the number of edges; A* minimises the total edge cost
def bfs(start, goal):
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph[node]:
            new_path = path + [neighbor]
            queue.append(new_path)
# A*
def astar(start, goal):
    # Each entry is (f, g, node, path); g is the real path cost, f = g + h
    open_list = [(heuristic[start], 0, start, [start])]
    while open_list:
        open_list.sort()
        f, cost, node, path = open_list.pop(0)
        if node == goal:
            return path, cost
        for neighbor, distance in graph[node].items():
            g = cost + distance
            open_list.append((g + heuristic[neighbor], g, neighbor, path + [neighbor]))
    return None
bfs_path = bfs('A', 'G')
astar_path, astar_cost = astar('A', 'G')
print("BFS Path :", " -> ".join(bfs_path))
print("BFS Length :", len(bfs_path)-1)
print()
print("A* Path :", " -> ".join(astar_path))
print("A* Length :", len(astar_path)-1)
print("A* Cost :", astar_cost)
4. Calculate heuristic values manually and verify using code.
heuristic = {
    'A': 7,
    'B': 5,
    'C': 4,
    'D': 2,
    'G': 0
}
print("Heuristic Values:")
for node in heuristic:
    print(node, "=", heuristic[node])
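Printing the table shows the heuristic values, but it does not verify them. A useful code check is admissibility: h(n) must never exceed h*(n), the real cheapest cost from n to the goal. The following sketch computes h*(n) by running Dijkstra's algorithm from the goal over reversed edges. It assumes the weighted graph from practical 3 above, whose heuristic values match these and add F = 1. The function name `true_cost_to_goal` is hypothetical.

```python
import heapq

# Graph and heuristic copied from practical 3 (assumed to be the graph
# these heuristic values were calculated for)
graph = {
    'A': {'B': 4, 'C': 3},
    'B': {'D': 5},
    'C': {'F': 7},
    'D': {'G': 3},
    'F': {'G': 2},
    'G': {}
}
heuristic = {'A': 7, 'B': 5, 'C': 4, 'D': 2, 'F': 1, 'G': 0}

def true_cost_to_goal(graph, goal):
    """Dijkstra from the goal over reversed edges gives h*(n) for every n."""
    reverse = {node: [] for node in graph}
    for u, edges in graph.items():
        for v, w in edges.items():
            reverse[v].append((u, w))
    dist = {goal: 0}
    pq = [(0, goal)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist[node]:
            continue                      # stale queue entry
        for prev, w in reverse[node]:
            if d + w < dist.get(prev, float('inf')):
                dist[prev] = d + w
                heapq.heappush(pq, (d + w, prev))
    return dist

h_star = true_cost_to_goal(graph, 'G')
for node in graph:
    ok = heuristic[node] <= h_star[node]
    print(node, "h =", heuristic[node], "h* =", h_star[node],
          "admissible" if ok else "OVERESTIMATES")
```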
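5. Implement A* using Manhattan Distance heuristic.
The evaluation function used is:
f(n) = g(n) + h(n)
where g(n) is the number of moves made so far and h(n) is the total Manhattan distance of every tile from its goal position.
from queue import PriorityQueue
goal = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 0]
]
start = [
    [1, 2, 3],
    [4, 0, 6],
    [7, 5, 8]
]
# Manhattan Distance Heuristic
def manhattan(state):
    distance = 0
    for i in range(3):
        for j in range(3):
            value = state[i][j]
            if value != 0:
                for x in range(3):
                    for y in range(3):
                        if goal[x][y] == value:
                            distance += abs(i - x) + abs(j - y)
    return distance
# States reachable by sliding one tile into the blank
def get_neighbors(state):
    for i in range(3):
        for j in range(3):
            if state[i][j] == 0:
                bi, bj = i, j
    neighbors = []
    for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        ni, nj = bi + di, bj + dj
        if 0 <= ni < 3 and 0 <= nj < 3:
            new_state = [row[:] for row in state]
            new_state[bi][bj], new_state[ni][nj] = new_state[ni][nj], new_state[bi][bj]
            neighbors.append(new_state)
    return neighbors
# A* Search: queue entries are (f, g, state)
pq = PriorityQueue()
h = manhattan(start)
g = 0
f = g + h
pq.put((f, g, start))
visited = []
while not pq.empty():
    f, g, state = pq.get()
    if state in visited:
        continue
    visited.append(state)
    print("Current State:")
    for row in state:
        print(row)
    print("g =", g, " h =", manhattan(state), " f =", f)
    if state == goal:
        print("\nGoal Reached!")
        break
    for neighbor in get_neighbors(state):
        if neighbor not in visited:
            pq.put((g + 1 + manhattan(neighbor), g + 1, neighbor))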
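Both the misplaced-tiles count from question 2 and the Manhattan distance are admissible for the 8-puzzle. Manhattan is never smaller, because every misplaced tile is at least one move from its goal square. It is therefore at least as well informed and usually lets A* expand fewer states. The following sketch computes both values for one illustrative state in which tiles 1, 2 and 5 are rotated. That state is an assumption and is solvable. The function names are hypothetical.

```python
def misplaced(state, goal):
    """Number of non-blank tiles not on their goal square."""
    return sum(1 for i in range(3) for j in range(3)
               if state[i][j] != 0 and state[i][j] != goal[i][j])

def manhattan(state, goal):
    """Sum over tiles of |row - goal_row| + |col - goal_col|."""
    where = {goal[i][j]: (i, j) for i in range(3) for j in range(3)}
    total = 0
    for i in range(3):
        for j in range(3):
            value = state[i][j]
            if value != 0:
                gi, gj = where[value]
                total += abs(i - gi) + abs(j - gj)
    return total

goal = [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
example = [[5, 1, 3], [4, 2, 6], [7, 8, 0]]   # illustrative state
print("Misplaced tiles =", misplaced(example, goal))   # 3
print("Manhattan       =", manhattan(example, goal))   # 4: tile 5 is two moves away
```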
6. Implement A* Search Algorithm.
graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'D': 2, 'E': 5},
    'C': {'F': 3},
    'D': {'G': 1},
    'E': {'G': 2},
    'F': {'G': 2},
    'G': {}
}
heuristic = {
    'A': 7,
    'B': 6,
    'C': 4,
    'D': 2,
    'E': 1,
    'F': 1,
    'G': 0
}
def astar(start, goal):
    # Each entry is (f, g, node, path); g is the real path cost, f = g + h
    open_list = [(heuristic[start], 0, start, [start])]
    while open_list:
        open_list.sort()
        f, cost, node, path = open_list.pop(0)
        if node == goal:
            return path, cost
        for neighbor, distance in graph[node].items():
            g = cost + distance
            h = heuristic[neighbor]
            f = g + h
            open_list.append((f, g, neighbor, path + [neighbor]))
    return None
path, cost = astar('A', 'G')
print("Path :", " -> ".join(path))
print("Cost :", cost)
7. Find shortest path using A*.
graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'D': 2, 'E': 5},
    'C': {'F': 3},
    'D': {'G': 1},
    'E': {'G': 2},
    'F': {'G': 2},
    'G': {}
}
heuristic = {
    'A': 7,
    'B': 6,
    'C': 4,
    'D': 2,
    'E': 1,
    'F': 1,
    'G': 0
}
def astar(start, goal):
    # Each entry is (f, g, node, path); g is the real path cost, f = g + h
    open_list = [(heuristic[start], 0, start, [start])]
    while open_list:
        open_list.sort()
        f, cost, node, path = open_list.pop(0)
        if node == goal:
            return path
        for neighbor, distance in graph[node].items():
            g = cost + distance
            h = heuristic[neighbor]
            f = g + h
            open_list.append((f, g, neighbor, path + [neighbor]))
    return None
path = astar('A', 'G')
print("Shortest Path:")
print(" -> ".join(path))
8. Calculate f(n), g(n), h(n) values.
heuristic = {
    'A': 7,
    'B': 6,
    'D': 2,
    'G': 0
}
g = {
    'A': 0,
    'B': 1,
    'D': 3,
    'G': 4
}
print("Node\tg(n)\th(n)\tf(n)")
for node in g:
    f = g[node] + heuristic[node]
    print(node, "\t", g[node], "\t", heuristic[node], "\t", f)
9. Use heuristic function in A*.
graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'D': 2, 'E': 5},
    'C': {'F': 3},
    'D': {'G': 1},
    'E': {'G': 2},
    'F': {'G': 2},
    'G': {}
}
# Heuristic values
heuristic = {
    'A': 7,
    'B': 6,
    'C': 4,
    'D': 2,
    'E': 1,
    'F': 1,
    'G': 0
}
def astar(start, goal):
    # Each entry is (f, g, node, path); g is the real path cost, f = g + h
    open_list = [(heuristic[start], 0, start, [start])]
    while open_list:
        open_list.sort()
        f, cost, node, path = open_list.pop(0)
        if node == goal:
            return path
        for neighbor, distance in graph[node].items():
            g = cost + distance
            h = heuristic[neighbor]
            f = g + h
            print("Node:", neighbor, "g=", g, "h=", h, "f=", f)
            open_list.append((f, g, neighbor, path + [neighbor]))
    return None
path = astar('A', 'G')
print("\nShortest Path:")
print(" -> ".join(path))
-----------------------------------------------------------------------------------------------
Regression (Jupyter Notebook was used for these programs)
Linear Regression
1. Predict house prices using Linear Regression.
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
# Training Data
X = np.array([[1000], [1500], [2000], [2500], [3000]])
y = np.array([200000, 300000, 400000, 500000, 600000])
# Create and Train Model
model = LinearRegression()
model.fit(X, y)
# Prediction
predicted_price = model.predict([[2200]])
print("Predicted House Price =", predicted_price[0])
# Plot Graph
plt.scatter(X, y)
plt.plot(X, model.predict(X))
plt.xlabel("House Size")
plt.ylabel("House Price")
plt.title("House Price Prediction using Linear Regression")
plt.show()
2. Predict student performance based on study hours.
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
# Study Hours
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
# Marks Obtained
y = np.array([20, 30, 40, 50, 60, 70, 80, 90])
# Create Model
model = LinearRegression()
# Train Model
model.fit(X, y)
# Predict Marks for 6.5 hours study
prediction = model.predict([[6.5]])
print("Predicted Marks =", prediction[0])
# Plot Graph
plt.scatter(X, y)
plt.plot(X, model.predict(X))
plt.xlabel("Study Hours")
plt.ylabel("Marks")
plt.title("Student Performance Prediction")
plt.show()
3. Plot the regression line and interpret the slope.
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
# Study Hours
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
# Marks
y = np.array([20, 30, 40, 50, 60, 70, 80, 90])
# Create and train model
model = LinearRegression()
model.fit(X, y)
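# Slope and Intercept
print("Slope (m) =", model.coef_[0])
print("Intercept (b) =", model.intercept_)
# Interpretation: the slope is the change in predicted marks per extra hour of study
print("Each additional study hour adds about", round(model.coef_[0], 2), "marks")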
# Plot
plt.scatter(X, y, label="Data Points")
plt.plot(X, model.predict(X), label="Regression Line")
plt.xlabel("Study Hours")
plt.ylabel("Marks")
plt.title("Regression Line")
plt.legend()
plt.show()
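`LinearRegression` fits ordinary least squares. With a single feature, the slope and intercept have a closed form, so you can cross-check `model.coef_[0]` and `model.intercept_` by hand with NumPy. The following sketch reuses the study-hours data above. On this data every point lies on y = 10x + 10, so both values come out as 10.

```python
import numpy as np

# Same study-hours data as above
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([20, 30, 40, 50, 60, 70, 80, 90], dtype=float)

# Ordinary least squares for one feature:
#   slope m     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept b = mean_y - m * mean_x
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - m * x.mean()
print("Slope (m) =", m)
print("Intercept (b) =", b)
```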
4.Compare actual and predicted values.
from sklearn.linear_model import LinearRegression
import numpy as np
# Study Hours
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
# Actual Marks
y = np.array([20, 30, 40, 50, 60, 70, 80, 90])
# Train Model
model = LinearRegression()
model.fit(X, y)
# Predicted Marks
predicted = model.predict(X)
print("Hours\tActual\tPredicted")
for i in range(len(X)):
    print(X[i][0], "\t", y[i], "\t", round(predicted[i], 2))
5.Calculate Mean Squared Error (MSE) for the model.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
# Study Hours
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
# Marks
y = np.array([20, 30, 40, 50, 60, 70, 80, 90])
# Train Model
model = LinearRegression()
model.fit(X, y)
# Predict Values
y_pred = model.predict(X)
# Calculate MSE
mse = mean_squared_error(y, y_pred)
print("Actual Values :", y)
print("Predicted Values :", y_pred)
print("Mean Squared Error =", mse)
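Mean Squared Error is the average of the squared differences between actual and predicted values: MSE = (1/n) × Σ (y_i - ŷ_i)². For the study-hours data the MSE printed above is zero or vanishingly small, because the points lie exactly on a straight line. A small pure-Python sketch of the formula, using hypothetical predictions so that the error is non-zero:

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of squared residuals (what sklearn's mean_squared_error returns by default)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical predictions for three students (not produced by the model above)
actual = [20, 30, 40]
predicted = [22, 29, 40]
mse = mean_squared_error_manual(actual, predicted)
print("Manual MSE =", round(mse, 4))  # errors 2, -1, 0 -> (4 + 1 + 0) / 3
```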
7.Predict house prices using Linear Regression.
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
# House Size (sq ft)
X = np.array([[1000], [1500], [2000], [2500], [3000]])
# House Prices
y = np.array([200000, 300000, 400000, 500000, 600000])
# Create Model
model = LinearRegression()
# Train Model
model.fit(X, y)
# Predict Price for 2200 sq ft
predicted_price = model.predict([[2200]])
print("Predicted House Price =", predicted_price[0])
# Plot Graph
plt.scatter(X, y, label="Actual Data")
plt.plot(X, model.predict(X), label="Regression Line")
plt.xlabel("House Size (sq ft)")
plt.ylabel("House Price")
plt.title("House Price Prediction using Linear Regression")
plt.legend()
plt.show()
8.Predict student marks using Linear Regression.
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
# Study Hours
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
# Student Marks
y = np.array([20, 30, 40, 50, 60, 70, 80, 90])
# Create and train model
model = LinearRegression()
model.fit(X, y)
# Predict marks for 6.5 hours
predicted_marks = model.predict([[6.5]])
print("Predicted Marks =", predicted_marks[0])
# Plot graph
plt.scatter(X, y, label="Actual Data")
plt.plot(X, model.predict(X), label="Regression Line")
plt.xlabel("Study Hours")
plt.ylabel("Marks")
plt.title("Student Marks Prediction")
plt.legend()
plt.show()
9.Train a Linear Regression model.
from sklearn.linear_model import LinearRegression
import numpy as np
# Input data (Study Hours)
X = np.array([[1], [2], [3], [4], [5]])
# Output data (Marks)
y = np.array([20, 40, 60, 80, 100])
# Create model
model = LinearRegression()
# Train model
model.fit(X, y)
print("Model Trained Successfully!")
# Display slope and intercept
print("Slope =", model.coef_[0])
print("Intercept =", model.intercept_)
10.Plot Regression Line.
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib.pyplot as plt
# Study Hours
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
# Marks
y = np.array([20, 30, 40, 50, 60, 70, 80, 90])
# Train Model
model = LinearRegression()
model.fit(X, y)
# Plot Data Points
plt.scatter(X, y, label="Actual Data")
# Plot Regression Line
plt.plot(X, model.predict(X), label="Regression Line")
plt.xlabel("Study Hours")
plt.ylabel("Marks")
plt.title("Linear Regression")
plt.legend()
plt.show()
-----------------------------------------------------------------------------------------------
Classification
Logistic Regression
1.Predict whether a student will pass or fail based on marks.
from sklearn.linear_model import LogisticRegression
import numpy as np
# Marks
X = np.array([[20], [30], [40], [50], [60], [70], [80], [90]])
# Result (0 = Fail, 1 = Pass)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Create Model
model = LogisticRegression()
# Train Model
model.fit(X, y)
# Predict for a student with 55 marks
prediction = model.predict([[55]])
if prediction[0] == 1:
    print("Pass")
else:
    print("Fail")
#Check Multiple Predictions
test_marks = [[35], [55], [75]]
predictions = model.predict(test_marks)
for i in range(len(test_marks)):
    print("Marks =", test_marks[i][0], "Result =", predictions[i])
2.Classify customers as buyers/non-buyers.
from sklearn.linear_model import LogisticRegression
import numpy as np
# Customer Age
X = np.array([[18], [22], [25], [28], [35], [40], [45], [50]])
# 0 = Non-Buyer, 1 = Buyer
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
# Create Model
model = LogisticRegression()
# Train Model
model.fit(X, y)
# Predict for a new customer
age = 30
prediction = model.predict([[age]])
if prediction[0] == 1:
    print("Buyer")
else:
    print("Non-Buyer")
#Predict Multiple Customers
test_data = [[20], [30], [48]]
result = model.predict(test_data)
for i in range(len(test_data)):
    print("Age =", test_data[i][0], "Prediction =", result[i])
3.Evaluate Logistic Regression using confusion matrix.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
import numpy as np
# Marks
X = np.array([[20], [30], [40], [50], [60], [70], [80], [90]])
# 0 = Fail, 1 = Pass
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Train Model
model = LogisticRegression()
model.fit(X, y)
# Predictions
y_pred = model.predict(X)
# Confusion Matrix
cm = confusion_matrix(y, y_pred)
print("Confusion Matrix:")
print(cm)
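For labels 0 and 1, scikit-learn's `confusion_matrix` is laid out as [[TN, FP], [FN, TP]], with rows for actual classes and columns for predicted classes. The sketch below counts the four cells by hand and derives precision and recall from them. The prediction list is hypothetical, chosen so that one off-diagonal cell is non-zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TN, FP, FN, TP for a binary problem."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1
        elif actual == positive:
            fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

# Hypothetical predictions: the student with 50 marks is wrongly predicted as Pass
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]
tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print("Confusion matrix (rows = actual, columns = predicted):")
print([[tn, fp], [fn, tp]])
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print("Precision =", round(precision, 3))
print("Recall =", round(recall, 3))
```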
4.Compare Logistic Regression and Decision Tree accuracy.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import numpy as np
# Marks
X = np.array([[20], [30], [40], [50], [60], [70], [80], [90]])
# 0 = Fail, 1 = Pass
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Split Data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.25, random_state=42
)
# Logistic Regression
lr = LogisticRegression()
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)
# Decision Tree
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)
# Accuracy
lr_acc = accuracy_score(y_test, lr_pred)
dt_acc = accuracy_score(y_test, dt_pred)
print("Logistic Regression Accuracy =", lr_acc)
print("Decision Tree Accuracy =", dt_acc)
5.Plot the sigmoid curve and explain its behavior.
import numpy as np
import matplotlib.pyplot as plt
# Generate x values
x = np.linspace(-10, 10, 100)
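# Sigmoid function: maps any real x into (0, 1), equals 0.5 at x = 0,
# approaches 1 for large positive x and 0 for large negative x
y = 1 / (1 + np.exp(-x))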
# Plot
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("Sigmoid(x)")
plt.title("Sigmoid Curve")
plt.grid(True)
plt.show()
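The curve's behaviour can also be read from a few values. The sigmoid always lies strictly between 0 and 1, equals 0.5 at x = 0, is steepest there, and flattens towards 1 for large positive x and towards 0 for large negative x. Logistic regression reads this output as the probability of class 1 and, by default, predicts class 1 when it is at least 0.5. A short pure-Python sketch:

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)).
    Note: math.exp overflows (OverflowError) for inputs below about -709."""
    return 1.0 / (1.0 + math.exp(-x))

for x in [-10, -2, 0, 2, 10]:
    print(f"sigmoid({x:>3}) = {sigmoid(x):.4f}")

# Symmetry: sigmoid(-x) = 1 - sigmoid(x), so the curve is centred on (0, 0.5)
print(abs(sigmoid(-2) - (1 - sigmoid(2))) < 1e-12)
```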
6.Implement Logistic Regression for classification.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
# Create model
model = LogisticRegression(max_iter=200)
# Train model
model.fit(X_train, y_train)
# Prediction
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Predicted Values:", y_pred)
print("Accuracy:", accuracy)
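To see roughly what `LogisticRegression.fit` is doing, the sketch below fits a single-feature logistic model by batch gradient descent on the average log-loss, using the pass/fail marks data from question 1. It is an illustration under simplifying assumptions (one feature, no regularisation, a hand-picked learning rate of 0.1 and 5000 iterations), not scikit-learn's solver: `LogisticRegression` applies L2 regularisation by default (`C=1.0`) and uses an optimiser such as lbfgs, so its coefficients will differ even where the predictions agree.

```python
import numpy as np

# Pass/fail data from question 1 (marks -> 0 = Fail, 1 = Pass)
marks = np.array([20, 30, 40, 50, 60, 70, 80, 90], dtype=float)
result = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Standardise the feature so gradient descent converges quickly
mean, std = marks.mean(), marks.std()
x = (marks - mean) / std

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))  # predicted probability of Pass
    # Gradients of the average log-loss with respect to w and b
    grad_w = np.mean((p - result) * x)
    grad_b = np.mean(p - result)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

def predict_pass(score):
    """Return 1 (Pass) if the estimated probability is at least 0.5, else 0 (Fail)."""
    z = w * (score - mean) / std + b
    return int(1.0 / (1.0 + np.exp(-z)) >= 0.5)

train_pred = [predict_pass(s) for s in marks]
print("Training predictions:", train_pred)
print("Prediction for 35 marks:", predict_pass(35))
print("Prediction for 75 marks:", predict_pass(75))
```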
7.Predict Pass/Fail using Logistic Regression.
from sklearn.linear_model import LogisticRegression
import numpy as np
# Student marks
X = np.array([[20], [30], [35], [40], [50], [60], [70], [80], [90]])
# Pass(1) / Fail(0)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
# Create model
model = LogisticRegression()
# Train model
model.fit(X, y)
# Predict for a student with 45 marks
prediction = model.predict([[45]])
if prediction[0] == 1:
    print("Pass")
else:
    print("Fail")
8.Train and test Logistic Regression model.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
# Create Logistic Regression model
model = LogisticRegression(max_iter=200)
# Train the model
model.fit(X_train, y_train)
# Test the model
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Actual Values:")
print(y_test)
print("\nPredicted Values:")
print(y_pred)
print("\nAccuracy:", accuracy)
-----------------------------------------------------------------------------------------------
Decision Tree
1.Build a Decision Tree for the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Create model
model = DecisionTreeClassifier()
# Train model
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
2.Visualize the generated Decision Tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Train model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Plot Decision Tree
plt.figure(figsize=(15,10))
plot_tree(
model,
feature_names=iris.feature_names,
class_names=iris.target_names,
filled=True
)
plt.title("Decision Tree Visualization (Iris Dataset)")
plt.show()
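When no plotting window is available (for example in a plain terminal), scikit-learn's `export_text` prints the same decision rules as indented text. A short sketch, assuming a depth-limited tree (`max_depth=3` and `random_state=42` are illustrative choices) so that the output stays readable:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(iris.data, iris.target)

# Text rendering of the split rules that plot_tree draws graphically
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```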
3.Calculate Gini Index for a sample dataset.
# Sample dataset
yes = 4
no = 6
total = yes + no
# probabilities
p_yes = yes / total
p_no = no / total
# Gini Index calculation
gini = 1 - (p_yes**2 + p_no**2)
print("Gini Index:", gini)
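The same formula extends to any number of classes: Gini = 1 - Σ p_i², where p_i is the fraction of samples in class i. A small helper (the name `gini_index` is illustrative) makes it easy to check a few cases, including a pure node and a perfectly mixed one:

```python
def gini_index(class_counts):
    """Gini impurity 1 - sum(p_i^2) for a node with the given class counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print("Gini [4 yes, 6 no]  =", round(gini_index([4, 6]), 4))
print("Gini [10 yes, 0 no] =", round(gini_index([10, 0]), 4))  # pure node
print("Gini [5 yes, 5 no]  =", round(gini_index([5, 5]), 4))   # maximum for two classes
print("Gini [3, 3, 4]      =", round(gini_index([3, 3, 4]), 4))  # three classes
```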
4.Compare Entropy and Gini criteria.
import math
# Sample dataset
yes = 4
no = 6
total = yes + no
# Probabilities
p_yes = yes / total
p_no = no / total
# -----------------------
# GINI INDEX
# -----------------------
gini = 1 - (p_yes**2 + p_no**2)
# -----------------------
# ENTROPY
# -----------------------
entropy = - (p_yes * math.log2(p_yes) + p_no * math.log2(p_no))
# -----------------------
# OUTPUT
# -----------------------
print("Probability Yes:", p_yes)
print("Probability No:", p_no)
print("\nGini Index:", gini)
print("Entropy:", entropy)
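In a decision tree both measures are used the same way: the tree prefers the split that most reduces impurity, computed as the parent's impurity minus the sample-weighted impurity of the children (with entropy, this reduction is called information gain). For two classes entropy ranges from 0 to 1 and Gini from 0 to 0.5, so the numbers differ in scale but often rank candidate splits similarly. The sketch below scores one hypothetical split under both criteria; in scikit-learn the choice corresponds to `DecisionTreeClassifier(criterion='gini')` or `criterion='entropy'`.

```python
import math

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    total = sum(counts)
    # Empty classes contribute 0 (the limit of p*log2(p) as p -> 0)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def impurity_decrease(parent, children, measure):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * measure(child) for child in children)
    return measure(parent) - weighted

# Hypothetical split of the 4 yes / 6 no node into [4 yes, 1 no] and [0 yes, 5 no]
parent = [4, 6]
children = [[4, 1], [0, 5]]
gini_gain = impurity_decrease(parent, children, gini)
info_gain = impurity_decrease(parent, children, entropy)
print("Gini decrease    =", round(gini_gain, 4))
print("Information gain =", round(info_gain, 4))
```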
5.Prune a Decision Tree and compare accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# ----------------------------
# 1. UNPRUNED TREE
# ----------------------------
model1 = DecisionTreeClassifier()
model1.fit(X_train, y_train)
y_pred1 = model1.predict(X_test)
acc1 = metrics.accuracy_score(y_test, y_pred1)
# ----------------------------
# 2. PRUNED TREE (pre-pruning: max_depth=3 limits growth)
# ----------------------------
model2 = DecisionTreeClassifier(max_depth=3)
model2.fit(X_train, y_train)
y_pred2 = model2.predict(X_test)
acc2 = metrics.accuracy_score(y_test, y_pred2)
# ----------------------------
# RESULTS
# ----------------------------
print("Accuracy (Unpruned Tree):", acc1)
print("Accuracy (Pruned Tree):", acc2)
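Limiting `max_depth` stops the tree from growing (pre-pruning). scikit-learn also supports post-pruning through minimal cost-complexity pruning: `cost_complexity_pruning_path` returns the effective alpha values at which subtrees of the fully grown tree are pruned away, and refitting with `ccp_alpha` set to each value gives progressively smaller trees. A sketch on the same split follows; `random_state=42` is added here so runs are repeatable. Picking the best alpha by test accuracy, as this loop makes tempting, leaks test information, so cross-validation on the training set is the fairer choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Effective alphas at which subtrees of the fully grown tree get pruned
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
# The largest alpha prunes the tree down to a single root node, so skip it
ccp_alphas = path.ccp_alphas[:-1]

results = []
for alpha in ccp_alphas:
    # Guard against tiny negative values caused by floating-point rounding
    tree = DecisionTreeClassifier(random_state=42, ccp_alpha=max(float(alpha), 0.0))
    tree.fit(X_train, y_train)
    acc = accuracy_score(y_test, tree.predict(X_test))
    results.append((alpha, tree.get_n_leaves(), acc))

for alpha, leaves, acc in results:
    print(f"ccp_alpha = {alpha:.4f}  leaves = {leaves}  accuracy = {acc:.3f}")
```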
6.Implement Decision Tree Classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Create Decision Tree model
model = DecisionTreeClassifier()
# Train model
model.fit(X_train, y_train)
# Predict output
y_pred = model.predict(X_test)
# Accuracy calculation
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Decision Tree Classifier Accuracy:", accuracy)
7.Classify Iris dataset using Decision Tree.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Create model
model = DecisionTreeClassifier()
# Train model
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
# Detailed classification report
print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))
8.Visualize Decision Tree.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Train model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Plot Decision Tree
plt.figure(figsize=(15,10))
plot_tree(
model,
feature_names=iris.feature_names,
class_names=iris.target_names,
filled=True
)
plt.title("Decision Tree Visualization (Iris Dataset)")
plt.show()
9.Calculate model accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset into training and testing
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Create Decision Tree model
model = DecisionTreeClassifier()
# Train model
model.fit(X_train, y_train)
# Predict test data
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)
print("Accuracy Percentage:", accuracy * 100, "%")
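`accuracy_score` counts matching labels and divides by the number of samples: accuracy = correct predictions / total predictions. The pure-Python sketch below reproduces that calculation on hypothetical labels; applied to the Iris test split it should agree with `accuracy_score`.

```python
def accuracy_manual(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels for 10 test flowers (0 = setosa, 1 = versicolor, 2 = virginica)
y_true = [0, 1, 2, 2, 1, 0, 1, 2, 0, 1]
y_pred = [0, 1, 2, 1, 1, 0, 1, 2, 0, 2]
acc = accuracy_manual(y_true, y_pred)
print("Accuracy =", acc)
print("Accuracy Percentage:", round(acc * 100, 2), "%")
```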
-----------------------------------------------------------------------------------------------
KNN
1.Implement KNN for Iris dataset classification.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create KNN Model
knn = KNeighborsClassifier(n_neighbors=3)
# Train Model
knn.fit(X_train, y_train)
# Predict
y_pred = knn.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
# Predict a new flower
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = knn.predict(sample)
print("Predicted Class =", iris.target_names[prediction[0]])
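`KNeighborsClassifier` stores the training data and, at prediction time, finds the k closest training points and takes a majority vote of their labels. The NumPy sketch below does the same for a single query point using Euclidean distance (scikit-learn's default metric is Minkowski with p=2, which is Euclidean). The data and the helper name `knn_predict` are hypothetical, and ties in the vote are broken by whichever label `Counter` saw first, which is not necessarily how scikit-learn breaks them.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training point
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points for two classes
X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, [1.5, 1.5], k=3))
print(knn_predict(X, y, [6.5, 6.0], k=3))
```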
2.Compare accuracy for K = 1, 3, 5, 7.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Different K values
k_values = [1, 3, 5, 7]
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("K =", k, " Accuracy =", round(accuracy * 100, 2), "%")
#Display Results in a Table
results = []
for k in [1, 3, 5, 7]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    results.append([k, round(acc * 100, 2)])
print("K\tAccuracy")
for r in results:
    print(r[0], "\t", r[1], "%")
3.Implement weighted KNN.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Weighted KNN
knn = KNeighborsClassifier(
n_neighbors=5,
weights='distance'
)
# Train Model
knn.fit(X_train, y_train)
# Predict
y_pred = knn.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
# Predict a new flower
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = knn.predict(sample)
print("Predicted Class =", iris.target_names[prediction[0]])
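With `weights='distance'`, scikit-learn weights each neighbour's vote by the inverse of its distance, so one very close neighbour can outvote several distant ones. The sketch below compares a uniform vote with an inverse-distance vote for three hypothetical neighbours. Treating a zero distance as an outright win for that neighbour is a simplification chosen here; scikit-learn's own handling of exact matches is described in its documentation.

```python
from collections import defaultdict

def majority_vote(neighbours):
    """neighbours: list of (distance, label). Every neighbour gets one vote."""
    counts = defaultdict(int)
    for _, label in neighbours:
        counts[label] += 1
    return max(counts, key=counts.get)

def weighted_vote(neighbours):
    """neighbours: list of (distance, label). Each vote is weighted by 1 / distance."""
    # Simplification: an exact match (distance 0) decides the class directly
    for distance, label in neighbours:
        if distance == 0:
            return label
    scores = defaultdict(float)
    for distance, label in neighbours:
        scores[label] += 1.0 / distance
    return max(scores, key=scores.get)

# Hypothetical 3 nearest neighbours of a query point: (distance, class)
neighbours = [(1.0, "setosa"), (2.0, "versicolor"), (2.5, "versicolor")]
print("Uniform vote :", majority_vote(neighbours))   # versicolor wins 2 votes to 1
print("Weighted vote:", weighted_vote(neighbours))   # setosa 1.0 vs versicolor 0.5 + 0.4 = 0.9
```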
4.Calculate Euclidean Distance manually and verify results.
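import math
x1, y1 = 2, 3
x2, y2 = 6, 7
# Manual formula: sqrt((x2 - x1)^2 + (y2 - y1)^2)
distance = math.sqrt((x2-x1)**2 + (y2-y1)**2)
print("Euclidean Distance =", round(distance, 3))
# Verify with the built-in math.dist (Python 3.8+)
check = math.dist((x1, y1), (x2, y2))
print("Verified with math.dist =", round(check, 3))
print("Results match:", math.isclose(distance, check))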
5.Compare KNN and Decision Tree performance.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# KNN Model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
knn_pred = knn.predict(X_test)
knn_acc = accuracy_score(y_test, knn_pred)
# Decision Tree Model
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)
dt_acc = accuracy_score(y_test, dt_pred)
# Display Results
print("KNN Accuracy =", round(knn_acc * 100, 2), "%")
print("Decision Tree Accuracy =", round(dt_acc * 100, 2), "%")
6.Implement KNN classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create KNN Classifier
knn = KNeighborsClassifier(n_neighbors=3)
# Train Model
knn.fit(X_train, y_train)
# Predict Test Data
y_pred = knn.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
# Predict New Flower
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = knn.predict(sample)
print("Predicted Flower =", iris.target_names[prediction[0]])
7.Classify Iris dataset using KNN.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create KNN Model
knn = KNeighborsClassifier(n_neighbors=3)
# Train Model
knn.fit(X_train, y_train)
# Predict Test Data
y_pred = knn.predict(X_test)
# Accuracy
print("Accuracy =", accuracy_score(y_test, y_pred))
# Predict New Flower
flower = [[5.1, 3.5, 1.4, 0.2]]
prediction = knn.predict(flower)
print("Predicted Flower =", iris.target_names[prediction[0]])
#Accuracy Calculation (print so it also shows outside a notebook)
print("Accuracy Score =", accuracy_score(y_test, y_pred))
8.Predict class using KNN.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Create KNN Model
knn = KNeighborsClassifier(n_neighbors=3)
# Train Model
knn.fit(X, y)
# New Flower Data
new_flower = [[5.1, 3.5, 1.4, 0.2]]
# Predict Class
prediction = knn.predict(new_flower)
print("Predicted Class:", iris.target_names[prediction[0]])
9.Compare different values of K.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Compare different K values
k_values = [1, 3, 5, 7, 9]
accuracies = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(round(accuracy * 100, 2))
    print("K =", k, " Accuracy =", round(accuracy * 100, 2), "%")
#Simple Graph (plots the accuracies computed above instead of hard-coded values)
import matplotlib.pyplot as plt
plt.plot(k_values, accuracies, marker='o')
plt.xlabel("K Value")
plt.ylabel("Accuracy (%)")
plt.title("K vs Accuracy")
plt.show()
-----------------------------------------------------------------------------------------------
Unit 6: Probabilistic & Margin-Based Learning
Naive Bayes
1.Implement Gaussian Naive Bayes for Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create Gaussian Naive Bayes Model
model = GaussianNB()
# Train Model
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
# Predict New Flower
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(sample)
print("Predicted Flower =", iris.target_names[prediction[0]])
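`GaussianNB` models each feature, within each class, as a normal distribution and combines those likelihoods with the class priors through Bayes' theorem. For a single feature the calculation is short enough to write out directly. The petal-length values below are hypothetical, and unlike scikit-learn (which adds a small `var_smoothing` term to the variances) this sketch uses the raw per-class variances.

```python
import math

def gaussian_pdf(x, mean, var):
    """Normal density N(x; mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gaussian_nb(values, labels):
    """Per class: prior, mean and variance of a single numeric feature."""
    params = {}
    for c in sorted(set(labels)):
        xs = [v for v, l in zip(values, labels) if l == c]
        mean = sum(xs) / len(xs)
        var = sum((v - mean) ** 2 for v in xs) / len(xs)
        params[c] = (len(xs) / len(values), mean, var)
    return params

def posterior(params, x):
    """P(class | x) via Bayes' theorem with Gaussian likelihoods."""
    joint = {c: prior * gaussian_pdf(x, mean, var)
             for c, (prior, mean, var) in params.items()}
    evidence = sum(joint.values())
    return {c: p / evidence for c, p in joint.items()}

# Hypothetical petal lengths (cm) for two Iris species
petal_length = [1.4, 1.3, 1.5, 1.4, 4.7, 4.5, 4.9, 4.0]
species = ["setosa"] * 4 + ["versicolor"] * 4
params = fit_gaussian_nb(petal_length, species)
probs = posterior(params, 1.6)
for c, p in probs.items():
    print(c, round(p, 4))
```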
2.Build an Email Spam Classifier using Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Sample Emails
emails = [
"Win a free lottery now",
"Claim your prize money",
"Meeting at 10 AM tomorrow",
"Project submission deadline",
"Free gift offer",
"Team meeting schedule"
]
# Labels
# 1 = Spam, 0 = Not Spam
labels = [1, 1, 0, 0, 1, 0]
# Convert text into numbers
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
# Train Naive Bayes Model
model = MultinomialNB()
model.fit(X, labels)
# Test Email
test_email = ["Free prize waiting for you"]
# Convert test email
test_X = vectorizer.transform(test_email)
# Predict
prediction = model.predict(test_X)
if prediction[0] == 1:
    print("Spam Email")
else:
    print("Not Spam Email")
#Test More Emails
emails = [
"Congratulations you won a lottery",
"Project meeting tomorrow"
]
X_test = vectorizer.transform(emails)
predictions = model.predict(X_test)
for i in range(len(emails)):
    if predictions[i] == 1:
        print(emails[i], "-> Spam")
    else:
        print(emails[i], "-> Not Spam")
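`MultinomialNB` scores each class by its prior times the smoothed probability of every word in the message. With the default `alpha=1.0` (Laplace smoothing), each word probability is (count of the word in the class + 1) / (total words in the class + vocabulary size). The pure-Python sketch below repeats that calculation on the training emails above, working in log space to avoid underflow. Its tokenizer mirrors `CountVectorizer`'s default token pattern (lowercase, tokens of two or more word characters), and words never seen in training are skipped, as `CountVectorizer.transform` ignores them. The helper names are illustrative.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase, then keep tokens of 2+ word characters (so "a" is dropped)
    return re.findall(r"\b\w\w+\b", text.lower())

def train_multinomial_nb(docs, labels, alpha=1.0):
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    model = {"vocab": set(vocab), "log_prior": {}, "log_prob": {}}
    for c in sorted(set(labels)):
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        model["log_prior"][c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in tokenize(d))
        total = sum(counts.values())
        # Laplace-smoothed log probability of each vocabulary word in this class
        model["log_prob"][c] = {
            word: math.log((counts[word] + alpha) / (total + alpha * len(vocab)))
            for word in vocab
        }
    return model

def predict_nb(model, text):
    scores = {}
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for tok in tokenize(text):
            if tok in model["vocab"]:  # unseen words are skipped
                score += model["log_prob"][c][tok]
        scores[c] = score
    return max(scores, key=scores.get)

emails = [
    "Win a free lottery now",
    "Claim your prize money",
    "Meeting at 10 AM tomorrow",
    "Project submission deadline",
    "Free gift offer",
    "Team meeting schedule"
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = Spam, 0 = Not Spam
nb_model = train_multinomial_nb(emails, labels)
for text in ["Free prize waiting for you", "Project meeting tomorrow"]:
    label = predict_nb(nb_model, text)
    print(text, "->", "Spam" if label == 1 else "Not Spam")
```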
3.Calculate posterior probability for a sample record.
# Given probabilities
P_spam = 0.4
P_free_given_spam = 0.8
P_free = 0.5
# Bayes Theorem
posterior = (P_free_given_spam * P_spam) / P_free
print("Posterior Probability =", posterior)
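The evidence term P(free) is rarely given directly; it usually comes from the law of total probability, P(free) = P(free | spam) × P(spam) + P(free | not spam) × P(not spam). The values above imply P(free | not spam) = (0.5 - 0.8 × 0.4) / 0.6 = 0.3, and the sketch below uses that derived value to rebuild P(free) and both posteriors, which must sum to 1:

```python
P_spam = 0.4
P_free_given_spam = 0.8
P_free_given_not_spam = 0.3  # derived from the values above so that P(free) = 0.5

# Law of total probability for the evidence term
P_free = P_free_given_spam * P_spam + P_free_given_not_spam * (1 - P_spam)
posterior_spam = P_free_given_spam * P_spam / P_free
posterior_not_spam = P_free_given_not_spam * (1 - P_spam) / P_free
print("P(free) =", round(P_free, 4))
print("P(spam | free) =", round(posterior_spam, 4))
print("P(not spam | free) =", round(posterior_not_spam, 4))
```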
4.Compare Naive Bayes and Logistic Regression accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Naive Bayes Model
nb = GaussianNB()
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_test)
nb_acc = accuracy_score(y_test, nb_pred)
# Logistic Regression Model
lr = LogisticRegression(max_iter=200)
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)
lr_acc = accuracy_score(y_test, lr_pred)
# Display Results
print("Naive Bayes Accuracy =", round(nb_acc * 100, 2), "%")
print("Logistic Regression Accuracy =", round(lr_acc * 100, 2), "%")
#Optional Accuracy Graph
import matplotlib.pyplot as plt
models = ["Naive Bayes", "Logistic Regression"]
accuracy = [nb_acc * 100, lr_acc * 100]
plt.bar(models, accuracy)
plt.xlabel("Algorithm")
plt.ylabel("Accuracy (%)")
plt.title("Accuracy Comparison")
plt.show()
5.Implement Naive Bayes classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create Naive Bayes Model
model = GaussianNB()
# Train Model
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
# Predict New Flower
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(sample)
print("Predicted Flower =", iris.target_names[prediction[0]])
6.Perform spam classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Training Emails
emails = [
"Win a free lottery",
"Claim your prize now",
"Free gift offer",
"Meeting at 10 AM",
"Project submission tomorrow",
"Team meeting schedule"
]
# Labels
# 1 = Spam, 0 = Not Spam
labels = [1, 1, 1, 0, 0, 0]
# Convert text to numbers
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
# Train Naive Bayes Model
model = MultinomialNB()
model.fit(X, labels)
# New Email
test_email = ["Congratulations! You won a free prize"]
# Convert email
test_X = vectorizer.transform(test_email)
# Predict
prediction = model.predict(test_X)
if prediction[0] == 1:
    print("Spam Email")
else:
    print("Not Spam Email")
#Test Multiple Emails
test_emails = [
"You won a free lottery",
"Project meeting tomorrow",
"Free gift available now"
]
X_test = vectorizer.transform(test_emails)
predictions = model.predict(X_test)
for i in range(len(test_emails)):
    if predictions[i] == 1:
        print(test_emails[i], "-> Spam")
    else:
        print(test_emails[i], "-> Not Spam")
7.Train Gaussian Naive Bayes model.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create Gaussian Naive Bayes Model
model = GaussianNB()
# Train Model
model.fit(X_train, y_train)
print("Gaussian Naive Bayes Model Trained Successfully")
#Train and Predict
from sklearn.metrics import accuracy_score
# Predict Test Data
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
#Predict a New Flower
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(sample)
print("Predicted Flower =", iris.target_names[prediction[0]])
-----------------------------------------------------------------------------------------------
SVM
1.Implement SVM for binary classification and visualize the decision boundary.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
# Binary Dataset
X = np.array([
[1, 2],
[2, 3],
[3, 3],
[6, 5],
[7, 8],
[8, 8]
])
# Classes
y = np.array([0, 0, 0, 1, 1, 1])
# Train SVM Model
model = SVC(kernel='linear')
model.fit(X, y)
# Plot Data Points
plt.scatter(X[:,0], X[:,1], c=y)
# Create Decision Boundary
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = model.decision_function(xy).reshape(XX.shape)
# Draw Decision Boundary
plt.contour(
XX, YY, Z,
colors='k',
levels=[-1, 0, 1],
alpha=0.5,
linestyles=['--', '-', '--']
)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("SVM Decision Boundary")
plt.show()
#Sample Classification Result
new_point = [[4, 4]]
prediction = model.predict(new_point)
print("Predicted Class =", prediction[0])
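For a linear kernel, `SVC` exposes the learned weight vector as `coef_` and the bias as `intercept_`, and the decision score of a point x is w·x + b. The solid line in the plot is where this score is 0 and the dashed lines are where it is -1 and +1, so the distance between the dashed lines is 2 / ||w||. The sketch refits the same six points and checks the hand-computed scores against `decision_function`:

```python
import numpy as np
from sklearn.svm import SVC

# Same six points and labels as the example above
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])
model = SVC(kernel='linear')
model.fit(X, y)

# For a linear kernel, the decision score is w . x + b
w = model.coef_[0]
b = model.intercept_[0]
manual_scores = X @ w + b
print("w =", w, " b =", b)
print("Manual scores :", np.round(manual_scores, 3))
print("sklearn scores:", np.round(model.decision_function(X), 3))

# Positive score -> classes_[1], negative score -> classes_[0]
manual_pred = np.where(manual_scores > 0, model.classes_[1], model.classes_[0])
print("Predictions   :", manual_pred)

# Distance between the two dashed margin lines
margin = 2 / np.linalg.norm(w)
print("Margin width  =", round(margin, 3))
```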
2.Implement Support Vector Machine.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create SVM Model
model = SVC(kernel='linear')
# Train Model
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
# Predict New Flower
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(sample)
print("Predicted Flower =", iris.target_names[prediction[0]])
3.Classify Iris dataset using SVM.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load Iris Dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# SVM Model
model = SVC(kernel='rbf') # you can also try 'linear'
# Train model
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
# Predict new flower
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(sample)
print("Predicted Flower =", iris.target_names[prediction[0]])
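The `rbf` kernel measures similarity as K(x, z) = exp(-γ × ||x - z||²): identical points score 1 and the score decays towards 0 as the points move apart. With its default `gamma='scale'`, `SVC` sets γ = 1 / (n_features × X.var()). The sketch evaluates the kernel for two Iris-style samples with an illustrative γ = 0.5, not the value `SVC` would choose:

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF (Gaussian) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

a = [5.1, 3.5, 1.4, 0.2]
c = [6.2, 3.1, 5.1, 2.2]
gamma = 0.5  # illustrative value only
print("K(a, a) =", rbf_kernel_value(a, a, gamma))
print("K(a, c) =", rbf_kernel_value(a, c, gamma))  # squared distance is 19.06
```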
4.Predict class using trained SVM model.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Train SVM model
model = SVC(kernel='linear')
model.fit(X_train, y_train)
# Test accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy =", accuracy)
# Predict new sample
new_sample = [[6.2, 3.1, 5.1, 2.2]]
prediction = model.predict(new_sample)
print("Predicted Class =", iris.target_names[prediction[0]])
5.Implement SVM Classifier using the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset into training and testing
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create SVM Classifier
model = SVC(kernel='rbf') # try 'linear' also
# Train the model
model.fit(X_train, y_train)
# Predict on test data
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy =", accuracy)
# Predict new sample
sample = [[5.8, 2.7, 5.1, 1.9]]
prediction = model.predict(sample)
print("Predicted Class =", iris.target_names[prediction[0]])
6.Train and test an SVM model.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create SVM model
model = SVC(kernel='linear')
# Train the model
model.fit(X_train, y_train)
# Test the model
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy =", accuracy)
#Predict New Sample
sample = [[6.0, 3.0, 4.8, 1.8]]
prediction = model.predict(sample)
print("Predicted Class =", iris.target_names[prediction[0]])
7.Predict flower species using SVM.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Train SVM model
model = SVC(kernel='rbf')
model.fit(X_train, y_train)
# New flower sample
sample = [[5.9, 3.0, 5.1, 1.8]]
# Predict species
prediction = model.predict(sample)
print("Predicted Flower Species =", iris.target_names[prediction[0]])
8.Calculate accuracy of an SVM classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset into training and testing
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Create SVM classifier
model = SVC(kernel='linear')
# Train model
model.fit(X_train, y_train)
# Predict test data
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("SVM Accuracy =", accuracy)
#Print Accuracy in Percentage
print("Accuracy =", round(accuracy * 100, 2), "%")
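`accuracy_score` is the fraction of test samples whose predicted label equals the true label. A hand-rolled equivalent, shown on made-up labels rather than actual Iris output:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels: one mistake on the third sample
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 1, 1, 2, 1]

acc = accuracy(y_true, y_pred)
print("Accuracy =", acc)
print("Accuracy =", round(acc * 100, 2), "%")
```

Four correct predictions out of five give 0.8, that is 80 %.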
9.Compare SVM and KNN classification accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# ---------------- SVM Model ----------------
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)
svm_pred = svm_model.predict(X_test)
svm_acc = accuracy_score(y_test, svm_pred)
# ---------------- KNN Model ----------------
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)
knn_pred = knn_model.predict(X_test)
knn_acc = accuracy_score(y_test, knn_pred)
# Results
print("SVM Accuracy =", round(svm_acc * 100, 2), "%")
print("KNN Accuracy =", round(knn_acc * 100, 2), "%")
#visualization
import matplotlib.pyplot as plt
models = ["SVM", "KNN"]
accuracy = [svm_acc * 100, knn_acc * 100]
plt.bar(models, accuracy)
plt.ylabel("Accuracy (%)")
plt.title("SVM vs KNN Accuracy Comparison")
plt.show()
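The two models in this comparison work differently: the SVM fits a decision boundary at training time, while `KNeighborsClassifier(n_neighbors=3)` stores the training data and, at prediction time, takes a majority vote among the three closest training samples. A from-scratch KNN sketch with Euclidean distance on a toy two-class dataset (it ignores the faster neighbour-search structures scikit-learn can use):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points (Euclidean)."""
    # (distance, label) pairs sorted from nearest to farthest
    distances = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train)
    )
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy data: class "A" near (1, 1), class "B" near (6, 6)
X_train = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y_train = ["A", "A", "A", "B", "B", "B"]

for point in ([1.5, 1.5], [6.5, 6.5], [3, 3]):
    print(point, "->", knn_predict(X_train, y_train, point))
```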
-----------------------------------------------------------------------------------------------
K-Means Clustering
1.Implement K-Means clustering.
import os
# Limit OpenMP to one thread; avoids scikit-learn's KMeans memory-leak warning on Windows with MKL
os.environ["OMP_NUM_THREADS"] = "1"
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
iris = load_iris()
X = iris.data
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
centers = kmeans.cluster_centers_
plt.scatter(X[:, 2], X[:, 3], c=labels, cmap='viridis')
plt.scatter(centers[:, 2], centers[:, 3], color='red', marker='X', s=200)
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.title("K-Means Clustering")
plt.show()
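`KMeans` is based on Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. Below is a minimal NumPy sketch. Unlike scikit-learn, which uses k-means++ initialisation and `n_init` restarts, it starts from centroids supplied by the caller (here three hand-picked points) and assumes no cluster ever becomes empty.

```python
import numpy as np

def lloyd_kmeans(X, initial_centroids, max_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = np.array(initial_centroids, dtype=float)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (assumes every cluster keeps at least one point)
        new_centroids = np.array(
            [X[labels == j].mean(axis=0) for j in range(len(centroids))]
        )
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0],
              [5, 8], [6, 9], [5, 9]], dtype=float)

# Start from the first point of each visible group
labels, centroids = lloyd_kmeans(X, initial_centroids=X[[0, 3, 6]])
print("Labels:", labels)
print("Centroids:\n", centroids)
```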
2.Cluster customer data.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Sample Customer Data (Income, Spending Score)
X = np.array([
[15, 39], [16, 81], [17, 6], [18, 77], [19, 40],
[20, 76], [21, 6], [22, 94], [23, 3], [24, 72],
[25, 14], [26, 99], [27, 35], [28, 60], [29, 50],
[30, 20], [31, 85], [32, 5], [33, 65], [34, 45]
])
# K-Means Model (3 clusters)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
centers = kmeans.cluster_centers_
# Print results
print("Cluster Labels:\n", labels)
print("\nCluster Centers:\n", centers)
#visualization
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
# Centroids
plt.scatter(
centers[:, 0], centers[:, 1],
color='red', marker='X', s=200
)
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Clusters using K-Means")
plt.show()
3.Find centroids using K-Means.
import os
os.environ["OMP_NUM_THREADS"] = "1"
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
# Load dataset
iris = load_iris()
X = iris.data
# Apply K-Means (K = 3 clusters)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
# Train model
kmeans.fit(X)
# Get centroids
centroids = kmeans.cluster_centers_
print("Centroids of Clusters:\n")
print(centroids)
# Visualizing centroids
import matplotlib.pyplot as plt
plt.scatter(X[:, 2], X[:, 3], c=kmeans.labels_, cmap='viridis')
# Plot centroids
plt.scatter(
centroids[:, 2],
centroids[:, 3],
color='red',
marker='X',
s=200
)
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.title("K-Means Clustering with Centroids")
plt.show()
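Once the algorithm has converged, each row of `cluster_centers_` is the mean of the points assigned to that cluster, and `KMeans` also reports `inertia_`, the sum of squared distances from each point to its own centroid (lower means tighter clusters). The sketch recomputes both by hand for a tiny dataset whose labels are fixed in advance rather than produced by `KMeans`.

```python
import numpy as np

def centroids_and_inertia(X, labels, k):
    """Centroid = mean of each cluster; inertia = sum of squared distances to own centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    # centroids[labels] lines up each point with the centroid of its cluster
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return centroids, inertia

X = [[1, 1], [1, 3], [5, 5], [7, 5]]
labels = [0, 0, 1, 1]  # fixed labels for illustration

centroids, inertia = centroids_and_inertia(X, labels, k=2)
print("Centroids:\n", centroids)
print("Inertia:", inertia)
```

Every point is exactly 1 unit from its centroid here, so the inertia is 4 squared units.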
4.Visualize clusters.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
# Load dataset
iris = load_iris()
X = iris.data
# Apply K-Means (3 clusters)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Visualize clusters (Petal Length vs Petal Width)
plt.scatter(X[:, 2], X[:, 3], c=labels, cmap='viridis')
# Plot centroids
plt.scatter(
centroids[:, 2],
centroids[:, 3],
color='red',
marker='X',
s=200
)
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.title("K-Means Cluster Visualization")
plt.show()
5.Implement K-Means Clustering using a sample dataset.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Sample Dataset (X, Y coordinates)
X = np.array([
[1, 2], [1, 4], [1, 0],
[10, 2], [10, 4], [10, 0],
[5, 8], [6, 9], [5, 9]
])
# Apply K-Means (K = 3 clusters)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Print results
print("Cluster Labels:\n", labels)
print("\nCentroids:\n", centroids)
#Visualization
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
# Plot centroids
plt.scatter(
centroids[:, 0],
centroids[:, 1],
color='red',
marker='X',
s=200
)
plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("K-Means Clustering (Sample Dataset)")
plt.show()
6.Cluster customer data into K groups.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Sample Customer Data (Income, Spending Score)
X = np.array([
[15, 39], [16, 81], [17, 6], [18, 77], [19, 40],
[20, 76], [21, 6], [22, 94], [23, 3], [24, 72],
[25, 14], [26, 99], [27, 35], [28, 60], [29, 50],
[30, 20], [31, 85], [32, 5], [33, 65], [34, 45]
])
# Number of clusters (K)
k = 3
# Apply K-Means
kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Print results
print("Cluster Labels:\n", labels)
print("\nCluster Centers:\n", centroids)
#Visualization of Customer Clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
# Centroids
plt.scatter(
centroids[:, 0],
centroids[:, 1],
color='red',
marker='X',
s=200
)
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Segmentation using K-Means")
plt.show()
7.Find cluster centroids using K-Means.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import numpy as np
from sklearn.cluster import KMeans
# Sample dataset
X = np.array([
[2, 10], [2, 5], [8, 4],
[5, 8], [7, 5], [6, 4],
[1, 2], [4, 9], [7, 3]
])
# Apply K-Means (K = 3 clusters)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
# Train model
kmeans.fit(X)
# Get centroids
centroids = kmeans.cluster_centers_
print("Cluster Centroids:\n")
print(centroids)
#Visualization
import matplotlib.pyplot as plt
labels = kmeans.labels_
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(
centroids[:, 0],
centroids[:, 1],
color='red',
marker='X',
s=200
)
plt.xlabel("X Coordinate")
plt.ylabel("Y Coordinate")
plt.title("K-Means Centroids Visualization")
plt.show()
8.Visualize clusters using scatter plot.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Sample dataset
X = np.array([
[1, 2], [2, 1], [3, 2],
[8, 8], [9, 8], [8, 9],
[1, 8], [2, 9], [3, 8]
])
# Apply K-Means (K = 3)
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Scatter plot of clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
# Plot centroids
plt.scatter(
centroids[:, 0],
centroids[:, 1],
color='red',
marker='X',
s=200
)
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("K-Means Cluster Visualization")
plt.show()
9.Compare results for K = 2, K = 3 and K = 4.
import os
os.environ["OMP_NUM_THREADS"] = "1"
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Sample dataset
X = np.array([
[1, 2], [2, 1], [3, 2],
[8, 8], [9, 8], [8, 9],
[1, 8], [2, 9], [3, 8],
[9, 9], [10, 10]
])
k_values = [2, 3, 4]
plt.figure(figsize=(12, 4))
for i, k in enumerate(k_values):
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = kmeans.fit_predict(X)
    centroids = kmeans.cluster_centers_
    plt.subplot(1, 3, i + 1)
    plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
    plt.scatter(
        centroids[:, 0],
        centroids[:, 1],
        color='red',
        marker='X',
        s=200
    )
    plt.title(f"K = {k}")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
plt.tight_layout()
plt.show()
-----------------------------------------------------------------------------------------------
Missing Value Handling
1.Detect missing values in dataset.
import pandas as pd
import numpy as np
# Sample dataset with missing values
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"Math": [85, 90, np.nan, 75, 88],
"Science": [78, np.nan, 80, 85, 90],
"English": [np.nan, 88, 84, 79, 92]
}
df = pd.DataFrame(data)
print("Dataset:\n")
print(df)
#Detect Missing Values
print("\nMissing Values (True = Missing):\n")
print(df.isnull())
#Count Missing Values
print("\nTotal Missing Values in Each Column:\n")
print(df.isnull().sum())
#Total Missing Values in Dataset
print("\nTotal Missing Values in Dataset:")
print(df.isnull().sum().sum())
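`isnull().sum()` counts, column by column, the entries pandas treats as missing (`NaN` or `None`). A pure-Python sketch of the same count on the three marks columns, extended with the percentage of missing entries per column, a figure that can help when choosing between filling and dropping:

```python
import math

data = {
    "Math": [85, 90, float("nan"), 75, 88],
    "Science": [78, float("nan"), 80, 85, 90],
    "English": [float("nan"), 88, 84, 79, 92],
}

def is_missing(value):
    """Treat None and float NaN as missing, as pandas' isnull() does."""
    return value is None or (isinstance(value, float) and math.isnan(value))

missing_counts = {col: sum(is_missing(v) for v in values) for col, values in data.items()}
missing_percent = {col: 100 * n / len(data[col]) for col, n in missing_counts.items()}

print("Missing per column:", missing_counts)
print("Missing % per column:", missing_percent)
print("Total missing:", sum(missing_counts.values()))
```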
2.Replace missing values using mean.
import pandas as pd
import numpy as np
# Sample dataset with missing values
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"Math": [85, 90, np.nan, 75, 88],
"Science": [78, np.nan, 80, 85, 90],
"English": [np.nan, 88, 84, 79, 92]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
#Fill Missing Values Using Mean
df_filled = df.copy()
# Replace missing values with column mean
df_filled["Math"] = df_filled["Math"].fillna(df_filled["Math"].mean())
df_filled["Science"] = df_filled["Science"].fillna(df_filled["Science"].mean())
df_filled["English"] = df_filled["English"].fillna(df_filled["English"].mean())
print("\nDataset After Replacing Missing Values with Mean:\n")
print(df_filled)
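`mean()` skips `NaN` by default, so the value that fills the Math gap is the mean of the four recorded marks: (85 + 90 + 75 + 88) / 4 = 84.5. A pure-Python sketch of that calculation:

```python
import math

def mean_skipna(values):
    """Mean of the non-missing values, mirroring pandas' default skipna=True."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present)

def fill_with_mean(values):
    """Replace each NaN with the mean of the non-missing values."""
    m = mean_skipna(values)
    return [m if math.isnan(v) else v for v in values]

nan = float("nan")
math_marks = [85, 90, nan, 75, 88]

print("Mean of Math (NaN ignored):", mean_skipna(math_marks))
print("Filled Math column:", fill_with_mean(math_marks))
```

The same rule gives 83.25 for Science and 85.75 for English.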
3.Replace missing values using median.
import pandas as pd
import numpy as np
# Sample dataset with missing values
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"Math": [85, 90, np.nan, 75, 88],
"Science": [78, np.nan, 80, 85, 90],
"English": [np.nan, 88, 84, 79, 92]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
#Replace Missing Values Using Median
df_median = df.copy()
# Replace missing values with median
df_median["Math"] = df_median["Math"].fillna(df_median["Math"].median())
df_median["Science"] = df_median["Science"].fillna(df_median["Science"].median())
df_median["English"] = df_median["English"].fillna(df_median["English"].median())
print("\nDataset After Replacing Missing Values with Median:\n")
print(df_median)
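With an even number of recorded values, the median is the average of the two middle ones: the Science marks 78, 80, 85 and 90 give (80 + 85) / 2 = 82.5. Unlike the mean, the median is not pulled by a single extreme mark. A pure-Python sketch using the standard `statistics` module:

```python
import math
import statistics

def fill_with_median(values):
    """Replace NaN entries with the median of the non-missing values."""
    present = [v for v in values if not math.isnan(v)]
    med = statistics.median(present)
    return [med if math.isnan(v) else v for v in values]

nan = float("nan")
science = [78, nan, 80, 85, 90]

print("Filled Science column:", fill_with_median(science))
```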
4.Replace missing values using Mode.
import pandas as pd
import numpy as np
# Sample dataset with missing values
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"Math": [85, 90, np.nan, 75, 88],
"Science": [78, np.nan, 80, 85, 90],
"English": [np.nan, 88, 84, 79, 92],
"Grade": ["A", "B", np.nan, "B", "A"]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
#Replace Missing Values Using Mode
df_mode = df.copy()
# Fill missing values using mode (most frequent value)
for col in df_mode.columns:
    df_mode[col] = df_mode[col].fillna(df_mode[col].mode()[0])
print("\nDataset After Replacing Missing Values with Mode:\n")
print(df_mode)
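`mode()` returns every most frequent value in sorted order, and `[0]` takes the first one. This has two visible effects on this dataset: Grade has a tie (A and B both appear twice), so A is used; and no marks column has a repeated value, so every mark counts as a mode and the smallest fills the gap (75 for Math). The pure-Python sketch below reproduces that rule; it assumes the values in a column can be sorted against each other.

```python
import math
from collections import Counter

def is_missing(v):
    """None or float NaN counts as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))

def mode_first(values):
    """Smallest of the most frequent non-missing values (like pandas mode()[0])."""
    counts = Counter(v for v in values if not is_missing(v))
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)[0]

def fill_with_mode(values):
    m = mode_first(values)
    return [m if is_missing(v) else v for v in values]

nan = float("nan")
grade = ["A", "B", nan, "B", "A"]
math_marks = [85, 90, nan, 75, 88]

print("Grade filled:", fill_with_mode(grade))
print("Math filled:", fill_with_mode(math_marks))
```

Mode filling is therefore most meaningful for categorical columns such as Grade; for numeric columns without repeats it simply picks the minimum.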
5.Remove rows containing missing values.
import pandas as pd
import numpy as np
# Sample dataset with missing values
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"Math": [85, 90, np.nan, 75, 88],
"Science": [78, np.nan, 80, 85, 90],
"English": [np.nan, 88, 84, 79, 92]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
#Remove Rows with Missing Values
df_cleaned = df.dropna()
print("\nDataset After Removing Rows with Missing Values:\n")
print(df_cleaned)
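By default `dropna()` removes a row if any of its values is missing. Amit, Riya and John each miss one mark, so only two of the five students remain, which shows how quickly dropping rows can shrink a small dataset. A pure-Python sketch of the same rule:

```python
import math

nan = float("nan")
rows = [
    {"Name": "Amit", "Math": 85, "Science": 78, "English": nan},
    {"Name": "Riya", "Math": 90, "Science": nan, "English": 88},
    {"Name": "John", "Math": nan, "Science": 80, "English": 84},
    {"Name": "Sara", "Math": 75, "Science": 85, "English": 79},
    {"Name": "Raj", "Math": 88, "Science": 90, "English": 92},
]

def has_missing(row):
    """True if any value in the row is NaN."""
    return any(isinstance(v, float) and math.isnan(v) for v in row.values())

kept = [row for row in rows if not has_missing(row)]
print("Rows kept:", [row["Name"] for row in kept])
print(f"Dropped {len(rows) - len(kept)} of {len(rows)} rows")
```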
-----------------------------------------------------------------------------------------------
Label Encoding
1.Convert categorical data using Label Encoding.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample dataset
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"City": ["Mumbai", "Pune", "Delhi", "Mumbai", "Pune"],
"Grade": ["A", "B", "A", "C", "B"]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
#Apply Label Encoding
le = LabelEncoder()
# Convert categorical columns
df["City_encoded"] = le.fit_transform(df["City"])
df["Grade_encoded"] = le.fit_transform(df["Grade"])
print("\nDataset After Label Encoding:\n")
print(df)
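`LabelEncoder` sorts the distinct values and numbers them from 0, so City becomes Delhi = 0, Mumbai = 1, Pune = 2 (alphabetical order, not order of appearance). Because the practical reuses one `le` object, after the second `fit_transform` call `le.classes_` holds only the Grade classes; decoding City later would need its own encoder. A pure-Python sketch of the mapping:

```python
def label_encode(values):
    """Map each distinct value to an integer in sorted order, as LabelEncoder does."""
    classes = sorted(set(values))
    mapping = {cls: i for i, cls in enumerate(classes)}
    return [mapping[v] for v in values], classes

city_codes, city_classes = label_encode(["Mumbai", "Pune", "Delhi", "Mumbai", "Pune"])
grade_codes, grade_classes = label_encode(["A", "B", "A", "C", "B"])

print(city_classes, city_codes)
print(grade_classes, grade_codes)
```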
2.Apply One-Hot Encoding.
import pandas as pd
# Sample dataset
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"City": ["Mumbai", "Pune", "Delhi", "Mumbai", "Pune"],
"Grade": ["A", "B", "A", "C", "B"]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
# Apply One-Hot Encoding
df_encoded = pd.get_dummies(df, columns=["City", "Grade"])
print("\nDataset After One-Hot Encoding:\n")
print(df_encoded)
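`pd.get_dummies` creates one column per category, named `<column>_<category>`, holding 1 (True) where the row has that category and 0 (False) elsewhere. Since pandas 2.0 these columns are boolean by default; passing `dtype=int` gives 0/1 integers. A pure-Python sketch of the City columns:

```python
def one_hot(values, prefix):
    """One 0/1 column per sorted category, named prefix_category like get_dummies."""
    categories = sorted(set(values))
    return {f"{prefix}_{cat}": [int(v == cat) for v in values] for cat in categories}

city = ["Mumbai", "Pune", "Delhi", "Mumbai", "Pune"]
for column, bits in one_hot(city, "City").items():
    print(column, bits)
```

Each row has exactly one 1 across the City columns.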
3.Encode text labels into numeric values.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample dataset
data = {
"Student": ["Amit", "Riya", "John", "Sara", "Raj"],
"Result": ["Pass", "Fail", "Pass", "Pass", "Fail"]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
# Apply Label Encoding on Text Labels
le = LabelEncoder()
df["Result_encoded"] = le.fit_transform(df["Result"])
print("\nDataset After Encoding Text Labels:\n")
print(df)
4.Convert categorical data using Label Encoding.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample dataset
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"City": ["Mumbai", "Pune", "Delhi", "Mumbai", "Pune"],
"Grade": ["A", "B", "A", "C", "B"]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
# Apply Label Encoding
le = LabelEncoder()
# Convert categorical columns into numeric values
df["City_encoded"] = le.fit_transform(df["City"])
df["Grade_encoded"] = le.fit_transform(df["Grade"])
print("\nDataset After Label Encoding:\n")
print(df)
5.Encode Gender column using LabelEncoder.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample dataset
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"Gender": ["Male", "Female", "Male", "Female", "Male"],
"Marks": [85, 90, 78, 88, 92]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
# Apply Label Encoding on Gender Column
le = LabelEncoder()
df["Gender_encoded"] = le.fit_transform(df["Gender"])
print("\nDataset After Encoding Gender:\n")
print(df)
6.Perform One-Hot Encoding on City column.
import pandas as pd
# Sample dataset
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"City": ["Mumbai", "Pune", "Delhi", "Mumbai", "Pune"],
"Marks": [85, 90, 78, 88, 92]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
# Apply One-Hot Encoding on City Column
df_encoded = pd.get_dummies(df, columns=["City"])
print("\nDataset After One-Hot Encoding on City Column:\n")
print(df_encoded)
7.Compare Label Encoding and One-Hot Encoding.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample dataset
data = {
"City": ["Mumbai", "Pune", "Delhi", "Mumbai", "Pune"]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
#Label Encoding
le = LabelEncoder()
df["City_Label"] = le.fit_transform(df["City"])
print("\nAfter Label Encoding:\n")
print(df)
# One-Hot Encoding
df_onehot = pd.get_dummies(df, columns=["City"])
print("\nAfter One-Hot Encoding:\n")
print(df_onehot)
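The key difference between the two outputs is geometric. Label Encoding puts City on a single numeric axis, so a distance-based model such as KNN would treat Delhi and Pune (0 and 2) as further apart than Delhi and Mumbai (0 and 1), an ordering that means nothing for city names. One-Hot Encoding keeps every pair of cities equally far apart. A small sketch measuring Euclidean distances under both encodings:

```python
import math

# Label-encoded City (LabelEncoder order: Delhi=0, Mumbai=1, Pune=2)
label = {"Delhi": [0], "Mumbai": [1], "Pune": [2]}
# One-hot City (columns: Delhi, Mumbai, Pune)
onehot = {"Delhi": [1, 0, 0], "Mumbai": [0, 1, 0], "Pune": [0, 0, 1]}

for a, b in [("Delhi", "Mumbai"), ("Mumbai", "Pune"), ("Delhi", "Pune")]:
    print(f"{a}-{b}: label distance = {math.dist(label[a], label[b])}, "
          f"one-hot distance = {math.dist(onehot[a], onehot[b]):.3f}")
```

Label Encoding suits ordered categories (such as grades) or tree-based models; One-Hot Encoding is the safer choice for unordered categories such as City.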
8.Transform categorical data into numerical form.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
# Sample dataset
data = {
"Name": ["Amit", "Riya", "John", "Sara", "Raj"],
"City": ["Mumbai", "Pune", "Delhi", "Mumbai", "Pune"],
"Gender": ["Male", "Female", "Male", "Female", "Male"]
}
df = pd.DataFrame(data)
print("Original Dataset:\n")
print(df)
# Label Encoding
le = LabelEncoder()
df["City_encoded"] = le.fit_transform(df["City"])
df["Gender_encoded"] = le.fit_transform(df["Gender"])
print("\nAfter Label Encoding:\n")
print(df)
# One-Hot Encoding
# dtype=int gives 0/1 columns instead of True/False (the pandas 2.x default)
df_onehot = pd.get_dummies(df, columns=["City", "Gender"], dtype=int)
print("\nAfter One-Hot Encoding:\n")
print(df_onehot)
-----------------------------------------------------------------------------------------------
Data Visualization
1.Create a Line Chart.
import matplotlib.pyplot as plt
# Sample data (Months vs Marks)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
marks = [65, 70, 75, 80, 85, 90]
plt.plot(months, marks, marker='o')
plt.title("Student Performance Over Time")
plt.xlabel("Months")
plt.ylabel("Marks")
plt.grid(True)
plt.show()
2.Create a Bar Chart.
import matplotlib.pyplot as plt
# Sample data
subjects = ["Math", "Science", "English", "History", "CS"]
marks = [85, 90, 75, 80, 95]
plt.bar(subjects, marks)
plt.title("Student Marks by Subject")
plt.xlabel("Subjects")
plt.ylabel("Marks")
plt.show()
3.Create a Histogram.
import matplotlib.pyplot as plt
# Sample data (student marks)
marks = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90,
95, 88, 76, 66, 59, 73, 82, 91, 47, 53]
plt.hist(marks, bins=5, edgecolor='black')
plt.title("Histogram of Student Marks")
plt.xlabel("Marks Range")
plt.ylabel("Frequency")
plt.show()
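`plt.hist(marks, bins=5)` splits the range from the lowest mark (45) to the highest (95) into five equal-width bins of 10 marks and counts the values in each. Every bin includes its left edge and excludes its right edge, except the last, which also includes 95. `np.histogram` (which Matplotlib uses internally) returns those bar heights as numbers, which is handy for checking what the chart shows:

```python
import numpy as np

marks = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90,
         95, 88, 76, 66, 59, 73, 82, 91, 47, 53]

# Same binning plt.hist applies: 5 equal-width bins from min to max
counts, edges = np.histogram(marks, bins=5)
for i, count in enumerate(counts):
    closing = "]" if i == len(counts) - 1 else ")"
    print(f"[{edges[i]:.0f}, {edges[i + 1]:.0f}{closing}: {count}")
```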
4.Create a Scatter Plot.
import matplotlib.pyplot as plt
# Sample data (Hours studied vs Marks scored)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [40, 45, 50, 55, 65, 70, 80, 85]
plt.scatter(hours, marks)
plt.title("Study Hours vs Marks")
plt.xlabel("Hours Studied")
plt.ylabel("Marks Scored")
plt.show()
5.Plot student marks using Matplotlib.
import matplotlib.pyplot as plt
# Sample data
students = ["Amit", "Riya", "John", "Sara", "Raj"]
marks = [85, 90, 78, 88, 92]
plt.bar(students, marks)
plt.title("Student Marks")
plt.xlabel("Students")
plt.ylabel("Marks")
plt.show()
6.Plot sales data using a Bar Graph.
import matplotlib.pyplot as plt
# Sample sales data
products = ["Laptop", "Mobile", "Tablet", "Headphones", "Watch"]
sales = [120, 200, 150, 180, 90]
plt.bar(products, sales)
plt.title("Sales Data of Products")
plt.xlabel("Products")
plt.ylabel("Sales (Units)")
plt.show()
7.Display frequency distribution using Histogram.
import matplotlib.pyplot as plt
# Sample data (student marks)
marks = [42, 55, 61, 67, 70, 72, 75, 78, 80, 82,
85, 88, 90, 91, 93, 95, 60, 66, 74, 79]
plt.hist(marks, bins=5, edgecolor='black')
plt.title("Frequency Distribution of Student Marks")
plt.xlabel("Marks Range")
plt.ylabel("Frequency")
plt.show()
8.Show relationship between two variables using Scatter Plot.
import matplotlib.pyplot as plt
# Sample data (Study Hours vs Marks)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 40, 50, 55, 65, 70, 85, 90]
plt.scatter(hours, marks)
plt.title("Relationship Between Study Hours and Marks")
plt.xlabel("Study Hours")
plt.ylabel("Marks")
plt.grid(True)
plt.show()
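A scatter plot shows the relationship visually; the Pearson correlation coefficient r summarises how close the points lie to a straight line, from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive). A pure-Python sketch for the study-hours data above:

```python
import math

hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [35, 40, 50, 55, 65, 70, 85, 90]

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(hours, marks)
print("Correlation coefficient r =", round(r, 3))
```

For this data r comes out at roughly 0.99, matching the almost straight upward trend in the plot.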
9.Add title, labels, and legend to a graph.
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y1 = [10, 20, 25, 30, 40]
y2 = [5, 15, 20, 25, 35]
# Plot lines with labels (for legend)
plt.plot(x, y1, label="Series 1")
plt.plot(x, y2, label="Series 2")
# Title
plt.title("Graph with Title, Labels and Legend")
# Axis Labels
plt.xlabel("X Axis (Values)")
plt.ylabel("Y Axis (Values)")
# Legend
plt.legend()
plt.grid(True)
plt.show()
10.Compare multiple datasets using plots.
import matplotlib.pyplot as plt
# Sample data (Months)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
# Dataset 1 (Product A sales)
product_A = [100, 120, 140, 160, 180, 200]
# Dataset 2 (Product B sales)
product_B = [90, 110, 130, 150, 170, 190]
# Plot both datasets
plt.plot(months, product_A, marker='o', label="Product A")
plt.plot(months, product_B, marker='s', label="Product B")
# Title and labels
plt.title("Sales Comparison of Two Products")
plt.xlabel("Months")
plt.ylabel("Sales")
# Legend
plt.legend()
plt.grid(True)
plt.show()
--------------------*---------------------*--------------------------*------------------*-----