Commit bdd0132c authored by jbleher: Initial commit
%% Cell type:markdown id:eb416ca5 tags:

# MNIST Neural Network Implementation in Python

%% Cell type:markdown id:4186ae33 tags:

You have a dataset containing handwritten digits from the MNIST dataset. Each digit is represented as a vector of 784 pixel values (originally a 28×28 image) ranging from 0 (white) to 255 (black).

Your goal is to construct a simple neural network to classify these images.

%% Cell type:markdown id:35c57ea6 tags:

<p style="float: left; margin: 20px 10px 10px 20px;padding-right:20px;">
<img src="neuronal_network.png" alt="Structure of the neural network" width="550"/>
</p>
The network structure is:

- **Input Layer:** 784 neurons (one per pixel); with $n$ observations, the input matrix $\mathbf{A}_0$ has dimensions $784 \times n$.
- **Hidden Layer:** 10 neurons using the ReLU activation function, $\operatorname{ReLU}(z) = \max(0, z)$.

- The first layer (input to hidden) computation:
$$
\mathbf{A}_1 = \operatorname{ReLU}\left(\mathbf{W}_1 \mathbf{A}_0 + \mathbf{b}_1\right)
$$
where $\mathbf{W}_1$ is a $10 \times 784$ weight matrix, and $\mathbf{b}_1$ is a $10 \times 1$ bias vector.
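
To make the dimensions concrete, here is a minimal NumPy sketch of this first-layer computation, using a hypothetical batch of $n = 5$ observations and randomly initialized parameters:

``` python
import numpy as np

n = 5                                   # hypothetical number of observations
A0 = np.random.rand(784, n)             # one image per column
W1 = np.random.randn(10, 784) * 0.01    # small random weights (assumed init)
b1 = np.zeros((10, 1))                  # bias, broadcast across the n columns

Z1 = W1 @ A0 + b1                       # pre-activation, shape (10, n)
A1 = np.maximum(0, Z1)                  # ReLU: clip negatives to zero
```

The bias vector has shape $(10, 1)$ but is broadcast by NumPy across all $n$ columns, so no explicit tiling is needed.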

- The second layer (hidden to output) uses the Softmax activation function:
$$
\mathbf{A}_2 = \text{softmax}(\mathbf{Z}_2) = \text{softmax}(\mathbf{W}_2 \mathbf{A}_1 + \mathbf{b}_2)
$$

where $\mathbf{W}_2$ is a $10 \times 10$ matrix, $\mathbf{b}_2$ a $10 \times 1$ vector, and the Softmax function is defined as:
$$
\sigma(\mathbf{z})_i = \frac{\exp(z_i)}{\sum_{j} \exp(z_j)}.
$$

- **Output Layer:** 10 neurons, one per digit (0–9); the predicted digit is the neuron with the highest output.
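
A small worked example of the softmax on a made-up score vector; subtracting the maximum before exponentiating is a standard trick that avoids overflow without changing the result:

``` python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the output is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores
p = softmax(z)                  # probabilities: sum to 1, order preserved
```

The largest score receives the largest probability, so `argmax` over the softmax output picks the same class as `argmax` over the raw scores.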

%% Cell type:markdown id:ecc8ba2a tags:

## Training Procedure

Use gradient descent and backpropagation to update parameters. Apply the following update rules with a learning rate $\alpha = 0.1$:
$$
\begin{align}
\mathbf{W}_1 &:= \mathbf{W}_1 - \alpha\, \mathrm{d}\mathbf{W}_1, \\
\mathbf{b}_1 &:= \mathbf{b}_1 - \alpha\, \mathrm{d}\mathbf{b}_1, \\
\mathbf{W}_2 &:= \mathbf{W}_2 - \alpha\, \mathrm{d}\mathbf{W}_2, \\
\mathbf{b}_2 &:=\mathbf{b}_2 - \alpha\, \mathrm{d}\mathbf{b}_2
\end{align}
$$
Compute the gradients using backpropagation:
$$
\begin{align}
\mathrm{d}\mathbf{Z}_2 &= \mathbf{A}_2 - \mathbf{Y}, \\
\mathrm{d}\mathbf{W}_2 &= \frac{1}{n}\, \mathrm{d}\mathbf{Z}_2 \mathbf{A}_1^\prime, \\
\mathrm{d}\mathbf{b}_2 &= \frac{1}{n}\, \mathrm{d}\mathbf{Z}_2 \mathbf{1}_n, \\
\mathrm{d}\mathbf{Z}_1 &= \mathbf{W}_2^\prime\, \mathrm{d}\mathbf{Z}_2 \odot \operatorname{ReLU}^\prime(\mathbf{Z}_1), \\
\mathrm{d}\mathbf{W}_1 &= \frac{1}{n}\, \mathrm{d}\mathbf{Z}_1 \mathbf{A}_0^\prime, \\
\mathrm{d}\mathbf{b}_1 &= \frac{1}{n}\, \mathrm{d}\mathbf{Z}_1 \mathbf{1}_n,
\end{align}
$$
where $\mathbf{1}_n$ is the length-$n$ vector of ones, $\mathbf{A}_0 = \mathbf{X}$ is the input matrix, $^\prime$ denotes the transpose, and $\odot$ denotes element-wise multiplication.
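
The identity $\mathrm{d}\mathbf{Z}_2 = \mathbf{A}_2 - \mathbf{Y}$ comes from differentiating the cross-entropy loss through the softmax. A quick finite-difference check on a single hypothetical sample confirms it:

``` python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    # L = -sum_i y_i * log(softmax(z)_i)
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=10)          # hypothetical pre-activation scores
y = np.zeros(10); y[3] = 1.0     # one-hot label for digit 3

analytic = softmax(z) - y        # the dZ2 = A2 - Y identity

# central finite differences, one coordinate at a time
numeric = np.zeros(10)
eps = 1e-5
for i in range(10):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps; zm[i] -= eps
    numeric[i] = (cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * eps)
```

The analytic and numerical gradients agree to within the finite-difference error, which is a useful debugging technique for any hand-derived backpropagation formula.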

%% Cell type:markdown id:ce0ceda6 tags:

## Import Required Libraries
We start by importing necessary Python libraries for numerical computation, data handling, and visualization.

%% Cell type:code id:6c7a038a tags:

``` python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import pickle
```

%% Cell type:markdown id:af55e379 tags:

## Load Data
We load the MNIST dataset, which contains images of handwritten digits. Each image is represented by 784 pixel values.

%% Cell type:code id:012469fa tags:

``` python
mnist = pd.read_pickle("mnist.pkl.gz")
```

%% Cell type:markdown id:a696fbf0 tags:

## Data Preprocessing
Normalize pixel values to the range [0, 1] and convert labels to one-hot encoding.

%% Cell type:code id:a044995f tags:

``` python
X = mnist.drop("Label", axis=1).values / 255
y = pd.get_dummies(mnist["Label"]).values
```
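
As a small illustration of what `pd.get_dummies` produces here (hypothetical labels; columns are ordered by label value, and each row has exactly one nonzero entry):

``` python
import pandas as pd

labels = pd.Series([3, 1, 3], name="Label")   # hypothetical digit labels
onehot = pd.get_dummies(labels).values        # columns ordered as [1, 3]
# row 0 -> label 3, row 1 -> label 1, row 2 -> label 3
```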

%% Cell type:markdown id:6c88849b tags:

## The Task
- Split the data into training and test sets.
- Define the activation functions and the necessary derivatives.
- Define functions for forward and backpropagation.
- Train your neural network using gradient descent for at least 500 iterations.
- Output the accuracy every 5th iteration.
- Use learning rate $\alpha = 0.1$.

%% Cell type:markdown id:c136ef0c tags:

## Split Data
Split the dataset into training and test sets to evaluate the model performance on unseen data.

%% Cell type:code id:aa452588 tags:

``` python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

%% Cell type:markdown id:872f6ede tags:

## Activation Functions
Define the ReLU activation function and its derivative for the hidden layer, and the softmax function for the output layer.

%% Cell type:code id:46bdd911 tags:

``` python
def ReLU(Z):
    return np.maximum(0, Z)

def ReLU_derivative(Z):
    # boolean mask; behaves as 0/1 when multiplied with float arrays
    return Z > 0

def softmax(Z):
    # subtract the column-wise max for numerical stability (result unchanged)
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)
```
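
A quick sanity check on these definitions (reproduced here so the snippet runs standalone): every softmax column must sum to one, and ReLU must zero out negative entries:

``` python
import numpy as np

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)

Z = np.array([[-1.0, 2.0],
              [ 3.0, 0.5]])   # hypothetical pre-activations, 2 columns
A = softmax(Z)                # each column is a probability distribution
R = ReLU(Z)                   # the -1.0 entry becomes 0
```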

%% Cell type:markdown id:34fee4cc tags:

## Initialize Network Parameters
Initialize weights and biases with small random values.

%% Cell type:code id:6e971e6c tags:

``` python
def init_params():
    W1 = np.random.randn(10, 784) * 0.01
    b1 = np.zeros((10, 1))
    W2 = np.random.randn(10, 10) * 0.01
    b2 = np.zeros((10, 1))
    return W1, b1, W2, b2
```

%% Cell type:markdown id:3168052c tags:

## Forward Propagation
Define the forward propagation function, which calculates outputs based on current network parameters.

%% Cell type:code id:2b1b6b20 tags:

``` python
def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2
```
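
Chaining both layers on random data shows that the output is a valid probability distribution per image. This standalone sketch re-states the pieces above with assumed random parameters:

``` python
import numpy as np

rng = np.random.default_rng(0)

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    e = np.exp(Z - Z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

W1 = rng.normal(size=(10, 784)) * 0.01; b1 = np.zeros((10, 1))
W2 = rng.normal(size=(10, 10)) * 0.01;  b2 = np.zeros((10, 1))
X  = rng.random((784, 4))               # 4 hypothetical images as columns

Z1 = W1 @ X + b1;  A1 = ReLU(Z1)        # hidden layer
Z2 = W2 @ A1 + b2; A2 = softmax(Z2)     # one probability column per image
```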

%% Cell type:markdown id:8bf20d02 tags:

## Backward Propagation
Calculate gradients using backpropagation to update parameters.

%% Cell type:code id:8864edb3 tags:

``` python
def back_prop(Z1, A1, A2, W2, X, Y):
    n = X.shape[1]                                # number of observations
    dZ2 = A2 - Y                                  # softmax + cross-entropy gradient
    dW2 = (1/n) * dZ2.dot(A1.T)
    db2 = (1/n) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = W2.T.dot(dZ2) * ReLU_derivative(Z1)     # element-wise product
    dW1 = (1/n) * dZ1.dot(X.T)
    db1 = (1/n) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```
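
A common bug here is a transposed matrix, so it is worth checking that each gradient has exactly the shape of the parameter it updates. A standalone shape check with hypothetical activations and labels:

``` python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                         # hypothetical batch of 6 images
X  = rng.random((784, n))
Y  = np.eye(10)[:, rng.integers(0, 10, n)]    # random one-hot labels, (10, n)

# stand-in activations with the shapes produced by the forward pass
Z1 = rng.normal(size=(10, n)); A1 = np.maximum(0, Z1)
A2 = rng.random((10, n)); A2 /= A2.sum(axis=0, keepdims=True)
W2 = rng.normal(size=(10, 10))

dZ2 = A2 - Y
dW2 = (1/n) * dZ2 @ A1.T
db2 = (1/n) * dZ2.sum(axis=1, keepdims=True)
dZ1 = W2.T @ dZ2 * (Z1 > 0)
dW1 = (1/n) * dZ1 @ X.T
db1 = (1/n) * dZ1.sum(axis=1, keepdims=True)
```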

%% Cell type:markdown id:73101d4c tags:

## Accuracy Calculation
Define a function to measure the accuracy of predictions.

%% Cell type:code id:5a4cf2ea tags:

``` python
def accuracy(W1, b1, W2, b2, X, Y):
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    predictions = np.argmax(A2, axis=0)
    labels = np.argmax(Y, axis=0)
    return np.mean(predictions == labels)
```
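
On a toy example with two classes and two samples (made-up probability columns), the logic is easy to verify by hand: the first sample is classified correctly, the second is not:

``` python
import numpy as np

A2 = np.array([[0.7, 0.2],
               [0.3, 0.8]])   # hypothetical probability columns, 2 samples
Y  = np.array([[1, 1],
               [0, 0]])       # both true labels are class 0

predictions = np.argmax(A2, axis=0)   # predicted classes per column
labels = np.argmax(Y, axis=0)         # true classes per column
acc = np.mean(predictions == labels)  # one of two correct
```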

%% Cell type:markdown id:cbd9156e tags:

## Model Training
Train the neural network using gradient descent and print accuracy every 5 iterations.

%% Cell type:code id:aa6f9c74 tags:

``` python
W1, b1, W2, b2 = init_params()
# arrange observations as columns (features x n) to match the formulas above
X_train, X_test, y_train, y_test = X_train.T, X_test.T, y_train.T, y_test.T

for i in range(501):
    Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X_train)
    dW1, db1, dW2, db2 = back_prop(Z1, A1, A2, W2, X_train, y_train)

    W1 -= 0.1 * dW1
    b1 -= 0.1 * db1
    W2 -= 0.1 * dW2
    b2 -= 0.1 * db2

    if i % 5 == 0:
        acc = accuracy(W1, b1, W2, b2, X_train, y_train)
        print(f"Iteration {i}, Training-Accuracy: {acc:.4f}")

print("Final accuracy on test dataset:", accuracy(W1, b1, W2, b2, X_test, y_test))
```

%% Output

    Iteration 0, Training-Accuracy: 0.1595
    Iteration 5, Training-Accuracy: 0.1390
    Iteration 10, Training-Accuracy: 0.1226
    Iteration 15, Training-Accuracy: 0.1331
    Iteration 20, Training-Accuracy: 0.1758
    Iteration 25, Training-Accuracy: 0.2520
    Iteration 30, Training-Accuracy: 0.3235
    Iteration 35, Training-Accuracy: 0.3579
    Iteration 40, Training-Accuracy: 0.3653
    Iteration 45, Training-Accuracy: 0.3570
    Iteration 50, Training-Accuracy: 0.3435
    Iteration 55, Training-Accuracy: 0.3316
    Iteration 60, Training-Accuracy: 0.3236
    Iteration 65, Training-Accuracy: 0.3202
    Iteration 70, Training-Accuracy: 0.3251
    Iteration 75, Training-Accuracy: 0.3408
    Iteration 80, Training-Accuracy: 0.3614
    Iteration 85, Training-Accuracy: 0.3876
    Iteration 90, Training-Accuracy: 0.4244
    Iteration 95, Training-Accuracy: 0.4846
    Iteration 100, Training-Accuracy: 0.5282
    Iteration 105, Training-Accuracy: 0.5669
    Iteration 110, Training-Accuracy: 0.6061
    Iteration 115, Training-Accuracy: 0.6414
    Iteration 120, Training-Accuracy: 0.6689
    Iteration 125, Training-Accuracy: 0.6880
    Iteration 130, Training-Accuracy: 0.7020
    Iteration 135, Training-Accuracy: 0.7146
    Iteration 140, Training-Accuracy: 0.7268
    Iteration 145, Training-Accuracy: 0.7359
    Iteration 150, Training-Accuracy: 0.7452
    Iteration 155, Training-Accuracy: 0.7529
    Iteration 160, Training-Accuracy: 0.7600
    Iteration 165, Training-Accuracy: 0.7663
    Iteration 170, Training-Accuracy: 0.7720
    Iteration 175, Training-Accuracy: 0.7775
    Iteration 180, Training-Accuracy: 0.7823
    Iteration 185, Training-Accuracy: 0.7865
    Iteration 190, Training-Accuracy: 0.7914
    Iteration 195, Training-Accuracy: 0.7951
    Iteration 200, Training-Accuracy: 0.7987
    Iteration 205, Training-Accuracy: 0.8022
    Iteration 210, Training-Accuracy: 0.8058
    Iteration 215, Training-Accuracy: 0.8088
    Iteration 220, Training-Accuracy: 0.8116
    Iteration 225, Training-Accuracy: 0.8144
    Iteration 230, Training-Accuracy: 0.8178
    Iteration 235, Training-Accuracy: 0.8206
    Iteration 240, Training-Accuracy: 0.8236
    Iteration 245, Training-Accuracy: 0.8261
    Iteration 250, Training-Accuracy: 0.8283
    Iteration 255, Training-Accuracy: 0.8310
    Iteration 260, Training-Accuracy: 0.8338
    Iteration 265, Training-Accuracy: 0.8364
    Iteration 270, Training-Accuracy: 0.8384
    Iteration 275, Training-Accuracy: 0.8405
    Iteration 280, Training-Accuracy: 0.8429
    Iteration 285, Training-Accuracy: 0.8451
    Iteration 290, Training-Accuracy: 0.8473
    Iteration 295, Training-Accuracy: 0.8494
    Iteration 300, Training-Accuracy: 0.8514
    Iteration 305, Training-Accuracy: 0.8532
    Iteration 310, Training-Accuracy: 0.8550
    Iteration 315, Training-Accuracy: 0.8562
    Iteration 320, Training-Accuracy: 0.8579
    Iteration 325, Training-Accuracy: 0.8594
    Iteration 330, Training-Accuracy: 0.8610
    Iteration 335, Training-Accuracy: 0.8623
    Iteration 340, Training-Accuracy: 0.8636
    Iteration 345, Training-Accuracy: 0.8648
    Iteration 350, Training-Accuracy: 0.8662
    Iteration 355, Training-Accuracy: 0.8674
    Iteration 360, Training-Accuracy: 0.8687
    Iteration 365, Training-Accuracy: 0.8697
    Iteration 370, Training-Accuracy: 0.8707
    Iteration 375, Training-Accuracy: 0.8715
    Iteration 380, Training-Accuracy: 0.8726
    Iteration 385, Training-Accuracy: 0.8734
    Iteration 390, Training-Accuracy: 0.8744
    Iteration 395, Training-Accuracy: 0.8753
    Iteration 400, Training-Accuracy: 0.8761
    Iteration 405, Training-Accuracy: 0.8769
    Iteration 410, Training-Accuracy: 0.8779
    Iteration 415, Training-Accuracy: 0.8786
    Iteration 420, Training-Accuracy: 0.8790
    Iteration 425, Training-Accuracy: 0.8799
    Iteration 430, Training-Accuracy: 0.8807
    Iteration 435, Training-Accuracy: 0.8815
    Iteration 440, Training-Accuracy: 0.8820
    Iteration 445, Training-Accuracy: 0.8825
    Iteration 450, Training-Accuracy: 0.8831
    Iteration 455, Training-Accuracy: 0.8837
    Iteration 460, Training-Accuracy: 0.8840
    Iteration 465, Training-Accuracy: 0.8849
    Iteration 470, Training-Accuracy: 0.8854
    Iteration 475, Training-Accuracy: 0.8861
    Iteration 480, Training-Accuracy: 0.8864
    Iteration 485, Training-Accuracy: 0.8869
    Iteration 490, Training-Accuracy: 0.8875
    Iteration 495, Training-Accuracy: 0.8881
    Iteration 500, Training-Accuracy: 0.8888
    Final accuracy on test dataset: 0.8870714285714286
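
`pickle` is imported at the top of the notebook but never used; one natural follow-up is persisting the trained parameters so the network does not have to be retrained. A minimal round-trip sketch, with zeros standing in for the real trained values and an in-memory buffer standing in for a file on disk:

``` python
import io
import pickle
import numpy as np

# hypothetical trained parameters (zeros stand in for the real values)
params = {
    "W1": np.zeros((10, 784)), "b1": np.zeros((10, 1)),
    "W2": np.zeros((10, 10)),  "b2": np.zeros((10, 1)),
}

buf = io.BytesIO()             # replace with open("params.pkl", "wb") to save to disk
pickle.dump(params, buf)
buf.seek(0)
restored = pickle.load(buf)    # shapes and values survive the round trip
```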