Commit bdd0132c authored by jbleher: Initial commit
%% Cell type:markdown id:eb416ca5 tags:

# MNIST Neural Network Implementation in Python

%% Cell type:markdown id:4186ae33 tags:

You have a dataset containing handwritten digits from the MNIST dataset. Each digit is represented as a vector of 784 pixel values (originally a 28×28 image) ranging from 0 (white) to 255 (black).

Your goal is to construct a simple neural network to classify these images.

%% Cell type:markdown id:35c57ea6 tags:

<p style="float: left; margin: 20px 10px 10px 20px;padding-right:20px;">
<img src="neuronal_network.png" alt="Structure of the neural network" width="550"/>
</p>
The network structure is:

- **Input Layer:** 784 neurons (one per pixel); with $n$ observations, the input matrix $\mathbf{A}_0$ has dimensions $784 \times n$.
- **Hidden Layer:** 10 neurons using the ReLU activation function, $\operatorname{ReLU}(z) = \max(0, z)$.

- The first layer (input to hidden) computation:
$$
\mathbf{A}_1 = \operatorname{ReLU}\left(\mathbf{W}_1 \mathbf{A}_0 + \mathbf{b}_1\right)
$$
where $\mathbf{W}_1$ is a $10 \times 784$ weight matrix, and $\mathbf{b}_1$ is a $10 \times 1$ bias vector.
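
To make the dimensions concrete, here is a minimal NumPy sketch of this first-layer computation, using a hypothetical batch of $n = 5$ observations and randomly initialized parameters:

``` python
import numpy as np

n = 5                                   # hypothetical number of observations
A0 = np.random.rand(784, n)             # one image per column
W1 = np.random.randn(10, 784) * 0.01    # small random weights (assumed init)
b1 = np.zeros((10, 1))                  # bias, broadcast across the n columns

Z1 = W1 @ A0 + b1                       # pre-activation, shape (10, n)
A1 = np.maximum(0, Z1)                  # ReLU: clip negatives to zero
```

The bias vector has shape $(10, 1)$ but is broadcast by NumPy across all $n$ columns, so no explicit tiling is needed.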

- The second layer (hidden to output) uses the Softmax activation function:
$$
\mathbf{A}_2 = \text{softmax}(\mathbf{Z}_2) = \text{softmax}(\mathbf{W}_2 \mathbf{A}_1 + \mathbf{b}_2)
$$

where $\mathbf{W}_2$ is a $10 \times 10$ matrix, $\mathbf{b}_2$ a $10 \times 1$ vector, and the Softmax function is defined as:
$$
\sigma(\mathbf{z})_i = \frac{\exp(z_i)}{\sum_{j} \exp(z_j)}.
$$

- **Output Layer:** 10 neurons, one per digit (0–9); the predicted digit is the neuron with the highest output.
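
A small worked example of the softmax on a made-up score vector; subtracting the maximum before exponentiating is a standard trick that avoids overflow without changing the result:

``` python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the output is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores
p = softmax(z)                  # probabilities: sum to 1, order preserved
```

The largest score receives the largest probability, so `argmax` over the softmax output picks the same class as `argmax` over the raw scores.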

%% Cell type:markdown id:ecc8ba2a tags:

## Training Procedure

Use gradient descent and backpropagation to update parameters. Apply the following update rules with a learning rate $\alpha = 0.1$:
$$
\begin{align}
\mathbf{W}_1 &:= \mathbf{W}_1 - \alpha\, \mathrm{d}\mathbf{W}_1, \\
\mathbf{b}_1 &:= \mathbf{b}_1 - \alpha\, \mathrm{d}\mathbf{b}_1, \\
\mathbf{W}_2 &:= \mathbf{W}_2 - \alpha\, \mathrm{d}\mathbf{W}_2, \\
\mathbf{b}_2 &:=\mathbf{b}_2 - \alpha\, \mathrm{d}\mathbf{b}_2
\end{align}
$$
Compute the gradients using backpropagation:
$$
\begin{align}
\mathrm{d}\mathbf{Z}_2 &= \mathbf{A}_2 - \mathbf{Y}, \\
\mathrm{d}\mathbf{W}_2 &= \frac{1}{n}\, \mathrm{d}\mathbf{Z}_2 \mathbf{A}_1^\prime, \\
\mathrm{d}\mathbf{b}_2 &= \frac{1}{n}\, \mathrm{d}\mathbf{Z}_2 \mathbf{1}_n, \\
\mathrm{d}\mathbf{Z}_1 &= \mathbf{W}_2^\prime\, \mathrm{d}\mathbf{Z}_2 \odot \operatorname{ReLU}^\prime(\mathbf{Z}_1), \\
\mathrm{d}\mathbf{W}_1 &= \frac{1}{n}\, \mathrm{d}\mathbf{Z}_1 \mathbf{A}_0^\prime, \\
\mathrm{d}\mathbf{b}_1 &= \frac{1}{n}\, \mathrm{d}\mathbf{Z}_1 \mathbf{1}_n,
\end{align}
$$
where $\mathbf{1}_n$ is the length-$n$ vector of ones, $\mathbf{A}_0 = \mathbf{X}$ is the input matrix, $^\prime$ denotes the transpose, and $\odot$ denotes element-wise multiplication.
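
The identity $\mathrm{d}\mathbf{Z}_2 = \mathbf{A}_2 - \mathbf{Y}$ comes from differentiating the cross-entropy loss through the softmax. A quick finite-difference check on a single hypothetical sample confirms it:

``` python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    # L = -sum_i y_i * log(softmax(z)_i)
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=10)          # hypothetical pre-activation scores
y = np.zeros(10); y[3] = 1.0     # one-hot label for digit 3

analytic = softmax(z) - y        # the dZ2 = A2 - Y identity

# central finite differences, one coordinate at a time
numeric = np.zeros(10)
eps = 1e-5
for i in range(10):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps; zm[i] -= eps
    numeric[i] = (cross_entropy(zp, y) - cross_entropy(zm, y)) / (2 * eps)
```

The analytic and numerical gradients agree to within the finite-difference error, which is a useful debugging technique for any hand-derived backpropagation formula.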

%% Cell type:markdown id:ce0ceda6 tags:

## Import Required Libraries
We start by importing necessary Python libraries for numerical computation, data handling, and visualization.

%% Cell type:code id:6c7a038a tags:

``` python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import pickle
```

%% Cell type:markdown id:af55e379 tags:

## Load Data
We load the MNIST dataset, which contains images of handwritten digits. Each image is represented by 784 pixel values.

%% Cell type:code id:012469fa tags:

``` python
mnist = pd.read_pickle("mnist.pkl.gz")
```

%% Cell type:markdown id:a696fbf0 tags:

## Data Preprocessing
Normalize pixel values to the range [0, 1] and convert labels to one-hot encoding.

%% Cell type:code id:a044995f tags:

``` python
X = mnist.drop("Label", axis=1).values / 255
y = pd.get_dummies(mnist["Label"]).values
```
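
As a small illustration of what `pd.get_dummies` produces here (hypothetical labels; columns are ordered by label value, and each row has exactly one nonzero entry):

``` python
import pandas as pd

labels = pd.Series([3, 1, 3], name="Label")   # hypothetical digit labels
onehot = pd.get_dummies(labels).values        # columns ordered as [1, 3]
# row 0 -> label 3, row 1 -> label 1, row 2 -> label 3
```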

%% Cell type:markdown id:6c88849b tags:

## The Task
- Split the data into training and test sets.
- Define the activation functions and the necessary derivatives.
- Define functions for forward and backpropagation.
- Train your neural network using gradient descent for at least 500 iterations.
- Output the accuracy every 5th iteration.
- Use learning rate $\alpha = 0.1$.

%% Cell type:markdown id:c136ef0c tags:

## Split Data
Split the dataset into training and test sets to evaluate the model performance on unseen data.

%% Cell type:code id:aa452588 tags:

``` python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

%% Cell type:markdown id:872f6ede tags:

## Activation Functions
Define the ReLU activation function and its derivative for the hidden layer, and the softmax function for the output layer.

%% Cell type:code id:46bdd911 tags:

``` python
def ReLU(Z):
    return np.maximum(0, Z)

def ReLU_derivative(Z):
    # boolean mask; behaves as 0/1 when multiplied with float arrays
    return Z > 0

def softmax(Z):
    # subtract the column-wise max for numerical stability (result unchanged)
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)
```
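
A quick sanity check on these definitions (reproduced here so the snippet runs standalone): every softmax column must sum to one, and ReLU must zero out negative entries:

``` python
import numpy as np

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    expZ = np.exp(Z - np.max(Z, axis=0, keepdims=True))
    return expZ / np.sum(expZ, axis=0, keepdims=True)

Z = np.array([[-1.0, 2.0],
              [ 3.0, 0.5]])   # hypothetical pre-activations, 2 columns
A = softmax(Z)                # each column is a probability distribution
R = ReLU(Z)                   # the -1.0 entry becomes 0
```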

%% Cell type:markdown id:34fee4cc tags:

## Initialize Network Parameters
Initialize weights and biases with small random values.

%% Cell type:code id:6e971e6c tags:

``` python
def init_params():
    W1 = np.random.randn(10, 784) * 0.01
    b1 = np.zeros((10, 1))
    W2 = np.random.randn(10, 10) * 0.01
    b2 = np.zeros((10, 1))
    return W1, b1, W2, b2
```

%% Cell type:markdown id:3168052c tags:

## Forward Propagation
Define the forward propagation function, which calculates outputs based on current network parameters.

%% Cell type:code id:2b1b6b20 tags:

``` python
def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2
```
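
Chaining both layers on random data shows that the output is a valid probability distribution per image. This standalone sketch re-states the pieces above with assumed random parameters:

``` python
import numpy as np

rng = np.random.default_rng(0)

def ReLU(Z):
    return np.maximum(0, Z)

def softmax(Z):
    e = np.exp(Z - Z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

W1 = rng.normal(size=(10, 784)) * 0.01; b1 = np.zeros((10, 1))
W2 = rng.normal(size=(10, 10)) * 0.01;  b2 = np.zeros((10, 1))
X  = rng.random((784, 4))               # 4 hypothetical images as columns

Z1 = W1 @ X + b1;  A1 = ReLU(Z1)        # hidden layer
Z2 = W2 @ A1 + b2; A2 = softmax(Z2)     # one probability column per image
```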

%% Cell type:markdown id:8bf20d02 tags:

## Backward Propagation
Calculate gradients using backpropagation to update parameters.

%% Cell type:code id:8864edb3 tags:

``` python
def back_prop(Z1, A1, A2, W2, X, Y):
    n = X.shape[1]                                # number of observations
    dZ2 = A2 - Y                                  # softmax + cross-entropy gradient
    dW2 = (1/n) * dZ2.dot(A1.T)
    db2 = (1/n) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = W2.T.dot(dZ2) * ReLU_derivative(Z1)     # element-wise product
    dW1 = (1/n) * dZ1.dot(X.T)
    db1 = (1/n) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```
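
A common bug here is a transposed matrix, so it is worth checking that each gradient has exactly the shape of the parameter it updates. A standalone shape check with hypothetical activations and labels:

``` python
import numpy as np

rng = np.random.default_rng(0)
n = 6                                         # hypothetical batch of 6 images
X  = rng.random((784, n))
Y  = np.eye(10)[:, rng.integers(0, 10, n)]    # random one-hot labels, (10, n)

# stand-in activations with the shapes produced by the forward pass
Z1 = rng.normal(size=(10, n)); A1 = np.maximum(0, Z1)
A2 = rng.random((10, n)); A2 /= A2.sum(axis=0, keepdims=True)
W2 = rng.normal(size=(10, 10))

dZ2 = A2 - Y
dW2 = (1/n) * dZ2 @ A1.T
db2 = (1/n) * dZ2.sum(axis=1, keepdims=True)
dZ1 = W2.T @ dZ2 * (Z1 > 0)
dW1 = (1/n) * dZ1 @ X.T
db1 = (1/n) * dZ1.sum(axis=1, keepdims=True)
```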

%% Cell type:markdown id:73101d4c tags:

## Accuracy Calculation
Define a function to measure the accuracy of predictions.

%% Cell type:code id:5a4cf2ea tags:

``` python
def accuracy(W1, b1, W2, b2, X, Y):
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    predictions = np.argmax(A2, axis=0)
    labels = np.argmax(Y, axis=0)
    return np.mean(predictions == labels)
```
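
On a toy example with two classes and two samples (made-up probability columns), the logic is easy to verify by hand: the first sample is classified correctly, the second is not:

``` python
import numpy as np

A2 = np.array([[0.7, 0.2],
               [0.3, 0.8]])   # hypothetical probability columns, 2 samples
Y  = np.array([[1, 1],
               [0, 0]])       # both true labels are class 0

predictions = np.argmax(A2, axis=0)   # predicted classes per column
labels = np.argmax(Y, axis=0)         # true classes per column
acc = np.mean(predictions == labels)  # one of two correct
```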

%% Cell type:markdown id:cbd9156e tags:

## Model Training
Train the neural network using gradient descent and print accuracy every 5 iterations.

%% Cell type:code id:aa6f9c74 tags:

``` python
W1, b1, W2, b2 = init_params()
# arrange observations as columns (features x n) to match the formulas above
X_train, X_test, y_train, y_test = X_train.T, X_test.T, y_train.T, y_test.T

for i in range(501):
    Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X_train)
    dW1, db1, dW2, db2 = back_prop(Z1, A1, A2, W2, X_train, y_train)

    W1 -= 0.1 * dW1
    b1 -= 0.1 * db1
    W2 -= 0.1 * dW2
    b2 -= 0.1 * db2

    if i % 5 == 0:
        acc = accuracy(W1, b1, W2, b2, X_train, y_train)
        print(f"Iteration {i}, Training-Accuracy: {acc:.4f}")

print("Final accuracy on test dataset:", accuracy(W1, b1, W2, b2, X_test, y_test))
```

%% Output

    Iteration 0, Training-Accuracy: 0.1595
    Iteration 5, Training-Accuracy: 0.1390
    Iteration 10, Training-Accuracy: 0.1226
    Iteration 15, Training-Accuracy: 0.1331
    Iteration 20, Training-Accuracy: 0.1758
    Iteration 25, Training-Accuracy: 0.2520
    Iteration 30, Training-Accuracy: 0.3235
    Iteration 35, Training-Accuracy: 0.3579
    Iteration 40, Training-Accuracy: 0.3653
    Iteration 45, Training-Accuracy: 0.3570
    Iteration 50, Training-Accuracy: 0.3435
    Iteration 55, Training-Accuracy: 0.3316
    Iteration 60, Training-Accuracy: 0.3236
    Iteration 65, Training-Accuracy: 0.3202
    Iteration 70, Training-Accuracy: 0.3251
    Iteration 75, Training-Accuracy: 0.3408
    Iteration 80, Training-Accuracy: 0.3614
    Iteration 85, Training-Accuracy: 0.3876
    Iteration 90, Training-Accuracy: 0.4244
    Iteration 95, Training-Accuracy: 0.4846
    Iteration 100, Training-Accuracy: 0.5282
    Iteration 105, Training-Accuracy: 0.5669
    Iteration 110, Training-Accuracy: 0.6061
    Iteration 115, Training-Accuracy: 0.6414
    Iteration 120, Training-Accuracy: 0.6689
    Iteration 125, Training-Accuracy: 0.6880
    Iteration 130, Training-Accuracy: 0.7020
    Iteration 135, Training-Accuracy: 0.7146
    Iteration 140, Training-Accuracy: 0.7268
    Iteration 145, Training-Accuracy: 0.7359
    Iteration 150, Training-Accuracy: 0.7452
    Iteration 155, Training-Accuracy: 0.7529
    Iteration 160, Training-Accuracy: 0.7600
    Iteration 165, Training-Accuracy: 0.7663
    Iteration 170, Training-Accuracy: 0.7720
    Iteration 175, Training-Accuracy: 0.7775
    Iteration 180, Training-Accuracy: 0.7823
    Iteration 185, Training-Accuracy: 0.7865
    Iteration 190, Training-Accuracy: 0.7914
    Iteration 195, Training-Accuracy: 0.7951
    Iteration 200, Training-Accuracy: 0.7987
    Iteration 205, Training-Accuracy: 0.8022
    Iteration 210, Training-Accuracy: 0.8058
    Iteration 215, Training-Accuracy: 0.8088
    Iteration 220, Training-Accuracy: 0.8116
    Iteration 225, Training-Accuracy: 0.8144
    Iteration 230, Training-Accuracy: 0.8178
    Iteration 235, Training-Accuracy: 0.8206
    Iteration 240, Training-Accuracy: 0.8236
    Iteration 245, Training-Accuracy: 0.8261
    Iteration 250, Training-Accuracy: 0.8283
    Iteration 255, Training-Accuracy: 0.8310
    Iteration 260, Training-Accuracy: 0.8338
    Iteration 265, Training-Accuracy: 0.8364
    Iteration 270, Training-Accuracy: 0.8384
    Iteration 275, Training-Accuracy: 0.8405
    Iteration 280, Training-Accuracy: 0.8429
    Iteration 285, Training-Accuracy: 0.8451
    Iteration 290, Training-Accuracy: 0.8473
    Iteration 295, Training-Accuracy: 0.8494
    Iteration 300, Training-Accuracy: 0.8514
    Iteration 305, Training-Accuracy: 0.8532
    Iteration 310, Training-Accuracy: 0.8550
    Iteration 315, Training-Accuracy: 0.8562
    Iteration 320, Training-Accuracy: 0.8579
    Iteration 325, Training-Accuracy: 0.8594
    Iteration 330, Training-Accuracy: 0.8610
    Iteration 335, Training-Accuracy: 0.8623
    Iteration 340, Training-Accuracy: 0.8636
    Iteration 345, Training-Accuracy: 0.8648
    Iteration 350, Training-Accuracy: 0.8662
    Iteration 355, Training-Accuracy: 0.8674
    Iteration 360, Training-Accuracy: 0.8687
    Iteration 365, Training-Accuracy: 0.8697
    Iteration 370, Training-Accuracy: 0.8707
    Iteration 375, Training-Accuracy: 0.8715
    Iteration 380, Training-Accuracy: 0.8726
    Iteration 385, Training-Accuracy: 0.8734
    Iteration 390, Training-Accuracy: 0.8744
    Iteration 395, Training-Accuracy: 0.8753
    Iteration 400, Training-Accuracy: 0.8761
    Iteration 405, Training-Accuracy: 0.8769
    Iteration 410, Training-Accuracy: 0.8779
    Iteration 415, Training-Accuracy: 0.8786
    Iteration 420, Training-Accuracy: 0.8790
    Iteration 425, Training-Accuracy: 0.8799
    Iteration 430, Training-Accuracy: 0.8807
    Iteration 435, Training-Accuracy: 0.8815
    Iteration 440, Training-Accuracy: 0.8820
    Iteration 445, Training-Accuracy: 0.8825
    Iteration 450, Training-Accuracy: 0.8831
    Iteration 455, Training-Accuracy: 0.8837
    Iteration 460, Training-Accuracy: 0.8840
    Iteration 465, Training-Accuracy: 0.8849
    Iteration 470, Training-Accuracy: 0.8854
    Iteration 475, Training-Accuracy: 0.8861
    Iteration 480, Training-Accuracy: 0.8864
    Iteration 485, Training-Accuracy: 0.8869
    Iteration 490, Training-Accuracy: 0.8875
    Iteration 495, Training-Accuracy: 0.8881
    Iteration 500, Training-Accuracy: 0.8888
    Final accuracy on test dataset: 0.8870714285714286
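
`pickle` is imported at the top of the notebook but never used; one natural follow-up is persisting the trained parameters so the network does not have to be retrained. A minimal round-trip sketch, with zeros standing in for the real trained values and an in-memory buffer standing in for a file on disk:

``` python
import io
import pickle
import numpy as np

# hypothetical trained parameters (zeros stand in for the real values)
params = {
    "W1": np.zeros((10, 784)), "b1": np.zeros((10, 1)),
    "W2": np.zeros((10, 10)),  "b2": np.zeros((10, 1)),
}

buf = io.BytesIO()             # replace with open("params.pkl", "wb") to save to disk
pickle.dump(params, buf)
buf.seek(0)
restored = pickle.load(buf)    # shapes and values survive the round trip
```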