MNIST: Handwritten digit recognition using TensorFlow 2 and Keras
In this article, we are going to train a digit recognition model using TensorFlow, Keras, and the MNIST dataset.
This article is intended for those who have some experience with Python and machine learning basics but are new to computer vision. This task is a good introduction to the field.
Quick Intro
What is digit recognition?
Digit recognition is the task of having a computer correctly identify the digits in a given image.
What is MNIST?
MNIST (“Modified National Institute of Standards and Technology”) is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.
Practice Skills
- Computer vision fundamentals including simple neural networks
Requirements
- Python 3
Required Tools:
- Jupyter Notebook
- Numpy for linear algebra
- Pandas for data processing
- Matplotlib for plotting graph
- Tensorflow 2 & Keras for deep learning model
Make sure these tools are installed. If they are not, install them with pip:
/> pip install jupyter
/> pip install numpy
/> pip install pandas
/> pip install matplotlib
/> pip install tensorflow
/> pip install keras
Join the competition on Kaggle to download the dataset:
https://www.kaggle.com/c/digit-recognizer/data
So let’s get started!
Create Notebook
Let’s start Jupyter Notebook and create a new Python 3 notebook.
- Open command line / prompt and type
/> jupyter notebook
- Create a new Python 3 notebook
Next, we import the libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential, load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import layers
from keras.utils import np_utils  # removed in newer Keras releases, where keras.utils.to_categorical can be used instead
Let’s break down what we just imported:
- from keras.models import Sequential will be used to define a Sequential neural network model.
- from keras import layers will be used to define the layers that go into the Sequential model.
- from keras.callbacks import EarlyStopping will be used during training. Training stops automatically when a given condition is met, for example when there is no accuracy improvement for 10 epochs.
- from keras.callbacks import ModelCheckpoint will be used to save the best model, meaning the one with the highest accuracy or the lowest loss.
Load the dataset
After downloading the dataset, we need to extract it. In my case, I will extract it to the same directory as my Python script or notebook.
Let’s load the dataset using Pandas and analyze it a bit
train_df = pd.read_csv('train.csv')
Print the first 5 records
train_df.head()
Output column will look like this:
label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783
Each record contains a label in the first column; the rest are the image pixel data.
- label contains one of 10 unique digits, 0 to 9.
- pixel0 to pixel783 are the flattened 2D array (image pixel data), 784 values in total.
Why does each pixel have only 1 value?
Shouldn’t a pixel have 3 values: (R, G, B)?
Good question. In this case the image has only 1 channel, which means 1 value per pixel, in the range 0–255.
An image with 1 channel is a grayscale image.
In the (R, G, B) case, the image has 3 channels, each with a value in the range 0–255.
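To make the channel difference concrete, here is a minimal sketch using NumPy arrays as stand-ins for images. The arrays gray and rgb are hypothetical toy data, not part of the dataset.

```python
import numpy as np

# Hypothetical toy images, only to illustrate the shapes involved.
gray = np.zeros((28, 28), dtype=np.uint8)      # 1 channel: one value per pixel
rgb = np.zeros((28, 28, 3), dtype=np.uint8)    # 3 channels: three values per pixel

gray[0, 0] = 255           # a white pixel in grayscale
rgb[0, 0] = (255, 0, 0)    # a pure red pixel in RGB

print(gray.shape, gray.size)   # (28, 28) 784
print(rgb.shape, rgb.size)     # (28, 28, 3) 2352
```

A 28x28 grayscale image holds 784 values, which matches the 784 pixel columns in the CSV.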
Prepare Training Dataset
Next, we want to split the labels and the images into X and y, as in a classic machine learning dataset, where X is a list of images and y is the labels.
X_train = train_df.iloc[:, 1:]
y_train = train_df.iloc[:, 0]
X_train.shape, y_train.shape
Output should look like this:
((42000, 784), (42000,))
Where X contains images and y contains labels. Both should have the same record count, 42000 in this case.
Visualize sample
Next, let’s visualize a sample. We will use matplotlib to plot the image. But first, let’s take the sample image at index 42.
sample = X_train.loc[42]
Whoops! Remember, each record is a flattened 2D array (pixel0 to pixel783), so it needs to be converted back to a 2D array before we can visualize it.
To do this, we first need to define the width and height of the image. In this case, we will use 28 for both, since 28 is the square root of 784 (28x28).
W = 28
H = 28
Next we want to reshape the image to 28x28
image_sample = sample.values.reshape(W, H)
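As a rough illustration of how a flat row maps back onto the grid, the sketch below uses a hypothetical stand-in row (the numbers 0 to 783) rather than real pixel data, and assumes the row-major layout described on the Kaggle data page.

```python
import numpy as np

W, H = 28, 28

# Hypothetical stand-in for one CSV row: 784 values in row-major order.
flat = np.arange(W * H)

image = flat.reshape(H, W)   # back to a 2D grid: 28 rows of 28 pixels

# Pixel k of the flat row lands at row k // W, column k % W.
k = 100
row, col = divmod(k, W)
print(row, col, image[row, col])   # 3 16 100
```

Flattening the grid again with image.reshape(-1) gives back the original row, so no information is lost in either direction.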
Finally, use matplotlib to plot image_sample:
plt.imshow(image_sample)
plt.show()
Output:
Why is the image not grayscale?
It is grayscale. By default, matplotlib renders single-channel data with a color map, so we need to pass a gray color map to display it correctly:
plt.imshow(sample.values.reshape(W, H), cmap=plt.cm.gray)
plt.show()
Output:
Normalize data
Next, we will scale the pixel values. Currently the pixel values are integers in the range 0 to 255; we want them in the range 0 to 1.
To do this, simply divide the pixel values by 255:
X_train = X_train.astype('float32') / 255
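A small sketch of what this scaling does to individual values. The raw array is a hypothetical handful of pixel values, not taken from the dataset.

```python
import numpy as np

# Hypothetical raw pixel values, as stored in the CSV (integers 0 to 255).
raw = np.array([0, 51, 128, 255], dtype=np.uint8)

scaled = raw.astype('float32') / 255

print(scaled.dtype)                  # float32
print(scaled.min(), scaled.max())    # 0.0 1.0
```

Converting to float32 first matters: the division then produces fractional values, with 0 staying at 0 and 255 mapping to 1.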
Prepare dataset shape to feed into CNN
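With TensorFlow as the backend, the input shape is (batch, width, height, channels). Strictly speaking, Keras orders the two spatial axes as (height, width), but the images here are square (28x28), so the distinction does not matter in this article.
Where:
- batch, or batch size, is the dataset size / row count, in this case 42000
- width is the image width
- height is the image height
- channels is the number of image channels: 1 for grayscale and 3 for RGB or BGR.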
Next, we reshape the training dataset
X_train = X_train.values.reshape(-1, W, H, 1)
X_train.shape
- Why is the batch size -1? NumPy’s reshape infers the size of that dimension from the remaining ones. In this case, 42000.
Output:
(42000, 28, 28, 1)
Looks good, (batch, width, height, channels)
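To see how -1 behaves in isolation, here is a minimal sketch on a hypothetical mini-dataset of 5 flattened images. The name flat_images is made up for the example.

```python
import numpy as np

W, H = 28, 28

# Hypothetical mini-dataset: 5 flattened images instead of 42000.
flat_images = np.zeros((5, W * H), dtype='float32')

# -1 asks NumPy to work out this dimension from the total element count.
batch = flat_images.reshape(-1, W, H, 1)
print(batch.shape)   # (5, 28, 28, 1)

# The inference only works when the other sizes divide the total evenly.
try:
    flat_images.reshape(-1, 30, 30, 1)
except ValueError as err:
    print('reshape failed:', err)
```

The second call fails because 5 x 784 values cannot be split into whole 30x30 images.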
Define Neural Network model
Next we want to define the neural network model. First we will define an empty sequential model.
model = Sequential()
Done, we have initialized an empty sequential model. Next, we want to add layers to this empty model:
The first layer we add is a convolution layer, which also declares the input shape (width, height, channels):
input_shape = (W, H, 1)
model.add(layers.Conv2D(filters=32,
                        kernel_size=(5,5),
                        padding='same',
                        activation='relu',
                        input_shape=input_shape)) # input shape
model.add(layers.MaxPool2D())
model.add(layers.Dropout(0.4))
Next, we add one more convolution layer, then a flatten layer and a dense layer.
model.add(layers.Conv2D(filters=64, kernel_size=(5,5), padding='same', activation='relu'))
model.add(layers.MaxPool2D())
model.add(layers.Dropout(0.4))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.4))
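To give a feel for what the pooling layers do, here is a rough NumPy sketch of 2x2 max pooling with stride 2, which is what MaxPool2D() does with its default settings. The helper max_pool_2x2 is hypothetical, handles a single 2D feature map only, and is not how Keras implements the layer.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Hypothetical helper: 2x2 max pooling, stride 2, on an (H, W) array with even H and W."""
    h, w = feature_map.shape
    # Group the pixels into non-overlapping 2x2 blocks, then keep each block's maximum.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 2, 0, 1],
               [3, 4, 1, 0],
               [0, 0, 5, 6],
               [1, 0, 7, 8]])

print(max_pool_2x2(fm))
# [[4 1]
#  [1 8]]

print(max_pool_2x2(np.zeros((28, 28))).shape)   # (14, 14)
```

Each pooling step halves the width and height, which is why the feature maps shrink from 28x28 to 14x14 and then to 7x7.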
Finally, for the output layer we use a dense layer whose output size equals the number of classes, in this case 10 (digits 0 to 9).
number_of_classes = y_train.nunique() # 10
model.add(layers.Dense(number_of_classes, activation='softmax'))
Compile model & summary
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 28, 28, 32) 832
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 14, 14, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 14, 14, 64) 51264
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 7, 7, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 3136) 0
_________________________________________________________________
dense (Dense) (None, 128) 401536
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 454,922
Trainable params: 454,922
Non-trainable params: 0
_________________________________________________________________
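The parameter counts in the summary can be checked by hand. The sketch below reproduces them with two small helper functions (conv2d_params and dense_params are hypothetical names used only here), assuming each layer has one bias per filter or unit.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel cell per input channel, plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per input per unit, plus one bias per unit.
    return inputs * units + units

conv1 = conv2d_params(5, 5, 1, 32)     # 832
conv2 = conv2d_params(5, 5, 32, 64)    # 51264
flat = 7 * 7 * 64                      # 3136 values after Flatten
dense1 = dense_params(flat, 128)       # 401536
dense2 = dense_params(128, 10)         # 1290

total = conv1 + conv2 + dense1 + dense2
print(total)   # 454922
```

Pooling, dropout, and flatten layers have no weights, so they contribute 0 parameters.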
References for these layers (the links point to the Swift for TensorFlow API docs; the layer concepts are the same in Keras):
- https://www.tensorflow.org/swift/api_docs/Structs/Conv2D
- https://www.tensorflow.org/swift/api_docs/Structs/MaxPool2D
- https://www.tensorflow.org/swift/api_docs/Structs/Dropout
- https://www.tensorflow.org/swift/api_docs/Structs/Flatten
- https://www.tensorflow.org/swift/api_docs/Structs/Dense
Prepare labels to fit output layers
Next we need to prepare the labels so that they fit the output layer of our neural network. The output layer size is 10, which means each label needs to be a vector of length 10.
To do this we will use a technique called one-hot encoding.
Here is a simple example of one-hot encoded output:
# Example label = 0
# One-hot encoded output: the value at index 0 is 1, the rest are 0
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, in this case 10
# Example label = 1
# One-hot encoded output: the value at index 1 is 1, the rest are 0
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, in this case 10
# Example label = 2
# One-hot encoded output: the value at index 2 is 1, the rest are 0
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, in this case 10
# ...
We will use np_utils from Keras, which helps us produce one-hot encoded labels:
y_train = np_utils.to_categorical(y_train, number_of_classes)
y_train.shape
Output
(42000, 10)
Sample of one hot encoded y_train
# Input
y_train
# Output
array([[0., 1., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
[0., 1., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 1., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 1.]], dtype=float32)
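For intuition, one-hot encoding can also be written in a few lines of NumPy. The one_hot helper below is a hypothetical illustration of the idea, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: row i is all zeros except a 1 at index labels[i]."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([0, 1, 2, 9], 10)
print(encoded.shape)             # (4, 10)
print(encoded[3])                # [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
print(encoded.argmax(axis=1))    # [0 1 2 9]
```

Taking argmax along each row recovers the original labels, which is the same trick used later to read the model's predictions.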
Training process
Before training, we will configure callbacks for early stopping and checkpoint
early_stopping = EarlyStopping(monitor='val_loss',
                               patience=8,
                               verbose=1,
                               mode='min')
mcp_save = ModelCheckpoint('digit_recognizer.h5',
                           save_best_only=True,
                           monitor='val_loss',
                           verbose=1,
                           mode='auto')
EarlyStopping() will stop the training if there is no improvement in the monitored value (in this case val_loss) for patience consecutive epochs, here 8.
ModelCheckpoint() will save the model with the best val_loss to the file "digit_recognizer.h5".
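The patience logic can be sketched in plain Python. This is a simplified illustration of the idea, assuming a lower monitored value is better; stopping_epoch and the list of losses are hypothetical, and the real Keras callback has more options.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float('inf')
    wait = 0   # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch.
losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.15, 0.18]

print(stopping_epoch(losses, patience=3))   # 5
print(stopping_epoch(losses, patience=8))   # None
```

In this example the best loss appears at epoch 2. With a patience of 3, training stops at epoch 5, after three epochs without a strictly better value.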
Next, start the training!
history = model.fit(X_train,
y_train,
epochs=100,
batch_size=32,
verbose=1,
validation_split=0.2,
callbacks=[early_stopping, mcp_save])
- epochs is the maximum number of passes over the entire training dataset; with early stopping, training may end sooner.
- batch_size defines the number of samples that will be propagated through the network at a time. (https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network).
- validation_split is the fraction of the data held out for validation, in this case 20%. Keras takes these samples from the end of the dataset, before shuffling, and evaluates on them after each epoch.
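With the numbers used in this article, the split and the number of batches per epoch work out as follows. This is plain arithmetic and does not call Keras; the variable names are chosen for the example.

```python
import math

n_samples = 42000
validation_split = 0.2
batch_size = 32

n_val = round(n_samples * validation_split)        # samples held out for validation
n_train = n_samples - n_val                        # samples actually trained on
steps_per_epoch = math.ceil(n_train / batch_size)  # batches per epoch

print(n_train, n_val)     # 33600 8400
print(steps_per_epoch)    # 1050
```

So the progress bar for each epoch should count up to 1050 batches, not 42000 / 32.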
Plot training graph
The model.fit() function returns a History object. Its history attribute is a dictionary of training data including loss, val_loss, accuracy, and val_accuracy.
The dictionary will look something like this:
{
"val_loss": [ list of validation losses... ],
"val_accuracy": [ list of validation accuracies... ],
"loss": [ list of losses... ],
"accuracy": [ list of accuracies... ]
}
Plot function
# function to plot accuracy / loss
def plotgraph(epochs, train_values, val_values, metric='Accuracy'):
    # Plot training & validation values for the given metric
    plt.plot(epochs, train_values, 'b')
    plt.plot(epochs, val_values, 'r')
    plt.title('Model ' + metric.lower())
    plt.ylabel(metric)
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()
Plot the training history
accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(accuracy))
# Plot accuracy & val accuracy
plotgraph(epochs, accuracy, val_accuracy)
Output:
# Plot loss & val loss
plotgraph(epochs, loss, val_loss, metric='Loss')
Output:
Test Model Accuracy
We need to process the test data the same way we processed the training data.
Load dataset
Using pandas:
test_df = pd.read_csv('test.csv')  # in a Kaggle kernel: '/kaggle/input/digit-recognizer/test.csv'
test_df.shape
Output:
(28000, 784)
There are 28000 records of flattened images.
Next, we will repeat the same preprocessing we applied to the training set:
- Scale image pixel values
- Reshape flattened image to (batch, width, height, channels)
X_test = test_df.values
print(X_test.shape)
# Scale pixel values
X_test = X_test.astype('float32') / 255
# Reshape to (batch, W, H, channels)
X_test = X_test.reshape(-1, W, H, 1)
print(X_test.shape)
Take sample and Visualize:
# plot image entry number 42, reshape image to size 28x28 pixel
plt.imshow(X_test[42].reshape(28, 28), cmap=plt.cm.gray)
plt.show()
Output:
Predict test image
First we load the saved best model
model = load_model('digit_recognizer.h5')
Next, we predict all 28k images on X_test
y_pred = model.predict(X_test)
y_pred
Each item in y_pred is an array of length 10 (output layer size / number_of_classes) holding the predicted probability of each label.
For example, a value of 0.93 at index 1 means the model assigns a 93% probability to this image being a 1.
A sample prediction from y_pred looks similar to a one-hot encoded label, but holds probabilities that sum to 1:
# just an example
[0., 0.93, 0., 0., 0., 0.02, 0., 0., 0.05, 0.]
To get the actual prediction, we take the index of the highest value in the list. To achieve this, we can use .argmax().
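Here is a minimal sketch of how a softmax output turns into a predicted digit. The softmax function and the raw scores below are made up for illustration and are not taken from the trained model.

```python
import numpy as np

def softmax(scores):
    # Subtract the maximum first for numerical stability; the result is unchanged.
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

# Hypothetical raw scores for one image, one score per digit 0 to 9.
scores = np.array([0.1, 4.0, 0.2, 0.0, 0.3, 1.0, 0.1, 0.2, 1.5, 0.0])
probs = softmax(scores)

print(round(float(probs.sum()), 6))   # 1.0
print(probs.argmax())                 # 1
```

For a whole batch of predictions with shape (n, 10), y_pred.argmax(axis=1) returns one predicted digit per image.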
Take Sample 10–16 & Visualize prediction
# get prediction and visualize
for i in range(10, 16):
    plt.subplot(280 + (i%10+1))
    plt.imshow(X_test[i].reshape(28, 28), cmap=plt.cm.gray)
    plt.title(y_pred[i].argmax())
plt.show()
Output:
Take prediction sample entry number 42 & visualize
plt.imshow(X_test[42].reshape(W, H), cmap=plt.cm.gray)
print(y_pred[42].argmax())
Output:
4
You can find the notebook on my GitHub
https://github.com/madeyoga/MNIST
or Kaggle kernel
https://www.kaggle.com/yeogaa/mnist-train-digit-recognizer
I hope you enjoyed this post! If you think it can be useful to others, don’t forget to share it with them.
Thanks!