MNIST: Handwritten digit recognition using TensorFlow 2 and Keras

madey
9 min read · Apr 22, 2021

Photo by Charles Deluvio on Unsplash

In this article, we are going to train a digit recognition model using TensorFlow Keras and the MNIST dataset.

This article is intended for those who have some experience with Python and machine learning basics but are new to computer vision. This task is a perfect introduction to computer vision.

Quick Intro

What is digit recognition?

Digit recognition is the task of getting a computer to correctly identify the digit shown in a given image.

What is MNIST?

MNIST (“Modified National Institute of Standards and Technology”) is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

Practice Skills

  • Computer vision fundamentals including simple neural networks

Requirements

  • Python 3

Required Tools:

  • Jupyter Notebook
  • NumPy for linear algebra
  • Pandas for data processing
  • Matplotlib for plotting graphs
  • TensorFlow 2 & Keras for the deep learning model

Make sure these tools are installed. If they are not, install them with pip:

/> pip install jupyter
/> pip install numpy
/> pip install pandas
/> pip install matplotlib
/> pip install tensorflow
/> pip install keras

Join the competition on Kaggle, where you can also download the dataset:
https://www.kaggle.com/c/digit-recognizer/data

So let’s get started!

Create Notebook

Let’s start Jupyter Notebook and create a new Python 3 notebook.

  • Open a command line / terminal and type:
/> jupyter notebook
  • Create a new Python 3 notebook

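Next, we import the libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras import layers
# keras.utils.np_utils is not available in newer Keras releases; to_categorical
# lives directly in keras.utils, so import that module under the old name
from tensorflow.keras import utils as np_utils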

Let’s break down what we just imported

  • Sequential will be used to define a sequential neural network model (a plain stack of layers); load_model will be used later to load the saved model.
  • layers provides the layers we add to the sequential model.
  • EarlyStopping will be used during training: it stops training automatically when a monitored value stops improving, for example when the validation loss has not improved for a given number of epochs.
  • ModelCheckpoint will be used to save the best model seen during training, i.e. the one with the best value of the monitored metric (for example, the lowest validation loss).
  • np_utils provides to_categorical(), which we will use to one-hot encode the labels.

Load the dataset

After downloading the dataset, extract it. In my case, I extracted it into the same directory as my notebook.

Let’s load the dataset using Pandas and analyze it a bit:

train_df = pd.read_csv('train.csv')

Print the first 5 records:

train_df.head()

The output columns will look like this:

label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783

Each record contains the label in the first column; the remaining 784 columns are the image pixel data.

  • label contains 10 unique digits, 0 to 9.
  • pixel0 to pixel783 (784 columns) are the flattened 28x28 2D array of pixel values.
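To see how the 784 columns map back onto the 28x28 grid, here is a small NumPy sketch. It assumes row-major flattening, which is NumPy's default for reshape and matches the Kaggle description of this dataset (pixelx sits at row i, column j, where x = i * 28 + j). The helpers column_index and row_col are hypothetical names used only for illustration.

```python
import numpy as np

W = H = 28  # image width and height used in this article

# A stand-in for one CSV row: column k simply holds the value k,
# so after reshaping we can read off where each column ends up.
flat_row = np.arange(H * W)          # shape (784,)
image = flat_row.reshape(H, W)       # row-major: row i, column j

def column_index(i, j, width=W):
    """Return the CSV column number (the x in 'pixelx') for row i, column j."""
    return i * width + j

def row_col(x, width=W):
    """Inverse mapping: CSV column number -> (row, column)."""
    return divmod(x, width)

print(image.shape)            # (28, 28)
print(image[1, 0])            # 28: the first pixel of the second row
print(column_index(27, 27))   # 783: the last column, pixel783
print(row_col(42))            # (1, 14)
```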

Why does each pixel have only 1 value?

Shouldn’t a pixel have 3 values (R, G, B)?

Good question. These images have only 1 channel, which means 1 value per pixel, ranging from 0 to 255.
An image with 1 channel is a grayscale image.

In the (R, G, B) case, the image has 3 channels (0–255, 0–255, 0–255).

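The difference between the two cases is only the size of the last (channel) axis. The sketch below uses blank, made-up images to show the shapes; the gray-to-RGB copy at the end is just an illustration of the idea that equal R, G and B values give a shade of gray.

```python
import numpy as np

# Hypothetical blank images, just to show the shapes involved.
gray = np.zeros((28, 28, 1), dtype=np.uint8)   # 1 channel: one 0-255 value per pixel
rgb = np.zeros((28, 28, 3), dtype=np.uint8)    # 3 channels: (R, G, B) per pixel

print(gray[0, 0])   # [0]      -> a single intensity value
print(rgb[0, 0])    # [0 0 0]  -> red, green, blue values

# A grayscale image can be shown as RGB by copying its single channel
# into all three channels (R = G = B gives a shade of gray).
gray_as_rgb = np.repeat(gray, 3, axis=-1)
print(gray_as_rgb.shape)  # (28, 28, 3)
```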

Prepare Training Dataset

Next, we split each record into the image and its label, as in a classic machine learning dataset: X holds the images and y holds the labels.

X_train = train_df.iloc[:, 1:]
y_train = train_df.iloc[:, 0]
X_train.shape, y_train.shape

Output should look like this:

((42000, 784), (42000,))

X_train contains the images and y_train the labels. Both have the same number of records, 42000 in this case.

Visualize sample

Next, let’s visualize a sample with matplotlib. First, let’s take the sample image at index 42:

image_sample = X_train.loc[42]

Remember: each record is a flattened 2D array (pixel0 to pixel783), so it needs to be reshaped back into a 2D array before we can visualize it.

To do this, we first define the width and height of the image. Both are 28, since 28 x 28 = 784.

W = 28
H = 28

Next, we reshape the image to 28x28:

image_sample = image_sample.values.reshape(H, W)  # rows = height, columns = width

Finally, use matplotlib to plot image_sample:

plt.imshow(image_sample)
plt.show()

Output:

Why isn’t the image grayscale?

It is grayscale. For single-channel data, matplotlib applies its default color map, so we pass cmap=plt.cm.gray to display it in gray:

plt.imshow(X_train.loc[42].values.reshape(H, W), cmap=plt.cm.gray)
plt.show()

Output:

Normalize data

Next, we scale the pixel values. Currently they are integers in the range 0 to 255; we want them in the range 0 to 1.

To do this, convert the values to floats and divide them by 255:

X_train = X_train.astype('float32') / 255
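A quick check of what this division does, on a few made-up pixel values (not taken from the dataset): casting to float32 first gives fractional results, and the extremes 0 and 255 map exactly to 0.0 and 1.0.

```python
import numpy as np

# Made-up raw pixel values covering the full 0-255 range.
raw = np.array([0, 51, 255], dtype=np.uint8)

# Same operation as above: cast to float32, then divide by the maximum value 255.
scaled = raw.astype('float32') / 255

print(scaled)                      # approximately [0.0, 0.2, 1.0]
print(scaled.min(), scaled.max())  # 0.0 1.0
```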

Prepare dataset shape to feed into CNN

With TensorFlow’s default channels-last data format, a convolutional layer expects input of shape (batch, height, width, channels)

Where:

  • batch is the number of images (rows), in this case 42000. This is not the same as the batch_size used during training, which is the number of images processed per step.
  • height is the image height
  • width is the image width
  • channels is the number of image channels: 1 for grayscale and 3 for RGB or BGR.

Next, we reshape the training dataset:

X_train = X_train.values.reshape(-1, H, W, 1)
X_train.shape
  • Why is the first dimension -1? NumPy infers it from the total number of values: 42000 x 784 / (28 x 28 x 1) = 42000.

Output:

(42000, 28, 28, 1)

Looks good: (batch, height, width, channels).
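The -1 trick can be checked on a small made-up array. This sketch uses 5 fake images instead of 42000 so it runs instantly; NumPy infers the missing dimension as total_size / (28 * 28 * 1), and refuses when that is not a whole number.

```python
import numpy as np

H = W = 28
n_images = 5  # stand-in for the 42000 rows of train.csv

flat = np.zeros((n_images, H * W), dtype='float32')  # shape (5, 784)
batched = flat.reshape(-1, H, W, 1)                   # -1 is inferred as 5

print(batched.shape)  # (5, 28, 28, 1)

# Reshaping never changes the number of values, only how they are arranged.
assert flat.size == batched.size == n_images * H * W

# If the total size is not a multiple of 28 * 28, NumPy refuses to guess.
try:
    np.zeros(785).reshape(-1, H, W, 1)
except ValueError as err:
    print("reshape failed:", err)
```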

Define Neural Network model

Next we want to define the neural network model. First we will define an empty sequential model.

model = Sequential()

Done, we have initialized an empty sequential model. Next, we want to add layers to this empty model:

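The first layer is a convolution layer. Passing input_shape=(height, width, channels) also defines the model’s input:

input_shape = (H, W, 1)
# 32 filters of size 5x5; padding='same' keeps the 28x28 output size
model.add(layers.Conv2D(filters=32,
                        kernel_size=(5,5),
                        padding='same',
                        activation='relu',
                        input_shape=input_shape))  # input shape
# Default 2x2 max pooling halves the height and width: 28x28 -> 14x14
model.add(layers.MaxPool2D())
# Randomly drop 40% of the activations during training to reduce overfitting
model.add(layers.Dropout(0.4))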

Next, we add one more convolution block, a Flatten layer, and one hidden Dense layer:

model.add(layers.Conv2D(filters=64, kernel_size=(5,5), padding='same', activation='relu'))
model.add(layers.MaxPool2D())  # 14x14 -> 7x7
model.add(layers.Dropout(0.4))
model.add(layers.Flatten())  # 7 x 7 x 64 = 3136 values per image
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.4))

Finally, the output layer is a Dense layer with one unit per class, in this case 10 (digits 0 to 9). Its softmax activation turns the outputs into probabilities that sum to 1:

number_of_classes = y_train.nunique() # 10
model.add(layers.Dense(number_of_classes, activation='softmax'))

Compile model & summary

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

Output:

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 28, 28, 32) 832
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 14, 14, 32) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 14, 14, 64) 51264
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 7, 7, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 3136) 0
_________________________________________________________________
dense (Dense) (None, 128) 401536
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 1290
=================================================================
Total params: 454,922
Trainable params: 454,922
Non-trainable params: 0
_________________________________________________________________
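The Output Shape and Param # columns can be reproduced by hand. The sketch below uses the standard formulas: a Conv2D layer has (kernel_h * kernel_w * in_channels + 1) * filters parameters (the +1 is the bias), a Dense layer has (inputs + 1) * units, and MaxPool2D with its default 2x2 pool halves height and width. Pooling, Dropout and Flatten layers have no parameters. The helper names are made up for this illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel_h * kernel_w * in_channels per filter) plus one bias per filter."""
    kh, kw = kernel
    return (kh * kw * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return (inputs + 1) * units

# padding='same' keeps 28x28; each MaxPool2D (default 2x2) halves it: 28 -> 14 -> 7
side = 28 // 2 // 2            # 7
flat_size = side * side * 64   # Flatten output: 7 * 7 * 64

layers_params = {
    'conv2d': conv2d_params((5, 5), 1, 32),      # 832
    'conv2d_1': conv2d_params((5, 5), 32, 64),   # 51264
    'dense': dense_params(flat_size, 128),       # 401536
    'dense_1': dense_params(128, 10),            # 1290
}
print(flat_size)                    # 3136
print(sum(layers_params.values()))  # 454922
```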


Prepare labels to fit output layers

Next, we need to prepare the labels so that they match the output layer of our network. The output layer has 10 units, so each label must become a vector of length 10; this is also the target format the categorical_crossentropy loss expects.
To do this, we use a technique called one-hot encoding.

Here is a simple example of one-hot encoded labels:

# Example label = 0
# One-hot encoded output: index 0 is 1, the rest are 0
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, 10 in this case
# Example label = 1
# One-hot encoded output: index 1 is 1, the rest are 0
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, 10 in this case
# Example label = 2
# One-hot encoded output: index 2 is 1, the rest are 0
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, 10 in this case
# ...
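Conceptually, one-hot encoding is just a row lookup in an identity matrix. The NumPy sketch below illustrates the idea on made-up labels; it is not the Keras implementation of to_categorical, though it produces the same kind of array.

```python
import numpy as np

number_of_classes = 10
labels = np.array([0, 1, 2, 9])  # made-up example labels

# Row k of the 10x10 identity matrix has a 1 at index k and 0 elsewhere,
# so indexing with the labels picks one one-hot row per label.
one_hot = np.eye(number_of_classes, dtype='float32')[labels]

print(one_hot.shape)   # (4, 10)
print(one_hot[2])      # 1. at index 2, 0. elsewhere

# argmax recovers the original label from each one-hot row.
print(one_hot.argmax(axis=1))  # [0 1 2 9]
```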

We will use to_categorical() from np_utils (imported earlier) to one-hot encode the labels:

y_train = np_utils.to_categorical(y_train, number_of_classes)
y_train.shape

Output

(42000, 10)

Sample of the one-hot encoded y_train:

# Input
y_train
# Output
array([[0., 1., 0., ..., 0., 0., 0.],
[1., 0., 0., ..., 0., 0., 0.],
[0., 1., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 1., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 1.]], dtype=float32)

Training process

Before training, we configure callbacks for early stopping and model checkpointing:

early_stopping = EarlyStopping(monitor='val_loss',
patience=8,
verbose=1,
mode='min')
mcp_save = ModelCheckpoint('digit_recognizer.h5',
save_best_only=True,
monitor='val_loss',
verbose=1,
mode='auto')

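EarlyStopping() will stop training if the monitored value (in this case val_loss) has not improved for patience=8 consecutive epochs.

ModelCheckpoint() will save the model with the lowest val_loss seen so far to the file "digit_recognizer.h5" (with mode='auto', Keras infers from the metric name that lower is better for a loss).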
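To make the two callbacks concrete, here is a simplified pure-Python sketch of their logic, applied to a made-up list of val_loss values. It is not the Keras source: the real EarlyStopping also supports options such as min_delta and restore_best_weights, and ModelCheckpoint writes a file rather than recording an epoch number. The function simulate_callbacks is hypothetical.

```python
def simulate_callbacks(val_losses, patience=8):
    """Return (epochs_run, best_epoch) for a 'min' monitor such as val_loss.

    best_epoch is the epoch ModelCheckpoint(save_best_only=True) would have
    saved last; epochs_run is when EarlyStopping(patience=...) halts training.
    Epochs are counted from 1.
    """
    best = float('inf')
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:          # improvement: the checkpoint would save here
            best, best_epoch, wait = loss, epoch, 0
        else:                    # no improvement: count towards patience
            wait += 1
            if wait >= patience:
                return epoch, best_epoch   # early stopping halts here
    return len(val_losses), best_epoch      # ran out of epochs first

# Made-up validation losses: best value at epoch 3, then no improvement.
fake_val_loss = [0.30, 0.20, 0.10] + [0.15] * 20
print(simulate_callbacks(fake_val_loss, patience=8))  # (11, 3)
```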

Next, start the training!

history = model.fit(X_train,
y_train,
epochs=100,
batch_size=32,
verbose=1,
validation_split=0.2,
callbacks=[early_stopping, mcp_save])
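As a rough guide to what happens inside fit(), here is the arithmetic for this dataset, assuming Keras holds out the last 20% of the rows as validation data (the Keras documentation states that validation_split takes the last samples, before shuffling). Under that assumption the progress bar should count 1050 steps per epoch.

```python
import math

n_samples = 42000        # rows in train.csv
validation_split = 0.2   # as passed to model.fit() above
batch_size = 32

n_val = int(n_samples * validation_split)   # last 20% of the rows
n_train = n_samples - n_val                 # first 80% of the rows

# A final, smaller batch would count as one more step if n_train were not
# a multiple of batch_size, hence the ceiling.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val)     # 33600 8400
print(steps_per_epoch)    # 1050
```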

Plot training graph

model.fit() returns a History object whose history attribute is a dictionary of per-epoch training metrics, including loss, val_loss, accuracy, and val_accuracy.

The dictionary will look something like this:

{
"val_loss": [ list of validation losses... ],
"val_accuracy": [ list of validation accuracies... ],
"loss": [ list of losses... ],
"accuracy": [ list of accuracies... ]
}

Plot function

# Function to plot a training metric and its validation counterpart
def plotgraph(epochs, train_values, val_values, metric='Accuracy'):
    # Plot training & validation values per epoch
    plt.plot(epochs, train_values, 'b')
    plt.plot(epochs, val_values, 'r')
    plt.title('Model ' + metric.lower())
    plt.ylabel(metric)
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

Plot the training history

accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(accuracy) + 1)  # epoch numbers start at 1
# Plot accuracy & val accuracy
plotgraph(epochs, accuracy, val_accuracy)

Output:

# Plot loss & val loss
plotgraph(epochs, loss, val_loss, metric='Loss')

Output:

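Test the Model

The test set has no labels, so instead of measuring accuracy we use the model to predict its digits. First, we need to process the test data the same way we processed the training data.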

Load dataset

Using pandas:

test_df = pd.read_csv('test.csv')
test_df.shape

Output:

(28000, 784)

There are 28000 flattened images. Unlike train.csv, test.csv has no label column, which is why there are 784 columns instead of 785.

Next, we repeat the same preprocessing as for the training set:

  • Scale the image pixel values
  • Reshape the flattened images to (batch, height, width, channels)
X_test = test_df.values
print(X_test.shape)
# Scale pixel values
X_test = X_test.astype('float32') / 255
# Reshape to (batch, H, W, channels)
X_test = X_test.reshape(-1, H, W, 1)
print(X_test.shape)
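Because the training and test images must go through exactly the same steps, it can help to wrap them in one function. preprocess_pixels is a hypothetical helper, not part of the original notebook; it accepts any 2D array-like of flattened pixels (a NumPy array, or the pixel columns of a DataFrame) and returns scaled float32 arrays of shape (batch, height, width, 1).

```python
import numpy as np

H = W = 28  # image height and width

def preprocess_pixels(pixels):
    """Scale 0-255 pixel values to 0-1 and reshape to (batch, H, W, 1).

    Hypothetical helper: apply it to train_df.iloc[:, 1:] (pixels only,
    without the label column) and to test_df alike.
    """
    x = np.asarray(pixels, dtype='float32') / 255  # scale to [0, 1]
    return x.reshape(-1, H, W, 1)                   # add the channel axis

# Made-up data standing in for two flattened, fully white images.
fake_rows = np.full((2, H * W), 255, dtype=np.uint8)
x = preprocess_pixels(fake_rows)
print(x.shape, x.max())  # (2, 28, 28, 1) 1.0
```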

Take a sample and visualize it:

# plot image entry number 42, reshape image to size 28x28 pixel
plt.imshow(X_test[42].reshape(28, 28), cmap=plt.cm.gray)
plt.show()

Output:

Predict test image

First, we load the best model saved by ModelCheckpoint:

model = load_model('digit_recognizer.h5')

Next, we predict all 28000 images in X_test:

y_pred = model.predict(X_test)
y_pred

Each item in y_pred is an array of length 10 (the output layer size, number_of_classes) holding the predicted probability of each label. These values come from the softmax output layer: they are probabilities, not accuracies, and they sum to 1.

For example, if index 1 has the value 0.93, the model estimates a 93% probability that the image is a 1.

A sample prediction from y_pred has the same shape as a one-hot encoded label, but holds probabilities instead of 0s and 1s:

# just an example
[0.01, 0.93, 0., 0., 0., 0.02, 0., 0., 0.04, 0.]

To get the actual prediction, we take the index of the highest value in the array. To do this, we can use .argmax().
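The two steps can be reproduced in NumPy on made-up scores (these are not real model outputs): softmax turns raw scores for the 10 classes into non-negative values that sum to 1, and argmax picks the index of the largest one. The softmax function here is our own sketch, not the Keras layer.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up raw scores for 2 images x 10 classes (digits 0-9).
scores = np.array([
    [0.1, 5.0, 0.2, 0.0, 0.3, 1.0, 0.0, 0.2, 2.0, 0.1],   # largest score at index 1
    [0.0, 0.1, 0.3, 0.2, 4.0, 0.1, 0.5, 0.2, 0.0, 1.5],   # largest score at index 4
])
probs = softmax(scores)

print(probs.sum(axis=1))        # each row sums to 1 (up to rounding)
print(probs[0].argmax())        # 1 -> predicted digit for the first image
print(probs.argmax(axis=1))     # [1 4] -> one predicted digit per image
```

With the real predictions, y_pred.argmax(axis=1) should likewise give one predicted digit for each of the 28000 test images.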

Take samples 10 to 15 & visualize the predictions

# get predictions and visualize them side by side
for i in range(10, 16):
    plt.subplot(1, 6, i - 9)  # 1 row, 6 columns, positions 1 to 6
    plt.imshow(X_test[i].reshape(28, 28), cmap=plt.cm.gray)
    plt.title(y_pred[i].argmax())
plt.show()

Output:

Take prediction sample entry number 42 & visualize it

plt.imshow(X_test[42].reshape(H, W), cmap=plt.cm.gray)
plt.show()
print(y_pred[42].argmax())

Output:

4

You can find the notebook on my GitHub
https://github.com/madeyoga/MNIST

or Kaggle kernel
https://www.kaggle.com/yeogaa/mnist-train-digit-recognizer

I hope you enjoyed this post! If you think it could be useful to others, don’t forget to share it with them.

Thanks!

Written by madey

Independent Software Engineer
