MNIST — Handwritten digit recognition using tensorflow 2 and Keras

Apr 22, 2021

In this article, we are going to train a digit recognition model using TensorFlow Keras and the MNIST dataset.

This article is intended for those who have some experience with Python and machine learning basics but are new to computer vision. This task is a perfect introduction to the field.

Quick Intro

What is digit recognition?

Digit recognition is the task of having a computer correctly identify the digits in a given image.

What is MNIST?

MNIST (“Modified National Institute of Standards and Technology”) is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

Practice Skills

Computer vision fundamentals including simple neural networks

Requirements

Python 3

Required Tools:

Jupyter Notebook
NumPy for linear algebra
pandas for data processing
Matplotlib for plotting graphs
TensorFlow 2 & Keras for the deep learning model

Make sure these tools are installed. If they are not, install them with pip:

/> pip install jupyter
/> pip install numpy
/> pip install pandas
/> pip install matplotlib
/> pip install tensorflow
/> pip install keras

Join the competition on Kaggle, then download the dataset from:
https://www.kaggle.com/c/digit-recognizer/data

So let’s get started!

Create Notebook

Let’s start Jupyter Notebook. Open a command line / prompt and type

/> jupyter notebook

Then create a new Python 3 notebook.

Next, we want to import the libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from keras.models import Sequential, load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import layers
from keras.utils import np_utils

Let’s break down what we just imported

from keras.models import Sequential will be used to define a Sequential neural network model.
from keras import layers will be used to define the layers stored in the sequential model.
from keras.callbacks import EarlyStopping. EarlyStopping will be used during the training process. Training stops automatically when certain conditions are met, for example if there is no accuracy improvement in 10 epochs of training.
from keras.callbacks import ModelCheckpoint. ModelCheckpoint will be used to save the best model, the one with the highest accuracy or the lowest loss.

Load the dataset

After we finish downloading the dataset, we need to extract it. In my case, I extract the dataset to the same directory as my Python script or notebook.

Let’s load the dataset using pandas and analyze it a bit

train_df = pd.read_csv('train.csv')

Print the first 5 records

train_df.head()

The output columns will look like this:

label pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 ... pixel774 pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 pixel781 pixel782 pixel783

Each record contains a label in the first column; the rest are the image pixel data.

label contains 10 unique digits, 0 to 9.
pixel0 to pixel783 are the flattened 2D array (image pixel data), 784 values in total

Why does each pixel have only 1 value?

Shouldn’t a pixel have 3 values: (R, G, B)?

Good question. In this case the image has only 1 channel, which means 1 value per pixel, from 0 to 255. An image with 1 channel is a grayscale image.

In the (R, G, B) case, the image has 3 channels, each with a value from 0 to 255.
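To make the channel idea concrete, here is a small NumPy sketch. It is not part of the original notebook, and the two arrays are made-up placeholders filled with zeros; it only compares the shape and size of a grayscale image with an RGB one.

```python
import numpy as np

# Hypothetical 28x28 images filled with zeros, for illustration only
gray_image = np.zeros((28, 28), dtype=np.uint8)     # 1 value per pixel
rgb_image = np.zeros((28, 28, 3), dtype=np.uint8)   # 3 values per pixel (R, G, B)

print(gray_image.shape)  # (28, 28)
print(rgb_image.shape)   # (28, 28, 3)

# Number of values needed to store each image once flattened
print(gray_image.size)   # 784
print(rgb_image.size)    # 2352
```

The 784 here is the same 784 we see as pixel columns in train.csv.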

Prepare Training Dataset

Next, we want to split the labels and the images into X and y, as in a classic machine learning dataset, where X holds the images and y holds the labels

X_train = train_df.iloc[:, 1:]
y_train = train_df.iloc[:, 0]
X_train.shape, y_train.shape

Output should look like this:

((42000, 784), (42000,))

X_train contains the images and y_train contains the labels. Both should have the same record count, 42000 in this case.

Visualize sample

Next let’s visualize the sample. We will use matplotlib to plot the image. But first, let’s take a sample image with index 42.

image_sample = X_train.loc[42]

Whoops! Remember, each record is a flattened 2D array (pixel0 to pixel783), which means it needs to be converted back to a 2D array before we can visualize it.

To do this, we first need to define the width and height of the image. In this case both are 28, the square root of 784, since 28x28 = 784.

W = 28
H = 28

Next we want to reshape the image to 28x28

image_sample = image_sample.values.reshape(W, H)
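To see what the reshape does to individual pixels, here is a minimal sketch. It assumes the row-by-row layout described on the Kaggle data page (pixel k sits at row k // 28, column k % 28); the flat array below is a made-up stand-in whose values equal their own pixel index.

```python
import numpy as np

W, H = 28, 28

# Hypothetical flattened image: the value of each pixel is its own index
flat = np.arange(W * H)
image = flat.reshape(H, W)

k = 100                    # corresponds to column pixel100 in the CSV
row, col = divmod(k, W)    # row = k // 28, col = k % 28
print(row, col)            # 3 16
print(image[row, col])     # 100
```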

Finally, use matplotlib to plot image_sample:

plt.imshow(image_sample)
plt.show()

Output:

Why is the image not grayscale?

It is grayscale. By default, matplotlib renders single-channel data with a color map, so we just need to ask for a gray color map to display it correctly:

plt.imshow(X_train.loc[42].values.reshape(W, H), cmap=plt.cm.gray)
plt.show()

Output:

Normalize data

Next, we will scale the pixel values. Currently the pixel values are integers in the range 0 to 255; we want them in the range 0 to 1.

To do this, simply divide the pixel values by 255:

X_train = X_train.astype('float32') / 255
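As a quick sanity check of the scaling step, here is a tiny sketch on a made-up array of three pixel values (not taken from the dataset) that covers both ends of the range.

```python
import numpy as np

# Hypothetical pixel values covering the full 0-255 range
pixels = np.array([0, 51, 255], dtype=np.uint8)
scaled = pixels.astype('float32') / 255

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Converting to float32 first matters: the division needs a floating-point type to hold values between 0 and 1.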

Prepare dataset shape to feed into CNN

With TensorFlow as the backend, Keras expects image input in the shape (batch, height, width, channels). Our images are square (28x28), so the order of width and height makes no difference here.

Where:

batch is the number of images in the array (the row count), in this case 42000
height is the image height
width is the image width
channels is the number of image channels: 1 for grayscale and 3 for RGB or BGR.

Next, we reshape the training dataset

X_train = X_train.values.reshape(-1, W, H, 1)
X_train.shape

Why is the first dimension -1? NumPy infers that dimension from the size of the array, so we do not have to hard-code it. In this case it becomes 42000.

Output:

(42000, 28, 28, 1)

Looks good: 42000 images, each 28x28 with 1 channel.
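The -1 trick is easier to see on a small array. Here is a minimal sketch with a made-up mini-dataset of 5 flattened images; reshape works out the first dimension from the total number of values.

```python
import numpy as np

# Hypothetical mini-dataset: 5 flattened 28x28 images
flat_images = np.zeros((5, 784), dtype='float32')

# -1 lets NumPy compute the first dimension: 5 * 784 / (28 * 28 * 1) = 5
batch = flat_images.reshape(-1, 28, 28, 1)
print(batch.shape)  # (5, 28, 28, 1)
```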

Define Neural Network model

Next we want to define the neural network model. First we will define an empty sequential model.

model = Sequential()

Done, we have initialized an empty sequential model. Next, we want to add layers to this empty model:

The first layer we add is a convolution layer, which also declares the input shape of a single image, (28, 28, 1):

input_shape = (W, H, 1)

model.add(layers.Conv2D(filters=32,
      kernel_size=(5,5),
      padding='same',
      activation='relu',
      input_shape=input_shape)) # input shape
model.add(layers.MaxPool2D())
model.add(layers.Dropout(0.4))

Next, we add 1 more convolution layer, a flatten layer, and 1 dense layer.

model.add(layers.Conv2D(filters=64, kernel_size=(5,5), padding='same', activation='relu'))
model.add(layers.MaxPool2D())
model.add(layers.Dropout(0.4))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.4))
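MaxPool2D with its default 2x2 pool size halves the width and height of each feature map, which is why 28x28 becomes 14x14 and then 7x7 in this model. The sketch below shows the idea in plain NumPy; max_pool_2x2 is a made-up helper for illustration, assumes even dimensions, and is not how Keras implements the layer.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    # Split rows and columns into pairs, then take the max over each pair
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[ 5  7]
#  [13 15]]

print(max_pool_2x2(np.zeros((28, 28))).shape)  # (14, 14)
```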

Finally, for the output layer we use a dense layer whose size equals the number of classes, in this case 10 (digits 0 to 9)

number_of_classes = y_train.nunique() # 10
model.add(layers.Dense(number_of_classes, activation='softmax'))
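The softmax activation on the output layer turns the 10 raw scores into probabilities that sum to 1. Here is a minimal NumPy sketch of the idea; the softmax helper and the scores are made up for illustration, and Keras uses its own implementation.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores for the 10 digit classes
logits = np.array([1.0, 5.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0])
probs = softmax(logits)

print(probs.round(3))   # one probability per digit
print(probs.argmax())   # 1
```

The class with the largest raw score always ends up with the largest probability, so softmax does not change which digit wins.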

Compile model & summary

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

Output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 28, 28, 32)        832       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 64)        51264     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 64)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 7, 7, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               401536    
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 454,922
Trainable params: 454,922
Non-trainable params: 0
_________________________________________________________________
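The Param # column can be checked by hand. The sketch below recomputes each number from the layer settings used above; conv2d_params and dense_params are made-up helper names, not Keras functions.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus 1 bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # Each unit has one weight per input plus 1 bias
    return (inputs + 1) * units

conv1 = conv2d_params(5, 5, 1, 32)     # 832
conv2 = conv2d_params(5, 5, 32, 64)    # 51264
flat = 7 * 7 * 64                      # 3136 values after Flatten
dense1 = dense_params(flat, 128)       # 401536
dense2 = dense_params(128, 10)         # 1290

print(conv1 + conv2 + dense1 + dense2)  # 454922
```

The pooling, dropout, and flatten layers have no weights, which is why they show 0 parameters.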


Prepare labels to fit output layers

Next we need to prepare the labels so they fit the output layer of our neural network. The output layer size is 10, which means each label must be a vector of length 10.
To do this we will use a technique called one hot encoding.

Here is a simple example of one hot encoded output

# Example label = 0
# One hot encoded output: index 0 is 1, the rest are 0
[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, in this case 10

# Example label = 1
# One hot encoded output: index 1 is 1, the rest are 0
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, in this case 10

# Example label = 2
# One hot encoded output: index 2 is 1, the rest are 0
[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.] # Array length = number of classes, in this case 10

# ...
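The encoding above can be reproduced with plain NumPy. This is only a sketch to show the idea; one_hot is a made-up helper and is not the function used in this article.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i of the identity matrix is the one hot vector for label i."""
    return np.eye(num_classes, dtype='float32')[labels]

labels = np.array([0, 1, 2])   # the three example labels above
encoded = one_hot(labels, 10)

print(encoded.shape)   # (3, 10)
print(encoded[2])      # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```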

We will use np_utils from Keras, which helps us produce one hot encoded labels

y_train = np_utils.to_categorical(y_train, number_of_classes)
y_train.shape

Output

(42000, 10)
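One hot labels are what the categorical_crossentropy loss we compiled the model with expects. Here is a minimal sketch of that loss for a single image; the function below is a simplified stand-in written for illustration (the eps clipping value is an assumption), and the two predictions are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for a single sample: -sum(true * log(predicted))."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0., 1., 0., 0., 0., 0., 0., 0., 0., 0.])   # label 1

# Hypothetical predictions: one confident and correct, one undecided
confident = np.array([0., 0.93, 0., 0., 0., 0.02, 0., 0., 0.05, 0.])
unsure = np.array([0.1] * 10)

print(categorical_crossentropy(y_true, confident))  # about 0.07
print(categorical_crossentropy(y_true, unsure))     # about 2.30
```

Only the probability assigned to the true label contributes to the loss, so the more confident the model is in the right digit, the lower the loss.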

Sample of one hot encoded y_train

# Input
y_train

# Output
array([[0., 1., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.]], dtype=float32)

Training process

Before training, we will configure callbacks for early stopping and checkpoint

early_stopping = EarlyStopping(monitor='val_loss',
          patience=8,
          verbose=1,
          mode='min')

mcp_save = ModelCheckpoint('digit_recognizer.h5',
         save_best_only=True,
         monitor='val_loss',
         verbose=1,
         mode='auto')

EarlyStopping() will stop the training if the monitored value (in this case val_loss) does not improve for patience epochs in a row (8 here).

ModelCheckpoint() will save the best val_loss model, with filename "digit_recognizer.h5".
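To show how the two callbacks interact, here is a simplified pure-Python simulation of the patience logic. It is a sketch under my own assumptions, not the Keras source: simulate_early_stopping is a made-up function, it ignores options such as min_delta, and the validation losses are invented. Epochs are counted from 0.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    stop_epoch is None if training would run through every epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # a checkpoint would be saved here
            wait = 0
        else:
            wait += 1                             # another epoch without improvement
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation losses, one per epoch
losses = [0.50, 0.40, 0.30, 0.35, 0.31, 0.32]
print(simulate_early_stopping(losses, patience=2))  # (4, 2)
```

In this made-up run, training stops at epoch 4, but the saved model is the one from epoch 2, which had the lowest validation loss.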

Next, start the training!

history = model.fit(X_train,
                    y_train, 
                    epochs=100, 
                    batch_size=32, 
                    verbose=1, 
                    validation_split=0.2,
                    callbacks=[early_stopping, mcp_save])

epochs is the maximum number of passes over the entire training dataset. Training may stop sooner because of early stopping.
batch_size defines the number of samples that are propagated through the network before each weight update. (https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network).
validation_split is the proportion of the dataset held out for validation. In this case Keras sets aside the last 20% of the records and evaluates on them after each epoch.
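A bit of arithmetic shows what these settings mean for our 42000 records. The sketch below is just that calculation, using the values passed to model.fit() above.

```python
import math

n_samples = 42000
validation_split = 0.2
batch_size = 32

n_val = round(n_samples * validation_split)   # records held out for validation
n_train = n_samples - n_val                   # records actually trained on
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val)     # 33600 8400
print(steps_per_epoch)    # 1050
```

So each epoch consists of 1050 batches of 32 images, followed by an evaluation on the 8400 held-out images.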

Plot training graph

model.fit() returns a History object. Its history attribute is a dictionary of per-epoch training data, including loss, val_loss, accuracy, and val_accuracy

The dictionary will look something like this

{
    "val_loss": [ list of validation losses... ],
    "val_accuracy": [ list of validation accuracies... ],
    "loss": [ list of losses... ],
    "accuracy": [ list of accuracies... ]
}

Plot function

# function to plot a training metric against its validation counterpart
def plotgraph(epochs, train_values, val_values, metric='Accuracy'):
    # Plot training & validation values for the given metric
    plt.plot(epochs, train_values, 'b')
    plt.plot(epochs, val_values, 'r')
    plt.title('Model ' + metric.lower())
    plt.ylabel(metric)
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

Plot the training history

accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(accuracy))
# Plot accuracy & val accuracy
plotgraph(epochs, accuracy, val_accuracy)

Output:

# Plot loss & val loss
plotgraph(epochs, loss, val_loss, metric='Loss')

Output:

Test Model Accuracy

We need to process the test data the same way we processed the training data.

Load dataset

Using pandas:

test_df = pd.read_csv('test.csv')
test_df.shape

Output:

(28000, 784)

There are 28000 records of flattened images.

Next, we repeat the same preprocessing we applied to the training set:

Scale image pixel values
Reshape flattened image to (batch, width, height, channels)

X_test = test_df.values
print(X_test.shape)

# Scale pixel values
X_test = X_test.astype('float32') / 255

# Reshape to (batch, W, H, channels)
X_test = X_test.reshape(-1, W, H, 1)
print(X_test.shape)

Take sample and Visualize:

# plot image entry number 42, reshape image to size 28x28 pixel
plt.imshow(X_test[42].reshape(28, 28), cmap=plt.cm.gray)
plt.show()

Output:

Predict test image

First we load the saved best model

model = load_model('digit_recognizer.h5')

Next, we predict all 28k images on X_test

y_pred = model.predict(X_test)
y_pred

Each item in y_pred is an array of length 10 (the output layer size / number_of_classes) holding the predicted probability of each label.

For example, if the value at index 1 is 0.93, the model estimates a 93% probability that the image is a 1.

A sample prediction from y_pred looks like a softened version of a one hot encoded label. Because the output layer uses softmax, the values sum to 1:

# just an example
[0., 0.93, 0., 0., 0., 0.02, 0., 0., 0.05, 0.]

To get the actual prediction, we take the index of the highest value in the array. To achieve this, we can use .argmax().
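Here is a small sketch of .argmax() on made-up predictions for two images (y_pred_example is a hypothetical array, not real model output). Passing axis=1 converts every row to a digit in one call.

```python
import numpy as np

# Hypothetical predictions for 2 images (each row sums to 1)
y_pred_example = np.array([
    [0., 0.93, 0., 0., 0., 0.02, 0., 0., 0.05, 0.],
    [0.01, 0., 0., 0., 0., 0., 0., 0.98, 0.01, 0.],
])

print(y_pred_example[0].argmax())      # 1
print(y_pred_example.argmax(axis=1))   # [1 7]
```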

Take samples 10 to 15 & visualize the predictions

# get prediction and visualize
for i in range(10, 16):
    plt.subplot(280 + (i%10+1))
    plt.imshow(X_test[i].reshape(28, 28), cmap=plt.cm.gray)
    plt.title(y_pred[i].argmax())
plt.show()

Output:

Take prediction sample entry number 42 & visualize

plt.imshow(X_test[42].reshape(W, H), cmap=plt.cm.gray)
print(y_pred[42].argmax())

Output:

You can find the notebook at my Github
https://github.com/madeyoga/MNIST

or Kaggle kernel
https://www.kaggle.com/yeogaa/mnist-train-digit-recognizer

I hope you enjoyed this post! If you think it could be useful to others, don’t forget to share it with them.

Thanks!

Written by madey
