Case Study: Facial Key-Points Detection

Introduction

Facial key-point detection serves as a basis for Emotional AI applications such as detecting customers' emotional responses to ads and driver monitoring systems. In this project, we build a deep learning model based on a convolutional neural network with residual blocks to predict facial key points.

Affectiva is one of the leading players in Emotional AI; their software detects human emotions, complex cognitive states, and behaviours. (https://www.affectiva.com/)

Problem:

  • Build and train a deep learning model based on a convolutional neural network and residual blocks, using Keras with TensorFlow 2.0 as the backend.
  • Assess the performance of the trained CNN and verify its generalization using key performance indicators.

Dataset:

  • The dataset consists of the x and y coordinates of 15 facial key points.
  • Input images are 96 x 96 pixels.
  • Images have a single color channel (grayscale).

Source: Kaggle Competition

Review

Import Libraries And Data

In [1]:
# Import the necessary packages
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras import layers, optimizers
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint, LearningRateScheduler
from IPython.display import display
from tensorflow.keras import backend as K
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
In [2]:
# load the data
facialpoints_df = pd.read_csv('KeyFacialPoints.csv')

Data Exploration And Visualization

In [3]:
# Check the data
facialpoints_df.head()
Out[3]:
left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y ... nose_tip_y mouth_left_corner_x mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x mouth_center_bottom_lip_y Image
0 66.033564 39.002274 30.227008 36.421678 59.582075 39.647423 73.130346 39.969997 36.356571 37.389402 ... 57.066803 61.195308 79.970165 28.614496 77.388992 43.312602 72.935459 43.130707 84.485774 238 236 237 238 240 240 239 241 241 243 240 23...
1 64.332936 34.970077 29.949277 33.448715 58.856170 35.274349 70.722723 36.187166 36.034723 34.361532 ... 55.660936 56.421447 76.352000 35.122383 76.047660 46.684596 70.266553 45.467915 85.480170 219 215 204 196 204 211 212 200 180 168 178 19...
2 65.057053 34.909642 30.903789 34.909642 59.412000 36.320968 70.984421 36.320968 37.678105 36.320968 ... 53.538947 60.822947 73.014316 33.726316 72.732000 47.274947 70.191789 47.274947 78.659368 144 142 159 180 188 188 184 180 167 132 84 59 ...
3 65.225739 37.261774 32.023096 37.261774 60.003339 39.127179 72.314713 38.380967 37.618643 38.754115 ... 54.166539 65.598887 72.703722 37.245496 74.195478 50.303165 70.091687 51.561183 78.268383 193 192 193 194 194 194 193 192 168 111 50 12 ...
4 66.725301 39.621261 32.244810 38.042032 58.565890 39.621261 72.515926 39.884466 36.982380 39.094852 ... 64.889521 60.671411 77.523239 31.191755 76.997301 44.962748 73.707387 44.227141 86.871166 147 148 160 196 215 214 216 217 219 220 206 18...

5 rows Γ— 31 columns

In [4]:
# Check data info
facialpoints_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2140 entries, 0 to 2139
Data columns (total 31 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   left_eye_center_x          2140 non-null   float64
 1   left_eye_center_y          2140 non-null   float64
 2   right_eye_center_x         2140 non-null   float64
 3   right_eye_center_y         2140 non-null   float64
 4   left_eye_inner_corner_x    2140 non-null   float64
 5   left_eye_inner_corner_y    2140 non-null   float64
 6   left_eye_outer_corner_x    2140 non-null   float64
 7   left_eye_outer_corner_y    2140 non-null   float64
 8   right_eye_inner_corner_x   2140 non-null   float64
 9   right_eye_inner_corner_y   2140 non-null   float64
 10  right_eye_outer_corner_x   2140 non-null   float64
 11  right_eye_outer_corner_y   2140 non-null   float64
 12  left_eyebrow_inner_end_x   2140 non-null   float64
 13  left_eyebrow_inner_end_y   2140 non-null   float64
 14  left_eyebrow_outer_end_x   2140 non-null   float64
 15  left_eyebrow_outer_end_y   2140 non-null   float64
 16  right_eyebrow_inner_end_x  2140 non-null   float64
 17  right_eyebrow_inner_end_y  2140 non-null   float64
 18  right_eyebrow_outer_end_x  2140 non-null   float64
 19  right_eyebrow_outer_end_y  2140 non-null   float64
 20  nose_tip_x                 2140 non-null   float64
 21  nose_tip_y                 2140 non-null   float64
 22  mouth_left_corner_x        2140 non-null   float64
 23  mouth_left_corner_y        2140 non-null   float64
 24  mouth_right_corner_x       2140 non-null   float64
 25  mouth_right_corner_y       2140 non-null   float64
 26  mouth_center_top_lip_x     2140 non-null   float64
 27  mouth_center_top_lip_y     2140 non-null   float64
 28  mouth_center_bottom_lip_x  2140 non-null   float64
 29  mouth_center_bottom_lip_y  2140 non-null   float64
 30  Image                      2140 non-null   object 
dtypes: float64(30), object(1)
memory usage: 518.4+ KB
In [5]:
# Take a look at a sample image's raw pixel values (output suppressed with ';')
facialpoints_df['Image'][1];
In [6]:
# The image values are given as a space-separated string, so we split them on ' ',
# convert the result into a numpy array with np.fromstring, and reshape the obtained
# 1D array into a 2D array of shape (96, 96)
facialpoints_df['Image'] = facialpoints_df['Image'].apply(lambda x: np.fromstring(x, dtype=int, sep=' ').reshape(96, 96))
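The same parsing can also be written with plain string splitting instead of np.fromstring. A functionally equivalent alternative to the cell above (run one or the other, since 'Image' holds arrays rather than strings afterwards):

# Equivalent alternative: split the space-separated string manually,
# then convert to an integer numpy array and reshape to (96, 96)
facialpoints_df['Image'] = facialpoints_df['Image'].apply(
    lambda x: np.array(x.split(), dtype=int).reshape(96, 96))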
In [7]:
# Confirm the shape of the reshaped image
facialpoints_df['Image'][1].shape
Out[7]:
(96, 96)
In [8]:
# Let's confirm that there are no null values 
facialpoints_df.isnull().sum()
Out[8]:
left_eye_center_x            0
left_eye_center_y            0
right_eye_center_x           0
right_eye_center_y           0
left_eye_inner_corner_x      0
left_eye_inner_corner_y      0
left_eye_outer_corner_x      0
left_eye_outer_corner_y      0
right_eye_inner_corner_x     0
right_eye_inner_corner_y     0
right_eye_outer_corner_x     0
right_eye_outer_corner_y     0
left_eyebrow_inner_end_x     0
left_eyebrow_inner_end_y     0
left_eyebrow_outer_end_x     0
left_eyebrow_outer_end_y     0
right_eyebrow_inner_end_x    0
right_eyebrow_inner_end_y    0
right_eyebrow_outer_end_x    0
right_eyebrow_outer_end_y    0
nose_tip_x                   0
nose_tip_y                   0
mouth_left_corner_x          0
mouth_left_corner_y          0
mouth_right_corner_x         0
mouth_right_corner_y         0
mouth_center_top_lip_x       0
mouth_center_top_lip_y       0
mouth_center_bottom_lip_x    0
mouth_center_bottom_lip_y    0
Image                        0
dtype: int64
In [9]:
# Plot a random image from the dataset
i = np.random.randint(1, len(facialpoints_df))
plt.imshow(facialpoints_df['Image'][i], cmap='gray')
plt.show()
In [10]:
# Plot the (x, y) coordinates of the 15 key points on top of the same image.
# The loop below runs j over 1, 3, ..., 29 (range(1, 31, 2)):
# x-coordinates live in the even columns (0, 2, 4, ...) and y-coordinates in the
# odd columns (1, 3, 5, ...), so each iteration reads one (x, y) pair with .loc.
# In the first iteration, facialpoints_df.loc[i][j-1] is .loc[i][0], the
# x-coordinate of the first key point for row i.

plt.figure()
plt.imshow(facialpoints_df['Image'][i], cmap='gray')
for j in range(1, 31, 2):
    plt.plot(facialpoints_df.loc[i][j-1], facialpoints_df.loc[i][j], 'rx')
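Since this image-plus-key-points overlay recurs throughout the notebook, a small helper could reduce the repetition. A convenience sketch, not part of the original notebook; keypoints is any flat sequence of 30 values ordered x1, y1, x2, y2, ...:

# Convenience helper (hypothetical): overlay an image with its 15 key points
def show_keypoints(image, keypoints, ax=None):
    if ax is None:
        ax = plt.gca()
    ax.imshow(image, cmap='gray')
    # even positions hold x-coordinates, odd positions hold y-coordinates
    ax.plot(keypoints[0::2], keypoints[1::2], 'rx')

# Example usage:
# show_keypoints(facialpoints_df['Image'][i], facialpoints_df.loc[i][:-1])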
In [11]:
# Let's view more images in a grid format
fig = plt.figure(figsize=(20, 20))

for i in range(16):
    ax = fig.add_subplot(4, 4, i + 1)
    plt.imshow(facialpoints_df['Image'][i], cmap='gray')
    for j in range(1, 31, 2):
        plt.plot(facialpoints_df.loc[i][j-1], facialpoints_df.loc[i][j], 'rx')
In [12]:
# View a grid of 64 randomly sampled images
fig = plt.figure(figsize=(20, 20))

for i in range(64):
    ax = fig.add_subplot(8, 8, i + 1)
    # Pick a random row index
    img = np.random.randint(1, len(facialpoints_df))
    plt.imshow(facialpoints_df['Image'][img], cmap='gray')
    for j in range(1, 31, 2):
        plt.plot(facialpoints_df.loc[img][j-1], facialpoints_df.loc[img][j], 'rx')

Image Augmentation

In [13]:
# Import library
import copy

# Create a new copy of the dataframe
facialpoints_df_copy = copy.copy(facialpoints_df)
In [14]:
# Obtain the header of the DataFrame (column names), excluding the 'Image' column
columns = facialpoints_df_copy.columns[:-1]

# Check columns
columns
Out[14]:
Index(['left_eye_center_x', 'left_eye_center_y', 'right_eye_center_x',
       'right_eye_center_y', 'left_eye_inner_corner_x',
       'left_eye_inner_corner_y', 'left_eye_outer_corner_x',
       'left_eye_outer_corner_y', 'right_eye_inner_corner_x',
       'right_eye_inner_corner_y', 'right_eye_outer_corner_x',
       'right_eye_outer_corner_y', 'left_eyebrow_inner_end_x',
       'left_eyebrow_inner_end_y', 'left_eyebrow_outer_end_x',
       'left_eyebrow_outer_end_y', 'right_eyebrow_inner_end_x',
       'right_eyebrow_inner_end_y', 'right_eyebrow_outer_end_x',
       'right_eyebrow_outer_end_y', 'nose_tip_x', 'nose_tip_y',
       'mouth_left_corner_x', 'mouth_left_corner_y', 'mouth_right_corner_x',
       'mouth_right_corner_y', 'mouth_center_top_lip_x',
       'mouth_center_top_lip_y', 'mouth_center_bottom_lip_x',
       'mouth_center_bottom_lip_y'],
      dtype='object')
In [15]:
# Take a look at the pixel values of a sample image and see if they make sense
# (output suppressed with ';')
facialpoints_df['Image'][0];
In [16]:
# Plot the sample image
plt.imshow(facialpoints_df['Image'][0], cmap = 'gray')
plt.show()
In [17]:
# Now let's flip each image horizontally
facialpoints_df_copy['Image'] = facialpoints_df_copy['Image'].apply(lambda x: np.flip(x, axis=1))
In [18]:
# Sanity check on the flipped image: the order of the pixel values is now reversed
# (output suppressed with ';')
facialpoints_df_copy['Image'][0];
In [19]:
# Notice that the image is flipped now
plt.imshow(facialpoints_df_copy['Image'][0], cmap = 'gray')
plt.show()
In [20]:
# Since the images are flipped horizontally, the y-coordinates stay the same;
# only the x-coordinates change: subtract each original x value from the image width (96)
for i in range(len(columns)):
  if i%2 == 0:
    facialpoints_df_copy[columns[i]] = facialpoints_df_copy[columns[i]].apply(lambda x: 96. - float(x))
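Because the x-coordinate columns are exactly the even-indexed entries of columns, the same adjustment can be vectorized. An equivalent sketch, meant to replace the loop above rather than run after it (running both would flip the x-coordinates back again):

# Equivalent vectorized alternative to the loop above
x_cols = columns[0::2]   # even-indexed columns hold the x values
facialpoints_df_copy[x_cols] = 96. - facialpoints_df_copy[x_cols]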
In [21]:
# View the original image with its key points
plt.imshow(facialpoints_df['Image'][0], cmap='gray')
for j in range(1, 31, 2):
    plt.plot(facialpoints_df.loc[0][j-1], facialpoints_df.loc[0][j], 'rx')
In [22]:
# View the horizontally flipped image
plt.imshow(facialpoints_df_copy['Image'][0], cmap='gray')
for j in range(1, 31, 2):
    plt.plot(facialpoints_df_copy.loc[0][j-1], facialpoints_df_copy.loc[0][j], 'rx')
In [23]:
# Concatenate the original dataframe with the augmented dataframe
facialpoints_df_augmented = np.concatenate((facialpoints_df, facialpoints_df_copy))
In [24]:
# Check dimension
facialpoints_df_augmented.shape
Out[24]:
(4280, 31)
In [25]:
# Import library
import random

# Perform another augmentation by randomly increasing image brightness:
# multiply each image's pixel values by a random factor between 1 and 2,
# then clip the results to the valid range [0, 255]
facialpoints_df_copy = copy.copy(facialpoints_df)
facialpoints_df_copy['Image'] = facialpoints_df['Image'].apply(lambda x: np.clip(random.uniform(1, 2) * x, 0.0, 255.0))
facialpoints_df_augmented = np.concatenate((facialpoints_df_augmented, facialpoints_df_copy))

# Check dimension
facialpoints_df_augmented.shape
Out[25]:
(6420, 31)
In [26]:
# Create another copy
facialpoints_df_copy = copy.copy(facialpoints_df)

# Flip the images vertically (note that axis = 0)
facialpoints_df_copy['Image'] = facialpoints_df_copy['Image'].apply(lambda x: np.flip(x, axis=0))

# Since the images are flipped vertically, the x-coordinates stay the same;
# only the y-coordinates change: subtract each original y value from the image height (96)
for i in range(len(columns)):
  if i%2 == 1:
    facialpoints_df_copy[columns[i]] = facialpoints_df_copy[columns[i]].apply(lambda x: 96. - float(x))

# View the vertically flipped image
plt.imshow(facialpoints_df_copy['Image'][0], cmap='gray')
for j in range(1, 31, 2):
    plt.plot(facialpoints_df_copy.loc[0][j-1], facialpoints_df_copy.loc[0][j], 'rx')
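Note that this vertically flipped copy is only visualized; it is never concatenated into facialpoints_df_augmented, which still holds 6,420 rows. If it were to be included as well, the same pattern as before would apply (a hypothetical sketch):

# Hypothetical: also add the vertically flipped copies to the augmented set
# facialpoints_df_augmented = np.concatenate((facialpoints_df_augmented, facialpoints_df_copy))
# facialpoints_df_augmented.shape  # would then be (8560, 31)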

Normalization And Training Data Preparation

In [27]:
# Obtain the 'Image' values and normalize them
# Note that 'Image' is the 31st column; since indexing starts from 0, it is column index 30
img = facialpoints_df_augmented[:, 30]
img = img/255.

# Create an empty array of shape (6420, 96, 96, 1) to train the model
X = np.empty((len(img), 96, 96, 1))

# Iterate through the normalized images and fill the empty array,
# expanding each image's dimensions from (96, 96) to (96, 96, 1)
for i in range(len(img)):
  X[i,] = np.expand_dims(img[i], axis = 2)

# Convert the array type to float32
X = np.asarray(X).astype(np.float32)
X.shape
Out[27]:
(6420, 96, 96, 1)
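The per-image loop above can also be collapsed into a single vectorized expression. An equivalent sketch, assuming img holds the 6,420 normalized (96, 96) arrays as above:

# Same result as the loop above: stack the images and add a channel axis
X = np.stack(img).reshape(-1, 96, 96, 1).astype(np.float32)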
In [28]:
# Obtain the key-point coordinate values, which will be used as the targets
y = facialpoints_df_augmented[:, :30]
y = np.asarray(y).astype(np.float32)
y.shape
Out[28]:
(6420, 30)
In [29]:
# Split the data into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1)
In [30]:
# Check image data dimensions
X_train.shape, X_test.shape
Out[30]:
((5778, 96, 96, 1), (642, 96, 96, 1))
In [32]:
# Check target data dimensions
y_train.shape, y_test.shape
Out[32]:
((5778, 30), (642, 30))
In [33]:
# Let's view more images in a grid format
fig = plt.figure(figsize=(20, 20))

for i in range(64):
    ax = fig.add_subplot(8, 8, i + 1)    
    image = plt.imshow(X_train[i].reshape(96,96), cmap = 'gray')
    for j in range(1,31,2):
        plt.plot(y_train[i][j-1], y_train[i][j], 'rx')

Create Model Architecture (Residual Neural Network)

In [34]:
def res_block(X, filters, stage):

  # CONVOLUTIONAL BLOCK
  X_copy = X
  f1, f2, f3 = filters

  # Main Path
  X = Conv2D(f1, (1,1), strides = (1,1), name ='res_'+str(stage)+'_conv_a', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = MaxPool2D((2,2))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_a')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_conv_b', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_b')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_conv_c', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_c')(X)

  # Short path
  X_copy = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_conv_copy', kernel_initializer= glorot_uniform(seed = 0))(X_copy)
  X_copy = MaxPool2D((2,2))(X_copy)
  X_copy = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_copy')(X_copy)

  # Add data from main and short paths
  X = Add()([X,X_copy])
  X = Activation('relu')(X)
  
  # IDENTITY BLOCK 1
  X_copy = X
    
  # Main Path
  X = Conv2D(f1, (1,1),strides = (1,1), name ='res_'+str(stage)+'_identity_1_a', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_1_a')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_identity_1_b', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_1_b')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_identity_1_c', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_1_c')(X)

  # Add both paths together (Note that we feed the original input as is hence the name "identity")
  X = Add()([X,X_copy])
  X = Activation('relu')(X)

  # IDENTITY BLOCK 2
  X_copy = X

  # Main Path
  X = Conv2D(f1, (1,1),strides = (1,1), name ='res_'+str(stage)+'_identity_2_a', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_a')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_identity_2_b', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_b')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_identity_2_c', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_c')(X)

  # Add both paths together (Note that we feed the original input as is hence the name "identity")
  X = Add()([X,X_copy])
  X = Activation('relu')(X)

  return X
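As a quick sanity check on the block's geometry (an illustrative snippet, not part of the original notebook): the MaxPool2D layers on the main and short paths halve the spatial dimensions once, and the final 1x1 convolutions expand the channels to f3.

# Sanity check: a (23, 23, 64) input should come out as (11, 11, 256)
_test_in = Input((23, 23, 64))
_test_out = res_block(_test_in, filters=[64, 64, 256], stage='check')
print(_test_out.shape)   # expected: (None, 11, 11, 256)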
In [35]:
input_shape = (96,96,1)

# Input tensor shape
X_input = Input(input_shape)

# Zero-padding
X = ZeroPadding2D((3,3))(X_input)

# Stage #1
X = Conv2D(64, (7,7), strides= (2,2), name = 'conv1', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_conv1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides= (2,2))(X)

# Stage #2
X = res_block(X, filters=[64, 64, 256], stage=2)

# Stage #3
X = res_block(X, filters=[128, 128, 512], stage=3)

# Average Pooling
X = AveragePooling2D((2,2), name = 'Average_Pooling')(X)

# Final layers
X = Flatten()(X)
X = Dense(4096, activation = 'relu')(X)
X = Dropout(0.2)(X)
X = Dense(2048, activation = 'relu')(X)
X = Dropout(0.1)(X)
# Output layer: 30 values (x and y for each of the 15 key points);
# ReLU works here because all coordinates are non-negative
X = Dense(30, activation = 'relu')(X)

model = Model( inputs= X_input, outputs = X)
model.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 96, 96, 1)]  0                                            
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D)  (None, 102, 102, 1)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 48, 48, 64)   3200        zero_padding2d[0][0]             
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 48, 48, 64)   256         conv1[0][0]                      
__________________________________________________________________________________________________
activation (Activation)         (None, 48, 48, 64)   0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 23, 23, 64)   0           activation[0][0]                 
__________________________________________________________________________________________________
res_2_conv_a (Conv2D)           (None, 23, 23, 64)   4160        max_pooling2d[0][0]              
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 11, 11, 64)   0           res_2_conv_a[0][0]               
__________________________________________________________________________________________________
bn_2_conv_a (BatchNormalization (None, 11, 11, 64)   256         max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 11, 11, 64)   0           bn_2_conv_a[0][0]                
__________________________________________________________________________________________________
res_2_conv_b (Conv2D)           (None, 11, 11, 64)   36928       activation_1[0][0]               
__________________________________________________________________________________________________
bn_2_conv_b (BatchNormalization (None, 11, 11, 64)   256         res_2_conv_b[0][0]               
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 11, 11, 64)   0           bn_2_conv_b[0][0]                
__________________________________________________________________________________________________
res_2_conv_copy (Conv2D)        (None, 23, 23, 256)  16640       max_pooling2d[0][0]              
__________________________________________________________________________________________________
res_2_conv_c (Conv2D)           (None, 11, 11, 256)  16640       activation_2[0][0]               
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 11, 11, 256)  0           res_2_conv_copy[0][0]            
__________________________________________________________________________________________________
bn_2_conv_c (BatchNormalization (None, 11, 11, 256)  1024        res_2_conv_c[0][0]               
__________________________________________________________________________________________________
bn_2_conv_copy (BatchNormalizat (None, 11, 11, 256)  1024        max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
add (Add)                       (None, 11, 11, 256)  0           bn_2_conv_c[0][0]                
                                                                 bn_2_conv_copy[0][0]             
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 11, 11, 256)  0           add[0][0]                        
__________________________________________________________________________________________________
res_2_identity_1_a (Conv2D)     (None, 11, 11, 64)   16448       activation_3[0][0]               
__________________________________________________________________________________________________
bn_2_identity_1_a (BatchNormali (None, 11, 11, 64)   256         res_2_identity_1_a[0][0]         
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 11, 11, 64)   0           bn_2_identity_1_a[0][0]          
__________________________________________________________________________________________________
res_2_identity_1_b (Conv2D)     (None, 11, 11, 64)   36928       activation_4[0][0]               
__________________________________________________________________________________________________
bn_2_identity_1_b (BatchNormali (None, 11, 11, 64)   256         res_2_identity_1_b[0][0]         
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 11, 11, 64)   0           bn_2_identity_1_b[0][0]          
__________________________________________________________________________________________________
res_2_identity_1_c (Conv2D)     (None, 11, 11, 256)  16640       activation_5[0][0]               
__________________________________________________________________________________________________
bn_2_identity_1_c (BatchNormali (None, 11, 11, 256)  1024        res_2_identity_1_c[0][0]         
__________________________________________________________________________________________________
add_1 (Add)                     (None, 11, 11, 256)  0           bn_2_identity_1_c[0][0]          
                                                                 activation_3[0][0]               
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 11, 11, 256)  0           add_1[0][0]                      
__________________________________________________________________________________________________
res_2_identity_2_a (Conv2D)     (None, 11, 11, 64)   16448       activation_6[0][0]               
__________________________________________________________________________________________________
bn_2_identity_2_a (BatchNormali (None, 11, 11, 64)   256         res_2_identity_2_a[0][0]         
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 11, 11, 64)   0           bn_2_identity_2_a[0][0]          
__________________________________________________________________________________________________
res_2_identity_2_b (Conv2D)     (None, 11, 11, 64)   36928       activation_7[0][0]               
__________________________________________________________________________________________________
bn_2_identity_2_b (BatchNormali (None, 11, 11, 64)   256         res_2_identity_2_b[0][0]         
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 11, 11, 64)   0           bn_2_identity_2_b[0][0]          
__________________________________________________________________________________________________
res_2_identity_2_c (Conv2D)     (None, 11, 11, 256)  16640       activation_8[0][0]               
__________________________________________________________________________________________________
bn_2_identity_2_c (BatchNormali (None, 11, 11, 256)  1024        res_2_identity_2_c[0][0]         
__________________________________________________________________________________________________
add_2 (Add)                     (None, 11, 11, 256)  0           bn_2_identity_2_c[0][0]          
                                                                 activation_6[0][0]               
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 11, 11, 256)  0           add_2[0][0]                      
__________________________________________________________________________________________________
res_3_conv_a (Conv2D)           (None, 11, 11, 128)  32896       activation_9[0][0]               
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 5, 5, 128)    0           res_3_conv_a[0][0]               
__________________________________________________________________________________________________
bn_3_conv_a (BatchNormalization (None, 5, 5, 128)    512         max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 5, 5, 128)    0           bn_3_conv_a[0][0]                
__________________________________________________________________________________________________
res_3_conv_b (Conv2D)           (None, 5, 5, 128)    147584      activation_10[0][0]              
__________________________________________________________________________________________________
bn_3_conv_b (BatchNormalization (None, 5, 5, 128)    512         res_3_conv_b[0][0]               
__________________________________________________________________________________________________
activation_11 (Activation)      (None, 5, 5, 128)    0           bn_3_conv_b[0][0]                
__________________________________________________________________________________________________
res_3_conv_copy (Conv2D)        (None, 11, 11, 512)  131584      activation_9[0][0]               
__________________________________________________________________________________________________
res_3_conv_c (Conv2D)           (None, 5, 5, 512)    66048       activation_11[0][0]              
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 5, 5, 512)    0           res_3_conv_copy[0][0]            
__________________________________________________________________________________________________
bn_3_conv_c (BatchNormalization (None, 5, 5, 512)    2048        res_3_conv_c[0][0]               
__________________________________________________________________________________________________
bn_3_conv_copy (BatchNormalizat (None, 5, 5, 512)    2048        max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
add_3 (Add)                     (None, 5, 5, 512)    0           bn_3_conv_c[0][0]                
                                                                 bn_3_conv_copy[0][0]             
__________________________________________________________________________________________________
activation_12 (Activation)      (None, 5, 5, 512)    0           add_3[0][0]                      
__________________________________________________________________________________________________
res_3_identity_1_a (Conv2D)     (None, 5, 5, 128)    65664       activation_12[0][0]              
__________________________________________________________________________________________________
bn_3_identity_1_a (BatchNormali (None, 5, 5, 128)    512         res_3_identity_1_a[0][0]         
__________________________________________________________________________________________________
activation_13 (Activation)      (None, 5, 5, 128)    0           bn_3_identity_1_a[0][0]          
__________________________________________________________________________________________________
res_3_identity_1_b (Conv2D)     (None, 5, 5, 128)    147584      activation_13[0][0]              
__________________________________________________________________________________________________
bn_3_identity_1_b (BatchNormali (None, 5, 5, 128)    512         res_3_identity_1_b[0][0]         
__________________________________________________________________________________________________
activation_14 (Activation)      (None, 5, 5, 128)    0           bn_3_identity_1_b[0][0]          
__________________________________________________________________________________________________
res_3_identity_1_c (Conv2D)     (None, 5, 5, 512)    66048       activation_14[0][0]              
__________________________________________________________________________________________________
bn_3_identity_1_c (BatchNormali (None, 5, 5, 512)    2048        res_3_identity_1_c[0][0]         
__________________________________________________________________________________________________
add_4 (Add)                     (None, 5, 5, 512)    0           bn_3_identity_1_c[0][0]          
                                                                 activation_12[0][0]              
__________________________________________________________________________________________________
activation_15 (Activation)      (None, 5, 5, 512)    0           add_4[0][0]                      
__________________________________________________________________________________________________
res_3_identity_2_a (Conv2D)     (None, 5, 5, 128)    65664       activation_15[0][0]              
__________________________________________________________________________________________________
bn_3_identity_2_a (BatchNormali (None, 5, 5, 128)    512         res_3_identity_2_a[0][0]         
__________________________________________________________________________________________________
activation_16 (Activation)      (None, 5, 5, 128)    0           bn_3_identity_2_a[0][0]          
__________________________________________________________________________________________________
res_3_identity_2_b (Conv2D)     (None, 5, 5, 128)    147584      activation_16[0][0]              
__________________________________________________________________________________________________
bn_3_identity_2_b (BatchNormali (None, 5, 5, 128)    512         res_3_identity_2_b[0][0]         
__________________________________________________________________________________________________
activation_17 (Activation)      (None, 5, 5, 128)    0           bn_3_identity_2_b[0][0]          
__________________________________________________________________________________________________
res_3_identity_2_c (Conv2D)     (None, 5, 5, 512)    66048       activation_17[0][0]              
__________________________________________________________________________________________________
bn_3_identity_2_c (BatchNormali (None, 5, 5, 512)    2048        res_3_identity_2_c[0][0]         
__________________________________________________________________________________________________
add_5 (Add)                     (None, 5, 5, 512)    0           bn_3_identity_2_c[0][0]          
                                                                 activation_15[0][0]              
__________________________________________________________________________________________________
activation_18 (Activation)      (None, 5, 5, 512)    0           add_5[0][0]                      
__________________________________________________________________________________________________
Average_Pooling (AveragePooling (None, 2, 2, 512)    0           activation_18[0][0]              
__________________________________________________________________________________________________
flatten (Flatten)               (None, 2048)         0           Average_Pooling[0][0]            
__________________________________________________________________________________________________
dense (Dense)                   (None, 4096)         8392704     flatten[0][0]                    
__________________________________________________________________________________________________
dropout (Dropout)               (None, 4096)         0           dense[0][0]                      
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2048)         8390656     dropout[0][0]                    
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 2048)         0           dense_1[0][0]                    
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 30)           61470       dropout_1[0][0]                  
==================================================================================================
Total params: 18,016,286
Trainable params: 18,007,710
Non-trainable params: 8,576
__________________________________________________________________________________________________
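plot_model was imported at the top but never used; it can render this architecture to an image file. An optional step (it requires the pydot and graphviz packages to be installed):

# Optional: render the architecture diagram to a file (needs pydot + graphviz)
plot_model(model, to_file='model_architecture.png', show_shapes=True)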

Compile And Train Deep Learning Model

In [37]:
# Pick the optimizer
adam = tf.keras.optimizers.Adam(learning_rate = 0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)

# Compile the model
# Note: this is a regression task, so 'accuracy' is not a very meaningful metric here;
# the MSE loss is what matters
model.compile(loss="mean_squared_error", optimizer = adam, metrics = ['accuracy'])
In [38]:
# Save the model with the lowest validation loss
checkpointer = ModelCheckpoint(filepath = "weights.hdf5", verbose = 1, save_best_only = True)
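EarlyStopping and ReduceLROnPlateau were also imported at the top but never used; they could be wired in alongside the checkpoint to stop a stalled run and decay the learning rate automatically. An illustrative sketch (the patience and factor values are arbitrary assumptions, not tuned):

# Illustrative additional callbacks (untuned values)
earlystop = EarlyStopping(monitor='val_loss', patience=10, verbose=1)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, verbose=1)
# then pass callbacks=[checkpointer, earlystop, reduce_lr] to model.fit(...)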
In [39]:
history = model.fit(X_train, y_train, batch_size = 256, epochs= 100, validation_split = 0.05, callbacks=[checkpointer])
Epoch 1/100
22/22 [==============================] - ETA: 0s - loss: 347.7941 - accuracy: 0.3780 
Epoch 00001: val_loss improved from inf to 2080.01465, saving model to weights.hdf5
22/22 [==============================] - 391s 18s/step - loss: 347.7941 - accuracy: 0.3780 - val_loss: 2080.0146 - val_accuracy: 0.6747
Epoch 2/100
22/22 [==============================] - ETA: 0s - loss: 133.9629 - accuracy: 0.6251 
Epoch 00002: val_loss improved from 2080.01465 to 1629.81067, saving model to weights.hdf5
22/22 [==============================] - 438s 20s/step - loss: 133.9629 - accuracy: 0.6251 - val_loss: 1629.8107 - val_accuracy: 0.6747
Epoch 3/100
22/22 [==============================] - ETA: 0s - loss: 88.8106 - accuracy: 0.6069 
Epoch 00003: val_loss improved from 1629.81067 to 1261.41711, saving model to weights.hdf5
22/22 [==============================] - 479s 22s/step - loss: 88.8106 - accuracy: 0.6069 - val_loss: 1261.4171 - val_accuracy: 0.6747
Epoch 4/100
22/22 [==============================] - ETA: 0s - loss: 63.3095 - accuracy: 0.5981 
Epoch 00004: val_loss improved from 1261.41711 to 1091.34131, saving model to weights.hdf5
22/22 [==============================] - 494s 22s/step - loss: 63.3095 - accuracy: 0.5981 - val_loss: 1091.3413 - val_accuracy: 0.6747
Epoch 5/100
22/22 [==============================] - ETA: 0s - loss: 49.8129 - accuracy: 0.6016 
Epoch 00005: val_loss improved from 1091.34131 to 857.86877, saving model to weights.hdf5
22/22 [==============================] - 470s 21s/step - loss: 49.8129 - accuracy: 0.6016 - val_loss: 857.8688 - val_accuracy: 0.6747
Epoch 6/100
22/22 [==============================] - ETA: 0s - loss: 44.2106 - accuracy: 0.6039 
Epoch 00006: val_loss did not improve from 857.86877
22/22 [==============================] - 453s 21s/step - loss: 44.2106 - accuracy: 0.6039 - val_loss: 875.6435 - val_accuracy: 0.6747
Epoch 7/100
22/22 [==============================] - ETA: 0s - loss: 38.5724 - accuracy: 0.6072 
Epoch 00007: val_loss improved from 857.86877 to 719.25934, saving model to weights.hdf5
22/22 [==============================] - 463s 21s/step - loss: 38.5724 - accuracy: 0.6072 - val_loss: 719.2593 - val_accuracy: 0.6747
Epoch 8/100
22/22 [==============================] - ETA: 0s - loss: 31.5788 - accuracy: 0.6176 
Epoch 00008: val_loss did not improve from 719.25934
22/22 [==============================] - 447s 20s/step - loss: 31.5788 - accuracy: 0.6176 - val_loss: 726.7003 - val_accuracy: 0.6747
Epoch 9/100
22/22 [==============================] - ETA: 0s - loss: 35.5492 - accuracy: 0.6274 
Epoch 00009: val_loss improved from 719.25934 to 504.68286, saving model to weights.hdf5
22/22 [==============================] - 451s 20s/step - loss: 35.5492 - accuracy: 0.6274 - val_loss: 504.6829 - val_accuracy: 0.6747
Epoch 10/100
 5/22 [=====>........................] - ETA: 5:01 - loss: 39.5517 - accuracy: 0.6109
KeyboardInterrupt: training was interrupted manually during epoch 10 (the full framework traceback is omitted here).
In [ ]:
# Save trained model
model_json = model.to_json()
with open('KeyPointDetector.json', 'w') as json_file:
        json_file.write(model_json)

Model Evaluation

In [40]:
# Training on this laptop is very slow; finishing the full run would take far too long.
# Instead of training from scratch, load the previously trained model weights.
with open('KeyPointDetector.json', 'r') as json_file:
    json_SavedModel = json_file.read()
model = tf.keras.models.model_from_json(json_SavedModel)
model.load_weights('weights.hdf5')
model.compile(loss="mean_squared_error", optimizer = adam, metrics = ['accuracy'])
In [41]:
# Evaluate trained model
result = model.evaluate(X_test,y_test)
print("Accuracy : {}".format(result[1]))
21/21 [==============================] - 5s 257ms/step - loss: 476.0299 - accuracy: 0.6900
Accuracy : 0.6900311708450317
In [42]:
# Make prediction using the testing dataset
df_predict = model.predict(X_test)
In [43]:
# Import libraries
from sklearn.metrics import mean_squared_error
from math import sqrt

# Compute and print the RMSE between predictions and ground truth
rms = sqrt(mean_squared_error(y_test, df_predict))
print("RMSE value : {}".format(rms))
RMSE value : 21.818107518156292
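A single aggregate RMSE hides which key points are hardest to localize; a per-coordinate breakdown is straightforward with the arrays already in scope (an optional sketch):

# Optional: RMSE per coordinate, to see which key points are hardest
per_coord_rmse = np.sqrt(((y_test - df_predict) ** 2).mean(axis=0))
for name, err in zip(columns, per_coord_rmse):
    print('{}: {:.2f}'.format(name, err))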
In [44]:
# Convert the predicted values into a dataframe
df_predict = pd.DataFrame(df_predict, columns = columns)
df_predict.head()
Out[44]:
left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y left_eye_inner_corner_x left_eye_inner_corner_y left_eye_outer_corner_x left_eye_outer_corner_y right_eye_inner_corner_x right_eye_inner_corner_y ... nose_tip_x nose_tip_y mouth_left_corner_x mouth_left_corner_y mouth_right_corner_x mouth_right_corner_y mouth_center_top_lip_x mouth_center_top_lip_y mouth_center_bottom_lip_x mouth_center_bottom_lip_y
0 22.778009 28.818249 52.856094 29.326164 28.165041 29.808807 17.324516 29.893482 47.958397 29.761343 ... 38.798412 43.810204 25.850697 59.173306 49.213085 58.788628 38.003124 55.552040 37.594334 65.537857
1 22.509260 28.501558 52.419891 28.984007 27.981737 29.490114 17.245420 29.668568 47.613380 29.395571 ... 38.465717 43.369198 25.675537 58.526833 48.818245 58.200172 37.682991 55.026527 37.400166 64.956757
2 22.428795 28.491047 52.655380 28.969599 27.918726 29.507660 16.980984 29.697350 47.646870 29.427084 ... 38.576267 43.257053 25.575991 58.691360 48.943867 58.236973 37.665890 54.949150 37.336044 65.125656
3 22.678215 28.647377 52.625305 29.127613 28.127213 29.641539 17.355888 29.836592 47.707329 29.529728 ... 38.675137 43.555832 25.798347 58.779263 49.020725 58.477512 37.842480 55.258682 37.500546 65.259003
4 22.612787 28.692408 52.836418 29.142765 28.130503 29.621716 17.300056 29.803293 47.931747 29.574932 ... 38.747307 43.543362 25.778595 58.861271 49.183014 58.555443 37.934826 55.312286 37.624649 65.364517

5 rows Γ— 30 columns

In [46]:
# Plot the test images and their predicted keypoints
fig = plt.figure(figsize=(20, 20))

for i in range(8):
    ax = fig.add_subplot(4, 2, i + 1)
    # Use squeeze to convert the image shape from (96, 96, 1) to (96, 96)
    plt.imshow(X_test[i].squeeze(), cmap='gray')
    for j in range(1, 31, 2):
        plt.plot(df_predict.loc[i][j-1], df_predict.loc[i][j], 'rx')
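To judge the fit visually, it also helps to overlay the ground-truth points next to the predictions. An optional sketch (green circles are the labels from y_test, red crosses the predictions):

# Optional: compare predictions (red x) against ground truth (green o)
fig = plt.figure(figsize=(20, 20))
for i in range(8):
    ax = fig.add_subplot(4, 2, i + 1)
    plt.imshow(X_test[i].squeeze(), cmap='gray')
    for j in range(1, 31, 2):
        plt.plot(df_predict.loc[i][j-1], df_predict.loc[i][j], 'rx')
        plt.plot(y_test[i][j-1], y_test[i][j], 'go')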

Conclusion

The model architecture was built properly, but the model achieved a root mean squared error of 21.81, most likely because training was interrupted after too few epochs. Some predicted points do not align well with the face, while others fit reasonably well; image augmentation likely contributed to the points that do fit. Augmentation was needed to keep the network from simply memorizing key-point positions. The model can be further improved by training for more epochs.