Case Study: Scene Classification and Grad-CAM Visualization¶

Introduction¶

In this project, a deep learning model based on Convolutional Neural Networks (CNNs) and residual blocks is trained to detect the type of scenery in an image. Such a model could be used in practice to classify scenery in satellite images. In addition, this project covers the use of a technique known as Grad-CAM to observe and explain how the model arrives at its predictions.

Microsoft AI for Earth has created the most detailed forest map of the United States using satellite imagery and AI, which could be a game changer in reducing deforestation, pests, and wildfires.

Explainable AI: Gradient-weighted Class Activation Mapping (Grad-CAM) helps visualize the regions of the input image that contributed most toward the model's prediction.
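For reference, Grad-CAM computes, for a class c, channel weights by global-average-pooling the gradients of the class score y^c with respect to the feature maps A^k of the last convolutional layer, and then forms the heatmap as a ReLU of the weighted combination of those maps:

$$\alpha_k^c = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A_{ij}^k}, \qquad L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$

The implementation further below follows this recipe: the gradients are averaged per channel, used to weight the last convolutional feature maps, and the result is collapsed into a single heatmap.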

Problem:

  • Build a machine learning model that classifies the different scene images and visualize its predictions with Grad-CAM.

Dataset:

  • The dataset contains roughly 25k images of natural scenes from around the world. The task is to identify which kind of scene each image belongs to.

  • It is a 6-class problem: buildings, forest, glacier, mountain, sea, street.

Source: Kaggle Competition

Review¶

Libraries and Data Importation¶

In [1]:
# Import the necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.applications.inception_resnet_v2 import InceptionResNetV2
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.utils import plot_model
from IPython.display import display
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint, LearningRateScheduler
import os
import PIL
In [3]:
# Check folders
os.listdir('./seg_train')
Out[3]:
['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']
In [4]:
# Check the number of images in the training and test datasets
# Create empty lists
train = []
test = []

# os.listdir returns the list of files in the folder, in this case image class names
for i in os.listdir('./seg_train'):
  train_class = os.listdir(os.path.join('seg_train', i))
  train.extend(train_class)
  test_class = os.listdir(os.path.join('seg_test', i))
  test.extend(test_class)

# Show the number of train and test images
print('Number of the train images: {}\nNumber of test images: {}'.format(len(train), len(test)))
Number of the train images: 14034
Number of test images: 2762

Data Exploration and Visualization¶

Train dataset¶

In [9]:
# Visualize the images in the train dataset
fig, axs = plt.subplots(6,5, figsize=(32,32))

# Define count
count = 0

# Loop over each class folder
for i in os.listdir('./seg_train'):
    
  # Get the list of images in the particular class
  train_class = os.listdir(os.path.join('seg_train',i))
    
  # Plot 5 images per class
  for j in range(5):
        img = os.path.join('seg_train', i, train_class[j])
        img = PIL.Image.open(img)
        axs[count][j].imshow(img)
        axs[count][j].set_title(i, fontsize = 30)

  count+=1 
    
fig.tight_layout()
In [10]:
# Create empty list for train dataset
No_images_per_class = []
Class_name = []

# Check the number of images in each class in the training dataset
for i in os.listdir('./seg_train'):
  train_class = os.listdir(os.path.join('seg_train', i))
  No_images_per_class.append(len(train_class))
  Class_name.append(i)
  print('Number of images in {} = {} \n'.format(i, len(train_class)))
Number of images in buildings = 2191 

Number of images in forest = 2271 

Number of images in glacier = 2404 

Number of images in mountain = 2512 

Number of images in sea = 2274 

Number of images in street = 2382 

In [12]:
# Check list in train dataset
Class_name
Out[12]:
['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']
In [13]:
# Check the numbers in each class in train dataset
No_images_per_class
Out[13]:
[2191, 2271, 2404, 2512, 2274, 2382]
In [14]:
# Plot pie chart for train dataset
fig1, ax1 = plt.subplots()
ax1.pie(No_images_per_class, labels = Class_name, autopct = '%1.1f%%')
plt.show()

Test dataset¶

In [15]:
# Visualize the images in the test dataset
fig, axs = plt.subplots(6,5, figsize=(32,32))

# Define count
count = 0

# Loop over each class folder
for i in os.listdir('./seg_test'):
    
  # Get the list of images in the particular class
  test_class = os.listdir(os.path.join('seg_test', i))
    
  # Plot 5 images per class
  for j in range(5):
        img = os.path.join('seg_test', i, test_class[j])
        img = PIL.Image.open(img)
        axs[count][j].imshow(img)
        axs[count][j].set_title(i, fontsize = 30)

  count+=1 
    
fig.tight_layout()
In [17]:
# Create empty list for test data
No_images_per_class = []
Class_name = []

# Check the number of images in each class in the test dataset
for i in os.listdir('./seg_test'):
  test_class = os.listdir(os.path.join('seg_test', i))
  No_images_per_class.append(len(test_class))
  Class_name.append(i)
  print('Number of images in {} = {} \n'.format(i, len(test_class)))
Number of images in buildings = 200 

Number of images in forest = 474 

Number of images in glacier = 553 

Number of images in mountain = 525 

Number of images in sea = 510 

Number of images in street = 500 

In [18]:
# Check list for test dataset
Class_name
Out[18]:
['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']
In [19]:
# Check the numbers in each class in test dataset
No_images_per_class
Out[19]:
[200, 474, 553, 525, 510, 500]
In [20]:
# Plot pie chart for test data
fig1, ax1 = plt.subplots()
ax1.pie(No_images_per_class, labels = Class_name, autopct = '%1.1f%%')
plt.show()

Data Augmentation And Generator¶

In [22]:
# Create run-time augmentation for the training and test datasets
# For the training data generator, we add rescaling, zooming, horizontal flips, and a 15% validation split
train_datagen = ImageDataGenerator(
                rescale = 1./255,
                zoom_range = 0.2,
                validation_split = 0.15,
                horizontal_flip = True)

# For the test data generator, we only rescale (normalize) the data.
test_datagen = ImageDataGenerator(rescale=1./255)
In [23]:
# Creating datagenerator for training, validation and test dataset.

train_generator = train_datagen.flow_from_directory(
        'seg_train',
        target_size=(256, 256),
        batch_size=32,
        class_mode='categorical',
        subset ='training')

validation_generator = train_datagen.flow_from_directory(
        'seg_train',
        target_size=(256, 256),
        batch_size=32,
        class_mode='categorical',
        subset ='validation')

test_generator = test_datagen.flow_from_directory(
        'seg_test',
        target_size=(256, 256),
        batch_size=32,
        class_mode='categorical')
Found 11932 images belonging to 6 classes.
Found 2102 images belonging to 6 classes.
Found 2762 images belonging to 6 classes.
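As a quick sanity check, a single batch can be pulled from the training generator to confirm the image shape, the one-hot labels, and the alphabetical class-to-index mapping. A minimal sketch, assuming the generators above have been created:

# Inspect one augmented batch from the training generator
x_batch, y_batch = next(train_generator)
print(x_batch.shape)                  # expected: (32, 256, 256, 3), pixel values rescaled to [0, 1]
print(y_batch.shape)                  # expected: (32, 6), one-hot encoded labels
print(train_generator.class_indices)  # expected: {'buildings': 0, 'forest': 1, ..., 'street': 5}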

Residual Neural Network¶

In [24]:
def res_block(X, filter, stage):
  
  # Convolutional_block
  X_copy = X

  f1 , f2, f3 = filter
    
  # Main Path
  X = Conv2D(f1, (1,1),strides = (1,1), name ='res_'+str(stage)+'_conv_a', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = MaxPool2D((2,2))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_a')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_conv_b', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_b')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_conv_c', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_c')(X)


  # Short path
  X_copy = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_conv_copy', kernel_initializer= glorot_uniform(seed = 0))(X_copy)
  X_copy = MaxPool2D((2,2))(X_copy)
  X_copy = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_conv_copy')(X_copy)

  # ADD
  X = Add()([X,X_copy])
  X = Activation('relu')(X)

  # Identity Block 1
  X_copy = X


  # Main Path
  X = Conv2D(f1, (1,1),strides = (1,1), name ='res_'+str(stage)+'_identity_1_a', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_1_a')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_identity_1_b', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_1_b')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_identity_1_c', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_1_c')(X)

  # ADD
  X = Add()([X,X_copy])
  X = Activation('relu')(X)

  # Identity Block 2
  X_copy = X


  # Main Path
  X = Conv2D(f1, (1,1),strides = (1,1), name ='res_'+str(stage)+'_identity_2_a', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_a')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f2, kernel_size = (3,3), strides =(1,1), padding = 'same', name ='res_'+str(stage)+'_identity_2_b', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_b')(X)
  X = Activation('relu')(X) 

  X = Conv2D(f3, kernel_size = (1,1), strides =(1,1),name ='res_'+str(stage)+'_identity_2_c', kernel_initializer= glorot_uniform(seed = 0))(X)
  X = BatchNormalization(axis =3, name = 'bn_'+str(stage)+'_identity_2_c')(X)

  # ADD
  X = Add()([X,X_copy])
  X = Activation('relu')(X)

  return X
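Before assembling the full network, the block can be checked on a dummy tensor to confirm that it halves the spatial dimensions (via the pooling layers) and sets the channel count to f3. A minimal sketch; the input shape and stage number below are arbitrary and chosen only to avoid clashing with the layer names used later:

# Shape check of the residual block on a dummy input
dummy_input = Input(shape=(64, 64, 32))
dummy_output = res_block(dummy_input, filter = [16, 16, 64], stage = 9)
print(dummy_output.shape)  # expected: (None, 32, 32, 64)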
In [25]:
input_shape = (256,256,3)

# Input tensor shape
X_input = Input(input_shape)

# Zero-padding
X = ZeroPadding2D((3,3))(X_input)

# 1 - stage
X = Conv2D(64, (7,7), strides= (2,2), name = 'conv1', kernel_initializer= glorot_uniform(seed = 0))(X)
X = BatchNormalization(axis =3, name = 'bn_conv1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides= (2,2))(X)

# 2- stage
X = res_block(X, filter= [64,64,256], stage= 2)

# 3- stage
X = res_block(X, filter= [128,128,512], stage= 3)

# 4- stage
X = res_block(X, filter= [256,256,1024], stage= 4)

# 5- stage
X = res_block(X, filter= [512,512,2048], stage= 5)

# Average Pooling
X = AveragePooling2D((2,2), name = 'Averagea_Pooling')(X)

# Final layer
X = Flatten()(X)
X = Dense(6, activation = 'softmax', name = 'Dense_final', kernel_initializer= glorot_uniform(seed=0))(X)


model = Model( inputs= X_input, outputs = X, name = 'Resnet18')

model.summary()
Model: "Resnet18"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 256, 256, 3) 0                                            
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D)  (None, 262, 262, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 128, 128, 64) 9472        zero_padding2d[0][0]             
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 128, 128, 64) 256         conv1[0][0]                      
__________________________________________________________________________________________________
activation (Activation)         (None, 128, 128, 64) 0           bn_conv1[0][0]                   
__________________________________________________________________________________________________
max_pooling2d (MaxPooling2D)    (None, 63, 63, 64)   0           activation[0][0]                 
__________________________________________________________________________________________________
res_2_conv_a (Conv2D)           (None, 63, 63, 64)   4160        max_pooling2d[0][0]              
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 31, 31, 64)   0           res_2_conv_a[0][0]               
__________________________________________________________________________________________________
bn_2_conv_a (BatchNormalization (None, 31, 31, 64)   256         max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 31, 31, 64)   0           bn_2_conv_a[0][0]                
__________________________________________________________________________________________________
res_2_conv_b (Conv2D)           (None, 31, 31, 64)   36928       activation_1[0][0]               
__________________________________________________________________________________________________
bn_2_conv_b (BatchNormalization (None, 31, 31, 64)   256         res_2_conv_b[0][0]               
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 31, 31, 64)   0           bn_2_conv_b[0][0]                
__________________________________________________________________________________________________
res_2_conv_copy (Conv2D)        (None, 63, 63, 256)  16640       max_pooling2d[0][0]              
__________________________________________________________________________________________________
res_2_conv_c (Conv2D)           (None, 31, 31, 256)  16640       activation_2[0][0]               
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 31, 31, 256)  0           res_2_conv_copy[0][0]            
__________________________________________________________________________________________________
bn_2_conv_c (BatchNormalization (None, 31, 31, 256)  1024        res_2_conv_c[0][0]               
__________________________________________________________________________________________________
bn_2_conv_copy (BatchNormalizat (None, 31, 31, 256)  1024        max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
add (Add)                       (None, 31, 31, 256)  0           bn_2_conv_c[0][0]                
                                                                 bn_2_conv_copy[0][0]             
__________________________________________________________________________________________________
activation_3 (Activation)       (None, 31, 31, 256)  0           add[0][0]                        
__________________________________________________________________________________________________
res_2_identity_1_a (Conv2D)     (None, 31, 31, 64)   16448       activation_3[0][0]               
__________________________________________________________________________________________________
bn_2_identity_1_a (BatchNormali (None, 31, 31, 64)   256         res_2_identity_1_a[0][0]         
__________________________________________________________________________________________________
activation_4 (Activation)       (None, 31, 31, 64)   0           bn_2_identity_1_a[0][0]          
__________________________________________________________________________________________________
res_2_identity_1_b (Conv2D)     (None, 31, 31, 64)   36928       activation_4[0][0]               
__________________________________________________________________________________________________
bn_2_identity_1_b (BatchNormali (None, 31, 31, 64)   256         res_2_identity_1_b[0][0]         
__________________________________________________________________________________________________
activation_5 (Activation)       (None, 31, 31, 64)   0           bn_2_identity_1_b[0][0]          
__________________________________________________________________________________________________
res_2_identity_1_c (Conv2D)     (None, 31, 31, 256)  16640       activation_5[0][0]               
__________________________________________________________________________________________________
bn_2_identity_1_c (BatchNormali (None, 31, 31, 256)  1024        res_2_identity_1_c[0][0]         
__________________________________________________________________________________________________
add_1 (Add)                     (None, 31, 31, 256)  0           bn_2_identity_1_c[0][0]          
                                                                 activation_3[0][0]               
__________________________________________________________________________________________________
activation_6 (Activation)       (None, 31, 31, 256)  0           add_1[0][0]                      
__________________________________________________________________________________________________
res_2_identity_2_a (Conv2D)     (None, 31, 31, 64)   16448       activation_6[0][0]               
__________________________________________________________________________________________________
bn_2_identity_2_a (BatchNormali (None, 31, 31, 64)   256         res_2_identity_2_a[0][0]         
__________________________________________________________________________________________________
activation_7 (Activation)       (None, 31, 31, 64)   0           bn_2_identity_2_a[0][0]          
__________________________________________________________________________________________________
res_2_identity_2_b (Conv2D)     (None, 31, 31, 64)   36928       activation_7[0][0]               
__________________________________________________________________________________________________
bn_2_identity_2_b (BatchNormali (None, 31, 31, 64)   256         res_2_identity_2_b[0][0]         
__________________________________________________________________________________________________
activation_8 (Activation)       (None, 31, 31, 64)   0           bn_2_identity_2_b[0][0]          
__________________________________________________________________________________________________
res_2_identity_2_c (Conv2D)     (None, 31, 31, 256)  16640       activation_8[0][0]               
__________________________________________________________________________________________________
bn_2_identity_2_c (BatchNormali (None, 31, 31, 256)  1024        res_2_identity_2_c[0][0]         
__________________________________________________________________________________________________
add_2 (Add)                     (None, 31, 31, 256)  0           bn_2_identity_2_c[0][0]          
                                                                 activation_6[0][0]               
__________________________________________________________________________________________________
activation_9 (Activation)       (None, 31, 31, 256)  0           add_2[0][0]                      
__________________________________________________________________________________________________
res_3_conv_a (Conv2D)           (None, 31, 31, 128)  32896       activation_9[0][0]               
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 15, 15, 128)  0           res_3_conv_a[0][0]               
__________________________________________________________________________________________________
bn_3_conv_a (BatchNormalization (None, 15, 15, 128)  512         max_pooling2d_3[0][0]            
__________________________________________________________________________________________________
activation_10 (Activation)      (None, 15, 15, 128)  0           bn_3_conv_a[0][0]                
__________________________________________________________________________________________________
res_3_conv_b (Conv2D)           (None, 15, 15, 128)  147584      activation_10[0][0]              
__________________________________________________________________________________________________
bn_3_conv_b (BatchNormalization (None, 15, 15, 128)  512         res_3_conv_b[0][0]               
__________________________________________________________________________________________________
activation_11 (Activation)      (None, 15, 15, 128)  0           bn_3_conv_b[0][0]                
__________________________________________________________________________________________________
res_3_conv_copy (Conv2D)        (None, 31, 31, 512)  131584      activation_9[0][0]               
__________________________________________________________________________________________________
res_3_conv_c (Conv2D)           (None, 15, 15, 512)  66048       activation_11[0][0]              
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 15, 15, 512)  0           res_3_conv_copy[0][0]            
__________________________________________________________________________________________________
bn_3_conv_c (BatchNormalization (None, 15, 15, 512)  2048        res_3_conv_c[0][0]               
__________________________________________________________________________________________________
bn_3_conv_copy (BatchNormalizat (None, 15, 15, 512)  2048        max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
add_3 (Add)                     (None, 15, 15, 512)  0           bn_3_conv_c[0][0]                
                                                                 bn_3_conv_copy[0][0]             
__________________________________________________________________________________________________
activation_12 (Activation)      (None, 15, 15, 512)  0           add_3[0][0]                      
__________________________________________________________________________________________________
res_3_identity_1_a (Conv2D)     (None, 15, 15, 128)  65664       activation_12[0][0]              
__________________________________________________________________________________________________
bn_3_identity_1_a (BatchNormali (None, 15, 15, 128)  512         res_3_identity_1_a[0][0]         
__________________________________________________________________________________________________
activation_13 (Activation)      (None, 15, 15, 128)  0           bn_3_identity_1_a[0][0]          
__________________________________________________________________________________________________
res_3_identity_1_b (Conv2D)     (None, 15, 15, 128)  147584      activation_13[0][0]              
__________________________________________________________________________________________________
bn_3_identity_1_b (BatchNormali (None, 15, 15, 128)  512         res_3_identity_1_b[0][0]         
__________________________________________________________________________________________________
activation_14 (Activation)      (None, 15, 15, 128)  0           bn_3_identity_1_b[0][0]          
__________________________________________________________________________________________________
res_3_identity_1_c (Conv2D)     (None, 15, 15, 512)  66048       activation_14[0][0]              
__________________________________________________________________________________________________
bn_3_identity_1_c (BatchNormali (None, 15, 15, 512)  2048        res_3_identity_1_c[0][0]         
__________________________________________________________________________________________________
add_4 (Add)                     (None, 15, 15, 512)  0           bn_3_identity_1_c[0][0]          
                                                                 activation_12[0][0]              
__________________________________________________________________________________________________
activation_15 (Activation)      (None, 15, 15, 512)  0           add_4[0][0]                      
__________________________________________________________________________________________________
res_3_identity_2_a (Conv2D)     (None, 15, 15, 128)  65664       activation_15[0][0]              
__________________________________________________________________________________________________
bn_3_identity_2_a (BatchNormali (None, 15, 15, 128)  512         res_3_identity_2_a[0][0]         
__________________________________________________________________________________________________
activation_16 (Activation)      (None, 15, 15, 128)  0           bn_3_identity_2_a[0][0]          
__________________________________________________________________________________________________
res_3_identity_2_b (Conv2D)     (None, 15, 15, 128)  147584      activation_16[0][0]              
__________________________________________________________________________________________________
bn_3_identity_2_b (BatchNormali (None, 15, 15, 128)  512         res_3_identity_2_b[0][0]         
__________________________________________________________________________________________________
activation_17 (Activation)      (None, 15, 15, 128)  0           bn_3_identity_2_b[0][0]          
__________________________________________________________________________________________________
res_3_identity_2_c (Conv2D)     (None, 15, 15, 512)  66048       activation_17[0][0]              
__________________________________________________________________________________________________
bn_3_identity_2_c (BatchNormali (None, 15, 15, 512)  2048        res_3_identity_2_c[0][0]         
__________________________________________________________________________________________________
add_5 (Add)                     (None, 15, 15, 512)  0           bn_3_identity_2_c[0][0]          
                                                                 activation_15[0][0]              
__________________________________________________________________________________________________
activation_18 (Activation)      (None, 15, 15, 512)  0           add_5[0][0]                      
__________________________________________________________________________________________________
res_4_conv_a (Conv2D)           (None, 15, 15, 256)  131328      activation_18[0][0]              
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 7, 7, 256)    0           res_4_conv_a[0][0]               
__________________________________________________________________________________________________
bn_4_conv_a (BatchNormalization (None, 7, 7, 256)    1024        max_pooling2d_5[0][0]            
__________________________________________________________________________________________________
activation_19 (Activation)      (None, 7, 7, 256)    0           bn_4_conv_a[0][0]                
__________________________________________________________________________________________________
res_4_conv_b (Conv2D)           (None, 7, 7, 256)    590080      activation_19[0][0]              
__________________________________________________________________________________________________
bn_4_conv_b (BatchNormalization (None, 7, 7, 256)    1024        res_4_conv_b[0][0]               
__________________________________________________________________________________________________
activation_20 (Activation)      (None, 7, 7, 256)    0           bn_4_conv_b[0][0]                
__________________________________________________________________________________________________
res_4_conv_copy (Conv2D)        (None, 15, 15, 1024) 525312      activation_18[0][0]              
__________________________________________________________________________________________________
res_4_conv_c (Conv2D)           (None, 7, 7, 1024)   263168      activation_20[0][0]              
__________________________________________________________________________________________________
max_pooling2d_6 (MaxPooling2D)  (None, 7, 7, 1024)   0           res_4_conv_copy[0][0]            
__________________________________________________________________________________________________
bn_4_conv_c (BatchNormalization (None, 7, 7, 1024)   4096        res_4_conv_c[0][0]               
__________________________________________________________________________________________________
bn_4_conv_copy (BatchNormalizat (None, 7, 7, 1024)   4096        max_pooling2d_6[0][0]            
__________________________________________________________________________________________________
add_6 (Add)                     (None, 7, 7, 1024)   0           bn_4_conv_c[0][0]                
                                                                 bn_4_conv_copy[0][0]             
__________________________________________________________________________________________________
activation_21 (Activation)      (None, 7, 7, 1024)   0           add_6[0][0]                      
__________________________________________________________________________________________________
res_4_identity_1_a (Conv2D)     (None, 7, 7, 256)    262400      activation_21[0][0]              
__________________________________________________________________________________________________
bn_4_identity_1_a (BatchNormali (None, 7, 7, 256)    1024        res_4_identity_1_a[0][0]         
__________________________________________________________________________________________________
activation_22 (Activation)      (None, 7, 7, 256)    0           bn_4_identity_1_a[0][0]          
__________________________________________________________________________________________________
res_4_identity_1_b (Conv2D)     (None, 7, 7, 256)    590080      activation_22[0][0]              
__________________________________________________________________________________________________
bn_4_identity_1_b (BatchNormali (None, 7, 7, 256)    1024        res_4_identity_1_b[0][0]         
__________________________________________________________________________________________________
activation_23 (Activation)      (None, 7, 7, 256)    0           bn_4_identity_1_b[0][0]          
__________________________________________________________________________________________________
res_4_identity_1_c (Conv2D)     (None, 7, 7, 1024)   263168      activation_23[0][0]              
__________________________________________________________________________________________________
bn_4_identity_1_c (BatchNormali (None, 7, 7, 1024)   4096        res_4_identity_1_c[0][0]         
__________________________________________________________________________________________________
add_7 (Add)                     (None, 7, 7, 1024)   0           bn_4_identity_1_c[0][0]          
                                                                 activation_21[0][0]              
__________________________________________________________________________________________________
activation_24 (Activation)      (None, 7, 7, 1024)   0           add_7[0][0]                      
__________________________________________________________________________________________________
res_4_identity_2_a (Conv2D)     (None, 7, 7, 256)    262400      activation_24[0][0]              
__________________________________________________________________________________________________
bn_4_identity_2_a (BatchNormali (None, 7, 7, 256)    1024        res_4_identity_2_a[0][0]         
__________________________________________________________________________________________________
activation_25 (Activation)      (None, 7, 7, 256)    0           bn_4_identity_2_a[0][0]          
__________________________________________________________________________________________________
res_4_identity_2_b (Conv2D)     (None, 7, 7, 256)    590080      activation_25[0][0]              
__________________________________________________________________________________________________
bn_4_identity_2_b (BatchNormali (None, 7, 7, 256)    1024        res_4_identity_2_b[0][0]         
__________________________________________________________________________________________________
activation_26 (Activation)      (None, 7, 7, 256)    0           bn_4_identity_2_b[0][0]          
__________________________________________________________________________________________________
res_4_identity_2_c (Conv2D)     (None, 7, 7, 1024)   263168      activation_26[0][0]              
__________________________________________________________________________________________________
bn_4_identity_2_c (BatchNormali (None, 7, 7, 1024)   4096        res_4_identity_2_c[0][0]         
__________________________________________________________________________________________________
add_8 (Add)                     (None, 7, 7, 1024)   0           bn_4_identity_2_c[0][0]          
                                                                 activation_24[0][0]              
__________________________________________________________________________________________________
activation_27 (Activation)      (None, 7, 7, 1024)   0           add_8[0][0]                      
__________________________________________________________________________________________________
res_5_conv_a (Conv2D)           (None, 7, 7, 512)    524800      activation_27[0][0]              
__________________________________________________________________________________________________
max_pooling2d_7 (MaxPooling2D)  (None, 3, 3, 512)    0           res_5_conv_a[0][0]               
__________________________________________________________________________________________________
bn_5_conv_a (BatchNormalization (None, 3, 3, 512)    2048        max_pooling2d_7[0][0]            
__________________________________________________________________________________________________
activation_28 (Activation)      (None, 3, 3, 512)    0           bn_5_conv_a[0][0]                
__________________________________________________________________________________________________
res_5_conv_b (Conv2D)           (None, 3, 3, 512)    2359808     activation_28[0][0]              
__________________________________________________________________________________________________
bn_5_conv_b (BatchNormalization (None, 3, 3, 512)    2048        res_5_conv_b[0][0]               
__________________________________________________________________________________________________
activation_29 (Activation)      (None, 3, 3, 512)    0           bn_5_conv_b[0][0]                
__________________________________________________________________________________________________
res_5_conv_copy (Conv2D)        (None, 7, 7, 2048)   2099200     activation_27[0][0]              
__________________________________________________________________________________________________
res_5_conv_c (Conv2D)           (None, 3, 3, 2048)   1050624     activation_29[0][0]              
__________________________________________________________________________________________________
max_pooling2d_8 (MaxPooling2D)  (None, 3, 3, 2048)   0           res_5_conv_copy[0][0]            
__________________________________________________________________________________________________
bn_5_conv_c (BatchNormalization (None, 3, 3, 2048)   8192        res_5_conv_c[0][0]               
__________________________________________________________________________________________________
bn_5_conv_copy (BatchNormalizat (None, 3, 3, 2048)   8192        max_pooling2d_8[0][0]            
__________________________________________________________________________________________________
add_9 (Add)                     (None, 3, 3, 2048)   0           bn_5_conv_c[0][0]                
                                                                 bn_5_conv_copy[0][0]             
__________________________________________________________________________________________________
activation_30 (Activation)      (None, 3, 3, 2048)   0           add_9[0][0]                      
__________________________________________________________________________________________________
res_5_identity_1_a (Conv2D)     (None, 3, 3, 512)    1049088     activation_30[0][0]              
__________________________________________________________________________________________________
bn_5_identity_1_a (BatchNormali (None, 3, 3, 512)    2048        res_5_identity_1_a[0][0]         
__________________________________________________________________________________________________
activation_31 (Activation)      (None, 3, 3, 512)    0           bn_5_identity_1_a[0][0]          
__________________________________________________________________________________________________
res_5_identity_1_b (Conv2D)     (None, 3, 3, 512)    2359808     activation_31[0][0]              
__________________________________________________________________________________________________
bn_5_identity_1_b (BatchNormali (None, 3, 3, 512)    2048        res_5_identity_1_b[0][0]         
__________________________________________________________________________________________________
activation_32 (Activation)      (None, 3, 3, 512)    0           bn_5_identity_1_b[0][0]          
__________________________________________________________________________________________________
res_5_identity_1_c (Conv2D)     (None, 3, 3, 2048)   1050624     activation_32[0][0]              
__________________________________________________________________________________________________
bn_5_identity_1_c (BatchNormali (None, 3, 3, 2048)   8192        res_5_identity_1_c[0][0]         
__________________________________________________________________________________________________
add_10 (Add)                    (None, 3, 3, 2048)   0           bn_5_identity_1_c[0][0]          
                                                                 activation_30[0][0]              
__________________________________________________________________________________________________
activation_33 (Activation)      (None, 3, 3, 2048)   0           add_10[0][0]                     
__________________________________________________________________________________________________
res_5_identity_2_a (Conv2D)     (None, 3, 3, 512)    1049088     activation_33[0][0]              
__________________________________________________________________________________________________
bn_5_identity_2_a (BatchNormali (None, 3, 3, 512)    2048        res_5_identity_2_a[0][0]         
__________________________________________________________________________________________________
activation_34 (Activation)      (None, 3, 3, 512)    0           bn_5_identity_2_a[0][0]          
__________________________________________________________________________________________________
res_5_identity_2_b (Conv2D)     (None, 3, 3, 512)    2359808     activation_34[0][0]              
__________________________________________________________________________________________________
bn_5_identity_2_b (BatchNormali (None, 3, 3, 512)    2048        res_5_identity_2_b[0][0]         
__________________________________________________________________________________________________
activation_35 (Activation)      (None, 3, 3, 512)    0           bn_5_identity_2_b[0][0]          
__________________________________________________________________________________________________
res_5_identity_2_c (Conv2D)     (None, 3, 3, 2048)   1050624     activation_35[0][0]              
__________________________________________________________________________________________________
bn_5_identity_2_c (BatchNormali (None, 3, 3, 2048)   8192        res_5_identity_2_c[0][0]         
__________________________________________________________________________________________________
add_11 (Add)                    (None, 3, 3, 2048)   0           bn_5_identity_2_c[0][0]          
                                                                 activation_33[0][0]              
__________________________________________________________________________________________________
activation_36 (Activation)      (None, 3, 3, 2048)   0           add_11[0][0]                     
__________________________________________________________________________________________________
Averagea_Pooling (AveragePoolin (None, 1, 1, 2048)   0           activation_36[0][0]              
__________________________________________________________________________________________________
flatten (Flatten)               (None, 2048)         0           Averagea_Pooling[0][0]           
__________________________________________________________________________________________________
Dense_final (Dense)             (None, 6)            12294       flatten[0][0]                    
==================================================================================================
Total params: 19,952,262
Trainable params: 19,909,894
Non-trainable params: 42,368
__________________________________________________________________________________________________

Compile And Train Deep Learning Model¶

In [26]:
# Compile the model
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
In [27]:
# Use early stopping to exit training if the validation loss does not improve after a set number of epochs (patience)
earlystopping = EarlyStopping(monitor = 'val_loss', mode = 'min', verbose=1, patience = 15)

# Save the model with the lowest validation loss
checkpointer = ModelCheckpoint(filepath = "weights.hdf5", verbose = 1, save_best_only = True)
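ReduceLROnPlateau and LearningRateScheduler are imported above but not used. If desired, a learning-rate reduction callback could be added alongside the other two; a sketch with illustrative (not tuned) settings:

# Optionally lower the learning rate when the validation loss plateaus
reduce_lr = ReduceLROnPlateau(monitor = 'val_loss', factor = 0.2, patience = 5, min_lr = 1e-6, verbose = 1)
# It would then be passed with the other callbacks, e.g. callbacks = [checkpointer, earlystopping, reduce_lr]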
In [ ]:
# A pre-trained weights file is already available, so there is no need to re-run training here
# # Fit the model
# history = model.fit_generator(train_generator, steps_per_epoch= train_generator.n // 32, epochs = 1, validation_data= validation_generator, validation_steps= validation_generator.n // 32, callbacks=[checkpointer , earlystopping])
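If retraining were needed, the deprecated fit_generator call above could be replaced by Model.fit, which accepts the generators directly. A sketch using the same settings as the commented-out call:

# Train with the non-deprecated API (same hyperparameters as the commented call above)
history = model.fit(train_generator,
                    steps_per_epoch = train_generator.n // 32,
                    epochs = 1,
                    validation_data = validation_generator,
                    validation_steps = validation_generator.n // 32,
                    callbacks = [checkpointer, earlystopping])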

Model Evaluation¶

In [28]:
# Load the model weight
model.load_weights('weights.hdf5')
In [29]:
# Evaluate the performance of the model
evaluate = model.evaluate_generator(test_generator, steps = test_generator.n // 32, verbose =1)
WARNING:tensorflow:From <ipython-input-29-7406c8fc9e76>:2: Model.evaluate_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
Please use Model.evaluate, which supports generators.
86/86 [==============================] - 452s 5s/step - loss: 0.3857 - accuracy: 0.8608
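The warning above notes that evaluate_generator is deprecated; the equivalent call with the current API would be, as a sketch:

# Evaluate with the non-deprecated API
loss, accuracy = model.evaluate(test_generator, steps = test_generator.n // 32, verbose = 1)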
In [30]:
# Assign label names to the corresponding indexes
labels = {0: 'buildings', 1: 'forest', 2: 'glacier', 3:'mountain', 4: 'sea', 5:'street'}
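The hard-coded mapping above relies on flow_from_directory assigning indices in alphabetical folder order. As a sketch, the same mapping can be derived directly from the generator created earlier, which avoids the dictionary drifting out of sync with the data:

# Build the index-to-label mapping from the generator instead of hard-coding it
labels = {index: name for name, index in train_generator.class_indices.items()}
print(labels)  # expected: {0: 'buildings', 1: 'forest', 2: 'glacier', 3: 'mountain', 4: 'sea', 5: 'street'}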
In [31]:
# Import library
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

# import cv2
# Create empty list
prediction = []
original = []
image = []

# Define count
count = 0

# load images and their predictions 
for i in os.listdir('./seg_test'):
  for item in os.listdir(os.path.join('./seg_test', i)):
    
    # Code to open the image
    img= PIL.Image.open(os.path.join('./seg_test', i, item))
    
    # Resizing the image to (256,256)
    img = img.resize((256, 256))
    
    # Appending image to the image list
    image.append(img)
    
    # Converting image to array
    img = np.asarray(img, dtype = np.float32)
    
    # Normalizing the image
    img = img / 255
    
    # Reshaping the image into a 4D array
    img = img.reshape(-1, 256, 256, 3)
    
    # Making prediction of the model
    predict = model.predict(img)
    
    # Getting the index corresponding to the highest value in the prediction
    predict = np.argmax(predict)
    
    # Appending the predicted class to the list
    prediction.append(labels[predict])
    
    # Appending original class to the list
    original.append(i)
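Predicting one image at a time works but is slow. An alternative sketch predicts in batches through a separate, non-shuffled generator (the eval_generator name is introduced here only for illustration; shuffle = False is required so the predictions line up with eval_generator.classes):

# Batch prediction over the test set with a non-shuffled generator
eval_generator = test_datagen.flow_from_directory('seg_test',
                                                  target_size = (256, 256),
                                                  batch_size = 32,
                                                  class_mode = 'categorical',
                                                  shuffle = False)
probs = model.predict(eval_generator, verbose = 1)
pred_indices = np.argmax(probs, axis = 1)
true_indices = eval_generator.classes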
In [32]:
# Get the test accuracy 
score = accuracy_score(original, prediction)

# Show test accuracy
print("Test Accuracy : {}".format(score))
Test Accuracy : 0.8573497465604635
In [33]:
# Visualize the results
import random
fig = plt.figure(figsize = (100,100))
for i in range(20):
    j = random.randint(0, len(image) - 1)
    fig.add_subplot(20, 1, i+1)
    plt.xlabel("Prediction: " + prediction[j] +"   Original: " + original[j])
    plt.imshow(image[j])
fig.tight_layout()
plt.show()
In [34]:
# Show classification report
print(classification_report(np.asarray(prediction), np.asarray(original)))
              precision    recall  f1-score   support

   buildings       0.92      0.70      0.80       261
      forest       0.97      0.96      0.96       478
     glacier       0.68      0.88      0.77       424
    mountain       0.93      0.72      0.81       678
         sea       0.87      0.93      0.90       481
      street       0.84      0.95      0.89       440

    accuracy                           0.86      2762
   macro avg       0.87      0.86      0.85      2762
weighted avg       0.87      0.86      0.86      2762

In [35]:
# Show confusion matrix
plt.figure(figsize = (20, 20))
cm = confusion_matrix(np.asarray(prediction), np.asarray(original))
sns.heatmap(cm, annot = True)
plt.show()
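For readability, the confusion-matrix axes can be labeled with the class names. A sketch reusing cm and Class_name from the cells above (with this notebook's argument order, rows correspond to the predicted class and columns to the original class):

# Confusion matrix with class names on the axes
plt.figure(figsize = (12, 10))
sns.heatmap(cm, annot = True, fmt = 'd', xticklabels = Class_name, yticklabels = Class_name)
plt.xlabel('Original class')
plt.ylabel('Predicted class')
plt.show()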

Visualize Activation Maps Through Grad-CAM¶

In [36]:
def grad_cam(img):

  # Convert the image to array of type float32
  img = np.asarray(img, dtype = np.float32)

  # Reshape the image from (256,256,3) to (1,256,256,3)
  img = img.reshape(-1, 256, 256, 3)
  img_scaled = img / 255

  # Names of the average pooling layer and the final dense layer (you can see these names in the model summary)
  classification_layers = ["Averagea_Pooling", "Dense_final"]

  # Last convolutional layer in the model
  final_conv = model.get_layer("res_5_identity_2_c")

  # Create a model with original model inputs and the last conv_layer as the output
  final_conv_model = keras.Model(model.inputs, final_conv.output)

  # Then we create the input for classification layer, which is the output of last conv layer
  # In our case, output produced by the conv layer is of the shape (1,3,3,2048) 
  # Since the classification input needs the features as input, we ignore the batch dimension

  classification_input = keras.Input(shape = final_conv.output.shape[1:])

  # We iterate through the classification layers, to get the final layer and then append 
  # the layer as the output layer to the classification model.
  temp = classification_input
  for layer in classification_layers:
      temp = model.get_layer(layer)(temp)
  classification_model = keras.Model(classification_input, temp)


  # We use gradient tape to monitor the 'final_conv_output' to retrieve the gradients
  # corresponding to the predicted class
  with tf.GradientTape() as tape:
      # Pass the image through the base model and get the feature map 
      final_conv_output = final_conv_model(img_scaled)

      # Assign gradient tape to monitor the conv_output
      tape.watch(final_conv_output)
      
      # Pass the feature map through the classification model and use argmax to get the 
      # index of the predicted class and then use the index to get the value produced by final
      # layer for that class
      prediction = classification_model(final_conv_output)

      predicted_class = tf.argmax(prediction[0][0][0])

      predicted_class_value = prediction[:,:,:,predicted_class]
  
  # Get the gradient corresponding to the predicted class based on feature map.
  # which is of shape (1,3,3,2048)
  gradient = tape.gradient(predicted_class_value, final_conv_output)

  # Since we need the filter values (2048), we reduce the other dimensions, 
  # which would result in a shape of (2048,)
  gradient_channels = tf.reduce_mean(gradient, axis=(0, 1, 2))

  # We then convert the feature map produced by the last conv layer from (1,3,3,2048) to (3,3,2048)
  final_conv_output = final_conv_output.numpy()[0]

  gradient_channels = gradient_channels.numpy()

  # We multiply the filters in the feature map produced by the final conv layer by the
  # gradient values that correspond to the predicted class. By doing this we increase the
  # values of the areas that helped in making the prediction and lower the values of the areas
  # that did not contribute towards the final prediction
  for i in range(gradient_channels.shape[-1]):
      final_conv_output[:, :, i] *= gradient_channels[i]

  # We take the mean across the channels to get the heatmap
  heatmap = np.mean(final_conv_output, axis=-1)

  # Normalizing the heat map between 0 and 1, to visualize it
  heatmap_normalized = np.maximum(heatmap, 0) / np.max(heatmap)

  # Rescaling and converting the type to int
  heatmap = np.uint8(255 * heatmap_normalized )

  # Create the colormap
  color_map = plt.cm.get_cmap('jet')

  # Keep only the RGB channels of the colormap (drop the alpha channel)
  color_map = color_map(np.arange(256))[:, :3]
  heatmap = color_map[heatmap]

  # Convert the array to image, resize the image and then convert to array
  heatmap = keras.preprocessing.image.array_to_img(heatmap)
  heatmap = heatmap.resize((256, 256))
  heatmap = np.asarray(heatmap, dtype = np.float32)

  # Add the heatmap on top of the original image
  final_img = heatmap * 0.4 + img[0]
  final_img = keras.preprocessing.image.array_to_img(final_img)

  return final_img, heatmap_normalized
In [37]:
# Visualize Grad-CAM heatmaps for a few random test images
import random
fig, axs = plt.subplots(6,3, figsize = (16,32))
count = 0
for _ in range(6):
  i = random.randint(0, len(image) - 1)
  gradcam, heatmap = grad_cam(image[i])
  axs[count][0].title.set_text("Original -" + original[i])
  axs[count][0].imshow(image[i])
  axs[count][1].title.set_text("Heatmap") 
  axs[count][1].imshow(heatmap)
  axs[count][2].title.set_text("Prediction -" + prediction[i]) 
  axs[count][2].imshow(gradcam)  
  count += 1

fig.tight_layout()
plt.show()

Conclusion¶

Looking at the classification report, the glacier class shows the highest prediction error; a likely reason is that a glacier can look very much like a mountain, a mistake even a human could make. Grad-CAM helps show how the model "thinks" by highlighting the portions of the photo it focuses on when making a prediction. The model achieved a high accuracy score, but this could be improved further by tuning the model or experimenting more with the image augmentation.