A Guide to Understanding Convolutional Neural Networks (CNNs) using Visualization

Introduction

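"How did your neural network produce this result?" This question has sent many data scientists into a tizzy. It's easy to explain how a simple neural network works, but what happens when you increase the layers 1000x in a computer vision project?

Our clients or end users require interpretability: they want to know how our model got to the final result. We can't take a pen and paper and explain how a deep neural network works. So how do we shed this "black box" image of neural networks?

By visualizing them! The clarity that comes with visualizing the different features of a neural network is unparalleled. This is especially true when we're dealing with a convolutional neural network (CNN) trained on thousands or millions of images.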

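In this article, we will look at different techniques for visualizing convolutional neural networks. We will also work on extracting insights from these visualizations to tune our CNN model.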

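Table of Contents
Why Should we use Visualization to Decode Neural Networks?
Setting up the Model Architecture
Accessing Individual Layers
Visualizing the Building Blocks of CNNs – Filters
Visualizing what a Model Expects – Activation Maximization
Visualizing what's Important in the Input – Occlusion Maps
Visualizing the Contribution of Input Features – Saliency Maps
Class Activation Maps (Gradient Weighted)
Visualizing the Process – Layerwise Output Visualization
End Notes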

Why Should we use Visualization to Decode Neural Networks?

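That's a fair question. There are a number of ways to understand how a neural network works, so why turn to the offbeat path of visualization?

Let's answer this question through an example. Consider a project where we need to classify images of animals, like snow leopards and Arabian leopards. Intuitively, we could differentiate between these animals using the image background, right?

Both animals live in starkly contrasting habitats. Most snow leopard images will have snow in the background, while most Arabian leopard images will have a sprawling desert.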

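Here's the problem: the model will start classifying snow versus desert images. So how do we make sure our model has correctly learned the features that distinguish these two leopard types? The answer lies in visualization.

Visualization helps us understand which features guide the model's decision when it classifies an image.

There are multiple ways to visualize a model, and we'll implement some of them in this article.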

Setting up the Model Architecture

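I believe the best way to learn a concept is by coding it. Hence, this is a very hands-on guide, and I'm going to dive into the Python code right away.

In this article, we will use the VGG16 architecture with weights pretrained on the ImageNet dataset. Let's first import the model into our program and understand its architecture.

We will visualize the model architecture using the 'model.summary()' function in Keras. This is a very important step before we get to the model building part: we need to make sure the input and output shapes match our problem statement, and the model summary lets us check that.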

#importing required modules
from keras.applications import VGG16
#loading the saved model
#we are using the complete architecture thus include_top=True
model = VGG16(weights='imagenet',include_top=True)
#show the summary of model
model.summary()

Below is the model summary generated by the above code:

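We now have a detailed view of the model architecture along with the number of trainable parameters in every layer. I want you to spend some time going through the output above to understand what we have at hand.

This is important when we are training only a subset of the model layers (feature extraction). We can generate the model summary and make sure that the number of non-trainable parameters matches the layers we do not want to train.

We can also use the total number of trainable parameters to check whether our GPU will be able to allocate enough memory to train the model. That's a familiar challenge for most of us working on our personal machines!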
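The per-layer counts in the summary can be checked by hand. The sketch below recomputes them from the layer configuration of VGG16 with include_top=True, without loading Keras; the helper names are illustrative. If the convolutional base is frozen for feature extraction, `conv_total` is the non-trainable count to expect. The memory figure covers the float32 weights only; training also needs memory for gradients, optimizer state and activations.

```python
# Parameter arithmetic for VGG16 (include_top=True), computed by hand
# from the layer configuration rather than by loading Keras.

def conv_params(k, c_in, c_out):
    # a k x k kernel for every (input, output) channel pair, plus one bias per filter
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # one weight per input-output pair, plus one bias per output unit
    return n_in * n_out + n_out

# (input channels, output channels) of the 13 convolutional layers, block by block
conv_channels = [
    (3, 64), (64, 64),                    # block1
    (64, 128), (128, 128),                # block2
    (128, 256), (256, 256), (256, 256),   # block3
    (256, 512), (512, 512), (512, 512),   # block4
    (512, 512), (512, 512), (512, 512),   # block5
]
conv_total = sum(conv_params(3, c_in, c_out) for c_in, c_out in conv_channels)

# fc1 sees the flattened 7x7x512 output of block5_pool
dense_total = (dense_params(7 * 7 * 512, 4096)   # fc1
               + dense_params(4096, 4096)        # fc2
               + dense_params(4096, 1000))       # predictions

total = conv_total + dense_total
print(conv_params(3, 512, 512))      # block5_conv1
print(conv_total, dense_total, total)
# each float32 weight takes 4 bytes
print(f"{total * 4 / 1e6:.0f} MB of float32 weights")
```

Almost 90% of the parameters sit in the three fully connected layers, which is why dropping the top (include_top=False) shrinks the model so much.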

Accessing Individual Layers

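Now that we know how to get the overall architecture of a model, let's dive deeper and explore individual layers.

It's actually fairly easy to access the individual layers of a Keras model and extract the parameters associated with each layer. This includes the layer weights and other information, like the number of filters.

Now, we will create dictionaries that map each layer name to its corresponding characteristics and layer weights: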

#creating a mapping of layer name to layer details
#we will create a dictionary layers_info which maps a layer name to its characteristics
layers_info = {}
for i in model.layers:
    layers_info[i.name] = i.get_config()

#here the layer_weights dictionary will map every layer_name to its corresponding weights
layer_weights = {}
for i in model.layers:
    layer_weights[i.name] = i.get_weights()

print(layers_info['block5_conv1'])

The above code gives the following output, which consists of the different parameters of the block5_conv1 layer:

{'name': 'block5_conv1',
 'trainable': True,
 'filters': 512,
 'kernel_size': (3, 3),
 'strides': (1, 1),
 'padding': 'same',
 'data_format': 'channels_last',
 'dilation_rate': (1, 1),
 'activation': 'relu',
 'use_bias': True,
 'kernel_initializer': {'class_name': 'VarianceScaling',
  'config': {'scale': 1.0,
   'mode': 'fan_avg',
   'distribution': 'uniform',
   'seed': None}},
 'bias_initializer': {'class_name': 'Zeros', 'config': {}},
 'kernel_regularizer': None,
 'bias_regularizer': None,
 'activity_regularizer': None,
 'kernel_constraint': None,
 'bias_constraint': None}

Did you notice that the 'trainable' parameter of our layer 'block5_conv1' is True? This means we can update the layer weights by training the model further.

Visualizing the Building Blocks of CNNs – Filters

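Filters are the basic building blocks of any convolutional neural network. Different filters extract different kinds of features from an image. The following GIF illustrates this really well: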


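As you can see, every convolutional layer is composed of multiple filters. Check the output we generated in the previous section: the 'block5_conv1' layer consists of 512 filters. Makes sense, right?

Let's plot the first filter of the first convolutional layer of every VGG16 block: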

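import matplotlib.pyplot as plt
layers = model.layers
#index of the first convolutional layer in each VGG16 block:
#block1_conv1, block2_conv1, block3_conv1, block4_conv1, block5_conv1
layer_ids = [1,4,7,11,15]
#plot the filters
fig,ax = plt.subplots(nrows=1,ncols=5)
for i in range(5):
    #get_weights()[0] is the kernel, shaped (3, 3, input_channels, filters);
    #show the first input channel of the first filter as a 3x3 grayscale image
    kernel = layers[layer_ids[i]].get_weights()[0]
    ax[i].imshow(kernel[:, :, 0, 0],cmap='gray')
    ax[i].set_title('block'+str(i+1))
    ax[i].set_xticks([])
    ax[i].set_yticks([])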


We can see the filters of the different layers in the above output. All the filters have the same shape, since VGG16 uses only 3×3 filters.
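Filter weights can be negative, so a common trick (an addition here, not part of the original code) is to min-max rescale a filter to [0, 1] before plotting it. Filters of block1_conv1 have exactly 3 input channels, so they can even be shown as tiny RGB images. The sketch assumes the Keras Conv2D kernel layout (height, width, input channels, filters) and uses a random array in place of real VGG16 weights; `filter_to_image` is a hypothetical helper.

```python
import numpy as np

def filter_to_image(kernel, filter_index):
    """Return one filter of a Conv2D-style kernel as an image in [0, 1].

    `kernel` is assumed to have the layout (height, width, in_channels, filters),
    which is what get_weights()[0] returns for a Keras Conv2D layer.
    """
    f = kernel[:, :, :, filter_index].astype(np.float64)
    # min-max rescale: raw weights can be negative, imshow expects [0, 1] for floats
    f = (f - f.min()) / (f.max() - f.min() + 1e-8)
    # a 3-channel filter (e.g. block1_conv1) can be shown directly as RGB;
    # otherwise fall back to the first input channel as a grayscale image
    return f if f.shape[-1] == 3 else f[:, :, 0]

# a random stand-in with the shape of block1_conv1's kernel, (3, 3, 3, 64)
rng = np.random.default_rng(0)
fake_kernel = rng.normal(size=(3, 3, 3, 64))
img = filter_to_image(fake_kernel, 0)
print(img.shape, img.min(), img.max())
```

With the real model, something like `plt.imshow(filter_to_image(model.layers[1].get_weights()[0], 0))` would show the first block1_conv1 filter in color.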

Visualizing what a Model Expects – Activation Maximization

Let's use the image below to understand the concept of activation maximization:


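Which features do you think would be important for a model to identify an elephant? Some major ones I can think of:

Tusks
Trunk
Ears

That's how we instinctively identify elephants, right? Now, let's see what we get when we optimize a random image to be classified as an elephant.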

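We know that each convolutional layer in a CNN looks for particular patterns in the output of the previous layer. The activation of a convolutional layer is maximized when its input contains the pattern it is looking for.
In the activation maximization technique, we start from a random input image and repeatedly update it so that the activation maximization loss, which decreases as the chosen activation increases, is minimized.

How do we do this? We calculate the gradient of the activation loss with respect to the input, and then update the input accordingly: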
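Written as a formula, a sketch of the core idea: let $a(x)$ be the activation of the chosen unit for input $x$ and $\eta$ a step size, and define the loss as the negative activation. Each step then moves the input along the gradient of the activation. Libraries such as keras-vis add regularization terms (for example total variation) to this loss, which are left out here.

$$\mathcal{L}(x) = -a(x), \qquad x \leftarrow x - \eta \, \frac{\partial \mathcal{L}}{\partial x} = x + \eta \, \frac{\partial a}{\partial x}$$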

Here's the code for doing this:

#importing the required modules
from vis.visualization import visualize_activation
from vis.utils import utils
from keras import activations
from keras import applications
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (18,6)
#creating a VGG16 model using fully connected layers also because then we can 
#visualize the patterns for individual category
from keras.applications import VGG16
model = VGG16(weights='imagenet',include_top=True)

#finding out the layer index using layer name
#the find_layer_idx function accepts the model and name of layer as parameters and return the index of respective layer
layer_idx = utils.find_layer_idx(model,'predictions')
#changing the activation of the layer to linear
model.layers[layer_idx].activation = activations.linear
#applying modifications to the model
model = utils.apply_modifications(model)
#Indian elephant
img3 = visualize_activation(model,layer_idx,filter_indices=385,max_iter=5000,verbose=True)
plt.imshow(img3)

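Starting from a random input and optimizing it for the class corresponding to the Indian elephant (class index 385), our model generates the following output: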

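From the above image, we can observe that the model expects structures like tusks, large eyes, and a trunk. This information is very important for checking the sanity of our dataset. For example, suppose the model focused on features like trees or long grass in the background, because Indian elephants are usually found in such habitats.

Using activation maximization, we would then discover that our dataset is probably not sufficient for the task, and that we need to add images of elephants in different habitats to our training set.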
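To see what the optimization loop behind visualize_activation does, here is a minimal NumPy sketch on a toy model instead of VGG16. The "unit" is hypothetical: its activation is the dot product of a 5×5 input with a vertical-bar template, much like a filter responding to its pattern. The L2 penalty is an assumed regularizer that stops the input from growing without bound; with it, the optimum is `template / (2 * l2)`, so the random starting image turns into the pattern the unit is looking for.

```python
import numpy as np

# Hypothetical toy unit: its activation is the dot product of the input with
# a 5x5 template (a vertical bar). It stands in for a VGG16 class logit.
template = np.zeros((5, 5))
template[:, 2] = 1.0

def activation(x):
    return float(np.sum(template * x))

def activation_maximization(steps=200, lr=0.1, l2=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=template.shape)   # start from a random image
    for _ in range(steps):
        # loss = -activation(x) + l2 * sum(x**2); the L2 penalty is an assumed
        # regularizer that keeps the input from growing without bound
        grad = -template + 2 * l2 * x      # gradient of the loss w.r.t. x
        x = x - lr * grad                  # gradient descent on the loss
    return x

x_opt = activation_maximization()
print(np.round(x_opt, 2))
print(activation(x_opt))
```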

Visualizing what’s Important in the Input- Occlusion Maps

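Activation maximization is used to visualize what the model expects in an image. Occlusion maps, on the other hand, help us find out which part of the image is important for the model.

To understand how occlusion maps work, let's consider a model that classifies cars by their manufacturer, like Toyota, Audi, etc.: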


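Can you figure out which company manufactured the above car? Probably not, because the part of the image where the company logo is placed has been occluded. That part of the image is clearly important for our classification purposes.

Similarly, to generate an occlusion map, we occlude some part of the image and then calculate the probability that it belongs to a class. If the probability decreases, the occluded part of the image is important for that class. Otherwise, it is not.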
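The procedure is easy to try on a toy problem before running it on VGG16. The sketch below is a simplified version of the idea: the grey patch covers exactly one grid cell (the iter_occlusion function later in this section uses a larger patch centred on each cell). The classifier is hypothetical: it scores a "logo present" class by the brightness of the top-left corner, so occluding that corner is the only thing that lowers the score.

```python
import numpy as np

def occlusion_map(predict, image, class_idx, patch=4, fill=0.5):
    """Slide a grey square over `image` and record the class score each time.

    `predict` is any function mapping an (H, W) image to a vector of class
    probabilities; low values in the returned map mark important regions.
    """
    h, w = image.shape
    heatmap = np.zeros((h, w))
    for row in range(0, h, patch):
        for col in range(0, w, patch):
            occluded = image.copy()
            occluded[row:row + patch, col:col + patch] = fill
            # heatmap is indexed [row, col], like the image itself
            heatmap[row:row + patch, col:col + patch] = predict(occluded)[class_idx]
    return heatmap

# Hypothetical toy classifier: class 1 ("logo present") is scored by the
# mean brightness of the top-left 4x4 corner, where the "logo" sits.
def toy_predict(img):
    p = img[:4, :4].mean()
    return np.array([1 - p, p])

image = np.zeros((8, 8))
image[:4, :4] = 1.0          # the bright "logo"
heatmap = occlusion_map(toy_predict, image, class_idx=1)
print(heatmap)
```

The resulting map is 0.5 over the logo cell and 1.0 everywhere else, so the low-probability region points at the part of the image the classifier relies on.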

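Here, we will assign the class probability as the pixel value for every part of the image, and then normalize these values to generate a heatmap. We start with a helper that produces the occluded images: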

import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt
def iter_occlusion(image, size=8):
    """Yield (x, y, occluded_image) for every size x size cell of the image.

    x is the column offset and y the row offset of the cell; the grey (0.5)
    occluding patch spans size*5 pixels and is centred on that cell.
    """

    occlusion = np.full((size * 5, size * 5, 1), [0.5], np.float32)
    occlusion_center = np.full((size, size, 1), [0.5], np.float32)
    occlusion_padding = size * 2

    # print('padding...')
    image_padded = np.pad(image, ( \
                        (occlusion_padding, occlusion_padding), (occlusion_padding, occlusion_padding), (0, 0) \
                        ), 'constant', constant_values = 0.0)

    for y in range(occlusion_padding, image.shape[0] + occlusion_padding, size):

        for x in range(occlusion_padding, image.shape[1] + occlusion_padding, size):
            tmp = image_padded.copy()

            tmp[y - occlusion_padding:y + occlusion_center.shape[0] + occlusion_padding, \
                x - occlusion_padding:x + occlusion_center.shape[1] + occlusion_padding] \
                = occlusion

            tmp[y:y + occlusion_center.shape[0], x:x + occlusion_center.shape[1]] = occlusion_center

            yield x - occlusion_padding, y - occlusion_padding, \
                  tmp[occlusion_padding:tmp.shape[0] - occlusion_padding, occlusion_padding:tmp.shape[1] - occlusion_padding]

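The above code defines a function iter_occlusion that yields a copy of the image with a different region masked each time.

Now, let's load the image and plot it: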

from keras.preprocessing.image import load_img
# load an image from file
image = load_img('car.jpeg', target_size=(224, 224))
plt.imshow(image)
plt.title('ORIGINAL IMAGE')


Now, we will follow three steps:

Preprocess this image
Calculate the probabilities for the different masked parts
Plot the heatmap

from keras.applications import VGG16
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
#reload an unmodified VGG16 so that predict() returns softmax probabilities
#(the activation maximization section swapped the final softmax for a linear activation)
model = VGG16(weights='imagenet',include_top=True)
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# predict the probability across all output classes
yhat = model.predict(image)
temp = image[0]
print(temp.shape)
heatmap = np.zeros((224,224))
correct_class = np.argmax(yhat)
#iter_occlusion yields x as the column and y as the row offset, so the
#heatmap (indexed [row, col]) is filled at [y:y+14, x:x+14]
for x,y,occluded in iter_occlusion(temp,14):
    heatmap[y:y+14,x:x+14] = model.predict(occluded.reshape((1,) + occluded.shape))[0][correct_class]
#normalise so that the largest probability becomes 1
heatmap1 = heatmap/heatmap.max()
plt.imshow(heatmap1)


Very interesting. We will now create a mask from the normalized heatmap probabilities and plot it:

#creating mask from the standardised heatmap probabilities:
#keep the cells where occluding them pushed the class probability below 85% of its maximum
mask = heatmap1 < 0.85
mask = mask.astype(int)
plt.imshow(mask,cmap='gray')


Finally, we will apply the mask to the input image and plot the result:

import cv2
#read the image (OpenCV loads it in BGR channel order)
image = cv2.imread('car.jpeg')
image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
#resize image to appropriate dimensions
image = cv2.resize(image,(224,224))
#bitwise_and expects an 8-bit single-channel mask with the same height and width
mask = mask.astype('uint8')
#apply the mask to the image: pixels where the mask is 0 become black
final = cv2.bitwise_and(image,image,mask = mask)
#the image is already RGB, so it is plotted without converting it back to BGR
plt.imshow(final)


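Can you guess why we only see certain parts? That's right: only the parts of the input image that contribute significantly to its output class probability remain visible. That, in a nutshell, is what occlusion maps are all about.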

Visualizing the Contribution of Input Features- Saliency Maps

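Saliency maps are another gradient-based visualization technique. They were introduced in the paper Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps.

A saliency map measures the effect of every pixel on the output of the model. This involves calculating the gradient of the output with respect to every pixel of the input image.

The gradient tells us how the output class score changes in response to small changes in the input image pixels. A positive gradient value means that a small increase in that pixel value will increase the output value: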
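In the notation of that paper, let $S_c(I)$ be the class score for class $c$ (before the softmax) and $I_0$ the image being explained. The gradient at $I_0$ has one entry per pixel and color channel $k$; the saliency value of pixel $(i, j)$ is its largest absolute gradient over the channels:

$$w = \left.\frac{\partial S_c}{\partial I}\right|_{I_0}, \qquad M_{ij} = \max_{k} \left| w_{ijk} \right|$$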

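These gradients have the same shape as the image (the gradient is calculated with respect to every pixel), which gives us an intuition of where the model is paying attention.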
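A minimal NumPy sketch of the same computation on a toy model, assuming a hypothetical class score $S(x) = \tanh(\sum a \cdot x)$ whose weights only look at the centre of a 4×4 RGB image. The gradient has the image's shape, the per-pixel saliency is the maximum absolute value over the color channels, and a central finite difference confirms one gradient entry. Pixels the score never looks at get zero saliency.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 4, 4, 3
# Hypothetical toy class score S(x) = tanh(sum(a * x)); the weights `a`
# only look at the centre 2x2 pixels of a 4x4 RGB image.
a = np.zeros((H, W, C))
a[1:3, 1:3, :] = rng.normal(size=(2, 2, C))

def score(x):
    return np.tanh(np.sum(a * x))

def saliency(x):
    # analytic gradient dS/dx = (1 - tanh(z)^2) * a, same shape as the image
    z = np.sum(a * x)
    grad = (1.0 - np.tanh(z) ** 2) * a
    # one value per pixel: largest absolute gradient over the color channels
    return np.max(np.abs(grad), axis=-1), grad

x = rng.uniform(size=(H, W, C))
sal, grad = saliency(x)

# sanity check of one gradient entry with a central finite difference
eps = 1e-6
e = np.zeros_like(x)
e[1, 2, 0] = eps
numeric = (score(x + e) - score(x - e)) / (2 * eps)
print(np.round(sal, 3))
print(grad[1, 2, 0], numeric)
```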

Let's see how to generate a saliency map for any image. First, we read the input image, a photo of a dog, resized to 224×224.

Now, we will generate the saliency map for the image using the VGG16 model:

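from vis.visualization import visualize_saliency
from vis.utils import utils
from keras import activations

# Utility to search for layer index by name.
# Alternatively we can specify this as -1 since it corresponds to the last layer.
layer_idx = utils.find_layer_idx(model, 'predictions')

# Swap softmax with linear
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)

#generating saliency map with unguided backprop
#filter_indices=None uses all output units together (per the keras-vis docs);
#pass a single ImageNet class index instead for a class-specific map
grads1 = visualize_saliency(model, layer_idx,filter_indices=None,seed_input=image)
#plotting the unguided saliency map
plt.imshow(grads1,cmap='jet')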


We see that the model focuses more on the dog's face. Now, let's look at the result with guided backpropagation:

#generating saliency map with guided backprop
grads2 =  visualize_saliency(model, layer_idx,filter_indices=None,seed_input=image,backprop_modifier='guided')
#plotting the saliency map as heatmap
plt.imshow(grads2,cmap='jet')


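Guided backpropagation sets all negative gradients to 0 as they flow backward through each ReLU, which means only the pixels that have a positive influence on the class score are highlighted.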
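The difference between the two rules is easiest to see at a single ReLU. In this NumPy sketch (toy numbers, not taken from VGG16), ordinary backpropagation passes the incoming gradient wherever the ReLU's forward input was positive; the guided rule additionally zeroes gradients that are negative. Implementations such as keras-vis with backprop_modifier='guided' apply this rule at every ReLU in the network.

```python
import numpy as np

def relu_backward(x, grad_out):
    # ordinary backprop through ReLU: pass the gradient where the forward input was positive
    return np.where(x > 0, grad_out, 0.0)

def guided_relu_backward(x, grad_out):
    # guided backprop: additionally drop negative gradients coming from the layer above
    return np.where((x > 0) & (grad_out > 0), grad_out, 0.0)

x = np.array([1.0, -2.0, 3.0, 0.5])          # forward inputs to the ReLU
grad_out = np.array([0.4, 0.7, -0.9, 0.2])   # gradients arriving from the layer above
print(relu_backward(x, grad_out))
print(guided_relu_backward(x, grad_out))
```

The third entry shows the effect: plain backprop keeps the negative gradient -0.9, while the guided rule discards it, which is why guided saliency maps look cleaner.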

Class Activation Maps (Gradient Weighted)

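Class activation maps are another neural network visualization technique. They are based on the idea of weighting each activation map by its gradient, that is, by its contribution to the output.

The following excerpt from the Grad-CAM paper gives the gist of the technique:

Gradient-weighted Class Activation Mapping (Grad-CAM) uses the gradients of any target concept (say logits for 'dog' or even a caption), flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept.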

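Essentially, we take the feature maps of the final convolutional layer and weigh (multiply) every filter's map by the gradient of the output with respect to that feature map. Grad-CAM involves the following steps:

Get the output feature map of the final convolutional layer. For VGG16, this feature map has shape 14x14x512
Calculate the gradient of the output with respect to the feature map
Apply global average pooling to the gradients
Multiply each feature map by its corresponding pooled gradient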
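Steps 3 and 4 of the list above are plain array arithmetic once the feature maps and their gradients are available, so they can be sketched in NumPy. Steps 1 and 2 need the model itself (for example a gradient tape in TensorFlow) and are replaced here by random arrays with the shape of VGG16's block5_conv3 output. The ReLU and the final normalization follow the Grad-CAM paper rather than the list above; `grad_cam` is a hypothetical helper, not a library function.

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Combine final-conv-layer activations and gradients into a Grad-CAM map.

    feature_maps: (H, W, K) output of the last conv layer (14x14x512 for VGG16)
    grads:        (H, W, K) gradient of the class score w.r.t. those maps
    """
    # step 3: global average pooling of the gradients -> one weight per filter
    weights = grads.mean(axis=(0, 1))                            # shape (K,)
    # step 4: weight each feature map by its pooled gradient and sum over filters
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))   # shape (H, W)
    # ReLU keeps only regions with a positive influence on the class
    cam = np.maximum(cam, 0)
    # normalise to [0, 1]; the map can then be resized to 224x224 and overlaid
    return cam / (cam.max() + 1e-8)

# random stand-ins for steps 1 and 2, shaped like VGG16's block5_conv3 output
rng = np.random.default_rng(0)
A = rng.uniform(size=(14, 14, 512))
dA = rng.normal(size=(14, 14, 512))
heatmap = grad_cam(A, dA)
print(heatmap.shape, heatmap.min(), heatmap.max())
```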

We can see the input image and its corresponding class activation map below:


Now let's generate the class activation map for the above image.


Visualizing the Process – Layerwise Output Visualization

The starting layers of a CNN generally look for low-level features like edges. As we go deeper into the model, the features change.
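To make "low-level features like edges" concrete, here is a small NumPy sketch: a hand-made Sobel-style vertical-edge filter, standing in for the kind of pattern early layers such as block1_conv1 tend to learn, applied to an image with one vertical edge. The operation is the valid cross-correlation that a Conv2D layer computes, followed by a ReLU as in VGG16. The filter is chosen by hand here, not taken from the trained model.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # "valid" 2-D cross-correlation, the operation a Conv2D layer computes
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# hand-made vertical-edge filter (Sobel-style)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# 6x6 image: dark left half, bright right half -> one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0
response = np.maximum(conv2d_valid(image, sobel_x), 0)  # ReLU, as in VGG16
print(response)
```

The response is non-zero only in the two columns whose 3×3 window straddles the edge; flat regions, whether dark or bright, produce zero.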

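Visualizing the output of different layers of the model helps us see which features of the image are highlighted at each layer. This step is particularly important when fine-tuning an architecture for our problem. Why? Because we can see which layers give which kinds of features, and then decide which layers we want to use in our model.

For example, visualizing layer outputs can help us compare the performance of different layers in a neural style transfer problem.

Let's see how we can get the output at different layers of a VGG16 model: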

#importing required libraries and functions
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Model
from keras.applications.vgg16 import preprocess_input
#defining names of layers from which we will take the output
layer_names = ['block1_conv1','block2_conv1','block3_conv1','block4_conv2']
#`image` is assumed to be the 224x224 RGB car image loaded with cv2 in the occlusion section;
#add a batch axis and apply the same VGG16 preprocessing used for prediction
batch = preprocess_input(np.expand_dims(image.astype('float32'), axis=0))
#one model with several outputs returns all intermediate activations in a single pass
intermediate_model = Model(inputs=model.input,outputs=[model.get_layer(name).output for name in layer_names])
outputs = intermediate_model.predict(batch)
#plotting the first 5 channels of each layer's output
fig,ax = plt.subplots(nrows=4,ncols=5,figsize=(20,20))

for i in range(4):
    for z in range(5):
        ax[i][z].imshow(outputs[i][0,:,:,z])
        ax[i][z].set_title(layer_names[i])
        ax[i][z].set_xticks([])
        ax[i][z].set_yticks([])
plt.savefig('layerwise_output.jpg')


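The above image shows the different features extracted from the image by each block of VGG16 (except block 5). We can see that the starting layers correspond to low-level features like edges, whereas the later layers pick up features of the car such as the roof and the exhaust.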

End Notes

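Visualization never ceases to amaze me. There are multiple ways to understand how a technique works, but visualizing it makes it a lot more fun. Here are a couple of resources you should check out:

The process of feature extraction in neural networks is an active research area and has led to the development of awesome tools like TensorSpace and Activation Atlases.
TensorSpace is a neural network visualization tool that supports multiple model formats. It lets you load your model and visualize it interactively. TensorSpace also has a playground where multiple architectures are available for visualization, which you can play around with.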

Original link: https://blog.csdn.net/weixin_41697507/article/details/89917442
