DL Fundamentals Catch-Up Plan (6): Convolution and Pooling


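PS: If you repost this, please credit the source. All rights reserved by the author.

PS: This is based only on my own understanding. If it conflicts with your principles or ideas, please bear with me and be kind.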

Environment

Windows 10
VSCode
Python 3.8.10
PyTorch 1.8.1
CUDA 10.2

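Preface

This is the final post in this fundamentals catch-up series. From my point of view, if you have understood the basics covered in the earlier posts, then together with the material in this one we have roughly got half a foot through the door. At this point we can already do basic image classification. Besides classification, two other important image tasks are object detection and image segmentation. Both are related to classification, and classification can be considered the foundation of both.

A convolutional neural network is a network designed specifically for processing image data. Below we take a brief look at what convolution and pooling mean and how they are computed, and then train a classification model with the classic LeNet5 network.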

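Convolution

Convolution is an operation, just like addition, subtraction, multiplication and division. Convolution is an operation, just like addition, subtraction, multiplication and division. Convolution is an operation, just like addition, subtraction, multiplication and division. Important things are worth saying three times.

Mathematically it is defined as $\int f(n)g(x-n)\,dn$ for continuous $n$ and as $\sum\limits_{n} f(n)g(x-n)$ for discrete $n$. From this we can see that convolution measures the overlap between $f$ and $g$ after $g$ has been flipped and shifted by $x$. The two-dimensional discrete form over $a, b$ is $\sum\limits_{a}\sum\limits_{b} f(a, b)g(x_1-a, x_2-b)$.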
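To make the discrete definition concrete, here is a minimal sketch that evaluates $\sum\limits_{n} f(n)g(x-n)$ directly and compares it with NumPy's `np.convolve`. The helper name `conv1d_full` and the sample values of `f` and `g` are made up for illustration.

```python
import numpy as np

def conv1d_full(f, g):
    """Discrete convolution out[x] = sum_n f[n] * g[x - n] (full output)."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    out = np.zeros(len(f) + len(g) - 1)
    for x in range(len(out)):
        for n in range(len(f)):
            # g is indexed by x - n: this is the "flip and shift" in the formula
            if 0 <= x - n < len(g):
                out[x] += f[n] * g[x - n]
    return out

f = [1.0, 2.0, 3.0]   # illustrative values
g = [0.0, 1.0, 0.5]   # illustrative values
manual = conv1d_full(f, g)
library = np.convolve(f, g)  # NumPy's default mode is 'full'
print(manual)
print(library)
```

Both calls should print the same five numbers, since the double loop is just the definition written out.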

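Convolution is an operation, just like addition, subtraction, multiplication and division. Convolution is an operation, just like addition, subtraction, multiplication and division. Convolution is an operation, just like addition, subtraction, multiplication and division. And that is the important thing said three more times.

Let us think back. In the earlier posts we always flattened the input data into a single line before feeding it in, which means those tasks only modeled the left-right relationship between adjacent inputs. But is a left-right relationship enough for all data? Obviously not. Take an image: it may be that the pixels above, below, left and right of a point together make up a cat, and if we only relate left and right neighbors, it could be a dog or a cat. So how should we describe or compute this two-dimensional relationship between image pixels? The answer is the convolution operation.

There are countless introductions to convolution online, with detailed explanations from the angles of signal analysis, compound interest, probability, image filtering and so on. Personally I think it is easiest to set those aside and look at the problem in terms of relationships between data. Previously we only looked at left-right relationships, but we should consider the neighbors above, below, left and right at the same time, that is, think about the relationships between data spatially. Convolution, as a mathematical operation, happens to compute exactly this neighborhood relationship, so it is well suited to replace the flattened, single-line linear operation we used before.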
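A small sketch of what flattening does to neighbors, assuming a 6 x 8 image stored in row-major order (the size is chosen only for illustration): the pixel to the right stays adjacent in the flattened vector, while the pixel directly below ends up a whole row away.

```python
import numpy as np

H, W = 6, 8
# idx[i, j] is the position of pixel (i, j) after row-major flattening
idx = np.arange(H * W).reshape(H, W)

i, j = 2, 3
right_gap = idx[i, j + 1] - idx[i, j]  # horizontal neighbor
down_gap = idx[i + 1, j] - idx[i, j]   # vertical neighbor
print(right_gap, down_gap)
```

The vertical gap equals the image width, so a model that sees only the flat vector has no built-in notion that those two pixels touch.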

Now let us look at what a basic convolution computation looks like.

Image edge detection example

The code is as follows:

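import numpy as np

def corr2d(X, K):  #@save
    """Compute 2D cross-correlation."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Element-wise product of the current window and the kernel, then sum
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

# A 6x8 image of ones with two blocks of zeros, leaving row 2 untouched
_X = np.ones((6, 8))
_X[0:2, 2:6] = 0
_X[3:, 2:6] = 0
print(_X)
# A 1x2 kernel responds to changes between horizontally adjacent pixels
_K = np.array([[1.0, -1.0]])
_Y = corr2d(_X, _K)
print(_Y)
# The transposed 2x1 kernel responds to changes between vertically adjacent pixels
_Y = corr2d(_X, _K.T)
print(_Y)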

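The result is shown in the figure:

We can see that, after passing through the filters we built by hand, the positions where the pixel values change are picked out in each direction: the edge information has been detected.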
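One detail worth spelling out: `corr2d` computes cross-correlation, which does not flip the kernel, while the mathematical definition of convolution does. The sketch below restates `corr2d` so it runs on its own and adds a hypothetical helper `conv2d_true` that flips the kernel first. For the kernel `[1, -1]` the only difference is the sign of the response.

```python
import numpy as np

def corr2d(X, K):
    """2D cross-correlation, same logic as the function above."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

def conv2d_true(X, K):
    """Hypothetical helper: strict convolution = cross-correlation with a flipped kernel."""
    return corr2d(X, K[::-1, ::-1])

X = np.ones((6, 8))
X[0:2, 2:6] = 0
X[3:, 2:6] = 0
K = np.array([[1.0, -1.0]])

Y_corr = corr2d(X, K)
Y_conv = conv2d_true(X, K)
print(Y_corr[0])  # 1 where the row goes 1 -> 0, -1 where it goes 0 -> 1
print(Y_conv[0])  # same edge positions, opposite sign
```

When the kernel is learned rather than hand-built, the distinction stops mattering in practice, because training can simply arrive at the flipped weights.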

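In practice we may need features such as edges, corners and so on, and we cannot construct every filter by hand. Can the filter be learned instead? The following example demonstrates this: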

import torch  # np and corr2d are the ones defined above

_X = np.ones((6, 8))
_X[0:2, 2:6] = 0
_X[3:, 2:6] = 0
print(_X)
_K = np.array([[1.0, -1.0]])
_Y = corr2d(_X, _K)
print(_Y)
# _Y = corr2d(_X, _K.T)
# print(_Y)

# Conv2d expects (batch, channel, height, width) float32 tensors.
# Neither X nor Y needs requires_grad: only the conv weight is trained.
X = torch.from_numpy(_X).to(dtype=torch.float32).reshape(1, 1, 6, 8)
Y = torch.from_numpy(_Y).to(dtype=torch.float32).reshape(1, 1, 6, 7)

conv2d = torch.nn.Conv2d(1, 1, (1, 2), bias=False)

for i in range(20):
    y_train = conv2d(X)
    l = (y_train - Y) ** 2
    conv2d.zero_grad()
    # print(l.shape)
    # l is not a scalar, so pass ones as the upstream gradient (same as l.sum().backward())
    l.backward(torch.ones_like(l))
    # print(conv2d.weight)
    with torch.no_grad():
        # print('grad = ', conv2d.weight.grad)
        # Plain gradient descent step with learning rate 0.02
        conv2d.weight[:] -= 0.02 * conv2d.weight.grad
        # print(conv2d.weight)
        # print(conv2d.weight.shape)
    if (i + 1) % 2 == 0:
        print(f'batch {i+1}, loss {float(l.sum()):.3f}')

print(conv2d.weight)

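The result is shown in the figure:

We construct the target feature map Y with the corr2d function and then train against Y. The final weights of the convolution layer end up close to 1 and -1, which is almost exactly the special filter we built by hand.

This example shows that a filter we want can be learned from data and does not have to be constructed manually.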
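To show what the training loop is doing without autograd, here is a NumPy-only sketch of the same gradient descent. It relies on the fact that for the loss $\sum (\text{corr2d}(X, K) - Y)^2$ the gradient with respect to $K$ is $2 \cdot \text{corr2d}(X, \text{err})$. Starting from a zero kernel is an assumption made here for reproducibility; PyTorch's `Conv2d` starts from a random initialization.

```python
import numpy as np

def corr2d(X, K):
    """2D cross-correlation, same logic as the function above."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = np.ones((6, 8))
X[0:2, 2:6] = 0
X[3:, 2:6] = 0
Y = corr2d(X, np.array([[1.0, -1.0]]))  # target produced by the hand-built filter

K = np.zeros((1, 2))  # assumption: zero start instead of a random init
lr = 0.02             # same learning rate as the PyTorch loop
for step in range(20):
    err = corr2d(X, K) - Y       # prediction error, shape (6, 7)
    grad = 2 * corr2d(X, err)    # d(sum err^2)/dK, shape (1, 2)
    K -= lr * grad
print(K)
```

After 20 steps the kernel should sit close to `[1, -1]`, in line with what the PyTorch version reports.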

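Convolution also involves kernels, stride, padding and other details. I will not reinvent the wheel here, since many excellent write-ups already exist online and are worth reading. One formula is particularly useful: $N = \lfloor (W - K + 2P)/S \rfloor + 1$, where $N$ is the output size, $W$ the input size, $K$ the kernel size, $P$ the padding and $S$ the stride.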
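As a quick check of the output-size formula, the sketch below applies it to the layer settings used by the LeNet5 code later in this post (28 x 28 input, 5 x 5 kernels, 2 x 2 pooling with stride 2). The function name `conv_output_size` is made up for illustration.

```python
def conv_output_size(W, K, P=0, S=1):
    """N = floor((W - K + 2P) / S) + 1 for one spatial dimension."""
    return (W - K + 2 * P) // S + 1

c1 = conv_output_size(28, 5, P=2, S=1)  # first conv, padding 2
s2 = conv_output_size(c1, 2, P=0, S=2)  # first 2x2 pooling
c3 = conv_output_size(s2, 5, P=0, S=1)  # second conv, no padding
s4 = conv_output_size(c3, 2, P=0, S=2)  # second 2x2 pooling
print(c1, s2, c3, s4)
```

The same formula covers pooling layers, because a pooling window slides over the input exactly like a kernel does.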

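Pooling

From the previous section we know that each output value of a convolution captures the relationship among a patch of neighboring pixels, for example one output pixel related to 9 input pixels (a $3 \times 3$ window). Take a $9 \times 9$ image and suppose it is still $9 \times 9$ after one convolution. Each pixel of that $9 \times 9$ output is now tied to a patch of pixels at the corresponding position of the input. Sometimes, though, we want to aggregate these relationships, by taking the maximum, the average and so on. That is the idea of pooling. Continuing the example: the convolution outputs $9 \times 9$ pixels, and suppose pooling turns this into $3 \times 3$. Each pooled pixel then stands for $3 \times 3$ pixels of the convolution output, so the information has been concentrated: one pixel represents several pixels of the previous layer.

Pooling can also be viewed in terms of field of view. Using the same example, suppose the cat occupies $9 \times 9$ pixels in the original image and $3 \times 3$ pixels after convolution and pooling. In pixel terms, 81 pixels used to represent the cat and now 9 are enough. A pixel before and a pixel after therefore cover different amounts of the original image, which gives the effect of an enlarged field of view. One drawback is that small objects may be lost, something that object detection has to pay attention to.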
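The $9 \times 9$ to $3 \times 3$ example can be sketched in a few lines of NumPy. This is a minimal version that assumes non-overlapping windows (window size equal to stride); the helper `pool2d` and the input values are illustrative only.

```python
import numpy as np

def pool2d(X, size, mode="max"):
    """Non-overlapping pooling: the window size is also the stride."""
    H, W = X.shape
    assert H % size == 0 and W % size == 0, "input must divide evenly into windows"
    # Split rows and columns into (block index, offset inside block)
    blocks = X.reshape(H // size, size, W // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

X = np.arange(81, dtype=float).reshape(9, 9)  # stand-in for a 9x9 conv output
max_out = pool2d(X, 3, "max")
avg_out = pool2d(X, 3, "avg")
print(max_out.shape)
print(max_out)
print(avg_out)
```

Each of the 9 output values summarizes one $3 \times 3$ block of the input, which is the aggregation described above.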

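A Classic Neural Network: LeNet5

In December 2017, in my post "LeNet-5 论文及原理分析(笨鸟角度)" (LeNet-5 paper and principle analysis, from a slow learner's perspective; https://blog.csdn.net/u011728480/article/details/78799672 ), I went through the network structure section of the LeNet5 paper in detail in order to learn some of the basics.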

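Note that the C3 layer in this post differs from the C3 layer in the paper. Here C3 is a fully connected convolution with one bias per output channel, so it has $16 \times (6 \times 5 \times 5 + 1) = 2416$ parameters. The paper's C3 uses a sparse connection table and has $6 \times (3 \times 5 \times 5 + 1) + 6 \times (4 \times 5 \times 5 + 1) + 3 \times (4 \times 5 \times 5 + 1) + 1 \times (6 \times 5 \times 5 + 1) = 1516$ parameters.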
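The two C3 parameter counts can be recomputed with a short sketch. It counts one bias per output channel, which is how `nn.Conv2d` stores its bias. The helper `conv_params` is a made-up name, and the connection table is written as (number of output maps, number of input maps each one sees), following the four groups in the paper's formula above.

```python
def conv_params(in_ch, out_ch, k, bias=True):
    """Weights plus one optional bias per output channel of a k x k convolution."""
    return out_ch * (in_ch * k * k + (1 if bias else 0))

# C3 as built in this post: every output map sees all 6 input maps
full_c3 = conv_params(6, 16, 5)

# C3 in the paper: (number of output maps, input maps each one is connected to)
paper_groups = [(6, 3), (6, 4), (3, 4), (1, 6)]
paper_c3 = sum(n_out * (n_in * 5 * 5 + 1) for n_out, n_in in paper_groups)

print(full_c3, paper_c3)
```

The sparse table in the paper cuts the layer to roughly 63% of the fully connected count.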

The training code is as follows:

import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import visdom

# Plotting goes to a visdom server, which must already be running
vis = visdom.Visdom(env='main')

title = 'LeNet5 on ' + 'FashionMNIST'
legend = ['TrainLoss', 'TestLoss', 'TestAcc']

epoch_plot_window = vis.line(
    X=torch.zeros((1, 3)).cpu(),
    Y=torch.zeros((1, 3)).cpu(),
    win='epoch_win',
    opts=dict(
        xlabel='Epoch',
        ylabel='Loss/Acc',
        title=title,
        legend=legend
    ))


def corr2d(X, W):  #@save
    """Compute 2D cross-correlation."""
    h, w = W.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * W).sum()
    return Y


def TrainConv2d():
    _X = np.ones((6, 8))
    _X[0:2, 2:6] = 0
    _X[3:, 2:6] = 0
    print(_X)
    _K = np.array([[1.0, -1.0]])
    _Y = corr2d(_X, _K)
    print(_Y)
    # _Y = corr2d(_X, _K.T)
    # print(_Y)

    # Only the conv weight is trained, so X and Y do not need requires_grad
    X = torch.from_numpy(_X).to(dtype=torch.float32).reshape(1, 1, 6, 8)
    Y = torch.from_numpy(_Y).to(dtype=torch.float32).reshape(1, 1, 6, 7)

    conv2d = torch.nn.Conv2d(1, 1, (1, 2), bias=False)

    for i in range(20):
        y_train = conv2d(X)
        l = (y_train - Y) ** 2
        conv2d.zero_grad()
        # print(l.shape)
        l.backward(torch.ones_like(l))
        # print(conv2d.weight)
        with torch.no_grad():
            # print('grad = ', conv2d.weight.grad)
            conv2d.weight[:] -= 0.02 * conv2d.weight.grad
            # print(conv2d.weight)
            # print(conv2d.weight.shape)
        if (i + 1) % 2 == 0:
            print(f'batch {i+1}, loss {float(l.sum()):.3f}')

    print(conv2d.weight)


class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.lenet5 = nn.Sequential(
            # 1*28*28 -> 6*28*28
            nn.Conv2d(1, 6, (5, 5), stride=1, padding=2),
            nn.Sigmoid(),
            # 6*28*28 -> 6*14*14
            nn.AvgPool2d((2, 2), stride=2, padding=0),
            # 6*14*14 -> 16*10*10
            nn.Conv2d(6, 16, (5, 5), stride=1),
            nn.Sigmoid(),
            # 16*10*10 -> 16*5*5
            nn.AvgPool2d((2, 2), stride=2, padding=0),
            nn.Flatten(),
            nn.Linear(16*5*5, 1*120),
            nn.Sigmoid(),
            nn.Linear(1*120, 1*84),
            nn.Sigmoid(),
            nn.Linear(1*84, 1*10)
        )

    def forward(self, x):
        logits = self.lenet5(x)
        return logits


def LoadFashionMNISTByTorchApi():
    # 60000*28*28
    training_data = datasets.FashionMNIST(
        root=r"..\data",
        train=True,
        download=True,
        transform=ToTensor()
    )

    # 10000*28*28
    test_data = datasets.FashionMNIST(
        root=r"..\data",
        train=False,
        download=True,
        transform=ToTensor()
    )

    # labels_map = {
    #     0: "T-Shirt",
    #     1: "Trouser",
    #     2: "Pullover",
    #     3: "Dress",
    #     4: "Coat",
    #     5: "Sandal",
    #     6: "Shirt",
    #     7: "Sneaker",
    #     8: "Bag",
    #     9: "Ankle Boot",
    # }
    # figure = plt.figure(figsize=(8, 8))
    # cols, rows = 3, 3
    # for i in range(1, cols * rows + 1):
    #     sample_idx = torch.randint(len(training_data), size=(1,)).item()
    #     img, label = training_data[sample_idx]
    #     figure.add_subplot(rows, cols, i)
    #     plt.title(labels_map[label])
    #     plt.axis("off")
    #     plt.imshow(img.squeeze(), cmap="gray")
    # plt.show()
    return training_data, test_data


def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    loss_sum = 0
    for batch, (X, y) in enumerate(dataloader):
        # move X, y to gpu
        if torch.cuda.is_available():
            X = X.to('cuda')
            y = y.to('cuda')
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        loss_sum += loss.item()
        if batch % 100 == 0:
            loss1, current = loss.item(), batch * len(X)
            print(f"loss: {loss1:>7f}  [{current:>5d}/{size:>5d}]")

    # Average training loss over all batches of this epoch
    return loss_sum / num_batches


def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        for X, y in dataloader:
            # move X, y to gpu
            if torch.cuda.is_available():
                X = X.to('cuda')
                y = y.to('cuda')
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
    return test_loss, correct


if __name__ == '__main__':
    # TrainConv2d()

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print('Using {} device'.format(device))

    def init_weights(m):
        # Xavier initialization for every Linear and Conv2d layer
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)

    model = NeuralNetwork()
    model.apply(init_weights)
    model = model.to(device)
    print(model)

    batch_size = 200
    learning_rate = 0.9

    training_data, test_data = LoadFashionMNISTByTorchApi()
    train_dataloader = DataLoader(training_data, batch_size, shuffle=True)
    test_dataloader = DataLoader(test_data, batch_size, shuffle=True)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

    epochs = 1000
    model.train()
    for t in range(epochs):
        print(f"Epoch {t+1}\n-------------------------------")
        train_loss = train_loop(train_dataloader, model, loss_fn, optimizer)
        test_loss, test_acc = test_loop(test_dataloader, model, loss_fn)
        # Append one point per epoch to the three curves
        vis.line(np.array([train_loss, test_loss, test_acc]).reshape(1, 3),
                 np.ones((1, 3)) * t,
                 win='epoch_win',
                 update=None if t == 0 else 'append',
                 opts=dict(
                     xlabel='Epoch',
                     ylabel='Loss/Acc',
                     title=title,
                     legend=legend
                 ))
    print("Done!")

    # only save param
    torch.save(model.state_dict(), 'lenet5.pth')

    # save param and net
    torch.save(model, 'lenet5-all.pth')

    # export onnx
    input_image = torch.zeros((1, 1, 28, 28))
    input_image = input_image.to(device)
    torch.onnx.export(model, input_image, 'model.onnx')

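The result is shown in the figure:

From the training visualization we can see that the model does converge, but unfortunately the accuracy is only around 90%, and there is some overfitting. Note that because this model contains Sigmoid layers, vanishing gradients occur easily, so the learning rate is set very high to speed up training.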
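A rough sketch of why Sigmoid layers invite vanishing gradients: the derivative of the sigmoid never exceeds 0.25, so each Sigmoid layer scales the backpropagated gradient by at most that factor. The estimate below looks only at the activation derivatives of the four Sigmoid layers in the network above and ignores the weights, so it is an illustration rather than an exact figure for this model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.linspace(-10, 10, 2001)
max_grad = sigmoid_grad(xs).max()  # peak of the derivative, reached at x = 0

n_sigmoid_layers = 4  # Sigmoid layers in the LeNet5 variant above
bound = max_grad ** n_sigmoid_layers
print(max_grad, bound)
```

With four such factors the activation part of the gradient is already below half a percent of its original size, which is consistent with needing a large learning rate here.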

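Afterword

I put together this series on fundamentals because I needed to deepen my understanding of deep learning. Along the way I followed the reference materials, repeating the experiments and rerunning the code. For me personally, things only become truly clear after I have actually written the code.

This post is also the last in the series. Future updates will come whenever they come.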

References

https://github.com/d2l-ai/d2l-zh/releases (V1.0.0)
https://github.com/d2l-ai/d2l-zh/releases (V2.0.0 alpha1)
https://blog.csdn.net/u011728480/article/details/78799672 ("LeNet-5 论文及原理分析(笨鸟角度)")
