deep_learning_month4_week1_convolution_model_step_by_step
Tags: machine learning, deep learning
The code has been uploaded to GitHub:
https://github.com/PerfectDemoT/my_deeplearning_homework
Note: this post explains how to build the convolutional layers of a CNN step by step. It only contains the individual building-block functions and does not assemble them into a usable model; if you are interested, you can integrate them into a fairly powerful model yourself.
It covers the forward pass for convolution and pooling (both max pooling and average pooling), as well as how the backward pass computes the gradients of the variables (W, b, A), i.e. dW, db, dA, and how backpropagation works for the pooling layer.
- deep_learning_month4_week1_convolution_model_step_by_step
  - 1. First, let's look at the forward convolution operations
    - 1. Import the packages and set up plotting
    - 2. Next, the zero-padding operation
    - 3. A simple demonstration of a single convolution step
    - 4. The real forward pass of the convolution layer
    - 5. Next, pooling
  - 2. Backward propagation
    - 1. Computing dA, dW, db
    - 2. Backward pass for the pooling layer
      - 1. Max pooling
      - 2. Average pooling
      - 3. Putting the pooling backward pass together
1. First, let's look at the forward convolution operations
1. Import the packages and set up plotting
The code is as follows; no further explanation needed:
import numpy as np
import h5py
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

np.random.seed(1)
2. Next, the zero-padding operation
Padding amounts to adding a specified number of pixels around the border of the image. Here is an explanation (from the assignment):
It allows you to use a CONV layer without necessarily shrinking the height and width of the volumes. This is important for building deeper networks, since otherwise the height/width would shrink as you go to deeper layers. An important special case is the “same” convolution, in which the height/width is exactly preserved after one layer.
It helps us keep more of the information at the border of an image. Without padding, very few values at the next layer would be affected by pixels at the edges of an image.
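As a side note (my own addition, not part of the assignment): for a "same" convolution with stride 1 and an odd filter size f, the padding that exactly preserves height and width is pad = (f - 1) / 2. A minimal sketch:
# Minimal sketch (assumption: stride 1 and odd filter size f)
def same_pad(f):
    return (f - 1) // 2

n_H_prev, f, stride = 5, 3, 1
pad = same_pad(f)                                   # 1
n_H = int((n_H_prev - f + 2 * pad) / stride) + 1    # output height
print(pad, n_H)                                     # 1 5 -> the height is preserved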
Now let's implement zero-padding; the code is as follows:
# GRADED FUNCTION: zero_pad

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.

    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """

    ### START CODE HERE ### (≈ 1 line)
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values = 0)
    # Note on this usage: since X is 4-dimensional, np.pad takes four (before, after) pairs;
    # the two (pad, pad) pairs add pad rows/columns on each side of the height and width axes
    ### END CODE HERE ###

    return X_pad
A note on np.pad():
Since X is 4-dimensional, the second argument is a tuple of four (before, after) pairs; the two pad values in the height and width pairs specify how many rows/columns of padding to add on each side of the matrix.
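If the per-axis (before, after) convention of np.pad is new to you, here is a tiny standalone illustration (my own example, independent of the exercise):
import numpy as np

a = np.arange(4).reshape(2, 2)        # [[0 1]
                                      #  [2 3]]
# One row of zeros above and below, no padding on the columns
print(np.pad(a, ((1, 1), (0, 0)), 'constant', constant_values=0))
# [[0 0]
#  [0 1]
#  [2 3]
#  [0 0]]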
With zero_pad defined, the output looks like this:
np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)
x_pad = zero_pad(x, 2)
print ("x.shape =", x.shape)
print ("x_pad.shape =", x_pad.shape)
print ("x[1,1] =", x[1,1])
print ("x_pad[1,1] =", x_pad[1,1])fig, axarr = plt.subplots(1, 2)
axarr[0].set_title('x')
axarr[0].imshow(x[0,:,:,0])
axarr[1].set_title('x_pad')
axarr[1].imshow(x_pad[0,:,:,0])
The output:
x.shape = (4, 3, 3, 2)
x_pad.shape = (4, 7, 7, 2)
x[1,1] = [[ 0.90085595 -0.68372786]
 [-0.12289023 -0.93576943]
 [-0.26788808  0.53035547]]
x_pad[1,1] = [[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
3. A simple demonstration of a single convolution step
(This step may disappoint you, because for now it is just an element-wise product...)
So I'll go straight to the code; there isn't much to explain.
def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation
    of the previous layer.

    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)

    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """

    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice and W. Add bias.
    s = np.multiply(a_slice_prev, W) + b
    # Sum over all entries of the volume s
    Z = np.sum(s)
    ### END CODE HERE ###

    return Z
Let's test it:
np.random.seed(1)
a_slice_prev = np.random.randn(4, 4, 3)
W = np.random.randn(4, 4, 3)
b = np.random.randn(1, 1, 1)

Z = conv_single_step(a_slice_prev, W, b)
print("Z =", Z)
which gives:
Z = -23.16021220252078
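To see what the function does on something hand-checkable, here is a tiny made-up 1x1x1 "slice" (my own example, not from the assignment):
# With a 1x1x1 slice, s = 2*3 + 0.5 and the sum is just that single value
print(conv_single_step(np.array([[[2.0]]]), np.array([[[3.0]]]), np.array([[[0.5]]])))   # 6.5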
4. The real forward pass of the convolution layer
(Of course, this does not include the fully-connected layers.)
Note the following formulas for the output dimensions:
n_H = floor((n_H_prev - f + 2*pad) / stride) + 1
n_W = floor((n_W_prev - f + 2*pad) / stride) + 1
n_C = number of filters used in the convolution
Also, pay attention to the "slice" (let's just call it a matrix slice): the code below shows exactly how it is derived, and that part deserves attention.
Here is the code:
# GRADED FUNCTION: conv_forward

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function

    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"

    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """

    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = 1 + int((n_H_prev + 2 * pad - f) / stride)
    n_W = 1 + int((n_W_prev + 2 * pad - f) / stride)

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))

    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)

    # Now the convolution itself
    for i in range(m):                      # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]          # Select ith training example's padded activation
        for h in range(n_H):                # loop over vertical axis of the output volume
            for w in range(n_W):            # loop over horizontal axis of the output volume
                for c in range(n_C):        # loop over channels (= #filters) of the output volume
                    # Note: i, h, w, c above all start from 0
                    # Find the corners of the current "slice" (≈4 lines)
                    # i.e. locate the start and end positions of the window
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = np.sum(np.multiply(a_slice_prev, W[:, :, :, c]) + b[:, :, :, c])
    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))

    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)

    return Z, cache
The lines that compute vert_start/vert_end and horiz_start/horiz_end are where the matrix slice comes from: they pin down the start and end positions of the window the filter currently focuses on.
(Further explanation is in the comments within the code; read them carefully.)
Let's test the output:
np.random.seed(1)
A_prev = np.random.randn(10,4,4,3)
W = np.random.randn(2,2,3,8)
b = np.random.randn(1,1,1,8)
hparameters = {"pad" : 2,"stride": 1}Z, cache_conv = conv_forward(A_prev, W, b, hparameters)
print("Z's mean =", np.mean(Z))
print("cache_conv[0][1][2][3] =", cache_conv[0][1][2][3])
The result:
Z's mean = 0.15585932488906465
cache_conv[0][1][2][3] = [-0.20075807 0.18656139 0.41005165]
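As a small sanity check (my own addition, not part of the assignment), the output shape should match the formula given above:
# n = 1 + (n_prev + 2*pad - f) / stride, with n_prev = 4, pad = 2, f = 2, stride = 1
n_H = 1 + int((4 + 2 * 2 - 2) / 1)
n_W = 1 + int((4 + 2 * 2 - 2) / 1)
print(Z.shape, (10, n_H, n_W, 8))   # both should be (10, 7, 7, 8)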
5. Next, pooling
(Note that there are max pooling and average pooling; according to Ng, max pooling is the most commonly used.)
Again, here are the formulas for the output dimensions of the pooling filter:
n_H = floor((n_H_prev - f) / stride) + 1
n_W = floor((n_W_prev - f) / stride) + 1
n_C = n_C_prev
Max pooling and average pooling are explained as follows:
Max-pooling layer: slides an (f, f) window over the input and stores the max value of the window in the output.
Average-pooling layer: slides an (f, f) window over the input and stores the average value of the window in the output.
(I assume everyone has watched the course videos, and the explanation above covers it, so I won't repeat it.)
The code is as follows:
# GRADED FUNCTION: pool_forward

def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer

    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
    """

    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]

    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev

    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))

    ### START CODE HERE ###
    for i in range(m):                          # loop over the training examples
        for h in range(n_H):                    # loop on the vertical axis of the output volume
            for w in range(n_W):                # loop on the horizontal axis of the output volume
                for c in range(n_C):            # loop over the channels of the output volume
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    # Compute the pooling operation on the slice. Use an if statement to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    ### END CODE HERE ###

    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)

    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))

    return A, cache
As you can see, the first part of pooling is quite similar to convolution: both locate the filter window.
Also, we call np.max and np.mean here; if you are not familiar with them, it is worth looking up these two numpy functions — a quick illustration follows.
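A minimal illustration of the two calls (my own example, not from the assignment):
window = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
print(np.max(window))    # 4.0  -> what max pooling keeps
print(np.mean(window))   # 2.5  -> what average pooling keeps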
Here is a call to the function:
np.random.seed(1)
A_prev = np.random.randn(2, 4, 4, 3)
hparameters = {"stride" : 1, "f": 4}A, cache = pool_forward(A_prev, hparameters)
print("mode = max")
print("A =", A)
print()
A, cache = pool_forward(A_prev, hparameters, mode = "average")
print("mode = average")
print("A =", A)
The output:
mode = max
A = [[[[ 1.74481176  1.6924546   2.10025514]]]

 [[[ 1.19891788  1.51981682  2.18557541]]]]

mode = average
A = [[[[-0.09498456  0.11180064 -0.14263511]]]

 [[[-0.09525108  0.28325018  0.33035185]]]]
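Since f = 4 and stride = 1 on 4x4 inputs, each pooled value is just the max (or mean) over an entire channel; a quick check of that (my own addition):
A_max, _ = pool_forward(A_prev, hparameters)              # mode defaults to "max"
print(A_max[0, 0, 0, 0], np.max(A_prev[0, :, :, 0]))      # the two numbers should match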
2. Backward propagation
1. Computing dA, dW, db
The update formulas, matching the code below, are:
da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
db[:, :, :, c] += dZ[i, h, w, c]
Here is what each of the outputs means:
dA_prev : gradient of the cost with respect to the input of the conv layer (A_prev),
          numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
dW : gradient of the cost with respect to the weights of the conv layer (W),
     numpy array of shape (f, f, n_C_prev, n_C)
db : gradient of the cost with respect to the biases of the conv layer (b),
     numpy array of shape (1, 1, 1, n_C)
The code is as follows:
def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function
    """

    ### START CODE HERE ###
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache

    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape

    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))
    dW = np.zeros((f, f, n_C_prev, n_C))
    db = np.zeros((1, 1, 1, n_C))

    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)

    for i in range(m):                          # loop over the training examples
        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]

        for h in range(n_H):                    # loop over vertical axis of the output volume
            for w in range(n_W):                # loop over horizontal axis of the output volume
                for c in range(n_C):            # loop over the channels of the output volume
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
                    dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
                    db[:,:,:,c] += dZ[i, h, w, c]

        # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
        dA_prev[i, :, :, :] = dA_prev_pad[i, pad:-pad, pad:-pad, :]
    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))

    return dA_prev, dW, db
The parameters dZ and cache are:
dZ : gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
cache : cache of values needed for the conv_backward(), output of conv_forward()
This follows the formulas directly; if you look at it carefully you will see it works out, although it is fairly involved, since there are many matrix-slicing operations.
Let's look at the output:
np.random.seed(1)
dA, dW, db = conv_backward(Z, cache_conv)
print("dA_mean =", np.mean(dA))
print("dW_mean =", np.mean(dW))
print("db_mean =", np.mean(db))
The result:
dA_mean = 9.60899067587
dW_mean = 10.5817412755
db_mean = 76.3710691956
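As an optional extra (my own addition, not part of the assignment), you can compare dW against a finite-difference estimate using the scalar cost J = sum(Z), for which dZ is simply a matrix of ones. The _chk variables below are made up for this check:
# Finite-difference check on one weight entry, with small freshly generated inputs
np.random.seed(1)
A_chk = np.random.randn(2, 4, 4, 3)
W_chk = np.random.randn(2, 2, 3, 4)
b_chk = np.random.randn(1, 1, 1, 4)
hp_chk = {"pad": 1, "stride": 1}

Z0, cache0 = conv_forward(A_chk, W_chk, b_chk, hp_chk)
_, dW0, _ = conv_backward(np.ones_like(Z0), cache0)   # dZ = ones because J = sum(Z)

eps = 1e-5
W_plus = W_chk.copy()
W_plus[0, 0, 0, 0] += eps
Z_plus, _ = conv_forward(A_chk, W_plus, b_chk, hp_chk)

numeric_dW = (np.sum(Z_plus) - np.sum(Z0)) / eps
print(numeric_dW, dW0[0, 0, 0, 0])   # the two values should be very close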
2. Backward pass for the pooling layer
1. Max pooling
Next we create a function called create_mask_from_window. It takes a matrix slice (really just a small matrix) as input and returns a boolean matrix of the same shape: True at the position of the maximum value, False everywhere else.
An example with its output makes this easier to understand:
def create_mask_from_window(x):
    """
    Creates a mask from an input matrix x, to identify the max entry of x.

    Arguments:
    x -- Array of shape (f, f)

    Returns:
    mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
    """

    ### START CODE HERE ### (≈1 line)
    mask = (x == np.max(x))
    ### END CODE HERE ###

    return mask
Test:
np.random.seed(1)
x = np.random.randn(2,3)
mask = create_mask_from_window(x)
print('x = ', x)
print("mask = ", mask)
The result:
x =  [[ 1.62434536 -0.61175641 -0.52817175]
 [-1.07296862  0.86540763 -2.3015387 ]]
mask =  [[ True False False]
 [False False False]]
Doesn't that make it much easier to understand?
By the way, this is ultimately used in the backward pass of max pooling: the mask routes the incoming gradient to the position that held the max, as sketched below.
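A quick illustration of that routing (my own example, not from the assignment):
# Multiplying the mask by an upstream gradient value sends all of it to the max position
upstream_grad = 3.0
print(mask * upstream_grad)
# [[3. 0. 0.]
#  [0. 0. 0.]]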
2. Average pooling
The next function is for average pooling.
We create a distribute_value function. Its arguments are dz and the shape of the slice matrix, and it returns a matrix of that shape in which every entry equals dz / (n_H * n_W).
Again, an example makes this easier to understand:
def distribute_value(dz, shape):
    """
    Distributes the input value in the matrix of dimension shape

    Arguments:
    dz -- input scalar
    shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

    Returns:
    a -- Array of size (n_H, n_W) for which we distributed the value of dz
    """

    ### START CODE HERE ###
    # Retrieve dimensions from shape (≈1 line)
    (n_H, n_W) = shape

    # Compute the value to distribute on the matrix (≈1 line)
    average = dz / (n_H * n_W)

    # Create a matrix where every entry is the "average" value (≈1 line)
    a = np.ones(shape) * average
    ### END CODE HERE ###

    return a
Test:
a = distribute_value(2, (2,2))
print('distributed value =', a)
The result:
distributed value = [[ 0.5  0.5]
 [ 0.5  0.5]]
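Note that the distributed entries sum back to the original value, which is exactly the property the average-pooling backward pass relies on; a one-line check (my own addition):
print(np.sum(distribute_value(2, (2, 2))))   # 2.0 -- the total gradient is conserved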
That covers the code behind the backward pass of both pooling modes; now we need to combine them, as shown below.
3. Putting the pooling backward pass together
Let's look at the code first:
def pool_backward(dA, cache, mode = "max"):
    """
    Implements the backward pass of the pooling layer

    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """

    ### START CODE HERE ###
    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = cache

    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    f = hparameters['f']

    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    m, n_H, n_W, n_C = dA.shape

    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = np.zeros_like(A_prev)

    for i in range(m):                          # loop over the training examples
        # select training example from A_prev (≈1 line)
        a_prev = A_prev[i]
        for h in range(n_H):                    # loop on the vertical axis
            for w in range(n_W):                # loop on the horizontal axis
                for c in range(n_C):            # loop over the channels (depth)
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f

                    # Compute the backward propagation in both modes.
                    if mode == "max":
                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                        # Create the mask from a_prev_slice (≈1 line)
                        mask = create_mask_from_window(a_prev_slice)
                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i, h, w, c]
                    elif mode == "average":
                        # Get the value da from dA (≈1 line)
                        da = dA[i, h, w, c]
                        # Define the shape of the filter as fxf (≈1 line)
                        shape = (f, f)
                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(da, shape)
    ### END CODE ###

    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)

    return dA_prev
Personally I think there is quite a lot to pay attention to here, and the number of parameters is large. (Once I have fully sorted this part out myself I will write up my understanding; for now I don't want to mislead anyone... Suggestions from experienced readers are welcome. Almost all of my images are hosted on Jianshu, so you can reach me through the image links, or search for PerfectDemoT on Jianshu.)
Then the output:
np.random.seed(1)
A_prev = np.random.randn(5, 5, 3, 2)
hparameters = {"stride" : 1, "f": 2}
A, cache = pool_forward(A_prev, hparameters)
dA = np.random.randn(5, 4, 2, 2)

dA_prev = pool_backward(dA, cache, mode = "max")
print("mode = max")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,1] = ', dA_prev[1,1])
print()
dA_prev = pool_backward(dA, cache, mode = "average")
print("mode = average")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,1] = ', dA_prev[1,1])
The output:
mode = max
mean of dA =  0.145713902729
dA_prev[1,1] =  [[ 0.          0.        ]
 [ 5.05844394 -1.68282702]
 [ 0.          0.        ]]
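As a final optional check (my own addition): max pooling routes each upstream gradient value to a single input position and average pooling spreads it evenly, so the total gradient should be conserved in both modes (assuming no ties in the max):
print(np.sum(dA),
      np.sum(pool_backward(dA, cache, mode="max")),
      np.sum(pool_backward(dA, cache, mode="average")))
# all three sums should be (approximately) equal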