Convolutional Neural Networks: Content Overview

  1. Convolution layer
  2. Pooling layer
  3. Python implementation of the convolution and pooling layers
  4. Implementing a CNN

Main Content

Convolution Layer

The fully connected networks we used before require flattening all input data into a one-dimensional array. When three-dimensional (or two-dimensional) data is flattened into one dimension, information about its shape is lost. An image, for example, is a 3-D shape, and that shape carries important spatial information: spatially adjacent pixels tend to have similar values, the RGB channels are closely correlated with one another, and pixels far apart have little relationship to each other. The 3-D shape may hide essential patterns worth extracting.

The most natural idea is to feed the three-dimensional data in directly, and the way to do that is the convolution operation.

This differs from the convolution operation in mathematics; here it is better understood as a mapping between the image and a receptive field.

Mathematical convolution also flips the kernel before sliding it, and its kernel values are given in advance; in a CNN both points differ: the kernel is applied without flipping (strictly speaking, a cross-correlation), and its values are learned.
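A minimal sketch of that distinction, assuming SciPy is available (the document itself only uses numpy, so this is just for illustration): mathematical convolution equals cross-correlation with a flipped kernel.

import numpy as np
from scipy.signal import convolve2d, correlate2d

x = np.arange(9.0).reshape(3, 3)
k = np.array([[1.0, 2.0], [3.0, 4.0]])

# what a CNN layer computes: cross-correlation (kernel used as-is)
cnn_style = correlate2d(x, k, mode='valid')
# mathematical convolution: kernel flipped along both axes first
math_style = convolve2d(x, k, mode='valid')

assert np.allclose(math_style, correlate2d(x, np.flip(k), mode='valid'))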


Related Concepts

  • Feature map

    • Input feature map
    • Output feature map
  • Receptive field

    The receptive field is defined as the region in the input space that a particular CNN’s feature is looking at (i.e. be affected by).

    —— A guide to receptive field arithmetic for Convolutional Neural Networks

    The receptive field is the region of the input that a neuron in the network "sees". In a CNN, the value of an element of a feature map is influenced by a particular region of the input image; that region is the element's receptive field.

  • Filter (also called kernel)

    Some papers call this a kernel, probably carrying over the mathematical terminology; the definition Keras uses is also helpful here.

    So this is where a key distinction between terms comes in handy: whereas in the 1 channel case, where the term filter and kernel are interchangeable, in the general case, they're actually pretty different. Each filter actually happens to be a collection of kernels, with there being one kernel for every single input channel to the layer, and each kernel being unique.

    In Keras:
    when channels = 1, the filter is the kernel;
    when channels > 1, a filter is a stack of kernels, one kernel per input channel.
    The number of filters (the layer's output channels) is typically a power of two; see the shape sketch after this list.

    Honestly, though, there is no essential difference, only naming; the computation is unaffected either way, and knowing how to compute it is what matters.

  • Stride

  • Padding

    Padding is used mainly to adjust the size of the output.
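As a concrete illustration of the filter/kernel distinction above, here is a minimal sketch using the (N, C, H, W) weight layout adopted throughout this section (the numbers are arbitrary):

import numpy as np

# 16 filters for a 3-channel input: each filter is a stack of 3 distinct 5x5 kernels
W = np.random.randn(16, 3, 5, 5)   # shape: (filter_num, channels, FH, FW)

print(W.shape[0])      # 16 filters -> the output feature map has 16 channels
print(W[0].shape)      # (3, 5, 5): one filter = 3 kernels, one per input channel
print(W[0, 0].shape)   # (5, 5): a single kernel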


Related Operations

At each position it is really just an element-wise product of two matrices followed by summing the results (a multiply-accumulate); the resulting value is placed at the corresponding position of the output.

In the multi-dimensional case, when there are multiple feature maps along the channel direction, the convolution between the input data and the filter is performed channel by channel, and the results are summed to produce the output.
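A naive sketch of this channel-wise multiply-accumulate (stride 1, no padding, a single filter; purely illustrative, not how the im2col-based layer below works):

import numpy as np

def naive_conv_single(x, w):
    """x: (C, H, W) input; w: (C, FH, FW) filter -> (OH, OW) output."""
    C, H, W = x.shape
    _, FH, FW = w.shape
    OH, OW = H - FH + 1, W - FW + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            # element-wise product across all channels, then a single sum
            out[i, j] = np.sum(x[:, i:i+FH, j:j+FW] * w)
    return out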


The size of the feature map produced by one convolution can be written with the following formula.

Assume the input size is (H, W), the filter size is (FH, FW), the output size is (OH, OW), the padding is P, and the stride is S. The output size is then given by equation (7.1):

OH = (H + 2P - FH) / S + 1
OW = (W + 2P - FW) / S + 1        (7.1)

The formula is easy to derive: the output size is just 1 + (the number of strides the filter can still take after its first placement).

The padding P is multiplied by 2 because padding is applied all the way around the border, not on one side only.
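Equation (7.1) in code, as a quick arithmetic check (conv_output_size is a hypothetical helper name, not from the original text):

def conv_output_size(input_size, filter_size, stride=1, pad=0):
    # equation (7.1): 1 + the number of additional strides the filter can take
    return (input_size + 2 * pad - filter_size) // stride + 1

print(conv_output_size(28, 5))          # 24: a 5x5 filter on a 28x28 input
print(conv_output_size(28, 3, pad=1))   # 28: a 3x3 filter with pad 1 keeps the size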


Pooling Layer

It serves to compress (downsample) the feature map.

Characteristics


  • No parameters to learn

  • The number of channels does not change

  • Robust against small positional shifts

    When the input data shifts by a small amount, pooling usually still returns the same result, so pooling is robust against small deviations in the input. For example, with 3 × 3 max pooling, as shown in Figure 7-16, pooling absorbs the shift in the input data (depending on the data, the result may occasionally differ); a small sketch follows this list.
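A minimal numpy sketch of this robustness, using 2 × 2 windows for brevity (the idea is the same for 3 × 3):

import numpy as np

x = np.zeros((4, 4))
x[1, 0] = 9.0                      # a single bright pixel
x_shifted = np.roll(x, 1, axis=1)  # shift it one pixel to the right

def max_pool_2x2(a):
    # expose each non-overlapping 2x2 window on its own axes, then reduce
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# the bright pixel stays inside the same 2x2 window, so the pooled outputs
# are identical; a larger shift that crosses a window boundary could change
# the result, matching the caveat above
print(max_pool_2x2(x))
print(max_pool_2x2(x_shifted))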


Related Processing


Implementing the Convolution and Pooling Layers

Helper functions for the implementation

im2col

import numpy as np

def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
    """
    Parameters
    ----------
    input_data : input data as a 4-D array of shape (N, C, H, W)
                 (batch size, channels, height, width)
    filter_h : filter height
    filter_w : filter width
    stride : stride
    pad : padding

    Returns
    -------
    col : 2-D array
    """
    N, C, H, W = input_data.shape
    out_h = (H + 2 * pad - filter_h) // stride + 1
    out_w = (W + 2 * pad - filter_w) // stride + 1
    # zero-padding: only the H and W dimensions are padded
    img = np.pad(input_data, [(0, 0), (0, 0), (pad, pad), (pad, pad)], 'constant')
    # the trailing (out_h, out_w) axes make it convenient to gather the same
    # in-window offset for every receptive field in one strided slice
    col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
    for y in range(filter_h):
        y_max = y + stride * out_h
        for x in range(filter_w):
            # copy out_h * out_w elements at once into the new array
            x_max = x + stride * out_w
            col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
    # the transpose is the key step:
    # the shape so far is (N, C, filter_h, filter_w, out_h, out_w);
    # keep N, out_h, out_w in front and move C, filter_h, filter_w (the
    # receptive-field axes) to the back, then reshape so that every row is one
    # flattened receptive field, ready for a matrix product with the filters
    col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N * out_h * out_w, -1)
    return col
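A quick shape check (the numbers are an illustrative choice): a (1, 3, 7, 7) input with a 5 × 5 filter gives 3 × 3 = 9 window positions, each flattened to 3 × 5 × 5 = 75 values.

x = np.random.rand(1, 3, 7, 7)          # one 3-channel 7x7 image
col = im2col(x, 5, 5, stride=1, pad=0)
print(col.shape)                        # (9, 75): 9 receptive fields, 75 values each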

col2im

def col2im(col, input_shape, filter_h, filter_w, stride=1, pad=0):
    """
    The inverse transform of im2col; values from overlapping windows are
    accumulated, which is what the backward pass requires.

    Parameters
    ----------
    col : 2-D array produced by im2col (or a gradient with the same shape)
    input_shape : shape of the original input (e.g. (10, 1, 28, 28))
    filter_h : filter height
    filter_w : filter width
    stride : stride
    pad : padding

    Returns
    -------
    img : 4-D array with shape input_shape
    """
    N, C, H, W = input_shape
    out_h = (H + 2*pad - filter_h)//stride + 1
    out_w = (W + 2*pad - filter_w)//stride + 1
    # undo the reshape/transpose performed at the end of im2col
    col = col.reshape(N, out_h, out_w, C, filter_h, filter_w).transpose(0, 3, 4, 5, 1, 2)

    img = np.zeros((N, C, H + 2*pad + stride - 1, W + 2*pad + stride - 1))
    for y in range(filter_h):
        y_max = y + stride*out_h
        for x in range(filter_w):
            x_max = x + stride*out_w
            # scatter-add each in-window offset back to its image positions
            img[:, :, y:y_max:stride, x:x_max:stride] += col[:, :, y, x, :, :]

    # strip the padding before returning
    return img[:, :, pad:H + pad, pad:W + pad]
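One way to sanity-check the pair (an illustrative test, not from the original text): with non-overlapping windows, i.e. stride equal to the filter size, every input element is visited exactly once, so col2im exactly inverts im2col; with overlapping windows the overlaps are summed instead, which is precisely the behavior backpropagation needs.

x = np.random.rand(2, 3, 4, 4)
col = im2col(x, 2, 2, stride=2)          # non-overlapping 2x2 windows
assert np.allclose(col2im(col, x.shape, 2, 2, stride=2), x)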

Convolution Layer Implementation

class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W
        self.b = b
        self.stride = stride
        self.pad = pad

        # intermediate data (used in backward)
        self.x = None
        self.col = None
        self.col_W = None

        # gradients of the weights and bias
        self.dW = None
        self.db = None

    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = 1 + int((H + 2*self.pad - FH) / self.stride)
        out_w = 1 + int((W + 2*self.pad - FW) / self.stride)

        col = im2col(x, FH, FW, self.stride, self.pad)
        # col has shape (N*out_h*out_w, C*FH*FW)
        col_W = self.W.reshape(FN, -1).T
        # col_W has shape (C*FH*FW, FN)

        out = np.dot(col, col_W) + self.b
        out = out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)
        # out ends up with shape (N, FN, out_h, out_w)

        self.x = x
        self.col = col
        self.col_W = col_W

        return out

    def backward(self, dout):
        FN, C, FH, FW = self.W.shape
        # restore dout to the layout of `out = np.dot(col, col_W) + self.b`
        dout = dout.transpose(0, 2, 3, 1).reshape(-1, FN)

        self.db = np.sum(dout, axis=0)
        self.dW = np.dot(self.col.T, dout)
        self.dW = self.dW.transpose(1, 0).reshape(FN, C, FH, FW)

        dcol = np.dot(dout, self.col_W.T)
        dx = col2im(dcol, self.x.shape, FH, FW, self.stride, self.pad)

        return dx
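A quick forward/backward shape check (illustrative numbers): 30 filters of size 5 × 5 over a single-channel 28 × 28 input give, by equation (7.1), a 24 × 24 output map per filter.

W = np.random.randn(30, 1, 5, 5) * 0.01
b = np.zeros(30)
conv = Convolution(W, b, stride=1, pad=0)
x = np.random.rand(10, 1, 28, 28)
out = conv.forward(x)
print(out.shape)                    # (10, 30, 24, 24)
dx = conv.backward(np.ones_like(out))
print(dx.shape)                     # (10, 1, 28, 28): the gradient matches the input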

Pooling Layer Implementation

class Pooling:
    def __init__(self, pool_h, pool_w, stride=1, pad=0):
        self.pool_h = pool_h
        self.pool_w = pool_w
        self.stride = stride
        self.pad = pad

        self.x = None
        self.arg_max = None

    def forward(self, x):
        N, C, H, W = x.shape
        out_h = int(1 + (H - self.pool_h) / self.stride)
        out_w = int(1 + (W - self.pool_w) / self.stride)

        col = im2col(x, self.pool_h, self.pool_w, self.stride, self.pad)
        # one row per pooling window, with the channels separated out
        col = col.reshape(-1, self.pool_h*self.pool_w)

        # remember where each maximum came from, for the backward pass
        arg_max = np.argmax(col, axis=1)
        out = np.max(col, axis=1)
        out = out.reshape(N, out_h, out_w, C).transpose(0, 3, 1, 2)

        self.x = x
        self.arg_max = arg_max

        return out

    def backward(self, dout):
        dout = dout.transpose(0, 2, 3, 1)

        pool_size = self.pool_h * self.pool_w
        # route each upstream gradient only to the position that produced the max
        dmax = np.zeros((dout.size, pool_size))
        dmax[np.arange(self.arg_max.size), self.arg_max.flatten()] = dout.flatten()
        dmax = dmax.reshape(dout.shape + (pool_size,))

        dcol = dmax.reshape(dmax.shape[0] * dmax.shape[1] * dmax.shape[2], -1)
        dx = col2im(dcol, self.x.shape, self.pool_h, self.pool_w, self.stride, self.pad)

        return dx
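And a shape check for pooling (again with illustrative numbers, reusing the (10, 30, 24, 24) conv output from the previous snippet): 2 × 2 max pooling with stride 2 halves the spatial dimensions while leaving the channel count unchanged.

pool = Pooling(pool_h=2, pool_w=2, stride=2)
y = pool.forward(out)
print(y.shape)                      # (10, 30, 12, 12): channels unchanged, H and W halved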

Implementing the CNN

import pickle
import numpy as np
from collections import OrderedDict
# Relu, Affine, SoftmaxWithLoss and numerical_gradient are the layer classes
# and helper built in the earlier chapters (e.g. the book's common.layers /
# common.gradient modules); they are assumed to be importable here.


class SimpleConvNet:
    """A simple ConvNet:

    conv - relu - pool - affine - relu - affine - softmax

    Parameters
    ----------
    input_dim : dimensions of the input data (channels, height, width)
    conv_param : hyperparameters of the convolution layer
        (filter_num, filter_size, pad, stride)
    hidden_size : number of neurons in the hidden fully connected layer
    output_size : output size (10 for MNIST)
    weight_init_std : standard deviation of the weight initialization (e.g. 0.01)
    """
    def __init__(self, input_dim=(1, 28, 28),
                 conv_param={'filter_num': 30, 'filter_size': 5, 'pad': 0, 'stride': 1},
                 hidden_size=100, output_size=10, weight_init_std=0.01):
        filter_num = conv_param['filter_num']
        filter_size = conv_param['filter_size']
        filter_pad = conv_param['pad']
        filter_stride = conv_param['stride']
        input_size = input_dim[1]
        conv_output_size = (input_size - filter_size + 2*filter_pad) / filter_stride + 1
        # the 2x2 stride-2 pooling layer halves each spatial dimension
        pool_output_size = int(filter_num * (conv_output_size/2) * (conv_output_size/2))

        # initialize the weights
        self.params = {}
        self.params['W1'] = weight_init_std * \
                            np.random.randn(filter_num, input_dim[0], filter_size, filter_size)
        self.params['b1'] = np.zeros(filter_num)
        self.params['W2'] = weight_init_std * \
                            np.random.randn(pool_output_size, hidden_size)
        self.params['b2'] = np.zeros(hidden_size)
        self.params['W3'] = weight_init_std * \
                            np.random.randn(hidden_size, output_size)
        self.params['b3'] = np.zeros(output_size)

        # build the layers
        self.layers = OrderedDict()
        self.layers['Conv1'] = Convolution(self.params['W1'], self.params['b1'],
                                           conv_param['stride'], conv_param['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        self.layers['Affine1'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W3'], self.params['b3'])

        self.last_layer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)

        return x

    def loss(self, x, t):
        """Compute the loss. x is the input data, t the teacher labels."""
        y = self.predict(x)
        return self.last_layer.forward(y, t)

    def accuracy(self, x, t, batch_size=100):
        if t.ndim != 1:
            t = np.argmax(t, axis=1)

        acc = 0.0

        for i in range(int(x.shape[0] / batch_size)):
            tx = x[i*batch_size:(i+1)*batch_size]
            tt = t[i*batch_size:(i+1)*batch_size]
            y = self.predict(tx)
            y = np.argmax(y, axis=1)
            acc += np.sum(y == tt)

        return acc / x.shape[0]

    def numerical_gradient(self, x, t):
        """Compute the gradients by numerical differentiation.

        Parameters
        ----------
        x : input data
        t : teacher labels

        Returns
        -------
        Dictionary with the gradients of each layer:
        grads['W1'], grads['W2'], ... are the weights of each layer,
        grads['b1'], grads['b2'], ... are the biases of each layer.
        """
        loss_w = lambda w: self.loss(x, t)

        grads = {}
        for idx in (1, 2, 3):
            grads['W' + str(idx)] = numerical_gradient(loss_w, self.params['W' + str(idx)])
            grads['b' + str(idx)] = numerical_gradient(loss_w, self.params['b' + str(idx)])

        return grads

    def gradient(self, x, t):
        """Compute the gradients by backpropagation.

        Parameters
        ----------
        x : input data
        t : teacher labels

        Returns
        -------
        Dictionary with the gradients of each layer:
        grads['W1'], grads['W2'], ... are the weights of each layer,
        grads['b1'], grads['b2'], ... are the biases of each layer.
        """
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.last_layer.backward(dout)

        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # collect the gradients
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Conv1'].dW, self.layers['Conv1'].db
        grads['W2'], grads['b2'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W3'], grads['b3'] = self.layers['Affine2'].dW, self.layers['Affine2'].db

        return grads

    def save_params(self, file_name="params.pkl"):
        params = {}
        for key, val in self.params.items():
            params[key] = val
        with open(file_name, 'wb') as f:
            pickle.dump(params, f)

    def load_params(self, file_name="params.pkl"):
        with open(file_name, 'rb') as f:
            params = pickle.load(f)
        for key, val in params.items():
            self.params[key] = val

        for i, key in enumerate(['Conv1', 'Affine1', 'Affine2']):
            self.layers[key].W = self.params['W' + str(i+1)]
            self.layers[key].b = self.params['b' + str(i+1)]
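A minimal usage sketch, assuming load_mnist from the book's dataset package is available; the hyperparameters are illustrative, not tuned:

from dataset.mnist import load_mnist   # assumed available from the book's code

(x_train, t_train), (x_test, t_test) = load_mnist(flatten=False)

network = SimpleConvNet()
learning_rate = 0.1

for i in range(1000):
    # plain SGD on random mini-batches of 100
    batch_mask = np.random.choice(x_train.shape[0], 100)
    x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]

    grads = network.gradient(x_batch, t_batch)   # backprop gradients
    for key in ('W1', 'b1', 'W2', 'b2', 'W3', 'b3'):
        # in-place update, so the layers (which hold references
        # to these arrays) see the new values as well
        network.params[key] -= learning_rate * grads[key]

print(network.accuracy(x_test, t_test))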