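Error Backpropagation

Topics covered:

Local computation with computational graphs
Implementing simple layers
1. Addition layer
2. Multiplication layer
Implementing activation function layers
1. ReLU layer
2. Sigmoid layer
Implementing the Affine/Softmax layers

The introduction to computational graphs is skipped here.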

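Implementing simple layers

Addition layer

Python implementation:

class AddLayer:
    def __init__(self):
        pass  # nothing to store: the backward pass does not need the inputs

    def forward(self, x, y):
        out = x + y
        return out

    def backward(self, dout):
        # Addition passes the upstream gradient through unchanged
        dx = dout * 1
        dy = dout * 1

        return dx, dy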

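Multiplication layer

Python implementation:

class MulLayer:
    def __init__(self):
        # The inputs are kept because the backward pass needs them
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        return out

    def backward(self, dout):
        # Each input receives the upstream gradient times the other input
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy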
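
As a quick check of the two layers, the sketch below chains a multiplication layer and an addition layer to compute z = x * y + w, then propagates a gradient of 1 backward through them. The layer classes are repeated so the snippet runs on its own; the input values are arbitrary and chosen only for illustration.

```python
class AddLayer:
    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        # Addition passes the upstream gradient through unchanged.
        return dout * 1, dout * 1


class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y

    def backward(self, dout):
        # Multiplication swaps the inputs: dz/dx = y, dz/dy = x.
        return dout * self.y, dout * self.x


# Illustrative values (not from the original notes).
x, y, w = 3.0, 4.0, 5.0

mul = MulLayer()
add = AddLayer()

# Forward: z = x * y + w
xy = mul.forward(x, y)
z = add.forward(xy, w)

# Backward: start from dz/dz = 1 and walk the layers in reverse order.
dxy, dw = add.backward(1.0)
dx, dy = mul.backward(dxy)

print(z)           # 17.0
print(dx, dy, dw)  # 4.0 3.0 1.0
```

The backward calls run in the opposite order to the forward calls, which is the same pattern the full network uses later.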

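Implementing activation function layers

ReLU layer

# Inputs to this layer are assumed to be NumPy arrays
class Relu:
    def __init__(self):
        self.mask = None  # True where the forward input was <= 0

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        # Note: this zeroes entries of dout in place
        dout[self.mask] = 0
        dx = dout
        return dx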
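
The following sketch shows how the mask behaves on a small array. It assumes NumPy input as stated in the notes; the class is repeated so the snippet is self-contained. As a deliberate variation, this backward pass copies `dout` rather than modifying it in place. The sample values are made up for illustration.

```python
import numpy as np


class Relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        # Copy first so the caller's array is left untouched
        # (a variation on the in-place version in the notes).
        dx = dout.copy()
        dx[self.mask] = 0
        return dx


x = np.array([[1.0, -0.5], [-2.0, 3.0]])  # illustrative input
relu = Relu()

out = relu.forward(x)
dx = relu.backward(np.ones_like(x))

print(out)  # positive entries are kept, the rest become 0
print(dx)   # gradient is 1 where x > 0 and 0 elsewhere
```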

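Sigmoid layer

The backward pass follows from ∂L/∂x = ∂L/∂y · y(1 − y), where y is the forward output. The corresponding Python implementation is:

import numpy as np

class Sigmoid:
    def __init__(self):
        self.out = None  # forward output, reused in the backward pass

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        return out

    def backward(self, dout):
        dx = dout * (1 - self.out) * self.out
        return dx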
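
One way to gain confidence in this backward pass is to compare it with a central-difference approximation. The sketch below does that at a few arbitrary points; the step size `h` and the test points are assumptions made for illustration.

```python
import numpy as np


class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        self.out = 1 / (1 + np.exp(-x))
        return self.out

    def backward(self, dout):
        return dout * (1.0 - self.out) * self.out


x = np.array([-2.0, 0.0, 1.5])  # arbitrary test points
layer = Sigmoid()
layer.forward(x)
analytic = layer.backward(np.ones_like(x))

# Central-difference approximation of dy/dx at each point.
h = 1e-4
numeric = (1 / (1 + np.exp(-(x + h))) - 1 / (1 + np.exp(-(x - h)))) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # expected to be very small
```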

Implementing the Affine/Softmax layers

Batch version of the Affine layer

Python implementation:

class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)  # sum over the batch axis

        return dx
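
In the batch version the shapes are the easiest thing to get wrong, so a small shape check can help. The sketch below uses hypothetical sizes (a batch of 4, 2 inputs, 3 outputs) and random weights. It is only meant to show that each gradient has the same shape as the quantity it belongs to, and that the bias gradient is summed over the batch.

```python
import numpy as np


class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        self.x = x
        return np.dot(x, self.W) + self.b  # b is broadcast over the batch axis

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)  # sum the bias gradient over the batch
        return dx


rng = np.random.default_rng(0)
N, n_in, n_out = 4, 2, 3  # hypothetical sizes
layer = Affine(rng.standard_normal((n_in, n_out)), np.zeros(n_out))

x = rng.standard_normal((N, n_in))
out = layer.forward(x)
dx = layer.backward(np.ones_like(out))

print(out.shape, dx.shape)             # (4, 3) (4, 2)
print(layer.dW.shape, layer.db.shape)  # (2, 3) (3,)
print(layer.db)                        # [4. 4. 4.]
```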

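Softmax-with-Loss layer

Neural network processing has two phases: inference and learning. Inference usually does not use the Softmax layer. For example, when the network in Figure 5-28 is used for inference, the output of the last Affine layer is taken as the recognition result. The unnormalized output of a network (the output of the Affine layer just before the Softmax layer in Figure 5-28) is sometimes called the "score". In other words, when inference only has to produce a single answer, only the largest score matters, and since softmax preserves the ordering of the scores, the Softmax layer is unnecessary. The learning phase, however, does need the Softmax layer.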
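
The claim that inference can skip softmax can be illustrated with a short sketch: the index of the largest score is the same before and after softmax. The scores below are hypothetical and the `softmax` helper is a minimal version written only for this example.

```python
import numpy as np


def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)  # guard against overflow
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)


# Hypothetical scores from the last Affine layer for two inputs.
scores = np.array([[2.0, -1.0, 0.5],
                   [0.1, 0.3, 4.0]])

probs = softmax(scores)

print(np.argmax(scores, axis=1))  # [0 2]
print(np.argmax(probs, axis=1))   # [0 2]
```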

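Simplifying the computational graph yields the following simplified graph:

Backpropagation through the Softmax layer gives the "clean" result (y1 − t1, y2 − t2, y3 − t3). Since (y1, y2, y3) is the output of the Softmax layer and (t1, t2, t3) is the supervised data, (y1 − t1, y2 − t2, y3 − t3) is the difference between the Softmax output and the teacher labels, and it directly expresses the error between the current network output and those labels. Backpropagation passes this error to the preceding layers, which is an important property of neural network learning.

Consider a concrete example: the teacher label is (0, 1, 0) and the Softmax output is (0.3, 0.2, 0.5). The probability at the correct label is only 0.2 (20%), so the network is not recognizing the input correctly. Here the Softmax layer backpropagates a large error, (0.3, −0.8, 0.5). Because this large error propagates to the preceding layers, those layers learn a correspondingly "large" amount from it.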
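
The numbers in this example can be reproduced and checked numerically. The sketch below computes y − t directly, then compares it with a central-difference gradient of the cross-entropy loss with respect to the scores. The scores are taken as log(y), an assumption made only so that their softmax equals the stated output; the helper names are illustrative.

```python
import numpy as np


def softmax(a):
    a = a - np.max(a)
    e = np.exp(a)
    return e / np.sum(e)


def loss(a, t):
    # Cross-entropy of softmax(a) against a one-hot label t.
    return -np.sum(t * np.log(softmax(a)))


t = np.array([0.0, 1.0, 0.0])
y = np.array([0.3, 0.2, 0.5])
a = np.log(y)  # scores whose softmax is exactly y (y sums to 1)

analytic = y - t

# Central differences with respect to each score.
h = 1e-4
numeric = np.zeros_like(a)
for i in range(a.size):
    step = np.zeros_like(a)
    step[i] = h
    numeric[i] = (loss(a + step, t) - loss(a - step, t)) / (2 * h)

print(analytic)              # the (0.3, -0.8, 0.5) from the text
print(np.round(numeric, 6))  # should agree with the line above
```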

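With the cross-entropy error as the loss function for softmax, backpropagation yields the "clean" result (y1 − t1, y2 − t2, y3 − t3). This is no coincidence: the cross-entropy error was designed so that this result comes out. For regression problems, the output layer uses the identity function and the loss function is the sum-of-squares error for the same reason (Section 3.5). That is, it is with the sum-of-squares error as the loss function for the identity function that backpropagation yields the same clean result (y1 − t1, y2 − t2, y3 − t3).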
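
The regression case can be checked the same way. The sketch below assumes the sum-of-squares error is defined with a factor of 1/2, i.e. E = ½ Σ (y_k − t_k)², and uses made-up outputs and targets. With an identity output layer the scores are the outputs themselves, so the gradient with respect to the scores should be y − t.

```python
import numpy as np


def sum_squared_error(y, t):
    # Assumed definition: half the sum of squared differences.
    return 0.5 * np.sum((y - t) ** 2)


# Hypothetical regression outputs and targets.
y = np.array([1.2, -0.4, 3.0])  # identity output layer: y equals the scores
t = np.array([1.0, 0.0, 2.5])

analytic = y - t

# Central differences with respect to each output.
h = 1e-4
numeric = np.zeros_like(y)
for i in range(y.size):
    step = np.zeros_like(y)
    step[i] = h
    numeric[i] = (sum_squared_error(y + step, t)
                  - sum_squared_error(y - step, t)) / (2 * h)

print(analytic)
print(numeric)  # should match y - t up to rounding
```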

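Python implementation of the cross-entropy error:

import numpy as np

def cross_entropy_error(y, t):
    # Promote a single sample to a batch of size 1
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    # If t is one-hot encoded, convert it to class indices
    if t.size == y.size:
        t = t.argmax(axis=1)

    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size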
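
A short usage sketch: the function is meant to accept the labels either one-hot encoded or as class indices, and both forms should give the same loss. The function is repeated with the `reshape` and `argmax` results assigned back so that it runs on its own; the probabilities and labels are made up.

```python
import numpy as np


def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    if t.size == y.size:
        t = t.argmax(axis=1)  # one-hot -> class indices
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size


# Illustrative softmax outputs for a batch of two samples.
y = np.array([[0.3, 0.2, 0.5],
              [0.1, 0.8, 0.1]])
t_one_hot = np.array([[0, 1, 0],
                      [0, 1, 0]])
t_index = np.array([1, 1])

print(cross_entropy_error(y, t_one_hot))
print(cross_entropy_error(y, t_index))  # same value as the line above
```

The first sample contributes much more to the loss than the second, because its probability at the correct label is only 0.2.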

Implementing the Softmax-with-Loss layer

def softmax(x):
    if x.ndim == 2:
        x = x.T 
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T 

    x = x - np.max(x)  # guard against overflow
    return np.exp(x) / np.sum(np.exp(x))

class SoftmaxWithLoss:
    def __init__(self):
        self.loss=None
        self.y=None
        self.t=None
    
    def forward(self,x,t):
        self.t=t
        self.y=softmax(x)
        self.loss=cross_entropy_error(self.y,self.t)
        
        return self.loss
    
    def backward(self,dout):
        batch_size=self.t.shape[0]
        dx=(self.y-self.t)/batch_size
        
        return dx

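Implementing error backpropagation

The overall picture of neural network learning
- Premise
  - A neural network has adaptable weights and biases; adjusting them so that they fit the training data is called learning. Learning proceeds in the following four steps.
- Step 1 (mini-batch)
  - Randomly select a subset of the training data.
- Step 2 (compute gradients)
  - Compute the gradient of the loss function with respect to each weight parameter.
- Step 3 (update parameters)
  - Update the weight parameters by a small step in the direction opposite to the gradient, which lowers the loss.
- Step 4 (repeat)
  - Repeat steps 1, 2, and 3.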
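
The four steps can be sketched on a problem much smaller than a neural network. The example below is not the network from these notes: it fits a hypothetical one-dimensional linear model to synthetic data, purely to show the shape of the loop. All names, sizes, and the learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for a hypothetical 1-D linear model: target = 2 * x + 1.
x_train = rng.uniform(-1.0, 1.0, size=200)
t_train = 2.0 * x_train + 1.0

w, b = 0.0, 0.0  # parameters to learn
learning_rate = 0.1
batch_size = 20

for step in range(500):  # step 4: repeat
    # Step 1: pick a random mini-batch.
    idx = rng.choice(x_train.size, batch_size)
    x, t = x_train[idx], t_train[idx]

    # Step 2: gradient of the mean squared error w.r.t. w and b.
    err = (w * x + b) - t
    grad_w = np.mean(2.0 * err * x)
    grad_b = np.mean(2.0 * err)

    # Step 3: move a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should end up close to 2 and 1
```

In the real network, step 2 is where backpropagation replaces the hand-written gradient formulas.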

Implementing TwoLayerNet with backpropagation

import numpy as np
from collections import OrderedDict

def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)
    
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x) # f(x+h)
        
        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val  # restore the original value
        it.iternext()  # move on to the next element
      
    return grad

def Softmax(x):
    if x.ndim==2:
        x=x.T 
        x=x-np.max(x,axis=0)
        y=np.exp(x)/np.sum(np.exp(x),axis=0)
        return y.T 
    
    x=x-np.max(x)
    y=np.exp(x)/np.sum(np.exp(x))
    return y

def Sigmoid(x):
    return 1/(1+np.exp(-x))

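def cross_entropy_error(y, t):
    # Promote a single sample to a batch of size 1
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)

    # If t is one-hot encoded, convert it to class indices
    if t.size == y.size:
        t = t.argmax(axis=1)

    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size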

#----------------------------------------------------------------

class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None  # read by TwoLayerNet.gradient
        self.db = None

    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)  # bias gradient: sum dout over the batch
        return dx


class Relu:
    def __init__ (self):
        self.mask=None
    
    def forward(self,x): 
        self.mask=(x<=0)
        out=x.copy()
        out[self.mask]=0
        return out

    def backward(self,dout):
        dout[self.mask]=0
        dx=dout
        return dx

class SoftmaxWithLoss:
    def __init__(self):
        self.loss=None
        self.y=None
        self.t=None

    def forward(self,x,t):
        self.t=t
        self.y=Softmax(x)
        self.loss=cross_entropy_error(self.y,self.t)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx=(self.y-self.t)/batch_size
        return dx

#---------------------------------------------------------------------
class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std = 0.01):
        # Initialize weights
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size) 
        self.params['b2'] = np.zeros(output_size)

        # Build the layers
        self.layers = OrderedDict()
        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1'])
        self.layers['Relu1'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2'])

        self.lastLayer = SoftmaxWithLoss()
        
    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        
        return x
        
    # x: input data, t: supervised data (labels)
    def loss(self, x, t):
        y = self.predict(x)
        return self.lastLayer.forward(y, t)
    
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1 : 
            t = np.argmax(t, axis=1)
        
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy
        
    # x: input data, t: supervised data (labels)
    def numerical_gradient(self, x, t):
        loss_W = lambda W: self.loss(x, t)
        
        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
        
        return grads
        
    def gradient(self, x, t):
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.lastLayer.backward(dout)
        
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # Collect the gradients
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W2'], grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db

        return grads

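Gradient check

Numerical differentiation is simple to implement, so it is unlikely to contain mistakes. Backpropagation is more complex to implement and therefore more error-prone. For this reason, the result of numerical differentiation is often compared with the result of backpropagation to confirm that the backpropagation implementation is correct. Checking that the gradients from numerical differentiation and from backpropagation agree (strictly speaking, that they are very close) is called a gradient check.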

# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # allow imports from the parent directory
import numpy as np
from mnist import load_mnist
from two_layer_net import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

x_batch = x_train[:3]
t_batch = t_train[:3]

grad_numerical = network.numerical_gradient(x_batch, t_batch)
grad_backprop = network.gradient(x_batch, t_batch)

for key in grad_numerical.keys():
    diff = np.average( np.abs(grad_backprop[key] - grad_numerical[key]) )
    print(key + ":" + str(diff))

Neural network training (backpropagation version)

import sys, os
sys.path.append(os.pardir)

import numpy as np
from mnist import load_mnist
from two_layer_net import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size // batch_size, 1)  # integer number of iterations per epoch

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    
    # Gradient
    #grad = network.numerical_gradient(x_batch, t_batch)
    grad = network.gradient(x_batch, t_batch)
    
    # Update
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(train_acc, test_acc)