(2) Introduction to Neural Networks: Logistic Regression (Classification)

pf_miles, posted 2019-07-30 15:18 / 3705 views


Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: https://www.jianshu.com/p/d94...

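This tutorial is a translation of the neural network tutorial written by Peter Roelants, published with the author's permission.

The series is an introduction to neural networks in five parts. The full contents are listed below.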

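(1) Introduction to Neural Networks: Linear Regression

Logistic Classification Function

(2) Introduction to Neural Networks: Logistic Regression (Classification)

(3) Introduction to Neural Networks: Hidden Layer Design

Softmax Classification Function

(4) Introduction to Neural Networks: Vectorization

(5) Introduction to Neural Networks: Building a Multilayer Network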

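Logistic Regression (Classification)

This part of the tutorial covers the following topic:

The logistic classification model

In the previous tutorial we built a very simple model with one input and one output. In this tutorial we build a binary classification model whose input consists of two variables. In statistics this model is known as logistic regression. The network structure can be described as follows: two inputs $x_1$ and $x_2$, weighted by $w_1$ and $w_2$, feed a single logistic output unit $y$.

We first import the packages needed for this tutorial.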

import numpy as np 
import matplotlib.pyplot as plt 
from matplotlib.colors import colorConverter, ListedColormap
from matplotlib import cm

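Defining the class distributions

In this tutorial the target class t is generated from two independent distributions: samples with t=1 are shown in blue and samples with t=0 in red. The input X is an N×2 matrix and the target t is an N×1 vector. The code below generates the samples and plots them for a more intuitive view.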

# Define and generate the samples
nb_of_samples_per_class = 20  # The number of samples in each class
red_mean = [-1,0]  # The mean of the red class
blue_mean = [1,0]  # The mean of the blue class
std_dev = 1.2  # standard deviation of both classes
# Generate samples from both classes
x_red = np.random.randn(nb_of_samples_per_class, 2) * std_dev + red_mean
x_blue = np.random.randn(nb_of_samples_per_class, 2) * std_dev + blue_mean

# Merge samples in set of input variables x, and corresponding set of output variables t
X = np.vstack((x_red, x_blue))
t = np.vstack((np.zeros((nb_of_samples_per_class,1)), np.ones((nb_of_samples_per_class,1))))

# Plot both classes on the x1, x2 plane
plt.plot(x_red[:,0], x_red[:,1], "ro", label="class red")
plt.plot(x_blue[:,0], x_blue[:,1], "bo", label="class blue")
plt.grid()
plt.legend(loc=2)
plt.xlabel("$x_1$", fontsize=15)
plt.ylabel("$x_2$", fontsize=15)
plt.axis([-4, 4, -4, 4])
plt.title("red vs. blue classes in the input space")
plt.show()

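The logistic function and the cross-entropy cost function

The logistic function

The goal of the network is to predict the target t from the input x. Suppose the input is $x = [x_1, x_2]$, the weights are $w = [w_1, w_2]$, and the target class is $t = 1$. The probability $P(t = 1 \mid x, w)$ is then the output $y$ of the network, that is, $y = \sigma(x \cdot w^T)$, where $\sigma$ is the logistic function, defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

If the logistic function and its derivative are unfamiliar, see the tutorial on the logistic classification function in this series, which describes them in detail.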
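As a small illustration of the logistic function and its derivative, the sketch below compares the standard analytic derivative σ(z)(1 − σ(z)) against a central finite difference. The names `logistic_derivative`, `eps`, and `max_abs_diff` are introduced here for illustration only and are not part of the tutorial's code.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_derivative(z):
    """Analytic derivative of the logistic function: sigma(z) * (1 - sigma(z))."""
    s = logistic(z)
    return s * (1.0 - s)

# Evaluate on a few points and compare with a central finite difference.
z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6  # finite-difference step (an illustrative choice)
numeric = (logistic(z + eps) - logistic(z - eps)) / (2 * eps)
max_abs_diff = np.max(np.abs(numeric - logistic_derivative(z)))

print("sigma(0) =", logistic(0.0))
print("largest gap between numeric and analytic derivative:", max_abs_diff)
```

The gap should be tiny, which is a quick sanity check that the closed-form derivative matches the function.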

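The cross-entropy cost function

To optimize the cost for this classification problem we use the cross-entropy error function. For each training sample $i$ it is defined as:

$$\xi(t_i, y_i) = -t_i \log(y_i) - (1 - t_i) \log(1 - y_i)$$

The cross-entropy error over the whole training set is simply the sum over all $N$ samples:

$$\xi(t, y) = -\sum_{i=1}^{N} \left[ t_i \log(y_i) + (1 - t_i) \log(1 - y_i) \right]$$

A more detailed introduction to the cross-entropy error function can be found in the tutorial on the logistic classification function in this series.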
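To make the summed cross-entropy concrete, here is a minimal worked example on three hand-picked samples. The arrays `t`, `y_good`, and `y_bad` are made-up values for illustration; `cross_entropy` is a hypothetical helper that follows the summed formula.

```python
import numpy as np

def cross_entropy(y, t):
    """Summed cross-entropy between predictions y in (0, 1) and binary targets t."""
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Three samples with targets 1, 1, 0 (illustrative values only).
t = np.array([1.0, 1.0, 0.0])
y_good = np.array([0.9, 0.8, 0.1])  # confident and correct predictions
y_bad = np.array([0.1, 0.2, 0.9])   # confident but wrong predictions

cost_good = cross_entropy(y_good, t)
cost_bad = cross_entropy(y_bad, t)
print("cost for good predictions:", cost_good)
print("cost for bad predictions: ", cost_bad)
```

Predictions close to the targets give a small cost, while confident wrong predictions are penalized heavily, which is why this cost suits probabilistic classifiers.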

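The function logistic(z) implements the logistic function, cost(y, t) implements the cost function, nn(x, w) computes the output of the network, and nn_predict(x, w) returns the class predicted by the network.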

# Define the logistic function
def logistic(z): 
    return 1 / (1 + np.exp(-z))

# Define the neural network function y = 1 / (1 + numpy.exp(-x*w))
def nn(x, w): 
    return logistic(x.dot(w.T))

# Define the neural network prediction function that only returns
#  1 or 0 depending on the predicted class
def nn_predict(x,w): 
    return np.around(nn(x,w))
    
# Define the cost function
def cost(y, t):
    return - np.sum(np.multiply(t, np.log(y)) + np.multiply((1-t), np.log(1-y)))

# Plot the cost as a function of the weights
# Define a vector of weights for which we want to plot the cost
nb_of_ws = 100 # compute the cost nb_of_ws times in each dimension
ws1 = np.linspace(-5, 5, num=nb_of_ws) # weight 1
ws2 = np.linspace(-5, 5, num=nb_of_ws) # weight 2
ws_x, ws_y = np.meshgrid(ws1, ws2) # generate grid
cost_ws = np.zeros((nb_of_ws, nb_of_ws)) # initialize cost matrix
# Fill the cost matrix for each combination of weights
for i in range(nb_of_ws):
    for j in range(nb_of_ws):
        cost_ws[i,j] = cost(nn(X, np.asmatrix([ws_x[i,j], ws_y[i,j]])) , t)
# Plot the cost function surface
plt.contourf(ws_x, ws_y, cost_ws, 20, cmap=cm.pink)
cbar = plt.colorbar()
cbar.ax.set_ylabel("$\\xi$", fontsize=15)
plt.xlabel("$w_1$", fontsize=15)
plt.ylabel("$w_2$", fontsize=15)
plt.title("Cost function surface")
plt.grid()
plt.show()

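Gradient descent optimization of the cost function

Gradient descent works by taking the derivative of the cost function $\xi$ with respect to each parameter and then updating the parameters in the direction of the negative gradient.

The weights $w$ are updated along the negative gradient, scaled by a learning rate: $w(k+1) = w(k) - \Delta w(k+1)$, where $\Delta w$ is defined as:

$$\Delta w = \mu \frac{\partial \xi}{\partial w}$$

with $\mu$ the learning rate.

For each training sample $i$, $\partial \xi_i / \partial w$ is computed with the chain rule:

$$\frac{\partial \xi_i}{\partial w} = \frac{\partial z_i}{\partial w} \frac{\partial y_i}{\partial z_i} \frac{\partial \xi_i}{\partial y_i}$$

where $y_i = \sigma(z_i)$ is the logistic output of the neuron and $z_i = x_i \cdot w^T$ is its input.

Before deriving the derivative of the cost with respect to the weights in detail, we first take a few results from the tutorial on the logistic classification function in this series:

$$\frac{\partial \xi_i}{\partial y_i} = \frac{y_i - t_i}{y_i (1 - y_i)}, \qquad \frac{\partial y_i}{\partial z_i} = y_i (1 - y_i), \qquad \frac{\partial z_i}{\partial w} = x_i$$

Combining these step-by-step results gives the full derivative:

$$\frac{\partial \xi_i}{\partial w} = x_i \cdot y_i (1 - y_i) \cdot \frac{y_i - t_i}{y_i (1 - y_i)} = x_i (y_i - t_i)$$

The update $\Delta w_j$ for each weight can therefore be written as:

$$\Delta w_j = \mu \frac{\partial \xi_i}{\partial w_j} = \mu \, x_{ij} (y_i - t_i)$$

In batch processing, the gradients of all $N$ samples are summed:

$$\Delta w_j = \mu \sum_{i=1}^{N} x_{ij} (y_i - t_i)$$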
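The batch gradient, the sum over samples of x_i (y_i − t_i), can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it uses plain 1-D NumPy arrays rather than the row-vector matrices used elsewhere in this tutorial, random made-up data, and hypothetical helper names (`analytic_gradient`, `numeric_gradient`).

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, X, t):
    """Summed cross-entropy of the logistic model for weights w."""
    y = logistic(X @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def analytic_gradient(w, X, t):
    """Batch gradient: sum over samples of x_i * (y_i - t_i)."""
    return X.T @ (logistic(X @ w) - t)

def numeric_gradient(w, X, t, eps=1e-6):
    """Central finite-difference approximation of the gradient of the cost."""
    grad = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (cost(w + step, X, t) - cost(w - step, X, t)) / (2 * eps)
    return grad

# Made-up data: 10 samples with 2 features and random binary targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
t = (rng.random(10) > 0.5).astype(float)
w = np.array([0.3, -0.7])

grad_analytic = analytic_gradient(w, X, t)
grad_numeric = numeric_gradient(w, X, t)
max_abs_diff = np.max(np.abs(grad_analytic - grad_numeric))
print("analytic:", grad_analytic)
print("numeric: ", grad_numeric)
```

If the derivation is right, the two gradients agree up to finite-difference error.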

Before starting gradient descent, initialize the parameters with random values; then apply gradient descent updates until the parameters converge.
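The initialize-then-iterate procedure can be sketched as follows. This is only an illustration, not the tutorial's own implementation: the stopping rule (step norm below `tol`), the smaller default learning rate, the `max_iter` cap, and the helper name `fit_logistic` are all assumptions, and the data is a made-up sample that mimics the two classes defined earlier.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, t, learning_rate=0.01, tol=1e-6, max_iter=10000, seed=0):
    """Gradient descent from random initial weights until the update is tiny."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])  # random initialization
    for k in range(max_iter):
        grad = X.T @ (logistic(X @ w) - t)      # batch gradient of the cost
        w_new = w - learning_rate * grad        # step along the negative gradient
        if np.linalg.norm(w_new - w) < tol:     # assumed convergence criterion
            return w_new, k + 1
        w = w_new
    return w, max_iter

# Made-up data: red class (t=0) around [-1, 0], blue class (t=1) around [1, 0].
rng = np.random.default_rng(1)
n = 20
X = np.vstack((rng.normal(size=(n, 2)) * 1.2 + [-1, 0],
               rng.normal(size=(n, 2)) * 1.2 + [1, 0]))
t = np.concatenate((np.zeros(n), np.ones(n)))

w_fit, n_steps = fit_logistic(X, t)
print("weights:", w_fit, "after", n_steps, "updates")
```

A different stopping rule, such as a threshold on the change in cost, would work equally well; the step-norm test is just one simple choice.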

The function gradient(w, x, t) implements the gradient $\partial \xi / \partial w$, and delta_w(w_k, x, t, learning_rate) implements $\Delta w$.

# define the gradient function.
def gradient(w, x, t):
    return (nn(x, w) - t).T * x

# define the update function delta w which returns the 
#  delta w for each weight in a vector
def delta_w(w_k, x, t, learning_rate):
    return learning_rate * gradient(w_k, x, t)

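Gradient descent updates

We run 10 gradient descent updates on the training set X. The figure produced by the code below shows the first three updates; the blue dots mark the weight values w(k) at iteration k.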

# Set the initial weight parameter
w = np.asmatrix([-4, -2])
# Set the learning rate
learning_rate = 0.05

# Start the gradient descent updates and plot the iterations
nb_of_iterations = 10  # Number of gradient descent updates
w_iter = [w]  # List to store the weight values over the iterations
for i in range(nb_of_iterations):
    dw = delta_w(w, X, t, learning_rate)  # Get the delta w update
    w = w-dw  # Update the weights
    w_iter.append(w)  # Store the weights for plotting

# Plot the first weight updates on the error surface
# Plot the error surface
plt.contourf(ws_x, ws_y, cost_ws, 20, alpha=0.9, cmap=cm.pink)
cbar = plt.colorbar()
cbar.ax.set_ylabel("cost")

# Plot the updates
for i in range(1, 4): 
    w1 = w_iter[i-1]
    w2 = w_iter[i]
    # Plot the weight-cost value and the line that represents the update
    plt.plot(w1[0,0], w1[0,1], "bo")  # Plot the weight cost value
    plt.plot([w1[0,0], w2[0,0]], [w1[0,1], w2[0,1]], "b-")
    plt.text(w1[0,0]-0.2, w1[0,1]+0.4, "$w({})$".format(i), color="b")
w1 = w_iter[3]  
# Plot the last weight
plt.plot(w1[0,0], w1[0,1], "bo")
plt.text(w1[0,0]-0.2, w1[0,1]+0.4, "$w({})$".format(4), color="b") 
# Show figure
plt.xlabel("$w_1$", fontsize=15)
plt.ylabel("$w_2$", fontsize=15)
plt.title("Gradient descent updates on cost surface")
plt.grid()
plt.show()

Visualizing the training result

The following code visualizes the result of training.

# Plot the resulting decision boundary
# Generate a grid over the input space to plot the color of the
#  classification at that grid point
nb_of_xs = 200
xs1 = np.linspace(-4, 4, num=nb_of_xs)
xs2 = np.linspace(-4, 4, num=nb_of_xs)
xx, yy = np.meshgrid(xs1, xs2) # create the grid
# Initialize and fill the classification plane
classification_plane = np.zeros((nb_of_xs, nb_of_xs))
for i in range(nb_of_xs):
    for j in range(nb_of_xs):
        # nn_predict returns a 1x1 matrix here; take its single element explicitly
        classification_plane[i,j] = nn_predict(np.asmatrix([xx[i,j], yy[i,j]]), w)[0, 0]
# Create a color map to show the classification colors of each grid point
cmap = ListedColormap([
        colorConverter.to_rgba("r", alpha=0.30),
        colorConverter.to_rgba("b", alpha=0.30)])

# Plot the classification plane with decision boundary and input samples
plt.contourf(xx, yy, classification_plane, cmap=cmap)
plt.plot(x_red[:,0], x_red[:,1], "ro", label="target red")
plt.plot(x_blue[:,0], x_blue[:,1], "bo", label="target blue")
plt.grid()
plt.legend(loc=2)
plt.xlabel("$x_1$", fontsize=15)
plt.ylabel("$x_2$", fontsize=15)
plt.title("red vs. blue classification boundary")
plt.show()
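As a numeric complement to the decision-boundary plot, the fraction of correctly classified samples can be computed directly. The sketch below uses plain 1-D arrays and hand-picked illustrative values (`X_demo`, `t_demo`, `w_demo`); the helpers `predict` and `accuracy` are hypothetical names, with `predict` rounding the logistic output in the same way as nn_predict.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w):
    """Round the logistic output to the nearest class label (0 or 1)."""
    return np.around(logistic(X @ w))

def accuracy(X, t, w):
    """Fraction of samples whose predicted class equals the target."""
    return np.mean(predict(X, w) == t)

# Hand-picked points and weights, for illustration only.
X_demo = np.array([[-2.0, 0.5], [-1.0, -1.0], [1.5, 0.0], [2.0, 1.0]])
t_demo = np.array([0.0, 0.0, 1.0, 1.0])
w_demo = np.array([1.0, 0.0])  # classify by the sign of x_1

acc = accuracy(X_demo, t_demo, w_demo)
print("accuracy:", acc)
```

With these weights the decision boundary is the line x_1 = 0, so every demo point falls on the correct side.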

For the complete code, click here.


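Copyright of this article belongs to the author; please do not repost it without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reposting, please cite the URL of this article: https://www.ucloud.cn/yun/41174.html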
