Implementing Gradient Descent in TensorFlow with tf.gradients

Posted by ckllj on 2019-07-30 15:10 / 669 reads

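Summary: Before implementing gradient descent with tf.gradients, we first use one of TensorFlow's built-in optimizers, such as GradientDescentOptimizer, to classify the MNIST dataset with softmax regression. We then run the same softmax regression with tf.gradients: instead of the optimizer code, the weights are updated by a hand-written gradient descent step.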

Author: chen_h
WeChat & QQ: 862251340
WeChat official account: coderpai
Jianshu: http://www.jianshu.com/p/13e0...

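One of the reasons I like TensorFlow is that it can compute the gradient of a function automatically. We only need to define the function and then call tf.gradients. It really is that simple.

Let's walk through a concrete example.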

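Softmax regression on MNIST with TensorFlow's built-in optimizer

Before implementing gradient descent with tf.gradients, let's first solve the MNIST classification problem with one of TensorFlow's built-in optimizers, such as GradientDescentOptimizer. The code uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).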

import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100
display_step = 1


# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10]) # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Start training
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_xs,
                                                       y: batch_ys})
            
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print ("Epoch:", "%04d" % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print ("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print ("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))
    
    
#### Output
    
# Extracting /tmp/data/train-images-idx3-ubyte.gz
# Extracting /tmp/data/train-labels-idx1-ubyte.gz
# Extracting /tmp/data/t10k-images-idx3-ubyte.gz
# Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
# Epoch: 0001 cost= 1.184285608
# Epoch: 0002 cost= 0.665428013
# Epoch: 0003 cost= 0.552858426
# Epoch: 0004 cost= 0.498728328
# Epoch: 0005 cost= 0.465593693
# Epoch: 0006 cost= 0.442609185
# Epoch: 0007 cost= 0.425552949
# Epoch: 0008 cost= 0.412188290
# Epoch: 0009 cost= 0.401390140
# Epoch: 0010 cost= 0.392354651
# Optimization Finished!
# Accuracy: 0.873333
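
To make the model and the loss above concrete, here is a minimal sketch in plain Python (standard library only, so it runs without TensorFlow) of what tf.nn.softmax and the cross-entropy cost compute for a single example. The helper names softmax and cross_entropy and the choice of digit 3 as the label are illustrative assumptions, not part of the original code.

```python
import math

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_onehot, pred):
    """-sum(y * log(pred)) for one example with a one-hot label."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, pred) if y > 0)

# With W and b initialised to zeros, every logit is 0, so each of the
# 10 classes gets probability 0.1 and the loss equals log(10).
pred = softmax([0.0] * 10)
label = [0.0] * 10
label[3] = 1.0                           # hypothetical true class: digit 3
loss = cross_entropy(label, pred)
print(pred[0], loss)
```

The starting loss of log(10), roughly 2.30, is the value the cost begins at before any training step, since the weights in the article are initialised with tf.zeros.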

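So what we did here was let the built-in optimizer compute the gradients and update the weights to minimize the loss. What if we want to compute the gradients and update the weights ourselves? That is what tf.gradients is for.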

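Softmax regression on MNIST with tf.gradients

By the gradient descent formula, the weights are updated as follows:

$$W \leftarrow W - \alpha\,\frac{\partial\,\mathrm{cost}}{\partial W}, \qquad b \leftarrow b - \alpha\,\frac{\partial\,\mathrm{cost}}{\partial b}$$

where $\alpha$ is the learning rate (learning_rate in the code).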

To implement gradient descent, I will not use the optimizer code; instead I will write the weight update myself.
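
As a warm-up for the hand-written update, the rule can be sketched on a one-dimensional toy problem in plain Python. The loss f(w) = (w - 3)^2, the starting point, and the step count are assumptions made up for this illustration; only the update line mirrors what the article does with W and b.

```python
def grad_f(w):
    """Derivative of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

learning_rate = 0.1   # plays the same role as learning_rate in the article
w = 0.0               # starting point, chosen arbitrarily

for step in range(100):
    # Gradient step: move against the gradient, scaled by the learning rate
    w = w - learning_rate * grad_f(w)

print(w)  # approaches the minimiser w = 3
```

In the TensorFlow version the derivative is not written by hand as in grad_f; it is obtained from tf.gradients, and the assignment is done with W.assign and b.assign.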

Since the model has a weight matrix W and a bias vector b, we need the gradients of the cost with respect to both. The code is as follows:

# Computing the gradient of cost with respect to W and b
grad_W, grad_b = tf.gradients(xs=[W, b], ys=cost)

# Gradient Step
new_W = W.assign(W - learning_rate * grad_W)
new_b = b.assign(b - learning_rate * grad_b)

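These three lines merely replace a single line of the earlier code, so why go to the trouble? Because when you need the gradient of your own loss function and do not want to derive it by hand, TensorFlow can compute it for you.

The computation graph is already built, so all that remains is to run it in a session. Let's try it.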

# Fit training using batch data (this line runs inside the batch loop)
_, _, c = sess.run([new_W, new_b, cost], feed_dict={x: batch_xs, y: batch_ys})

We do not need the values returned for new_W and new_b, so I discard them.

The complete code is as follows:

import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10]) # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

grad_W, grad_b = tf.gradients(xs=[W, b], ys=cost)


new_W = W.assign(W - learning_rate * grad_W)
new_b = b.assign(b - learning_rate * grad_b)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, _,  c = sess.run([new_W, new_b ,cost], feed_dict={x: batch_xs,
                                                       y: batch_ys})
            
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print ("Epoch:", "%04d" % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print ("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print ("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))
    
    
# Output
# Epoch: 0001 cost= 1.183741399
# Epoch: 0002 cost= 0.665312284
# Epoch: 0003 cost= 0.552796521
# Epoch: 0004 cost= 0.498697014
# Epoch: 0005 cost= 0.465521633
# Epoch: 0006 cost= 0.442611256
# Epoch: 0007 cost= 0.425528946
# Epoch: 0008 cost= 0.412203073
# Epoch: 0009 cost= 0.401364554
# Epoch: 0010 cost= 0.392398663
# Optimization Finished!
# Accuracy: 0.874

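Softmax regression with the gradient formula

For the weights W, the gradient of the cross entropy summed over a batch is:

$$\frac{\partial}{\partial W}\Big(-\sum_{n}\sum_{k} y_{nk}\,\log \mathrm{pred}_{nk}\Big) = -x^{\top}(y - \mathrm{pred})$$

As shown above, this gradient equation can be implemented directly, without tf.gradients or TensorFlow's built-in optimizers. The complete code is as follows: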

import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 10
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder(tf.float32, [None, 10]) # 0-9 digits recognition => 10 classes

# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
# Note: unlike the two earlier versions, the bias b is not added to the logits here
pred = tf.nn.softmax(tf.matmul(x, W)) # Softmax

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))


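# Hand-written gradient of the cross entropy with respect to W.
# x^T (y - pred) sums over the batch instead of averaging, so each step is
# batch_size times larger than a step on the mean cost used with tf.gradients.
W_grad =  - tf.matmul ( tf.transpose(x) , y - pred) 
# b does not enter pred above, so this update does not change the predictions.
# For a model that includes the bias, the summed gradient would be
# -tf.reduce_sum(y - pred, axis=0).
b_grad = - tf.reduce_mean( tf.matmul(tf.transpose(x), y - pred), reduction_indices=0)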

new_W = W.assign(W - learning_rate * W_grad)
new_b = b.assign(b - learning_rate * b_grad)

init = tf.global_variables_initializer()


with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            _, _, c = sess.run([new_W, new_b, cost], feed_dict={x: batch_xs, y: batch_ys})
            
        
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print ("Epoch:", "%04d" % (epoch+1), "cost=", "{:.9f}".format(avg_cost))

    print ("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy for 3000 examples
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print ("Accuracy:", accuracy.eval({x: mnist.test.images[:3000], y: mnist.test.labels[:3000]}))
    
    
# Output
# Extracting /tmp/data/train-images-idx3-ubyte.gz
# Extracting /tmp/data/train-labels-idx1-ubyte.gz
# Extracting /tmp/data/t10k-images-idx3-ubyte.gz
# Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
# Epoch: 0001 cost= 0.432943137
# Epoch: 0002 cost= 0.330031527
# Epoch: 0003 cost= 0.313661941
# Epoch: 0004 cost= 0.306443773
# Epoch: 0005 cost= 0.300219418
# Epoch: 0006 cost= 0.298976618
# Epoch: 0007 cost= 0.293222957
# Epoch: 0008 cost= 0.291407861
# Epoch: 0009 cost= 0.288372261
# Epoch: 0010 cost= 0.286749691
# Optimization Finished!
# Accuracy: 0.898
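
The gradient expression used for W_grad can be sanity-checked numerically. The sketch below, in plain Python without TensorFlow, compares -x^T (y - pred) against central finite differences of the batch-summed cross entropy. The toy batch (3 examples, 2 features, 3 classes) and all helper names are assumptions made up for the check; they do not come from the article.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def predict(X, W):
    """pred = softmax(X W), one row per example (no bias, as in this section)."""
    return [softmax([sum(x[i] * W[i][k] for i in range(len(W)))
                     for k in range(len(W[0]))]) for x in X]

def summed_cost(X, Y, W):
    """Cross entropy summed over the batch: -sum_n sum_k y_nk * log(pred_nk)."""
    P = predict(X, W)
    return -sum(y * math.log(p) for yr, pr in zip(Y, P) for y, p in zip(yr, pr))

def analytic_grad(X, Y, W):
    """-X^T (Y - pred), the same expression as W_grad in the article."""
    P = predict(X, W)
    return [[-sum(X[n][i] * (Y[n][k] - P[n][k]) for n in range(len(X)))
             for k in range(len(W[0]))] for i in range(len(W))]

def numeric_grad(X, Y, W, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    G = [[0.0] * len(W[0]) for _ in W]
    for i in range(len(W)):
        for k in range(len(W[0])):
            old = W[i][k]
            W[i][k] = old + eps
            up = summed_cost(X, Y, W)
            W[i][k] = old - eps
            down = summed_cost(X, Y, W)
            W[i][k] = old                 # restore the original weight
            G[i][k] = (up - down) / (2 * eps)
    return G

# Hypothetical toy batch: 3 examples, 2 features, 3 classes
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
Y = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
W = [[0.1, -0.2, 0.05], [0.0, 0.3, -0.1]]

A = analytic_grad(X, Y, W)
N = numeric_grad(X, Y, W)
max_diff = max(abs(a - n) for ar, nr in zip(A, N) for a, n in zip(ar, nr))
print(max_diff)  # should be tiny if the formula is right
```

Dividing the analytic gradient by the number of examples gives the gradient of the mean cost, which is the quantity tf.gradients differentiates in the previous section.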

How does TensorFlow compute gradients?

You may be wondering how TensorFlow computes the gradient of a function.

TensorFlow uses a method called Automatic Differentiation; see Wikipedia for the details.
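
To give a feel for the idea, here is a toy reverse-mode automatic differentiation sketch in plain Python. It is not TensorFlow's implementation, which works on tensors and graph operations; the Var class, the log helper, and backward are hypothetical names invented for this illustration. Each node records its parents together with the local derivative, and backward applies the chain rule from the output back to the inputs.

```python
import math

class Var:
    """A scalar node that remembers how it was computed."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def log(v):
    return Var(math.log(v.value), ((v, 1.0 / v.value),))

def backward(output):
    """Accumulate d(output)/d(node) into node.grad via the chain rule."""
    # Build a topological order so that a node's gradient is complete
    # before it is passed on to its parents.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

# f(a, b) = a * b + log(a), so df/da = b + 1/a and df/db = a
a, b = Var(2.0), Var(3.0)
f = a * b + log(a)
backward(f)
print(a.grad, b.grad)  # 3.5 2.0
```

The same bookkeeping, applied to the operations in the cost graph, is what lets a call such as tf.gradients(xs=[W, b], ys=cost) return gradients without any hand-derived formula.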

I hope this article is helpful to you.


When reposting, please cite the address of this article: https://www.ucloud.cn/yun/41085.html
