资讯专栏INFORMATION COLUMN

TensorFlow学习笔记(5):基于MNIST数据的卷积神经网络CNN

ad6623 / 2892人阅读

摘要:前言本文基于官网的写成。这个数据集可以从牛的不行的教授的网站获取。本文将使用卷积神经网络以获得更高的准确率。对于不存在这个问题,在最小点损失函数的梯度变为,因此可以使用固定的。

前言

本文基于TensorFlow官网的Tutorial写成。输入数据是MNIST,全称是Modified National Institute of Standards and Technology,是一组由这个机构搜集的手写数字扫描文件和每个文件对应标签的数据集,经过一定的修改使其适合机器学习算法读取。这个数据集可以从牛的不行的Yann LeCun教授的网站获取。

本系列文章的这一篇对这份数据集使用了softmax regression,在测试集上取得了接近92%的准确率。本文将使用卷积神经网络以获得更高的准确率。关于CNN的理论知识,可以参考这篇文章

代码
#!/usr/bin/env python
# -*- coding=utf-8 -*-
# @author: 陈水平
# @date: 2017-02-04
# @description: implement a CNN model upon MNIST handwritten digits
# @ref: http://yann.lecun.com/exdb/mnist/

import gzip
import struct
import numpy as np
from sklearn import preprocessing
import tensorflow as tf

# MNIST data is stored in binary format, 
# and we transform them into numpy ndarray objects by the following two utility functions
def read_image(file_name):
    with gzip.open(file_name, "rb") as f:
        buf = f.read()
        index = 0
        magic, images, rows, columns = struct.unpack_from(">IIII" , buf , index)
        index += struct.calcsize(">IIII")

        image_size = ">" + str(images*rows*columns) + "B"
        ims = struct.unpack_from(image_size, buf, index)
        
        im_array = np.array(ims).reshape(images, rows, columns)
        return im_array

def read_label(file_name):
    with gzip.open(file_name, "rb") as f:
        buf = f.read()
        index = 0
        magic, labels = struct.unpack_from(">II", buf, index)
        index += struct.calcsize(">II")
        
        label_size = ">" + str(labels) + "B"
        labels = struct.unpack_from(label_size, buf, index)

        label_array = np.array(labels)
        return label_array

print "Start processing MNIST handwritten digits data..."
train_x_data = read_image("MNIST_data/train-images-idx3-ubyte.gz")  # shape: 60000x28x28
train_x_data = train_x_data.reshape(train_x_data.shape[0], train_x_data.shape[1], train_x_data.shape[2], 1).astype(np.float32)
train_y_data = read_label("MNIST_data/train-labels-idx1-ubyte.gz")  
test_x_data = read_image("MNIST_data/t10k-images-idx3-ubyte.gz")  # shape: 10000x28x28
test_x_data = test_x_data.reshape(test_x_data.shape[0], test_x_data.shape[1], test_x_data.shape[2], 1).astype(np.float32)
test_y_data = read_label("MNIST_data/t10k-labels-idx1-ubyte.gz")

train_x_minmax = train_x_data / 255.0
test_x_minmax = test_x_data / 255.0

# Of course you can also use the utility function to read in MNIST provided by tensorflow
# from tensorflow.examples.tutorials.mnist import input_data
# mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)
# train_x_minmax = mnist.train.images
# train_y_data = mnist.train.labels
# test_x_minmax = mnist.test.images
# test_y_data = mnist.test.labels

# Reformat y into one-hot encoding style
lb = preprocessing.LabelBinarizer()
lb.fit(train_y_data)
train_y_data_trans = lb.transform(train_y_data)
test_y_data_trans = lb.transform(test_y_data)

print "Start evaluating CNN model by tensorflow..."

# Model input
x = tf.placeholder(tf.float32, shape=[None, 28, 28, 1])
y_ = tf.placeholder(tf.float32, [None, 10])

# Weight initialization
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

# Convolution and Pooling
def conv2d(x, W):
    # `tf.nn.conv2d()` computes a 2-D convolution given 4-D `input` and `filter` tensors
    # input tensor shape `[batch, in_height, in_width, in_channels]`, batch is number of observation 
    # filter tensor shape `[filter_height, filter_width, in_channels, out_channels]`
    # strides: the stride of the sliding window for each dimension of input.
    # padding: "SAME" or "VALID", determine the type of padding algorithm to use
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")

def max_pool_2x2(x):
    # `tf.nn.max_pool` performs the max pooling on the input
    #  ksize: the size of the window for each dimension of the input tensor.
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")


# First convolutional layer
# Convolution: compute 32 features for each 5x5 patch
# Max pooling: reduce image size to 14x14.
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])

h_conv1 = tf.nn.relu(conv2d(x,  W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second convolutional layer
# Convolution: compute 64 features for each 5x5 patch
# Max pooling: reduce image size to 7x7
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Densely connected layer
# Fully-conected layer with 1024 neurons
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])

h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
# To reduce overfitting, we apply dropout before the readout layer.
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Readout layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])

y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

# Train and evaluate
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))
# y = tf.nn.softmax(y_conv)
# loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
optimizer = tf.train.AdamOptimizer(1e-4)
# optimizer = tf.train.GradientDescentOptimizer(1e-4)
train = optimizer.minimize(loss)

correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

for step in range(20000):
    sample_index = np.random.choice(train_x_minmax.shape[0], 50)
    batch_xs = train_x_minmax[sample_index, :]
    batch_ys = train_y_data_trans[sample_index, :]
    if step % 100 == 0:
        train_accuracy = sess.run(accuracy, feed_dict={
            x: batch_xs, y_: batch_ys, keep_prob: 1.0})
        print "step %d, training accuracy %g" % (step, train_accuracy)
    sess.run(train, feed_dict={x: batch_xs, y_: batch_ys, keep_prob: 0.5})

print "test accuracy %g" % sess.run(accuracy, feed_dict={
    x: test_x_minmax, y_: test_y_data_trans, keep_prob: 1.0})

输出如下:

Start processing MNIST handwritten digits data...
Start evaluating CNN model by tensorflow...
step 0, training accuracy 0.1
step 100, training accuracy 0.82
step 200, training accuracy 0.92
step 300, training accuracy 0.96
step 400, training accuracy 0.92
step 500, training accuracy 0.92
step 600, training accuracy 1
step 700, training accuracy 0.94
step 800, training accuracy 0.96
step 900, training accuracy 0.96
step 1000, training accuracy 0.94
step 1100, training accuracy 0.98
step 1200, training accuracy 0.94
step 1300, training accuracy 0.98
step 1400, training accuracy 0.96
step 1500, training accuracy 1
...
step 15700, training accuracy 1
step 15800, training accuracy 0.98
step 15900, training accuracy 1
step 16000, training accuracy 1
step 16100, training accuracy 1
step 16200, training accuracy 1
step 16300, training accuracy 1
step 16400, training accuracy 1
step 16500, training accuracy 1
step 16600, training accuracy 1
step 16700, training accuracy 1
step 16800, training accuracy 1
step 16900, training accuracy 1
step 17000, training accuracy 1
step 17100, training accuracy 1
step 17200, training accuracy 1
step 17300, training accuracy 1
step 17400, training accuracy 1
step 17500, training accuracy 1
step 17600, training accuracy 1
step 17700, training accuracy 1
step 17800, training accuracy 1
step 17900, training accuracy 1
step 18000, training accuracy 1
step 18100, training accuracy 1
step 18200, training accuracy 1
step 18300, training accuracy 1
step 18400, training accuracy 1
step 18500, training accuracy 1
step 18600, training accuracy 1
step 18700, training accuracy 1
step 18800, training accuracy 1
step 18900, training accuracy 1
step 19000, training accuracy 1
step 19100, training accuracy 1
step 19200, training accuracy 1
step 19300, training accuracy 1
step 19400, training accuracy 1
step 19500, training accuracy 1
step 19600, training accuracy 1
step 19700, training accuracy 1
step 19800, training accuracy 1
step 19900, training accuracy 1
test accuracy 0.9929
思考

参数数量:第一个卷积层5x5x1x32=800个参数,第二个卷积层5x5x32x64=51200个参数,第三个全连接层7x7x64x1024=3211264个参数,第四个输出层1024x10=10240个参数,总量级为330万个参数,单机训练时间约为30分钟。

关于优化算法:随机梯度下降法的learning rate需要逐渐变小,因为随机抽取样本引入了噪音,使得我们在最小点处的随机梯度仍然不为0。对于batch gradient descent不存在这个问题,在最小点损失函数的梯度变为0,因此batch gradient descent可以使用固定的learning rate。为了让learning rate逐渐变小,有以下几种变种算法。

Momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction.

AdaGrad adapts the learning rates of all model parameters by scaling them inversely proportional to the square root of the sum of all their historical squared values. But the accumulation of squared gradients from the beginning of training can result in a premature and excessive decrease in the effective learning rate.

RMSProp: AdaGrad is designed to converge rapidly when applied to a convex function. When applied to a non-convex function to train a neural network, the learning trajectory may pass through many different structures and eventually arrive at a region that is a locally convex bowl. AdaGrad shrinks the learning rate according to the entire history of the squared gradient and may have made the learning rate too small before arriving at such a convex structure.
RMSProp uses an exponentially decaying average to discard history from the extreme past so that it can converge rapidly after finding a convex bowl, as if it were an instance of the AdaGrad algorithm initialized within that bowl.

Adam:The name “Adam” derives from the phrase “adaptive moments.” It is a variant on the combination of RMSProp and momentum with a few important distinctions. Adam is generally regarded as being fairly robust to the choice of hyperparameters, though the learning rate sometimes needs to be changed from the suggested default.

如果将MNIST数据集的AdamOptimizer换成GradientDescentOptimizer,测试集的准确率为0.9296.

文章版权归作者所有,未经允许请勿转载,若此文章存在违规行为,您可以联系管理员删除。

转载请注明本文地址:https://www.ucloud.cn/yun/38394.html

相关文章

  • 深度学习

    摘要:深度学习在过去的几年里取得了许多惊人的成果,均与息息相关。机器学习进阶笔记之一安装与入门是基于进行研发的第二代人工智能学习系统,被广泛用于语音识别或图像识别等多项机器深度学习领域。零基础入门深度学习长短时记忆网络。 多图|入门必看:万字长文带你轻松了解LSTM全貌 作者 | Edwin Chen编译 | AI100第一次接触长短期记忆神经网络(LSTM)时,我惊呆了。原来,LSTM是神...

    Vultr 评论0 收藏0

发表评论

0条评论

最新活动
阅读需要支付1元查看
<