TensorFlow MNIST Example

Available in the Classic environment.

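This guide covers two examples from the TensorFlow website: the basic MNIST example for TensorFlow beginners and the advanced MNIST example for experts.

Using the MNIST dataset, we build a Softmax regression model and a CNN classification model, then evaluate how accurately each model predicts digits from image data.

Concepts and terms are explained only to the extent needed to understand the example code. A thorough understanding requires separate study of Machine Learning and Deep Learning.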

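About the MNIST Dataset

The MNIST dataset consists of images, which are handwritten digit images represented as vectors as shown below, and labels, which indicate the digit each image represents. The images below have the labels 5, 0, 4, and 1, respectively. A label takes one of 10 distinct values, 0 through 9.

The MNIST dataset is split into 55,000 training examples (mnist.train), 10,000 test examples (mnist.test), and 5,000 validation examples (mnist.validation). Each split is in turn divided into the images and labels described above.

Each image is 28x28 pixels (784 in total), so it is stored as a 784-dimensional vector. Each element holds a value between 0 and 1 that represents the intensity of the pixel.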
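As a small illustration of this layout, the following sketch uses NumPy with a made-up 28x28 array (not real MNIST data) to show how a 2-D image maps to a 784-dimensional vector and back. The array contents and all variable names are assumptions for demonstration only.

```python
import numpy as np

# Hypothetical 28x28 "image": a gradient of intensities scaled into [0, 1].
image_2d = np.arange(28 * 28, dtype=np.float32).reshape(28, 28) / (28 * 28 - 1)

# Flatten row by row into a single 784-dimensional vector.
image_vec = image_2d.reshape(784)

# Pixel (row, col) of the 2-D image sits at index row * 28 + col of the vector.
row, col = 3, 5
same_pixel = image_vec[row * 28 + col] == image_2d[row, col]

# Reshaping back recovers the original 2-D layout.
restored = image_vec.reshape(28, 28)
```

The same reshape to [28, 28] is what makes a stored MNIST vector displayable as an image.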

The code below downloads the data provided by TensorFlow and saves it in the data folder.
The 'one_hot=True' option (one-hot encoding) defines each label as a 10-dimensional vector rather than a single number from 0 to 9. One-hot encoded data is explained again with an example below.

""" Import the TensorFlow package: refer to it as tf from here on. """
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt


""" Download and load the data
Downloads the four MNIST data files provided by TensorFlow, saves them in the data folder, and reads them.
The data is downloaded only on the first run; later runs only read the saved files, so they take less time. """
from tensorflow.examples.tutorials.mnist import input_data
%time mnist = input_data.read_data_sets("data/", one_hot=True)  # %time (an IPython magic) reports the total execution time.


    Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
    Extracting data/train-images-idx3-ubyte.gz
    Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
    Extracting data/train-labels-idx1-ubyte.gz
    Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
    Extracting data/t10k-images-idx3-ubyte.gz
    Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
    Extracting data/t10k-labels-idx1-ubyte.gz
    CPU times: user 447 ms, sys: 454 ms, total: 901 ms
    Wall time: 36.1 s

The code below inspects the data. Each entry of images is a 784-dimensional vector representing 28x28 pixels. Because the data was read with the 'one_hot=True' option (one-hot encoding), the label '7' is represented as '[ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]' (0 is [ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.], 1 is [ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.], and 2 is [ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]).

# Check the structure of the images/labels data
print('train dataset (55,000 examples): %s %s' % (mnist.train.images.shape, mnist.train.labels.shape))
print('test dataset (10,000 examples): %s %s' % (mnist.test.images.shape, mnist.test.labels.shape))
print('validation dataset (5,000 examples): %s %s' % (mnist.validation.images.shape, mnist.validation.labels.shape))

# Check a sample image
print('\nlabel: %s' % (mnist.train.labels[0],))
label = np.argmax(mnist.train.labels[0])  # index of the largest value (that is, where the 1 is)

im = np.reshape(mnist.train.images[0], [28,28])
plt.imshow(im, cmap='Greys')
plt.title('label:' + str(label))
plt.show()

    train dataset (55,000 examples): (55000, 784) (55000, 10)
    test dataset (10,000 examples): (10000, 784) (10000, 10)
    validation dataset (5,000 examples): (5000, 784) (5000, 10)

    label: [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
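To make the one-hot representation concrete, here is a minimal NumPy sketch of encoding a digit and decoding it again. The helper name one_hot is hypothetical and is not part of the MNIST loader; it only mirrors what 'one_hot=True' produces for a single label.

```python
import numpy as np

def one_hot(label, num_classes=10):
    """Return a vector with 1.0 at index `label` and 0.0 elsewhere."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[label] = 1.0
    return vec

encoded = one_hot(7)
decoded = int(np.argmax(encoded))  # index of the largest value, i.e. the position of the 1
```

Decoding with np.argmax is the same step the inspection code uses to turn a one-hot label back into a digit.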

Regression Model

This example follows the basic MNIST example that TensorFlow provides for beginners.
We build and train a regression model, predict labels, and then measure the model's accuracy.

Implementing the Regression

We define placeholders for the images and the ground-truth labels, Variables for the weights and biases that training produces, and then the Softmax Regression model.

""" Define placeholders: where the data is fed in
Create 2-D tensors for the images and the ground-truth labels.
None means that dimension can have any length. """
# placeholder for the image data
x = tf.placeholder(tf.float32, [None, 784])
# placeholder for the ground-truth labels
y_ = tf.placeholder(tf.float32, [None, 10])

""" Define Variables: the weights and biases that hold the training result """
# Initialized to 0
W = tf.Variable(tf.zeros([784, 10])) # W is multiplied with the 784-dimensional image vector to produce a 10-dimensional result (one-hot encoded 0-9)
b = tf.Variable(tf.zeros([10]))      # b is added to the result, so it is 10-dimensional

""" Define the model: Softmax Regression
Softmax turns the 10 output values into probabilities so that the most probable one can be chosen """
y = tf.nn.softmax(tf.matmul(x, W) + b)
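The forward pass of this model can be sketched in plain NumPy, which may help when checking shapes. This is an illustration under assumptions, not the TensorFlow implementation: the input batch is random, and the names softmax, x_batch, W_np, and b_np are made up for the sketch.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Two hypothetical flattened images (values are arbitrary).
x_batch = np.random.RandomState(0).rand(2, 784).astype(np.float32)

# Same shapes and zero initialization as W and b in the TensorFlow code.
W_np = np.zeros((784, 10), dtype=np.float32)
b_np = np.zeros(10, dtype=np.float32)

probs = softmax(np.dot(x_batch, W_np) + b_np)  # shape (2, 10)
```

With all-zero weights and biases every class receives the same probability, so an untrained model starts from a uniform guess over the 10 digits.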

Training

We define the loss function and learning rate needed for training, then train the model for 1,000 steps, sampling 100 examples at each step.
Increasing the number of sampled examples may improve accuracy but increases training time.
Training on small, randomly sampled batches is called stochastic training. It is widely used because it is cheap and can give similar results.

""" Train the model """
# Define the loss function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
# Set the learning rate to 0.5
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Op that initializes all variables; it is run once the session has started
init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)

# Train for 1,000 steps, sampling 100 examples per step
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)  # get a 'batch' of 100 examples randomly sampled from the training set
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})    # feed the sampled batch_xs, batch_ys into placeholders x, y_
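What a single training step does can be sketched in NumPy on a toy problem. Everything here is an assumption for illustration: the data is made up, there are only 3 input features instead of 784, and the learning rate of 0.1 is picked for this toy problem rather than taken from the tutorial. The gradient used is the standard one for softmax regression with mean cross-entropy.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean over the batch of -sum(labels * log(probs))."""
    return float(np.mean(-np.sum(labels * np.log(probs), axis=1)))

# Toy batch: 4 examples, 3 features, 10 classes (all values are made up).
x_batch = np.array([[0.0, 0.5, 1.0],
                    [1.0, 0.0, 0.5],
                    [0.5, 1.0, 0.0],
                    [1.0, 1.0, 1.0]])
y_batch = np.eye(10)[[7, 2, 7, 0]]  # one-hot labels

W_np = np.zeros((3, 10))
b_np = np.zeros(10)
learning_rate = 0.1  # assumption for this toy problem

probs = softmax(np.dot(x_batch, W_np) + b_np)
loss_before = cross_entropy(probs, y_batch)

# Gradient of the mean cross-entropy with respect to the logits, then W and b.
grad_logits = (probs - y_batch) / len(x_batch)
W_np -= learning_rate * np.dot(x_batch.T, grad_logits)
b_np -= learning_rate * grad_logits.sum(axis=0)

loss_after = cross_entropy(softmax(np.dot(x_batch, W_np) + b_np), y_batch)
```

Starting from zero weights the loss equals ln(10), the loss of a uniform guess, and one gradient step lowers it. The training loop repeats this kind of update with a fresh random batch each time.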

Evaluating Model

tf.argmax returns the label with the highest probability, and tf.equal checks whether the prediction (y) matches the ground truth (y_).
We use them to define the correct_prediction and accuracy tensors.

To evaluate the model, we compute the accuracy on the test data.
The run below gives 0.915, an accuracy of about 91.5%. The result can vary slightly each time the model is retrained.

""" Evaluate the model """
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

    0.915
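The accuracy computation can be traced by hand on a tiny example. The following NumPy sketch uses made-up predictions for 4 examples and 3 classes; the arrays and names are assumptions for illustration and mirror the argmax, equal, cast, and mean steps above.

```python
import numpy as np

# Hypothetical predicted probabilities for 4 examples and 3 classes.
pred = np.array([[0.1, 0.7, 0.2],
                 [0.8, 0.1, 0.1],
                 [0.3, 0.3, 0.4],
                 [0.6, 0.3, 0.1]])
# One-hot ground truth: classes 1, 0, 2, 1.
truth = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1],
                  [0, 1, 0]])

correct = np.argmax(pred, axis=1) == np.argmax(truth, axis=1)  # [True, True, True, False]
accuracy_np = correct.astype(np.float32).mean()                 # 3 of 4 correct
```

Casting the booleans to floats turns True into 1.0 and False into 0.0, so the mean is the fraction of correct predictions.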

# Check the classification results
correct_vals = sess.run(correct_prediction,
                        feed_dict={x: mnist.test.images, y_: mnist.test.labels})
pred_vals = sess.run(y, feed_dict={x: mnist.test.images} )

n_correct = int(np.sum(correct_vals))  # each True counts as 1
print('Total test examples: %d, correct: %d, incorrect: %d'
      % (len(correct_vals), n_correct, len(correct_vals) - n_correct))


# Show only 3 correctly classified images
fig = plt.figure(figsize=(10,3))
img_cnt = 0
for i, cv in enumerate(correct_vals):
    if cv:  # correctly classified
        img_cnt +=1
        ax = fig.add_subplot(1,3,img_cnt)
        im = np.reshape(mnist.test.images[i], [28,28])
        label = np.argmax(mnist.test.labels[i])
        pred_label = np.argmax(pred_vals[i])
        ax.imshow(im, cmap='Greys')
        ax.text(2, 2, 'true label=' + str(label) + ', pred label=' + str(pred_label))

    if img_cnt == 3:  # stop after 3 images
        break
plt.show()

    Total test examples: 10000, correct: 9150, incorrect: 850

# Show only 3 misclassified images
fig = plt.figure(figsize=(10,3))
img_cnt = 0
for i, cv in enumerate(correct_vals):
    if not cv:  # misclassified
        img_cnt +=1
        ax = fig.add_subplot(1,3,img_cnt)
        im = np.reshape(mnist.test.images[i], [28,28])
        label = np.argmax(mnist.test.labels[i])
        pred_label = np.argmax(pred_vals[i])
        ax.imshow(im, cmap='Greys')
        ax.text(2, 2, 'true label=' + str(label) + ', pred label=' + str(pred_label))

    if img_cnt == 3:  # stop after 3 images
        break
plt.show()

# Close the Session once everything has run
sess.close()

CNN Model

This example follows the advanced MNIST Deep Learning example that TensorFlow provides for experts.
We build and train a CNN (Convolutional Neural Network), one kind of Deep Learning model, predict labels, and then measure the model's accuracy.

Initializing Weights and Biases

To break symmetry and prevent zero gradients, the weights are initialized with a small amount of noise.
Because the model uses ReLU neurons, the biases are initialized to a small positive value, 0.1, to avoid dead neurons.

""" Weight initialization """
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

""" Bias initialization """
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
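tf.truncated_normal draws from a normal distribution and re-draws any value that falls more than two standard deviations from the mean. The NumPy sketch below imitates that behavior to show what the initial weights look like. The function name, the seed parameter, and the re-draw loop are assumptions for illustration and are not how TensorFlow implements it internally.

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, seed=0):
    """Sample N(0, stddev^2), re-drawing any value beyond 2 standard deviations."""
    rng = np.random.RandomState(seed)
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        values[out_of_range] = rng.normal(0.0, stddev, size=out_of_range.sum())
        out_of_range = np.abs(values) > 2 * stddev
    return values

weights = truncated_normal((5, 5, 1, 32))  # same shape as the first convolution filter
biases = np.full(32, 0.1)                  # small positive constant, as in bias_variable
```

With stddev=0.1 every initial weight lies within [-0.2, 0.2], so the noise breaks symmetry without producing large outliers.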

Defining Convolution and Pooling

The code below sets the stride of the convolution layer to 1 and applies zero padding (padding='SAME') so that the output has the same size as the input.
Pooling applies 2x2 max pooling with a stride of 2.

""" Convolution """
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

""" Pooling """
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
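To see what 2x2 max pooling with stride 2 does to the values, the following NumPy sketch pools a single 4x4 array. It is a simplified illustration that assumes a 2-D input with even side lengths; the function name max_pool_2x2_np is hypothetical, and the batch and channel dimensions that tf.nn.max_pool handles are left out.

```python
import numpy as np

def max_pool_2x2_np(image):
    """2x2 max pooling with stride 2 on a 2-D array whose sides are even."""
    h, w = image.shape
    # Split rows and columns into pairs, then take the maximum of each 2x2 block.
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

image = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2_np(image)  # [[5, 7], [13, 15]]
```

Each output value is the largest entry of one non-overlapping 2x2 block, which halves the height and the width.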

Defining the Convolutional Layers

The code below shows how a 28x28 image becomes a 7x7 image as it passes through two convolutional layers.

The first convolutional layer (5x5 filter, stride 1) uses padding='SAME', which zero-pads the input so that the output stays 28x28. The first max pooling (2x2 window, stride 2) then reduces it to 14x14.

Likewise, the second convolutional layer (5x5 filter, stride 1) keeps the size at 14x14,
and the second max pooling (2x2 window, stride 2) reduces it to 7x7.
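The spatial sizes in this network follow from TensorFlow's padding rules, which can be checked with a little arithmetic. In the sketch below the helper names are hypothetical; the formulas are the usual ones for padding='SAME' (output = ceil(input / stride)) and padding='VALID' (output = ceil((input - filter + 1) / stride)).

```python
def same_output_size(input_size, stride):
    """padding='SAME': output = ceil(input / stride), independent of the filter size."""
    return -(-input_size // stride)

def valid_output_size(input_size, filter_size, stride):
    """padding='VALID' (no padding): output = ceil((input - filter + 1) / stride)."""
    return -(-(input_size - filter_size + 1) // stride)

size = 28
after_conv1 = same_output_size(size, stride=1)         # 28
after_pool1 = same_output_size(after_conv1, stride=2)  # 14
after_conv2 = same_output_size(after_pool1, stride=1)  # 14
after_pool2 = same_output_size(after_conv2, stride=2)  # 7

# For comparison: a 5x5 convolution without padding would shrink 28 to 24.
without_padding = valid_output_size(28, filter_size=5, stride=1)  # 24
```

Because conv2d and max_pool_2x2 both use padding='SAME', only the stride-2 pooling steps reduce the image size, giving 28, 14, and finally 7.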

# Reshape the input data into a 4-D tensor
# The second and third dimensions are the image height and width
# The last dimension is the number of color channels, which is 1 for grayscale images
x_image = tf.reshape(x, [-1,28,28,1])

""" First Convolutional Layer """
# Weight tensor shape: (patch size, patch size, input channels, output channels).
# Uses 32 features (kernels, filters) with a 5x5 window (also called a patch)
# The input channel count is 1 because the images are grayscale
W_conv1 = weight_variable([5, 5, 1, 32])
# Bias tensor
b_conv1 = bias_variable([32])
# Convolve x_image with the weight tensor, add the bias, then apply ReLU
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
# Apply max pooling to get the layer output
h_pool1 = max_pool_2x2(h_conv1)

""" Second Convolutional Layer """
# Weight tensor shape: (patch size, patch size, input channels, output channels)
# Uses 64 features with a 5x5 window (also called a patch)
# The 32 output channels of the previous layer become the input channels here
W_conv2 = weight_variable([5, 5, 32, 64])
# Bias tensor
b_conv2 = bias_variable([64])
# Convolve h_pool1, the output of the first convolutional layer, with the weight tensor, add the bias, then apply ReLU
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
# Apply max pooling to get the layer output
h_pool2 = max_pool_2x2(h_conv2)

""" Fully-Connected Layer """
# Input: 64 feature maps of size 7x7. The number of neurons (1024 here) is an arbitrary choice
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

""" Dropout """
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

""" Final softmax layer """
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
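The dropout step in the network above keeps each activation with probability keep_prob and scales the kept values by 1 / keep_prob, so the expected sum stays the same. The NumPy sketch below imitates this on a small array of ones. The function name dropout_np and the seed parameter are assumptions for illustration, not part of the TensorFlow API.

```python
import numpy as np

def dropout_np(activations, keep_prob, seed=0):
    """Keep each element with probability keep_prob and scale the kept ones by 1 / keep_prob."""
    rng = np.random.RandomState(seed)
    mask = rng.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.ones((4, 8))
train_out = dropout_np(h, keep_prob=0.5)  # entries are either 0.0 or 2.0
eval_out = dropout_np(h, keep_prob=1.0)   # nothing is dropped
```

This is why keep_prob is a placeholder: it can be fed 0.5 during training and 1.0 when measuring accuracy, which disables dropout for evaluation.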

Training and Evaluating the Model

Training the model with the code below and evaluating its accuracy shows that it is more accurate than the first regression model (0.9795 versus 0.915 in the runs shown here).

To view TensorBoard, open a web browser and go to [public IP address:18888].
If you cannot connect, log in to the server from a terminal and start the TensorBoard process with the 'jup tb-start' command (see 'Managing the TensorBoard Process').

# Train and evaluate the model
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Train for 2,000 steps, sampling 100 examples per step
    for i in range(2000):
        batch = mnist.train.next_batch(100)
        if i % 100 == 0:
            # Report accuracy on the current batch with dropout disabled
            train_accuracy = accuracy.eval(feed_dict={
                x: batch[0], y_: batch[1], keep_prob: 1.0})
            print('step %d, training accuracy %g' % (i, train_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})

    print('test accuracy %g' % accuracy.eval(feed_dict={
        x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))

    step 0, training accuracy 0.17
    step 100, training accuracy 0.81
    step 200, training accuracy 0.93
    step 300, training accuracy 0.89
    step 400, training accuracy 0.92
    step 500, training accuracy 0.95
    step 600, training accuracy 0.99
    step 700, training accuracy 0.98
    step 800, training accuracy 0.93
    step 900, training accuracy 0.94
    step 1000, training accuracy 0.97
    step 1100, training accuracy 0.95
    step 1200, training accuracy 0.97
    step 1300, training accuracy 0.99
    step 1400, training accuracy 0.95
    step 1500, training accuracy 0.98
    step 1600, training accuracy 0.97
    step 1700, training accuracy 0.96
    step 1800, training accuracy 0.98
    step 1900, training accuracy 0.97
    test accuracy 0.9795