TensorFlow: A Beginner's Guide

A simple guide from linear regression to convolutional neural networks in TensorFlow

What is TensorFlow?

If you've been following the machine learning community, and deep learning in particular, over the last year, you've probably heard of TensorFlow. TensorFlow is a library for structuring and running numerical computations, developed in-house by the Google Brain team. One can think of it as an extension of NumPy that scales to larger architectures and adds algorithms and methods specific to machine learning. TensorFlow joins Theano as a framework for designing and building neural networks (cuDNN, often mentioned alongside them, is the lower-level GPU library that such frameworks call into).

This article delves into TensorFlow through case studies of neural network implementations. As such, it requires prior knowledge of neural networks (the subject is too expansive to cover in a single article). For those new to the subject (and for those who need a refresher), here are some good reference materials

Installation

TensorFlow is available on PyPI, so we can simply install it with pip:

    pip install tensorflow

Or, if you have a GPU:

    pip install tensorflow-gpu

More extensive installation details can be found on the TensorFlow installation website.

How This Article Is Set Up

We follow the Theano Tutorials, which build up from basic addition/multiplication all the way to convolutional neural networks.

Initial Investigations

  1. Multiplication
  2. Linear Regression
  3. Logistic Regression

Neural Networks

  1. Fully-connected Feed-Forward Neural Network (FC NN)
  2. "Deep" Fully Connected Neural Network
  3. Convolutional Neural Network

First, let's get the default imports out of the way: these imports will be used throughout the guide, and others will be added later when necessary

    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline

Initial Investigations

Multiplication

Given two floats $x$ and $y$, find $xy$

First we create the relevant placeholders x and y (declaring them as floats). Placeholders can be thought of as inputs; when doing computations, we'll plug in values for x and y. We symbolize the result that we are looking for as xy.

    x = tf.placeholder("float")
    y = tf.placeholder("float")
    xy = tf.mul(x, y)

Now we've represented the computation graph in Tensorflow; all that remains is to create a session, and plug in values, retrieving the result of the computation

    with tf.Session() as sess:
        print("%f x %f = %f" % (2, 3, sess.run(xy, feed_dict={x: 2, y: 3})))

    2.000000 x 3.000000 = 6.000000
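To make the "describe the computation first, run it later" idea concrete, here is a minimal pure-Python sketch of a deferred computation. It is only an analogy and not how TensorFlow is implemented; the `Placeholder` and `Mul` classes and the `run` function are hypothetical names invented for this illustration.

```python
# Toy illustration of "build the graph first, run it later".
# Placeholder, Mul, and run are hypothetical names, not TensorFlow APIs.

class Placeholder:
    """A named input whose value is supplied only at run time."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed_dict):
        return feed_dict[self]


class Mul:
    """A node that multiplies the results of two other nodes."""
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self, feed_dict):
        return self.left.evaluate(feed_dict) * self.right.evaluate(feed_dict)


def run(node, feed_dict):
    """Evaluate a node, looking up placeholder values in feed_dict."""
    return node.evaluate(feed_dict)


x = Placeholder("x")
y = Placeholder("y")
xy = Mul(x, y)  # nothing is computed yet; xy only describes the product

result = run(xy, {x: 2.0, y: 3.0})
print("%f x %f = %f" % (2, 3, result))
```

The same `xy` node can be evaluated again with different inputs, which mirrors calling `sess.run` repeatedly with a different `feed_dict`.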

Linear Regression

Given $\{(x_1,y_1) \dots (x_n,y_n)\}$, find $w$ and $b$ that minimize $$\sum_{i=1}^{n} (wx_i + b - y_i)^2$$

First, let's create some sample data to work with:

We model $y = 2x + 2 + \mathcal{N}(0, 0.35^2)$ (a line with slope 2 and intercept 2, plus some random Gaussian noise)

    trX = np.linspace(-1, 1, 500)
    trY = 2 * trX + np.random.randn(*trX.shape) * .35 + 2
    plt.scatter(trX, trY);

Here, we again define our inputs X and Y as placeholders. We also define the variables w and b, which store the weight and the bias; variables are objects in TensorFlow which we use to represent internal state and which can be updated during training. Finally, y_hat is simply our prediction

    X = tf.placeholder("float")
    Y = tf.placeholder("float")
    w = tf.Variable(0.0, name="weights")
    b = tf.Variable(0.0, name="bias")
    y_hat = tf.add(tf.mul(X, w), b)

Let's now define our cost model and the underlying optimizer. Here, we opt for the squared loss objective (there are many similar alternatives).

In order to optimize the function over $w$ and $b$, we create a gradient descent optimizer and minimize the given cost function. Here we set the learning rate to $\alpha = .01$

    cost = tf.reduce_mean(tf.square(Y - y_hat))
    train_operation = tf.train.GradientDescentOptimizer(.01).minimize(cost)
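To see what the optimizer does at each step, the following NumPy sketch performs the same kind of gradient descent update by hand. It assumes the mean squared error cost, a learning rate of 0.01, and freshly generated data with a fixed seed; the seed and the names `x_data`, `y_data`, `w_np`, and `b_np` are choices made for this illustration, so the numbers will not match a TensorFlow run exactly.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
x_data = np.linspace(-1, 1, 500)
y_data = 2 * x_data + rng.randn(*x_data.shape) * 0.35 + 2

w_np, b_np = 0.0, 0.0  # start from zero, like the tf.Variables
learning_rate = 0.01
mse_history = []

for epoch in range(200):
    residual = w_np * x_data + b_np - y_data  # y_hat - y
    # Gradients of mean(residual ** 2) with respect to w and b
    grad_w = 2 * np.mean(residual * x_data)
    grad_b = 2 * np.mean(residual)
    # One gradient descent step over the whole dataset
    w_np -= learning_rate * grad_w
    b_np -= learning_rate * grad_b
    mse_history.append(np.mean((w_np * x_data + b_np - y_data) ** 2))

print("w = %.2f, b = %.2f, MSE = %.4f" % (w_np, b_np, mse_history[-1]))
```

Each pass computes the gradient over all 500 points before updating, which is what distinguishes full-batch gradient descent from its stochastic variants.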

Now, we simply run the train_operation, passing in our input data (this is full-batch gradient descent, not SGD, since every step uses the entire dataset). Since we created variables (w and b), we need to initialize them in the session with tf.initialize_all_variables().run()

    numEpochs = 200
    costs = []
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        for i in range(numEpochs):
            sess.run(train_operation, feed_dict={X: trX, Y: trY})
            costs.append(sess.run(cost, feed_dict={X: trX, Y: trY}))
        print("Final Error is %f" % costs[-1])
        wfinal, bfinal = sess.run(w), sess.run(b)
        print("Predicting y = %.02f x + %.02f" % (wfinal, bfinal))
        print("Actually is y = %.02f x + %.02f" % (2, 2))

    Final Error is 0.217815
    Predicting  y = 1.52 x + 1.95
    Actually is y = 2.00 x + 2.00

    plt.plot(costs)
    plt.ylabel("Mean Squared Error")
    plt.xlabel("Epoch");
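The learned slope of about 1.52 is still some way from the true value of 2, which suggests that 200 steps at this learning rate are not enough for w to converge. As a sanity check, the least-squares optimum can be computed in closed form. The sketch below uses `np.polyfit` on freshly generated data (the seed and the names `x_data`, `y_data`, `w_opt`, and `b_opt` are arbitrary choices for this illustration), so the values differ slightly from the run above.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
x_data = np.linspace(-1, 1, 500)
y_data = 2 * x_data + rng.randn(*x_data.shape) * 0.35 + 2

# A degree-1 polynomial fit returns [slope, intercept] minimizing the squared error
w_opt, b_opt = np.polyfit(x_data, y_data, 1)
mse_opt = np.mean((w_opt * x_data + b_opt - y_data) ** 2)

print("Least-squares fit: y = %.02f x + %.02f (MSE %.4f)" % (w_opt, b_opt, mse_opt))
```

The remaining error of the closed-form fit should be close to the noise variance of $0.35^2 \approx 0.12$; training for more epochs or with a larger learning rate should bring gradient descent closer to this optimum.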

Let's try to expand this to the multivariable case, where $x \in \mathbb{R}^n$, $W \in \mathbb{R}^{n \times m}$, and where $y$ is modelled with Gaussian noise as

$$ y = W^Tx + \mathcal{N}(0,I_m)$$

    m = 8
    n = 5
    NUM_EXAMPLES = 100
    W = np.random.rand(n, m)
    trX = np.random.rand(NUM_EXAMPLES, n)
    # Each row of trX is one example, so the row-wise form of W^T x is trX.dot(W)
    trY = trX.dot(W) + np.random.randn(NUM_EXAMPLES, m)
    trX.shape, trY.shape

    ((100, 5), (100, 8))

We again define our x and y placeholder inputs similarly; however, this time we explicitly add a shape parameter to the data. The first entry, None, is the batch dimension (left unspecified so that any number of examples can be fed in), and the second entry is the dimensionality of each example.

    x = tf.placeholder("float", shape=[None, n])
    y = tf.placeholder("float", shape=[None, m])
    w = tf.Variable(tf.zeros([n, m]))
    y_hat = tf.matmul(x, w)
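The effect of leaving the batch dimension unspecified can be illustrated with plain NumPy: the same weight matrix works for any number of rows, as long as the second dimension of the input matches. This is a shape-only sketch; the names `w_np`, `x_batch`, and `y_hat_np` are illustrative.

```python
import numpy as np

n, m = 5, 8
w_np = np.zeros((n, m))  # same shape as the weight variable

for batch_size in (1, 32, 100):
    x_batch = np.random.rand(batch_size, n)
    y_hat_np = x_batch.dot(w_np)  # (batch_size, n) times (n, m) gives (batch_size, m)
    print(x_batch.shape, "->", y_hat_np.shape)
```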

The rest remains the same

    cost = tf.reduce_mean(tf.square(y - y_hat))
    train_operation = tf.train.GradientDescentOptimizer(.01).minimize(cost)

    numEpochs = 1000
    costs = []
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        for i in range(numEpochs):
            sess.run(train_operation, feed_dict={x: trX, y: trY})
            costs.append(sess.run(cost, feed_dict={x: trX, y: trY}))
        print("Final Error is %f" % costs[-1])

    Final Error is 1.073464

    plt.plot(costs)
    plt.ylabel("Mean Squared Error")
    plt.xlabel("Epoch");

Logistic Regression

We shall use the MNIST dataset for this example. Conveniently, Tensorflow has a library to read the MNIST files

    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("MNIST/", one_hot=True)
    trX, trY = mnist.train.images, mnist.train.labels
    teX, teY = mnist.test.images, mnist.test.labels

    Extracting MNIST/train-images-idx3-ubyte.gz
    Extracting MNIST/train-labels-idx1-ubyte.gz
    Extracting MNIST/t10k-images-idx3-ubyte.gz
    Extracting MNIST/t10k-labels-idx1-ubyte.gz

Recall that in logistic regression we model the logits as a linear transformation of $x$, and perform maximum likelihood estimation (MLE) over $W$. Notice that maximizing the likelihood is equivalent to minimizing the cross-entropy between the labels and the softmax of the logits of the linear model. Using this philosophy, we express our logistic model

    X = tf.placeholder("float", shape=[None, 784])
    Y = tf.placeholder("float", shape=[None, 10])
    w = tf.Variable(tf.random_normal([784, 10], stddev=0.01))
    pred_logit = tf.matmul(X, w)

    sample_cost = tf.nn.softmax_cross_entropy_with_logits(pred_logit, Y)
    total_cost = tf.reduce_mean(sample_cost)
    train_operation = tf.train.GradientDescentOptimizer(0.05).minimize(total_cost)
    predict_operation = tf.argmax(pred_logit, 1)
    accuracy_operation = tf.reduce_mean(
        tf.cast(tf.equal(predict_operation, tf.argmax(Y, 1)), tf.float32)
    )
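For intuition, here is a NumPy sketch of the per-example quantity that a softmax cross-entropy on logits computes. It is an illustration of the math, not TensorFlow's actual implementation; the function name `softmax_cross_entropy` and the toy inputs are made up for this example.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row-wise max leaves the softmax unchanged but avoids overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1],   # confident in class 0, which is correct
                   [0.0, 0.0, 0.0]])  # no preference among the three classes
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

losses = softmax_cross_entropy(logits, labels)
print(losses)
```

The second example has uniform logits, so its loss is $\log 3$; the first example assigns most of the probability to the correct class and therefore has a smaller loss.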

Let's train! We'll do mini-batch gradient descent here to speed up training times

    NUM_EPOCHS = 30
    BATCH_SIZE = 200
    accuracies = []
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        for epoch in range(NUM_EPOCHS):
            for start in range(0, len(trX), BATCH_SIZE):
                end = start + BATCH_SIZE
                sess.run(train_operation,
                         feed_dict={X: trX[start:end], Y: trY[start:end]})
            accuracies.append(sess.run(accuracy_operation, feed_dict={X: teX, Y: teY}))

    plt.plot(accuracies)
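The slicing pattern used in the training loop can be looked at in isolation. The sketch below lists the (start, end) pairs produced by stepping through a dataset in fixed-size batches; `batch_bounds` is a hypothetical helper written for this illustration, and the dataset size of 1050 is arbitrary.

```python
def batch_bounds(num_examples, batch_size):
    """Yield (start, end) index pairs that cover the dataset exactly once."""
    for start in range(0, num_examples, batch_size):
        # The last batch may be smaller; array slicing would clip the end index
        # silently, here it is clamped explicitly to make the true size visible
        yield start, min(start + batch_size, num_examples)

bounds = list(batch_bounds(num_examples=1050, batch_size=200))
print(bounds)

# 55,000 training images in batches of 200 give 275 parameter updates per epoch
updates_per_epoch = len(list(batch_bounds(55000, 200)))
print(updates_per_epoch)
```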

Neural Networks

Before beginning the neural network section, we introduce some common code, which shall be shared by all the neural networks that follow. This allows us to abstract our code and work in a cleaner environment. In particular, we define code to create randomly initialized variables (these are the parameters that we learn), and code to train our model

    import tqdm  # Using this for dynamic updates instead of unwieldy print statements
    import time  # Timing how long it takes an epoch to run

    def init_weights(shape):
        return tf.Variable(tf.random_normal(shape, stddev=0.01))

    def update_d(prev, new):
        combined = prev.copy()
        combined.update(new)
        return combined

    def train_model(sess, train_X, train_Y, test_X, test_Y, train_operation,
                    accuracy_operation, num_epochs, batch_size, test_size,
                    train_feed=dict(), test_feed=dict(), howOften=100):
        accuracies = []
        startingTime = time.time()
        with tqdm.tqdm(total=num_epochs * len(train_X)) as ranger:
            for epoch in range(num_epochs):
                for start in range(0, len(train_X), batch_size):
                    end = start + batch_size
                    sess.run(train_operation,
                             feed_dict=update_d(train_feed, {X: train_X[start:end], Y: train_Y[start:end]}))
                    ranger.update(batch_size)
                    if (start // batch_size) % howOften == 0:
                        testSet = np.random.choice(len(test_X), test_size, replace=False)
                        tX, tY = test_X[testSet], test_Y[testSet]
                        accuracies.append(sess.run(accuracy_operation,
                                                   feed_dict=update_d(test_feed, {X: tX, Y: tY})))
                        ranger.set_description("Test Accuracy: " + str(accuracies[-1]))
                testSet = np.random.choice(len(test_X), test_size, replace=False)
                tX, tY = test_X[testSet], test_Y[testSet]
                accuracies.append(sess.run(accuracy_operation,
                                           feed_dict=update_d(test_feed, {X: tX, Y: tY})))
                ranger.set_description("Test Accuracy: " + str(accuracies[-1]))
        timeTaken = time.time() - startingTime
        print("Finished training for %d epochs" % num_epochs)
        print("Took %.02f seconds (%.02f s per epoch)" % (timeTaken, timeTaken / num_epochs))
        accuracies.append(sess.run(accuracy_operation,
                                   feed_dict=update_d(test_feed, {X: test_X, Y: test_Y})))
        print("Final accuracy was %.04f" % accuracies[-1])
        plt.plot(accuracies)
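The small `update_d` helper merges the extra feed values with the current batch without modifying either dictionary, which matters because the same extra feed is reused for every batch. A quick usage sketch follows; the string keys stand in for TensorFlow placeholders and are purely illustrative.

```python
def update_d(prev, new):
    combined = prev.copy()  # copy first, so prev is left untouched
    combined.update(new)
    return combined

# String keys stand in for TensorFlow placeholders in this illustration
train_feed = {"keep_prob": 0.5}
batch_feed = {"X": [[0.0, 1.0]], "Y": [[1.0]]}

merged = update_d(train_feed, batch_feed)
print(sorted(merged))
print(train_feed)
```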

Basic Fully Connected Network

We shall use the "classic" starting neural network, which consists of the input layer, a hidden layer coupled with the sigmoid activation function, and finally an output layer, upon which we shall run softmax (paired with the cross-entropy loss). As in the previous example on logistic regression, the softmax won't be computed directly; instead, it is implicitly factored in through the cost function. We again shall train on the MNIST dataset

Basic Network Picture
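As a shape-level illustration of this architecture, here is a NumPy sketch of a single forward pass through a 784-620-10 network with a sigmoid hidden layer. The weights are random and untrained, the input is fake data, and all names (`W_h_np`, `W_o_np`, `x_batch`, and so on) are chosen for this example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
num_hidden = 620
W_h_np = rng.randn(784, num_hidden) * 0.01  # weights entering the hidden layer
W_o_np = rng.randn(num_hidden, 10) * 0.01   # weights entering the output layer

x_batch = rng.rand(4, 784)                  # 4 fake "images" with values in [0, 1)
hidden = sigmoid(x_batch.dot(W_h_np))       # shape (4, 620), values in (0, 1)
logits = hidden.dot(W_o_np)                 # shape (4, 10); softmax lives in the loss
predicted_digit = logits.argmax(axis=1)     # shape (4,), one class index per image

print(hidden.shape, logits.shape, predicted_digit.shape)
```

Taking the argmax of the logits gives the same prediction as taking the argmax of the softmax output, since softmax preserves the ordering of its inputs.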

    NUM_HIDDEN = 620
    X = tf.placeholder("float", shape=[None, 784])
    Y = tf.placeholder("float", shape=[None, 10])

    def init_weights(shape):  # We define this out of convenience
        return tf.Variable(tf.random_normal(shape, stddev=0.01))

    W_h = init_weights([784, NUM_HIDDEN])  # Weights entering the hidden layer
    W_o = init_weights([NUM_HIDDEN, 10])   # Weights entering the output layer

    entering_hidden = tf.matmul(X, W_h)
    exiting_hidden = tf.nn.sigmoid(entering_hidden)
    model = tf.matmul(exiting_hidden, W_o)
    sample_cost = tf.nn.softmax_cross_entropy_with_logits(model, Y)
    total_cost = tf.reduce_mean(sample_cost)
    train_operation = tf.train.GradientDescentOptimizer(0.2).minimize(total_cost)
    predict_operation = tf.argmax(model, 1)
    accuracy_operation = tf.reduce_mean(
        tf.cast(tf.equal(predict_operation, tf.argmax(Y, 1)), tf.float32)
    )

    NUM_EPOCHS = 50
    BATCH_SIZE = 50
    import tqdm
    accuracies = []
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        train_model(sess, trX, trY, teX, teY, train_operation, accuracy_operation,
                    NUM_EPOCHS, BATCH_SIZE, 10000)
    plt.ylim(.9, 1)

    Test Accuracy: 0.9786: 100%|██████████| 2750000/2750000 [06:10<00:00, 7420.19it/s]
    Finished training for 50 epochs
    Took 370.61 seconds (7.41 s per epoch)
    Final accuracy was 0.9786


After training for $50$ epochs (which took about $6$ minutes in total on a laptop), we finally get to $97.86\%$ accuracy on the basic MNIST dataset. Next, we revitalize this classic example with some modern techniques

Modern Neural Network

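Here we implement the following changes on the previous neural network to increase accuracy on MNIST (as reflected in the code below):

  1. ReLU activations in place of the sigmoid
  2. A second hidden layer
  3. Dropout for regularization
  4. The RMSProp optimizer in place of plain gradient descent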

We also shift the organization of the code by abstracting out the model, so that it is easier to parse when reading. As our models and networks get more complicated, this becomes a good way to facilitate debugging

    def model_gen(X, w_h, w_h2, w_o, drop_rate_input, drop_rate_hidden):
        # tf.nn.dropout takes the probability of *keeping* a unit, so despite
        # their names the drop_rate_* arguments act as keep probabilities
        out_X = tf.nn.dropout(X, drop_rate_input)
        in_H = tf.matmul(out_X, w_h)  # use the dropped-out input, not the raw X
        out_H = tf.nn.dropout(tf.nn.relu(in_H), drop_rate_hidden)
        in_H2 = tf.matmul(out_H, w_h2)
        out_H2 = tf.nn.relu(in_H2)
        model = tf.matmul(out_H2, w_o)
        return model

    X = tf.placeholder("float", [None, 784])
    Y = tf.placeholder("float", [None, 10])
    w_h = init_weights([784, 625])
    w_h2 = init_weights([625, 625])
    w_o = init_weights([625, 10])
    drop_rate_input = tf.placeholder("float")
    drop_rate_hidden = tf.placeholder("float")
    model = model_gen(X, w_h, w_h2, w_o, drop_rate_input, drop_rate_hidden)

    sample_cost = tf.nn.softmax_cross_entropy_with_logits(model, Y)
    total_cost = tf.reduce_mean(sample_cost)
    train_operation = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(total_cost)
    predict_operation = tf.argmax(model, 1)
    accuracy_operation = tf.reduce_mean(
        tf.cast(tf.equal(predict_operation, tf.argmax(Y, 1)), tf.float32)
    )
    NUM_EPOCHS = 50
    BATCH_SIZE = 100
    import tqdm
    accuracies = []
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        train_model(sess, trX, trY, teX, teY, train_operation, accuracy_operation,
                    NUM_EPOCHS, BATCH_SIZE, 10000,
                    {drop_rate_input: 0.7, drop_rate_hidden: 0.4},
                    {drop_rate_hidden: 1, drop_rate_input: 1})
    plt.ylim(.9, 1);

    Test Accuracy: 0.9852: 100%|██████████| 2750000/2750000 [13:42<00:00, 3345.05it/s]
    Finished training for 50 epochs
    Took 822.11 seconds (16.44 s per epoch)
    Final accuracy was 0.9852
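The values fed for the two dropout placeholders are the probabilities of keeping a unit, which is why both are set to 1 when measuring test accuracy. The NumPy sketch below illustrates the idea behind this style of dropout: surviving activations are scaled by 1 / keep_prob so that the expected activation stays the same. The `dropout` function here is a simplified stand-in written for this example, not TensorFlow's implementation.

```python
import numpy as np

def dropout(values, keep_prob, rng):
    """Keep each entry with probability keep_prob and rescale the survivors."""
    mask = rng.rand(*values.shape) < keep_prob
    return values * mask / keep_prob

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
activations = np.ones((1000, 100))

# Training-style call: roughly 40% of the entries survive, scaled up by 1 / 0.4
dropped = dropout(activations, keep_prob=0.4, rng=rng)
print("fraction kept: %.3f" % (dropped != 0).mean())
print("mean activation: %.3f" % dropped.mean())

# Test-style call: with keep_prob = 1 nothing is dropped or rescaled
unchanged = dropout(activations, keep_prob=1.0, rng=rng)
print("identical at keep_prob=1:", np.array_equal(unchanged, activations))
```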

This ran for $50$ epochs, taking about $14$ minutes in total on a laptop, and the final test accuracy was $98.52\%$ on the basic MNIST dataset. That's a sizable improvement over the $97.86\%$ of our basic network, at the cost of roughly twice the training time. Next, we'll see another technique which has done very well on MNIST and other image datasets, making it the de facto approach for image-based ML in industry

Convolutional Neural Networks

    def model_gen(X, w, w2, w3, w4, w_o, keep_rate_conv, keep_rate_hidden):
        l1a = tf.nn.relu(tf.nn.conv2d(X, w, strides=[1, 1, 1, 1], padding='SAME'))    # Shape = (?, 28, 28, 32)
        l1 = tf.nn.max_pool(l1a, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        l1o = tf.nn.dropout(l1, keep_rate_conv)

        l2a = tf.nn.relu(tf.nn.conv2d(l1o, w2, strides=[1, 1, 1, 1], padding='SAME'))  # Shape = (?, 14, 14, 64)
        l2 = tf.nn.max_pool(l2a, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        l2o = tf.nn.dropout(l2, keep_rate_conv)

        l3a = tf.nn.relu(tf.nn.conv2d(l2o, w3, strides=[1, 1, 1, 1], padding='SAME'))  # Shape = (?, 7, 7, 128)
        l3 = tf.nn.max_pool(l3a, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        l3o = tf.nn.dropout(l3, keep_rate_conv)

        # Flatten the dropped-out output l3o (not l3) before the fully connected layer
        l3_final = tf.reshape(l3o, [-1, w4.get_shape().as_list()[0]])
        l4 = tf.nn.relu(tf.matmul(l3_final, w4))
        l4o = tf.nn.dropout(l4, keep_rate_hidden)
        model = tf.matmul(l4o, w_o)
        return model

    X = tf.placeholder("float", [None, 28, 28, 1])
    Y = tf.placeholder("float", [None, 10])
    w = init_weights([3, 3, 1, 32])
    w2 = init_weights([3, 3, 32, 64])
    w3 = init_weights([3, 3, 64, 128])
    w4 = init_weights([128 * 4 * 4, 625])
    w_o = init_weights([625, 10])
    keep_rate_conv = tf.placeholder("float")
    keep_rate_hidden = tf.placeholder("float")
    model = model_gen(X, w, w2, w3, w4, w_o, keep_rate_conv, keep_rate_hidden)

    sample_cost = tf.nn.softmax_cross_entropy_with_logits(model, Y)
    total_cost = tf.reduce_mean(sample_cost)
    train_operation = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(total_cost)
    predict_operation = tf.argmax(model, 1)
    accuracy_operation = tf.reduce_mean(
        tf.cast(tf.equal(predict_operation, tf.argmax(Y, 1)), tf.float32)
    )
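The first dimension of the fully connected weights, 128 * 4 * 4, follows from how the spatial size shrinks through the three convolution and pooling blocks. With 'SAME' padding, the output size is the input size divided by the stride, rounded up. The short sketch below traces the sizes; `same_padding_output` is a helper name made up for this illustration.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a conv or pool layer with padding='SAME'."""
    return int(math.ceil(size / float(stride)))

size = 28      # MNIST images are 28 x 28
channels = 1
for out_channels in (32, 64, 128):
    size = same_padding_output(size, stride=1)  # 3x3 convolution, stride 1: size unchanged
    size = same_padding_output(size, stride=2)  # 2x2 max pool, stride 2: halved, rounded up
    channels = out_channels
    print("after block: %d x %d x %d" % (size, size, channels))

flattened = size * size * channels
print("flattened length: %d" % flattened)
```

The sizes go 28, 14, 7, 4, so the last pooling layer outputs 4 x 4 x 128 values per image, which is the 2048-long vector that the reshape feeds into the fully connected layer.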

Before training, we must reshape the MNIST dataset back into $28 \times 28$ images with a single channel (instead of the flattened vectors). We do this now

    trX2 = trX.reshape(-1, 28, 28, 1)
    teX2 = teX.reshape(-1, 28, 28, 1)
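A small NumPy sketch shows what this reshape does to the data layout. The fake images and the names `flat_images` and `images` are illustrative; the resulting layout is (batch, height, width, channels).

```python
import numpy as np

# 3 fake flattened "images" whose pixel values are simply their flat index
flat_images = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)

# -1 lets NumPy infer the batch size from the total number of elements
images = flat_images.reshape(-1, 28, 28, 1)
print(images.shape)

# Row-major order: pixel k of a flat vector lands at row k // 28, column k % 28
k = 2 * 28 + 5
print(images[1, 2, 5, 0] == flat_images[1, k])
```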
    NUM_EPOCHS = 10
    BATCH_SIZE = 50
    TEST_SIZE = 500
    import tqdm
    accuracies = []
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        train_model(sess, trX2, trY, teX2, teY, train_operation, accuracy_operation,
                    NUM_EPOCHS, BATCH_SIZE, TEST_SIZE,
                    {keep_rate_conv: 0.8, keep_rate_hidden: 0.5},
                    {keep_rate_hidden: 1, keep_rate_conv: 1}, 20)
    plt.ylim(.9, 1)

    Test Accuracy: 0.988: 100%|██████████| 550000/550000 [27:49<00:00, 423.88it/s]
    Finished training for 10 epochs
    Took 1669.98 seconds (167.00 s per epoch)
    Final accuracy was 0.9913