Chollet: Chapter 03: Intro to Keras and TensorFlow
Metadata
Title: Introduction to Keras and TensorFlow
Number: 3
Book: Chollet: Deep Learning
Core Ideas
The chapter starts by introducing the TensorFlow and Keras libraries, their philosophy and history. Keras now ships with TF and is its official high-level API.
It then walks through setting up a deep learning dev environment, recommending Google Colab as a first environment for beginners without access to a decent GPU.
The combination of TF and Keras provides both low-level and high-level APIs.
TensorFlow provides low-level tensor manipulation, the infrastructure that underlies ML. The main TF APIs are:
Tensors, including special tensors that store the network state (variables).
Tensor operations such as addition, relu and matmul.
Backpropagation, to compute the gradient of a mathematical expression (via GradientTape).
On top of this, Keras provides high-level APIs for deep learning concepts:
Layers which are combined into a model.
A loss function which defines the feedback signal used in learning.
An optimizer which determines how learning proceeds.
Metrics to evaluate performance, e.g. accuracy.
A training loop that performs gradient-based optimization.
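As a rough sketch of how these five pieces map onto the Keras API, assuming TF 2.x with `tf.keras`: the toy data, layer sizes, and the choice of `"rmsprop"` and `"binary_crossentropy"` are illustrative assumptions, not taken from the chapter.

```python
import numpy as np
from tensorflow import keras

# Hypothetical toy data: 100 samples with 4 features, binary labels.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(100, 4)).astype("float32")
targets = (inputs.sum(axis=1) > 0).astype("float32").reshape(-1, 1)

# Layers, combined into a model.
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Loss (feedback signal), optimizer (how learning proceeds), metrics.
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# The built-in training loop performs gradient-based optimization.
history = model.fit(inputs, targets, epochs=3, batch_size=32, verbose=0)
print(sorted(history.history.keys()))
```

Everything below `compile` is handled by Keras; the rest of the chapter rebuilds the same ingredients by hand with raw TF.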
Intro to TensorFlow
Tensors
Creating a tensor looks a lot like the NumPy API. A key difference, though, is that TensorFlow tensors are not assignable, unlike NumPy arrays: they are constant, so if you try to assign a new value to a TF tensor you'll get an error:
import tensorflow as tf
x = tf.ones(shape=(2,1))
y = tf.zeros(shape=(2,1))
z = tf.random.normal(shape=(3,1), mean=0., stddev=1.)
a = tf.random.uniform(shape=(3,1), minval=0., maxval=1.)
x[0, 0] = 0.  # this will throw an error! (tensors don't support item assignment)
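To see the error without halting a script, a minimal sketch can catch it; in TF 2.x the exception is a `TypeError`. If you need a modified copy rather than in-place mutation, `tf.tensor_scatter_nd_update` returns a new tensor and leaves the original untouched:

```python
import tensorflow as tf

x = tf.ones(shape=(2, 1))

assignment_failed = False
try:
    x[0, 0] = 0.0  # tensors are immutable: no item assignment
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# Build a NEW tensor with element [0, 0] replaced; x itself is unchanged.
x_new = tf.tensor_scatter_nd_update(x, indices=[[0, 0]], updates=[0.0])
print(x.numpy().ravel(), x_new.numpy().ravel())
```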
So how do we update the model state during training? Through the tf.Variable class. You can modify the state of a variable with its assign method.
v = tf.Variable(initial_value=tf.random.normal(shape=(3, 1)))
v.assign(tf.ones((3, 1)))  # this is ok!
v[0, 0].assign(3.)  # so is this!
# equivalent of +=
v.assign_add(tf.ones((3, 1)))
# equivalent of -=
v.assign_sub(tf.ones((3, 1)))
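A sketch of the same calls with a deterministic starting value (zeros instead of a random normal, an assumption made so the results are predictable), tracing what each one does to the variable in place:

```python
import tensorflow as tf

v = tf.Variable(initial_value=tf.zeros(shape=(3, 1)))
v.assign(tf.ones((3, 1)))      # [[1], [1], [1]]
v[0, 0].assign(3.0)            # [[3], [1], [1]]
v.assign_add(tf.ones((3, 1)))  # [[4], [2], [2]]
v.assign_sub(tf.ones((3, 1)))  # [[3], [1], [1]]
print(v.numpy().ravel())
```

The variable keeps its identity throughout; only its stored value changes, which is what lets it hold model weights across training steps.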
Like NumPy, TF has many efficient tensor operations, and they run with eager execution: each result is computed immediately.
a = tf.ones((2,2))
b = tf.square(a)
c = tf.sqrt(a)
d = b + c
e = tf.matmul(a,b)
e *= d # element-wise multiplication
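Because execution is eager, every intermediate is a concrete value you can inspect right away. Tracing the numbers for the snippet above (repeated here so it runs on its own) also shows the difference between `tf.matmul` and element-wise `*`:

```python
import tensorflow as tf

a = tf.ones((2, 2))
b = tf.square(a)     # all ones
c = tf.sqrt(a)       # all ones
d = b + c            # all twos
e = tf.matmul(a, b)  # each entry is 1*1 + 1*1 = 2
e *= d               # element-wise: 2 * 2 = 4
print(e.numpy())
```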
Gradient Tape
So far, so NumPy. But TF can also retrieve the gradient of any differentiable expression with respect to any of its inputs: open a GradientTape scope, apply some computation to one or more input tensors, and retrieve the gradient of the result with respect to the inputs.
input_var = tf.Variable(initial_value=3.)
with tf.GradientTape() as tape:
    result = tf.square(input_var)
gradient = tape.gradient(result, input_var)
The most common use is to retrieve the gradients of the model’s loss with respect to the weights: gradients = tape.gradient(loss, weights)
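A runnable sketch of both uses: the scalar case above, where d(x^2)/dx = 2x gives 6.0 at x = 3, and a gradient with respect to a list of weights. The `W`, `b` and `x` here are hypothetical stand-ins, not weights from the chapter:

```python
import tensorflow as tf

input_var = tf.Variable(initial_value=3.0)
with tf.GradientTape() as tape:
    result = tf.square(input_var)
gradient = tape.gradient(result, input_var)  # 2 * 3 = 6.0

# Hypothetical weights: pass a list of sources, get a list of gradients
# with matching shapes.
W = tf.Variable(tf.ones((2, 1)))
b = tf.Variable(tf.zeros((1,)))
x = tf.constant([[1.0, 2.0]])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.matmul(x, W) + b)
grad_W, grad_b = tape.gradient(loss, [W, b])  # x transposed, and 1
print(float(gradient), grad_W.numpy().ravel(), grad_b.numpy())
```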
NB: by default only trainable variables are tracked, to minimize computational overhead. If you want a constant tensor to be tracked by the tape, you need to tell the tape to watch it, by calling tape.watch(tensor) inside the with block.
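A minimal sketch contrasting the two cases. Without `watch`, the tape never records operations on the constant, so `tape.gradient` returns `None`:

```python
import tensorflow as tf

input_const = tf.constant(3.0)

# Not watched: constants are not tracked by default.
with tf.GradientTape() as tape:
    result = tf.square(input_const)
unwatched = tape.gradient(result, input_const)  # None

# Watched: the tape now records operations on the constant.
with tf.GradientTape() as tape:
    tape.watch(input_const)
    result = tf.square(input_const)
watched = tape.gradient(result, input_const)    # 2 * 3 = 6.0
print(unwatched, float(watched))
```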
You can nest tapes to compute second-order gradients. See p. 79 for an example.
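A minimal sketch of nesting, using y = x^3 at x = 3 as an assumed example function (not the one from the book). The inner tape gives dy/dx = 3x^2 = 27; because that gradient is computed inside the outer tape's scope, the outer tape can differentiate it again to get 6x = 18:

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x ** 3.0
    # Computed while the outer tape is still recording.
    dy_dx = inner_tape.gradient(y, x)    # 3 * x^2 = 27
d2y_dx2 = outer_tape.gradient(dy_dx, x)  # 6 * x = 18
print(float(dy_dx), float(d2y_dx2))
```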
Linear Classifier
The chapter walks through building a linear classifier from scratch using only the TF API. To keep things simple, it uses full-batch training rather than mini-batch training:
import tensorflow as tf
input_dim = 2
output_dim = 1
learning_rate = 0.1
# Model state: weight matrix and bias vector
W = tf.Variable(initial_value=tf.random.uniform(shape=(input_dim, output_dim)))
b = tf.Variable(initial_value=tf.zeros(shape=(output_dim,)))
def model(inputs):
    # Linear prediction: inputs @ W + b
    return tf.matmul(inputs, W) + b
def square_loss(targets, predictions):
    per_sample_loss = tf.square(targets - predictions)
    return tf.reduce_mean(per_sample_loss)
def training_step(inputs, targets):
    # Record the forward pass so the gradients can be retrieved
    with tf.GradientTape() as tape:
        predictions = model(inputs)
        loss = square_loss(targets, predictions)
    grad_loss_wrt_W, grad_loss_wrt_b = tape.gradient(loss, [W, b])
    # Gradient descent: move each weight against its gradient
    W.assign_sub(grad_loss_wrt_W * learning_rate)
    b.assign_sub(grad_loss_wrt_b * learning_rate)
    return loss
def train(inputs, targets, steps):
    # Full-batch training: every step uses the whole dataset
    for step in range(steps):
        loss = training_step(inputs, targets)
        print(f'loss at step {step}: {loss:.4f}')
See pp. 80-4 for a worked example with random data.
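A self-contained sketch of running this classifier, assuming synthetic data of our own: two 2-D Gaussian clouds labelled 0 and 1 (the means, covariance, sample counts and seeds are arbitrary choices, not the book's). Predictions are thresholded at 0.5 to turn the regression output into a class:

```python
import numpy as np
import tensorflow as tf

# Hypothetical synthetic data: two overlapping Gaussian clouds.
rng = np.random.default_rng(42)
cov = [[1.0, 0.5], [0.5, 1.0]]
class_0 = rng.multivariate_normal(mean=[0.0, 3.0], cov=cov, size=500)
class_1 = rng.multivariate_normal(mean=[3.0, 0.0], cov=cov, size=500)
inputs = np.vstack([class_0, class_1]).astype("float32")
targets = np.vstack([np.zeros((500, 1)), np.ones((500, 1))]).astype("float32")

input_dim, output_dim, learning_rate = 2, 1, 0.1
W = tf.Variable(tf.random.uniform(shape=(input_dim, output_dim), seed=0))
b = tf.Variable(tf.zeros(shape=(output_dim,)))

def model(x):
    return tf.matmul(x, W) + b

def square_loss(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))

def training_step(x, y):
    with tf.GradientTape() as tape:
        loss = square_loss(y, model(x))
    grad_W, grad_b = tape.gradient(loss, [W, b])
    W.assign_sub(grad_W * learning_rate)
    b.assign_sub(grad_b * learning_rate)
    return loss

# 40 full-batch steps; each returned loss is measured before that step's update.
losses = [float(training_step(inputs, targets)) for _ in range(40)]

# Classify by thresholding the continuous prediction at 0.5.
predictions = model(inputs).numpy()
accuracy = float(np.mean((predictions > 0.5) == (targets > 0.5)))
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}, accuracy {accuracy:.3f}")
```

With a learning rate this small relative to the data's scale, plain gradient descent should decrease the loss steadily; a much larger rate could make it diverge.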
Intro to Keras
The chapter then introduces Keras basics.