Chollet: Chapter 03: Intro to Keras and TensorFlow

Metadata

Title: Introduction to Keras and TensorFlow
Number: 3
Book: Chollet: Deep Learning

Core Ideas

The chapter starts by introducing the TensorFlow and Keras libraries, their philosophy and history. Keras now ships with TensorFlow (as tf.keras) and is its official high-level API.

It then walks through setting up a deep learning dev environment, recommending Google Colab as a first environment for beginners without access to a decent GPU.

The combination of TF and Keras provides both low-level and high-level APIs.

TensorFlow provides low-level tensor manipulation, the infrastructure that underlies ML. The main TF APIs are:

Tensors, including special tensors that store the network's state (variables).
Tensor operations like addition, relu and matmul.
Backpropagation to compute the gradient of a mathematical expression (via GradientTape).
On top of this, Keras provides high-level APIs for deep learning concepts:
Layers, which are combined into a model.
A loss function, which defines the feedback signal used in learning.
An optimizer, which determines how learning proceeds.
Metrics to evaluate performance, e.g. accuracy.
A training loop that performs gradient-based optimization.
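A minimal sketch of how the five Keras concepts above map onto code, assuming TensorFlow 2.x with its bundled Keras. The sigmoid output, binary crossentropy loss, RMSprop optimizer, learning rate and the toy data are illustrative choices, not taken from the chapter.

```python
import numpy as np
from tensorflow import keras

# Layers combined into a model: a single Dense unit on 2D inputs is a linear model.
model = keras.Sequential([keras.layers.Dense(1, activation="sigmoid")])

# Loss (feedback signal), optimizer (how learning proceeds) and metrics
# are attached to the model in compile().
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=0.1),
    loss=keras.losses.BinaryCrossentropy(),
    metrics=[keras.metrics.BinaryAccuracy()],
)

# Hypothetical toy data: 2D points, labelled 1 when x0 + x1 > 0.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(200, 2)).astype("float32")
targets = (inputs.sum(axis=1, keepdims=True) > 0).astype("float32")

# fit() runs the gradient-based training loop (here 5 passes over the data).
history = model.fit(inputs, targets, epochs=5, batch_size=32, verbose=0)
print(history.history["loss"])
```

Everything the low-level sections below do by hand (the forward pass, the tape, the weight updates) is hidden inside fit().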

Intro to TensorFlow

Tensors

Creating a tensor looks a lot like the NumPy API.

A key difference, though, is that TensorFlow tensors are not assignable, unlike NumPy arrays.

They are constant: if you try to assign a new value to an element of a TF tensor, you'll get an error:

import tensorflow as tf

x = tf.ones(shape=(2,1))
y = tf.zeros(shape=(2,1))
z = tf.random.normal(shape=(3,1), mean=0., stddev=1.)
a = tf.random.uniform(shape=(3,1), minval=0., maxval=1.)

x[0,0] = 0 # this will throw an error!
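To see the failure concretely, and one way around it, here is a small sketch. Catching the TypeError is for demonstration only; tf.tensor_scatter_nd_update is a swapped-in option not discussed in these notes, and it returns a new tensor rather than modifying the original.

```python
import tensorflow as tf

x = tf.ones(shape=(2, 1))

# Item assignment on a tf.Tensor raises a TypeError.
try:
    x[0, 0] = 0.
    assignment_failed = False
except TypeError:
    assignment_failed = True

# To get a modified copy, build a new tensor instead.
# tf.tensor_scatter_nd_update returns a NEW tensor; x itself is unchanged.
x_updated = tf.tensor_scatter_nd_update(x, indices=[[0, 0]], updates=[0.])

print(assignment_failed)          # True
print(x_updated.numpy().tolist())  # [[0.0], [1.0]]
print(x.numpy().tolist())          # [[1.0], [1.0]]
```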

So how do we update the model state during training? With the tf.Variable class: a variable holds mutable state, which you can modify in place with its assign method (or with assign_add and assign_sub).

v = tf.Variable(initial_value=tf.random.normal(shape=(3,1)))

v.assign(tf.ones((3,1)))  # this is ok!

v[0,0].assign(3.)  # so is this!

# equivalent of +=
v.assign_add(tf.ones((3,1)))

# equivalent of -=
v.assign_sub(tf.ones((3,1)))
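The same update methods applied to a variable with a known starting value, so each step can be traced. Starting from zeros (rather than random values) is an assumption made to keep the numbers predictable.

```python
import tensorflow as tf

# Start from a known value so the effect of each update is easy to follow.
v = tf.Variable(initial_value=tf.zeros(shape=(3, 1)))

v.assign(tf.ones((3, 1)))      # v = [[1], [1], [1]]
v[0, 0].assign(3.)             # v = [[3], [1], [1]]
v.assign_add(tf.ones((3, 1)))  # v = [[4], [2], [2]]
v.assign_sub(tf.ones((3, 1)))  # v = [[3], [1], [1]]

print(v.numpy().tolist())  # [[3.0], [1.0], [1.0]]
```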

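Like NumPy, TF offers a large set of efficient tensor operations. By default they run with eager execution: each operation executes immediately and returns a concrete value, just as a NumPy call does: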

a = tf.ones((2,2))
b = tf.square(a)    # element-wise square
c = tf.sqrt(a)      # element-wise square root
d = b + c           # element-wise addition
e = tf.matmul(a,b)  # matrix product
e *= d              # element-wise multiplication
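Working through the numbers above makes the difference between element-wise and matrix operations explicit. The second matrix, m, is a hypothetical addition chosen so that * and tf.matmul give visibly different results.

```python
import tensorflow as tf

a = tf.ones((2, 2))
b = tf.square(a)        # element-wise: all ones
c = tf.sqrt(a)          # element-wise: all ones
d = b + c               # element-wise addition: all twos
e = tf.matmul(a, b)     # matrix product: each entry is 1*1 + 1*1 = 2
e *= d                  # element-wise product: 2 * 2 = 4 everywhere

# Eager execution: the result is a concrete value, available immediately.
print(e.numpy().tolist())  # [[4.0, 4.0], [4.0, 4.0]]

# Contrast element-wise (*) with the matrix product on a non-trivial matrix.
m = tf.constant([[1., 2.], [3., 4.]])
elementwise = m * m
matrix_product = tf.matmul(m, m)
print(elementwise.numpy().tolist())     # [[1.0, 4.0], [9.0, 16.0]]
print(matrix_product.numpy().tolist())  # [[7.0, 10.0], [15.0, 22.0]]
```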

Gradient Tape

So far, so NumPy. But TF can also retrieve the gradient of any differentiable expression with respect to any of its inputs. You just open a GradientTape scope, apply some computation to one or several input tensors, and retrieve the gradient of the result with respect to those inputs.

input_var = tf.Variable(initial_value=3.)
with tf.GradientTape() as tape:
    result = tf.square(input_var)

gradient = tape.gradient(result, input_var)  # d(x^2)/dx = 2x, i.e. 6.0 at x = 3

The most common use is to retrieve the gradients of the model’s loss with respect to the weights: gradients = tape.gradient(loss, weights)
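A sketch of both uses: a scalar gradient, and gradients for several variables at once. The W, b and inputs in the second half are hypothetical stand-ins for a model's weights and data; passing a list of sources to tape.gradient returns a list of gradients in the same order.

```python
import tensorflow as tf

# Scalar case: d(x^2)/dx = 2x, so at x = 3 the gradient is 6.
input_var = tf.Variable(initial_value=3.)
with tf.GradientTape() as tape:
    result = tf.square(input_var)
gradient = tape.gradient(result, input_var)
print(float(gradient))  # 6.0

# Several variables at once (hypothetical weights, for illustration only).
W = tf.Variable(tf.ones((2, 1)))
b = tf.Variable(tf.zeros((1,)))
inputs = tf.constant([[1., 2.]])
with tf.GradientTape() as tape:
    # loss = 1*W[0] + 2*W[1] + b, so dloss/dW = inputs^T and dloss/db = 1
    loss = tf.reduce_sum(tf.matmul(inputs, W) + b)
grad_W, grad_b = tape.gradient(loss, [W, b])
print(grad_W.numpy().tolist())  # [[1.0], [2.0]]
print(grad_b.numpy().tolist())  # [1.0]
```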

NB: by default, only trainable variables are tracked, to minimize computational overhead. If you want the tape to track a constant tensor, you need to tell it explicitly with tape.watch(tensor).
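A sketch of the difference tape.watch makes for a constant tensor. Without it, the tape does not record the computation and tape.gradient returns None.

```python
import tensorflow as tf

x = tf.constant(3.)  # a constant tensor, not a Variable

# Without watch(): x is not tracked, so there is no gradient to return.
with tf.GradientTape() as tape:
    y = tf.square(x)
grad_unwatched = tape.gradient(y, x)
print(grad_unwatched)  # None

# With watch(): x is tracked like a trainable variable.
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.square(x)
grad_watched = float(tape.gradient(y, x))
print(grad_watched)  # 6.0
```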

You can nest tapes to compute second-order gradients (the gradient of a gradient). See p. 79 for an example.
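A sketch of nested tapes, using the derivatives of a falling object's position as the expression (an illustrative choice). The inner tape gives the first derivative (speed); the outer tape, which records the inner gradient computation, gives the second derivative (acceleration). The time value of 2.0 is arbitrary.

```python
import tensorflow as tf

# position(t) = 4.9 * t^2, so speed = 9.8 * t and acceleration = 9.8.
time = tf.Variable(2.)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        position = 4.9 * time ** 2
    # Computed inside the outer tape, so the outer tape can differentiate it.
    speed = inner_tape.gradient(position, time)
acceleration = outer_tape.gradient(speed, time)

# Values are float32, so expect approximately 19.6 and 9.8.
print(float(speed), float(acceleration))
```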

Linear Classifier

The chapter walks through building a linear classifier from scratch using only the low-level TF API. To keep things simple, it uses full-batch training rather than mini-batch training:


import tensorflow as tf

input_dim = 2
output_dim = 1
learning_rate = 0.1

# Model parameters: a weight matrix and a bias vector, both trainable variables
W = tf.Variable(initial_value = tf.random.uniform(shape=(input_dim, output_dim)))
b = tf.Variable(initial_value = tf.zeros(shape=(output_dim,)))

# Forward pass: predictions = inputs . W + b
def model(inputs):
  return tf.matmul(inputs, W) + b

# Mean squared error over the batch
def square_loss(targets, predictions):
  per_sample_loss = tf.square(targets - predictions)
  return tf.reduce_mean(per_sample_loss)

# One step of gradient descent on the full batch
def training_step(inputs, targets):
  with tf.GradientTape() as tape:
    predictions = model(inputs)
    loss = square_loss(targets, predictions)
  # Gradients of the loss with respect to each parameter
  grad_loss_wrt_W, grad_loss_wrt_b = tape.gradient(loss, [W, b])
  # Move each parameter a small step against its gradient
  W.assign_sub(grad_loss_wrt_W * learning_rate)
  b.assign_sub(grad_loss_wrt_b * learning_rate)
  return loss

def train(inputs, targets, steps):
  for step in range(steps):
    loss = training_step(inputs, targets)
    print(f'loss at step {step}: {loss:.4f}')

See pp. 80-4 for a worked example with random data.
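A self-contained sketch of such a run, repeating the classifier above. The data generation (two Gaussian clouds of 500 points each, their means, covariance and seed), the 40 steps and the 0.5 decision threshold are assumptions for illustration, not the book's exact setup; the threshold of 0.5 follows from using 0 and 1 as targets. The loop here collects the losses instead of printing them.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: two Gaussian clouds of 2D points (assumed parameters).
num_samples_per_class = 500
rng = np.random.default_rng(42)
cov = [[1.0, 0.5], [0.5, 1.0]]
negative_samples = rng.multivariate_normal(mean=[0.0, 3.0], cov=cov, size=num_samples_per_class)
positive_samples = rng.multivariate_normal(mean=[3.0, 0.0], cov=cov, size=num_samples_per_class)
inputs = tf.constant(np.vstack((negative_samples, positive_samples)).astype(np.float32))
# Class 0 for the first cloud, class 1 for the second.
targets = tf.concat([tf.zeros((num_samples_per_class, 1)),
                     tf.ones((num_samples_per_class, 1))], axis=0)

# Same linear classifier as above.
input_dim, output_dim, learning_rate = 2, 1, 0.1
W = tf.Variable(tf.random.uniform(shape=(input_dim, output_dim)))
b = tf.Variable(tf.zeros(shape=(output_dim,)))

def model(inputs):
    return tf.matmul(inputs, W) + b

def square_loss(targets, predictions):
    return tf.reduce_mean(tf.square(targets - predictions))

def training_step(inputs, targets):
    with tf.GradientTape() as tape:
        loss = square_loss(targets, model(inputs))
    grad_W, grad_b = tape.gradient(loss, [W, b])
    W.assign_sub(grad_W * learning_rate)
    b.assign_sub(grad_b * learning_rate)
    return loss

# Full-batch training: every step sees all 1000 points.
losses = []
for step in range(40):
    losses.append(float(training_step(inputs, targets)))

# Predictions above 0.5 are assigned to class 1.
predicted_class = model(inputs).numpy() > 0.5
true_class = targets.numpy() > 0.5
accuracy = float(np.mean(predicted_class == true_class))
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}, accuracy: {accuracy:.3f}")
```

Because the model is linear, its decision boundary is the line where inputs . W + b = 0.5.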

Intro to Keras

The chapter then introduces the Keras basics: layers, models, the compile() step (loss, optimizer and metrics), and the fit() training loop.
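As a taste of the layer API, a minimal sketch of a custom Keras layer, assuming the standard subclassing pattern where build() creates the weights once the input shape is known and call() defines the forward pass. The class name SimpleDense and its arguments are hypothetical.

```python
import tensorflow as tf
from tensorflow import keras

class SimpleDense(keras.layers.Layer):
    """Hypothetical minimal dense layer: outputs activation(inputs . W + b)."""

    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = activation

    def build(self, input_shape):
        # Weights are created lazily, once the input dimension is known.
        input_dim = input_shape[-1]
        self.W = self.add_weight(shape=(input_dim, self.units), initializer="random_normal")
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, inputs):
        y = tf.matmul(inputs, self.W) + self.b
        if self.activation is not None:
            y = self.activation(y)
        return y

# The first call triggers build() with the input shape (2, 784).
layer = SimpleDense(units=32, activation=tf.nn.relu)
outputs = layer(tf.ones(shape=(2, 784)))
print(outputs.shape)  # (2, 32)
```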

Alex's Notes