cm3015 Topic 09: Getting Started with Neural Networks
Main Info
Title: Getting Started with Neural Networks
Teachers: Tim Blackwell
Semester Taken: October 2021
Parent Module: cm3015 Machine Learning and Neural Networks
Description
In this topic, we study three types of machine learning tasks and their associated neural networks. We are really getting down to business now: deep learning programs, training and validation plots, and model evaluation. We begin with neural network architecture, exactly what goes into building a neural network: layers, loss functions and optimisers.
This topic covers four weeks of the module. The first week goes through the architecture of a NN in a bit more detail. The remaining three weeks cover case studies: classifying movie reviews, classifying Reuters articles, and regression on house prices.
Key Reading
Chollet: Chapter 3: Intro to Keras and TensorFlow - for 9.1: architecture.
Chollet: Chapter 4: Getting Started with Neural Networks - the case studies.
Lecture Summaries
9.1 Architecture
The main lecture is 9.103 on layers. The module is only going to cover feedforward networks in the lectures, for the rest it points to Chollet’s chapters on RNNs and CNNs.
Gives the definitions of common activation functions:
\[\mathrm{relu}(x) = \mathrm{max}(0,x)\]
\[\mathrm{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}\]
\[\mathrm{sigmoid}(x) = \frac{1}{1+e^{-x}}\]
Shows the sigmoid function: large positive inputs are mapped towards 1, large negative inputs towards 0, with a smooth transition through 0.5 for inputs close to 0.
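The three definitions above translate directly into NumPy; a minimal sketch (function names and test values are mine, not from the lecture):

```python
import numpy as np

def relu(x):
    # max(0, x), applied elementwise
    return np.maximum(0, x)

def tanh(x):
    # (e^x - e^-x) / (e^x + e^-x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def sigmoid(x):
    # 1 / (1 + e^-x): squashes any real input into (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([-5.0, 0.0, 5.0])
# relu zeroes the negative input; sigmoid maps large negative/positive
# inputs close to 0/1 respectively, and maps 0 to exactly 0.5.
```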
9.105 again stresses that DL is closer to engineering than mathematics as we have to find our way to good network architectures by practice rather than proof.
9.2 Classifying Movie Reviews
Walks through the first example from Chollet: Chapter O4: Getting Started with Neural Networks
9.204 is the main lecture, walking through the notion of units, layers, and activation.
It also digs into the crossentropy loss function in more detail than Chollet.
\[f_{sample} = -y \log y_{pred} - (1 - y) \log(1-y_{pred})\]
Why is crossentropy loss a possible loss function?
The lecture shows that if the label is 0 and the prediction is 0 the crossentropy will be 0, likewise if the label is 1 and the prediction is 1.
If the prediction differs from the label, the relevant log term is negative, since the sigmoid output lies in (0, 1). The leading minus signs ensure that the sample crossentropy is greater than 0, so it measures a positive loss.
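The argument above is easy to check numerically; a small sketch of the per-sample crossentropy formula (the function name is mine):

```python
import numpy as np

def sample_crossentropy(y, y_pred):
    # f_sample = -y*log(y_pred) - (1-y)*log(1-y_pred)
    # y is the true label (0 or 1); y_pred is the sigmoid output in (0, 1),
    # which keeps both log arguments strictly positive.
    return -y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred)

# When the prediction is close to the label, the loss is close to 0;
# when it is close to the wrong label, the loss grows large.
low = sample_crossentropy(1, 0.999)   # near-correct: tiny loss
high = sample_crossentropy(1, 0.001)  # confidently wrong: large loss
```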
It recommends Nielsen’s book as a good source to use for the maths.
9.3 Classifying Reuters Articles
Walks through the second example from Chollet: Chapter O4: Getting Started with Neural Networks
9.304 gives the mathematical definition of softmax:
\[f(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}\]
Softmax is contrasted with hardmax, which would flatten smaller values to 0 and give 1 to the maximum value. Softmax preserves information about the smaller values, and crucially is differentiable.
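The contrast is easy to see in code; a minimal sketch (the hardmax helper and the stability shift are mine, not from the lecture):

```python
import numpy as np

def softmax(x):
    # Subtracting the max before exponentiating avoids overflow for large
    # scores; it cancels in the ratio, so the result is unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def hardmax(x):
    # Flattens every value to 0 except the maximum, which gets 1.
    out = np.zeros_like(x, dtype=float)
    out[np.argmax(x)] = 1.0
    return out

scores = np.array([1.0, 2.0, 3.0])
# softmax keeps information about the smaller scores (all entries > 0,
# ordered like the inputs); hardmax throws that information away.
```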
9.4 Regression: House Prices
Walks through the third example from Chollet: Chapter O4: Getting Started with Neural Networks
One useful video is 9.402 on data prep, which walks through normalization. It shows the effects of changing the axis argument in NumPy statistics calculations.
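The axis point can be sketched in a few lines of NumPy (the toy data is mine): `axis=0` runs down the rows, so each statistic is computed per feature, which is what feature-wise normalization needs.

```python
import numpy as np

# Two features on very different scales, three samples.
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0]])

mean = data.mean(axis=0)  # per-feature means: one value per column
std = data.std(axis=0)    # per-feature standard deviations

# Feature-wise normalization: each column ends up with mean 0 and std 1.
normalized = (data - mean) / std
```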
9.407 includes an additional tip to smooth noisy plots. In addition to omitting the outliers (the first 10 epochs) as Chollet does, he smooths the data points themselves by replacing each point with a weighted sum of itself and the previous smoothed point. Like this:
def smooth_curve(points, factor=0.9):
    smoothed_points = []
    for point in points:
        if smoothed_points:
            # Weighted sum of the previous smoothed point and the new point
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            # The first point has no history, so it passes through unchanged
            smoothed_points.append(point)
    return smoothed_points
Lab Summaries
Each case study has an associated lab where you can experiment with hyperparameters.