Alex's Notes

Chollet: Chapter 05: Fundamentals of Machine Learning

Metadata

Core Ideas

The fundamental issue in machine learning is the tension between optimization and generalization. Optimization refers to the process of adjusting a model to get the best performance possible on the training data (the learning in machine learning), whereas generalization refers to how well the trained model performs on data it has never seen before.

The goal is generalization, but you don’t control it directly; you only control the fit of the model to the training data. Fit too well and you get overfitting.

A deep learning model is basically a very high-dimensional curve - a curve that is smooth and continuous (with additional constraints on its structure, originating from model architecture priors), since it needs to be differentiable. And that curve is fitted to data points via gradient descent, smoothly and incrementally. By its very nature, deep learning is about taking a big, complex curve - a manifold - and incrementally adjusting its parameters until it fits some training data points.

The curve involves enough parameters that it could fit anything: train a model long enough and it will end up memorizing its training data and won’t generalize at all. However, the data you’re fitting isn’t made of isolated points sparsely scattered across the input space; it forms a highly structured, low-dimensional manifold within that space (the manifold hypothesis). At some point in the training process, the model will roughly approximate the natural manifold of the data.
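A toy way to see this fit-then-memorize dynamic (my own numpy sketch, not the book’s code): treat a smooth 1-D curve, y = sin(x), as the “latent manifold,” fit polynomials of low and high degree to noisy samples of it, and compare training error against error on held-out points from the true curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "latent manifold" here is a smooth 1-D curve: y = sin(x).
x_train = rng.uniform(-3, 3, 20)
y_train = np.sin(x_train) + rng.normal(0, 0.2, 20)  # noisy samples of it
x_test = rng.uniform(-3, 3, 200)
y_test = np.sin(x_test)  # the true curve, for measuring generalization

def mse_for_degree(degree):
    # Higher degree = more parameters = more capacity to memorize noise.
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

train_lo, test_lo = mse_for_degree(3)   # rough approximation of the manifold
train_hi, test_hi = mse_for_degree(12)  # enough capacity to chase the noise
```

The high-degree fit drives training error down further, but its held-out error reflects the gap between memorizing the samples and approximating the manifold.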

Deep learning models work because they implement a smooth, continuous mapping from inputs to outputs (by necessity, since the mapping has to be differentiable). That smoothness helps them approximate latent manifolds, which have the same properties.

They also work because they tend to be structured in a way that mirrors the “shape” of the information in their training data (via architectural design); this is especially true of image processing and sequence processing. They structure their learned representations in a hierarchical, modular way which echoes the organization of natural data.

Generalization

The chapter discusses the problem of overfitting and generalization, summarised in Generalization in Deep Learning, including overfitting/underfitting, the manifold hypothesis, and the importance of the training set.

Model Evaluation

It then moves on to discuss Evaluating Machine Learning Models, looking at train/valid/test split techniques and some pointers on common pitfalls in evaluation.
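A minimal sketch of the two basic split schemes mentioned above (my own version, not the book’s code), using shuffled index arrays:

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 100
indices = rng.permutation(n_samples)  # always shuffle before splitting

# Simple hold-out validation: e.g. 80% train, 20% validation.
n_val = n_samples // 5
val_idx = indices[:n_val]
train_idx = indices[n_val:]

# K-fold cross-validation: partition the shuffled indices into k folds;
# each fold takes one turn as the validation set.
k = 5
folds = np.array_split(indices, k)
for i in range(k):
    fold_val = folds[i]
    fold_train = np.concatenate([folds[j] for j in range(k) if j != i])
    # train on fold_train, validate on fold_val; average the k scores
```

Iterated K-fold with shuffling just repeats the K-fold loop several times with a fresh permutation each time and averages everything, at k times the compute cost per iteration.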

Improving Model Fit

It then discusses Improving Model Fit: getting the model to fit the training data well in the first place, by tuning the learning rate and batch size, leveraging better architecture priors, or increasing model capacity.

Improving Generalization

Once you’re sure your model can overfit, you work on Improving Generalization.

Summary

  • The purpose of a machine learning model is to generalize: to perform accurately on never-seen-before inputs. It’s harder than it seems.

  • A deep neural network achieves generalization by learning a parametric model that can successfully interpolate between training samples—such a model can be said to have learned the “latent manifold” of the training data. This is why deep learning models can only make sense of inputs that are very close to what they’ve seen during training.

  • The fundamental problem in machine learning is the tension between optimization and generalization: to attain generalization, you must first achieve a good fit to the training data, but improving your model’s fit to the training data will inevitably start hurting generalization after a while. Every single deep learning best practice deals with managing this tension.

  • The ability of deep learning models to generalize comes from the fact that they manage to learn to approximate the latent manifold of their data, and can thus make sense of new inputs via interpolation.

  • It’s essential to be able to accurately evaluate the generalization power of your model while you’re developing it. You have at your disposal an array of evaluation methods, from simple hold-out validation to K-fold cross-validation and iterated K-fold cross-validation with shuffling. Remember to always keep a completely separate test set for final model evaluation, since information leaks from your validation data to your model may have occurred.

  • When you start working on a model, your goal is first to achieve a model that has some generalization power and that can overfit. Best practices to do this include tuning your learning rate and batch size, leveraging better architecture priors, increasing model capacity, or simply training longer.
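The learning-rate point in particular can be seen on a toy problem (my own illustration, not from the chapter): running gradient descent on f(x) = x², a rate that is too small converges slowly, a reasonable one converges quickly, and one that is too large overshoots and diverges.

```python
def gradient_descent(lr, steps=50, x0=1.0):
    """Minimize f(x) = x**2 (gradient is 2*x) with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # standard gradient descent update
    return x

small = gradient_descent(lr=0.01)   # creeps toward the minimum at 0
good = gradient_descent(lr=0.1)     # converges quickly
too_big = gradient_descent(lr=1.1)  # overshoots the minimum and diverges
```

The same dynamic plays out (far less cleanly) on the high-dimensional loss surfaces of deep networks, which is why tuning the learning rate is the first lever to pull when a model won’t fit.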

  • As your model starts overfitting, your goal switches to improving generalization through model regularization. You can reduce your model’s capacity, add dropout or weight regularization, and use early stopping. And naturally, a larger or better dataset is always the number one way to help a model generalize.
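Early stopping boils down to simple bookkeeping over the validation-loss curve. A minimal sketch of the logic (my own, with a made-up loss curve, not the book’s code):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch at which training halts: when validation loss
    has failed to improve for `patience` consecutive epochs."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0  # new best: reset the patience counter
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran out of epochs

# Validation loss improves, then rises as the model starts overfitting:
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.61, 0.65]
stop = early_stopping_epoch(losses)
```

In practice you would also restore the weights from the best epoch (epoch 3 in this toy curve) rather than keep the final, overfit ones.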