Alex's Notes

Alpaydin: Chapter 01: Introduction

Metadata

Key Quotes

Machine learning is programming computers to optimize a performance criterion using example data or past experience. We have a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience. The model may be predictive to make predictions in the future, or descriptive to gain knowledge from data, or both. (p. 3)

Intro

Introduces ML as a solution to the problem of getting machines to perform tasks for which we cannot design an algorithm (this approach is followed by the module):

For some tasks… we do not have an algorithm, despite decades of research. Some of these are tasks that we as humans can do, and do effortlessly, without even being aware of how we do them. We can recognize a person from a photograph; we can move in a crowded room without hitting objects or people; we can play chess, drive a car, and hold conversations in a foreign language. In machine learning the idea is to learn to do these type of things. (p. 1)

our approach is to start from a very general model with many parameters, and that general model can do all sorts of tasks depending on how its parameters are set. Learning corresponds to adjusting the values of those parameters so that the model matches best with the data it sees during training. Based on this training data, the general model through a particular setting of its parameters becomes specialized to the particular task that underlies the data. That version of the model we get after training, that particular instantiation of the general template, becomes the algorithm for that task. (p. 1)

Introduces a sample problem: how can a supermarket use all its transaction data to understand customer behaviour and predict what customers will want to buy? We cannot know exactly, but we know the behaviour is not completely random; there are patterns in the data. We can construct a good and useful approximation, which won't explain everything but might account for part of the data. This is the niche of ML: detecting patterns to help us understand a process or make predictions.

Introduces the mining analogy, “a large volume of data is processed to construct a simple model with valuable use”.

Machine learning is also part of AI: to act intelligently in a dynamic environment, an agent must be able to learn.

ML uses statistical theory to build mathematical models, since “the core task is making inferences from a sample”. It relies on CS for efficient optimization during training, and for efficient application of the learned model to new data.

Example Applications - Supervised Learning

Association Rules

In basket analysis we learn associations between products bought by customers: the conditional probability \(P(Y|X)\), or \(P(Y|X,D)\), where \(X\) and \(Y\) are products and \(D\) is a set of customer attributes.
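Estimating such a rule amounts to counting co-occurrences in the transaction data; a minimal sketch, with made-up products and baskets:

```python
# Sketch: estimating an association rule P(Y | X) from toy transaction data.
# The product names and baskets below are made up for illustration.
transactions = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer"},
    {"chips", "salsa"},
    {"beer", "chips"},
]

def conditional_prob(transactions, x, y):
    """Estimate P(y | x): the fraction of baskets containing x that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

print(conditional_prob(transactions, "beer", "chips"))  # 3 of 4 beer baskets -> 0.75
```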

Classification

Credit scoring is an example of a classification problem where customers are put into low-risk and high-risk classes. A function that takes customer data and assigns each customer to the correct category is a discriminant function. We may want to assign a discrete category, or a probability of class membership.
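A minimal sketch of such a discriminant as a thresholded rule over two customer attributes; the attribute choice and threshold values here are illustrative assumptions, not learned from data:

```python
# Sketch: a simple discriminant for credit scoring using two thresholds.
# THETA1 and THETA2 are placeholder values; in ML they would be the learned
# parameters that the training program optimizes.
THETA1 = 40_000  # income threshold (assumed units per year)
THETA2 = 10_000  # savings threshold

def discriminant(income, savings):
    """Assign a discrete risk category from customer data."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(discriminant(55_000, 20_000))  # low-risk
print(discriminant(30_000, 20_000))  # high-risk
```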

Optical character recognition is an interesting classification problem: we lack a formal description of ‘A’ that covers all ‘A’s, but we can learn one from examples. We can also learn contextual dependencies from character sequences, which enable us to interpret characters like th?s.

Other examples are given: facial recognition, medical diagnosis, speech recognition, sentiment analysis, machine translation, and biometrics.

Learning a rule from data also allows knowledge extraction. The rule is a simple model that explains the data, and looking at this model we have an explanation about the process underlying the data.

For example, if we learn the discriminant in the high-risk/low-risk customer data, we extract knowledge about the properties of low-risk customers. Learning also performs compression: we get an explanation that is less complex than the original data. And it allows us to detect outliers or novel cases.

Regression

Problems where the output is a continuous value (e.g. price prediction) are regression problems. Like classification, regression is a supervised learning problem. We assume a model defined up to a set of parameters:

\(y = g(x|\theta)\)

where \(g(\cdot)\) is the model (a regression function, or a discriminant for classification) and \(\theta\) are its parameters. The ML program optimizes the parameters \(\theta\) such that the approximation error is minimized. The model may be linear or, if that is too restrictive, a quadratic or higher-order polynomial, or any other nonlinear function.
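Fitting the parameters in the linear case can be sketched with the closed-form least-squares solution; the data points below are made up for illustration:

```python
# Sketch: fitting y = g(x | theta) = theta1 * x + theta0 by least squares,
# i.e. choosing theta to minimize the squared approximation error.
# The data points are made up, roughly following y = 2x + 1 with noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single input variable.
    theta1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
              / sum((x - mean_x) ** 2 for x in xs))
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

theta0, theta1 = fit_linear(xs, ys)
print(round(theta1, 2), round(theta0, 2))  # slope near 2, intercept near 1
```

A quadratic or higher-order model would simply add more parameters to \(\theta\) and more terms to \(g\); the optimization objective stays the same.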

Sometimes in regression we don’t want the output to be an absolute numeric value but a ranking, e.g. in a recommendation system.

Unsupervised learning

In unsupervised learning we have only input data; we are not learning a pre-existing mapping to an output, but aiming to find regularities in the input.

There is a structure to the input space such that certain patterns occur more often than others, and we want to see what generally happens and what does not. In statistics, this is called density estimation. One method for density estimation is clustering, where the aim is to find clusters or groupings of input. (p. 11)

Sample applications include customer segmentation and document clustering (where similar documents in a corpus, e.g. news reports, are grouped by the words they share).
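The clustering idea can be sketched with a minimal k-means loop; the 1-D points and initial centroids below are made up for illustration:

```python
# Sketch: k-means clustering (k = 2) on toy 1-D points, as an example of
# finding groupings in unlabelled input. All values are made up.
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.7]

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans(points, [0.0, 10.0])
print(sorted(round(c, 2) for c in centroids))  # centroids near 1.0 and 8.0
```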

Reinforcement Learning

In some applications, the output of the system is a sequence of actions. In such a case, a single action is not important; what is important is the policy that is the sequence of correct actions to reach the goal. There is no such thing as the best action in any intermediate state; an action is good if it is part of a good policy. In such a case, the machine learning program should be able to assess the goodness of policies and learn from past good action sequences to be able to generate a policy. Such learning methods are called reinforcement learning algorithms. (p. 12)

Game playing and robot navigation are good examples. The setting can be complicated by partially observable or multi-agent task environments.
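The point that an action is good only as part of a policy can be illustrated with value iteration on a toy problem; the chain MDP below (four states, a rewarding goal state, and the discount factor) is entirely made up for illustration:

```python
# Sketch: value iteration on a tiny deterministic chain MDP. Moving right from
# any state eventually reaches the goal; the learned policy reflects whole
# action sequences, not single-step rewards. All numbers are assumptions.
N_STATES = 4        # states 0..3; state 3 is the goal
GAMMA = 0.9         # discount factor
ACTIONS = (-1, +1)  # move left or right along the chain

def step(state, action):
    """Deterministic transition: reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

values = [0.0] * N_STATES
for _ in range(50):  # iterate the Bellman optimality update to convergence
    values = [max(r + GAMMA * values[s2]
                  for s2, r in (step(s, a) for a in ACTIONS))
              for s in range(N_STATES)]

# The greedy policy reads the best action out of the converged values.
policy = [max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * values[step(s, a)[0]])
          for s in range(N_STATES - 1)]
print(policy)  # moves right in every state: [1, 1, 1]
```

Note that in states 0 and 1 the immediate reward for every action is zero; moving right is only identified as good because it belongs to the policy that reaches the goal.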

The chapter gives a very cursory history of milestones and precursors to ML, noting that induction is fundamental to human reasoning more generally, and to scientific discovery. But we now have so much data that it is hard to carry out such inductive reasoning formally over large datasets without computational help; this is where ML comes in.

Unsurprisingly, the methods that shape ML have come from those earlier formal inductive reasoning approaches in other domains:

In statistics, going from particular observations to general descriptions is called inference, and learning is called estimation. Classification is called discriminant analysis… In engineering, classification is called pattern recognition.

Over time, it has been realized in the neural network community that most neural network learning algorithms have their basis in statistics - for example, the multilayer perceptron is another class of nonparametric estimator - and claims of brain-like computation have started to fade. (p. 14)

The chapter reviews how the field of ML bumps up against high performance computing, data privacy and security, model interpretability and trust, and data science.