Alex's Notes

Alpaydin: Chapter 06: Dimensionality Reduction

Metadata

Core Ideas

The complexity of any classifier or regressor depends on the number of inputs, which determines both its time and space complexity and the number of training examples needed to train it. In this chapter, we discuss feature selection methods, which choose a subset of important features and prune the rest, and feature extraction methods, which form fewer, new features from the original inputs.

In an ideal world, a classifier/regressor should be able to use whichever features it needs and discard the rest. However, there are several reasons we are interested in reducing dimensionality as a separate preprocessing step:

  • The complexity of most learning algorithms depends on the number of input dimensions d and the number of samples N, so reducing dimensionality reduces space and time complexity.

  • If we decide an input is unnecessary, we save the cost of extracting it in the future.

  • Simpler models are more robust on small datasets; they have lower variance.

  • If the data can be explained with fewer features, we get a better idea of the process that underlies it, allowing knowledge extraction. These fewer features can be interpreted as hidden or latent factors that in combination generate the observed features.

  • When the data can be represented in a few dimensions without loss of information, it can be visualized for structural and outlier analysis.

There are two main methods for reducing dimensionality:

Feature Selection: Where we find k of the d dimensions that give us the most information, discarding the rest. One method is subset selection.
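As an illustration, sequential forward selection (a common greedy subset-selection strategy, covered in more detail below) can be sketched as follows. The `lsq_score` scoring function and the toy data are my own assumptions for the demo, not from the book; any subset-goodness measure would do.

```python
import numpy as np

def forward_selection(score, d, k):
    """Greedy forward selection: starting from the empty set, repeatedly
    add the single feature that most improves `score` (a function from a
    tuple of feature indices to a goodness value; higher is better),
    until k features are chosen."""
    selected = []
    for _ in range(k):
        best_feat, best_score = None, -np.inf
        for f in range(d):
            if f in selected:
                continue
            s = score(tuple(selected + [f]))
            if s > best_score:
                best_feat, best_score = f, s
        selected.append(best_feat)
    return selected

# Toy example (hypothetical): score a subset by how well a least-squares
# fit on those columns explains y (negative residual error).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3]          # only features 1 and 3 matter

def lsq_score(subset):
    A = X[:, list(subset)]
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return -float(resid @ resid)

print(forward_selection(lsq_score, d=5, k=2))  # picks features 1 and 3
```

Note that this greedy search evaluates O(d·k) subsets instead of all 2^d, at the cost of possibly missing the best subset when features are only informative in combination.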

Feature Extraction: Where we find a new set of k dimensions that are combinations of the original d dimensions. These methods can be unsupervised or supervised if they use the output information. The two best known methods are principal component analysis (unsupervised) and linear discriminant analysis (supervised), both linear projection methods.
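A minimal sketch of principal component analysis, the unsupervised method named above: center the data, take the eigenvectors of the sample covariance matrix with the largest eigenvalues, and project onto them. The synthetic one-latent-factor data here is my own illustrative assumption.

```python
import numpy as np

def pca(X, k):
    """Project X (N x d) onto its first k principal components: the
    orthogonal directions of maximum variance, i.e. the top eigenvectors
    of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance (d x d)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort descending
    W = eigvecs[:, order[:k]]               # d x k projection matrix
    return Xc @ W, eigvals[order]

# Hypothetical data: 3-D observations generated by one latent factor z
# plus a little noise, so one principal component captures most variance.
rng = np.random.default_rng(1)
z = rng.normal(size=(200, 1))
X = z @ np.array([[2.0, -1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

Z, eigvals = pca(X, k=1)
print(eigvals / eigvals.sum())  # proportion of variance per component
```

This connects to the latent-factor view above: the single recovered component is (up to sign and scale) the direction along which the hidden factor z generates the observed features.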

Subset Selection

Further Notes