Alex's Notes

TensorFlow Datasets

As presented in Chollet, Chapter 8: Intro to DL for Computer Vision

For more, read the docs.

The tf.data API is used to build efficient input pipelines for models. The core class is tf.data.Dataset.

A Dataset object is iterable: you can loop over it in a for loop, or pass it directly to the fit() method of a Keras model.
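A minimal sketch of building a Dataset from in-memory NumPy arrays and looping over it. The random features and labels are toy data assumed purely for illustration; tf.data.Dataset.from_tensor_slices is the standard way to turn arrays into a dataset with one element per row.

```python
import numpy as np
import tensorflow as tf

# Toy data, assumed purely for illustration: 100 samples with 3 features each
# and an integer label of 0 or 1.
features = np.random.normal(size=(100, 3)).astype("float32")
labels = np.random.randint(0, 2, size=(100,))

# from_tensor_slices slices along the first axis, giving one
# (feature_vector, label) element per sample.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# The dataset is a plain Python iterable; each element is a tuple of tensors.
for x, y in dataset.take(3):
    print(x.numpy(), int(y))
```

Calling iter(dataset) gives an explicit iterator, and next() pulls a single element from it. Before passing a dataset like this one to fit(), you would normally batch it, because Keras treats each element of a dataset as one batch.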

It handles a lot of things that would be a pain to implement yourself, like asynchronous data prefetching.

It exposes a functional API for transforming a dataset: each method returns a new Dataset instead of modifying the original, so calls can be chained.
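A small sketch of that functional style, using Dataset.range as a toy source. Each call returns a new dataset, so map() and batch() can be chained while the original dataset stays unchanged.

```python
import tensorflow as tf

# A dataset of the int64 scalars 0, 1, ..., 9.
numbers = tf.data.Dataset.range(10)

# map() and batch() each return a new Dataset, so they can be chained;
# `numbers` itself is not modified.
doubled_batches = numbers.map(lambda n: n * 2).batch(4)

# Convert to plain Python lists to inspect the results.
original = [int(n) for n in numbers.as_numpy_iterator()]
batches = [b.tolist() for b in doubled_batches.as_numpy_iterator()]

print(original)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(batches)   # [[0, 2, 4, 6], [8, 10, 12, 14], [16, 18]]
```

The last batch is smaller because 10 is not divisible by 4; batch(4, drop_remainder=True) would drop that partial batch instead.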

It has a range of useful methods like:

  • dataset.batch(32) batches the data, grouping consecutive elements into batches of 32

  • dataset.shuffle(buffer_size) shuffles elements within a buffer of buffer_size elements

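  • dataset.prefetch(buffer_size) prepares up to buffer_size elements in the background while the current ones are being consumed, so the GPU is not left waiting for data. Passing tf.data.AUTOTUNE lets TensorFlow choose the buffer size.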

  • dataset.map(callable) applies an arbitrary transformation to each element of the dataset. callable takes a single element yielded by the dataset (if elements are tuples, their components are passed as separate arguments). You will use this a lot, for example for reshaping.
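A minimal sketch that chains all four methods into one pipeline and feeds it to fit(). The flattened 4x4 "images", the reshape_example helper, and the tiny model are hypothetical stand-ins chosen only for illustration; tf.data.AUTOTUNE is a real constant that lets TensorFlow pick the prefetch buffer size at runtime.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy data: 64 flattened 4x4 "images" stored as 16-value
# vectors, each with a binary label.
flat_images = np.random.random(size=(64, 16)).astype("float32")
labels = np.random.randint(0, 2, size=(64,))

def reshape_example(image, label):
    # Hypothetical helper, called once per element: tuple elements are
    # unpacked into separate arguments. Turns a flat vector into a 4x4x1 image.
    return tf.reshape(image, (4, 4, 1)), label

dataset = (
    tf.data.Dataset.from_tensor_slices((flat_images, labels))
    .map(reshape_example)          # per-element transformation
    .shuffle(buffer_size=64)       # shuffle individual samples within a 64-element buffer
    .batch(32)                     # group samples into batches of 32
    .prefetch(tf.data.AUTOTUNE)    # prepare the next batch while the current one is used
)

for images, batch_labels in dataset.take(1):
    print(images.shape, batch_labels.shape)  # (32, 4, 4, 1) (32,)

# A deliberately tiny model, just to show the dataset going straight into fit().
model = keras.Sequential([
    keras.Input(shape=(4, 4, 1)),
    keras.layers.Flatten(),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(dataset, epochs=2, verbose=0)
```

The order matters: shuffling before batching shuffles individual samples rather than whole batches, and putting prefetch last lets the entire pipeline run ahead of training.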