Tensor Operations (Chollet)
Much as computer programs can be reduced to a small set of binary operations (AND, OR, etc.), all transformations learned by deep neural networks can be reduced to a handful of tensor operations (or tensor functions) applied to tensors of numeric data.
When we build a layer like keras.layers.Dense(512, activation='relu'), that layer can be interpreted as a function that takes a matrix as input and returns another matrix. Specifically, that function is:
output = relu(dot(input, W) + b)
where W is a matrix and b is a vector, both attributes of the layer. This expression is built from three tensor operations:
A dot product between the input tensor and a tensor named W
An addition between the resulting matrix and b
A relu operation, where relu(x) is max(x, 0) (relu stands for "rectified linear unit")
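The three operations above can be sketched directly in NumPy. This is a minimal illustration, not the Keras implementation; the shapes (a batch of 2 inputs with 4 features feeding 3 units) are hypothetical:

```python
import numpy as np

# Hypothetical shapes: batch of 2 inputs, 4 features, 3 units.
rng = np.random.default_rng(0)
inputs = rng.random((2, 4))
W = rng.random((4, 3))   # the layer's weight matrix
b = rng.random((3,))     # the layer's bias vector

# output = relu(dot(input, W) + b)
output = np.maximum(np.dot(inputs, W) + b, 0.)
print(output.shape)  # (2, 3)
```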
Element-wise operations
The relu operation and addition are element-wise operations, i.e. they are applied independently to each entry in the tensors being considered. This means they lend themselves to massively parallel (vectorized) implementations.
A naive Python implementation of these operations would look like this, but it would be very slow:
def naive_relu(x):
    assert len(x.shape) == 2  # x is a rank-2 NumPy tensor
    x = x.copy()              # avoid overwriting the input tensor
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

def naive_add(x, y):
    assert len(x.shape) == 2
    assert x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x
# fast NumPy versions
# element-wise addition
z = x + y
# element-wise relu
z = np.maximum(z, 0.)
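As a quick sanity check (not from the book), the naive loop version can be compared against the vectorized NumPy call; naive_relu is repeated here so the snippet runs on its own:

```python
import numpy as np

# Same naive loop as above, repeated so this snippet is self-contained.
def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))

# The loop and the vectorized call produce identical results.
print(np.allclose(naive_relu(x), np.maximum(x, 0.)))  # True
```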
Broadcasting
What if we wish to add tensors of different ranks, for example a matrix and a vector?
When possible, and if there’s no ambiguity, the smaller tensor will be broadcast to match the shape of the larger tensor. Broadcasting has two steps:
Axes (called broadcast axes) are added to the smaller tensor to match the ndim of the larger tensor.
The smaller tensor is repeated along these new axes to match the full shape of the larger tensor.
For a concrete example:
X = np.random.random((32,10))
y = np.random.random((10,))
# step 1, add an empty first axis to y, so it is now (1,10)
y = np.expand_dims(y, axis=0)
# step 2, repeat y 32 times along this axis, so it is (32,10)
y = np.concatenate([y] * 32, axis=0)
With broadcasting, you can generally perform element-wise operations on two input tensors if one tensor has shape (a, b, … n, n + 1, … m) and the other has shape (n, n + 1, … m). The broadcasting then happens automatically for axes a through n - 1.
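In practice, NumPy performs the axis expansion and repetition implicitly, so the manual steps above are never needed. A small sketch with hypothetical rank-4 and rank-2 tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((64, 3, 32, 10))  # rank-4 tensor
y = rng.random((32, 10))         # rank-2 tensor

# y is broadcast over the first two axes of x.
z = np.maximum(x, y)
print(z.shape)  # (64, 3, 32, 10)
```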
Tensor Product
The tensor product, also called the dot product, is computed with the np.dot(x, y) function in NumPy. For two vectors, it is the sum of their element-wise products:
def naive_vector_dot(x, y):
    assert len(x.shape) == 1  # x and y are NumPy vectors
    assert len(y.shape) == 1
    assert x.shape[0] == y.shape[0]
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z
def naive_matrix_vector_dot(x, y):
    assert len(x.shape) == 2         # x is a matrix
    assert len(y.shape) == 1         # y is a vector
    assert x.shape[1] == y.shape[0]
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        z[i] = naive_vector_dot(x[i, :], y)
    return z
def naive_matrix_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 2
    assert x.shape[1] == y.shape[0]  # inner dimensions must match
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = naive_vector_dot(row_x, column_y)
    return z
The dot product generalizes to tensors with an arbitrary number of axes. The most common case is the dot product between two matrices, shown above: it is defined only if x.shape[1] == y.shape[0], and the result is a matrix of shape (x.shape[0], y.shape[1]).
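The shape rule can be checked directly with np.dot, using hypothetical (3, 4) and (4, 2) matrices; each entry of the result is the dot product of a row of x with a column of y:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((3, 4))
y = rng.random((4, 2))

# (3, 4) . (4, 2) -> (3, 2)
z = np.dot(x, y)
print(z.shape)  # (3, 2)

# Entry (0, 0) is the dot product of row 0 of x and column 0 of y.
print(np.isclose(z[0, 0], np.sum(x[0, :] * y[:, 0])))  # True
```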
Tensor reshaping
A third essential tensor operation is tensor reshaping.
Reshaping a tensor means rearranging its rows and columns to match a target shape. Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor.
A commonly encountered special case of reshaping is transposition: transposing a matrix means exchanging its rows and columns, so that x[i, :] becomes x[:, i].
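A small sketch of both operations on a hypothetical 3 × 2 matrix; note that reshaping and transposing to the same target shape reorder the coefficients differently:

```python
import numpy as np

x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])

print(x.reshape((6, 1)).shape)   # (6, 1) -- same 6 coefficients
print(x.reshape((2, 3)).shape)   # (2, 3) -- row-major reordering
print(np.transpose(x).shape)     # (2, 3) -- rows and columns exchanged
```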