Keras: Functional API
As presented in Chollet, Chapter 7: Working with Keras: A Deep Dive
The problem with the simple Sequential API is that it can only express models with a single input, a single output, and a linear stack of layers applied one after another. In practice, it's very common to need models with multiple inputs (say, a document and its metadata), multiple outputs (different predictions you want to make about the same data), and a nonlinear topology.
For that you need to use the functional API.
Here's how you'd express a simple two-layer stack using the functional API:
from tensorflow import keras

inputs = keras.Input(shape=(100,), name="doc")
features = keras.layers.Dense(64, activation="relu")(inputs)
outputs = keras.layers.Dense(10, activation="softmax")(features)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
What's going on here? When we create the Input instance, we create a symbolic tensor. It doesn't hold any data; it acts as a spec for the future tensors the model will process. When a layer is called on a symbolic tensor, it returns a new symbolic tensor with updated shape and dtype info. This is what lets us chain layers the way we did here: we call the first layer on the inputs and get a new symbolic tensor, then call the output layer on that and get the symbolic tensor for our outputs. Finally, we instantiate a Model, specifying its inputs and outputs.
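To make this concrete, here's a minimal sketch (assuming TensorFlow's bundled Keras, imported as `from tensorflow import keras`) showing that symbolic tensors carry shape and dtype information but no actual data:

```python
from tensorflow import keras

# A symbolic tensor: a spec (shape and dtype), not actual data.
inputs = keras.Input(shape=(100,), name="doc")
print(inputs.shape)  # (None, 100) -- None is the batch dimension
print(inputs.dtype)  # float32 (exact repr depends on the Keras version)

# Calling layers on it yields new symbolic tensors with updated shapes.
features = keras.layers.Dense(64, activation="relu")(inputs)
outputs = keras.layers.Dense(10, activation="softmax")(features)
model = keras.Model(inputs=inputs, outputs=outputs)
print(model.output_shape)  # (None, 10)
```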
Multi-Input Multi-Output Models
Most deep learning models look like graphs of layers, not linear lists.
Let's say you want to build a system that ranks customer support tickets by priority and routes them to the correct team. Such a model would have three inputs: the title of the ticket (text), the text body of the ticket (text), and any tags added by the user (a categorical input, as a multi-hot encoded vector).
Let's say you encode the two text inputs as multi-hot vectors of size vocab_size over a chosen vocabulary. And say you want to output two things: a priority score (a scalar in [0, 1], via a sigmoid) and the department that should handle the ticket (a softmax over the departments).
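As a quick aside on what "multi-hot" means here: each text becomes a vector of length vocab_size with a 1 at the index of every word that appears in it (word order and counts are discarded). A minimal numpy sketch, with a hypothetical toy vocabulary and made-up word indices:

```python
import numpy as np

vocab_size = 10  # toy vocabulary size, for illustration only
# Hypothetical word indices for one ticket title
word_indices = [2, 5, 7]

# Start from all zeros, then set a 1 for each word present.
multi_hot = np.zeros((vocab_size,))
multi_hot[word_indices] = 1.0
print(multi_hot)  # [0. 0. 1. 0. 0. 1. 0. 1. 0. 0.]
```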
Here’s what the model looks like in the functional API:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 10000
num_tags = 100
num_departments = 4
title = keras.Input(shape=(vocab_size,), name="title")
text_body = keras.Input(shape=(vocab_size,), name="text_body")
tags = keras.Input(shape=(num_tags,), name="tags")
features = layers.Concatenate()([title, text_body, tags])
features = layers.Dense(64, activation="relu")(features)
priority = layers.Dense(1, activation="sigmoid", name="priority")(features)
department = layers.Dense(num_departments, activation="softmax", name="department")(features)
model = keras.Model(inputs=[title, text_body, tags], outputs=[priority, department])
model.summary()
# plot a nice graph of the model:
keras.utils.plot_model(model, "ticket_classifier.png", show_shapes=True)
Note that if you plot the graph of the model, the first value in each shape is the batch size. If it's None, it means the model accepts batches of any size.
So how do we train this model? We can pass the data as lists of input and output arrays, in the same order as the inputs and outputs passed to the Model constructor. Or, if we don't want to rely on input order, we can use dictionaries instead, provided we gave names to the Input objects and output layers.
Training with Lists of Data
num_samples = 1280
title_data = np.random.randint(0, 2, size=(num_samples, vocab_size))
text_body_data = np.random.randint(0, 2, size=(num_samples, vocab_size))
tags_data = np.random.randint(0, 2, size=(num_samples, num_tags))
priority_data = np.random.random(size=(num_samples, 1))
department_data = np.random.randint(0, 2, size=(num_samples, num_departments))
model.compile(optimizer="rmsprop",
loss=["mean_squared_error", "categorical_crossentropy"],
metrics=[["mean_absolute_error"], ["accuracy"]])
model.fit([title_data, text_body_data, tags_data],
[priority_data, department_data],
epochs=1)
model.evaluate([title_data, text_body_data, tags_data],
[priority_data, department_data])
priority_preds, department_preds = model.predict(
[title_data, text_body_data, tags_data])
Training with Dictionaries
model.compile(optimizer="rmsprop",
              loss={"priority": "mean_squared_error",
                    "department": "categorical_crossentropy"},
              metrics={"priority": ["mean_absolute_error"],
                       "department": ["accuracy"]})
model.fit({"title": title_data, "text_body": text_body_data,
"tags": tags_data},
{"priority": priority_data, "department": department_data},
epochs=1)
model.evaluate({"title": title_data, "text_body": text_body_data,
"tags": tags_data},
{"priority": priority_data, "department": department_data})
priority_preds, department_preds = model.predict(
{"title": title_data, "text_body": text_body_data, "tags": tags_data})
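When a model has several outputs, Keras sums the per-output losses into one total training loss; if one loss dominates, you can rebalance that sum with compile()'s loss_weights argument. A self-contained sketch reusing the dictionary style above (with smaller hypothetical sizes, for speed):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, num_tags, num_departments = 100, 10, 4  # toy sizes

title = keras.Input(shape=(vocab_size,), name="title")
text_body = keras.Input(shape=(vocab_size,), name="text_body")
tags = keras.Input(shape=(num_tags,), name="tags")
features = layers.Dense(64, activation="relu")(
    layers.Concatenate()([title, text_body, tags]))
priority = layers.Dense(1, activation="sigmoid", name="priority")(features)
department = layers.Dense(num_departments, activation="softmax",
                          name="department")(features)
model = keras.Model([title, text_body, tags], [priority, department])

# loss_weights scales each output's contribution to the total loss.
model.compile(optimizer="rmsprop",
              loss={"priority": "mean_squared_error",
                    "department": "categorical_crossentropy"},
              loss_weights={"priority": 1.0, "department": 0.5})
```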
Accessing Layer Connectivity
One of the key strengths of the functional API is easy access to individual nodes (layer calls) in the graph, for inspection or reuse.
If you inspect the model.layers property, you can see all the layers that make up the model. For each layer, you can query layer.input or layer.output to retrieve the symbolic tensors flowing into and out of it.
This enables feature extraction: creating models that reuse intermediate features from another model. Here's an example:
model.layers           # a list of all the layers in the model
model.layers[3].input  # the symbolic input tensor(s) of that layer

# Build a new model by extracting an intermediate feature
features = model.layers[4].output  # output of the Dense(64) layer
# and adding a new output layer on top of it
difficulty = layers.Dense(3, activation="softmax", name="difficulty")(features)

new_model = keras.Model(
    inputs=[title, text_body, tags],
    outputs=[priority, department, difficulty])
keras.utils.plot_model(new_model, "updated_ticket_classifier.png", show_shapes=True)
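The same connectivity access also lets you build a standalone feature extractor: a model whose output is an intermediate symbolic tensor of an existing model. A minimal self-contained sketch (using a small hypothetical model rather than the ticket classifier, and looking the layer up by name via get_layer):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(20,))
hidden = layers.Dense(8, activation="relu", name="hidden")(inputs)
outputs = layers.Dense(1, activation="sigmoid")(hidden)
model = keras.Model(inputs, outputs)

# A new model mapping the same input to the intermediate features.
extractor = keras.Model(inputs=model.input,
                        outputs=model.get_layer("hidden").output)
features = extractor.predict(np.random.random((3, 20)), verbose=0)
print(features.shape)  # (3, 8)
```

Because the extractor shares the original model's layers, it reuses their weights rather than copying them.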