Keras: Functional API
As presented in Chollet, Chapter 7: Working with Keras: A Deep Dive
The problem with the simple Sequential API is that it can only express models with a single input, a single output, and a linear stack of layers applied one after another. In practice, it's very common to need models with multiple inputs (say, a document and its metadata), multiple outputs (different predictions you want to make about the same data), and a nonlinear topology.
For that you need to use the functional API.
Here's how you'd express a simple two-layer stack using the functional API:
from tensorflow import keras

inputs = keras.Input(shape=(100,), name="doc")
features = keras.layers.Dense(64, activation="relu")(inputs)
outputs = keras.layers.Dense(10, activation="softmax")(features)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
What's going on here? When we create the Input instance, we create a symbolic tensor. It doesn't hold any data; it acts as a spec for the future tensors the model will process. When a layer is called on a symbolic tensor, it returns a new symbolic tensor with updated shape and dtype info. This is what lets us chain layers the way we did here: we call the first layer on the inputs and get a new symbolic tensor, then call the output layer on that and get the symbolic tensor for our outputs. Finally, we instantiate a Model, specifying its inputs and outputs.
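To make this concrete, here's a minimal sketch (assuming TensorFlow's bundled Keras, imported as `from tensorflow import keras`) showing that symbolic tensors carry shape and dtype information but no actual data:

```python
from tensorflow import keras

# A symbolic tensor: a spec (shape and dtype), not actual data.
inputs = keras.Input(shape=(100,), name="doc")
print(inputs.shape)  # (None, 100) -- None is the batch dimension
print(inputs.dtype)  # float32 (exact repr depends on the Keras version)

# Calling layers on it yields new symbolic tensors with updated shapes.
features = keras.layers.Dense(64, activation="relu")(inputs)
outputs = keras.layers.Dense(10, activation="softmax")(features)
model = keras.Model(inputs=inputs, outputs=outputs)
print(model.output_shape)  # (None, 10)
```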
Multi-Input Multi-Output Models
Most deep learning models look like graphs of layers, not linear lists.
Let's say you want to build a system that ranks customer support tickets by priority and routes them to the correct team. Such a model would have three inputs: the title of the ticket (text), the text body of the ticket (text), and any tags added by the user (a categorical input, as a multi-hot encoded vector).
Let's say you encode the two text inputs as multi-hot vectors of size vocab_size over a chosen vocabulary. And say you want to output two things: a priority score (a scalar in [0, 1], via a sigmoid) and the department that should handle the ticket (a softmax over the departments).
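As a quick aside on what "multi-hot" means here: each text becomes a vector of length vocab_size with a 1 at the index of every word that appears in it (word order and counts are discarded). A minimal numpy sketch, with a hypothetical toy vocabulary and made-up word indices:

```python
import numpy as np

vocab_size = 10  # toy vocabulary size, for illustration only
# Hypothetical word indices for one ticket title
word_indices = [2, 5, 7]

# Start from all zeros, then set a 1 for each word present.
multi_hot = np.zeros((vocab_size,))
multi_hot[word_indices] = 1.0
print(multi_hot)  # [0. 0. 1. 0. 0. 1. 0. 1. 0. 0.]
```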
Here’s what the model looks like in the functional API:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 10000
num_tags = 100
num_departments = 4
title = keras.Input(shape=(vocab_size,), name="title")
text_body = keras.Input(shape=(vocab_size,), name="text_body")
tags = keras.Input(shape=(num_tags,), name="tags")
features = layers.Concatenate()([title, text_body, tags])
features = layers.Dense(64, activation="relu")(features)
priority = layers.Dense(1, activation="sigmoid", name="priority")(features)
department = layers.Dense(num_departments, activation="softmax", name="department")(features)
model = keras.Model(inputs=[title, text_body, tags], outputs=[priority, department])
model.summary()
# plot a nice graph of the model:
keras.utils.plot_model(model, "ticket_classifier.png", show_shapes=True)
Note that if you plot the graph of the model, the first value in each shape is the batch size. If it's None, it means the model accepts batches of any size.
So how do we train this model? We can pass the data as lists of input and output arrays, in the same order as the inputs and outputs passed to the Model constructor. Or, if we don't want to rely on input order, we can use dictionaries instead, provided we gave names to the Input objects and output layers.
Training with Lists of Data
num_samples = 1280
title_data = np.random.randint(0, 2, size=(num_samples, vocab_size))
text_body_data = np.random.randint(0, 2, size=(num_samples, vocab_size))
tags_data = np.random.randint(0, 2, size=(num_samples, num_tags))
priority_data = np.random.random(size=(num_samples, 1))
department_data = np.random.randint(0, 2, size=(num_samples, num_departments))
model.compile(optimizer="rmsprop",
loss=["mean_squared_error", "categorical_crossentropy"],
metrics=[["mean_absolute_error"], ["accuracy"]])
model.fit([title_data, text_body_data, tags_data],
[priority_data, department_data],
epochs=1)
model.evaluate([title_data, text_body_data, tags_data],
[priority_data, department_data])
priority_preds, department_preds = model.predict(
[title_data, text_body_data, tags_data])
Training with Dictionaries
model.compile(optimizer="rmsprop",
              loss={"priority": "mean_squared_error",
                    "department": "categorical_crossentropy"},
              metrics={"priority": ["mean_absolute_error"],
                       "department": ["accuracy"]})
model.fit({"title": title_data, "text_body": text_body_data,
"tags": tags_data},
{"priority": priority_data, "department": department_data},
epochs=1)
model.evaluate({"title": title_data, "text_body": text_body_data,
"tags": tags_data},
{"priority": priority_data, "department": department_data})
priority_preds, department_preds = model.predict(
{"title": title_data, "text_body": text_body_data, "tags": tags_data})
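When a model has several outputs, Keras sums the per-output losses into one total training loss; if one loss dominates, you can rebalance that sum with compile()'s loss_weights argument. A self-contained sketch reusing the dictionary style above (with smaller hypothetical sizes, for speed):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, num_tags, num_departments = 100, 10, 4  # toy sizes

title = keras.Input(shape=(vocab_size,), name="title")
text_body = keras.Input(shape=(vocab_size,), name="text_body")
tags = keras.Input(shape=(num_tags,), name="tags")
features = layers.Dense(64, activation="relu")(
    layers.Concatenate()([title, text_body, tags]))
priority = layers.Dense(1, activation="sigmoid", name="priority")(features)
department = layers.Dense(num_departments, activation="softmax",
                          name="department")(features)
model = keras.Model([title, text_body, tags], [priority, department])

# loss_weights scales each output's contribution to the total loss.
model.compile(optimizer="rmsprop",
              loss={"priority": "mean_squared_error",
                    "department": "categorical_crossentropy"},
              loss_weights={"priority": 1.0, "department": 0.5})
```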
Accessing Layer Connectivity
One of the key strengths of the functional API is easy access to individual nodes (layer calls) in the graph, for inspection or reuse.
If you inspect the model.layers property, you can see all the layers that make up the model. For each layer, you can query layer.input or layer.output to retrieve the symbolic tensors flowing into and out of it.
This enables feature extraction: creating models that reuse intermediate features from another model. Here's an example:
model.layers           # a list of all the layers in the model
model.layers[3].input  # the symbolic input tensor(s) of that layer

# Build a new model by extracting an intermediate feature
features = model.layers[4].output  # output of the Dense(64) layer
# and adding a new output layer on top of it
difficulty = layers.Dense(3, activation="softmax", name="difficulty")(features)

new_model = keras.Model(
    inputs=[title, text_body, tags],
    outputs=[priority, department, difficulty])
keras.utils.plot_model(new_model, "updated_ticket_classifier.png", show_shapes=True)
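The same connectivity access also lets you build a standalone feature extractor: a model whose output is an intermediate symbolic tensor of an existing model. A minimal self-contained sketch (using a small hypothetical model rather than the ticket classifier, and looking the layer up by name via get_layer):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(20,))
hidden = layers.Dense(8, activation="relu", name="hidden")(inputs)
outputs = layers.Dense(1, activation="sigmoid")(hidden)
model = keras.Model(inputs, outputs)

# A new model mapping the same input to the intermediate features.
extractor = keras.Model(inputs=model.input,
                        outputs=model.get_layer("hidden").output)
features = extractor.predict(np.random.random((3, 20)), verbose=0)
print(features.shape)  # (3, 8)
```

Because the extractor shares the original model's layers, it reuses their weights rather than copying them.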