Subclassing Models
Model subclassing and custom layers give you even more control over how the model is constructed, and can be thought of as an even lower-level API than the functional API. However, with more flexibility comes more opportunity for bugs.
In order to use model subclassing, we first import the Model class and the layer classes, then subclass the Model class directly, as in the MyModel class below. The basic structure to keep in mind:
- Create the layers in the initializer __init__.
- Don't forget to call the initializer of the base class first.
- Define the forward pass in the call method.
Once we have built the class, all we need to do is create an instance of it. The name keyword argument with value 'my_model' is passed down to the base class constructor. The class inherits from the Model base class, so the object my_model has all the methods you already know about, like compile, fit, etc.
The training keyword argument of the call method is important: it determines the behavior of the model at training time versus inference time. A really common use of this keyword argument is in BatchNormalization and Dropout layers. In the code below, when the model is being trained, the Dropout layer randomly zeroes out its inputs; at test time, the Dropout layer does nothing.
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout

class MyModel(Model):
    def __init__(self, num_classes, **kwargs):
        super(MyModel, self).__init__(**kwargs)  # call the base class initializer first
        self.dense1 = Dense(16, activation='sigmoid')
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(num_classes, activation='softmax')

    def call(self, inputs, training=False):
        h = self.dense1(inputs)
        h = self.dropout(h, training=training)  # only active when training=True
        return self.dense2(h)

my_model = MyModel(10, name='my_model')
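Since my_model behaves like any other Keras model, the usual compile/fit workflow applies. Here is a minimal sketch, assuming some hypothetical data with 8 features and integer labels for the 10 classes:

import numpy as np

# hypothetical data: 100 samples with 8 features, labels in [0, 10)
x_train = np.random.random((100, 8)).astype('float32')
y_train = np.random.randint(0, 10, size=(100,))

my_model.compile(optimizer='adam',
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy'])
my_model.fit(x_train, y_train, epochs=2, batch_size=16)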
Subclassing Layers
The lower the level you work at, the more similarly models and layers behave. For example, you can call a layer on an input to get the layer's output; similarly, calling a model on an input returns the model's output. So you can use model objects as well as layer objects as building blocks for larger models.
To create a custom layer, we subclass the base Layer class:
- Create the layer variables in the initializer __init__.
- The actual values of the variables are initialized when the layer is created.
- Another way to create layer variables is to use the add_weight method.
- The call method contains the layer computation.
import tensorflow as tf
from tensorflow.keras.layers import Layer

class LinearMap(Layer):
    def __init__(self, input_dim, units):
        super(LinearMap, self).__init__()
        # either create an initializer and a variable explicitly ...
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(initial_value=w_init(shape=(input_dim, units)))
        # ... or, equivalently, use the add_weight method:
        # self.w = self.add_weight(shape=(input_dim, units),
        #                          initializer='random_normal')

    def call(self, inputs):
        return tf.matmul(inputs, self.w)

linear_map = LinearMap(3, 2)
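As a quick sanity check (a hypothetical usage sketch), the custom layer can be called on a batch of inputs just like a model, and the weight created in __init__ shows up among its trainable variables:

inputs = tf.ones((4, 3))        # a batch of 4 input vectors of dimension 3
outputs = linear_map(inputs)    # shape (4, 2)
print(outputs.shape)
print(linear_map.trainable_variables)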
Automatic Differentiation
In most cases, the model.fit and model.fit_generator methods are flexible enough for training our networks. But in certain special cases you may need finer control over what happens in the training loop, and a big part of that is computing the gradients of the loss with respect to all the trainable network variables. Thanks to automatic differentiation, we do not have to code the network gradients manually.
In the code example below, x is the independent variable, with respect to which we will differentiate. Within the context defined by GradientTape, we set up the operations that define the function we want to differentiate. The line tape.watch(x) means that any operations performed on x from this point on within the context will be recorded.
import tensorflow as tf

x = tf.constant(2.0)

# context
with tf.GradientTape() as tape:
    tape.watch(x)    # x is a constant, so it must be watched explicitly
    y = x ** 2       # define a new tensor y

grad = tape.gradient(y, x)  # take the derivative dy/dx and evaluate it at x=2
print(grad)
# tf.Tensor(4.0, shape=(), dtype=float32)
Another example:
import tensorflow as tf

x = tf.constant([0, 1, 2, 3], dtype=tf.float32)

# context
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(x ** 2)  # element-wise square, then sum
    z = tf.math.sin(y)         # another tensor

dz_dy, dz_dx = tape.gradient(z, [y, x])  # compute both gradients in one call
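As a sanity check, by the chain rule dz/dy = cos(y) = cos(14) and dz/dx = cos(y) · 2x = cos(14) · [0, 2, 4, 6], which is exactly what tape.gradient returns.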
Automatic differentiation can be leveraged within the training loop of a deep learning network.
Custom Training Loops
The standard procedure for training neural networks is implemented by default in the model.fit and model.fit_generator methods:
- Compute the gradients of a scalar loss function with respect to the model parameters.
- Update the model parameters according to the optimization algorithm.
But if we want more control over the training loop, then we’ll have to implement these steps ourselves. Suppose you have initialized a model instance, which could be:
- a custom model built with subclassing
- a model built using the functional API
- a model built using the Sequential API
To train this model, we need a loss function that takes two arguments, a prediction y^ and a ground truth y, and returns a scalar tensor. Either a custom or a built-in loss function works.
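For example, here is a minimal sketch of a custom loss (the name my_mse is just illustrative); the built-in MeanSquaredError used in the training loop below would work equally well:

import tensorflow as tf

def my_mse(y_true, y_pred):
    # mean squared error written with raw TensorFlow ops; returns a scalar tensor
    return tf.reduce_mean(tf.square(y_true - y_pred))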
When setting up the tf.GradientTape() context, tape.watch(...) is not required, because computations that make use of TensorFlow variable objects are automatically recorded by the tf.GradientTape() context. In the example we want to take derivatives with respect to the model parameters (weights), which are all TensorFlow variable objects, so we don't need to use tape.watch(...) here.
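A tiny sketch of this behavior: a tf.Variable is watched automatically, so no tape.watch is needed.

import tensorflow as tf

w = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = w ** 2                 # no tape.watch(w) required for variables
print(tape.gradient(y, w))     # tf.Tensor(6.0, shape=(), dtype=float32)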
So within the tf.GradientTape() context of our training step, we do the following:
- Compute the loss (a scalar tensor) by calling the loss function and passing in the model predictions and the ground truth.
- Compute the list of gradients of the loss with respect to the model parameters by calling tape.gradient(...), passing in the loss and all the model's trainable variables.
- Apply these gradients to update the model parameters by calling the apply_gradients method of an optimizer, which implements the optimization algorithm. Remember to use the zip function to match up the gradients with the trainable variables before passing them into the apply_gradients method of the optimizer.
import numpy as np
import tensorflow as tf
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.optimizers import SGD

my_model = MyModel()          # a model instance (subclassed, functional or sequential)
loss = MeanSquaredError()
optimizer = SGD(learning_rate=0.05, momentum=0.9)

epoch_losses = []
for epoch in range(num_epochs):               # num_epochs and training_dataset
    batch_losses = []                         # are assumed to be defined already
    for inputs, outputs in training_dataset:
        with tf.GradientTape() as tape:
            curr_loss = loss(my_model(inputs), outputs)
        grads = tape.gradient(curr_loss, my_model.trainable_variables)
        batch_losses.append(curr_loss)
        optimizer.apply_gradients(zip(grads, my_model.trainable_variables))
    epoch_losses.append(np.mean(batch_losses))
Optimizing Performance with tf.function
In TensorFlow 1, you would need to first build the computational graph and then run it inside a session. The benefit was that the graph could then be optimized for performance at runtime.
In TensorFlow 2, eager execution is the default, which makes it much easier to develop models and is a huge step forward in terms of usability. However, this increased usability can come at the cost of slower performance. TensorFlow 2 therefore makes it easy to convert programs into graphs, so you can get back the peak performance of computational graphs.
Simply add the @tf.function decorator to the function. This single addition can make all the difference to the performance of the code: it builds a graph out of the function so that, in many cases, it executes much faster.
@tf.function
def get_loss_and_grads(inputs, outputs):
    with tf.GradientTape() as tape:
        curr_loss = loss(my_model(inputs), outputs)
    grads = tape.gradient(curr_loss, my_model.trainable_variables)
    return curr_loss, grads

for epoch in range(num_epochs):
    for inputs, outputs in training_dataset:
        curr_loss, grads = get_loss_and_grads(inputs, outputs)
        optimizer.apply_gradients(zip(grads, my_model.trainable_variables))
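To see the effect, one simple (hypothetical) check is to time the same computation eagerly and wrapped in tf.function:

import timeit
import tensorflow as tf

def step_eager(x):
    return tf.reduce_sum(tf.matmul(x, x))

step_graph = tf.function(step_eager)   # same computation, compiled into a graph

x = tf.random.normal((256, 256))
print("eager:", timeit.timeit(lambda: step_eager(x), number=100))
print("graph:", timeit.timeit(lambda: step_graph(x), number=100))

For a tiny computation like this the difference may be small; the gains are usually larger for bigger models and full training steps.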
My Certificate
For more on Customizing Models, Layers and Training Loops, please refer to the wonderful course here https://www.coursera.org/learn/customising-models-tensorflow2
I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai