Customizing Models, Layers and Training Loops

Subclassing Models

Model subclassing and custom layers give you even more control over how a model is constructed, and can be thought of as a lower-level API than the functional API. However, more flexibility also brings more opportunity for bugs.

To use model subclassing, we first import the Model class and the layer classes, then subclass Model directly, as in the MyModel class below. The basic structure to keep in mind is:

Create the layers in the initializer __init__
- Don’t forget to call the initializer of the base class first
Define the forward pass in the call method

Once the class is defined, all we need to do is create an instance of it. The name keyword argument with value 'my_model' is passed down to the base class constructor. The object my_model is an instance of a Model subclass, so it has all the methods you already know, such as compile and fit.

The training keyword argument of the call method determines whether the model behaves as in training or as in inference. A common use of this argument is in BatchNormalization and Dropout layers. In the code below, when the model is being trained, the Dropout layer randomly zeroes out a fraction of its inputs and rescales the rest; at test time, the Dropout layer does nothing.

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout

class MyModel(Model):
  def __init__(self, num_classes, **kwargs):
    # call the base class initializer first
    super().__init__(**kwargs)
    self.dense1 = Dense(16, activation='sigmoid')
    self.dropout = Dropout(0.5)
    self.dense2 = Dense(num_classes, activation='softmax')

  def call(self, inputs, training=False):
    h = self.dense1(inputs)
    # Dropout is only active when training=True
    h = self.dropout(h, training=training)
    return self.dense2(h)

my_model = MyModel(10, name='my_model')
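
To see the effect of the training argument, here is a minimal sketch. It repeats the MyModel class so that it runs on its own, and it assumes a made-up input batch of ones with 8 features; the batch shape is an illustration only, not something the course prescribes.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout

class MyModel(Model):
  def __init__(self, num_classes, **kwargs):
    super().__init__(**kwargs)
    self.dense1 = Dense(16, activation='sigmoid')
    self.dropout = Dropout(0.5)
    self.dense2 = Dense(num_classes, activation='softmax')

  def call(self, inputs, training=False):
    h = self.dense1(inputs)
    h = self.dropout(h, training=training)
    return self.dense2(h)

my_model = MyModel(10, name='my_model')
x = tf.ones((4, 8))  # hypothetical batch: 4 samples, 8 features

# Inference mode: Dropout is a no-op, so repeated calls give the same output.
out_a = my_model(x, training=False).numpy()
out_b = my_model(x, training=False).numpy()

# Training mode: Dropout randomly zeroes hidden units, so the output
# will generally differ from the inference output and from call to call.
out_train = my_model(x, training=True).numpy()

print(out_a.shape)                # (4, 10)
print(np.allclose(out_a, out_b))  # True
```

Each row of the output is a softmax distribution over the 10 classes, so it sums to 1 in both modes.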

Subclassing Layers

The lower the level at which you work with models and layers, the more similarly you can use the two kinds of object. For example, you can call a layer on an input to get the layer’s output; similarly, calling a model on an input returns the model’s output. So model objects, as well as layer objects, can be used to build larger models.
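
As a minimal sketch of using a model like a layer inside a larger model: the class names Encoder and Classifier and all the sizes are hypothetical, chosen only for illustration.

```python
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense

class Encoder(Model):  # hypothetical inner model
  def __init__(self, **kwargs):
    super().__init__(**kwargs)
    self.dense = Dense(4, activation='relu')

  def call(self, inputs):
    return self.dense(inputs)

class Classifier(Model):  # hypothetical outer model
  def __init__(self, num_classes, **kwargs):
    super().__init__(**kwargs)
    self.encoder = Encoder()  # a model, used exactly like a layer
    self.head = Dense(num_classes, activation='softmax')

  def call(self, inputs):
    return self.head(self.encoder(inputs))

clf = Classifier(3)
y = clf(tf.ones((2, 5)))  # calling the model builds it and returns its output

print(y.shape)                       # (2, 3)
print(len(clf.trainable_variables))  # 4: kernel and bias of each Dense layer
```

The outer model tracks the variables of the nested model, so they all appear in clf.trainable_variables.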

To create a custom layer, we subclass the base Layer class:

Create the layer variables in the initializer __init__
- The variables need initial values at the time the layer is created, so an initializer is required.
- Alternatively, the layer variables can be created with the add_weight method.
Define the layer computation in the call method

import tensorflow as tf
from tensorflow.keras.layers import Layer

class LinearMap(Layer):
  def __init__(self, input_dim, units):
    super().__init__()

    # either create an initializer and a variable
    w_init = tf.random_normal_initializer()
    self.w = tf.Variable(initial_value=w_init(shape=(input_dim, units)))

    # or, equivalently, use the add_weight method instead:
    # self.w = self.add_weight(shape=(input_dim, units),
    #                          initializer='random_normal')

  def call(self, inputs):
    # the layer computation: a linear map without bias
    return tf.matmul(inputs, self.w)

linear_map = LinearMap(3, 2)
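
A short usage sketch, using the add_weight variant of the layer and a made-up input of ones, shows that the layer is called like a function and that its weight is tracked:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Layer

class LinearMap(Layer):
  def __init__(self, input_dim, units):
    super().__init__()
    self.w = self.add_weight(shape=(input_dim, units),
                             initializer='random_normal')

  def call(self, inputs):
    return tf.matmul(inputs, self.w)

linear_map = LinearMap(3, 2)
inputs = tf.ones((1, 3))      # hypothetical input: one sample, 3 features
outputs = linear_map(inputs)  # calls LinearMap.call under the hood

print(outputs.shape)                       # (1, 2)
print(len(linear_map.trainable_weights))   # 1

# With an input of ones, each output is the sum of one column of w.
column_sums = linear_map.w.numpy().sum(axis=0)
print(np.allclose(outputs.numpy()[0], column_sums, atol=1e-6))  # True
```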

Automatic Differentiation

In most cases, the model.fit and model.fit_generator methods are flexible enough for training our networks (fit_generator is deprecated in later TensorFlow 2 releases, where model.fit accepts generators directly). But in certain special cases, you might need finer control over what happens in the training loop, a big part of which is computing the gradients of the loss with respect to all the trainable network variables. Thanks to automatic differentiation, we do not have to code these gradients by hand.

In the code example below, x is the independent variable, with respect to which we differentiate. Within the context defined by GradientTape, we set up the operations that define the function we want to differentiate. Because x is a constant tensor rather than a variable, we call tape.watch(x), which means that any operations performed on x from this point on within the context are recorded.

import tensorflow as tf

x = tf.constant(2.0)

# context
with tf.GradientTape() as tape:
  tape.watch(x)
  y = x ** 2                 # define a new tensor y

grad = tape.gradient(y, x)   # take the derivative of y with respect to x
                             # and evaluate it at x=2

print(grad)
# tf.Tensor(4.0, shape=(), dtype=float32)

Another example:

import tensorflow as tf

x = tf.constant([0, 1, 2, 3], dtype=tf.float32)

# context
with tf.GradientTape() as tape:
  tape.watch(x)
  y = tf.reduce_sum(x ** 2)    # element-wise square, then sum
  z = tf.math.sin(y)           # another tensor

# gradients of z with respect to the intermediate y and the input x
dz_dy, dz_dx = tape.gradient(z, [y, x])

Automatic differentiation can be leveraged within a training loop for a deep learning network.
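
These gradients can be checked by hand. Here y = 0 + 1 + 4 + 9 = 14, so by the chain rule dz/dy = cos(y) = cos(14) and dz/dx_i = 2 x_i cos(14). The sketch below, with NumPy used only for the comparison, confirms that the tape returns these values:

```python
import numpy as np
import tensorflow as tf

x = tf.constant([0, 1, 2, 3], dtype=tf.float32)

with tf.GradientTape() as tape:
  tape.watch(x)
  y = tf.reduce_sum(x ** 2)
  z = tf.math.sin(y)

dz_dy, dz_dx = tape.gradient(z, [y, x])

# Analytic values from the chain rule, with y = 14
expected_dz_dy = np.cos(14.0)
expected_dz_dx = 2.0 * x.numpy() * np.cos(14.0)

print(np.isclose(dz_dy.numpy(), expected_dz_dy, atol=1e-5))   # True
print(np.allclose(dz_dx.numpy(), expected_dz_dx, atol=1e-5))  # True
```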

Custom Training Loops

The standard procedure for training neural networks, which model.fit and model.fit_generator implement by default, is:

Compute the gradients of a scalar loss function with respect to the model parameters.
Update the model parameters according to the optimization algorithm.

But if we want more control over the training loop, then we’ll have to implement these steps ourselves. Suppose you have initialized a model instance, which could be:

a custom model built with subclassing
a model built using the functional API
a model built using the sequential API

To train this model, we need a loss function that takes two arguments (the ground truth y and the prediction ŷ) and returns a scalar tensor. Either a custom or a built-in loss function works; Keras built-in losses expect the arguments in the order (y_true, y_pred).

When setting up the tf.GradientTape() context, tape.watch(...) is not required, because computations that make use of TensorFlow variable objects are automatically recorded by the tf.GradientTape() context. In the example we want to take derivatives with respect to the model parameters (weights), which are all TensorFlow variable objects, so we don’t need to use tape.watch(...) here.
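
The difference between variables and plain tensors can be seen in a small sketch. The names v and c are made up for illustration; neither is passed to tape.watch.

```python
import tensorflow as tf

v = tf.Variable(3.0)  # trainable variable: watched automatically
c = tf.constant(3.0)  # constant tensor: not watched unless tape.watch(c) is called

with tf.GradientTape() as tape:
  y = v ** 2 + c ** 2

dy_dv, dy_dc = tape.gradient(y, [v, c])

print(dy_dv)  # tf.Tensor(6.0, shape=(), dtype=float32)
print(dy_dc)  # None, because c was never watched
```

A gradient of None is a common symptom of differentiating with respect to a tensor that the tape did not record.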

In each training step, we do the following:

Within the context, compute the loss (a scalar tensor) by calling the loss function and passing in the model predictions and the ground truth.
Compute a list of the gradients of the loss with respect to the model parameters by calling tape.gradient(...), passing in the loss and all the model’s trainable variables.
Apply these gradients to update the model parameters by calling the apply_gradients method of an optimizer, which implements the optimization algorithm. Remember to use the zip function to match up the gradients with the trainable variables before passing them to apply_gradients.

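import tensorflow as tf
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.optimizers import SGD
import numpy as np

# Assumes MyModel (defined earlier) and training_dataset, an iterable of
# (inputs, outputs) batches such as a tf.data.Dataset, already exist.
my_model = MyModel(10)   # num_classes=10, as in the earlier example
loss = MeanSquaredError()
optimizer = SGD(learning_rate=0.05, momentum=0.9)

num_epochs = 10          # placeholder value
epoch_losses = []

for epoch in range(num_epochs):
  batch_losses = []
  for inputs, outputs in training_dataset:
    with tf.GradientTape() as tape:
      # Keras losses take (y_true, y_pred)
      curr_loss = loss(outputs, my_model(inputs, training=True))
    # compute the gradients once the forward pass has been recorded
    grads = tape.gradient(curr_loss, my_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, my_model.trainable_variables))
    batch_losses.append(curr_loss.numpy())
  epoch_losses.append(np.mean(batch_losses))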
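
For a version that runs end to end, here is a minimal sketch on synthetic data. The data (y = 3x + 2), the single-unit Sequential model, the batch size, and the hyperparameters are all assumptions made for illustration; they are not taken from the course.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
rng = np.random.default_rng(0)

# Synthetic regression data: y = 3x + 2 (assumed for illustration)
x_train = rng.normal(size=(256, 1)).astype('float32')
y_train = 3.0 * x_train + 2.0
training_dataset = tf.data.Dataset.from_tensor_slices(
    (x_train, y_train)).batch(32)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)

epoch_losses = []
for epoch in range(20):
  batch_losses = []
  for inputs, targets in training_dataset:
    with tf.GradientTape() as tape:
      # forward pass and loss are recorded on the tape
      curr_loss = loss_fn(targets, model(inputs, training=True))
    # gradients of the loss with respect to the model parameters
    grads = tape.gradient(curr_loss, model.trainable_variables)
    # one optimizer update per batch
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    batch_losses.append(curr_loss.numpy())
  epoch_losses.append(np.mean(batch_losses))

print(epoch_losses[-1] < epoch_losses[0])  # True
```

Since the data are noise-free and linear, the mean epoch loss should fall towards zero as the kernel and bias approach 3 and 2.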

Optimizing Performance with `tf.function`

In TensorFlow 1, you first had to build the computational graph and then run it inside a session. The benefit was that the graph could be optimized for performance at runtime.

In TensorFlow 2, eager execution is the default, which makes it much easier to develop models and is a huge step forward in terms of usability. However, eager execution can be slower than running an optimized graph. TensorFlow 2 makes it easy to convert programs into graphs and recover the performance that computational graphs provide.

Simply add the @tf.function decorator to a function. This single addition can make a big difference to the performance of the code: it builds a graph out of the function, so in many cases the function executes much faster.

@tf.function
def get_loss_and_grads(inputs, outputs):
  with tf.GradientTape() as tape:
    # Keras losses take (y_true, y_pred)
    curr_loss = loss(outputs, my_model(inputs))
  grads = tape.gradient(curr_loss, my_model.trainable_variables)
  return curr_loss, grads

for epoch in range(num_epochs):
  for inputs, outputs in training_dataset:
    curr_loss, grads = get_loss_and_grads(inputs, outputs)
    optimizer.apply_gradients(zip(grads, my_model.trainable_variables))
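
One way to see that the decorator really builds a graph is to count how often the Python body runs. In the sketch below, the function square_sum and the counter trace_count are hypothetical names used only for illustration. Python side effects inside a tf.function run while the function is being traced into a graph, not on later calls that reuse that graph.

```python
import tensorflow as tf

trace_count = 0  # hypothetical counter, incremented only while tracing

@tf.function
def square_sum(x):
  global trace_count
  trace_count += 1  # Python side effect: runs during tracing only
  return tf.reduce_sum(x ** 2)

a = square_sum(tf.constant([1.0, 2.0]))
b = square_sum(tf.constant([3.0, 4.0]))  # same shape and dtype: graph is reused

print(a.numpy(), b.numpy())  # 5.0 25.0
print(trace_count)           # 1
```

A call with a different input shape or dtype would trigger a new trace, which is worth keeping in mind when batches vary in size.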

My Certificate

For more on Customizing Models, Layers and Training Loops, please refer to the wonderful course here https://www.coursera.org/learn/customising-models-tensorflow2

My #99 course certificate from Coursera

Related Quick Recap

Sequential Data and Recurrent Neural Networks

I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai
