## Subclassing Models

Model subclassing and custom layers give you even more control over how a model is constructed, and can be thought of as a lower-level API than the functional API. However, more flexibility also means more opportunity for bugs.

To use model subclassing, we first import the `Model` class and the layer classes, then subclass `Model` directly, as the `MyModel` class below does. The basic structure to keep in mind is:

- Create the layers in the initializer `__init__`.
  - Don’t forget to call the initializer of the base class first.
- Define the forward pass in the `call` method.

Once the class is defined, all we need to do is create an instance of it. The `name` keyword argument with value `my_model` is passed down to the base class constructor. The object `my_model` inherits from the `Model` base class, so it has all the methods you already know about, like `compile`, `fit`, etc.

The `training` keyword argument of the `call` method determines whether the model behaves as it should during training or during inference. A really common use of this keyword argument is in `BatchNormalization` and `Dropout` layers. In the code below, when the model is being trained, the `Dropout` layer randomly zeroes out some of its inputs; at test time, the `Dropout` layer does nothing.

```
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout

class MyModel(Model):

    def __init__(self, num_classes, **kwargs):
        # Call the base class initializer first; kwargs such as `name` are passed on
        super(MyModel, self).__init__(**kwargs)
        # Create the layers
        self.dense1 = Dense(16, activation='sigmoid')
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(num_classes, activation='softmax')

    def call(self, inputs, training=False):
        # Define the forward pass
        h = self.dense1(inputs)
        h = self.dropout(h, training=training)  # only active when training=True
        return self.dense2(h)

my_model = MyModel(10, name='my_model')
```
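As a quick check of the `training` flag, the sketch below rebuilds the same `MyModel` and calls it on a made-up batch (the input shape `(4, 8)` and the variable names are illustrative assumptions, not from the course). With `training=False` the forward pass is deterministic; with `training=True` the `Dropout` layer samples a fresh random mask on each call.

```python
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout

class MyModel(Model):

    def __init__(self, num_classes, **kwargs):
        super().__init__(**kwargs)
        self.dense1 = Dense(16, activation='sigmoid')
        self.dropout = Dropout(0.5)
        self.dense2 = Dense(num_classes, activation='softmax')

    def call(self, inputs, training=False):
        h = self.dense1(inputs)
        h = self.dropout(h, training=training)
        return self.dense2(h)

my_model = MyModel(10, name='my_model')
x = tf.random.normal((4, 8))  # hypothetical batch: 4 samples, 8 features

# Inference mode: Dropout is a no-op, so repeated calls agree
infer_a = my_model(x, training=False)
infer_b = my_model(x, training=False)

# Training mode: Dropout zeroes each hidden unit with probability 0.5
train_out = my_model(x, training=True)

print(infer_a.shape)  # (4, 10)
```

Each output row is still a probability distribution over the 10 classes in both modes, because the softmax layer comes after the dropout.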

## Subclassing Layers

The lower the level at which you work with models and layers, the more similarly you can use these two kinds of object. For example, you can call a layer on an input to get the layer’s output; similarly, you can call a model on an input and it returns the model’s output. So you can use model objects as well as layer objects to build larger models.
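To illustrate, here is a minimal sketch in which a small model is called like a layer inside a larger functional model. The names `encoder` and `full_model` and all the sizes are made up for illustration.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential

# A small model (hypothetical) that we want to reuse as a building block
encoder = Sequential([Dense(8, activation='relu'), Dense(4)])

inputs = Input(shape=(16,))
features = encoder(inputs)      # a model is called just like a layer
outputs = Dense(1)(features)    # a layer is called the same way
full_model = Model(inputs=inputs, outputs=outputs)

predictions = full_model(tf.ones((2, 16)))
print(predictions.shape)  # (2, 1)
```

The weights of `encoder` become part of `full_model`, so training the larger model also trains the nested one.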

To create a custom layer, we subclass the base `Layer` class:

- Create the layer variables in the initializer `__init__`.
  - The actual values of the variables need to be initialized when the layer is created.
  - Another way to create layer variables is to use the `add_weight` method.
- The `call` method contains the layer computation.

```
import tensorflow as tf
from tensorflow.keras.layers import Layer

class LinearMap(Layer):

    def __init__(self, input_dim, units):
        super(LinearMap, self).__init__()
        # Option 1: create an initializer and a variable explicitly
        # w_init = tf.random_normal_initializer()
        # self.w = tf.Variable(initial_value=w_init(shape=(input_dim, units)))
        # Option 2 (equivalent): use the add_weight method
        self.w = self.add_weight(shape=(input_dim, units),
                                 initializer='random_normal')

    def call(self, inputs):
        return tf.matmul(inputs, self.w)

linear_map = LinearMap(3, 2)
```
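A short usage sketch for the layer above, using the `add_weight` variant (the input values are made up for illustration): the layer is called on a batch like any built-in layer, and its weight shows up in `trainable_weights`.

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer

class LinearMap(Layer):

    def __init__(self, input_dim, units):
        super().__init__()
        self.w = self.add_weight(shape=(input_dim, units),
                                 initializer='random_normal')

    def call(self, inputs):
        return tf.matmul(inputs, self.w)

linear_map = LinearMap(3, 2)
batch = tf.ones((5, 3))       # hypothetical batch: 5 samples, 3 features
outputs = linear_map(batch)   # maps 3 input features to 2 units

print(outputs.shape)                      # (5, 2)
print(len(linear_map.trainable_weights))  # 1
```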

## Automatic Differentiation

In most cases, the `model.fit` and `model.fit_generator` methods are flexible enough for training our networks (`model.fit_generator` was deprecated in later TensorFlow 2 releases, where `model.fit` accepts generators directly). But in certain special cases you may need finer control over what happens in the training loop, and a big part of that loop is computing the gradients with respect to all of the trainable network variables. Thanks to automatic differentiation, we do not have to code these gradients by hand.

In the code example below, `x` is the independent variable, with respect to which we differentiate. Within the context defined by `GradientTape`, we set up the operations that define the function we want to differentiate. The line `tape.watch(x)` means that any operations performed on `x` from this point on within the context are recorded.

```
import tensorflow as tf

x = tf.constant(2.0)

# context
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x ** 2  # define a new tensor y

# take the derivative of y with respect to x, evaluated at x=2
grad = tape.gradient(y, x)
print(grad)
# tf.Tensor(4.0, shape=(), dtype=float32)
```

Another example:

```
import tensorflow as tf

x = tf.constant([0, 1, 2, 3], dtype=tf.float32)

# context
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(x ** 2)  # element-wise square, then sum
    z = tf.math.sin(y)         # another tensor

# gradients of z with respect to y and x, from a single gradient call
dz_dy, dz_dx = tape.gradient(z, [y, x])
```
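By the chain rule, `dz/dy = cos(y)` and `dz/dx = 2 * x * cos(y)`, with `y = 0 + 1 + 4 + 9 = 14` here. The following sketch checks the tape's results against these hand-derived expressions; NumPy is used only for the comparison.

```python
import numpy as np
import tensorflow as tf

x = tf.constant([0, 1, 2, 3], dtype=tf.float32)

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(x ** 2)   # y = 14
    z = tf.math.sin(y)

dz_dy, dz_dx = tape.gradient(z, [y, x])

# Hand-derived gradients from the chain rule
expected_dz_dy = np.cos(14.0)
expected_dz_dx = 2.0 * x.numpy() * np.cos(14.0)

print(np.isclose(dz_dy.numpy(), expected_dz_dy, atol=1e-5))   # True
print(np.allclose(dz_dx.numpy(), expected_dz_dx, atol=1e-5))  # True
```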

Automatic differentiation can be leveraged within a training loop for a deep learning network.

## Custom Training Loops

The standard procedure for training neural networks is implemented by default in the `model.fit` and `model.fit_generator` methods:

- Compute the gradients of a scalar loss function with respect to the model parameters.
- Update the model parameters according to the optimization algorithm.

But if we want more control over the training loop, we have to implement these steps ourselves. Suppose you have initialized a model instance, which could be:

- a custom model built with subclassing
- a model built using the functional API
- a model built using the sequential API

To train this model, we need a loss function that takes two arguments, a ground truth `y` and a prediction `ŷ` (Keras built-in losses expect them in that order), and returns a scalar tensor. Either a custom or a built-in loss function works.
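As a minimal sketch of what such a loss function looks like, the hypothetical `my_mse` below is a custom mean squared error, compared against the built-in `MeanSquaredError` on made-up values.

```python
import tensorflow as tf
from tensorflow.keras.losses import MeanSquaredError

def my_mse(y_true, y_pred):
    # Custom loss (hypothetical): mean of squared differences, a scalar tensor
    return tf.reduce_mean(tf.square(y_true - y_pred))

y_true = tf.constant([[1.0], [2.0], [3.0]])
y_pred = tf.constant([[1.5], [2.0], [2.0]])

custom_value = my_mse(y_true, y_pred)
builtin_value = MeanSquaredError()(y_true, y_pred)

print(custom_value.shape)  # ()
```

Both calls return the same scalar here, so either one could be used as `loss` in a training loop.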

When setting up the `tf.GradientTape()` context, `tape.watch(...)` is not required, because computations that use trainable TensorFlow variable objects are automatically recorded by the `tf.GradientTape()` context. In the example we want to take derivatives with respect to the model parameters (weights), which are all TensorFlow variable objects, so we don’t need to use `tape.watch(...)` here.
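A small sketch of this difference, with made-up values: a trainable `tf.Variable` is watched automatically, while a constant tensor that is not explicitly watched yields no gradient (`None`).

```python
import tensorflow as tf

v = tf.Variable(3.0)   # trainable variable: watched automatically
c = tf.constant(2.0)   # constant tensor: not watched unless tape.watch(c) is called

with tf.GradientTape() as tape:
    y = v ** 2 + c ** 2

grad_v, grad_c = tape.gradient(y, [v, c])

print(grad_v)  # tf.Tensor(6.0, shape=(), dtype=float32)
print(grad_c)  # None
```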

So for each batch, we do the following:

- Within the context, compute the loss (a scalar tensor) by calling the loss function and passing in the model predictions and the ground truth.
- Compute a list of the gradients of the loss with respect to the model parameters by calling `tape.gradient(...)`, passing in the loss (scalar tensor) and all of the model’s trainable variables.
- Apply these gradients to update the model parameters by calling the `apply_gradients` method of an optimizer, according to its optimization algorithm. Remember to use the `zip` function to match up the gradients with the trainable variables before passing them into the `apply_gradients` method of the optimizer.

```
import tensorflow as tf
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.optimizers import SGD
import numpy as np

# Placeholder: any model instance works here; pass the arguments your model needs.
# num_epochs and training_dataset (yielding (inputs, outputs) batches) are
# assumed to be defined elsewhere.
my_model = MyModel()
loss = MeanSquaredError()
optimizer = SGD(learning_rate=0.05, momentum=0.9)

epoch_losses = []
for epoch in range(num_epochs):
    batch_losses = []
    for inputs, outputs in training_dataset:
        with tf.GradientTape() as tape:
            # Keras losses take the ground truth first, then the prediction
            curr_loss = loss(outputs, my_model(inputs))
        grads = tape.gradient(curr_loss, my_model.trainable_variables)
        batch_losses.append(curr_loss.numpy())
        optimizer.apply_gradients(zip(grads, my_model.trainable_variables))
    epoch_losses.append(np.mean(batch_losses))
```
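Since `num_epochs` and `training_dataset` are not defined in the snippet above, here is a self-contained sketch of the same loop on a synthetic regression problem. The data (`y = 3x + 2` plus a little noise), the single `Dense(1)` model, and the hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

rng = np.random.default_rng(0)

# Synthetic data (assumed): y = 3x + 2 with a little noise
x_data = rng.uniform(-1.0, 1.0, size=(256, 1)).astype(np.float32)
y_data = (3.0 * x_data + 2.0 + 0.05 * rng.normal(size=(256, 1))).astype(np.float32)
training_dataset = tf.data.Dataset.from_tensor_slices((x_data, y_data)).batch(32)

my_model = Sequential([Dense(1)])
loss = MeanSquaredError()
optimizer = SGD(learning_rate=0.1)
num_epochs = 20

epoch_losses = []
for epoch in range(num_epochs):
    batch_losses = []
    for inputs, outputs in training_dataset:
        with tf.GradientTape() as tape:
            curr_loss = loss(outputs, my_model(inputs))
        grads = tape.gradient(curr_loss, my_model.trainable_variables)
        optimizer.apply_gradients(zip(grads, my_model.trainable_variables))
        batch_losses.append(float(curr_loss))
    epoch_losses.append(np.mean(batch_losses))

print(epoch_losses[-1] < epoch_losses[0])  # True: the loss went down
```

Recording the mean batch loss per epoch, as in `epoch_losses`, gives a simple learning curve you can plot or inspect after training.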

## Optimizing Performance with `tf.function`

In TensorFlow 1, you first had to build the computational graph and then run it inside a session. The benefit was that the graph could then be optimized for performance at runtime.

In TensorFlow 2, eager execution is the default, which makes it much easier to develop models and is a huge step forward in usability. However, this increased usability can come at the cost of slower performance. TensorFlow 2 makes it easy to convert programs into graphs and so get back the performance of computational graphs.

Simply add the `@tf.function` decorator to the function. This single addition can make a big difference to the performance of the code: it makes a graph out of the function so that, in many cases, it executes much faster.

```
@tf.function
def get_loss_and_grads(inputs, outputs):
    with tf.GradientTape() as tape:
        # Keras losses take the ground truth first, then the prediction
        curr_loss = loss(outputs, my_model(inputs))
    grads = tape.gradient(curr_loss, my_model.trainable_variables)
    return curr_loss, grads

for epoch in range(num_epochs):
    for inputs, outputs in training_dataset:
        curr_loss, grads = get_loss_and_grads(inputs, outputs)
        optimizer.apply_gradients(zip(grads, my_model.trainable_variables))
```
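One way to see that `@tf.function` really builds a graph is to count how often the Python body runs. In this sketch (the function `square_sum` and the list `trace_log` are made up for illustration), the Python side effect executes only while TensorFlow traces the function; later calls with inputs of the same shape and dtype reuse the traced graph.

```python
import tensorflow as tf

trace_log = []  # records each time the Python body actually runs

@tf.function
def square_sum(x):
    trace_log.append('traced')   # Python side effect: runs only during tracing
    return tf.reduce_sum(x ** 2)

a = square_sum(tf.constant([1.0, 2.0]))
b = square_sum(tf.constant([3.0, 4.0]))  # same shape and dtype: graph is reused

print(float(a), float(b))  # 5.0 25.0
print(len(trace_log))      # 1
```

This is also why Python-side logic such as appending to a list or printing should be kept out of a decorated function unless you only want it to run at trace time.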

## My Certificate

For more on Customizing Models, Layers and Training Loops, please refer to the wonderful course here https://www.coursera.org/learn/customising-models-tensorflow2

*I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai*
