We’ll be making extensive use of the TensorFlow Probability library to develop probabilistic deep learning models. Its distribution objects are the vital building blocks, because they capture the essential operations on probability distributions, and we’ll rely on them whenever we build probabilistic deep learning models in TensorFlow.

## Univariate Distributions

Within the `tfp` library there are several modules that we’ll use a lot; one of them is the `distributions` module. The code below creates a normal distribution.

```
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
normal = tfd.Normal(loc=0., scale=1.) # mean = 0, std deviation = 1
normal.sample() # sample from the dist, returning a Tensor object
normal.sample(3) # draw multiple independent samples from the dist
normal.prob(0.5) # evaluate the prob density function at a point
normal.log_prob(0.5)
```

Below is an example of a discrete univariate distribution object:

```
bernoulli = tfd.Bernoulli(probs=0.7) # prob that the random var takes 1
bernoulli = tfd.Bernoulli(logits=0.847) # sigmoid(0.847) ~= 0.7
bernoulli.sample(3) # draw multiple independent samples from the dist
bernoulli.prob(1) # evaluate the prob of event 1, which ~= 0.7
bernoulli.log_prob(1) # evaluate using log prob
```

The `event_shape` property of these objects captures the dimensionality of the random variable itself. For a univariate distribution over a single random variable, the `event_shape` is empty.

Another powerful feature of distribution objects is that a single object can represent a batch of distributions of the same type. Designing distribution objects this way lets the TensorFlow Probability library vectorize computations across the batch for better performance.

```
# 1 object representing a batch of 2 Bernoulli distributions
batched_bernoulli = tfd.Bernoulli(probs=[0.4, 0.5])
batched_bernoulli.batch_shape # returns (2,)
batched_bernoulli.sample(3) # returns a Tensor with shape (3, 2)
batched_bernoulli.prob([1, 1]) # eval the prob of event 1 for both batches
batched_bernoulli.log_prob([1, 1])
```

## Multivariate Distributions

Multivariate distributions can be constructed and used in much the same way as univariate distributions. Below is an example that instantiates a 2-dimensional Gaussian with a diagonal covariance matrix:

```
mv_normal = tfd.MultivariateNormalDiag(loc=[-1., 0.5], scale_diag=[1., 1.5])
mv_normal.event_shape # returns (2,)
mv_normal.sample(3) # returns a Tensor of shape (3, 2)
```

Note that a 2-dimensional multivariate distribution (`batch_shape` is empty and `event_shape = [2]`) is quite different from a batch of 2 univariate distributions (`batch_shape = [2]` and `event_shape` is empty). The difference becomes clear when we compute `log_prob` for a given input: the multivariate distribution returns a single joint log density, while the batch returns one log density per distribution.

Multivariate distributions can also be batched. The `MultivariateNormalDiag` distribution below has an event shape of 2 and a batch shape of 3. In other words, it contains a batch of three multivariate Gaussians, each of which is a distribution over a two-dimensional random variable.

```
batched_mv_normal = tfd.MultivariateNormalDiag(
    loc=[[-1., 0.5], [2., 0.], [-0.5, 1.5]],
    scale_diag=[[1., 1.5], [2., 0.5], [1., 1.]])
# batch_shape = [3], event_shape = [2]
batched_mv_normal.sample(2)  # returns a Tensor of shape (2, 3, 2)
# (sample_size, batch_size, event_size)
```

## The `Independent` Distribution

Sometimes we might want to **reinterpret** a batch of independent distributions over an event space as a single joint distribution over a product of event spaces. For example, our model might assume that the features of our data are independent given a class label. In that case, we could set up a separate class-conditional distribution for each feature, all held in one batch.

But this batch of distributions is really a joint distribution over all the features, and we’d like that to be reflected in the `batch_shape` and `event_shape` properties and in the outputs of the `log_prob` method.

In the `distributions` module there is an `Independent` distribution class designed for exactly this purpose. First, let’s compare a multivariate distribution with a batch of univariate ones:

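| Type | Code | `batch_shape` | `event_shape` |
|---|---|---|---|
| Multivariate | `mv_normal = tfd.MultivariateNormalDiag(loc=[-1., 0.5], scale_diag=[1., 1.5])` | `[]` | `[2]` |
| Univariate (batched) | `batched_normal = tfd.Normal(loc=[-1., 0.5], scale=[1., 1.5])` | `[2]` | `[]` |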

The `Independent` distribution gives us a way to absorb some or all of the batch dimensions into the `event_shape`. In the example above, we can use `Independent` to transform `batched_normal` so that it is equivalent to the multivariate diagonal normal distribution `mv_normal`.

```
batched_normal = tfd.Normal(loc=[-1., 0.5], scale=[1., 1.5])  # batch_shape = [2]
independent_normal = tfd.Independent(
    batched_normal,
    reinterpreted_batch_ndims=1  # how many batch dims are absorbed into the event
)
# batch_shape = [], event_shape = [2]
independent_normal.log_prob([-0.2, 1.8])
# tf.Tensor(-2.9388796, shape=(), ...)
```

Mathematically, this independent distribution is now equivalent to the multivariate diagonal normal distribution we had before.

### Higher Rank of `batch_shape`

Here is one more example, where the `batch_shape` has a rank greater than one.

```
batched_normal = tfd.Normal(
    loc=[[-1., 0.5], [2., 0.], [-0.5, 1.5]],
    scale=[[1., 1.5], [2., 0.5], [1., 1.]])
# batch_shape = [3, 2], event_shape = []
independent_normal = tfd.Independent(
    batched_normal,
    reinterpreted_batch_ndims=1  # how many batch dims are absorbed into the event
)
# batch_shape = [3], event_shape = [2]
independent_normal = tfd.Independent(
    batched_normal,
    reinterpreted_batch_ndims=2  # how many batch dims are absorbed into the event
)
# batch_shape = [], event_shape = [3, 2]
```

## Sampling and `log_prob`

Just as the `batch_shape` and the `event_shape` can have a rank greater than one, so can the `sample_shape`. Suppose we already have an `Independent` distribution object with `batch_shape = [2, 1]` and `event_shape = [2, 3]`, and we draw samples from it.

```
ind_exp = tfd.Independent(exp, ...)
ind_exp.sample([4, 2])
```

The resulting Tensor has rank 6, with shape `(4, 2, 2, 1, 2, 3)`. Again, remember that the order is sample_shape, then batch_shape, then event_shape.

Now let’s consider `log_prob`. Here is a simple example of broadcasting when computing the log probability:

`ind_exp.log_prob(0.5)`

Because the distribution computes a log probability for each event in the batch, the value 0.5 is broadcast to both the event_shape of [2, 3] and the batch_shape of [2, 1], giving an input in which every entry equals 0.5. The log probability of this event is then computed for each distribution in the batch, so the result is a (2, 1) tensor.

As a general rule, the `log_prob` method broadcasts its input against the batch and event shapes, which in this example combine to `(2, 1, 2, 3)`. It then reduces over the event dimensions, so the resulting tensor has whatever shape is left; here that is the batch_shape of (2, 1).

## Make Distribution Objects Trainable

Recall that in TensorFlow, variable objects are used to capture the values of parameters of our deep learning models. These variables are objects that persist in our program once created, but can change their values during the course of the program, say by using an optimizer object to apply gradients obtained from a loss function and data.

For example, we can make the mean of a normal distribution learnable by passing a `tf.Variable` as its `loc`. Distribution objects have a `trainable_variables` attribute that collects such variables.

```
normal = tfd.Normal(
    loc=tf.Variable(0., name='loc'),  # trainable mean
    scale=1.)                         # fixed standard deviation
normal.trainable_variables  # a tuple containing only the loc variable
```

The mean value of this normal distribution is now trainable and can be updated according to some learning principle. The learning principle we most often use when training deep learning models is **maximum likelihood**, which is the same as finding the parameters that *minimize the negative log likelihood*. The function below computes the average negative log likelihood of the training data:

```
def nll(x_train):
    return -tf.reduce_mean(normal.log_prob(x_train))
```

Let’s continue with a training loop that learns the mean parameter from the data:

```
@tf.function
def get_loss_and_grads(x_train):
    with tf.GradientTape() as tape:
        tape.watch(normal.trainable_variables)
        loss = nll(x_train)
    grads = tape.gradient(loss, normal.trainable_variables)
    return loss, grads

optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)
# num_steps and x_samples (the training data) are assumed to be defined elsewhere
for _ in range(num_steps):
    loss, grads = get_loss_and_grads(x_samples)
    optimizer.apply_gradients(zip(grads, normal.trainable_variables))
```


For more on Distribution Objects in TensorFlow Probability, please refer to the wonderful course here https://www.coursera.org/learn/probabilistic-deep-learning-with-tensorflow2


*I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai*
