We’ll be making extensive use of the TensorFlow Probability library to help us develop probabilistic deep learning models. The distribution objects from the library are the vital building blocks because they capture the essential operations on probability distributions. We are going to use them when building probabilistic deep learning models in TensorFlow.
Within the tfp library, there are several modules that we’ll use a lot, one of them being the distributions module. The code below creates a univariate normal distribution object and calls its main methods.
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

normal = tfd.Normal(loc=0., scale=1.)  # mean = 0, std deviation = 1
normal.sample()       # sample from the dist, returning a Tensor object
normal.sample(3)      # draw multiple independent samples from the dist
normal.prob(0.5)      # evaluate the prob density function at a point
normal.log_prob(0.5)
Below is an example of a discrete univariate distribution object:
bernoulli = tfd.Bernoulli(probs=0.7)      # prob that the random var takes 1
bernoulli = tfd.Bernoulli(logits=0.847)   # sigmoid(0.847) ~= 0.7
bernoulli.sample(3)    # draw multiple independent samples from the dist
bernoulli.prob(1)      # evaluate the prob of event 1, which ~= 0.7
bernoulli.log_prob(1)  # evaluate using log prob
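The relationship between the probs and logits parameterizations can be checked by hand. The snippet below is a minimal plain-Python sketch, not TFP code; the helper names sigmoid and logit are introduced here purely for illustration.

```python
import math

def sigmoid(x):
    """Map a logit (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(prob):
    """Inverse of sigmoid: map a probability to its log-odds."""
    return math.log(prob / (1.0 - prob))

p_one = sigmoid(0.847)       # probability of the event 1 when logits=0.847
log_p_one = math.log(p_one)  # the quantity log_prob(1) corresponds to
logit_of_07 = logit(0.7)     # the logit that matches probs=0.7

print(round(p_one, 3), round(logit_of_07, 3))
```

Both parameterizations describe the same distribution; logits are often more convenient when the parameter is the unconstrained output of a network.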
The event_shape property of these objects captures the dimensionality of the random variable itself. For a univariate distribution, which is defined over a single random variable, the event_shape property is empty.
Another powerful feature of distribution objects is that a single object can represent a batch of distributions of the same type. By designing distribution objects in this way, the TensorFlow Probability library can exploit the performance gains from vectorizing computations.
# 1 object representing a batch of 2 distributions
batched_bernoulli = tfd.Bernoulli(probs=[0.4, 0.5])
batched_bernoulli.batch_shape       # returns (2,)
batched_bernoulli.sample(3)         # returns a Tensor with shape (3, 2)
batched_bernoulli.prob([1, 1])      # eval the prob of event 1 for both distributions in the batch
batched_bernoulli.log_prob([1, 1])
Multivariate distributions can be constructed and used in a very similar way to that of the univariate distributions. Below is the code example of instantiating a 2-dimensional diagonal Gaussian:
mv_normal = tfd.MultivariateNormalDiag(loc=[-1., 0.5], scale_diag=[1., 1.5])
mv_normal.event_shape  # returns (2,)
mv_normal.sample(3)    # returns a Tensor of shape (3, 2)
Note that a 2-dimensional multivariate distribution (batch_shape is empty and event_shape = (2,)) is quite different from a batch of two univariate distributions (batch_shape = (2,) and event_shape is empty). The difference shows up when we compute log_prob for a given input: the multivariate distribution returns a single value, while the batch returns one value per distribution.
Multivariate distributions can also be batched. This MultivariateNormalDiag distribution has an event shape of two and a batch shape of three. In other words, it contains a batch of three multivariate Gaussians, each of which is a distribution over a two-dimensional random variable.
batched_mv_normal = tfd.MultivariateNormalDiag(
    loc=[[-1., 0.5], [2., 0.], [-0.5, 1.5]],
    scale_diag=[[1., 1.5], [2., 0.5], [1., 1.]]
)  # batch_shape = [3], event_shape = [2]
batched_mv_normal.sample(2)  # returns a Tensor of shape (2, 3, 2)
                             # (sample_size, batch_size, event_size)
Sometimes we might want to reinterpret a batch of independent distributions over an event space as a single joint distribution over a product of event spaces. For example, our model might assume that the features of our data are independent given a class label. In this case, we could set up a separate class conditional distribution for each feature in a batch.
But this batch of distributions is really a joint distribution over all the features, and we’d like that to be reflected in the batch_shape and event_shape properties, and in the outputs of the log_prob method.
In the distributions module, there is the Independent distribution class, which is designed especially for this purpose. First, let's do a comparison:
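The comparison can be worked through by hand. The sketch below is plain Python rather than TFP, and it assumes that batched_normal is a batch of two univariate normals with the same parameters as mv_normal above (loc=[-1., 0.5], scale=[1., 1.5]); that definition is not shown in the text, so treat it as an assumption. The helper normal_log_prob is hypothetical and simply evaluates the normal log density.

```python
import math

def normal_log_prob(x, loc, scale):
    """Log density of a univariate normal N(loc, scale^2) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

loc = [-1.0, 0.5]    # assumed parameters, matching mv_normal above
scale = [1.0, 1.5]
x = [-0.2, 1.8]

# Batch of two univariate normals (batch_shape (2,), empty event_shape):
# log_prob gives one value per distribution in the batch.
batched_log_probs = [normal_log_prob(xi, m, s) for xi, m, s in zip(x, loc, scale)]

# One 2-dimensional diagonal Gaussian (empty batch_shape, event_shape (2,)):
# log_prob gives a single value, the sum over the independent components.
joint_log_prob = sum(batched_log_probs)

print([round(v, 4) for v in batched_log_probs])
print(round(joint_log_prob, 4))
```

The batch yields two numbers and the multivariate distribution yields their sum, roughly -2.9389. Closing exactly this gap is what the Independent class is for.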
The Independent distribution gives us a way to absorb some or all of the batch dimensions into the event_shape. In the example above, we could use the Independent distribution to transform our batched_normal distribution so that it is equivalent to the multivariate diagonal normal distribution:
independent_normal = tfd.Independent(
    batched_normal,
    reinterpreted_batch_ndims=1  # how many batch dims absorbed to event
)  # batch_shape = [], event_shape = [2]
independent_normal.log_prob([-0.2, 1.8])  # tf.Tensor(-2.9388796, shape=(), ...)
Mathematically, this independent distribution is now equivalent to the multivariate diagonal normal distribution we had before.
Higher Rank of batch_shape
Here is one more example, in which the batch_shape has a rank greater than one.
batched_normal = tfd.Normal(
    loc=[[-1., 0.5], [2., 0.], [-0.5, 1.5]],
    scale=[[1., 1.5], [2., 0.5], [1., 1.]]  # tfd.Normal takes scale, not scale_diag
)  # batch_shape = [3, 2], event_shape = []

independent_normal = tfd.Independent(
    batched_normal,
    reinterpreted_batch_ndims=1  # how many batch dims absorbed to event
)  # batch_shape = [3], event_shape = [2]

independent_normal = tfd.Independent(
    batched_normal,
    reinterpreted_batch_ndims=2  # how many batch dims absorbed to event
)  # batch_shape = [], event_shape = [3, 2]
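The shape bookkeeping in this example follows one rule: the rightmost reinterpreted_batch_ndims batch dimensions move to the front of the event shape. Below is a minimal plain-Python sketch of that rule. It only models the shapes and is not how TFP is implemented; the helper independent_shapes is a hypothetical name used for illustration.

```python
def independent_shapes(batch_shape, event_shape, reinterpreted_batch_ndims):
    """Return (batch_shape, event_shape) after moving the rightmost
    `reinterpreted_batch_ndims` batch dimensions into the event shape."""
    n = reinterpreted_batch_ndims
    if not 0 <= n <= len(batch_shape):
        raise ValueError("cannot reinterpret more dims than batch_shape has")
    split = len(batch_shape) - n
    return batch_shape[:split], batch_shape[split:] + event_shape

# Shapes of the batched univariate normal above: batch (3, 2), empty event.
base_batch, base_event = (3, 2), ()

print(independent_shapes(base_batch, base_event, 1))  # ((3,), (2,))
print(independent_shapes(base_batch, base_event, 2))  # ((), (3, 2))
```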
Just as the batch_shape and the event_shape can have a rank greater than one, so can the sample_shape. Suppose we already have an Independent distribution object with batch_shape = [2, 1] and event_shape = [2, 3], and we now sample from it.
ind_exp = tfd.Independent(exp, ...)
ind_exp.sample([4, 2])
The resulting Tensor object will have rank 6, with shape (4, 2, 2, 1, 2, 3). Again, remember that the order is sample_shape, batch_shape, and then event_shape.
Now let us consider log_prob. A simple example of broadcasting is calling log_prob on this distribution with the scalar value 0.5.
The distribution computes a log probability for each event in the batch, so the value 0.5 is broadcast against both the event_shape of [2, 3] and the batch_shape of [2, 1], giving an input in which every entry equals 0.5. The log probability of this event is then computed for each distribution in the batch, and the result is a (2, 1) tensor.
As a general rule, the log_prob method broadcasts its input against the batch and event shape, which in this example is (2, 1, 2, 3). It then collapses the event_shape in the computation, and the shape of the resulting tensor is whatever is left, which here is the batch_shape of (2, 1).
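Both shape rules, for sample and for log_prob, can be written down as a small plain-Python model. This is only a sketch of the bookkeeping described above, not TFP's implementation, and all three helper names are hypothetical.

```python
from itertools import zip_longest

def broadcast_shapes(a, b):
    """Broadcast two shapes, aligning them from the right."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} do not broadcast")
        out.append(max(x, y))
    return tuple(reversed(out))

def sample_result_shape(sample_shape, batch_shape, event_shape):
    """Samples are laid out as sample_shape, batch_shape, event_shape."""
    return tuple(sample_shape) + tuple(batch_shape) + tuple(event_shape)

def log_prob_result_shape(value_shape, batch_shape, event_shape):
    """Broadcast the input against batch + event, then drop the event dims."""
    full = broadcast_shapes(tuple(value_shape), tuple(batch_shape) + tuple(event_shape))
    return full[:len(full) - len(event_shape)]

batch_shape, event_shape = (2, 1), (2, 3)

print(sample_result_shape((4, 2), batch_shape, event_shape))  # (4, 2, 2, 1, 2, 3)
print(log_prob_result_shape((), batch_shape, event_shape))    # (2, 1), scalar input 0.5
```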
Make Distribution Objects Trainable
Recall that in TensorFlow, variable objects are used to capture the values of parameters of our deep learning models. These variables are objects that persist in our program once created, but can change their values during the course of the program, say by using an optimizer object to apply gradients obtained from a loss function and data.
For example, we can learn the mean of a normal distribution object, which also has a trainable_variables property:
normal = tfd.Normal(
    loc=tf.Variable(0., name='loc'),
    scale=1.
)
normal.trainable_variables
The mean value of this normal distribution is now trainable and can be updated according to some learning principle. The learning principle that we often use when training deep learning models is maximum likelihood, which is the same as finding the parameters that minimize the negative log likelihood. The function below can get the negative log likelihood:
def nll(x_train):
    return -tf.reduce_mean(normal.log_prob(x_train))
Let’s continue with the implementation of a training loop to learn the mean parameter from the data:
@tf.function
def get_loss_and_grads(x_train):
    with tf.GradientTape() as tape:
        tape.watch(normal.trainable_variables)
        loss = nll(x_train)
    grads = tape.gradient(loss, normal.trainable_variables)
    return loss, grads

optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)

# num_steps and x_samples (the training data) are assumed to be defined elsewhere
for _ in range(num_steps):
    loss, grads = get_loss_and_grads(x_samples)
    optimizer.apply_gradients(zip(grads, normal.trainable_variables))
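To see what this loop converges to, the same optimization can be done by hand. With the scale fixed at 1, the mean negative log likelihood is the average of 0.5 * (x - loc)^2 plus a constant, so its derivative with respect to loc is loc - mean(x). The plain-Python sketch below applies gradient descent with that derivative. The training data is hypothetical (draws from a normal with mean 3), and this illustrates the principle only; it does not replace the TensorFlow loop.

```python
import random

random.seed(0)
# Hypothetical training data: 1000 draws from N(3, 1).
x_samples = [random.gauss(3.0, 1.0) for _ in range(1000)]
sample_mean = sum(x_samples) / len(x_samples)

def nll(loc, xs):
    """Mean negative log likelihood under N(loc, 1), dropping the constant term."""
    return sum(0.5 * (x - loc) ** 2 for x in xs) / len(xs)

def nll_grad(loc, xs):
    """Derivative of nll with respect to loc, which is loc - mean(xs)."""
    return loc - sum(xs) / len(xs)

loc = 0.0             # same starting value as tf.Variable(0., name='loc')
learning_rate = 0.05  # same step size as the SGD optimizer above
num_steps = 200       # assumed number of steps
for _ in range(num_steps):
    loc -= learning_rate * nll_grad(loc, x_samples)

print(loc, sample_mean)
```

The learned loc ends up very close to the sample mean, which is the maximum likelihood estimate of the mean of a normal distribution.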
For more on Distribution Objects in TensorFlow Probability, please refer to the wonderful course here https://www.coursera.org/learn/probabilistic-deep-learning-with-tensorflow2
I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai