Sequential data is data with a natural sequential structure built into it, such as text or audio. There are various network architectures and layers that we can use to make predictions from sequence data. However, sequence data is often unstructured and less uniform than other datasets.

## Preprocessing Sequential Data

Each sequence example might be a different length. We can use tools to pad or truncate sequences so that a number of sequence examples can be stacked together for batch processing.

```
from tensorflow.keras.preprocessing.sequence import pad_sequences

test_input = [
    [4, 12, 33, 18],
    [63, 23, 54, 30, 19, 3],
    [43, 37, 11, 33, 15]
]
test_input_2 = [
    [[2, 1], [3, 3]],
    [[4, 3], [2, 4], [1, 1]]
]

# Pad or truncate at the end of each sequence to length 5, padding with -1
preprocessed_data = pad_sequences(test_input, padding='post', maxlen=5, truncating='post', value=-1)
# preprocessed_data
# [[ 4, 12, 33, 18, -1],
#  [63, 23, 54, 30, 19],
#  [43, 37, 11, 33, 15]]

# Sequences of feature vectors are padded with zeros by default
preprocessed_data_2 = pad_sequences(test_input_2, padding='post')
# preprocessed_data_2
# [[[2, 1], [3, 3], [0, 0]],
#  [[4, 3], [2, 4], [1, 1]]]
```
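To make the padding and truncation rules concrete, here is a minimal NumPy sketch of post-padding and post-truncation. The helper `pad_post` is a hypothetical name used only for illustration. It is not how `pad_sequences` is implemented and it only handles flat integer sequences.

```python
import numpy as np

def pad_post(sequences, maxlen=None, value=0):
    """Post-pad and post-truncate integer sequences to a common length."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        kept = seq[:maxlen]          # post-truncation keeps the start of the sequence
        out[i, :len(kept)] = kept    # post-padding leaves the fill value at the end
    return out

test_input = [
    [4, 12, 33, 18],
    [63, 23, 54, 30, 19, 3],
    [43, 37, 11, 33, 15],
]
padded = pad_post(test_input, maxlen=5, value=-1)
print(padded)
```

The short sequence gains one `-1` at the end, the long one loses its last element, and the sequence of length 5 is unchanged.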

However, padding sequences does lead to complications, because you will not want to train your model on those parts of the input sequences that are padding values. Fortunately, it’s really easy to handle this using masking in your network. The masking layer expects a three-dimensional input, i.e. `(batch_size, seq_length, features)`, so you probably need to add a new dimension by using `[..., np.newaxis]`.

The tensor returned by the masking layer has an extra attribute called `_keras_mask`, which is a boolean tensor that signals which values in the input are part of the original data and which should be ignored. This mask is used to make sure the loss function is calculated correctly and ignores any parts of the input that are padding.

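```
import numpy as np
from tensorflow.keras.layers import Masking

# Add a trailing feature dimension, (3, 5) -> (3, 5, 1), and cast to float
masking_input = preprocessed_data[..., np.newaxis].astype('float32')

masking_layer = Masking(mask_value=-1)
masked_input = masking_layer(masking_input)
# masked_input (the masked time step is zeroed in the output)
# [[ [4.], [12.], [33.], [18.], [0.] ],
#  [ [63.], [23.], [54.], [30.], [19.] ],
#  [ [43.], [37.], [11.], [33.], [15.] ]]
# masked_input._keras_mask
# [[ True, True, True, True, False ],
#  [ True, True, True, True, True ],
#  [ True, True, True, True, True ]]
```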
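As a rough illustration of why the mask matters for the loss, the NumPy sketch below averages a set of made-up per-time-step losses with and without a mask. The loss values are invented for the example, and the exact reduction Keras applies depends on the loss and the version, so treat this only as the general idea.

```python
import numpy as np

padded = np.array([[4, 12, 33, 18, -1],
                   [63, 23, 54, 30, 19]])
mask = padded != -1    # True where the value is real data, False for padding

# Hypothetical per-time-step losses, one value per position in `padded`.
per_step_loss = np.array([[0.5, 0.1, 0.2, 0.2, 9.0],
                          [0.3, 0.3, 0.1, 0.2, 0.1]])

naive_loss = per_step_loss.mean()                         # padding step is included
masked_loss = (per_step_loss * mask).sum() / mask.sum()   # padding step is ignored

print(naive_loss, masked_loss)
```

The large loss at the padded position dominates the naive average, while the masked average only reflects the nine real time steps.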

## The Embedding Layer

The embedding layer takes in a tokenized sequence and will map each one of those separate tokens to a point in some high-dimensional embedding space. This allows the network to learn its own representation of each token in a sequence input.

```
from tensorflow.keras.layers import Embedding
import numpy as np

embedding_layer = Embedding(1000,             # input dimension (vocabulary size)
                            32,               # embedding dimension
                            input_length=64,
                            mask_zero=True)
dummy_input = np.random.randint(1000, size=(16, 64))  # (batch_size, input_length)
embedding_inputs = embedding_layer(dummy_input)       # (16, 64, 32)
```

The first argument is the input dimension, which you might find easier to think of as the vocabulary size. It’s just the total number of unique tokens or words in the sequence data inputs. The second argument is the embedding dimension. Each input token will be mapped to a point in the embedding space, in such a way as to make a useful representation for the network to accomplish its task. The embedding layer is also able to handle padded sequence inputs correctly.

By setting `mask_zero=True`, the embedding layer will interpret any zeros in the input as padding values, so the network will ignore them.
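Conceptually, an embedding lookup is just row indexing into a weight matrix, and the zero mask is a comparison against the padding index. The NumPy sketch below illustrates this. The random `embedding_matrix` is a stand-in for the trainable weights, and the token ids are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 1000, 32

# Stand-in for the trainable weight matrix of an embedding layer.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

# Two tokenized sequences, post-padded with 0.
token_ids = np.array([[5, 42, 7, 0, 0],
                      [9, 3, 0, 0, 0]])

embedded = embedding_matrix[token_ids]   # one row per token: (2, 5, 32)
mask = token_ids != 0                    # (2, 5), False at padded positions

print(embedded.shape, mask.shape)
```

Because index 0 is reserved for padding when `mask_zero=True`, it should not be assigned to a real token in the vocabulary.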

## Recurrent Neural Network

Recurrent neural networks are an important class of models for working with sequence data. They are designed to capture the temporal dependencies in the data. Here’s an example using a simple recurrent layer, `SimpleRNN`.

```
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

model = Sequential([
    Embedding(1000, 32, input_length=64),  # output (None, 64, 32)
    SimpleRNN(64, activation='tanh'),      # output (None, 64)
    Dense(5, activation='softmax')         # output (None, 5)
])
```

In general, an RNN layer expects a three-dimensional tensor input with shape `(batch_size, sequence_length, num_features)`.

In the example above, the `SimpleRNN` layer is a plain recurrent neural network with a hidden state of size 64. The RNN processes the sequence input and its output is the final hidden state of the network, i.e. a two-dimensional tensor with shape `(batch_size, num_hidden_states)`.

One of the strengths of recurrent neural nets is their ability to take flexible length sequences, so it is OK to omit `input_length` when using the `Embedding` layer, which enables the network to take a batch of sequences of any length. Both the `batch_size` and `sequence_length` are flexible. That’s possible because the RNN layer only returns its hidden state at the final time step.

```
model = Sequential([
    Embedding(1000, 32),               # output (None, None, 32)
    SimpleRNN(64, activation='tanh'),  # output (None, 64)
    Dense(5, activation='softmax')     # output (None, 5)
])
```
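The following NumPy sketch shows why the sequence length can vary: the same weights are applied at every time step, and only the last hidden state is returned. It uses the standard tanh recurrence with randomly initialized weights. The function name `simple_rnn_last_state` is made up for this example and this is not the Keras implementation.

```python
import numpy as np

def simple_rnn_last_state(x, w_x, w_h, b):
    """Run h_t = tanh(x_t @ w_x + h_{t-1} @ w_h + b) and return the last h_t."""
    batch_size, seq_len, _ = x.shape
    h = np.zeros((batch_size, w_h.shape[0]))
    for t in range(seq_len):
        h = np.tanh(x[:, t, :] @ w_x + h @ w_h + b)
    return h

rng = np.random.default_rng(0)
num_features, units = 32, 64
w_x = rng.normal(scale=0.1, size=(num_features, units))
w_h = rng.normal(scale=0.1, size=(units, units))
b = np.zeros(units)

# Two batches with different batch sizes and sequence lengths.
short_out = simple_rnn_last_state(rng.normal(size=(16, 10, num_features)), w_x, w_h, b)
long_out = simple_rnn_last_state(rng.normal(size=(4, 50, num_features)), w_x, w_h, b)

print(short_out.shape, long_out.shape)
```

Both calls reuse the same weight matrices, and the output shape depends only on the batch size and the number of hidden units.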

`LSTM` (Long Short-Term Memory) and `GRU` (Gated Recurrent Unit) can be used in the same way as `SimpleRNN`. You might want to experiment with these different RNN layers to see which one works best for your application.

For example, using the Functional API to define a model:

```
from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model

# flexible sequence_length, 10 features
inputs = Input(shape=(None, 10))             # (None, None, 10)
h = Masking(mask_value=0)(inputs)            # (None, None, 10)
h = LSTM(64)(h)                              # (None, 64)
outputs = Dense(5, activation='softmax')(h)  # (None, 5)
model = Model(inputs, outputs)
```

### Stacked RNN

By default, an RNN layer only returns the output at the final time step. However, sometimes you’d like an RNN layer to return an output at every time step in the sequence. These outputs can then be used:

- as the final model predictions, or
- as an input for another recurrent layer further downstream.

Each of the recurrent neural network layers has an optional argument `return_sequences`, which is set to `False` by default. This is why the RNN layer only returns the output at the final time step. If this option is set to `True`, the layer will return an output for each time step.

With `return_sequences=True`, the output of the first `LSTM` layer is still a three-dimensional tensor of the form `(batch_size, sequence_length, num_features)`, which can be used as an input to another recurrent neural network layer. This is how we can create stacked LSTMs.

```
h = LSTM(32, return_sequences=True)(h) # (None, None, 32)
h = LSTM(64)(h) # (None, 64)
```
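The effect of `return_sequences` on output shapes can be sketched in NumPy. To keep the example short it uses a plain tanh RNN rather than an LSTM, and the helper `rnn_layer` and its random weights are assumptions made purely for illustration.

```python
import numpy as np

def rnn_layer(x, w_x, w_h, return_sequences=False):
    """Plain tanh RNN; returns every hidden state or only the last one."""
    batch_size, seq_len, _ = x.shape
    h = np.zeros((batch_size, w_h.shape[0]))
    states = []
    for t in range(seq_len):
        h = np.tanh(x[:, t, :] @ w_x + h @ w_h)
        states.append(h)
    if return_sequences:
        return np.stack(states, axis=1)   # (batch_size, seq_len, units)
    return h                              # (batch_size, units)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 12, 10))          # (batch_size, seq_len, num_features)
w1_x, w1_h = rng.normal(scale=0.1, size=(10, 32)), rng.normal(scale=0.1, size=(32, 32))
w2_x, w2_h = rng.normal(scale=0.1, size=(32, 64)), rng.normal(scale=0.1, size=(64, 64))

h1 = rnn_layer(x, w1_x, w1_h, return_sequences=True)   # still 3-D, feeds the next layer
h2 = rnn_layer(h1, w2_x, w2_h)                         # final state only

print(h1.shape, h2.shape)
```

The first layer keeps the time axis so that the second layer has a sequence to consume, and the second layer collapses it to a single vector per example.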

### Bidirectional Wrapper

Bidirectional recurrent layers are often used when we’d like the network to take future context into account as well as past context. We can create a bidirectional layer by using the `Bidirectional` wrapper and calling it on a regular recurrent layer.

```
from tensorflow.keras.layers import Bidirectional, LSTM

h = Bidirectional(LSTM(32, return_sequences=True))(h)  # (None, None, 64)
h = Bidirectional(LSTM(64))(h)                         # (None, 128)
```

Because each LSTM is now wrapped in a bidirectional layer, we effectively have two LSTM networks per layer: one running forwards in time and one running backwards. The outputs will be a combination of the outputs from each of those recurrent networks. For the first bidirectional layer above:

- The first dimension is the `batch_size`, as always.
- The second dimension is the `sequence_length`, which in this model is flexible.
- The third dimension is the feature dimension, which is the concatenation of the outputs of the two LSTMs running forwards and backwards in time.

We can also change the behavior of the bidirectional wrapper by changing the `merge_mode` option. If we set it to `sum`, then the forward and backward RNN outputs will be added together instead of concatenated.
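The difference between concatenating and summing the two directions can be sketched in NumPy. This uses a plain tanh RNN in place of an LSTM, and the helpers `rnn_sequence` and `bidirectional` are hypothetical names for this example, not the Keras wrapper itself.

```python
import numpy as np

def rnn_sequence(x, w_x, w_h):
    """Plain tanh RNN returning the hidden state at every time step."""
    h = np.zeros((x.shape[0], w_h.shape[0]))
    states = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ w_x + h @ w_h)
        states.append(h)
    return np.stack(states, axis=1)

def bidirectional(x, fwd_weights, bwd_weights, merge_mode="concat"):
    out_f = rnn_sequence(x, *fwd_weights)
    # Run on the time-reversed input, then flip back so time steps line up.
    out_b = rnn_sequence(x[:, ::-1, :], *bwd_weights)[:, ::-1, :]
    if merge_mode == "concat":
        return np.concatenate([out_f, out_b], axis=-1)
    if merge_mode == "sum":
        return out_f + out_b
    raise ValueError(f"unsupported merge_mode: {merge_mode}")

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 12, 10))
fwd = (rng.normal(scale=0.1, size=(10, 32)), rng.normal(scale=0.1, size=(32, 32)))
bwd = (rng.normal(scale=0.1, size=(10, 32)), rng.normal(scale=0.1, size=(32, 32)))

concat_out = bidirectional(x, fwd, bwd, merge_mode="concat")
sum_out = bidirectional(x, fwd, bwd, merge_mode="sum")

print(concat_out.shape, sum_out.shape)
```

With 32 units per direction, concatenation doubles the feature dimension to 64, while summing keeps it at 32.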

## My Certificate

For more on Sequential Data and Recurrent Neural Networks, please refer to the wonderful course here https://www.coursera.org/learn/customising-models-tensorflow2


*I am Kesler Zhu, thank you for visiting my website. Check out more course reviews at https://KZHU.ai*
