Understanding RNNs and LSTMs with PyTorch

Recurrent Neural Networks (RNNs) are a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence. This allows them to exhibit temporal dynamic behavior. They are particularly well-suited for tasks involving sequential data, such as natural language processing, speech recognition, and time series analysis.

What are RNNs?

At their core, RNNs process sequential data by maintaining a hidden state that captures information from previous steps in the sequence. This hidden state is updated at each time step, allowing the network to "remember" past inputs so they can influence later predictions. A simple RNN cell takes the current input and the previous hidden state and produces an output along with the new hidden state.

Basic RNN Architecture

The mathematical formulation for a basic RNN cell can be expressed as:

h_t = tanh(W_hh * h_{t-1} + W_xh * x_t + b_h)
y_t = W_hy * h_t + b_y

Where:

- h_t is the hidden state at time step t, and h_{t-1} is the hidden state from the previous step
- x_t is the input at time step t, and y_t is the output
- W_hh, W_xh, and W_hy are learned weight matrices
- b_h and b_y are learned bias vectors
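
To make the recurrence concrete, the update above can be computed by hand. The sketch below is a minimal, self-contained example with made-up dimensions; the weights are randomly initialized here rather than learned:

import torch

torch.manual_seed(0)

input_size, hidden_size, output_size = 3, 4, 2

# Randomly initialized parameters (in a trained network these would be learned)
W_xh = torch.randn(hidden_size, input_size)
W_hh = torch.randn(hidden_size, hidden_size)
W_hy = torch.randn(output_size, hidden_size)
b_h = torch.zeros(hidden_size)
b_y = torch.zeros(output_size)

x_t = torch.randn(input_size)       # current input
h_prev = torch.zeros(hidden_size)   # previous hidden state

# One RNN step, following the equations above
h_t = torch.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
y_t = W_hy @ h_t + b_y
print(h_t.shape, y_t.shape)  # torch.Size([4]) torch.Size([2])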

The Vanishing Gradient Problem

While powerful, basic RNNs suffer from the vanishing gradient problem. During backpropagation through time, gradients can become very small, making it difficult for the network to learn long-term dependencies. This means that the influence of early inputs on later outputs diminishes significantly.
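
A quick way to see this in practice is to check how much gradient actually reaches the initial hidden state after backpropagating through a long sequence. The following is a rough, illustrative sketch with arbitrary toy sizes, not a rigorous diagnostic:

import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)

x = torch.randn(1, 100, 4)                      # a single long sequence
h0 = torch.zeros(1, 1, 8, requires_grad=True)   # initial hidden state

out, _ = rnn(x, h0)
out[:, -1, :].sum().backward()                  # backpropagate from the last time step

# The gradient reaching h0 is typically tiny for long sequences
print(h0.grad.norm())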

Introducing LSTMs

Long Short-Term Memory (LSTM) networks are a special type of RNN designed to overcome the vanishing gradient problem and better capture long-range dependencies. LSTMs achieve this through a more complex internal structure involving "gates" that control the flow of information.

LSTM Cell Structure

An LSTM cell has three primary gates:

- Forget gate: decides which information to discard from the cell state
- Input gate: decides which new information to store in the cell state
- Output gate: decides which parts of the cell state to expose as the hidden state

These gates use sigmoid activation functions to output values between 0 and 1, indicating how much of each value to let through.

The Cell State

The key component is the cell state (C_t), which acts as a conveyor belt running through the entire chain, with only minor linear interactions. Information can easily be added to or removed from the cell state via the gates.
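
For reference, the standard LSTM update can be written in the same style as the RNN equations above, where [h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the current input, and ⊙ denotes element-wise multiplication:

f_t = sigmoid(W_f * [h_{t-1}, x_t] + b_f)    (forget gate)
i_t = sigmoid(W_i * [h_{t-1}, x_t] + b_i)    (input gate)
g_t = tanh(W_g * [h_{t-1}, x_t] + b_g)       (candidate cell state)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ g_t              (cell state update)
o_t = sigmoid(W_o * [h_{t-1}, x_t] + b_o)    (output gate)
h_t = o_t ⊙ tanh(C_t)                        (new hidden state)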

Key Takeaway: LSTMs use gates (forget, input, output) to selectively manage information flow and maintain a cell state, enabling them to learn long-term dependencies effectively.

Implementing RNNs and LSTMs in PyTorch

PyTorch provides convenient modules for building RNNs and LSTMs. The `torch.nn.RNN` and `torch.nn.LSTM` modules simplify the implementation process significantly.

Example: Basic RNN in PyTorch

import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

        # Forward propagate RNN
        out, hn = self.rnn(x, h0)

        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

# Example Usage:
# input_size = 10
# hidden_size = 20
# output_size = 5
# model = SimpleRNN(input_size, hidden_size, output_size)
# input_seq = torch.randn(64, 15, input_size) # batch_size, seq_length, input_size
# output = model(input_seq)
# print(output.shape) # Should be torch.Size([64, 5])
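
Note that `out[:, -1, :]` selects the hidden state at the final time step. For a unidirectional RNN (even with multiple layers) this is the same tensor as `hn[-1]`, so either form works for a sequence-to-one prediction head.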

Example: LSTM in PyTorch

import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(SimpleLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize hidden and cell states with zeros
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

        # Forward propagate LSTM
        out, (hn, cn) = self.lstm(x, (h0, c0))

        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

# Example Usage:
# input_size = 10
# hidden_size = 20
# output_size = 5
# model = SimpleLSTM(input_size, hidden_size, output_size)
# input_seq = torch.randn(64, 15, input_size) # batch_size, seq_length, input_size
# output = model(input_seq)
# print(output.shape) # Should be torch.Size([64, 5])
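
As a quick usage sketch, a single training step for the model above might look like the following. The data, task, and hyperparameters here are placeholders, assuming a 5-class classification problem over sequences:

model = SimpleLSTM(input_size=10, hidden_size=20, output_size=5)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs = torch.randn(64, 15, 10)        # batch_size, seq_length, input_size
targets = torch.randint(0, 5, (64,))    # dummy class labels

optimizer.zero_grad()
logits = model(inputs)                  # shape: (64, 5)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()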

Applications

RNNs and LSTMs are foundational for many advanced AI applications:

- Natural language processing, including language modeling and machine translation
- Speech recognition and audio modeling
- Time series analysis and forecasting
- Sequence generation, such as text or music generation