Introduction
Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential data. Unlike feed‑forward networks, RNNs maintain a hidden state that carries information from previous time steps, which makes them well suited to tasks such as language modeling, speech recognition, and time‑series forecasting.
Basic Concept
An RNN processes an input sequence x₁, x₂, …, x_T by updating its hidden state h_t at each time step:
h_t = σ(W_hh·h_{t-1} + W_xh·x_t + b_h)
y_t = φ(W_hy·h_t + b_y)
where σ is a non‑linear activation (typically tanh or ReLU) and φ is an output activation chosen for the task, such as softmax for next‑token prediction. The recurrent weight matrix W_hh is what lets the network retain memory of prior inputs.
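To make the recurrence concrete, the sketch below unrolls the two update equations with plain tensor operations. The dimensions, the tanh choice, and the names rnn_step, W_xh, W_hh, and W_hy are illustrative assumptions, not part of any library API.

import torch

# Illustrative dimensions; any sizes would do.
input_dim, hidden_dim, output_dim = 10, 20, 5
W_xh = torch.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
W_hh = torch.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden (recurrent) weights
W_hy = torch.randn(output_dim, hidden_dim) * 0.1  # hidden-to-output weights
b_h = torch.zeros(hidden_dim)
b_y = torch.zeros(output_dim)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_hh·h_{t-1} + W_xh·x_t + b_h)
    h_t = torch.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    # y_t = W_hy·h_t + b_y (output activation, e.g. softmax, applied separately)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Unroll over a sequence: the same weights are reused at every step.
T = 7
h = torch.zeros(hidden_dim)
for x_t in torch.randn(T, input_dim):
    h, y = rnn_step(x_t, h)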
Popular Architectures
- Vanilla RNN – Simple recurrent layer; suffers from vanishing gradients.
- LSTM – Long Short‑Term Memory uses gates to preserve long‑range dependencies.
- GRU – Gated Recurrent Unit, a streamlined version of LSTM.
- Bidirectional RNN – Processes the sequence forward and backward for richer context (see the sketch after this list).
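All of these variants are available as drop‑in layers in PyTorch with nearly identical constructor signatures; the sizes below are illustrative.

import torch
import torch.nn as nn

input_dim, hidden_dim = 64, 128

vanilla = nn.RNN(input_dim, hidden_dim, batch_first=True)    # simple recurrence
lstm    = nn.LSTM(input_dim, hidden_dim, batch_first=True)   # gated memory cell
gru     = nn.GRU(input_dim, hidden_dim, batch_first=True)    # fewer gates than LSTM
bi_lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)

x = torch.randn(8, 50, input_dim)   # (batch, seq_len, input_dim)
out, _ = bi_lstm(x)
print(out.shape)                    # (8, 50, 2 * hidden_dim): both directions concatenated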
Code Example
Below is a minimal PyTorch implementation of an LSTM for character‑level language modeling.
import torch
import torch.nn as nn
class CharLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len) of character indices
        x = self.embed(x)                             # (batch, seq_len, embed_dim)
        out, hidden = self.lstm(x, hidden)            # (batch, seq_len, hidden_dim)
        out = self.fc(out.reshape(-1, out.size(2)))   # (batch*seq_len, vocab_size)
        return out, hidden
# Example usage
batch, seq_len = 32, 100
vocab = 80
model = CharLSTM(vocab, embed_dim=128, hidden_dim=256)
inputs = torch.randint(0, vocab, (batch, seq_len))
logits, _ = model(inputs)
print(logits.shape) # (batch*seq_len, vocab)
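Because the forward pass returns logits flattened to (batch*seq_len, vocab), they line up directly with flattened next‑character targets for cross‑entropy training. The sketch below shows one possible training step; the optimizer, learning rate, and random targets are illustrative choices.

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

targets = torch.randint(0, vocab, (batch, seq_len))  # next-character indices (random here)
logits, _ = model(inputs)
loss = criterion(logits, targets.reshape(-1))        # both flattened to batch*seq_len
optimizer.zero_grad()
loss.backward()
optimizer.step()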