Introduction
Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential data. Unlike feed‑forward networks, RNNs retain a hidden state that captures information from previous time steps, making them ideal for tasks such as language modeling, speech recognition, and time‑series forecasting.
Basic Concept
An RNN processes an input sequence x_1, x_2, …, x_T by updating its hidden state h_t at each time step:

h_t = σ(W_hh·h_{t-1} + W_xh·x_t + b_h)
y_t = σ(W_hy·h_t + b_y)

where σ is typically a non-linear activation (tanh or ReLU). The recurrent connection W_hh enables the network to retain memory of prior inputs.
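To make the update rule concrete, here is a minimal sketch of a single recurrence step unrolled over a toy sequence. The dimensions, weight initialization, and the helper function rnn_step are illustrative, not part of the formulation above.

import torch

# Illustrative sizes (assumptions, not from the text above)
input_dim, hidden_dim = 10, 20

# Parameters of the recurrence: W_xh, W_hh, b_h
W_xh = torch.randn(hidden_dim, input_dim) * 0.1
W_hh = torch.randn(hidden_dim, hidden_dim) * 0.1
b_h = torch.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_hh·h_{t-1} + W_xh·x_t + b_h)
    return torch.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

# Unroll over a toy sequence of length T, starting from a zero hidden state
T = 5
h = torch.zeros(hidden_dim)
for t in range(T):
    x_t = torch.randn(input_dim)   # stand-in for the real input at step t
    h = rnn_step(x_t, h)
print(h.shape)  # torch.Size([20])

Because h is fed back into each step, the final hidden state depends on every input in the sequence, which is exactly the memory effect described above.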
Popular Architectures
- Vanilla RNN – Simple recurrent layer; suffers from vanishing gradients.
- LSTM – Long Short‑Term Memory uses gates to preserve long‑range dependencies.
- GRU – Gated Recurrent Unit, a streamlined version of LSTM.
- Bidirectional RNN – Processes sequence forward and backward for richer context.
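As a quick sketch of how these variants map onto PyTorch's built-in modules (the input sizes and hyperparameters below are illustrative assumptions):

import torch
import torch.nn as nn

x = torch.randn(8, 50, 32)  # (batch, seq_len, input_dim), illustrative sizes

vanilla = nn.RNN(input_size=32, hidden_size=64, batch_first=True)    # vanilla RNN
lstm    = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)   # LSTM
gru     = nn.GRU(input_size=32, hidden_size=64, batch_first=True)    # GRU
bi_lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True,
                  bidirectional=True)                                 # bidirectional LSTM

out, _ = bi_lstm(x)
print(out.shape)  # torch.Size([8, 50, 128]) -> forward and backward states concatenated

Note that the bidirectional variant doubles the feature dimension of the output, since the forward and backward hidden states are concatenated at each time step.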
Code Example
Below is a minimal PyTorch implementation of an LSTM for character‑level language modeling.
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        x = self.embed(x)                             # (batch, seq_len, embed_dim)
        out, hidden = self.lstm(x, hidden)            # (batch, seq_len, hidden_dim)
        out = self.fc(out.reshape(-1, out.size(2)))   # (batch*seq_len, vocab_size)
        return out, hidden

# Example usage
batch, seq_len = 32, 100
vocab = 80
model = CharLSTM(vocab, embed_dim=128, hidden_dim=256)
inputs = torch.randint(0, vocab, (batch, seq_len))
logits, _ = model(inputs)
print(logits.shape)  # (batch*seq_len, vocab) -> torch.Size([3200, 80])
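The logits are flattened to (batch*seq_len, vocab) so they can be fed directly to a cross-entropy loss. A sketch of one training step is shown below; it reuses model, logits, vocab, batch, and seq_len from the example above, and the random targets are a placeholder for real next-character labels.

# Illustrative next-character targets, aligned with the flattened logits above
targets = torch.randint(0, vocab, (batch, seq_len)).reshape(-1)   # (batch*seq_len,)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
loss = criterion(logits, targets)   # logits: (batch*seq_len, vocab)
loss.backward()
optimizer.step()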