Recurrent Neural Networks

Introduction

Recurrent Neural Networks (RNNs) are a class of neural networks designed to handle sequential data. Unlike feed‑forward networks, RNNs retain a hidden state that captures information from previous time steps, making them ideal for tasks such as language modeling, speech recognition, and time‑series forecasting.

Basic Concept

An RNN processes an input sequence x_1, x_2, …, x_T by updating its hidden state h_t at each time step:

h_t = σ_h(W_hh·h_{t-1} + W_xh·x_t + b_h)
y_t = σ_y(W_hy·h_t + b_y)

where σ_h is a non-linear activation, typically tanh (ReLU is also used), and σ_y is an output activation chosen for the task, e.g. softmax when y_t is a distribution over classes. The recurrent weight matrix W_hh is what enables the network to retain memory of prior inputs.
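
To make the update rule concrete, the following is a minimal sketch of the recurrence written directly from the equations above, using tanh for σ_h; the tensor sizes and random initialisation are illustrative assumptions, not part of the original.

import torch

# Illustrative sizes (assumed for this sketch)
input_dim, hidden_dim = 10, 20
W_xh = torch.randn(hidden_dim, input_dim) * 0.1   # input-to-hidden weights
W_hh = torch.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden (recurrent) weights
b_h = torch.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W_hh·h_{t-1} + W_xh·x_t + b_h)
    return torch.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

h = torch.zeros(hidden_dim)              # h_0
for x_t in torch.randn(6, input_dim):    # a toy sequence of length 6
    h = rnn_step(x_t, h)
print(h.shape)  # torch.Size([20])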

Popular Architectures

Vanilla RNNs are difficult to train on long sequences because gradients tend to vanish or explode as they are propagated back through many time steps. Gated variants mitigate this: the Long Short-Term Memory (LSTM) adds input, forget, and output gates plus a separate cell state, while the Gated Recurrent Unit (GRU) uses a lighter two-gate design. Both are available as drop-in recurrent layers in PyTorch, as sketched below.
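
As a brief illustration (the layer sizes below are assumed, not taken from the original), all three recurrent layers share a similar interface in PyTorch:

import torch
import torch.nn as nn

x = torch.randn(32, 100, 128)   # (batch, seq_len, input_dim); sizes are illustrative

rnn = nn.RNN(128, 256, batch_first=True)    # vanilla RNN with tanh activation
gru = nn.GRU(128, 256, batch_first=True)    # gated recurrent unit
lstm = nn.LSTM(128, 256, batch_first=True)  # long short-term memory

out, h_n = gru(x)           # RNN and GRU return (output, final hidden state)
out, (h_n, c_n) = lstm(x)   # LSTM additionally returns a cell state
print(out.shape)            # torch.Size([32, 100, 256])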

Code Example

Below is a minimal PyTorch implementation of an LSTM for character‑level language modeling.

import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # character id -> embedding vector
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)        # hidden state -> next-character logits

    def forward(self, x, hidden=None):
        x = self.embed(x)                        # (batch, seq_len) -> (batch, seq_len, embed_dim)
        out, hidden = self.lstm(x, hidden)       # out: (batch, seq_len, hidden_dim)
        out = self.fc(out.reshape(-1, out.size(2)))  # flatten to (batch*seq_len, vocab_size)
        return out, hidden

# Example usage
batch, seq_len = 32, 100
vocab = 80
model = CharLSTM(vocab, embed_dim=128, hidden_dim=256)
inputs = torch.randint(0, vocab, (batch, seq_len))
logits, _ = model(inputs)
print(logits.shape)  # (batch*seq_len, vocab)
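
For completeness, here is a hedged sketch of a single training step on this model; the target construction, loss, optimizer, and learning rate are illustrative choices, not prescribed above.

# Assumed next-character targets (in practice, the input sequence shifted by one position)
targets = torch.randint(0, vocab, (batch, seq_len))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
logits, _ = model(inputs)
# logits are already flattened to (batch*seq_len, vocab), so flatten targets to match
loss = criterion(logits, targets.reshape(-1))
loss.backward()
optimizer.step()
print(loss.item())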

Further Resources