What are Recurrent Neural Networks?

Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to process sequential data. Unlike traditional feedforward networks, RNNs have connections that loop back on themselves, allowing them to maintain an internal "memory" of previous inputs. This makes them exceptionally well-suited for tasks involving time series, natural language processing, speech recognition, and more.

The Core Idea: Loops and Memory

The defining characteristic of an RNN is its recurrent connection. At each time step, the network receives not only the current input but also the hidden state from the previous time step. This hidden state acts as a summary of the information processed so far, enabling the network to understand context and dependencies within the sequence.
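
As a toy illustration of that loop, the sketch below applies the same (deliberately simplistic, made-up) update rule at every step and carries the hidden state forward; the numbers are arbitrary and only show that the final state depends on the whole sequence:

def cell(x_t, h_prev):
    # stand-in for a real RNN update; the actual equations appear below
    return 0.5 * h_prev + x_t

h = 0.0                       # initial hidden state
for x_t in [1.0, 2.0, 3.0]:   # a toy input sequence, one element per time step
    h = cell(x_t, h)          # the new state depends on the input AND the previous state
print(h)                      # 4.25: the final state summarizes the entire sequence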

[Figure: a simplified representation of an RNN cell, showing the flow of information and the hidden state.]

How RNNs Work

An RNN processes each element of a sequence one by one. For each input element x(t) at time step t, the network computes a hidden state h(t) and an output y(t). The key is that h(t) is a function of both the current input x(t) and the previous hidden state h(t-1). This allows information to persist through the sequence.

h(t) = f(W_hh * h(t-1) + W_xh * x(t) + b_h)
y(t) = g(W_hy * h(t) + b_y)

Here, f and g are activation functions (commonly tanh or ReLU for the hidden state and, for example, softmax for a classification output), W_hh, W_xh, and W_hy are weight matrices, and b_h and b_y are bias vectors.
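
A direct NumPy translation of these two equations might look like the following sketch; the layer sizes, the tanh/softmax choices, and the random initialization are illustrative assumptions rather than anything fixed by the formulas:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

input_size, hidden_size, output_size = 4, 8, 3
rng = np.random.default_rng(0)

# Weight matrices and bias vectors from the equations above
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

def rnn_forward(xs):
    h = np.zeros(hidden_size)                    # h(0): initial hidden state
    ys = []
    for x in xs:                                 # one time step per sequence element
        h = np.tanh(W_hh @ h + W_xh @ x + b_h)   # h(t) = f(W_hh*h(t-1) + W_xh*x(t) + b_h)
        ys.append(softmax(W_hy @ h + b_y))       # y(t) = g(W_hy*h(t) + b_y)
    return ys, h

sequence = [rng.normal(size=input_size) for _ in range(5)]
outputs, final_state = rnn_forward(sequence)

Note that the same weight matrices are reused at every time step; only the hidden state changes as the sequence is consumed.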

Challenges with Basic RNNs

While powerful, basic RNNs struggle with learning long-term dependencies. This is due to the vanishing gradient problem, where gradients become very small during backpropagation through time, making it hard for the network to learn from events that happened many steps in the past.
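
The effect can be reproduced numerically even in a scalar toy RNN: by the chain rule, the gradient of a late hidden state with respect to an early one is a product of per-step factors, and when each factor is below 1 the product shrinks exponentially with sequence length. A minimal sketch (the weight values and sequence length are arbitrary assumptions):

import numpy as np

w_hh, w_xh = 0.5, 1.0          # toy scalar weights
T = 50                         # sequence length
h = 0.0
grad = 1.0                     # accumulates d h(T) / d h(0)
for _ in range(T):
    h = np.tanh(w_hh * h + w_xh * 1.0)   # constant input of 1.0 at every step
    grad *= w_hh * (1.0 - h**2)          # chain rule: one factor per time step

print(grad)   # on the order of 1e-50: essentially no gradient reaches the earliest steps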

Advanced Architectures: LSTM and GRU

To address the vanishing gradient problem, more sophisticated RNN architectures were developed:

  • Long Short-Term Memory (LSTM): LSTMs introduce "gates" (forget, input, and output gates) that control the flow of information into and out of a cell's memory. This allows them to selectively remember or forget information over long sequences.
  • Gated Recurrent Unit (GRU): GRUs are a simplified variant of the LSTM, with fewer parameters but often comparable performance. They use an update gate and a reset gate to manage information flow. (A brief usage sketch of both architectures follows below.)
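
In practice these layers are rarely written by hand. As a sketch of how they might be used, assuming PyTorch is available (the tensor shapes and layer sizes below are arbitrary illustration values):

import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 2, 10, 16, 32
x = torch.randn(batch, seq_len, input_size)     # a batch of input sequences

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
gru = nn.GRU(input_size, hidden_size, batch_first=True)

# The LSTM carries two pieces of state: the hidden state and the cell memory.
lstm_out, (h_n, c_n) = lstm(x)   # lstm_out has shape (batch, seq_len, hidden_size)

# The GRU keeps a single hidden state, reflecting its simpler gating.
gru_out, g_n = gru(x)            # gru_out has shape (batch, seq_len, hidden_size)

Both layers consume the whole sequence in one call and return the hidden state at every time step, so the handling of long-term dependencies happens inside the gated cell rather than in user code.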