Convolutional Neural Networks

Table of Contents

1. Introduction
2. Architecture Overview
3. The Convolution Operation
4. Pooling Layers
5. Fully Connected Layers
6. Training Strategies
7. Hands-On Code

1. Introduction

Convolutional Neural Networks (CNNs) are a class of deep learning models that excel at analyzing visual data. They automatically learn spatial hierarchies of features directly from raw pixels, making them the backbone of modern computer vision applications such as image classification, object detection, and segmentation.

2. Architecture Overview

A typical CNN consists of a stack of layers:

Convolutional layers – extract local patterns.
Activation functions – introduce non‑linearity (e.g., ReLU).
Pooling layers – down‑sample feature maps.
Fully connected layers – combine the extracted features into the final prediction.

The combination of these layers enables the network to learn increasingly abstract representations.
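
To make the activation step concrete, here is a minimal sketch of ReLU applied element-wise to a small feature map. The function name `relu` and the sample values are illustrative assumptions, not taken from any particular library.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative activations become 0."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical 2x2 feature map containing negative responses
activations = relu([[-4, 2], [0, -1]])
print(activations)  # [[0, 2], [0, 0]]
```

Without such a non-linearity, a stack of convolutional and dense layers would collapse into a single linear map.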

3. The Convolution Operation

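Convolution slides a small kernel across the input image and, at each position, computes the dot product between the kernel and the patch it covers; the results form a feature map. The kernel's parameters are learned during training. Strictly speaking, most deep learning libraries compute cross-correlation (the kernel is not flipped), which is also what the illustration below does.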


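# Simple 2-D convolution illustration (stride 1, no padding)
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

kernel = [[1, 0],
          [0, -1]]

# Each output entry is the dot product of the kernel with one 2×2 window,
# e.g. top-left: 1*1 + 2*0 + 4*0 + 5*(-1) = -4
output = [[sum(kernel[a][b] * image[i + a][j + b]
               for a in range(2) for b in range(2))
           for j in range(2)]
          for i in range(2)]
print(output)  # [[-4, -4], [-4, -4]]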

Key hyper‑parameters:

Kernel size (e.g., 3×3, 5×5)
Stride – step size of the sliding window.
Padding – adding zeros around the input to preserve dimensions.
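
The three hyper-parameters above jointly determine the size of the output feature map. As a rough sketch, the standard formula along one spatial axis is floor((n + 2p - k) / s) + 1; the helper name `conv_output_size` is an assumption made for illustration.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_output_size(3, kernel=2))              # 2  (the 3×3 input, 2×2 kernel case)
print(conv_output_size(28, kernel=3))             # 26 (no padding shrinks the map)
print(conv_output_size(28, kernel=3, padding=1))  # 28 (padding preserves the size)
print(conv_output_size(28, kernel=3, stride=2))   # 13 (stride 2 roughly halves it)
```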

4. Pooling Layers

Pooling reduces the spatial dimensions of feature maps, which lowers computational cost and makes the network less sensitive to small translations of the input.

Common types:

Max pooling – selects the maximum value in each window.
Average pooling – computes the average.


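# Max pooling 2×2 example (stride 2)
feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]

# Take the maximum of each non-overlapping 2×2 window
output = [[max(feature_map[i][j], feature_map[i][j + 1],
               feature_map[i + 1][j], feature_map[i + 1][j + 1])
           for j in range(0, 4, 2)]
          for i in range(0, 4, 2)]
print(output)  # [[6, 8], [14, 16]]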
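
For comparison, average pooling over the same 4×4 grid might look like the sketch below. The function name `average_pool_2x2` is hypothetical, and the code assumes the height and width are both even.

```python
def average_pool_2x2(feature_map):
    """Average each non-overlapping 2×2 window (stride 2)."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[(feature_map[i][j] + feature_map[i][j + 1]
              + feature_map[i + 1][j] + feature_map[i + 1][j + 1]) / 4
             for j in range(0, cols, 2)]
            for i in range(0, rows, 2)]

grid = [[1, 3, 2, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(average_pool_2x2(grid))  # [[3.75, 5.25], [11.5, 13.5]]
```

Max pooling keeps only the strongest response in each window, whereas average pooling blends all four values.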

5. Fully Connected Layers

After several convolution‑pooling blocks, the feature maps are flattened and passed through one or more dense layers, culminating in a softmax output for classification tasks.
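
The flatten, dense, and softmax steps can be sketched in plain Python as follows. The weights, biases, and the two-class setup are made-up values chosen for illustration only; a real network would learn them during training.

```python
import math

def flatten(feature_map):
    """Concatenate the rows of a 2-D feature map into one vector."""
    return [v for row in feature_map for v in row]

def dense(x, weights, biases):
    """One fully connected layer: out[k] = sum_i weights[k][i] * x[i] + biases[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

features = flatten([[6, 8], [14, 16]])   # a pooled 2×2 map becomes 4 values
weights = [[0.1, 0.0, 0.0, 0.0],         # hypothetical weights for 2 classes
           [0.0, 0.0, 0.0, 0.1]]
biases = [0.0, 0.0]
probs = softmax(dense(features, weights, biases))
print(probs)  # two probabilities that sum to 1; the second class scores higher
```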

6. Training Strategies

Typical training pipeline:

Initialize weights (He or Glorot).
Feed batches of images through the network.
Compute loss (e.g., categorical cross‑entropy).
Back‑propagate gradients.
Update weights with an optimizer (SGD, Adam).
Apply regularization (dropout, weight decay) throughout training to limit overfitting.

Monitoring accuracy and loss on a validation set helps detect overfitting, for example when validation loss starts rising while training loss keeps falling.
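
The pipeline steps can be sketched without any framework. In the example below a single dense softmax layer stands in for the CNN, and the toy two-cluster data, learning rate, and weight-decay value are all assumptions chosen for illustration; it is not how a production training loop would be written.

```python
import math
import random

random.seed(0)

# Toy data: two 2-D clusters standing in for image features (hypothetical).
data = [([random.gauss(-1, 0.5), random.gauss(-1, 0.5)], 0) for _ in range(50)]
data += [([random.gauss(1, 0.5), random.gauss(1, 0.5)], 1) for _ in range(50)]

n_in, n_out = 2, 2
# Glorot-style uniform initialization: limit = sqrt(6 / (fan_in + fan_out)).
limit = math.sqrt(6 / (n_in + n_out))
W = [[random.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
b = [0.0] * n_out

def forward(x):
    """Dense layer followed by a numerically stable softmax."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mean_loss(samples):
    """Average categorical cross-entropy over the given samples."""
    return -sum(math.log(forward(x)[y]) for x, y in samples) / len(samples)

lr, weight_decay, batch_size = 0.1, 1e-4, 10
initial_loss = mean_loss(data)
for epoch in range(20):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        grad_W = [[0.0] * n_in for _ in range(n_out)]
        grad_b = [0.0] * n_out
        for x, y in batch:
            probs = forward(x)
            for k in range(n_out):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == y]
                delta = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += delta / len(batch)
                for i in range(n_in):
                    grad_W[k][i] += delta * x[i] / len(batch)
        for k in range(n_out):
            b[k] -= lr * grad_b[k]
            for i in range(n_in):
                # SGD step with L2 weight decay applied to the weights
                W[k][i] -= lr * (grad_W[k][i] + weight_decay * W[k][i])
final_loss = mean_loss(data)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

In a real CNN the backward pass is handled by automatic differentiation, but the update rule has the same shape.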

7. Hands‑On Code

Run the following minimal CNN on the MNIST dataset using TensorFlow/Keras.


import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST, add a channel axis, and scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., tf.newaxis].astype("float32") / 255.0
x_test = x_test[..., tf.newaxis].astype("float32") / 255.0

# Two convolution-pooling blocks followed by a dense classifier
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Labels are integers (0-9), so use the sparse variant of cross-entropy
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Hold out 10% of the training data for validation
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
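
As a sanity check on the model above, the feature-map sizes and parameter count can be worked out by hand. The sketch below assumes the Keras defaults (stride 1 and no padding for `Conv2D`, a 2×2 window with stride 2 for `MaxPooling2D`); the helper name `conv_out` is hypothetical.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28
size = conv_out(size, 3)   # first Conv2D with a 3×3 kernel: 28 -> 26
size = size // 2           # first 2×2 max pooling:          26 -> 13
size = conv_out(size, 3)   # second Conv2D:                  13 -> 11
size = size // 2           # second max pooling:             11 -> 5
flat = size * size * 64    # Flatten over 64 channels:       5 * 5 * 64 = 1600

# Weights plus biases for each trainable layer
params = ((3 * 3 * 1 * 32 + 32)      # Conv2D(32, 3) on 1 input channel
          + (3 * 3 * 32 * 64 + 64)   # Conv2D(64, 3) on 32 input channels
          + (flat * 128 + 128)       # Dense(128)
          + (128 * 10 + 10))         # Dense(10)
print(flat, params)  # 1600 225034
```

Most of the parameters sit in the first dense layer, which is typical for small CNNs that flatten before classifying.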