Training Models with PyTorch
Welcome to the tutorial on training models using PyTorch! This section will guide you through the essential steps involved in training a neural network, from defining your model and loss function to optimizing its parameters.
Key Concepts: Loss Functions, Optimizers, Epochs, Batches, Forward Pass, Backward Pass.
1. Defining Your Model
Before training, you need a model. We'll assume you have already defined your neural network architecture using torch.nn.Module, as covered in the "Neural Network Basics" tutorial. For this example, let's consider a simple feed-forward network.
import torch
import torch.nn as nn
import torch.optim as optim
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)  # Flatten the input to (batch_size, 784)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = SimpleNet()
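As a quick, purely illustrative sanity check, you can push a dummy batch through the untrained model and confirm the output shape:

dummy_batch = torch.randn(2, 1, 28, 28)  # e.g., two MNIST-sized images
logits = model(dummy_batch)              # forward() flattens this to (2, 784)
print(logits.shape)                      # torch.Size([2, 10]) -- one score per class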
2. Choosing a Loss Function
The loss function quantifies how well your model is performing. PyTorch offers a wide range of loss functions, such as nn.CrossEntropyLoss for classification and nn.MSELoss for regression.
criterion = nn.CrossEntropyLoss()
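Note that nn.CrossEntropyLoss expects raw, unnormalized scores (logits); it applies log-softmax internally, so don't add a softmax layer before it. A quick illustrative call with dummy values:

dummy_logits = torch.randn(4, 10)           # batch of 4 samples, 10 classes
dummy_targets = torch.randint(0, 10, (4,))  # integer class labels
print(criterion(dummy_logits, dummy_targets).item())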
3. Selecting an Optimizer
Optimizers are algorithms that adjust the model's parameters to minimize the loss. Popular choices include Stochastic Gradient Descent (SGD), Adam, and RMSprop. You'll need to pass the model's parameters and a learning rate to the optimizer.
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
# Or using SGD:
# optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
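Under the hood, plain SGD nudges each parameter against its gradient. Here is a minimal sketch of that update, for intuition only (in practice, optimizer.step() does this, and more, for you):

with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:
            param -= learning_rate * param.grad  # w <- w - lr * dL/dw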
4. The Training Loop
The core of training is the training loop. This loop iterates over your dataset multiple times (epochs), processing data in batches. For each batch:
- Perform a forward pass to get predictions.
- Calculate the loss.
- Perform a backward pass to compute gradients.
- Update the model's weights using the optimizer.
Here's a typical structure for a training loop:
num_epochs = 5
batch_size = 64
# Assuming your data is already loaded into a DataLoader (see Section 5 below):
# train_loader = ...
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        # 1. Zero the parameter gradients
        optimizer.zero_grad()

        # 2. Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # 3. Backward pass and optimize
        loss.backward()
        optimizer.step()

        # Print statistics
        running_loss += loss.item()
        if i % 100 == 99:  # Print every 100 mini-batches
            print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {running_loss / 100:.4f}')
            running_loss = 0.0
print('Finished Training')
5. Using DataLoaders
For efficient data handling, PyTorch's torch.utils.data.DataLoader is indispensable. It helps in batching, shuffling, and loading data in parallel.
from torch.utils.data import Dataset, DataLoader
class CustomDataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]
# Example dummy data
dummy_features = torch.randn(1000, 784)
dummy_labels = torch.randint(0, 10, (1000,))
dataset = CustomDataset(dummy_features, dummy_labels)
train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
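You can pull a single batch to verify shapes before starting the full loop (the shapes below assume the dummy data above):

inputs, labels = next(iter(train_loader))
print(inputs.shape)  # torch.Size([64, 784])
print(labels.shape)  # torch.Size([64])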
6. Saving and Loading Models
After training, you'll want to persist your model. You can save the entire model object or just its state dictionary; the PyTorch documentation recommends saving the state dictionary, since it is more portable across code refactors and PyTorch versions.
# Save the entire model
torch.save(model, 'model.pth')
# Save the model's state dictionary
torch.save(model.state_dict(), 'model_state_dict.pth')
# Load the entire model (recent PyTorch versions may require weights_only=False)
# loaded_model = torch.load('model.pth')
# Load the state dictionary into a new model
# loaded_model_sd = SimpleNet()
# loaded_model_sd.load_state_dict(torch.load('model_state_dict.pth'))
# loaded_model_sd.eval() # Set model to evaluation mode
Remember to set your model to evaluation mode with model.eval() before making predictions on new data: this disables dropout and makes batch normalization layers use their running statistics instead of per-batch statistics.
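A minimal inference sketch, using a placeholder input tensor:

model.eval()
with torch.no_grad():  # no gradients needed for inference
    sample = torch.randn(1, 784)  # placeholder; use real data here
    predicted_class = model(sample).argmax(dim=1)
    print(predicted_class)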
Pro Tip: Monitor your training and validation loss to detect overfitting. Use techniques like early stopping and dropout for regularization.
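For example, a bare-bones validation pass might look like this, where val_loader is assumed to be a DataLoader you have built for held-out data, analogous to train_loader:

model.eval()
val_loss = 0.0
with torch.no_grad():
    for inputs, labels in val_loader:
        val_loss += criterion(model(inputs), labels).item()
val_loss /= len(val_loader)
print(f'Validation loss: {val_loss:.4f}')
model.train()  # switch back before resuming training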
This covers the fundamental steps for training models in PyTorch. Experiment with different architectures, optimizers, and hyperparameters to achieve optimal performance for your specific tasks.