Understanding how to maximize GPU utilization when training deep neural networks is key to reducing time‑to‑insight. Below we discuss best practices, tooling, and benchmark results across popular frameworks.
#!/usr/bin/env python
import time

import torch

# ResNet-50 with random weights; we only care about throughput here, not accuracy.
model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=False).cuda()
inputs = torch.randn(64, 3, 224, 224).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step():
    optimizer.zero_grad()
    output = model(inputs)
    loss = output.mean()  # dummy loss, enough to exercise the backward pass
    loss.backward()
    optimizer.step()

# Warm-up: absorbs one-time costs (CUDA context init, cuDNN algorithm selection,
# allocator growth) so they don't distort the measurement
for _ in range(10):
    train_step()

# Timing: synchronize before and after, otherwise we only time the asynchronous
# kernel launches on the CPU instead of the actual GPU work
torch.cuda.synchronize()
start = time.time()
for _ in range(100):
    train_step()
torch.cuda.synchronize()
print(f"Throughput: {64 * 100 / (time.time() - start):.2f} images/sec")
One more easy win: setting torch.backends.cudnn.benchmark = True gave me an extra 10% boost on an RTX 3090. The flag only pays off if it is set before the first forward pass, since that is when cuDNN autotunes its convolution algorithms for your (fixed) input shape.
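For clarity, here is a minimal sketch of where the flag belongs relative to the benchmark above; assume the same model, train_step(), and warm-up/timing loops:

#!/usr/bin/env python
import torch

# Enable the cuDNN autotuner before any convolution runs. cuDNN will benchmark
# its available convolution algorithms for the batch shape it sees and cache
# the fastest one. Only worthwhile when input shapes stay constant step to step.
torch.backends.cudnn.benchmark = True

# ... build the model and optimizer, then run the warm-up and timing loops as
# above. Placing the flag here means the warm-up iterations also absorb the
# one-time autotuning cost, so it never shows up in the measured throughput.

Your mileage will vary by GPU, batch size, and model; the boost comes almost entirely from convolution-heavy layers, so expect less on transformer-style workloads.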