Gradient Boosting Explained
What is Gradient Boosting?
Gradient Boosting is an ensemble technique that builds models sequentially, each new model trying to correct the errors of its predecessor. By fitting each new model to the negative gradient of the loss function with respect to the current predictions, it performs an approximate gradient descent in function space, steadily reducing the training loss.
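For example, with squared-error loss L(y, F) = ½(y − F)², the negative gradient −∂L/∂F = y − F is exactly the ordinary residual, which is why the steps below amount to repeatedly fitting trees to residuals.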
How It Works
The core idea is simple:
- Start with an initial prediction (often the mean for regression or log‑odds for classification).
- Compute the pseudo-residuals (the negative gradient of the loss with respect to the current predictions).
- Fit a weak learner (usually a shallow decision tree) to these residuals.
- Update the ensemble by adding the new learner, scaled by a learning rate.
- Repeat for a fixed number of iterations or until convergence.
Algorithm Pseudocode
F_0(x) = argmin_γ Σ_i L(y_i, γ)
for m = 1 to M:
    r_i = -[∂L(y_i, F_{m-1}(x_i)) / ∂F_{m-1}(x_i)]   for i = 1, …, n
    h_m = FitTree({(x_i, r_i)})
    γ_m = argmin_γ Σ_i L(y_i, F_{m-1}(x_i) + γ·h_m(x_i))
    F_m(x) = F_{m-1}(x) + η·γ_m·h_m(x)
return F_M(x)
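To make the pseudocode concrete, here is a minimal from-scratch sketch for squared-error loss, where the negative gradient r_i is simply the residual y_i − F_{m-1}(x_i). It assumes scikit-learn's DecisionTreeRegressor as the weak learner; for squared loss the line-search multiplier γ_m equals 1 when the tree's leaf values are mean residuals, so it is omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting for squared-error loss, following the pseudocode above."""
    f0 = y.mean()                          # F_0: the constant minimizing squared loss
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred               # r_i: negative gradient of 1/2 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)  # h_m
        pred += learning_rate * tree.predict(X)   # F_m = F_{m-1} + eta * h_m
        trees.append(tree)
    return f0, trees, learning_rate

def predict_gbm(model, X):
    f0, trees, learning_rate = model
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)   # replay the additive updates
    return pred
```

Calling fit_gbm on a training set and predict_gbm on held-out data replays the additive update F_m(x) = F_{m-1}(x) + η·h_m(x) tree by tree.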
Example & Visualization
Below is a simple demonstration of how loss decreases over iterations.
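One way to produce such a demonstration is sketched here with scikit-learn's GradientBoostingRegressor and its staged_predict method; the synthetic dataset and hyperparameters are illustrative assumptions, not a prescription.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data, purely for illustration.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.05, max_depth=3, random_state=0
)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's prediction after each boosting iteration,
# so the loss curve can be computed without re-fitting the model.
val_loss = [mean_squared_error(y_val, pred) for pred in model.staged_predict(X_val)]
train_loss = model.train_score_  # in-sample loss recorded during fitting

for m in range(0, len(val_loss), 50):
    print(f"iteration {m + 1:3d}: train loss {train_loss[m]:.1f}, val loss {val_loss[m]:.1f}")
```

Plotting val_loss against the iteration index typically shows a steep early drop that flattens out, and eventually turns back upward once the model starts over-fitting.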
Practical Tips
- Use a small learning rate (e.g., 0.01–0.1) and increase the number of trees.
- Limit tree depth (3–5) to keep learners weak.
- Apply regularization: subsample rows (subsample) and columns (colsample_bytree).
- Monitor out-of-bag or validation loss to prevent over-fitting; a configuration sketch putting these tips together follows this list.
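The following is a minimal configuration sketch combining these tips. It assumes the xgboost Python package (the subsample and colsample_bytree parameter names above come from its API) and a recent version in which early_stopping_rounds is a constructor argument; the specific values and synthetic dataset are illustrative only.

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration.
X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(
    n_estimators=2000,         # many trees; early stopping decides how many are kept
    learning_rate=0.05,        # small learning rate
    max_depth=4,               # shallow trees keep each learner weak
    subsample=0.8,             # row subsampling per tree
    colsample_bytree=0.8,      # column subsampling per tree
    early_stopping_rounds=50,  # stop once validation loss stops improving
    random_state=0,
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

print("best iteration:", model.best_iteration)
```

With a small learning rate the validation loss improves slowly but usually reaches a lower minimum, which is why a generous n_estimators paired with early stopping is a common combination.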