Gradient Boosting Explained
What is Gradient Boosting?
Gradient Boosting is an ensemble technique that builds models sequentially, each new model trying to correct the errors of its predecessor. By fitting each new model to the negative gradient of the loss function with respect to the current predictions, it performs an approximate gradient descent in function space, steadily reducing the training loss.
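For example, with squared-error loss L(y, F) = ½(y − F)², the negative gradient −∂L/∂F = y − F is exactly the ordinary residual, which is why the steps below amount to repeatedly fitting trees to residuals.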
How It Works
The core idea is simple:
- Start with an initial prediction (often the mean for regression or log‑odds for classification).
- Compute the pseudo-residuals (the negative gradient of the loss with respect to the current predictions).
- Fit a weak learner (usually a shallow decision tree) to these residuals.
- Update the ensemble by adding the new learner, scaled by a learning rate.
- Repeat for a fixed number of iterations or until convergence.
Algorithm Pseudocode
F_0(x) = argmin_γ Σ_i L(y_i, γ)
for m = 1 to M:
    r_i = -[∂L(y_i, F_{m-1}(x_i)) / ∂F_{m-1}(x_i)]   for i = 1, …, n
    h_m = FitTree({(x_i, r_i)})
    γ_m = argmin_γ Σ_i L(y_i, F_{m-1}(x_i) + γ·h_m(x_i))
    F_m(x) = F_{m-1}(x) + η·γ_m·h_m(x)
return F_M(x)
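To make the pseudocode concrete, here is a minimal from-scratch sketch for squared-error loss, where the negative gradient r_i is simply the residual y_i − F_{m-1}(x_i). It assumes scikit-learn's DecisionTreeRegressor as the weak learner; for squared loss the line-search multiplier γ_m equals 1 when the tree's leaf values are mean residuals, so it is omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting for squared-error loss, following the pseudocode above."""
    f0 = y.mean()                          # F_0: the constant minimizing squared loss
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred               # r_i: negative gradient of 1/2 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)  # h_m
        pred += learning_rate * tree.predict(X)   # F_m = F_{m-1} + eta * h_m
        trees.append(tree)
    return f0, trees, learning_rate

def predict_gbm(model, X):
    f0, trees, learning_rate = model
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)   # replay the additive updates
    return pred
```

Calling fit_gbm on a training set and predict_gbm on held-out data replays the additive update F_m(x) = F_{m-1}(x) + η·h_m(x) tree by tree.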
Example & Visualization
Below is a simple demonstration of how loss decreases over iterations.
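One way to produce such a demonstration is sketched here with scikit-learn's GradientBoostingRegressor and its staged_predict method; the synthetic dataset and hyperparameters are illustrative assumptions, not a prescription.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data, purely for illustration.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.05, max_depth=3, random_state=0
)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's prediction after each boosting iteration,
# so the loss curve can be computed without re-fitting the model.
val_loss = [mean_squared_error(y_val, pred) for pred in model.staged_predict(X_val)]
train_loss = model.train_score_  # in-sample loss recorded during fitting

for m in range(0, len(val_loss), 50):
    print(f"iteration {m + 1:3d}: train loss {train_loss[m]:.1f}, val loss {val_loss[m]:.1f}")
```

Plotting val_loss against the iteration index typically shows a steep early drop that flattens out, and eventually turns back upward once the model starts over-fitting.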
Practical Tips
- Use a small learning rate (e.g., 0.01–0.1) and increase the number of trees.
- Limit tree depth (3–5) to keep learners weak.
- Apply regularization: subsample rows (subsample) and columns (colsample_bytree).
- Monitor out-of-bag or validation loss to prevent over-fitting; a configuration sketch putting these tips together follows this list.
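The following is a minimal configuration sketch combining these tips. It assumes the xgboost Python package (the subsample and colsample_bytree parameter names above come from its API) and a recent version in which early_stopping_rounds is a constructor argument; the specific values and synthetic dataset are illustrative only.

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration.
X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(
    n_estimators=2000,         # many trees; early stopping decides how many are kept
    learning_rate=0.05,        # small learning rate
    max_depth=4,               # shallow trees keep each learner weak
    subsample=0.8,             # row subsampling per tree
    colsample_bytree=0.8,      # column subsampling per tree
    early_stopping_rounds=50,  # stop once validation loss stops improving
    random_state=0,
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

print("best iteration:", model.best_iteration)
```

With a small learning rate the validation loss improves slowly but usually reaches a lower minimum, which is why a generous n_estimators paired with early stopping is a common combination.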