Gradient Boosting
Gradient Boosting is a powerful ensemble technique that builds models sequentially, each new model correcting the errors of its predecessors. Concretely, each weak learner is fit to the negative gradient of the loss with respect to the current ensemble's predictions; for squared-error loss, this is simply the residuals.
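The fit-to-the-residuals loop can be sketched directly. This is a minimal illustration for squared-error loss, assuming scikit-learn's DecisionTreeRegressor as the weak learner; the function names `boost` and `boost_predict` are just for this example:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, lr=0.1):
    """Minimal gradient boosting for squared-error loss:
    each tree is fit to the current residuals, and its
    shrunken predictions are added to the ensemble."""
    pred = np.full(len(y), y.mean())  # start from the constant mean prediction
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred          # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        pred += lr * tree.predict(X)  # shrink each tree's contribution
        trees.append(tree)
    return y.mean(), trees

def boost_predict(base, trees, X, lr=0.1):
    """Sum the base prediction and each tree's shrunken contribution."""
    pred = np.full(len(X), base)
    for tree in trees:
        pred += lr * tree.predict(X)
    return pred
```

Library implementations add many refinements (line search for leaf values, arbitrary differentiable losses, subsampling), but the additive fit-to-residuals structure is the same.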
When to use Gradient Boosting
- Tabular data with a mix of numeric and categorical features
- When high predictive accuracy is required
- Regression or classification problems
Key Hyper‑parameters
- n_estimators – number of boosting rounds
- learning_rate – shrinkage factor (0 < lr ≤ 1)
- max_depth – depth of each weak learner (tree)
- subsample – fraction of samples used for each tree (helps reduce over‑fitting)
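These hyper-parameters interact (a lower learning rate usually needs more estimators, for example), so one common approach is to tune them jointly with cross-validation. A small sketch using scikit-learn's GridSearchCV; the grid values here are illustrative, not recommendations for every dataset:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)

# Illustrative grid over the hyper-parameters listed above.
grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "subsample": [0.8, 1.0],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=42),
    grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

For larger grids or datasets, RandomizedSearchCV explores the space more cheaply than an exhaustive grid.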
Python Example
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# load_boston was removed in scikit-learn 1.2; load_diabetes is a
# bundled tabular regression dataset that works as a drop-in here.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gbr = GradientBoostingRegressor(
    n_estimators=200,    # number of boosting rounds
    learning_rate=0.1,   # shrinkage applied to each tree's contribution
    max_depth=3,         # depth of each weak learner
    subsample=0.9,       # fraction of rows sampled per tree
    random_state=42,
)
gbr.fit(X_train, y_train)
pred = gbr.predict(X_test)

# mean_squared_error's squared=False option was removed in recent
# scikit-learn versions, so take the square root explicitly.
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)
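Rather than guessing n_estimators up front, one option is to train with a generous number of rounds and use scikit-learn's staged_predict, which yields the ensemble's predictions after each boosting round, to find where held-out error bottoms out. A sketch, reusing the same diabetes dataset for illustration:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gbr = GradientBoostingRegressor(
    n_estimators=500, learning_rate=0.1, max_depth=3, random_state=42
)
gbr.fit(X_train, y_train)

# staged_predict yields predictions after each boosting round,
# letting us locate the round with the lowest held-out error.
test_mse = [mean_squared_error(y_test, p) for p in gbr.staged_predict(X_test)]
best_round = int(np.argmin(test_mse)) + 1
print("best number of rounds:", best_round)
```

GradientBoostingRegressor also supports early stopping directly via its n_iter_no_change and validation_fraction parameters, which stops training once an internal validation score stops improving.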