Understanding how to evaluate the performance of your machine learning models is crucial. This section covers the common metrics used to assess classification and regression models.
For classification tasks, we often rely on a confusion matrix to understand true positives, true negatives, false positives, and false negatives.
Accuracy measures the overall correctness of the model: the ratio of correctly predicted instances to the total number of instances.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
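As a concrete illustration, here is a minimal NumPy sketch that derives the confusion-matrix counts and accuracy from a pair of hypothetical label arrays (the y_true and y_pred values are made up for the example):

```python
import numpy as np

# Hypothetical binary ground-truth labels and model predictions.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Confusion-matrix counts for the positive class (1).
tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}, accuracy={accuracy:.3f}")
```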
Precision measures the accuracy of positive predictions: the ratio of true positives to all predicted positives (true positives plus false positives).
Precision = TP / (TP + FP)
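The same idea can be checked with scikit-learn's precision_score; the y_true and y_pred arrays below are again hypothetical:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical predictions

# Precision = TP / (TP + FP); here TP=3, FP=1, so precision = 0.75.
print(precision_score(y_true, y_pred))
```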
Recall measures the model's ability to find all relevant cases: the ratio of true positives to all actual positives (true positives plus false negatives).
Recall = TP / (TP + FN)
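A matching sketch with scikit-learn's recall_score, using the same hypothetical arrays:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical predictions

# Recall = TP / (TP + FN); here TP=3, FN=1, so recall = 0.75.
print(recall_score(y_true, y_pred))
```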
The F1-Score is the harmonic mean of precision and recall. It's useful when you need to balance the two, for example on imbalanced datasets where accuracy alone is misleading.
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
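To see that the harmonic-mean formula and the library agree, here is a small sketch comparing a manual F1 computation against scikit-learn's f1_score (same hypothetical labels as above):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical predictions

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

# F1 = 2 * (P * R) / (P + R) -- the harmonic mean of precision and recall.
manual_f1 = 2 * (p * r) / (p + r)
print(manual_f1, f1_score(y_true, y_pred))  # the two values should match
```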
The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (Recall) against the False Positive Rate at various threshold settings. Area Under the Curve (AUC) summarizes the ROC curve into a single value, indicating the model's ability to distinguish between classes.
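A brief sketch of AUC with scikit-learn; note that roc_auc_score expects predicted scores or probabilities rather than hard labels, and the y_scores values here are invented for illustration:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true   = [1, 0, 1, 1, 0, 1, 0, 0]                   # hypothetical labels
y_scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.6, 0.1]   # hypothetical predicted probabilities

# AUC summarizes the ROC curve: 1.0 is a perfect ranking, 0.5 is random guessing.
print(roc_auc_score(y_true, y_scores))

# roc_curve returns the FPR/TPR pairs at each threshold, ready for plotting.
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
```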
For regression tasks, where the model predicts continuous values, metrics focus on the difference between predicted and actual values.
Mean Absolute Error (MAE) is the average of the absolute differences between predicted and actual values. It's less sensitive to outliers than MSE.
MAE = (1/n) * Σ |y_i - ŷ_i|
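A minimal NumPy sketch of MAE on a small, made-up set of actual and predicted values:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical actual values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

# MAE = (1/n) * sum(|y_i - y_hat_i|)
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 0.5 for these example values
```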
Mean Squared Error (MSE) is the average of the squared differences between predicted and actual values. It penalizes larger errors more heavily.
MSE = (1/n) * Σ (y_i - ŷ_i)²
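The same hypothetical values scored with MSE, where squaring makes the single 1.0 error dominate:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical actual values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375 for these example values
```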
Root Mean Squared Error (RMSE) is the square root of MSE. It's in the same units as the target variable, making it easier to interpret.
RMSE = √MSE
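And RMSE is simply the square root of that quantity, shown here on the same hypothetical values:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical actual values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

# RMSE = sqrt(MSE), expressed in the same units as the target variable.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse)  # ~0.612 for these example values
```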
R², the coefficient of determination, represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It typically ranges from 0 to 1, although it can be negative when a model fits worse than simply predicting the mean.
R² = 1 - (SS_res / SS_tot)
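A sketch of R² computed directly from the SS_res / SS_tot definition and cross-checked against scikit-learn's r2_score (values are hypothetical):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # hypothetical actual values
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # hypothetical predictions

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))  # the two values should match
```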
The choice of metric depends heavily on the specific problem and the business objective: