Model Evaluation & Performance Metrics

17 questions.

Confusion matrix: a table summarizing classification outcomes as true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
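The four confusion-matrix counts can be tallied with a minimal plain-Python sketch (the function name is illustrative):

```python
def confusion_counts(y_true, y_pred):
    """Tally TP, FP, TN, FN for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn
```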
With class imbalance, predicting the majority class can yield high accuracy but poor minority detection.
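The accuracy trap above is easy to demonstrate; a sketch with an assumed 95/5 class split and a majority-class baseline:

```python
# Illustrative 95/5 imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # baseline that always predicts the majority class

# High accuracy, yet the minority class is never detected.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / 5
```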
$$\text{Precision}=\frac{TP}{TP+FP}$$
$$\text{Recall}=\frac{TP}{TP+FN}$$
Prefer precision when false positives are more costly than false negatives.
Prefer recall when false negatives are more costly than false positives.
$$F1=2\cdot\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}$$
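The three formulas above can be sketched directly from the confusion-matrix counts (function names are illustrative):

```python
def precision(tp, fp):
    # Fraction of predicted positives that are correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of actual positives that are found.
    return tp / (tp + fn)

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```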
ROC curve: a plot of the true positive rate (TPR) against the false positive rate (FPR) as the classification threshold varies.
AUC measures average performance across all thresholds and reflects ranking quality:
$$\mathrm{AUC}=P\big(\hat s(x^+) > \hat s(x^-)\big)$$
Under severe class imbalance, ROC-AUC may look strong while positive-class performance is poor.
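The ranking interpretation of AUC above can be computed literally, by comparing every positive score against every negative score (ties count half); a plain-Python sketch:

```python
def auc_pairwise(scores, labels):
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```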
Precision-recall (PR) curve: a plot of precision against recall across thresholds.
Prefer the precision-recall curve when the positive class is rare and imbalance is high.
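A precision-recall curve's points can be generated by sweeping the threshold over the distinct scores; a minimal sketch (the helper name is illustrative):

```python
def pr_points(scores, labels):
    """(precision, recall) pairs as the threshold sweeps high to low."""
    pts = []
    for t in sorted(set(scores), reverse=True):
        pred = [1 if s >= t else 0 for s in scores]
        tp = sum(p and y for p, y in zip(pred, labels))
        fp = sum(p and not y for p, y in zip(pred, labels))
        fn = sum((not p) and y for p, y in zip(pred, labels))
        if tp + fp:  # precision undefined with no predicted positives
            pts.append((tp / (tp + fp), tp / (tp + fn)))
    return pts
```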
Threshold selection: choosing a cutoff that balances error types according to business costs.
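Cost-based threshold selection can be sketched as a brute-force sweep; the per-error costs here are illustrative assumptions, not values from the source:

```python
def best_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0):
    """Return the candidate threshold minimizing total expected cost."""
    best_t, best_cost = 0.0, float("inf")
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```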
Calibration: whether predicted probabilities match observed frequencies (e.g., scores near 0.7 are correct about 70% of the time).
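A simple reliability check compares the mean predicted probability with the observed positive rate inside probability bins; a sketch with equal-width bins (the function name is illustrative):

```python
def reliability_bins(probs, labels, n_bins=5):
    """Per-bin (mean predicted prob, observed positive rate) pairs."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    out = []
    for b in bins:
        if b:  # skip empty bins
            mean_p = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            out.append((round(mean_p, 3), round(frac_pos, 3)))
    return out
```

Well-calibrated scores produce pairs that lie close to the diagonal, i.e. mean predicted probability ≈ observed positive rate in every bin.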
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^n |y_i-\hat y_i|$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^n (y_i-\hat y_i)^2}$$
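Both regression metrics above can be sketched in a few lines; note that RMSE's squaring weights large errors more heavily than MAE does:

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large residuals more.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```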
Offline vs. online evaluation:
- Offline: evaluation on historical/holdout data
- Online: evaluation in production (e.g., A/B tests)