Supervised Learning Models
16 questions. Use Show Answer, then slide right (or use Next) to continue.
Supervised learning: learning a mapping from inputs to outputs using labeled data.
A decision tree recursively splits the feature space using input-based rules.
It creates regions that are as homogeneous as possible with respect to the target.
Purity refers to how homogeneous the target labels are within a leaf.
- Higher purity means fewer mixed labels.
- Pure leaf: all observations belong to the same class.
- Impure leaf: observations belong to multiple classes.
Gini impurity measures expected misclassification if labels are assigned according to class proportions in a node:
$$\text{Gini} = 1 - \sum_{k=1}^{K} p_k^2$$
- \(p_k\) is the proportion of class \(k\) in the node.
- Lower values indicate higher purity.
Entropy measures uncertainty or disorder of class labels in a node:
$$\text{Entropy} = - \sum_{k=1}^{K} p_k \log(p_k)$$
- Lower entropy indicates a more pure node.
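Both measures can be sketched in a few lines of plain Python. This is a minimal illustration (not any library's implementation); the example counts are made up, with one pure and one mixed node:

```python
from collections import Counter
from math import log

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy: -sum of p_k * log(p_k), using the natural log."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

pure = ["NoDef"] * 20                   # all one class
mixed = ["NoDef"] * 12 + ["Def"] * 8    # 60% / 40% split

print(gini(pure))    # 0.0: a pure node has no impurity
print(gini(mixed))   # 0.48 = 1 - (0.6**2 + 0.4**2)
```

A 50/50 node would give the maximum two-class Gini of 0.5, so lower values do indicate higher purity.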
- During training, evaluate candidate splits on input features.
- Select the split with the greatest impurity reduction (using Gini or entropy), weighting each child node by its share of observations.
- During prediction, follow the split rules to a leaf.
[ Credit Score ≥ 650 ? ]
/ \
Yes No
| |
[ Income ≥ $60k ? ] [ Debt Ratio ≤ 40% ? ]
/ \ / \
Leaf A Leaf B Leaf C Leaf D
PURE IMPURE IMPURE PURE
-------- --------- --------- --------
NoDef:20   NoDef:12   NoDef: 6   NoDef: 0
Def:  0    Def:  8    Def:  4    Def: 15
Decision trees can keep splitting until leaves become very pure.
- This can fit noise in training data.
- That increases variance.
- High variance.
- Sensitivity to small data changes.
- Poor generalization without constraints.
k-nearest neighbors (KNN): a non-parametric method that predicts outcomes based on the labels of the \(k\) closest observations in feature space.
Distance-based methods are sensitive to feature scale.
- Features on larger scales can dominate the distance computation.
- Curse of dimensionality.
- Sensitivity to irrelevant features.
- Slow prediction for large datasets.
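A minimal KNN classifier is just a distance sort plus a majority vote. This sketch uses made-up (income, debt-ratio) points; note that income, being on a much larger scale, dominates the Euclidean distance unless the features are rescaled first:

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(X_train, y_train, x, k=3):
    """Predict by majority vote among the k nearest training points."""
    nearest = sorted(range(len(X_train)), key=lambda i: dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Hypothetical data: (income in $1000s, debt ratio).
X = [(30, 0.9), (35, 0.8), (90, 0.2), (95, 0.1), (85, 0.3)]
y = ["Def", "Def", "NoDef", "NoDef", "NoDef"]

print(knn_predict(X, y, (40, 0.85), k=3))  # "Def"
```

Prediction requires scanning all training points, which is why naive KNN is slow on large datasets; standardizing each feature before computing distances addresses the scaling issue.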
Class imbalance: when one class is much rarer than another.
- Models can favor the majority class.
Down-sampling: reducing majority-class observations to balance class representation.
Up-sampling: increasing minority-class observations by duplicating or generating samples.
- Down-sampling loses information.
- Up-sampling increases overfitting risk.
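Both strategies can be sketched with the standard library's `random` module on a made-up imbalanced dataset (100 majority vs. 10 minority rows):

```python
import random

random.seed(0)  # for reproducibility

majority = [(f"row{i}", "NoDef") for i in range(100)]
minority = [(f"row{i}", "Def") for i in range(10)]

# Down-sampling: randomly drop majority rows until the classes match.
# The discarded rows are lost to training entirely.
down = random.sample(majority, len(minority)) + minority

# Up-sampling: resample minority rows with replacement until the classes
# match. The same observations appear repeatedly, raising overfitting risk.
up = majority + random.choices(minority, k=len(majority))

print(len(down), len(up))  # 20 200
```

In practice, generated samples (rather than plain duplicates) are another up-sampling option, but the information-loss and overfitting trade-offs above still apply.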