Decision Tree
- Components
- Each internal node specifies an evaluation (test) of a feature.
- Each branch corresponds to a feature value.
- Each leaf signifies a categorical decision (label).
- Procedure
- Select the best feature for the root.
- Construct a branch for each possible value of that feature.
- Split the data along each branch.
- Recurse on each branch.
- Terminate with a leaf node once a stopping criterion is met (e.g., all examples share one label, or no features remain to split on).
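The procedure above can be sketched as a short recursion. This is a minimal ID3-style sketch, not a production implementation; `choose_feature` is a placeholder for whatever selection criterion is used (e.g., information gain, defined below), and the toy data is invented for illustration.

```python
from collections import Counter

def build_tree(rows, labels, features, choose_feature):
    """Recursive decision-tree construction.
    rows: list of dicts mapping feature name -> value.
    choose_feature: assumed to return the best feature for the current data."""
    # Terminate: all labels agree, or no features remain -> majority-vote leaf.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = choose_feature(rows, labels, features)   # select best feature for root
    tree = {best: {}}
    # Construct one branch per observed value of the chosen feature.
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]  # split
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        rest = [f for f in features if f != best]
        tree[best][value] = build_tree(sub_rows, sub_labels, rest, choose_feature)
    return tree

# Toy run with a trivial placeholder chooser (always picks the first feature):
rows = [{"windy": "yes"}, {"windy": "no"}]
labels = ["stay", "play"]
print(build_tree(rows, labels, ["windy"], lambda r, l, fs: fs[0]))
# a nested dict equal to {'windy': {'yes': 'stay', 'no': 'play'}}
```

Leaves are plain labels and internal nodes are `{feature: {value: subtree}}` dicts, which keeps the sketch self-contained without a node class.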
- Selecting the best feature = minimizing the weighted average entropy of the partitions it produces.
- Information Gain (IG)
- Features that perfectly partition the data should give maximal information.
- Unrelated attributes should give no information.
- Defined as the expected reduction in entropy from partitioning on the particular feature: IG(S, A) = H(S) − Σ_v (|S_v| / |S|) · H(S_v).
- Compute the entropy of each partition, weight it by the fraction of examples in that partition, sum over partitions, and subtract the total from the parent entropy.
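The calculation above can be written directly. This is a small sketch with an invented toy dataset: `outlook` perfectly partitions the labels (maximal information), while `windy` is unrelated (no information), matching the two properties listed.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Expected entropy reduction from partitioning on `feature`.
    rows: list of dicts mapping feature name -> value."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Entropy of each partition, weighted by the fraction of examples in it.
    remainder = sum((len(p) / n) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

rows = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
]
labels = ["play", "play", "stay", "stay"]
print(information_gain(rows, labels, "outlook"))  # 1.0 (perfect partition)
print(information_gain(rows, labels, "windy"))    # 0.0 (unrelated attribute)
```

Plugging `information_gain` into the recursive construction (choose the feature maximizing it) yields the standard ID3 selection rule.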
- Prevent overfitting
- Pre-pruning: halt tree construction early (e.g., cap the depth or require a minimum number of examples per split).
- Post-pruning: remove branches from a “fully grown” tree (e.g., when replacing a subtree with a leaf does not hurt validation accuracy).
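As a minimal sketch of pre-pruning, the recursive construction described above can take a depth cap: once the cap is hit, the node becomes a majority-vote leaf instead of splitting further. The `max_depth` value here is a hypothetical hyperparameter choice, not something fixed by the notes.

```python
from collections import Counter

def build_pre_pruned(rows, labels, features, choose_feature,
                     depth=0, max_depth=2):
    """Recursive construction with pre-pruning: halt early at `max_depth`
    and emit a majority-vote leaf instead of splitting further."""
    if len(set(labels)) == 1 or not features or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = choose_feature(rows, labels, features)
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree[best][value] = build_pre_pruned(
            sub_rows, sub_labels, [f for f in features if f != best],
            choose_feature, depth + 1, max_depth)
    return tree

# With max_depth=0, construction halts immediately: the tree is a single leaf.
rows = [{"windy": "yes"}, {"windy": "yes"}, {"windy": "no"}]
labels = ["stay", "stay", "play"]
print(build_pre_pruned(rows, labels, ["windy"], lambda r, l, fs: fs[0],
                       max_depth=0))  # 'stay' (the majority label)
```

Post-pruning works in the opposite direction: grow the full tree first, then test each subtree against held-out data and collapse it to a leaf when that does not reduce accuracy.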