Decision Tree

  • Components
    • Each node specifies an evaluation of a feature.
    • Each branch corresponds to a feature value.
    • Each leaf signifies a categorical decision (label).
  • Procedure
    1. Select the best feature for the root node.
    2. Construct a branch for each possible value of that feature.
    3. Split the data along each branch.
    4. Recurse on each branch's subset.
    5. Terminate in a leaf node when the subset is pure (all one label), no features remain, or a stopping criterion is met.
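The procedure above can be sketched as a recursive builder. This is a minimal ID3-style illustration (function and variable names are my own, not from a particular library); leaves are majority-vote labels and branches are stored in a dict keyed by feature value:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels, features):
    """Feature whose partition minimizes the weighted entropy (max info gain)."""
    def weighted_entropy(f):
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[f], []).append(y)
        return sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return min(features, key=weighted_entropy)

def build_tree(rows, labels, features):
    # Leaf: all labels identical, or no features left -> majority vote.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    f = best_feature(rows, labels, features)
    branches = {}
    # One branch per observed value of the chosen feature; recurse on each split.
    for v in set(row[f] for row in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[f] == v]
        branches[v] = build_tree([r for r, _ in sub], [y for _, y in sub],
                                 [g for g in features if g != f])
    return (f, branches)
```

On a toy dataset where `outlook` perfectly separates the labels and `windy` is irrelevant, `build_tree` selects `outlook` as the root and immediately produces pure leaves.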
  • Selecting the best feature = minimizing the weighted average entropy of the partitions it induces (equivalently, maximizing information gain).
  • Information Gain (IG)
    • Features that perfectly partition the data (into pure subsets) should give maximal information
    • Unrelated attributes should give no information
    • Defined as expected reduction in entropy by partitioning according to the particular feature.
    • Compute the entropy of each partition, weight it by the fraction of examples in that partition, and sum; IG is the parent's entropy minus this weighted sum
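This weighted-sum calculation can be checked numerically. The example below uses the well-known play-tennis numbers (9 yes / 5 no at the parent, split three ways by a feature); the helper names are my own:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy in bits: H(S) = -sum p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, partitions):
    """IG = H(parent) - sum over values v of (|S_v| / |S|) * H(S_v)."""
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)

# Parent node: 9 "yes" and 5 "no" examples.
parent = ["yes"] * 9 + ["no"] * 5
# A three-valued feature splits them 5 / 4 / 5:
parts = [["yes"] * 2 + ["no"] * 3,   # value 1: mixed (2 yes, 3 no)
         ["yes"] * 4,                # value 2: pure  (maximal contribution to IG)
         ["yes"] * 3 + ["no"] * 2]   # value 3: mixed (3 yes, 2 no)

print(round(entropy(parent), 3))          # 0.94
print(round(info_gain(parent, parts), 3)) # 0.247
```

Note that the pure partition contributes zero entropy to the weighted sum, which is exactly why perfectly partitioning features score the highest gain.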
  • Prevent overfitting
    • Pre-pruning: halt tree construction early (e.g., limits on depth or minimum samples per node)
    • Post-pruning: remove branches from a “fully-grown” tree that do not improve held-out performance
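Pre-pruning amounts to an extra stopping condition in the recursion. A minimal sketch (toy code with hypothetical names; it splits on the first remaining feature rather than by IG to stay short — libraries such as scikit-learn expose the same ideas via parameters like `max_depth` and `ccp_alpha`):

```python
from collections import Counter

def majority(labels):
    """Majority-vote label, used when we stop splitting early."""
    return Counter(labels).most_common(1)[0][0]

def grow(rows, labels, features, depth=0, max_depth=1):
    """Grow a tree with pre-pruning: once depth reaches max_depth (or the
    data is pure / features are exhausted), emit a majority-vote leaf
    instead of splitting further."""
    if depth >= max_depth or len(set(labels)) == 1 or not features:
        return majority(labels)
    f = features[0]  # toy choice: first remaining feature (no IG scoring here)
    branches = {}
    for v in set(r[f] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[f] == v]
        branches[v] = grow([rows[i] for i in idx], [labels[i] for i in idx],
                           features[1:], depth + 1, max_depth)
    return (f, branches)
```

With `max_depth=0` the tree collapses to a single majority-vote leaf; raising `max_depth` trades training accuracy against overfitting.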