CISC483 Lecture 8: Discretizing Methods (Entropy & Chi-Squared)
Entropy-based discretization uses entropy to split a numeric attribute into intervals. It works top-down: start with the whole range of values as a single interval, find a split point, and then recursively decide whether to split each resulting interval further.

To find a split point, sort the values and compute the weighted entropy of the class labels for every possible split point. Choose the split where the reduction in entropy is greatest, which is the split with the smallest weighted entropy. The smallest weighted entropy never occurs between two adjacent values of the same class, and the split is placed at the midpoint between the two values it separates. Keep splitting until a stopping criterion is reached. One such criterion is the minimum description length (MDL) principle, which we won't be discussing or be tested on.

Because it uses class information, this method is likely to produce a discretization that is useful for learning, and it is one of the best supervised discretization techniques. Its main drawback is that it is time consuming.

The chi-squared method works bottom-up. Start with n intervals: at the start, every value is its own individual interval. Find the chi-squared value for every pair of adjacent intervals (at the start, every pair of neighbouring values).
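The weighted entropy used to compare split points can be written out as follows, using the standard Shannon entropy with base-2 logarithms. The symbols are introduced here for illustration: S is the set of examples in the current interval, T is a candidate split point, S_1 and S_2 are the examples on either side of T, and p_c is the fraction of examples in a set that belong to class c.

```latex
\begin{aligned}
\mathrm{Ent}(S) &= -\sum_{c=1}^{k} p_c \log_2 p_c \\
E(S, T) &= \frac{|S_1|}{|S|}\,\mathrm{Ent}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Ent}(S_2) \\
\mathrm{Gain}(S, T) &= \mathrm{Ent}(S) - E(S, T)
\end{aligned}
```

For example, with values 1, 2, 3, 4 labelled a, a, b, b, we have Ent(S) = 1. Splitting at T = 2.5 leaves two pure halves, so E(S, T) = 0 and the gain is 1, the largest possible here.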
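A minimal Python sketch of the top-down entropy procedure, assuming a single numeric attribute with one class label per value. The names `entropy`, `best_split`, and `discretize` are hypothetical. MDL is outside the scope of these notes, so the stopping rule here is an assumed stand-in, not the criterion from lecture: stop when the information gain falls below a hypothetical `min_gain`, or after `max_depth` levels of splitting.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (split_point, weighted_entropy) for the binary split with the
    smallest weighted entropy, or None if all values are identical."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no split point exists between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if best is None or weighted < best[1]:
            best = ((lo + hi) / 2, weighted)  # split at the midpoint
    return best


def discretize(values, labels, min_gain=0.01, max_depth=3):
    """Split recursively, top-down, and return the sorted cut points.

    ASSUMPTION: min_gain / max_depth form a hypothetical stopping rule,
    used here in place of the MDL criterion.
    """
    if max_depth == 0 or len(set(values)) < 2:
        return []
    split = best_split(values, labels)
    if split is None:
        return []
    cut, weighted = split
    if entropy(labels) - weighted < min_gain:
        return []  # information gain too small: stop splitting
    left = [(v, lab) for v, lab in zip(values, labels) if v <= cut]
    right = [(v, lab) for v, lab in zip(values, labels) if v > cut]
    cuts = [cut]
    for part in (left, right):
        vs, ls = zip(*part)
        cuts += discretize(list(vs), list(ls), min_gain, max_depth - 1)
    return sorted(cuts)
```

For example, `discretize(list(range(1, 10)), list("aaabbbaaa"))` returns the cut points 3.5 and 6.5, which sit exactly at the two class boundaries. A split between two values of the same class is never chosen.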
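The chi-squared value for a pair of adjacent intervals is the usual chi-squared statistic for a 2 × k contingency table. The rows are the two intervals and the columns are the k classes. In this notation, A_ij is the number of examples in interval i with class j, R_i is the number of examples in interval i, C_j is the number of examples of class j in the two intervals, N is the total number of examples in both intervals, and E_ij is the expected count.

```latex
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \frac{(A_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N}
```

A value near 0 means the two intervals have similar class distributions.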
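A minimal Python sketch of the chi-squared starting step, assuming each interval is summarized by the class counts of its examples. Two further assumptions apply: values that occur more than once are grouped into a single initial interval, and only neighbouring intervals are compared. The function names `chi_squared`, `initial_intervals`, and `adjacent_chi_squared` are hypothetical, and only the step described in the notes is shown.

```python
from collections import Counter


def chi_squared(counts_a, counts_b):
    """Chi-squared statistic for two adjacent intervals, each given as a
    Counter mapping class label -> number of examples in that interval."""
    rows = [counts_a, counts_b]
    row_totals = [sum(r.values()) for r in rows]
    n = sum(row_totals)
    chi2 = 0.0
    # Only classes present in at least one interval, so expected counts are > 0.
    for cls in set(counts_a) | set(counts_b):
        col_total = counts_a[cls] + counts_b[cls]
        for row, row_total in zip(rows, row_totals):
            expected = row_total * col_total / n
            chi2 += (row[cls] - expected) ** 2 / expected
    return chi2


def initial_intervals(values, labels):
    """Start with one interval per distinct value: a sorted list of
    (value, Counter of class labels) pairs."""
    by_value = {}
    for v, lab in zip(values, labels):
        by_value.setdefault(v, Counter())[lab] += 1
    return sorted(by_value.items())


def adjacent_chi_squared(intervals):
    """Chi-squared for every pair of neighbouring intervals, returned as
    (left_value, right_value, chi_squared) triples."""
    return [
        (intervals[i][0], intervals[i + 1][0],
         chi_squared(intervals[i][1], intervals[i + 1][1]))
        for i in range(len(intervals) - 1)
    ]
```

For example, two single-example intervals of the same class give a chi-squared of 0, and two single-example intervals of different classes give 2.0.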