Decision Tree Explained (Classification)

 Decision Tree Explained (Classification)
Decision Tree Explained (Classification)

Classification and Regression Trees (CART) is one of the most used algorithms in Machine Learning, as it appears in Gradient Boosting. This means that the most popular packages like XGBoost and LightGBM are using CART to build trees.

Decision Tree is a generic term, and they can be implemented in many ways – don't get the terms mixed, we mean the same thing when we say classification trees, as when we say decision trees. But a decision tree is not necessarily a classification tree, it could also be a regression tree.

We will be exploring Gini Impurity, which helps us measure the quality of a split. Gini impurity is a classification metric that measures how we should create internal nodes and leaf nodes. This metric is different from but similar to Gini Index and Information Gain/Impurity Improvement.

Table of Contents (Click To Scroll)

  1. Terminology Of Trees
  2. Gini Impurity: Leaf Nodes
  3. Gini Impurity: Internal Nodes
  4. Example: Building A Classfication Tree
  5. Regularization For CARTs
  6. Further Reading

Terminology of Trees

Let's get the definitions rolling right away. We have root nodes, internal node and leaf nodes – each illustrated in the picture below:

  • Root node: The very first node in a tree.
  • Internal node: Nodes which are connected to deeper nodes.
  • Leaf nodes: Nodes that are not connected to deeper nodes, but have an internal node connected to it.
Decision Tree Explained (Classification)

Throughout this article, we will color code the root and internal nodes as blue, while the leaf nodes are mint green.

Some general terminology for trees is the concept of parents and children. The nodes above a certain node are called parent nodes, while the nodes below are called child nodes.

Gini Impurity For Leaf Nodes

When calculating the gini impurity for a single leaf node, we can have multiple classes $C$ – e.g. the classes "YES" and "NO". The following are the steps:

  1. Calculate the probabilities of all classes, given the samples in a Leaf $k$
  2. Square the calculated probabilities, i.e. raised to the power of 2
  3. Sum all the squared probabilities into a single integer
  4. Subtract the single integer from $1$

The probabilities are easy to find. If the class is "YES", then we simply count the number of observations labelled "YES" in $Leaf_k$ and divide it by the total number of observations. We will see exactly how this is done later on.

Gini(Leaf_k) = 1 sum_{class=1}^{C , classes} -(probability(class|Leaf_k))^2
Gini(L_k) =
p(c|L_k) right

We have ex


Source - Continue Reading:


Related post