Sunday, February 26, 2017

Out of Bag and k-Fold Validation

Two of the main validation techniques for tree ensembles built from CART-style trees, such as random forests and gradient-boosted trees, are Out-Of-Bag (OOB) validation and k-fold cross validation.

OOB - used mainly for random forests.
k-Fold - used mainly for XGBoost (XGB) models.

Out-Of-Bag (OOB) Validation:

OOB validation is a technique where the training samples that were not used to build a given tree become the test set for that tree.
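One way to write the OOB estimate down, as a sketch for a classification forest that combines trees by majority vote. The notation is introduced here for illustration and is not taken from any particular library: B_t is the set of rows drawn for tree t, h_t is tree t, and S_i is the set of trees for which row i was left out.

$$S_i = \{\, t : i \notin B_t \,\}, \qquad \hat{y}_i^{\mathrm{OOB}} = \operatorname{majority}\{\, h_t(x_i) : t \in S_i \,\}$$

$$\mathrm{Err}_{\mathrm{OOB}} = \frac{1}{|I|} \sum_{i \in I} \mathbf{1}\!\left[\hat{y}_i^{\mathrm{OOB}} \neq y_i\right], \qquad I = \{\, i : S_i \neq \emptyset \,\}$$

For regression, the majority vote would be replaced by an average of the h_t(x_i) and the indicator by a squared error.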

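As we know, in a random forest each tree is trained on a bootstrap sample: n rows drawn with replacement from the n training rows. (The forest also randomizes which variables are considered at each split, but that does not change which rows are out of bag.) This means only part of the training set is used to build any one tree; since each row is missed by a single bootstrap sample with probability (1 - 1/n)^n, which approaches 1/e, roughly 37% of the rows are left out on average. These remaining points form the tree's out-of-bag set and are used for validation. Each point is then scored using only the trees that never saw it, so the forest gets an error estimate without a separate hold-out set.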
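The following is a minimal, self-contained sketch of the OOB bookkeeping in plain Python. To stay short it bags one-split decision stumps rather than growing full random-forest trees (no per-split variable sampling), so treat it as an illustration of how OOB votes are collected, not as a random forest implementation. The function names `majority`, `fit_stump` and `oob_error` are hypothetical. It returns the OOB error and the average fraction of rows left out of each bootstrap sample.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y):
    """Fit a one-split decision stump by minimizing training misclassifications.

    Simplified stand-in for a full tree; assumes at least one row.
    """
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            left_label = majority(left)  # never empty: threshold is an observed value
            right_label = majority(right) if right else left_label
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, threshold, left_label, right_label)
    _, j, threshold, left_label, right_label = best
    return lambda row: left_label if row[j] <= threshold else right_label


def oob_error(X, y, n_trees=50, seed=0):
    """Bag n_trees stumps; score each row only with the stumps that never saw it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]  # OOB votes collected per row
    oob_fractions = []
    for _ in range(n_trees):
        # Bootstrap sample: n row indices drawn with replacement
        in_bag = [rng.randrange(n) for _ in range(n)]
        model = fit_stump([X[i] for i in in_bag], [y[i] for i in in_bag])
        # Rows never drawn form this tree's out-of-bag set
        oob = set(range(n)) - set(in_bag)
        oob_fractions.append(len(oob) / n)
        for i in oob:
            votes[i][model(X[i])] += 1
    # Only rows that were out of bag for at least one tree can be scored
    scored = [i for i in range(n) if votes[i]]
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored), sum(oob_fractions) / n_trees
```

On a toy dataset such as `X = [[float(v)] for v in range(100)]` with `y = [0 if v < 50 else 1 for v in range(100)]`, the average OOB fraction should land close to (1 - 1/100)^100, about 0.366. In practice, scikit-learn's `RandomForestClassifier(oob_score=True)` reports a comparable estimate in its `oob_score_` attribute (as an accuracy rather than an error).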

k-Fold Cross Validation:

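Keeping a single fixed set of data points for validation might not be ideal for models like XGB: those points are never used for training, and the estimate depends on which points happen to be held out. Hence, k-fold cross validation is used. In a k-fold method, the entire dataset is divided into k roughly equal folds. Each fold in turn is used for validation while the other k - 1 folds are used for training, so the model is trained k times. The final performance metric is the average of the metric over the k folds.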

A common choice for the number of folds is k = 10 (k = 5 is also widely used).
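A minimal sketch of k-fold cross validation in plain Python follows. `fit` and `score` are hypothetical placeholders for any training and scoring functions (for example, ones wrapping an XGB model); here a trivial mean predictor scored by mean squared error stands in so the sketch runs without extra libraries. The helper names `k_fold_indices`, `cross_validate`, `fit_mean` and `mse` are illustrative only.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k folds of (almost) equal size.

    Assumes n >= k so that no fold is empty.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[j::k] for j in range(k)]


def cross_validate(X, y, fit, score, k=10, seed=0):
    """Train k times, each time holding out one fold; return (mean score, per-fold scores)."""
    scores = []
    for val_idx in k_fold_indices(len(X), k, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    # Final metric: plain average over the k folds
    return sum(scores) / k, scores


# Hypothetical stand-ins for a real model: predict the training mean, score by MSE.
def fit_mean(X_train, y_train):
    mean = sum(y_train) / len(y_train)
    return lambda row: mean


def mse(model, X_val, y_val):
    return sum((model(row) - target) ** 2 for row, target in zip(X_val, y_val)) / len(y_val)


if __name__ == "__main__":
    X = [[float(v)] for v in range(100)]
    y = [float(v) for v in range(100)]
    mean_mse, fold_mse = cross_validate(X, y, fit_mean, mse, k=10)
    print(f"10-fold MSE: {mean_mse:.2f}")
```

Shuffling before splitting keeps the folds from reflecting any ordering in the data. scikit-learn's `KFold(n_splits=10, shuffle=True)` together with `cross_val_score` follows the same pattern for real estimators.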

The following illustration shows a 10-fold cross validation.

