# Nested Cross-Validation Python Code

Want an unbiased estimate of the true error of an algorithm? This is where you are going to find it. I will explain the what, why, when and how of nested cross-validation. Specifically, the concept will be explained using K-Fold cross-validation.

Update 1: The images in this article were updated to the new theme on the site. Multiprocessing was added to the GitHub package, along with other fixes. If you have any issues, please report them on GitHub and I will try to take action!

## What Is Cross-Validation?

Firstly, a short explanation of cross-validation. K-Fold cross-validation is when you split your dataset into K partitions, with 5 or 10 partitions being the usual recommendation. The split is made by creating K random, non-overlapping sets of observation indexes and then using each set in turn as the testing dataset. The fraction of the full dataset that becomes the testing dataset is $1/K$, while the training dataset is $(K-1)/K$. For each partition, a model is fitted on the current training split and evaluated on the corresponding testing split.

Below is an example of K-Fold cross-validation with $K=5$. In each of the five rounds, a different partition serves as the testing dataset, while the model is trained on the remaining four.
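To make the split concrete, here is a minimal sketch of how K-Fold indexes could be generated by hand. The helper name `k_fold_indices`, the dataset size of 100 and the seed are assumptions made for illustration; in practice a library routine such as scikit-learn's `KFold` would normally be used instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for K-Fold cross-validation.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment of observations

    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                  # 1/K of the data
        train_idx = indices[:start] + indices[start + size:]    # (K-1)/K of the data
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=100, k=5))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={len(train_idx)}, test={len(test_idx)}")
```

With 100 observations and $K=5$, every fold tests on 20 observations and trains on the other 80, and each observation lands in a testing set exactly once.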

The idea is that you use cross-validation with a search algorithm, where you input a hyperparameter grid, that is, candidate values for the parameters that are chosen before training a model. In combination with Random Search or Grid Search, you then fit a model for each candidate combination of hyperparameters in each cross-validation fold (example with random forest model).

```python
hyperparameter_grid = {
    'max_depth': [3, None],
    'n_estimators': [10, 30, 50, 100, 200, 400, 600, 800, 1000],
    'max_features': [2, 4, 6]
}
```
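To get a feel for how much work this grid implies, the sketch below enumerates every combination the way an exhaustive Grid Search would and multiplies by the number of folds. The value `k = 5` is an assumption carried over from the earlier example; Random Search would instead sample only a subset of these candidates.

```python
from itertools import product

hyperparameter_grid = {
    'max_depth': [3, None],
    'n_estimators': [10, 30, 50, 100, 200, 400, 600, 800, 1000],
    'max_features': [2, 4, 6],
}

# Every combination of one value per hyperparameter, as Grid Search would try.
names = list(hyperparameter_grid)
candidates = [
    dict(zip(names, values))
    for values in product(*hyperparameter_grid.values())
]

k = 5  # number of cross-validation folds (assumed for illustration)
n_fits = len(candidates) * k

print(len(candidates))  # 54 candidate combinations (2 * 9 * 3)
print(n_fits)           # 270 model fits across all folds
print(candidates[0])    # {'max_depth': 3, 'n_estimators': 10, 'max_features': 2}
```

Even this small grid leads to 270 model fits, which is why the size of the grid and the choice of search strategy matter.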


## What Is Nested Cross-Validation?

First, why should you care? Nested cross-validation is an extension of the above, but it fixes one of the problems that we have with normal cross-validation. In normal cross-validation you only have a training set and a testing set, and you use those same splits to find the best hyperparameters. This has two consequences:

1. This may cause information leakage, because the testing data influences which hyperparameters are chosen.
2. You estimate the error of a model on the same data that you used to find the best hyperparameters. This may cause a significant, typically optimistic, bias.
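One common way to address both points can be sketched with scikit-learn: wrap the hyperparameter search in an inner cross-validation and score it with a separate outer cross-validation. This is an illustrative sketch and not the GitHub package mentioned above; the synthetic dataset, the reduced grid and the fold counts are all assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic data, assumed purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# A deliberately small grid so the example runs quickly.
param_grid = {"max_depth": [3, None], "n_estimators": [10, 30]}

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # tunes hyperparameters
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # estimates the error

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=inner_cv,
)

# Non-nested: the reported score comes from the same folds used for tuning.
search.fit(X, y)
print("non-nested best score:", search.best_score_)

# Nested: each outer fold re-runs the whole search on its training part only,
# then scores the tuned model on data the search never saw.
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print("nested mean score:", nested_scores.mean())
```

The nested score is the one to report as the error estimate, since no outer testing fold takes part in choosing the hyperparameters that are evaluated on it.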

You would not want to estimate the

[...]