Since random forest (RF) plays an important role in the simulation and is not a standard method, it is described in more detail in the following. A short overview of the construction of RF can be found at the end of the section. Random forest is an ensemble of classification or regression trees (CARTs). For a binary outcome, classification trees are used. The trees in RF are unpruned, i.e., each tree is grown to the largest extent possible, subject only to a minimum terminal node size, usually 1. Each tree is grown on a bootstrap sample drawn from the original training sample. Let \(N\) be the number of individuals in the whole training sample and \(N^{*}\) the size of a bootstrap sample. Usually, \(N^{*}=N\) for sampling with replacement and \(N^{*}<N\) for sampling without replacement. For the simulation, sampling with replacement was used. The number of distinct individuals in the bootstrap sample is then \(\acute{N}^{*} \approx 0.632\cdot N\), see [28]. The remaining \(N-\acute{N}^{*}\) individuals are out-of-bag (OOB) and can be used for internal validation or out-of-bag prediction, which is not explained further here. One of the tuning parameters of RF is the number of trees to be generated. Let
\(B\) be the number of trees, and consequently the number of bootstrap samples in RF, often also called ntree. Another source of diversity in RF is that not all predictor variables are used at once; instead, a randomly selected subset of predictors is considered at each node for the split in a tree. Let \(m\) denote the total number of predictors available in the training sample and mtry the number of predictors randomly chosen at each node. Consequently, mtry is another tuning parameter of RF. The default for classification problems is usually \(\left\lceil \sqrt{m}\right\rceil\). Classification trees use a splitting function called the Gini index to determine which predictor to split on and what the best cutoff is. The Gini index is defined as
\(G_{k}=2f(1-f)\), where \(f\) represents the fraction of events assigned to node \(k\). In contrast to a single classification tree, RF returns not only the classification decision but can also estimate the predicted probability of an event. For \(B\) trees in RF, the predicted probability for a new individual is:
$$\hat{P}(y=1|\mathbf{x})=\frac{1}{B}\sum_{b=1}^{B}\pi_{b}(\mathbf{x}), $$
where \(\pi_{b}(\mathbf{x})\) is the majority vote in the terminal node into which the new individual falls in the \(b\)th tree, i.e., the classification decision of a single tree for outcome status \(y\in\{0,1\}\), given the covariate vector \(\mathbf{x}\). For more information and features of RF, see [29, 30].
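The quantities defined in this section can be illustrated with a short sketch. The following toy helpers are assumptions for illustration only (the function names are not from the paper, and no actual tree growing is implemented): the Gini index \(G_{k}=2f(1-f)\), the default mtry of \(\lceil\sqrt{m}\rceil\), the expected fraction of distinct individuals in a bootstrap sample, \(1-(1-1/N)^{N}\approx 0.632\), and the forest probability as the average of the \(B\) single-tree votes \(\pi_{b}(\mathbf{x})\).

```python
import math

# Illustrative sketch only; names are assumptions, not the paper's code.

def gini(f: float) -> float:
    """Gini index G_k = 2 f (1 - f) for the event fraction f in node k."""
    return 2.0 * f * (1.0 - f)

def default_mtry(m: int) -> int:
    """Default number of candidate predictors per split: ceil(sqrt(m))."""
    return math.ceil(math.sqrt(m))

def forest_probability(votes: list[int]) -> float:
    """Predicted probability: mean of the B single-tree votes pi_b(x)."""
    return sum(votes) / len(votes)

# Expected distinct fraction in a bootstrap sample of size N drawn with
# replacement: 1 - (1 - 1/N)^N, which tends to 1 - 1/e (about 0.632).
N = 1000
print(round(1 - (1 - 1 / N) ** N, 3))    # close to 0.632
print(gini(0.5))                          # maximal impurity: 0.5
print(default_mtry(10))                   # 4
print(forest_probability([1, 0, 1, 1]))   # 0.75
```

With a pure node (f = 0 or f = 1) the Gini index is 0, so splits are chosen to move the child nodes toward purity; the ensemble probability is simply the vote share across the B trees.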