Continuous variables are reported as median (first and third quartiles); categorical variables are reported as frequencies (percentages). The small number of subjects enrolled in our study complicated the development of the statistical analysis plan. To address this, we carried out the statistical analysis with an approach based on random forest (RF), one of the most popular machine learning techniques (MLTs) [9]. RF belongs to the ensemble-of-trees methods, a family of algorithms that make predictions by aggregating several classification and regression trees (CART). CART divides the space of the explanatory variables into regions by recursive binary splitting and makes a prediction within each region [10]. RF grows one CART on each of several bootstrap replicates of the original data and then averages the predictions returned by the individual trees. At each split, a tree considers only a random subset of the available predictors, so the resulting CARTs can differ substantially from each other. The main idea is that many individually “weak” CARTs can achieve high predictive performance when pooled together. We adopted RF for the current analysis because recursive partitioning of the variable space naturally handles clinical databases with more variables than subjects [11], a setting that is generally intractable for standard statistical regression models.

To identify the exercise testing parameters associated with symptom severity, we used the Boruta algorithm, a variable selection approach based on RF [12]. The Boruta algorithm aims to identify the predictors that are relevant to the outcome of interest. It fits an RF on an augmented set of covariates, in which the additional covariates, called shadow variables, are copies of the original ones whose observations have been permuted, thereby removing any association with the outcome. For each explanatory variable, an importance measure, the Z-score, is computed: the average improvement in the predictive performance of the RF attributable to that variable, divided by its standard deviation. A predictor is considered important when its Z-score is higher than the maximum Z-score observed among the shadow variables; over repeated RF runs, a predictor is confirmed when this happens significantly more often than expected by chance and rejected when it happens significantly less often. The procedure is repeated until every predictor has been confirmed or rejected, or until the maximum number of RF runs is reached.

The Boruta algorithm was run separately for each of the three symptoms: dyspnea, muscle effort, and muscle pain. For computational reasons, the three outcomes, measured on the Borg scale, were rescaled to range between 0 and 1. The association with the outcomes was tested for the following pulmonary function parameters: VO₂ peak (mL/min and mL/kg/min), WR (watts), heart rate [HR peak (bpm, %pred)], O₂ pulse peak (mL/beat), HR/VO₂ slope, VO₂/work slope, minute ventilation [VE peak (L/min)], breathing reserve as % of maximum voluntary ventilation [BR %MVV], VE/VCO₂ slope, VE/VO₂ slope, respiratory exchange ratio (RER max), PaCO₂ at rest (mmHg), PaCO₂ peak (mmHg), P(a–ET)CO₂ peak (mmHg), A-aO₂ peak (mmHg), PaO₂ at rest (mmHg), PaO₂ peak (mmHg), pH at rest and peak, bicarbonate [HCO₃⁻] at rest and peak (mmol/L), base excess (BE) at rest and peak (mmol/L), [K⁺] at rest and peak (mmol/L), BMI, age, and gender. To understand how the parameters selected by the Boruta algorithm affect the symptom scores on the Borg scale, we grew a CART for each outcome using the predictors identified by the algorithm. The CARTs were tuned using repeated k-fold cross-validation with k = 5 folds and 10 repetitions [13].
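The descriptive summaries used throughout (median with first and third quartiles for continuous variables, counts with percentages for categorical ones) can be illustrated with the minimal Python sketch below. The analysis itself was run in R; the function names and example values are hypothetical, and the quartile method is assumed, not confirmed, to mirror R's default.

```python
import statistics
from collections import Counter


def median_quartiles(values):
    """Return (median, Q1, Q3) for a continuous variable."""
    # method="inclusive" interpolates like R's default quantile (type 7)
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, q1, q3


def frequencies(labels):
    """Return {category: (count, percentage)} for a categorical variable."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: (c, 100.0 * c / total) for cat, c in counts.items()}


# Hypothetical values, not study data
age = [54, 61, 47, 70, 58, 66, 52, 63, 59]
sex = ["F", "M", "M", "F", "M", "M", "F", "M", "M"]

med, q1, q3 = median_quartiles(age)
print(f"Age: {med:g} ({q1:g}; {q3:g})")  # Age: 59 (54; 63)
for cat, (n, pct) in frequencies(sex).items():
    print(f"{cat}: {n} ({pct:.1f}%)")
```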
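The tree-growing and averaging steps of RF described above can be sketched in plain Python as follows. This is a simplified illustration, not the rpart or randomForest implementation: the regression tree splits recursively on the squared-error criterion and predicts the mean of each region, while the forest draws one bootstrap replicate per tree, searches only a random subset of `mtry` predictors at each split, and averages the trees' predictions. The names `build_tree`, `random_forest`, `max_depth`, `min_leaf`, `mtry`, and `n_trees` are hypothetical, and the toy data are invented.

```python
import random
import statistics


def _sse(values):
    """Residual sum of squares around the mean (regression-tree impurity)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, max_depth=4, min_leaf=2, mtry=None, rng=None):
    """Grow a regression CART by recursive binary splitting.

    If mtry is set, each split searches only mtry randomly chosen predictors
    (the random-forest variant); otherwise all predictors are searched.
    """
    n_feat = len(X[0])

    def grow(idx, depth):
        ys = [y[i] for i in idx]
        leaf = {"value": sum(ys) / len(ys)}  # prediction = mean of the region
        if depth == max_depth or len(idx) < 2 * min_leaf:
            return leaf
        feats = range(n_feat) if mtry is None else rng.sample(range(n_feat), mtry)
        best = None
        for j in feats:
            for t in sorted({X[i][j] for i in idx}):
                left = [i for i in idx if X[i][j] <= t]
                right = [i for i in idx if X[i][j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, t, left, right)
        if best is None or best[0] >= _sse(ys):  # no split reduces the error
            return leaf
        _, j, t, left, right = best
        return {"feat": j, "thr": t,
                "left": grow(left, depth + 1), "right": grow(right, depth + 1)}

    return grow(list(range(len(y))), 0)


def predict_tree(node, x):
    """Follow the binary splits down to a leaf and return its mean."""
    while "feat" in node:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["value"]


def random_forest(X, y, n_trees=50, mtry=1, max_depth=4, seed=0):
    """Fit one CART per bootstrap replicate of the data."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(build_tree([X[i] for i in boot], [y[i] for i in boot],
                                 max_depth=max_depth, mtry=mtry, rng=rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of the individual trees."""
    return statistics.mean([predict_tree(tree, x) for tree in forest])


# Invented toy data: y depends on the first predictor only
X = [[i / 20, ((i * 7) % 20) / 20] for i in range(20)]
y = [1.0 if row[0] > 0.5 else 0.0 for row in X]
forest = random_forest(X, y)
print(round(predict_forest(forest, [0.9, 0.5]), 2),
      round(predict_forest(forest, [0.1, 0.5]), 2))
```

Drawing the predictor subset at every split, rather than once per tree, is what decorrelates the trees so that averaging them reduces variance.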
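The shadow-variable comparison at the core of the Boruta algorithm can be sketched as below. To keep the example short, the RF importance is replaced by a deliberately simple stand-in: the share of variance explained by the best single split on a variable, computed over bootstrap replicates. This is therefore a sketch of the Boruta decision logic under that assumption, not the Boruta R package. The Z-score is the mean importance divided by its standard deviation, a "hit" is recorded whenever a variable beats the best shadow variable, and the hits accumulated over repeated runs are judged with a two-sided binomial test. The package adds refinements not reproduced here (for example, a multiple-comparison adjustment). All names and the toy data are hypothetical.

```python
import math
import random
import statistics


def stump_r2(x, y):
    """Share of the variance of y explained by the best binary split on x.

    Stand-in importance for this sketch (not the RF importance used by Boruta).
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    tot = sum(v for _, v in pairs)
    tot_sq = sum(v * v for _, v in pairs)
    sse_all = tot_sq - tot * tot / n
    if sse_all <= 1e-12:
        return 0.0
    best = ls = lsq = 0.0
    for i in range(1, n):
        ls += pairs[i - 1][1]
        lsq += pairs[i - 1][1] ** 2
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates tied x values
        rs, rsq = tot - ls, tot_sq - lsq
        sse = (lsq - ls * ls / i) + (rsq - rs * rs / (n - i))
        best = max(best, (sse_all - sse) / sse_all)
    return best


def z_scores(columns, y, n_boot, rng):
    """Z-score per column: mean importance over bootstrap replicates / its SD."""
    n = len(y)
    imps = [[] for _ in columns]
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in rows]
        for j, col in enumerate(columns):
            imps[j].append(stump_r2([col[i] for i in rows], yb))
    z = []
    for v in imps:
        mean, sd = statistics.mean(v), statistics.stdev(v)
        z.append(mean / sd if sd > 0 else (math.inf if mean > 0 else 0.0))
    return z


def boruta_sketch(features, y, runs=20, n_boot=20, alpha=0.05, seed=1):
    """Simplified Boruta decision rule on a dict {name: column}."""
    rng = random.Random(seed)
    names = list(features)
    real = [features[k] for k in names]
    hits = dict.fromkeys(names, 0)
    for _ in range(runs):
        shadows = [rng.sample(col, len(col)) for col in real]  # permuted copies
        z = z_scores(real + shadows, y, n_boot, rng)
        best_shadow = max(z[len(real):])
        for name, z_real in zip(names, z):  # zip stops after the real columns
            if z_real > best_shadow:
                hits[name] += 1

    def p_at_least(k):  # P(X >= k) for X ~ Binomial(runs, 0.5)
        return sum(math.comb(runs, i) for i in range(k, runs + 1)) / 2 ** runs

    decision = {}
    for name, h in hits.items():
        if p_at_least(h) < alpha:
            decision[name] = "confirmed"
        elif p_at_least(runs - h) < alpha:  # equals P(X <= h) by symmetry
            decision[name] = "rejected"
        else:
            decision[name] = "tentative"
    return decision, hits


# Invented toy data: only x0 drives the outcome
n = 60
x0 = [i / n for i in range(n)]
x1 = [((i * 37) % n) / n for i in range(n)]
x2 = [((i * 11) % n) / n for i in range(n)]
y = [(1.0 if a > 0.5 else 0.0) + 0.05 * math.sin(i) for i, a in enumerate(x0)]
decision, hits = boruta_sketch({"x0": x0, "x1": x1, "x2": x2}, y)
print(decision, hits)
```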
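Rescaling the Borg scores to the interval from 0 to 1 amounts to a min-max transformation, y' = (y - lower) / (upper - lower). The text does not state whether the bounds were the theoretical limits of the Borg scale or the observed minimum and maximum, so the hypothetical helper below accepts either. The 0-to-10 bounds in the example are an assumption for illustration.

```python
def rescale_unit(scores, lower=None, upper=None):
    """Min-max rescale scores to [0, 1].

    lower/upper default to the observed minimum and maximum; pass the
    theoretical bounds of the Borg scale instead if those are preferred.
    """
    lo = min(scores) if lower is None else lower
    hi = max(scores) if upper is None else upper
    if hi == lo:
        raise ValueError("upper and lower bounds must differ")
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical Borg ratings on an assumed 0-10 scale
ratings = [2, 5, 7, 10]
print(rescale_unit(ratings, lower=0, upper=10))  # [0.2, 0.5, 0.7, 1.0]
print(rescale_unit(ratings))                      # [0.0, 0.375, 0.625, 1.0]
```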
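Repeated k-fold cross-validation with 5 folds and 10 repetitions, as used to tune the CARTs, can be sketched as follows. Each repetition reshuffles the observations into 5 folds, and each fold is held out once while the model is fitted on the other four. The text does not say which CART parameter was tuned (rpart trees are commonly tuned through the complexity parameter `cp`). As a hypothetical stand-in, the example selects the depth of a one-predictor regression tree by the root mean squared error pooled over all held-out observations. Names and data are invented.

```python
import random
import statistics


def repeated_kfold(n, k=5, repeats=10, seed=0):
    """Yield (train, test) index lists: k folds, reshuffled at every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(k):
            test = order[f::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            yield [i for i in order if i not in held_out], test


def fit_tree_1d(x, y, depth):
    """Hypothetical learner: depth-limited regression tree on one predictor."""
    if depth == 0 or len(set(x)) < 2:
        return sum(y) / len(y)  # leaf: mean of the region
    best = None
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [v for u, v in zip(x, y) if u <= t]
        right = [v for u, v in zip(x, y) if u > t]
        cost = (statistics.pvariance(left) * len(left)
                + statistics.pvariance(right) * len(right))
        if best is None or cost < best[0]:
            best = (cost, t)
    t = best[1]
    lx, ly = [u for u in x if u <= t], [v for u, v in zip(x, y) if u <= t]
    rx, ry = [u for u in x if u > t], [v for u, v in zip(x, y) if u > t]
    return (t, fit_tree_1d(lx, ly, depth - 1), fit_tree_1d(rx, ry, depth - 1))


def predict_1d(tree, u):
    """Descend the (threshold, left, right) tuples to a leaf mean."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if u <= t else right
    return tree


def cv_rmse(x, y, depth, k=5, repeats=10):
    """Root mean squared error pooled over all held-out folds and repeats."""
    sq_errors = []
    for train, test in repeated_kfold(len(y), k, repeats):
        tree = fit_tree_1d([x[i] for i in train], [y[i] for i in train], depth)
        sq_errors.extend((predict_1d(tree, x[i]) - y[i]) ** 2 for i in test)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


# Invented toy data: a step in y at x = 0.5
x = [i / 40 for i in range(40)]
y = [1.0 if u > 0.5 else 0.0 for u in x]
scores = {d: cv_rmse(x, y, d) for d in range(4)}
best_depth = min(scores, key=scores.get)  # ties go to the simplest tree
print(scores, best_depth)
```

Repeating the fold assignment with fresh shuffles reduces the dependence of the selected parameter on one particular random partition, which matters when, as here, the number of subjects is small.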
Statistical analysis was performed using R statistical software (version 3.5.2) [14]. The Boruta algorithm was implemented using the Boruta R package (version 6.0.0) [15], whereas the CARTs were implemented using the rpart R package (version 4.1-10) [16].