Introduction
While spirometry is the gold standard for assessing lung function in older children and adults, it cannot be performed when a forced expiratory maneuver cannot be achieved, as in young children [1]. The forced oscillation technique (FOT), which can be used in children, measures the mechanical behaviour of the respiratory system at different frequencies, including resistance (Rrs), reactance (Xrs), resonance frequency (Fres), frequency dependence (Fdep) and the area under the reactance curve (AX) [2–4]. As all these measures change physiologically with growth, reporting their Z score values facilitates their interpretation across a wide range of anthropometric measures throughout childhood [5–7]. Establishing reference data for FOT in children is, therefore, essential.
Many studies have already reported FOT reference data in healthy children as a function of several factors, such as age, weight, height, gender, ethnicity and equipment used [8–17]. In all these studies, the prediction equation of each FOT parameter was derived from a multivariable regression model that included one or more of the following explanatory variables: age, gender, weight and height. The significant coefficients of these models were used to build the respective prediction equations [5, 6].
All multivariable modeling strategies rest on strict assumptions and have several limitations. When the assumptions are violated, the accuracy of the results, their proper interpretation by the reader and their use in clinical care and in future research may all be affected [18–25]. The majority of clinicians and researchers rely on the editorial and peer review processes of the journals to ensure that the statistical methods in the articles have been appropriately used and correctly interpreted [26–28].
One assumption, often neglected in multivariable regression models, is the absence of collinearity. Collinearity occurs whenever there is a high correlation between two explanatory variables and is called multicollinearity when three or more variables are correlated. Both terms will be used interchangeably in the text. Collinearity produces very unstable estimated regression coefficients because the correlated variables carry redundant information: their effects overlap, making it impossible to accurately estimate the independent effect that each variable has on the studied outcome. The estimates for individual predictors are affected because the coefficients change erratically in response to small changes in the model or the data [29]. Collinearity also inflates the variances, and hence the standard errors, of these estimates. This affects the reliability of the estimated confidence intervals and leads to incorrect inferences about relationships between explanatory and response variables. Variables with no significant relationship with the outcome, when considered in isolation, might then become highly significant when considered in conjunction with collinear variables, resulting in an increased risk of false-positive results (Type I error). Similarly, several coefficients might show no statistical significance because of incorrectly estimated wide confidence intervals, resulting in an increased risk of false-negative results (Type II error). Furthermore, although the collinear variables may sometimes remain statistically significant, the sign of their regression coefficient might be the reverse of what would be expected (from positive to negative coefficients, or vice-versa) [30]. Thus, erroneous conclusions might be drawn about the relationships between explanatory and response variables. Although the reporting quality and reliability of models constructed by researchers can always be improved by editorial and peer review processes, as well as a statistical review system [26, 31], collinearity is often ignored: studies have shown that it is systematically checked in only 1–2% of published articles [27, 28].
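The inflation of standard errors described above can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not an analysis of any dataset from this study: the variable names (`height`, `age`, `outcome`), the coefficients and the noise levels are all assumptions chosen only so that the two predictors are almost perfectly correlated.

```python
import numpy as np

def ols_with_se(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, standard errors)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    dof = n - A.shape[1]                          # residual degrees of freedom
    sigma2 = resid @ resid / dof                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance matrix of the estimates
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 200

# Hypothetical predictors: "age" is almost a linear function of "height".
height = rng.normal(120.0, 15.0, n)
age = height / 6.0 - 12.0 + rng.normal(0.0, 0.3, n)

# Hypothetical outcome that truly depends on height only.
outcome = 2.0 - 0.01 * height + rng.normal(0.0, 0.1, n)

_, se_alone = ols_with_se(height[:, None], outcome)
_, se_both = ols_with_se(np.column_stack([height, age]), outcome)

print("SE of height coefficient, height only :", se_alone[1])
print("SE of height coefficient, height + age:", se_both[1])
```

Under these assumptions, adding the collinear predictor leaves the underlying relationship unchanged but makes the standard error of the height coefficient several times larger, which is the mechanism behind the wide confidence intervals and unstable signs discussed in the text.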
In this study, we aimed to evaluate the role of collinearity in previously published, frequently cited pediatric FOT reference articles. We reviewed several publications in children since 2005 to determine whether collinearity had been taken into consideration before modeling and reporting the predictive regression equations. Furthermore, to illustrate the impact of collinearity on the interpretation of the coefficients in such models, we also analyzed, hypothetically, the effect that collinearity might have had on the findings in our own report in which we constructed FOT reference data for children in our community (AlBlooshi, unpublished data).
Discussion
The differences observed between the results of the studies we compared can be attributed in part to the use of different equipment for FOT measurements, for example the differing characteristics of the perturbation signals of impulse oscillometry versus composite sinusoidal FOT signaling (Table 1). This, however, was not the main purpose of our study. Our aim was to demonstrate that, if collinearity is not considered in a study, the resulting prediction equation obtained using any equipment may be incorrect, as illustrated with our own data. This error may further inflate the differences observed between studies using different equipment.
None of the ten reviewed studies stated whether collinearity had been checked for, confirming prior reports [27, 28]. We were, however, unable to determine whether the statistical analyses were incorrect, whether the authors had deemed it unnecessary to check for collinearity, whether they had valid but undeclared reasons to make exemptions or whether they simply reported an incomplete methodology. Of concern, however, is that half of the reviewed reports still included in their equations explanatory variables which are physiologically correlated, such as age, weight and height. Clearly, it would be biologically implausible for these three variables not to be correlated. Depending on which published equation a clinician uses, a 28% difference in the predicted Rrs may occur, with potential impact on the quality of care.
In our own study, we found multicollinearity between the explanatory variables initially considered for the regression model (age, weight and height). Its effects included the wide variations in the coefficients of the explanatory variables, their changing signs (positive or negative), their wide confidence intervals, their changing significance level and the different results of the model goodness of fit obtained by the different hypothetical models. In addition, the centered VIF values of most coefficients were > 2.5, with an average of 3.3, considerably higher than one, constituting further evidence of collinearity in the models [32]. An 11.7% difference in the predicted Rrs in our population, depending on the model in use, with and without collinear variables, cannot be inconsequential.
As there is no automatic warning of the presence of multicollinearity in many statistical packages, it is necessary for researchers to check for it systematically before constructing multivariable models. Several methods exist to identify multicollinearity. A simple rule of thumb is to first test the explanatory variables for correlation. Another commonly used measure is the variance inflation factor (VIF), defined as $\mathrm{VIF}_i = 1/(1 - R_i^2)$, where $R_i^2$ is the $R^2$ obtained when covariate $x_i$ is regressed on the remaining covariates in a separate regression. It indicates the strength of the dependencies and quantifies the collinearity-induced inflation of the variance of each regression coefficient compared to when the independent variables are not correlated. Although there are no formal rules, a VIF value exceeding 10 is generally regarded as indicating multicollinearity, while values above 2.5 should also be a cause for concern [33, 34]. Unexpected changes in the direction of association between the outcome and an explanatory variable (from positive to negative coefficient, or vice-versa) are also a common result of collinearity [35].
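The two checks described above, pairwise correlation followed by the VIF, can be sketched in a few lines. This is only an illustrative implementation on synthetic data: the function name `vif`, the variable names and the simulated relationship between `height` and `weight` are assumptions, not the procedure or data used in this study.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        target = X[:, i]
        # Regress covariate i on all remaining covariates (plus an intercept).
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)                 # VIF_i = 1 / (1 - R_i^2)
    return out

rng = np.random.default_rng(1)
n = 300

# Hypothetical predictors: weight is strongly tied to height, the third is unrelated.
height = rng.normal(120.0, 15.0, n)
weight = 0.4 * height - 23.0 + rng.normal(0.0, 1.5, n)
unrelated = rng.normal(0.0, 1.0, n)
X = np.column_stack([height, weight, unrelated])

# Step 1: pairwise correlations between explanatory variables.
print(np.round(np.corrcoef(X, rowvar=False), 2))

# Step 2: variance inflation factors.
vifs = vif(X)
print(np.round(vifs, 2))
```

In this synthetic example the VIF values for `height` and `weight` exceed the threshold of 10, whereas the unrelated predictor stays close to 1, the value expected when a covariate is uncorrelated with the others.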
To avoid the detrimental effects of collinearity on a regression model, several methods have been suggested. Redundant collinear or duplicate explanatory variables are often removed [36, 37]. Collinear variables can also be combined into a single index. One method is centering, which involves the creation of a new covariate or an interaction term (usually by multiplication) between two collinear variables, after having centered their initial values (i.e. transforming them by subtracting the calculated mean from their individual value) [29, 35]. Principal component analysis (PCA), or factor analysis, is also useful to eliminate the effect of multicollinearity as well as the indirect effect of imperfect parameters [29, 38].
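Two of the remedies mentioned above, centering before building a product term and principal component analysis, can be sketched as follows. The data are synthetic and the names (`height`, `age`, `scores`) are assumptions made for illustration. The PCA is computed directly from a singular value decomposition of the standardized predictors rather than with a dedicated library.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical collinear predictors.
height = rng.normal(120.0, 15.0, n)
age = height / 6.0 - 12.0 + rng.normal(0.0, 1.0, n)

# Centering: subtract each variable's mean before forming the product term.
height_c = height - height.mean()
age_c = age - age.mean()
corr_raw = np.corrcoef(height, height * age)[0, 1]               # product of raw values
corr_centered = np.corrcoef(height_c, height_c * age_c)[0, 1]    # product of centered values
print("corr(height, raw product)      :", round(corr_raw, 3))
print("corr(height, centered product) :", round(corr_centered, 3))

# PCA: replace the correlated predictors by uncorrelated component scores.
X = np.column_stack([height, age])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each column
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T                                  # principal component scores
explained = s**2 / np.sum(s**2)                    # share of variance per component
pc_corr = np.corrcoef(scores, rowvar=False)[0, 1]  # correlation between the components
print("explained variance ratio:", np.round(explained, 3))
print("correlation between component scores:", pc_corr)
```

In this sketch the product term built from centered values is far less correlated with its parent variable than the product of raw values. The component scores are uncorrelated by construction, so they can be entered in a regression without collinearity, at the cost of coefficients that no longer refer to the original variables.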
Conclusion and recommendations
An improvement in the construction and reporting of multivariable regression models would undoubtedly help the reader in appropriately interpreting the data. Researchers should systematically adopt robust diagnostics for collinearity, report them and use appropriate procedures to eliminate any collinearity detected, prior to constructing the final model and establishing the predictive equation coefficients. A closer cooperation with statisticians and epidemiologists would be very constructive in that regard. Journals should also develop statistical reporting guidelines concerning multivariable regression models [26, 31, 39]. The regression models and their results in the submitted manuscripts should be verified at the editorial level and by the peer reviewers, and should also undergo a formal statistical review [26, 31]. The accurate, reliable and responsible transmission of scientific knowledge from the researcher to the reader requires no less.