Prediction Models for Bronchopulmonary Dysplasia in Preterm Infants: A Systematic Review

Peng, Hai-Bo; Zhan, Yuan-Li; Chen, You; Jin, Zhen-Chao; Liu, Fang; Wang, Bo; Yu, Zhang-Bin

doi:10.3389/fped.2022.856159

SYSTEMATIC REVIEW article

Front. Pediatr., 12 May 2022
Sec. Neonatology
Volume 10 - 2022 | https://doi.org/10.3389/fped.2022.856159

Hai-Bo Peng¹

Yuan-Li Zhan¹

You Chen¹

Zhen-Chao Jin¹

Fang Liu¹

Bo Wang²

Zhang-Bin Yu³,⁴*

¹Department of Neonatology, Affiliated Shenzhen Baoan Women’s and Children’s Hospital, Jinan University, Shenzhen, China
²Department of Pediatrics, The Affiliated Suqian First People’s Hospital of Nanjing Medical University, Suqian, China
³Department of Neonatology, Shenzhen People’s Hospital, The Second Clinical Medical College, Jinan University, Shenzhen, China
⁴The First Affiliated Hospital, Southern University of Science and Technology, Shenzhen, China

Objective: To provide an overview and critical appraisal of prediction models for bronchopulmonary dysplasia (BPD) in preterm infants.

Methods: We searched PubMed, Embase, and the Cochrane Library to identify relevant studies (up to November 2021). We included studies that reported prediction model development and/or validation of BPD in preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight. We extracted the data independently based on the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). We assessed risk of bias and applicability independently using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).

Results: Twenty-one prediction models from 13 studies reporting on model development and 21 models from 10 studies reporting on external validation were included. Oxygen dependency at 36 weeks’ postmenstrual age was the most frequently reported outcome in both development studies (71%) and validation studies (81%). The most frequently used predictors in the models were birth weight (67%), gestational age (62%), and sex (52%). Nearly all included studies had high risk of bias, most often due to inadequate analysis. Small sample sizes and insufficient numbers of events were common in both study types. Missing data were often not reported or were discarded. Most studies reported on the models’ discrimination, while calibration was seldom assessed (development, 19%; validation, 10%). Internal validation was lacking in 69% of development studies.

Conclusion: The included studies had many methodological shortcomings. Future work should focus on following the recommended approaches for developing and validating BPD prediction models.

Introduction

Preterm infant survival has increased in the last three decades (1–3), while bronchopulmonary dysplasia (BPD) remains the most prevalent serious complication of prematurity, affecting 10.8–37.1% of preterm neonates born at 24 0/7 to 31 6/7 weeks’ gestational age and birth weight <1,500 g (4). As survivors with BPD have a high risk of poor long-term pulmonary and neurodevelopmental outcomes in childhood and even adulthood (5–8), it is imperative to optimize BPD prevention and treatment strategies. Early identification of infants at risk of developing BPD would allow preventive interventions to be started while airway injury is still functional and reversible. To help health care providers estimate the probability of BPD and to inform decision-making, many models for predicting BPD have been established in recent years. Nevertheless, such models are of variable quality and yield inconsistent findings, leaving health care providers uncertain about which model to use or recommend.

In a 2013 systematic review, Onland et al. reported 26 prediction models for assessing the probability of BPD or death in all preterm infants born at <37 weeks’ gestation and found that most existing clinical prediction models were poor to moderate predictors of BPD (9). At the time of that review, neither guidance for systematic reviews of prediction modeling studies nor standardized tools for assessing the risk of bias (ROB) of prediction models were available. Since then, more BPD prediction modeling studies have been published, but the systematic review of such studies has not been updated in the past 9 years. The CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) has been available since 2014 (10), and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) for assessing the ROB and applicability of prediction model studies has been available since 2019 (11).

Accordingly, the present systematic review aimed to update the earlier systematic review of BPD prediction models and to critically evaluate, based on the CHARMS checklist and PROBAST, the methods and reporting of studies that developed or externally validated prediction models for BPD in preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight.

Methods

This systematic review of all studies on prediction models for BPD in preterm infants is reported according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (12).

Search Strategy

PubMed (MEDLINE), Embase, and the Cochrane Library were systematically searched from inception through to 12 November 2021, for studies reporting prediction models of BPD in preterm infants. We identified relevant studies and maximized search accuracy using the following terms: BPD, chronic lung disease, preterm infants, and prediction. The online Supplementary Material 1 shows the electronic search strategies. The search was not limited by language.

Eligibility Criteria

Articles were included if: (1) the target population was preterm infants born at ≤32 weeks and/or ≤1,500 g birth weight; (2) the study detailed prediction model development and/or external validation; (3) the main prediction outcome was BPD, defined as oxygen requirement at 28 days of life (BPD28) and/or oxygen requirement at 36 weeks’ postmenstrual age (PMA) (BPD36); (4) the model was constructed with at least two predictors; and (5) the purpose of the model was for predicting BPD in preterm infants from the first 2 weeks of life. Articles were ineligible when the studies used the data of infants born before 1990, as surfactant was not routinely used before this year (pre-surfactant era); if the outcome to be predicted was the composite outcome “BPD or death”; when the prognostic use of lung ultrasound scores (LUS) was investigated; when the study was conducted at high altitudes; when it was only a methodological study; when the article was not published in English; or when the article was a conference abstract, review, or letter.

Study Selection and Data Extraction

Two reviewers independently screened the titles, abstracts, and full texts in duplicate for eligibility. In case of discrepancies, a third reviewer was involved to establish consensus. The reviewers used a standardized data extraction form based on the CHARMS checklist (10). The following items were extracted from the studies on prediction model development: study design, study population, predicted outcome and time horizon, intended moment of model use, number of candidate predictors, sample size, number of events, missing data approach, variable selection method, modeling method, model presentation, predictors included in the final model, internal validation method, and assessment of model performance (i.e., discrimination and calibration). The following items were extracted from the prediction model external validation studies: study design, study population, predicted outcome and time horizon, intended moment of model use, sample size, number of events, missing data approach, and assessment of model performance (i.e., discrimination and calibration). The events per variable (EPV) was defined as the number of events divided by the number of candidate predictor variables. The outcome BPD28 was defined as oxygen dependency at 28 days of life; BPD36 was defined as oxygen dependency at 36 weeks’ PMA.
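The EPV definition above can be illustrated with a minimal Python sketch. The function names and the example counts are hypothetical and are not taken from any included study; the default threshold of 10 is the cut-off against which this review reports EPV in the Results.

```python
def events_per_variable(n_events, n_candidate_predictors):
    """EPV = number of events / number of candidate predictor variables."""
    if n_candidate_predictors <= 0:
        raise ValueError("at least one candidate predictor is required")
    if n_events < 0:
        raise ValueError("the number of events cannot be negative")
    return n_events / n_candidate_predictors


def has_low_epv(n_events, n_candidate_predictors, threshold=10):
    """Flag a model whose EPV falls below the chosen threshold (assumed: 10)."""
    return events_per_variable(n_events, n_candidate_predictors) < threshold


# Hypothetical development study: 120 infants with BPD, 15 candidate predictors.
epv = events_per_variable(120, 15)
print(f"EPV = {epv:.1f}, low EPV: {has_low_epv(120, 15)}")
```

Note that the denominator is the number of candidate predictors considered, not the number retained in the final model, so a model with few final predictors can still have a low EPV.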

Assessment of Bias

We assessed the ROB and applicability of each article with PROBAST. PROBAST consists of 20 signaling questions across four domains (participants, predictors, outcome, and analysis). The ROB and applicability of original studies were classified as high, low, or unclear for each domain via comprehensive evaluation. Only if each domain had low ROB would a study be classified as overall low ROB.
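The overall classification rule can be sketched in Python as follows. Only the "all domains low" rule is stated in the text above; the handling of "high" and "unclear" ratings follows the usual PROBAST guidance as we understand it and should be read as an assumption. The function name and the example ratings are hypothetical.

```python
from typing import Mapping

# The four PROBAST domains named in the text.
DOMAINS = ("participants", "predictors", "outcome", "analysis")
VALID_RATINGS = {"low", "high", "unclear"}


def overall_rob(ratings: Mapping[str, str]) -> str:
    """Combine per-domain ROB ratings into an overall rating."""
    for domain in DOMAINS:
        if domain not in ratings:
            raise ValueError(f"missing rating for domain: {domain}")
        if ratings[domain] not in VALID_RATINGS:
            raise ValueError(f"invalid rating for {domain}: {ratings[domain]}")
    values = [ratings[d] for d in DOMAINS]
    if all(v == "low" for v in values):
        return "low"      # stated in the text: every domain must be low
    if any(v == "high" for v in values):
        return "high"     # assumption: one high domain makes the study high
    return "unclear"      # assumption: otherwise at least one domain is unclear


# Hypothetical study with a weak analysis domain.
example = {
    "participants": "low",
    "predictors": "low",
    "outcome": "low",
    "analysis": "high",
}
print(overall_rob(example))
```

Under this rule a single weak domain is enough to prevent a study from being rated as overall low ROB.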

Model Performance

The results of the development and external validation studies were summarized by using descriptive statistics. If an article described the development or external validation of multiple (existing) models, separate data extraction for each model was conducted. Each model’s predictive performance, including model discrimination and calibration measures, was extracted. Discrimination is often quantified by the C statistic, the most commonly used measure of discriminative performance for binary outcomes. Generally, a C statistic < 0.6 is considered poor, a C statistic between 0.6 and 0.75 is considered possibly helpful, and a C statistic > 0.75 is considered clearly useful (13). Calibration is often quantified by the calibration intercept and calibration slope.
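As an illustration of the discrimination measure described above, the following Python sketch computes the C statistic for a binary outcome as the proportion of event/non-event pairs in which the infant with the event received the higher predicted probability (ties count as one half), and then applies the interpretation bands cited in the text. The function names and the toy data are hypothetical; the included studies may have computed the statistic with other software.

```python
def c_statistic(y_true, y_prob):
    """Pairwise concordance between predicted probabilities and a 0/1 outcome."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("both events and non-events are required")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5  # a tied pair counts as half concordant
    return concordant / (len(events) * len(non_events))


def interpret_c(c):
    """Bands from the text: <0.6 poor, 0.6-0.75 possibly helpful, >0.75 clearly useful."""
    if c < 0.6:
        return "poor"
    if c <= 0.75:
        return "possibly helpful"
    return "clearly useful"


# Toy data: 1 = BPD, 0 = no BPD, with hypothetical predicted probabilities.
outcome = [1, 1, 0, 0, 0]
predicted = [0.8, 0.4, 0.5, 0.2, 0.1]
c = c_statistic(outcome, predicted)
print(f"C statistic = {c:.3f} ({interpret_c(c)})")
```

In the toy data, five of the six event/non-event pairs are concordant, so the C statistic is 5/6. The measure says nothing about calibration, which must be assessed separately.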

Results

After excluding duplicates, the initial search returned 5,749 articles. After title and abstract screening, 106 articles were provisionally selected for full-text screening. Subsequently, 88 articles were excluded, among which 11 articles used the composite outcome “BPD or death.” In total, 18 studies (14–31) were included in this systematic review (Figure 1). Eight studies (14, 16, 19, 21, 22, 25–27) described model development without external validation, five studies (15, 17, 24, 29, 30) described model development with external validation in independent data, and five studies (18, 20, 23, 28, 31) described external validation with or without model updating.
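The study counts in the paragraph above can be cross-checked with a short Python sketch. All numbers are taken from that paragraph; the constant and function names are hypothetical.

```python
# Counts reported in the Results paragraph above.
FULL_TEXT_SCREENED = 106
FULL_TEXT_EXCLUDED = 88
DEVELOPMENT_ONLY = 8                       # development without external validation
DEVELOPMENT_WITH_EXTERNAL_VALIDATION = 5   # development plus external validation
EXTERNAL_VALIDATION_ONLY = 5               # external validation, with or without updating


def tally_studies():
    """Check that the screening flow and the study types give the same total."""
    included = FULL_TEXT_SCREENED - FULL_TEXT_EXCLUDED
    by_type = (DEVELOPMENT_ONLY
               + DEVELOPMENT_WITH_EXTERNAL_VALIDATION
               + EXTERNAL_VALIDATION_ONLY)
    if included != by_type:
        raise ValueError("screening flow and study types disagree")
    return {
        "included": included,
        "development": DEVELOPMENT_ONLY + DEVELOPMENT_WITH_EXTERNAL_VALIDATION,
        "external_validation": (DEVELOPMENT_WITH_EXTERNAL_VALIDATION
                                + EXTERNAL_VALIDATION_ONLY),
    }


print(tally_studies())
# → {'included': 18, 'development': 13, 'external_validation': 10}
```

The five studies that report both development and external validation are counted in both groups, which is why the 13 development studies and 10 validation studies sum to more than 18.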

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.

Characteristics of Studies Describing Bronchopulmonary Dysplasia Prediction Model Development

Thirteen studies described BPD prediction model development, in which 21 models were developed. Table 1 shows the key characteristics of study design, study population, outcome, and intended moment of model use in the included model development studies. Table 2 shows the study and performance characteristics of the developed models.

Table 1. Design characteristics of the 13 studies describing the development of BPD prediction models.

Table 2. Study and performance characteristics of the developed prediction models.

Study Design

Eleven included studies (85%) originated from registry or prospective cohorts; two studies (15%) were derived from retrospective cohorts. The data used for developing the models were collected between 1997 and 2019. Of all 13 model development studies, four (31%) used only gestational age as the inclusion criterion, three studies (23%) used only birth weight as the inclusion criterion, and six studies (46%) used both gestational age and birth weight as inclusion criteria. All models were developed using statistical methods. Twelve studies (92%) used logistic regression as the prediction modeling approach; one study (8%) used machine learning.

Outcome to Be Predicted

The outcome to be predicted in all included studies was BPD, yet the definitions of BPD varied across the models. Six models (29%) used BPD28 as the primary outcome; the median incidence was 29% (range, 22–50%). Fifteen models (71%) used BPD36 as the primary outcome; the median incidence was 22% (range, 11–56%). Eighteen models (86%) were developed to predict the risk of developing BPD within 7 days of life, and three models (14%) were developed to be used between 7 and 14 days of life.

Predictors

Ten of the 13 studies reported the number of candidate predictors considered for inclusion in the BPD prediction models, which ranged from 12 to 31 (median, 15). Two to 11 predictors were included in the final model (median, 5). Five studies (38%) used univariable analysis to select predictors for the multivariable analyses.

Figure 2 shows the predictors included in the final prediction models. Nineteen models (90%) used perinatal variables, seven models (33%) used antenatal variables, and 17 models (81%) used postnatal variables. The most frequently included predictor in the 21 prediction models was birth weight (n = 14, 67%), followed by gestational age (n = 13, 62%), sex (n = 11, 52%), 5-min Apgar score (n = 6, 29%), respiratory distress syndrome (n = 6, 29%), mechanical ventilation (n = 5, 24%), antenatal steroids (n = 4, 19%), maternal hypertensive disorders (n = 4, 19%), surfactant (n = 4, 19%), and patent ductus arteriosus (n = 4, 19%).

Figure 2. Predictors included in the final development models.

Sample Size

The models were developed with 37–18,858 participants (median, 1,225), and there were 18–4,986 events (median, 159). The EPV could be calculated in 16 models (76%) with a median of 59 and a range of 1–416. The EPV was <10 in 31% of the models in which it was calculated.

Missing Data

Seven studies (54%) did not mention missing data. Six studies (46%) described how missing data were handled; all of them used complete case analysis.
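Complete case analysis, the only approach reported, keeps only those participants with no missing value in any variable used by the model. A minimal Python sketch follows; the records, field names, and function name are hypothetical and do not come from any included study.

```python
def complete_cases(records, required_fields):
    """Keep only records in which every required field has a value."""
    return [
        record for record in records
        if all(record.get(field) is not None for field in required_fields)
    ]


# Hypothetical infants; None marks a missing value.
records = [
    {"gestational_age_weeks": 27, "birth_weight_g": 950, "sex": "F"},
    {"gestational_age_weeks": 29, "birth_weight_g": None, "sex": "M"},
    {"gestational_age_weeks": 26, "birth_weight_g": 780, "sex": None},
    {"gestational_age_weeks": 30, "birth_weight_g": 1320, "sex": "M"},
]
predictors = ["gestational_age_weeks", "birth_weight_g", "sex"]

kept = complete_cases(records, predictors)
print(f"kept {len(kept)} of {len(records)} records")
# → kept 2 of 4 records
```

Even though only two values are missing in this toy data set, half of the infants are dropped, which illustrates how complete case analysis can shrink an already small sample.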

Model Presentation

Presentation was available for 12 models (57%). Five models were presented as regression formulae, two models were presented as scoring systems, four models were presented as web calculators, and one model was presented as both a regression formula and web calculator.

Apparent Predictive Performance

Twelve studies (95%) assessed discrimination with the C statistic, with values of 0.76–0.97. Calibration was assessed for four models (19%); two models used the Hosmer–Lemeshow goodness-of-fit test, and one model used calibration plots.

Internal Validation

Nine studies (69%) did not report internal validation of the developed models. Nine models developed in four studies were internally validated: five models (56%) with split sampling, one model (11%) with cross-validation, and three models (33%) with bootstrapping.

Risk of Bias and Applicability Assessment of the Included Model Development Studies

Figure 3 shows a summary of the ROB and applicability for all developed models. For the domain outcome, the ROB of all models was considered low, as a broad definition of BPD was accepted. For the domain participants, 29% of the models had high ROB. For the domain predictors, 33 and 67% of the models had high and low ROB, respectively. The domain analysis was assessed as having high ROB in all prediction models. No study handled missing data appropriately, as information on missing data was rarely reported or participants with missing data were omitted. Assessment of calibration was insufficient, as only one study reported calibration plots, while the other studies did not report calibration or only used the Hosmer–Lemeshow test. In summary, the overall ROB was high across all models.

Figure 3. Risk of bias and applicability assessment of developed models using Prediction model Risk Of Bias ASsessment Tool (PROBAST).

When the 21 models were assessed according to applicability concerns, 24% of the models were assessed as high concern due to the inclusion of participants different from those in our research question (n = 4) or inconsistency between predictors and the review question (n = 5).

Characteristics of Studies Describing External Validation of the Bronchopulmonary Dysplasia Prediction Models

We included 10 studies that externally validated 21 BPD prediction models (Table 3). Five of these studies also described prediction model development. Table 4 shows the study and performance characteristics of the validated models.

Table 3. Design characteristics of the 10 studies describing external validation of BPD prediction models.

Table 4. Study and performance characteristics of externally validated models.

Models Validated

The most frequently validated models were CRIB-II (Clinical Risk Index for Babies-II) and SNAP-II (Score for Neonatal Acute Physiology-II); both were externally validated twice. The other models were externally validated once.