Background
Very premature and very low birth weight (VLBW) infants are at high risk of mortality and morbidities. Effective outcome prediction and benchmarking, for parental counseling, quality improvement and informing the wider community, have their foundation in the outcome statistics of infant cohorts [1]. There are two established methods of cohorting high-risk infants, by birth weight (for example, VLBW, <1500 g) or by gestational age (for example, very low gestational age [VLGA], <32 weeks), with the relative advantages of each yet to be determined. There has been increasing acceptance of gestational age (GA) based cohorting in recent literature [2–5], following studies such as those by Arnold et al. in 1991 [6] and Blair et al. in 1996 [6,7], which raised concerns that VLBW cohorts may be inherently biased.
Birth weight (BW) is dependent on two separate influences: GA at birth and fetal growth rate [8]. It follows that a VLBW cohort may contain infants at any point along a spectrum from very preterm and sized appropriately for their GA (AGA) to small for gestational age (SGA). There is an inherent selection bias toward SGA infants in VLBW cohorts, which becomes more pronounced at higher gestations, as the birth weights of AGA infants become greater than 1500 g [6,7,9]. This disproportionate SGA percentage is exemplified in published studies that have used VLBW cohorts, wherein 19% to 40% of infants were SGA [10–14]. A skewing of risk toward poorer outcome would be expected even in multivariate analyses because high-risk SGA infants lack an equivalent AGA control for adjustment within the cohort. In comparison, GA is independent of BW and fetal growth rate [7], and hence fetal growth and BW for GA show a normal distribution in VLGA cohorts [6]. The SGA proportion in VLGA cohorts will therefore, by definition, remain close to 10%; in published VLGA cohorts it ranged from 9.2% to 12% [15,16].
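The selection argument above can be illustrated with a small calculation. The sketch below is not part of the study analysis; it assumes that BW at a single GA is normally distributed and that SGA means a BW below the 10th percentile for GA. The function name and the example means and standard deviations are hypothetical and are not taken from any growth chart.

```python
from statistics import NormalDist

SGA_PERCENTILE = 0.10    # assumption: SGA = BW below the 10th percentile for GA
VLBW_CUTOFF_G = 1500.0   # VLBW threshold used in the text


def sga_share_below_cutoff(mean_bw_g, sd_bw_g, cutoff_g=VLBW_CUTOFF_G):
    """Share of infants below `cutoff_g` who are SGA, at one gestational age.

    Assumes a normal BW distribution at that GA (illustrative only).
    """
    bw = NormalDist(mean_bw_g, sd_bw_g)
    sga_threshold_g = bw.inv_cdf(SGA_PERCENTILE)
    p_below_cutoff = bw.cdf(cutoff_g)
    # An infant counts here only if it is both SGA and below the cutoff.
    p_sga_and_below = bw.cdf(min(sga_threshold_g, cutoff_g))
    return p_sga_and_below / p_below_cutoff


# Hypothetical distributions: one well below 1500 g, one centred above it.
for label, mean_bw, sd_bw in [("earlier GA", 1000.0, 150.0),
                              ("later GA", 1700.0, 250.0)]:
    share = sga_share_below_cutoff(mean_bw, sd_bw)
    print(f"{label}: SGA share among infants <1500 g = {share:.2f}")
```

Under these assumed distributions, the SGA share among infants below 1500 g stays near 10% when almost every infant at that GA weighs less than 1500 g, and rises to nearly half once the mean BW exceeds the cutoff, which is the pattern the paragraph describes.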
Meaningful international examination of neonatal outcomes is currently limited by the variations in reporting between nations [17], as direct comparison of neonatal outcomes through benchmarking requires prior standardization of the infant cohorting method used for data collection and reporting. The World Health Organization changed its standard cohorting practice to GA-based in 1961 [18], but some studies and analyses persist in the use of BW criteria [1,19–21].
The overall aim of this study was to evaluate and compare the predictive power of prediction models developed using VLGA and VLBW-based cohorts. It was hypothesized that predictive power of the VLGA-based models would be significantly better than that of the VLBW-based models across all networks because it would reduce the selection bias introduced by the disproportionately high number of SGA infants.
Methods
De-identified clinical data were obtained from the Australian and New Zealand Neonatal Network (ANZNN), Canadian Neonatal Network (CNN) and Swedish Neonatal Quality Register (SNQ) for all infants born either at <32 weeks gestational age or with birth weights <1500 g, who were admitted to participating NICUs between January 2008 and December 2011. Networks were selected because of their intrinsic similarity, with comparable demographics and healthcare systems. All three networks register admitted infants for data collection if they are either <32 weeks or <1500 g. Infants were excluded if they were moribund (died within the first day of admission without being offered mechanical ventilation or intensive care) or had major congenital anomalies.
The parameters for data collection in each of the network databases were compared. Definitions of outcomes and variables to be analyzed were standardized by consensus a priori. National preterm BW percentiles were examined for each network and found to be very similar in Australia [22] and Canada [23]; however, in-utero growth charts were used in the SNQ [24], and therefore the Swedish percentiles were not comparable. For this reason, Canadian BW percentile charts were applied to all infants to define SGA and BW z score.
For the study period, ANZNN data comprised all 29 tertiary hospitals in Australia/New Zealand; CNN comprised 28 of 30 tertiary hospitals in Canada; and SNQ all 25 hospitals with neonatal units in 6 of the 7 health care regions of Sweden. Study data were available through the iNeo (International Network for Evaluating Outcomes in Neonates) project housed at Mother-Infant Care Research Center, Mount Sinai Hospital, University of Toronto, Canada.
The primary outcome studied was in-hospital mortality. The secondary outcome was composite adverse outcome (CAO), defined as in-hospital mortality or a pre-discharge diagnosis of any of the following major neonatal morbidities: chronic lung disease (CLD), serious neurological injuries (SNI) including intraventricular hemorrhage grade III or IV [25] or periventricular leukomalacia, severe retinopathy of prematurity (ROP) of stage 3 or more [26], and radiologically or pathologically proven necrotizing enterocolitis (NEC) [27]. Consensus outcome definitions are provided in Additional file 1: Table S1. Nosocomial infection (NI) was not included in CAO but was included in descriptive analyses, as the rate of NI may be used as a marker of patient safety and healthcare effectiveness and outcomes, and hence has relevance for comparison between international cohorts [28].
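The CAO definition above amounts to a logical OR over mortality and the listed morbidities. The following is a minimal sketch of that rule; the record layout and field names are hypothetical and do not correspond to the variables of any network database, and the consensus definitions in Additional file 1: Table S1 remain authoritative.

```python
from dataclasses import dataclass


@dataclass
class InfantOutcomes:
    """Hypothetical per-infant outcome record (field names are assumptions)."""
    died_in_hospital: bool
    cld: bool                   # chronic lung disease
    ivh_grade: int              # 0 if no intraventricular hemorrhage
    pvl: bool                   # periventricular leukomalacia
    rop_stage: int              # 0 if no retinopathy of prematurity
    nec_proven: bool            # radiologically or pathologically proven NEC
    nosocomial_infection: bool  # reported descriptively, not part of CAO


def serious_neurological_injury(o: InfantOutcomes) -> bool:
    # SNI as described in the text: IVH grade III or IV, or PVL.
    return o.ivh_grade >= 3 or o.pvl


def composite_adverse_outcome(o: InfantOutcomes) -> bool:
    # CAO: death or any one of the major morbidities; NI is deliberately excluded.
    return (
        o.died_in_hospital
        or o.cld
        or serious_neurological_injury(o)
        or o.rop_stage >= 3
        or o.nec_proven
    )


example = InfantOutcomes(False, False, 2, False, 3, False, True)
print(composite_adverse_outcome(example))  # stage 3 ROP alone triggers CAO
```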
Data from all networks were amalgamated and formed into two overlapping cohorts of infants less than 32 weeks (VLGA) and/or less than 1500 g (VLBW). Originating network was added as a covariate for subsequent analysis. Of the two overlapping VLBW and VLGA cohorts, two-thirds (balanced for network) were randomly selected, using a split sample method [29], to form the derivation samples for development of two prediction models. The remaining one-third of infants from each cohort formed the internal validation samples, for assessment of predictive power on independent samples.
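The cohort assignment and the network-balanced split can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run in SAS or R: the record fields (`network`, `ga_weeks`, `bw_g`), the function names and the random seed are hypothetical, and the split is applied here to one cohort at a time, which the text does not specify.

```python
import random
from collections import defaultdict

GA_CUTOFF_WEEKS = 32
BW_CUTOFF_G = 1500


def cohort_labels(ga_weeks, bw_g):
    """Return the (possibly overlapping) cohort labels for one infant."""
    labels = set()
    if ga_weeks < GA_CUTOFF_WEEKS:
        labels.add("VLGA")
    if bw_g < BW_CUTOFF_G:
        labels.add("VLBW")
    return labels


def split_sample(infants, derivation_fraction=2 / 3, seed=2011):
    """Randomly split records into derivation and validation samples,
    keeping the same fraction within each originating network."""
    rng = random.Random(seed)
    by_network = defaultdict(list)
    for infant in infants:
        by_network[infant["network"]].append(infant)

    derivation, validation = [], []
    for network in sorted(by_network):
        group = by_network[network][:]
        rng.shuffle(group)
        n_derivation = round(len(group) * derivation_fraction)
        derivation.extend(group[:n_derivation])
        validation.extend(group[n_derivation:])
    return derivation, validation


# Toy records with made-up values, only to show the call pattern.
toy = [
    {"id": i, "network": net, "ga_weeks": 26 + i % 8, "bw_g": 700 + 100 * (i % 12)}
    for net in ("ANZNN", "CNN", "SNQ")
    for i in range(30)
]
vlga = [x for x in toy if "VLGA" in cohort_labels(x["ga_weeks"], x["bw_g"])]
derivation, validation = split_sample(vlga)
print(len(vlga), len(derivation), len(validation))
```

Splitting within each network, rather than over the pooled data, is what keeps the derivation and validation samples balanced for network.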
Prediction models were developed for mortality and CAO by multivariable logistic regression with backward elimination using an exclusion criterion of 0.05, according to methodology validated in previous population studies [30,31]. The interaction of BW z-score and GA was also included as a covariate in multivariable analysis to adjust for the varying confounding effects of growth status and maturity.
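As a sketch of the model form implied by this description, the full candidate model for infant i can be written as below. The notation is ours rather than the authors': p_i is the probability of the outcome (mortality or CAO), z_i is the BW z-score, and x_{ik} stands for the remaining candidate covariates, including originating network. Backward elimination then removes terms that do not meet the 0.05 criterion, so a fitted model need not retain every term shown.

```latex
\operatorname{logit}(p_i)
  = \ln\frac{p_i}{1 - p_i}
  = \beta_0
  + \beta_1\,\mathrm{GA}_i
  + \beta_2\, z_i
  + \beta_3\,(\mathrm{GA}_i \times z_i)
  + \sum_{k} \gamma_k\, x_{ik}
```

With the interaction term present, the effect of a one-unit change in BW z-score on the log-odds is \beta_2 + \beta_3\,\mathrm{GA}_i, so the contribution of growth status is allowed to vary with maturity.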
Analysis for prediction power was conducted for each model on four validation samples, which consisted of the VLBW and VLGA validation samples, and two mutually exclusive “extreme” subcomponents of infants <1500 g but ≥32 weeks, and infants <32 weeks but ≥1500 g. Prediction power was assessed using the area under the Receiver Operating Characteristic (ROC) curve (AUC) [32–34]. An AUC of >0.80 is generally accepted as excellent prediction [35]. The AUC of each prediction was compared. Goodness-of-fit was determined by use of the Hosmer-Lemeshow test [13] to test for systematic over or underestimation of outcomes by the model [36]. Data management and analyses were performed using SAS 9.3 [37] and R 2.10.15 [38]. A two-sided significance level of 0.05 was used without adjustment for multiple comparisons.
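The two performance measures named above can be sketched in a few lines. This is a plain-Python illustration, not the SAS or R code used in the study: the AUC is computed as the proportion of (event, non-event) pairs in which the event received the higher predicted risk, with ties counted as one half, and the Hosmer-Lemeshow statistic is computed over equal-sized groups of ranked predictions. The function names and the choice of ten groups are assumptions.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    Quadratic in sample size, so suitable only for small illustrations.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic and its degrees of freedom.

    Predictions are ranked and cut into `groups` near-equal bins; the
    statistic is referred to a chi-square distribution with groups - 2 df.
    """
    pairs = sorted(zip(y_prob, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        variance = expected * (1.0 - expected / size)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic, groups - 2


# Tiny made-up example: four infants, two events.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Discrimination (AUC) and calibration (Hosmer-Lemeshow) answer different questions, which is why both are reported: a model can rank infants well while still over- or underestimating absolute risk.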
ANZNN data collection, access and use of de-identified data for audit and research were approved by all relevant institutional research ethics committees of each NICU hospital (see list of hospitals in Acknowledgements) in Australia, and by the New Zealand Multi-regional Ethics Committee for all the New Zealand hospitals listed. For the CNN and SNQ, de-identified data collection was approved at each site by either an institutional ethics board or quality improvement committee of the hospitals listed. All participating networks have obtained ethics/regulatory approval or the equivalent from their local granting agencies to allow for de-identified data to be collated. De-identified ANZNN, CNN and SNQ data were amalgamated at the iNeo collaboration centre where analysis occurred. Approval for this project was obtained from the South Eastern Sydney Local Health District Human Research Ethics Committee and approval for data transfer was obtained from the executive committees of all three networks. The ethics committees waived the requirement for consent. Data from all networks were amalgamated and used for this study.
The Coordinating Centre has been granted Research Ethics Board approval for the development, compilation, and hosting of the dataset, and all 3 networks have signed data transfer agreements with the Coordinating Centre. Privacy and confidentiality of patient and unit-related data will be of prime importance to the iNeo collaboration, and data collection, handling, and transfer will be performed in accordance with the Canadian Privacy Commissioner’s guidelines, the Personal Information Protection and Electronic Documents Act, and any other local rules and regulations. No data identifiable at the patient level will be collected or transmitted, and only aggregate data will be reported. For all stages of the project, participating units will be assigned a code by their own network prior to data transfer into the iNeo dataset so that units remain anonymous within the iNeo collaborative. Following data analysis, findings will be disseminated within networks by their own network coordination team and not by the iNeo central team.
Following completion of the study in 2017, the data will be kept at the iNeo Coordinating Centre for a further 2 years before being returned to the originating networks unless otherwise agreed by the member networks.
Discussion
This study is the first to systematically assess the comparative predictive power of VLBW and VLGA cohorting methods. Belief in the superiority of gestation-based cohorts has grown amongst many investigators [2,3,39–41] in response to the suggestion that VLBW cohorts are limited by their innate confounding of growth status and maturity [6–8,17,42], but this belief has not been formally validated. In this retrospective population study of 31,940 neonates from Australia/New Zealand, Canada and Sweden, we identified that outcome prediction models derived from VLGA and VLBW cohorts perform equally well for prediction of in-hospital mortality and CAO in these high-risk preterm infants.
As expected, the VLBW study cohort held a disproportionately high number of SGA infants [6–8,17,42], in harmony with previous large population studies, which show SGA proportions of 20–39% in VLBW groups [10–13,43,44] compared to 8–12% in VLGA groups [15,16,45–47].
The expected skewing of risk toward poor outcome in VLBW cohorts was confirmed by the higher rate of CAO in this group (36.5%) compared to the VLGA group (32.6%). The VLBW study cohort had higher rates of NI (18.1% vs. 16.0%) and CLD (21.7% vs. 19.2%) than the VLGA cohort across all networks [48], confirming previous studies showing that SGA infants have a higher risk of CLD [49–53] and NI [50,52,54] compared to AGA infants of the same GA. Previous studies have also suggested higher mortality [49–51] and NEC [52] rates in SGA infants, but have been inconclusive as to whether SGA groups have excess risk of severe ROP and SNI [48,50,52,54]. The current international study has the largest sample size of any research examining these morbidities and thus has the statistical power to determine small differences in outcome. The smaller than hypothesised outcome difference found between the VLGA and VLBW groups is likely related to improvement of SGA outcomes associated with advances in contemporary clinical practice. The protective effect (negative coefficient) found for vertex presentation in both the VLBW and VLGA cohorts suggests that other presentations, such as breech or transverse, are associated with a less favourable outcome.
No clinically significant difference in predictive performance was found between the VLGA and VLBW models in this study. The higher SGA percentage within the VLBW cohort did not affect the discrimination power of the VLBW model, suggesting adequate control within the model for the confounding effect present. We propose two explanations for the rejection of our hypothesis. First, in previous VLBW cohort publications, many infants may not have had accurate prenatal gestation assessments, primarily due to substantial limitations in accessibility to early dating ultrasound. In comparison, GA assessment in the three networks of this contemporary study was robust, as all three networks have national healthcare access with nearly universal ultrasound examinations of pregnancies. The accurate GA data in both the VLGA and VLBW cohorts improved the accuracy of the models in this study, compared to expectations from previously published VLBW cohort data. Second, the research methodology of this study allowed for inclusion of non-linear relationships, such as the GA and BW z-score interaction. In the VLBW models, this adjusted for growth status and maturity through a balanced shift in the coefficients for BW z-score and GA, together with a negative coefficient for their interaction that reflects the protective confounding effect of growth status and maturity. The non-inclusion of these covariates in the VLGA models likely reflects that similar adjustment for SGA infants was not needed, in keeping with the consistent 10% SGA proportion. Consequently, the large sample sizes of this study combined with sophisticated modelling allowed development of models able to effectively control for confounding and bias, leading to the null findings.
Comparison of the models’ usefulness for prediction in the two ‘extreme’ subsets of VLGA-not-VLBW and VLBW-not-VLGA tests the scope of application. It was found that the power of all models fell when applied to the <1500 g BW ≥32-week GA infants, who would almost all be moderately or severely SGA. Predictive power also dropped for both the VLGA and VLBW models when used for CAO prediction in the BW ≥1500 g and GA <32-week subset, but remained excellent for mortality prediction. This clearly confirms that both mortality prediction models perform well for an extreme cohort containing no SGA infants [23]. The finding that the CAO prediction model did not perform as well as expected could indicate increased vulnerability of large for GA (LGA) infants to morbidities. The findings suggest that separate prediction models may need to be developed for infants in the extreme subsets of established cohorts where there is a high proportion of SGA or LGA infants, as standard statistical modelling derived from either VLGA or VLBW cohorts may not be appropriate for use.
The strengths of this study are its large sample size of 31,940 and the population-based nature of the data [55]. Relative to the size of the samples there were very few missing or incorrect data, attesting to the high quality of the originating databases. The international collaboration allowed validation of study findings across three neonatal networks, and was made more effective by choosing networks with similar databases. Additionally, Canada, Sweden and Australia/New Zealand have high coverage with early dating ultrasound and thus accurate GA data, in contrast to other studies that have combined last menstrual period and ultrasound dating, thus applying GA estimations that differ by up to 3 weeks [8,56]. Through the examination of both mortality and CAO, this study will be useful as survival at lower GA becomes possible and prediction of survival without major morbidities becomes increasingly vital.
This study was limited to the analysis of variables collected uniformly across all network databases for the complete study period, but the similarity and quality of the network databases included curtailed the effect of this limitation. The observational, retrospective design meant that no causal mechanisms can be inferred.
The conclusion that VLBW cohorts perform as well as VLGA cohorts for prediction of mortality and morbidity will have ramifications at the international and population levels. Comparison of population outcomes may now be considered valid regardless of the cohorting method used to obtain data, provided both GA and BW z scores are included in analytic models. This represents a major advancement in international benchmarking. This study also provides evidence to justify the continued use of BW-based cohorting in some nations, provided accurate GA data are included. A further corollary of this study is clarification of the literature on VLGA and VLBW neonates. The findings elucidate both the external validity of research based on one cohort for application to the other, and the appropriateness of comparing data or conclusions based on disparately cohorted groups.
Further investigation is warranted into whether the findings of this study can be extrapolated to countries with poorer access to antenatal care, in particular early dating ultrasound, and hence less accurate GA estimation. Moreover, further studies should compare predictive power for longer-term outcomes such as neurodevelopment, where differing SGA proportions would be expected to have greater effect.
Acknowledgements
The authors wish to thank all the diligent data managers and extractors of all participating hospitals of the 3 neonatal networks in prospectively collecting the relevant clinical data for their respective network databases.
Advisory Council Members of ANZNN (* denotes ANZNN Executives).
Peter Marshall (Flinders Medical Centre, SA), Paul Craven (John Hunter Hospital, NSW), Karen Simmer* (King Edward Memorial and Princess Margaret Hospitals, WA), Jacqueline Stack (Liverpool Hospital, NSW), David Knight (Mater Mother’s Hospital, QLD), Andrew Watkins (Mercy Hospital for Women, VIC), Andrew Ramsden, Kenneth Tan*, Kaye Bawden* (Monash Medical Centre, VIC), Lyn Downe, Vjay Singde (Nepean Hospital, NSW), Michael Stewart (Newborn Emergency Transport Service, VIC), Andrew Berry (NSW Newborn & Paediatric Emergency Transport Service), Rod Hunt (Royal Children’s Hospital, VIC), Charles Kilburn (Royal Darwin Hospital, NT), Peter Dargaville (Royal Hobart Hospital, TAS), Kei Lui* (Royal Hospital for Women, NSW), Mary Paradisis (Royal North Shore Hospital, NSW), Nick Evans*, Shelley Reid* (Royal Prince Alfred Hospital, NSW), David Cartwright* (Royal Women’s Hospital, QLD), Carl Kuschel, Lex Doyle, (Royal Women’s Hospital, VIC), Andrew Numa (Sydney Children’s Hospital, NSW), Zsuzsoka Kecskes (The Canberra Hospital, ACT), Nadia Badawi (The Children’s Hospital at Westmead, NSW), Guan Koh* (The Townsville Hospital, QLD), Steven Resnick (Western Australia Neonatal Transport Service), Mark Tracy, William Tarnow-Mordi* (Westmead Hospital, NSW), Chad Andersen (Women’s & Children’s Hospital, SA).
New Zealand: Nicola Austin (Christchurch Women’s Hospital), Brian Darlow* (Christchurch School of Medicine), Roland Broadbent* (Dunedin Hospital), Jenny Corban* (Hawkes Bay Hospital), Lindsay Mildenhall (Middlemore Hospital), Malcolm Battin (National Women’s Hospital), David Bourchier (Waikato Hospital), Vaughan Richardson (Wellington Women’s Hospital).
ANZNN executives not contributing hospital data: Ross Haslam* Chair of the executives, Georgina Chambers* (National Perinatal Statistics and Epidemiology Unit, University of New South Wales); Victor Samual Rajadurai*, (KK Hospital, Singapore).
CNN Site Investigators
Shoo K Lee (Chairman, Canadian Neonatal Network; Sick Kids Hospital, Toronto, ON); Prakesh S Shah (Director, Canadian Neonatal Network; Mount Sinai Hospital, Toronto, ON); Andrzej Kajetanowicz (Cape Breton Regional Hospital, Sydney, NS); Anne Synnes (Children’s and Women’s Health Centre of British Columbia, Vancouver, BC); Nicole Rouvinez-Bouali (Children’s Hospital of Eastern Ontario, Ottawa, ON); Bruno Piedboeuf (Centre Hospitalier Universitaire de Quebec, Sainte Foy, QC); Valerie Bertelle (Centre Hospitalier Universitaire de Sherbrooke, Fleurimont, QC); Barbara Bulleid (Dr. Everett Chalmers Regional Hospital, Fredericton, NB); Wendy Yee (Foothills Medical Centre, Calgary, AB); Sandesh Shivananda (Hamilton Health Sciences Centre, Hamilton, ON); Kyong-Soon Lee (Hospital for Sick Children, Toronto, ON); Mary Seshia (Health Sciences Centre, Winnipeg, MB); Keith Barrington, Francine Lefebvre (Hospital Sainte-Justine, Montreal, QC); Douglas McMillan (IWK Health Centre, Halifax, NS); Wayne Andrews (Janeway Children’s Health and Rehabilitation Centre, St Johns, NL); Lajos Kovacs (Jewish General Hospital, Montreal, QC); Kimberly Dow (Kingston General Hospital, Kingston, ON); Orlando da Silva (London Health Science Centre, London, ON); Patricia Riley (Montreal Children’s Hospital, Montreal, QC); Prakeshkumar Shah (Mount Sinai Hospital, Toronto, ON); Abraham Peliowski/Khalid Aziz (Royal Alexandra Hospital, Edmonton, AB); Zenon Cieslak (Royal Columbian Hospital, New Westminster, BC); Zarin Kalapesi (Regina General Hospital, Regina, SK); Koravangattu Sankaran (Royal University Hospital, Saskatoon, SK); Daniel Faucher (Royal Victoria Hospital, Montreal, QC); Ruben Alvaro (St Boniface General Hospital, Winnipeg, MB); Roderick Canning (The Moncton Hospital, Moncton, NB); Cecil Ojah/Luis Monterrosa (St John Regional Hospital, St John, NB); Michael Dunn (Sunnybrook Health Sciences Centre, Toronto, ON); Todd Sorokan (Surrey Memorial Hospital, Surrey, BC); Adele Harrison (Victoria General Hospital, Victoria, BC) and Chuks Nwaesei/Mohammed Adie (Windsor Regional Hospital, Windsor, ON).
SNQ Site Investigators:
Stellan Håkansson (Chairman, Swedish Neonatal Quality Register; Norrlands Universitetssjukhus, Umeå); Gunnar Sjörs (Co-Director, Swedish Neonatal Quality Register; Akademiska Sjukhuset, Uppsala); Niklas Segerdahl (Borås lasarett, Borås); Tarek Morad (Mälarsjukhuset, Eskilstuna); Stefan Morén (Falu lasarett, Falun); Åke Stenberg (Gällivare sjukhus, Gällivare); Christer Simonsson (Länssjukhuset Gävle-Sandviken, Gävle); Lennart Stigsson (Östra sjukhuset, Göteborg); Jens Ladekjaer Christensen (Halmstads länssjukhus, Halmstad); Lars Åmasn (Hudiksvalls sjukhus, Hudiksvall); Fredrik Ingemanson (Länssjukhuset Ryhov, Jönköping); Laura Österdal (Länssjukhuset, Kalmar); Karl-Gustav Ellström (Centralsjukhuset, Karlstad); Thomas Abrahamsson (Universtitetssjukhuset, Linköping); Ingela Heimdahl (Sunderby sjukhus, Luleå); Tomas Hägg (Vrinnevisjukhus, Norrköping); Anna Hedlund (Skellefteå lasarett, Skellefteå); Ellen Elisabeth Lund (Kärnsjukhuset, Skövde); Björn Westrup (Karolinska sjukhuset, Stockholm); Ihsan Sarman (Södersjukhuset, Stockholm); Anna Stakkestad Jobe (NÄL, Trollhättan); Magnus Fredsriksson (Visby lasarett, Visby); Anders Palm (Västerviks sjukhus, Västervik); Birger Malmström (Centrallasarettet, Västerås); Eva Lindberg (Universitetssjukhuset, Örebro); Owe Ljungdahl, (Örnsköldsviks sjukhus, Örnsköldsvik) and Kerstin Eriksson (Östersunds sjukhus, Östersund).