Background
A recent report shows a slight decline in the rate of infants with low birthweights (less than 2500 g) in the United States, with a rate of 8.2 percent in 2007 compared to 8.3 percent in 2006 [1]. While the rate for extremely low (ELBW; <1000 g) and very low birthweights (VLBW; 1000-1500 g) was unchanged at 1.5 percent, the rate for moderately low birthweights (MLBW; 1500-2500 g) declined from 6.8 to 6.7 percent [1]. Data on the proportions of normal (NBW; 2500-4000 g) and high birthweights (HBW; >4000 g) were not provided. If confirmed in the final vital records data, the decline in the low birthweight rate will be the first in many years. National Center for Health Statistics (NCHS) records indicate that low birthweight rates have been rising since 1984, when the rate was 6.7 percent [1].
Perinatal epidemiologists have long recognized birthweight as one of several factors related to fetal growth, and ultimately, infant survival and development [2-4]. However, categories such as ELBW and VLBW, while useful for descriptive purposes, are not completely satisfactory for representing the birthweight distribution of a population, much less assessing the relationship between birthweight and fetal-infant mortality. First, cutoffs such as 1500 g and 2500 g are arbitrary and introduce an artificial discreteness to a naturally continuous phenomenon: presumably fetal-infant mortality risk decreases only incrementally as one moves from, for example, 2499 g to 2501 g. Second, there may still be heterogeneity at any fixed birthweight: some infants born at, say, 2499 g may be at higher risk than other infants born at 2499 g.
The preceding considerations motivate a new framework for modeling birthweight distributions and fetal-infant mortality. This is the second paper in a two-part series that introduces such a framework. In the first paper, we proposed a normal mixture model for birthweight distribution:

$$\sum_{j=1}^{k} p_j \, f(x; \mu_j, \sigma_j) \qquad (1)$$

where $k$ is the number of components, $x$ is birthweight, $p_j$ is the fraction of births in component $j$, $\mu_j$ is the mean of the birthweights in component $j$, $\sigma_j$ is the standard deviation of the birthweights in component $j$, and $f(x; \mu_j, \sigma_j)$ is the probability density for a normal distribution with mean $\mu_j$ and standard deviation $\sigma_j$. What distinguished our proposal from the contaminated normal model of Umbach and Wilcox [5] and the 2-component normal mixture model of Gage and Therriault [6] was that the number of components was not fixed a priori but rather determined from the data using the Flexible Information Criterion (FLIC) (Pilla and Charnigo, Consistent estimation and model selection in semiparametric mixtures, submitted). We also showed how to construct confidence intervals for $p_j$, $\mu_j$, and $\sigma_j$ ($1 \le j \le k$) based on multiple samples from the same population, even if those samples overlapped.
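As a rough illustration of the mixture density in Equation (1), the following Python sketch evaluates a weighted sum of normal densities at a given birthweight. The function names and all parameter values are hypothetical and chosen only for illustration; they are not estimates from either paper in this series.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, p, mu, sigma):
    """Weighted sum of k normal densities, as in Equation (1).

    p, mu and sigma are equal-length sequences holding the component
    fractions, means and standard deviations.
    """
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("component fractions must sum to 1")
    return sum(pj * normal_pdf(x, mj, sj) for pj, mj, sj in zip(p, mu, sigma))


# Hypothetical 4-component parameters in grams (illustrative only).
P = [0.01, 0.05, 0.84, 0.10]
MU = [900.0, 2400.0, 3400.0, 3900.0]
SIGMA = [300.0, 900.0, 450.0, 600.0]

density_at_2500 = mixture_pdf(2500.0, P, MU, SIGMA)
```

In practice the number of components and their parameters would come from a fitted model; this sketch only shows how the pieces of Equation (1) combine.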
Here we consider estimating birthweight-specific mortality curves within each component of the normal mixture model in Equation (1). We begin by generalizing Gage's parametric mixtures of logistic regressions (PMLR) technique [7] to accommodate a normal mixture model with more than two components. We proceed to show how confidence bounds can be constructed for birthweight-specific mortality curves. We then provide formulas for estimating mortality odds ratios comparing populations on the same component, such as
odds of mortality at 2500 g in component 3 (white heavy smoking population) divided by
odds of mortality at 2500 g in component 3 (white general population),
or comparing components in the same population, such as
odds of mortality at 1000 g in component 2 (white heavy smoking population) divided by
odds of mortality at 1000 g in component 1 (white heavy smoking population).
Being able to estimate the latter kind of odds ratio - in other words, being able to assert that some infants in a population are at higher risk than others, even when they are of the same birthweight - is the main advantage of modeling a birthweight distribution as we have proposed, rather than using a contaminated normal model [5] or a 2-component normal mixture model [6]. Thus, our two-part series provides a modeling framework through which heterogeneity in mortality can be revealed that might otherwise remain undetected.
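The two kinds of odds ratio described above can be sketched in Python as follows. The sketch assumes that mortality risk within each component follows a logistic curve whose log-odds are a second-degree polynomial in standardized birthweight; that functional form, the function names, and every coefficient below are illustrative assumptions, not the fitted PMLR model or any estimate reported in this series.

```python
import math


def mortality_risk(x, beta, mu, sigma):
    """Logistic mortality risk at birthweight x (grams) within one component.

    Assumption: the log-odds are a polynomial in standardized birthweight
    z = (x - mu) / sigma, with coefficients beta[0], beta[1], ...
    """
    z = (x - mu) / sigma
    log_odds = sum(b * z ** d for d, b in enumerate(beta))
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(risk):
    """Convert a probability into odds."""
    return risk / (1.0 - risk)


def odds_ratio(x, numerator, denominator):
    """Odds of mortality at x under `numerator` divided by odds under `denominator`.

    Each argument is a (beta, mu, sigma) tuple describing one component
    in one population.
    """
    return odds(mortality_risk(x, *numerator)) / odds(mortality_risk(x, *denominator))


# Hypothetical components (NOT estimates from the paper).
HEAVY_SMOKING_C1 = ([-1.0, -1.2, 0.2], 900.0, 300.0)
HEAVY_SMOKING_C2 = ([-3.0, -0.8, 0.3], 2400.0, 900.0)
HEAVY_SMOKING_C3 = ([-5.5, -0.6, 0.4], 3200.0, 450.0)
GENERAL_C3 = ([-6.0, -0.6, 0.4], 3400.0, 450.0)

# Comparing populations on the same component, at 2500 g.
or_between_populations = odds_ratio(2500.0, HEAVY_SMOKING_C3, GENERAL_C3)

# Comparing components in the same population, at 1000 g.
or_between_components = odds_ratio(1000.0, HEAVY_SMOKING_C2, HEAVY_SMOKING_C1)
```

An odds ratio above 1 in the second comparison would indicate that, at the same birthweight and within the same population, infants in the numerator component have higher mortality odds than those in the denominator component.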
Discussion
This paper completes a two-part series on a new framework for modeling birthweight distributions and fetal-infant mortality. The main advantage of this new framework is its potential to reveal heterogeneity in mortality risk that may be undetectable if one relies on a contaminated normal model or 2-component normal mixture to represent a birthweight distribution.
With the contaminated normal model, the lower residual distribution and the predominant distribution have little overlap. As such, there is little overlap in the ranges of birthweights over which each component has a well-defined risk function. This is depicted in Figure 1b, where the red and green dashed curves do not occupy the same birthweights except for a small interval near 1700 g. Thus, except for birthweights close to 1700 g, the contaminated normal model effectively imposes a unique mortality risk for all infants at any fixed birthweight. This occurs because the contaminated normal model classifies all NBW cases, along with almost all MLBW and HBW cases, as originating from the predominant distribution, while it classifies virtually all VLBW and ELBW births as arising from the lower residual distribution. Yet, presumably some compromised pregnancies yield MLBW, NBW, and HBW births. Hence, not only does the estimated proportion 0.975 overstate the fraction of uncompromised pregnancies, but also no distinction can be made between compromised and uncompromised pregnancies at birthweights above 1700 g.
In contrast, the 2-component normal mixture has some ability to reveal heterogeneity. However, this ability is limited to the MLBW, NBW, and HBW ranges. As shown in Figure 1c, the 2-component normal mixture effectively imposes a unique mortality risk at each birthweight in the VLBW and ELBW ranges. At first glance, that may not seem worrisome. After all, the MLBW, NBW, and HBW cases may arise from a mix of compromised and uncompromised pregnancies, while presumably the VLBW and ELBW cases arise almost exclusively from compromised pregnancies. Yet, implicit in the 2-component normal mixture is a belief that all compromised pregnancies are qualitatively similar, in the sense of sharing a common birthweight-specific mortality curve. Perhaps such a belief is approximately valid for some populations. Unfortunately, the 2-component normal mixture imposes this belief mathematically and does not provide any way for it to be tested empirically. The framework that we have presented, on the other hand, allows such a belief to be tested empirically. Indeed, the example in Section 3b of Results shows that component 2 in the population of white singletons has demonstrably higher mortality risk at some birthweights than component 4 in the same population. We regard component 3 as most plausibly representing uncompromised pregnancies in this population, so that components 2 and 4 most plausibly consist of compromised pregnancies. Therefore, not all compromised pregnancies in this population share a common birthweight-specific mortality curve.
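One way to make concrete the idea that a mixture effectively assigns a single component to some birthweight ranges is to compute, via Bayes' rule, the probability that a birth at a given weight belongs to each component. The Python sketch below does this for a hypothetical 2-component normal mixture; the function names and parameter values are assumptions for illustration only and do not reproduce any fitted model or Figure 1c.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, p, mu, sigma):
    """Probability that a birth of weight x belongs to each mixture component.

    Applies Bayes' rule: each component's share is its weighted density at x
    divided by the total mixture density at x.
    """
    weighted = [pj * normal_pdf(x, mj, sj) for pj, mj, sj in zip(p, mu, sigma)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical 2-component mixture in grams (illustrative only):
# a wide secondary component and a narrow primary component.
P2 = [0.15, 0.85]
MU2 = [3000.0, 3400.0]
SIGMA2 = [900.0, 450.0]

# At a very low birthweight, nearly all births fall in the wide component,
# so only that component's mortality curve matters there.
at_900 = membership_probabilities(900.0, P2, MU2, SIGMA2)

# At a normal birthweight, both components contribute, so two distinct
# mortality curves can coexist at the same birthweight.
at_3400 = membership_probabilities(3400.0, P2, MU2, SIGMA2)
```

With more than two components, the same calculation shows which components overlap at a given birthweight and therefore where component-specific mortality curves can meaningfully be compared.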
The components identified in our empirical explorations are undoubtedly related to gestational age. While detailed speculations about the precise nature of the relationship are premature, one or more of the components may have an elevated rate of intrauterine growth restriction (IUGR). In population-based vital statistics data, IUGR is typically defined as a birthweight below the 5th or 10th percentile (definitions vary) for gestational age. Other aspects of size at birth that are not presently recorded on birth certificates in the United States include head circumference, birth length (i.e., crown-heel length or crown-rump length), and waist/hip ratio. However IUGR might be quantified, its frequency within each component could be estimated as indicated in the next paragraph.
A useful extension of our methodology would entail probabilistically relating a covariate of interest, such as gestational age or IUGR, to the mixture components. Suppose that the covariate of interest were dichotomous. For gestational age, we could create a dichotomy by labeling infants as "preterm" or "term". Then, given a fitted $k$-component mixture model for birthweight distribution, we could apply PMLR with dichotomized gestational age or IUGR rather than mortality as the dependent variable. The resulting fitted curve, written here as $\hat{m}_j(x)$, would denote not the estimated mortality risk but rather the estimated probability of a preterm birth or of IUGR as a function of birthweight within component $j$ ($1 \le j \le k$). To estimate the overall probability of a preterm birth or of IUGR within component $j$, we would integrate $\hat{m}_j(x)$ over the estimated distribution of birthweights within component $j$,

$$\int \hat{m}_j(x) \, f(x; \hat{\mu}_j, \hat{\sigma}_j) \, dx. \qquad (10)$$
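The integration just described can be approximated numerically. The following Python sketch applies a trapezoid rule to the product of a fitted probability curve and the normal density of one component. The function names, the integration settings, and the example curve and parameters are hypothetical and serve only to illustrate the calculation.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def overall_probability(prob_curve, mu, sigma, n_steps=4000, width=8.0):
    """Approximate the integral of prob_curve(x) * f(x; mu, sigma) over x.

    Uses the trapezoid rule on [mu - width*sigma, mu + width*sigma];
    the normal density is negligible outside that interval.
    """
    lower = mu - width * sigma
    step = 2.0 * width * sigma / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lower + i * step
        weight = 0.5 if i in (0, n_steps) else 1.0  # half weight at endpoints
        total += weight * prob_curve(x) * normal_pdf(x, mu, sigma)
    return total * step


def preterm_curve(x):
    """Hypothetical probability of preterm birth at birthweight x (grams)."""
    return 1.0 / (1.0 + math.exp((x - 2200.0) / 300.0))


# Hypothetical component with mean 2400 g and standard deviation 900 g.
overall_preterm = overall_probability(preterm_curve, 2400.0, 900.0)
```

Repeating the calculation for each component would give one overall probability per component, which could then be compared across components.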
Pursuing this idea and extending it to multiple covariates, both categorical and continuous, would enable us to describe the joint distribution of covariates within each mixture component. If the joint distributions of covariates within different mixture components had little overlap, then we would be able to assert an approximate correspondence between the mixture components and identifiable subpopulations with biological meaning. Such discoveries would provide greater epidemiologic insight into the relationships among fetal-infant mortality and its prognostic factors.
Conclusions
The present paper, the second in a two-part series, develops a new and flexible approach to modeling fetal-infant mortality through the estimation of separate birthweight-specific mortality curves within each component of a normal mixture model describing a birthweight distribution, the number of components having been determined from the data rather than fixed a priori. This approach allows the detection of heterogeneity in mortality that cannot be found with a contaminated normal model or a 2-component normal mixture model. A 2-component normal mixture model assumes that infants from compromised pregnancies share a common birthweight-specific mortality curve, while a contaminated normal model assumes that all infants share a common curve over some (possibly quite large) interval of birthweights. Yet, our approach has demonstrated that components 2 and 4 in a 4-component normal mixture model for white singleton birthweights have distinct birthweight-specific mortality curves. Since components 2 and 4 in this population most plausibly consist of compromised pregnancies, we see that infants from compromised pregnancies need not share a common birthweight-specific mortality curve. Finally, this paper lays some groundwork for future research aimed at discovering approximate correspondences between mixture model components and identifiable subpopulations.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
RC - Concept and design, analysis and interpretation of data, drafting of the manuscript, critical revision of the manuscript for important intellectual content, statistical analysis, read and approved final manuscript. LWC - Concept and design, acquisition of data, analysis and interpretation of data, drafting of the manuscript, critical revision of the manuscript for important intellectual content, read and approved final manuscript. TL - Analysis and interpretation of data, drafting of the manuscript, critical revision of the manuscript for important intellectual content, read and approved final manuscript. RSK - Analysis and interpretation of data, drafting of the manuscript, critical revision of the manuscript for important intellectual content, read and approved final manuscript.