Introduction
Zero-inflated (ZI) non-negative count data frequently arise in medical studies, e.g., number of clinic visits, admissions, days in hospital, number of serious illnesses, and medical costs. These data have the following distinct characteristics: (1) the presence of a large proportion of zero values (i.e., zero inflation [1, 2] or sparsity in count data [3, 4]), (2) strictly non-negative values that are right-skewed, and (3) overdispersion (i.e., mean < variance) [5–8]. (Hereafter, we omit the word “non-negative”.) To account for these complexities, various models with flexible mixture distributions have been introduced over the past decades, including ZI [9, 10] and hurdle regression models [11] using Poisson, quasi-Poisson, negative binomial (NB), and Poisson-Lindley distributions.
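For reference, one common way to write the NB and ZINB probability mass functions is given below. The parameterization is an assumption on our part, not taken from the source: \(\mu\) denotes the NB mean, \(\theta\) the NB shape parameter, and \(\pi\) the probability of a structural zero.

\[
P_{\mathrm{NB}}(Y = y) = \frac{\Gamma(y + \theta)}{\Gamma(\theta)\, y!} \left(\frac{\theta}{\theta + \mu}\right)^{\theta} \left(\frac{\mu}{\theta + \mu}\right)^{y}, \qquad \operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{\theta},
\]

\[
P_{\mathrm{ZINB}}(Y = y) =
\begin{cases}
\pi + (1 - \pi)\, P_{\mathrm{NB}}(Y = 0), & y = 0, \\
(1 - \pi)\, P_{\mathrm{NB}}(Y = y), & y = 1, 2, \ldots
\end{cases}
\]

Under this form, the NB variance exceeds the mean for any finite \(\theta\), which captures overdispersion, while \(\pi > 0\) adds zeros beyond those the NB component already produces. Setting \(\pi = 0\) recovers the NB model.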
While ZI models have previously been compared with their non-inflated counterparts, conclusions about which model performs better have been inconsistent. For example, Du et al. [12], Connelly et al. [13], and Speedie et al. [14] examined a similar ZI outcome (i.e., number of laboratory tests ordered during a first emergency department visit), but the selected models were not the same. On the basis of the likelihood ratio test, Akaike’s Information Criterion (AIC), and the Bayesian information criterion (BIC), Du et al. suggested that the zero-inflated negative binomial (ZINB) model may be favored over Poisson, NB, hurdle, and zero-inflated Poisson (ZIP) regression models. However, the other two studies referred to the NB and hurdle models as “the best fit” and made use of both to predict and explain the outcomes of interest. Choi et al. [15] used a Bayesian model selection criterion to evaluate zero inflation in single-cell RNA sequencing (scRNA-seq) datasets. They demonstrated that the primary cause of zero inflation was biological in nature and argued that a quantitative estimate of zero inflation (i.e., an estimate of the parameter accounting for the level of zero inflation in the ZINB distribution) was not a reliable indicator of zero inflation. Outside the medical field, Ver Hoef and Boveng [8] illustrated that the quasi-Poisson model produced a better fit for ecological count data than the NB model, based on a diagnostic plot of the empirical fit of the variance. Other studies, such as Naya et al. [16], used the deviance information criterion and estimates of marginal likelihoods with the method of Newton and Raftery [17] to assess and select the best model among various single- and multi-level ZI regression models. Even though there is a large body of literature on the superiority or non-superiority of ZI models, few studies have further analyzed the data characteristics that may directly determine which model(s) give a better fit while yielding reliable inferences.
The current study is motivated by a recent clinical trial in which, under a Bayesian framework, the ZINB model did not sufficiently outperform the NB model when modeling ZI count outcomes of children with medical complexity [18]. The ZINB model outperformed both the Poisson and ZIP models; however, the ZIP model did not outperform the Poisson model significantly. The study was a single-center randomized clinical trial that evaluated the effectiveness of a telemedicine program with comprehensive care (CC) compared to CC alone. For the analysis of this trial, the performance of the models (Poisson, NB, and their ZI counterparts) was evaluated using the \(k\)-fold information criterion (kfoldIC) with \(k\) equal to 10 (a typical value used in studies) [19].
In this simulation study, we re-analyzed the trial data under a frequentist framework and further investigated which data properties have the largest effect on model fit under varying sample sizes and degrees of zero inflation. We first compared the performance of the NB and ZINB models based on AIC and then examined whether any characteristics of a ZI outcome influence model performance and effect sizes. Our hypotheses were as follows. First, we hypothesized that there would be no significant differences between the NB and ZINB regression models in terms of marginal treatment effects, bias, and coverage. Second, we hypothesized that other data characteristics, such as the skewness and variance of the non-zero part of the data, would be more important in deciding between the NB and ZINB models than the number of zero counts in the ZI data and the degree of overdispersion. Last, we hypothesized that the choice of the best-fitting model based on AIC is unrelated to sample size.
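As a minimal sketch of the AIC-based comparison described above, the following code evaluates intercept-only NB and ZINB log-likelihoods and the corresponding AIC. It is an illustration under assumptions, not the analysis code of this study: the function names, the mean/shape parameterization, and the example counts and parameter values are all hypothetical, and covariates are omitted. In practice the parameters would be maximum likelihood estimates obtained from a fitting routine.

```python
import math

def nb_logpmf(y, mu, theta):
    """Log-probability of count y under an NB with mean mu and shape theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def zinb_logpmf(y, mu, theta, pi):
    """Log-probability under a ZINB; pi is the structural-zero probability (pi < 1)."""
    if y == 0:
        # A zero is either structural or a sampling zero from the NB part.
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_logpmf(y, mu, theta)

def log_likelihood(counts, logpmf, *params):
    """Sum of log-probabilities over independent observations."""
    return sum(logpmf(y, *params) for y in counts)

def aic(loglik, n_params):
    """Akaike's Information Criterion; smaller values indicate a better fit."""
    return 2.0 * n_params - 2.0 * loglik

# Hypothetical counts and placeholder parameter values (NOT fitted estimates).
counts = [0, 0, 0, 0, 0, 1, 1, 2, 3, 7]
aic_nb = aic(log_likelihood(counts, nb_logpmf, 1.4, 0.5), n_params=2)
aic_zinb = aic(log_likelihood(counts, zinb_logpmf, 2.0, 1.0, 0.3), n_params=3)
print("NB AIC:", round(aic_nb, 2), "ZINB AIC:", round(aic_zinb, 2))
```

The ZINB model carries one additional parameter, so it must improve the log-likelihood by more than one unit to achieve a smaller AIC than the NB model.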
Discussion
The aim of this study was to compare the performance of NB and ZINB regression models in terms of bias, MSE, and coverage and to determine which properties of zero-inflated count data are better described by a ZINB model. Our simulation results indicated that a ZINB regression model does not necessarily outperform an NB model when evaluating ZI medical count outcomes obtained in a trial of children with medical complexity. This is consistent with our original analysis conducted under a Bayesian framework. Even when data were simulated from an underlying ZINB distribution, the NB model had a very similar or even smaller relative bias and MSE for the marginal treatment effect. This suggests that when data are explained and predicted using regression coefficients, as is common in medical and epidemiological studies, there is no significant difference between the NB and ZINB models. Additionally, we want to emphasize that determining the best-fitting model using quantitative model selection criteria (e.g., AIC) is not the only goal of statistical modeling. The ultimate goal of statistical modeling, as Hand [37] stated, is to gain a better understanding of the real world. From this perspective, the NB model may be preferred over the ZINB model even when its AIC is worse because the results are more straightforward to interpret.
When comparing which model gave a better fit to simulated data, our results showed that the NB model outperformed the ZINB model in terms of bias, MSE, and coverage for the treatment group coefficients, even with outcomes generated from a ZI distribution. This suggests that, in terms of the results in which medical professionals are primarily interested (e.g., an intervention effect), there is no substantial difference between the ZI and non-ZI regression models, even when the outcome contains excess zeros. Note that, when we used a primary outcome with a sample size of 800, the coverage from the sim. NB model under a true ZINB distribution was 0.87, which is lower than that of the sim. ZINB model. In fact, for the primary outcome, the coverage decreased as the sample size increased from 60 to 800 in the combination of the DD ZINB model and the sim. NB model (Additional file 1: Table S6). We note that the corresponding absolute bias and MSE of the NB model decreased (i.e., approached 0) as the sample size increased. Increasing the sample size typically narrows confidence intervals by lowering the standard error. A narrower interval is more likely to miss the true value when even a small bias remains, which may result in low coverage. As shown in Additional file 1: Fig. S4, the length of the CIs decreases as the sample size increases. The mean of their lower bounds, in particular, approaches the true coefficient with a smaller standard deviation. This could be the key contributor that brought both bias and MSE close to 0. Further studies investigating the association between data characteristics (e.g., ratio of mean to variance, sample size) and coverage are needed to ensure appropriate model performance.
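The interplay between bias, standard error, and coverage can be illustrated with a stylized calculation. The sketch below assumes a normally distributed estimator and a Wald-type 95% interval. The rate at which the bias shrinks with sample size is a hypothetical choice made for illustration only and is not estimated from the trial or simulation data.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wald_coverage(bias, se, z=1.96):
    """Probability that estimate +/- z*se covers the truth when the
    estimator is normal with the given bias and standard error."""
    shift = bias / se
    return normal_cdf(z - shift) - normal_cdf(-z - shift)

# Hypothetical setting: the standard error shrinks like n^(-1/2) while the
# bias shrinks more slowly, like n^(-1/4) (an assumption for illustration).
rows = []
for n in (60, 200, 800):
    se = 1.0 / math.sqrt(n)
    bias = 0.1 / n ** 0.25
    rows.append((n, bias, se, wald_coverage(bias, se)))

for n, bias, se, cov in rows:
    print(f"n={n:4d}  bias={bias:.4f}  se={se:.4f}  coverage={cov:.3f}")
```

In this setting the absolute bias falls as the sample size grows, yet coverage also falls, because the bias becomes larger relative to the standard error. Coverage stays near the nominal level only when the bias shrinks at least as fast as the standard error.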
From the multivariable ridge logistic regression, we observed that the proportion of zeros played a smaller role in predicting a preference for the ZINB model than the MLE of the shape parameter when the DD ZINB model was used to generate synthetic primary outcomes. This result indicates that, contrary to popular belief, the percentage of zero counts is a less important predictor of a preference for (or fit of) the ZINB model over the NB model than is commonly assumed.
For care days outside of the home, bias, MSE, and coverage for the treatment coefficient were comparable regardless of the type of DD and sim. models. This is expected given that this outcome was not zero-inflated; hence, there should be no difference between the NB and ZINB models. From the multivariable ridge logistic regression analysis, the percentage of zero counts was the strongest predictor in terms of regression coefficients. In approximately 90% or more of the cases where the dataset was not suitable for use with the ZI model (i.e., the percentage of zero counts \(\approx\) 5%), the NB model exhibited a smaller AIC value, which represents a better fit. However, for these particular data, the inclination toward favoring the ZINB model grew as the percentage of zero counts increased. This observation suggests that the prevalence of zeros may introduce a bias, leading to a preference for the ZINB model despite its unsuitability for the given context. This result underscores that, when choosing between the NB model and the ZINB model, it is essential to avoid over-reliance on the percentage of zero counts and instead consider other characteristics of the data. Interestingly, the MLE of the shape parameter in the NB distribution, which is a quadratic function of the mean and variance of the outcome, was the second most important predictor in determining the model preference between NB and ZINB models. This result strongly reinforces our fundamental proposition that when considering the use of ZI regression models, rather than prioritizing the number of zero counts and overdispersion, investigators should consider two-dimensional characteristics such as a shape parameter estimate of an NB distribution, as well as other one-dimensional characteristics of the outcome such as the mean and skewness in the non-zero part of the outcome.
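The data characteristics discussed above can be computed directly from an observed count outcome. The following is a minimal sketch under assumptions: the function name and the example counts are hypothetical, the shape parameter is a method-of-moments estimate (not the MLE used in this study) based on the NB relation variance = mean + mean²/shape, and skewness is the simple moment-based coefficient.

```python
import math

def outcome_characteristics(counts):
    """Summarize a count outcome: zero proportion, mean, variance,
    method-of-moments NB shape, and mean/skewness of the non-zero part."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((y - mean) ** 2 for y in counts) / (n - 1)  # sample variance

    # NB relation var = mean + mean^2 / shape; defined only if overdispersed.
    shape_mom = mean ** 2 / (var - mean) if var > mean else float("inf")

    nonzero = [y for y in counts if y > 0]
    nz_mean = sum(nonzero) / len(nonzero)
    m2 = sum((y - nz_mean) ** 2 for y in nonzero) / len(nonzero)
    m3 = sum((y - nz_mean) ** 3 for y in nonzero) / len(nonzero)
    nz_skew = m3 / m2 ** 1.5 if m2 > 0 else float("nan")

    return {
        "prop_zero": counts.count(0) / n,
        "mean": mean,
        "variance": var,
        "shape_mom": shape_mom,
        "nonzero_mean": nz_mean,
        "nonzero_skewness": nz_skew,
    }

# Hypothetical outcome with 40% zeros and a right-skewed non-zero part.
example = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8]
summary = outcome_characteristics(example)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Inspecting these quantities together, rather than the proportion of zeros alone, follows the recommendation made in this section. The function assumes at least one non-zero count.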
However, there are three caveats to consider. First, we exclusively considered the use of the ZINB model (and the ZIP model without any subsequent results described) in this study, which assumes that zero counts are from either the structural or sampling sources [38]. The scope of this study did not include other ZI models such as the ZI Conway-Maxwell-Poisson model or the ZI generalized Poisson model. The hurdle model, which assumes that all zero counts only originate from the structural source, is another popular model for ZI outcomes. This model is similar to the ZI model but may be more versatile as the zero counts can be both deflated and inflated. Depending on the investigators’ subjective opinions and the study objectives, the hurdle model may be a good alternative. It will be worth studying the existence and/or nature of any latent variable(s) that may contribute to the observed ZI count outcome [39]. Second, while various potential scenarios were considered (e.g., sample size, percentage of zero counts, types of DD models), the study was only empirically conducted through simulation. A promising extension of this study would be to demonstrate theoretical roles for the mean and skewness of the non-zero part of the ZI outcomes, as well as the MLE of the shape parameter, in an NB or ZINB regression model. Third, we demonstrated these findings by employing medical count outcomes from a single-center trial of children with medical complexity. It should be noted that different types of data may yield different conclusions.
In spite of the caveats discussed, this study is significant because it sheds fresh light on modeling with zero-inflated outcomes, which are frequently observed in medical data. We recommend that the percentage of zero counts in the outcome not be used as the sole and primary reason for selecting ZI regression models. Investigators should also consider other data characteristics such as the mean and skewness of the non-zero part of the outcome when choosing a model for medical count data. In addition, if the performance of the NB and ZINB regression models is reasonably comparable even with ZI outcomes, we advocate the use of the NB regression model due to its clear and straightforward interpretation of the results.