Missing covariate data may give rise to sample selection bias. As noted by Choi et al., a “complete case estimate” reflects the average effect among the individuals selected for the analysis, i.e., those without missing values ($R = 0$); let $E[Y_{1i} - Y_{0i} \mid R = 0]$ denote this effect. In the presence of sample selection bias, we generally envision a systematic discrepancy between the sample average causal effect and the average causal effect in the target population. Even in the absence of missing values on covariates, sample selection bias may be an issue of concern. A “participating case estimate” based on the participants ($P = 0$; $P$ is the non-participant indicator) should reflect the effect $E[Y_{1i} - Y_{0i} \mid P = 0]$. Commonly, epidemiologists face both incomplete participation in a cohort study and missing covariate values for the participants. If we are interested in estimating the population average exposure effect, and there is effect heterogeneity conditional on $P$, $R$, or both, then the estimate of $E[Y_{1i} - Y_{0i} \mid P = 0, R = 0]$ may have poor validity as an estimate of the population effect. In relation to our example, it is easy to imagine a biased estimate of the population average exposure effect if the distributions of metabolic risk factors differ between the participants and the target population. Hence, the problem of missing individuals and the problem of missing covariate values are interrelated. Nevertheless, we need to handle these two problems differently. We cannot resort to multiple imputation without any data on the non-participants (or, more precisely, without any data generated by the cohort study). Other methods to adjust for selective participation might be useful: probability-of-participation weighting with weights derived from logistic regression or generalized boosted models based on external data on non-participants [7, 8]; post-stratification or generalized regression using known population moments [9]; entropy balancing [10]; and empirical balancing calibration weighting [11]. We welcome further investigations, in the spirit of the study by Choi et al., which provide guidance on methods to handle both missing individuals and missing values in the context of propensity score analysis for cohort studies.
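As an illustration of one of the adjustment methods listed above, post-stratification on known population shares can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the procedure of any cited reference: the function names, the two risk-factor strata, the outcome values, and the population shares are all hypothetical, and the within-stratum difference in means merely stands in for whatever stratum-specific effect estimate (for example, a propensity-score-based one) an analysis would produce. The sketch also assumes that, within a stratum, the effect among participants ($P = 0$) equals the effect among non-participants.

```python
from collections import defaultdict


def stratum_effects(records):
    """Exposed-minus-unexposed difference in mean outcome within each stratum.

    `records` holds (stratum, exposed, outcome) tuples for participants only.
    """
    cells = defaultdict(lambda: {0: [0.0, 0], 1: [0.0, 0]})  # [sum, count]
    for stratum, exposed, outcome in records:
        cell = cells[stratum][exposed]
        cell[0] += outcome
        cell[1] += 1
    effects = {}
    for stratum, by_exposure in cells.items():
        mean_exposed = by_exposure[1][0] / by_exposure[1][1]
        mean_unexposed = by_exposure[0][0] / by_exposure[0][1]
        effects[stratum] = mean_exposed - mean_unexposed
    return effects


def weighted_average(effects, shares):
    """Average the stratum effects using (possibly unnormalized) shares."""
    total = sum(shares.values())
    return sum(effects[s] * shares[s] / total for s in shares)


# Hypothetical participants: the high-risk stratum is under-represented
# (2 of 10 participants) relative to the target population (assumed 40%).
participants = (
    [("high", 1, 3.0), ("high", 0, 1.0)]
    + [("low", 1, 1.5)] * 4
    + [("low", 0, 1.0)] * 4
)
population_shares = {"high": 0.4, "low": 0.6}  # assumed known externally

effects = stratum_effects(participants)

sample_counts = defaultdict(int)
for stratum, _, _ in participants:
    sample_counts[stratum] += 1

# Effect averaged over the participants' own stratum distribution.
participant_average = weighted_average(effects, sample_counts)
# Effect re-weighted to the target population's stratum distribution.
post_stratified = weighted_average(effects, population_shares)

print(participant_average, post_stratified)
```

With these made-up numbers the effect is larger in the under-represented stratum, so the participant average understates the post-stratified value. The gap disappears when the stratum effects are equal or when the participants' stratum distribution matches the population's, which is the effect heterogeneity condition discussed above.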