Background
It is widely acknowledged, both in theory and in practice, that incorporating design features into the estimation of descriptive parameters, such as prevalence, can help avoid bias and reduce standard errors [1-4]. However, despite this consensus among statisticians, which is grounded in clear evidence and in well-established procedures for handling complex sampling strategies in survival modeling [5], these principles are quite often ignored in applied settings. For instance, recently published studies [6, 7] using well-known cohort data (MESA [8] and MONICA [9]) neither incorporate design weighting into the analysis nor discuss its appropriateness.
This paper was motivated by discussion of the sampling strategy used in a recent large multi-center cohort study, with a target population of approximately 50,000 people and 15,000 participants to be followed up for at least 20 years [10]. The participants were selected by non-proportional stratified sampling. The main aim here is to present, as clearly as possible for non-statistical researchers, the impact of ignoring the sample design, and thus to contribute to improving data analysis practice in epidemiology. In this case study we evaluate the impact of sampling weights and loss to follow-up on the estimation of the parameters of a Cox proportional hazards model, in terms of bias and precision.
Stratified random sampling involves dividing the population into non-overlapping groups called strata, defined by selected characteristics, each of which is sampled separately. Varying the sampling fraction by stratum improves the efficiency of the sample design and of estimators for relatively small but important population subgroups. Because the sampling fraction varies across strata, the weight of each individual is proportional to the inverse of the sampling fraction in the respective stratum, as described in Kish (1965) [4]. Applying these weights gives each stratum the same relative importance as it has in the population. In a stratified sample, since the association between exposure and event may differ between strata, estimation of the marginal association (the average association in the entire population) should take into account each individual's varying probability of being included in the sample.
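As a minimal sketch of the inverse-sampling-fraction weights described above, the following Python snippet computes one weight per stratum. The stratum names and counts are hypothetical, chosen only so that the totals match the approximate figures quoted earlier (50,000 in the target population, 15,000 sampled); they are not the actual allocation of the study.

```python
# Hypothetical stratum sizes (illustrative only, not the study's allocation).
population = {"stratum_A": 30000, "stratum_B": 15000, "stratum_C": 5000}
sample = {"stratum_A": 5000, "stratum_B": 5000, "stratum_C": 5000}


def design_weights(population, sample):
    """Return the inverse of the sampling fraction for each stratum.

    The sampling fraction is n_s / N_s, so the weight is N_s / n_s:
    the number of population members each sampled person represents.
    """
    return {s: population[s] / sample[s] for s in population}


weights = design_weights(population, sample)

# Weighted sample counts reproduce the population size of each stratum.
weighted_totals = {s: weights[s] * sample[s] for s in sample}

for s in population:
    print(s, "weight:", weights[s], "weighted total:", weighted_totals[s])
```

With equal sample sizes per stratum, members of the largest stratum receive the largest weight, and the weighted sample totals add up to the population size.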
Varying sample weights across the strata may induce a difference between the probability distributions for the outcome in the sample and in the population, because of the covariates included in the model. In such cases the design carries information about the outcome, and is therefore considered informative or non-ignorable.
In a survival model, where time-to-event T is the response variable, x the covariates vector and z the design factor, if z is not related to T | x, the design factor z is ignorable. Boudreau and Lawless [11] analyzed the impact of sampling design on the Cox proportional hazards model, considering both clustering and stratification. If the sampling design is ignorable, both weighted and unweighted procedures are asymptotically unbiased and should yield similar point estimates. However, if the sampling design is non-ignorable, consistent estimation can be achieved by introducing design weights into the estimating functions, as proposed by Binder (1992) [12] and Lin (2000) [13].
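One way to picture how design weights enter the estimating functions is through a weighted partial log-likelihood, in which each event and each member of a risk set is multiplied by its sampling weight. The sketch below is an illustration under simplifying assumptions (a single covariate, Breslow-type handling of ties, weights that are constant over time); the function name and the toy data are hypothetical and this is not the authors' implementation.

```python
import math


def weighted_partial_loglik(beta, times, events, x, weights):
    """Weighted Cox partial log-likelihood for one covariate.

    Each observed event i contributes
        w_i * (beta * x_i - log(sum over the risk set of w_j * exp(beta * x_j))),
    where the risk set contains everyone with time >= time_i.
    """
    n = len(times)
    loglik = 0.0
    for i in range(n):
        if events[i] != 1:
            continue  # censored subjects enter only through the risk sets
        denom = sum(
            weights[j] * math.exp(beta * x[j])
            for j in range(n)
            if times[j] >= times[i]
        )
        loglik += weights[i] * (beta * x[i] - math.log(denom))
    return loglik


# Toy data (hypothetical): the first subject carries a design weight of 2.
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
x = [1, 0, 1]
ll_weighted = weighted_partial_loglik(0.5, times, events, x, [2.0, 1.0, 1.0])

# The same value is obtained by listing that subject twice with unit weights.
ll_duplicated = weighted_partial_loglik(
    0.5, [1.0, 1.0, 2.0, 3.0], [1, 1, 1, 1], [1, 1, 0, 1], [1.0] * 4
)
print(ll_weighted, ll_duplicated)
```

The comparison shows the intuition behind weighting: a subject with weight 2 counts as two population members in the point estimate. The sketch covers only the point-estimation side; standard errors require a design-based variance estimator, which is not computed here.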
Another major problem in long-term cohort studies is potential bias due to loss to follow-up. This problem is widely recognized and several approaches have been proposed to deal with it [14]. The Cox model assumes non-informative censoring. However, this assumption is unwarranted in long-term cohort studies, and differential losses related to the sampling strata may increase the bias. Lawless (2003) [5] discusses these issues further and considers the use of time-varying weights that deal simultaneously with a non-ignorable sampling plan and non-ignorable censoring.
The next section presents the case study, describing the simulated population and two different scenarios of loss to follow-up. The sampling strategies and model fitting are then presented. The results section uses a graphical representation to make the discussion of the impact of ignoring sample design more accessible to non-mathematical readers.
Conclusions
Quite often researchers include neither sample weights nor stratum indicators in statistical models. Yeboah et al. (2010) [19] used only white race in a univariate model, despite the four groups (white, African-American, Hispanic and Asian) that defined the sampling strata in MESA [8]. Race was treated as an ordinary covariate and was excluded from the multivariate models. Neither the six study communities nor the sample weights were mentioned. Two other papers on the same cohort were more careful. Polonsky et al. (2010) [25] controlled for race. Bertoni et al. (2010) [6] not only included race but also tested for its interaction with the main exposure variable. Neither evaluated the impact of the study communities.
Our results confirmed that, in a correctly specified model, ignoring the weights does not change the estimated parameters, and precision may improve (a result theoretically proven for inference based on ordinary least squares) [26, 27]. As suggested by Winship and Radbill (1994) [28], the decision whether or not to include the weights in the model should be based on the role of the stratifying variable. When the stratifying variable interacts with another independent variable and this is not accounted for in the model, bias will be introduced if the sample weights are not considered. However, the correct model is known only for simulated populations. Moreover, strata are usually chosen to increase the sample size of subpopulations whose characteristics are important to the outcome under study.
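The role of an unmodeled interaction with the stratifying variable can be illustrated with a deliberately simple analogy on a linear scale. The sketch below assumes hypothetical stratum sizes and stratum-specific exposure effects (differences in mean outcome), and assumes that exposure is equally common in every stratum, so that the marginal effect is the size-weighted average of the stratum effects. Hazard ratios from a Cox model do not average in this simple way, so the numbers only convey the direction of the argument, not results from the paper.

```python
# Hypothetical population and sample sizes per stratum (illustrative only).
population = {"A": 30000, "B": 15000, "C": 5000}
sample = {"A": 5000, "B": 5000, "C": 5000}


def average_effect(effects, sizes):
    """Size-weighted average of stratum-specific effects."""
    total = sum(sizes.values())
    return sum(sizes[s] * effects[s] for s in sizes) / total


# Exposure effect differs by stratum: an interaction the model leaves out.
heterogeneous = {"A": 1.0, "B": 2.0, "C": 4.0}
# Exposure effect is the same in every stratum: no interaction.
homogeneous = {"A": 2.0, "B": 2.0, "C": 2.0}

# Target quantity: the average effect in the population.
pop_effect = average_effect(heterogeneous, population)

# Ignoring the design: strata count according to their sample sizes.
unweighted = average_effect(heterogeneous, sample)

# Using design weights N_s / n_s restores the population proportions.
weighted_sizes = {s: (population[s] / sample[s]) * sample[s] for s in sample}
weighted = average_effect(heterogeneous, weighted_sizes)

print("population:", pop_effect)
print("unweighted sample:", unweighted)
print("weighted sample:", weighted)
print("no interaction:", average_effect(homogeneous, sample))
```

When the effect is the same in every stratum, the weighted and unweighted averages coincide; when it differs, only the weighted average matches the population value.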
The primary objective of analyzing survey data is to make inferences about the population of interest [29]. Survey planning therefore starts by defining the target population, to which the results will refer [2, 30]. The role of the reference population in the analysis of survey data is related to the meaning of the error term of the statistical model. In the physical sciences, the error of a regression is considered a measurement error. Epidemiology, however, has to consider, besides measurement error, other sources of variability that relate to individuals and are not captured by the covariates included in the model [2]. Indeed, this reasoning lies behind the development of random effect ("frailty") models in survival analysis [31]. Another issue is the use of crude estimates. The usual practice in epidemiology is to control for confounders. However, public health policies may need crude estimates to assess disease burden or to evaluate the impact of targeting specific risk factors. The Smoke-only model (Eq. 3) gives exactly the desired estimate for these purposes. The correct numbers should thus be given, using the appropriate weighting in an uncontrolled model.
The stratification by professional category, which assigns much larger weight to the lower social stratum, was guided by the need to increase the power to detect socially related risk factors. Nevertheless, almost any covariate displays different prevalence in different socioeconomic groups. Moreover, almost all covariates interact, positively or negatively, changing the risk. Smoking itself presents similar physiological risk across socioeconomic strata. However, belonging to the most deprived stratum implies differences in other risk factors, such as larger body mass index, worse diet and inadequate exercise, all associated with cardiovascular disease, and these are only the known and easily measured risk factors. Unknown or unreliably measured factors, such as stress or mental health, will always exist. Therefore allowance has to be made for the possibility of unknown confounders and interactions in our data associated with the sample strata. Rubin [32] recommends that observational studies should approximate randomized experiments, and that the assignment mechanism, in our case smoking or not smoking, should be as unconfounded as possible. Graubard and Korn (2002) [33] recommend weighted estimators, as they believe their model-free aspects outweigh their potential inefficiency. Following the same reasoning, we strongly recommend always correcting for sample weights.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
Both authors designed, analyzed and wrote the paper.