Background
Longitudinal studies are important in public health research for identifying risk factors related to negative health outcomes. However, a major concern in such studies is that the longer the follow-up period, the higher the risk of drop-out [1]. Attrition rates from 30 to 70% are often reported [2–7]. Thus, it is important to study the effect of attrition on the generalizability of findings from long-term longitudinal studies. One of the aims of the current paper is to examine attrition from a 15-year population-based longitudinal study (TOPP study) initiated in 1993 to investigate development in children and their families. The TOPP study includes information about socio-demographics, psychological factors in the children and their mothers, and social relationships.
Differences in mean levels of variables between those who drop out and those who stay in a study do not necessarily imply that there are differences in associations between variables [7, 8]. Even though a major concern in public health research is associations between risk/protective factors and later health outcomes, most attrition studies only examine the degree to which those who stay and those who drop out of a study are comparable in terms of mean levels. The current study also examines possible effects of attrition on estimates of associations between variables.
Even if those who stay and those who drop out of a study are similar at baseline, they may be different at the time of follow-up. Researchers seldom have information about how people who drop out would have responded if they had stayed in the study. Therefore, real-life studies are generally not suited to examining the effects of attrition that depends on follow-up variables. Thus, in the current study we use a computer simulation to examine how attrition that depends on both baseline and follow-up variables affects parameter estimates.
Baseline predictors of attrition
Previous research has shown that different factors can influence drop-out rate. Socio-demographic variables, such as low educational level, being out of work, and not being married, are typically related to increased risk of non-response and attrition in epidemiological studies [2, 4, 5, 8–12]. In addition, unhealthy lifestyle factors, such as smoking, high alcohol consumption, and physical inactivity, are related to non-participation and attrition [8, 11–13].
High levels of psychological distress can predict attrition in high-risk populations, such as psychiatric outpatients and formerly hospitalized patients [3, 14]. In population-based studies, psychological distress has been found to have no effect or a weak to moderate effect on attrition after adjusting for other variables [2, 4, 9, 10]. Attrition may also be related to social factors, such as support from spouse or friends, and to the child’s characteristics. Poor relationship quality is an important predictor of mental health problems [15]. However, social networks and support did not predict attrition in a 15-year follow-up study [5], and marital satisfaction and spousal support did not predict attrition in a job satisfaction study [6]. More knowledge is needed about the association between attrition and psychological as well as social factors.
Studies with high-risk populations found that externalizing problems and psychopathology in general among children were associated with a higher risk of parents dropping out [16, 17], whereas child characteristics such as temperament, anxiety, and attention problems did not predict attrition in population-based studies [18, 19]. It may be that the ways different factors affect attrition are dependent on whether the original sample was drawn from a high-risk population. In general, we need more knowledge about psychological variables and family characteristics, since previous research on these topics is divergent and relatively sparse.
Continued participation in a study 10–15 years later may depend on factors other than those related to participation in a shorter time perspective. For example, some mental disorders were differentially associated with attrition in a one-year compared to a 15-year follow-up of a geographical sub-sample in the same study [5, 9]. Both short-term and long-term attrition should therefore be studied to examine whether samples in long-term studies become increasingly biased over time.
Baseline associations between variables
Public mental health research typically examines how variables such as demographics, social relationships, life stress, and personality predict mental health. Further studies are needed to examine possible baseline differences in associations between variables related to participants who stay compared to those who drop out of longitudinal studies.
Effects of attrition dependent on follow-up variables
Those who stay can be more different from those who drop out of a study at the time of follow-up than at baseline, suggesting that attrition is dependent on follow-up variables. This further implies that attrition is dependent on variables with missing data because the researcher generally only has information on follow-up variables from those who stayed in the study. Therefore, it is not possible to control for sample biases that are related to attrition dependent on follow-up variables. Statistical techniques to account for missing data, such as full information maximum likelihood and most current forms of multiple imputation, are less efficient when missingness is dependent on variables with missing data than when it is only dependent on variables with information from all participants [20]. Therefore, the effect of attrition dependent on follow-up variables needs to be studied.
Computer simulation studies can be used to examine the effect of attrition that is dependent on unobserved variables. Researchers in simulation studies know the true parameter values in the population [21–23]. Scenarios where attrition is dependent on unobserved variables can be simulated by generating data sets where attrition is dependent on variables with missing values. Parameter estimates obtained from such samples can then be compared to the known true population parameters. Previous simulation studies have examined the effects of non-random attrition on estimates of effects of interventions, estimates of odds ratios, and of cumulative probabilities [21, 24, 25]. Such studies have typically compared the effect of attrition that is completely random to non-random attrition [21, 24]. We extend current knowledge by examining the effect of attrition under conditions with different levels of dependency between risk of attrition and predictors as well as follow-up variables.
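The logic of such a simulation can be illustrated with a minimal sketch. It is not the design used in the present study: the function name, the standard-normal variables, the latent "liability" to drop out, and the hard cutoff that removes the participants with the highest liability are all assumptions made for illustration. A baseline predictor x and a follow-up outcome y are generated with a known slope, attrition is made dependent on x and/or y, and the complete-case estimates are compared with the known population values (a mean of 0 for y and a slope of beta).

```python
import numpy as np

def simulate_attrition(n=100_000, beta=0.3, dep_x=0.0, dep_y=0.5,
                       attrition_rate=0.5, seed=0):
    """Complete-case estimates after selective attrition (illustrative only).

    dep_x and dep_y are hypothetical weights that make the risk of
    dropping out depend on the baseline and follow-up variable.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                                     # baseline predictor
    y = beta * x + np.sqrt(1 - beta**2) * rng.standard_normal(n)   # follow-up outcome
    # Latent liability to drop out: study variables plus random noise.
    liability = dep_x * x + dep_y * y + rng.standard_normal(n)
    # Participants with the highest liability drop out.
    cutoff = np.quantile(liability, 1 - attrition_rate)
    stayed = liability <= cutoff
    slope = np.polyfit(x[stayed], y[stayed], 1)[0]                 # complete-case OLS slope
    return {"retained": stayed.mean(),
            "mean_y": y[stayed].mean(),
            "slope": slope}

# Attrition dependent on the follow-up variable only, then on both variables.
print(simulate_attrition(dep_x=0.0, dep_y=0.5))
print(simulate_attrition(dep_x=0.5, dep_y=0.5))
```

Under these assumptions the true mean of y is 0 and the true slope is 0.3, so any deviation in the returned values reflects selective attrition plus sampling error.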
Aims
The general aim of the current study was to examine the effects of attrition in long-term longitudinal studies. Specific aims were to examine baseline predictors of short-term (one-year follow-up) and long-term (15-year follow-up) attrition and to examine potential differences in baseline correlations between those who stayed and those who dropped out of a 15-year study. The last aim was to perform a computer simulation study to examine the effect of attrition that is dependent on unobserved variables.
Discussion
Findings from the TOPP study showed that those who stayed compared to those who dropped out over the 15-year period differed in baseline educational level but not in regard to baseline mental health and relationship variables. Furthermore, the two groups did not differ significantly regarding associations between variables. The results from the simulation study showed that mean estimates became substantially biased even at relatively weak dependencies between follow-up variables and attrition, whereas estimates of associations between variables were more robust to dependencies between attrition and study variables. In addition, mean estimates, but not regression estimates, were strongly affected by attrition rate. The results are more thoroughly discussed below.
Temperamental sociability was a significant predictor of short-term attrition (baseline to one-year follow-up), in that high scores on sociability predicted higher chances of dropping out. Apart from a study showing that antisocial personality predicted having died at follow-up [5], there are few studies on adult personality and attrition from population-based studies. Our finding shows that psychological variables other than psychopathology can be important for understanding attrition.
In a long-term perspective (baseline to 15-year follow-up), educational level predicted drop-out. The sample became moderately biased towards having more well-educated participants over time, which is in accordance with previous attrition studies finding that socio-demographic variables predict drop-out [2, 4, 9–11].
An important question when examining long-term attrition was whether those who stayed and those who dropped out differed on psychological and social variables at baseline. Some population-based studies have found weak to moderate dependencies between adult psychiatric diagnosis and attrition after adjusting for other variables [9, 10]. In studies where self-rating measures were used, psychological distress was found to have no effect or a weak to moderate effect after adjusting for other variables [2, 4]. The results of the present study are more in accordance with the latter, as neither psychological characteristics of the women or children nor qualities of the spouse/partner relationship predicted long-term attrition. Slightly divergent results may be due to different measures of psychological distress. Our results are also in accordance with previous research showing no associations between baseline child characteristics, such as temperament and anxiety, and attrition in population-based studies [18, 19].
Even though baseline sociability and educational level predicted attrition, the baseline associations between these variables and mental health were the same among those who later dropped out and those who remained in the study. Of the 15 correlations between psychological distress and other variables examined at baseline, none were significantly different for participants and non-participants at one- or 15-year follow-up. The current findings thus show that even if those who stay and those who drop out of a study differ regarding mean levels of some variables, estimates of associations can be robust to such differences.
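One common way to compare a correlation between two independent groups is Fisher's r-to-z transformation. The sketch below illustrates that approach only; which test was actually used for the 15 comparisons is not stated in this section. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def compare_correlations(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher r-to-z transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical example: r = .30 among 500 stayers, r = .25 among 400 drop-outs.
z, p = compare_correlations(0.30, 500, 0.25, 400)
print(f"z = {z:.2f}, p = {p:.3f}")
```

When many such comparisons are made, as with the 15 correlations here, the resulting p-values would normally be judged against a threshold adjusted for multiple testing.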
The current simulation study provided information about effects of attrition dependent on follow-up as well as baseline variables. The results showed that mean estimates became increasingly biased as attrition rates increased. At 50% and 70% attrition rates, mean estimates became extremely biased, even at weak dependencies between attrition and follow-up variables. Mean estimates also became increasingly biased as the dependency between risk of attrition and the study variable grew stronger. Therefore, mean estimates from longitudinal studies should be interpreted with caution, even when attrition is only weakly dependent on the variables of interest. These results are in accordance with findings from a study of the effect of selective enrolment in a large population-based study of pregnant women [8]. Nilsen and colleagues used information about medical conditions among non-responders from a national register and concluded that mean estimates of age, number of cigarettes smoked, birth weight, and other medical variables were biased among participants because of selective participation in the study [8].
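Why mean bias grows with both attrition rate and strength of dependency can be seen in a simple threshold model. This is an illustrative assumption, not the model used in the simulation study: a standardized study variable correlates rho with a standard-normal liability to drop out, and the participants with the highest liability are lost. The expected mean among those who stay is then -rho * pdf(c) / cdf(c), where cdf(c) equals the retained proportion.

```python
from statistics import NormalDist

def expected_mean_bias(rho, attrition_rate):
    """Expected mean (in SD units) of a standardized variable among stayers.

    Assumes bivariate normality and loss of the participants whose
    drop-out liability exceeds a fixed cutoff (hypothetical model).
    """
    std = NormalDist()
    retained = 1.0 - attrition_rate
    c = std.inv_cdf(retained)              # liability cutoff
    return -rho * std.pdf(c) / retained    # rho * E[liability | liability <= c]

for rate in (0.3, 0.5, 0.7):
    print(rate, round(expected_mean_bias(0.3, rate), 3))
```

In this model the bias is proportional to rho and increases in magnitude as the retained proportion shrinks, which matches the direction of the pattern described above.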
The simulation study further showed that regression estimates were only minimally affected by attrition rate. Regression estimates and the coverage of their 95% confidence intervals were very similar at both lower and higher attrition rates. In addition, the degree of dependency between attrition and the follow-up variable had only weak effects on regression estimates and their 95% coverage. This was the case both when the population association between predictor and outcome was weak and when it was moderate. Naturally, the proportion of samples that rejected the false null hypothesis of a zero association between the two study variables was higher with stronger population associations. This proportion did not decrease notably as the dependency between attrition and follow-up variables increased. The effect of attrition on estimates of associations between variables thus seemed to be limited to the effect of reduced N when attrition was only dependent on follow-up variables.
However, when attrition became increasingly dependent on both baseline and follow-up variables, the regression estimates were seriously biased, and the 95% coverage dropped dramatically. For weak population associations between variables, the proportion of samples that succeeded in rejecting the false null hypothesis also decreased when attrition became increasingly dependent on both baseline and follow-up variables.
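Coverage here means the proportion of simulated samples whose 95% confidence interval contains the true population value. The sketch below shows how such a coverage figure can be computed for a regression slope. The data-generating model, the hypothetical dependency weights dep_x and dep_y, the sample size, and the number of replications are all assumptions, so the numbers it produces should not be expected to match those of the simulation study.

```python
import numpy as np

def slope_coverage(beta=0.3, dep_x=0.0, dep_y=0.0, attrition_rate=0.5,
                   n=1000, reps=500, seed=1):
    """Share of replications whose 95% CI for the slope contains beta."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        y = beta * x + np.sqrt(1 - beta**2) * rng.standard_normal(n)
        # Hypothetical drop-out liability; the highest values are lost.
        liability = dep_x * x + dep_y * y + rng.standard_normal(n)
        stayed = liability <= np.quantile(liability, 1 - attrition_rate)
        xc = x[stayed] - x[stayed].mean()
        yc = y[stayed] - y[stayed].mean()
        slope = (xc @ yc) / (xc @ xc)                  # complete-case OLS slope
        resid = yc - slope * xc
        se = np.sqrt((resid @ resid) / (len(xc) - 2) / (xc @ xc))
        hits += int(abs(slope - beta) <= 1.96 * se)    # CI contains true slope?
    return hits / reps

print(slope_coverage())                        # completely random attrition
print(slope_coverage(dep_y=0.5))               # dependent on follow-up variable only
print(slope_coverage(dep_x=0.5, dep_y=0.5))    # dependent on both variables
```

With completely random attrition the coverage should stay close to the nominal 95%. In this sketch it falls furthest when attrition depends on both the baseline and the follow-up variable.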
The current results indicate that attrition related to both baseline and follow-up variables has far worse consequences for regression estimates than attrition that is only related to follow-up variables. Being able to account for attrition related to baseline variables can thus reduce the negative consequences of selective attrition on regression estimates. Modern techniques for handling missing data (e.g. full information maximum likelihood and multiple imputation methods) are effective in adjusting for missingness that is dependent on variables with information from all participants [20]. In longitudinal studies with attrition, the researcher typically has information on baseline variables from all participants, but lacks information on follow-up variables from those who have dropped out. The current results suggest that using such techniques to account for attrition related to baseline variables can reduce the negative effects of selective attrition on regression estimates even if these techniques do not account for attrition related to follow-up variables.
Graham and Donaldson [24] reported from their simulation study that non-random attrition affected estimation of the effect of an intervention. They concluded that correlation estimates were biased when attrition differed between the control group and the intervention group, but that correlation estimates based on complete cases were unbiased when attrition was the same in both groups, even though attrition was dependent on measured and unmeasured variables. They did not compare different degrees of dependency between attrition and the study variables. Our results thus extend their findings by showing effects of attrition with different degrees of dependency on baseline and follow-up variables.
Limitations
Although the real-life study had several strengths (it was population-based, extended over a long period of time, and had a relatively large number of participants), it also had some limitations.
First, individuals with the highest levels of mental health problems and alcohol use tend to participate less often than others in population-based studies [12]. Even though the current results indicate that samples in long-term longitudinal studies may be comparable to those in cross-sectional studies, both kinds of studies face challenges regarding generalizability to persons with high levels of mental health problems. Second, staff at the health care centers organized the data collection at the first three time points, whereas the questionnaires were distributed by mail at later waves. Differences in data collection methods may have influenced attrition in the short-term compared to the long-term perspective. Third, some of the measures showed somewhat low reliability, and this may have affected the results. Fourth, the results of this attrition study may be generalizable only to questionnaire studies. Thus, other kinds of studies, such as those employing interviews, need to be examined separately. Fifth, some argue that the Bonferroni correction produces overly conservative p-value thresholds and therefore an inflated risk of type II errors [45]. Working status was a significant predictor of long-term attrition before but not after Bonferroni correction. However, there were no other differences before and after Bonferroni correction in the adjusted solutions. Applying the Bonferroni correction thus had only minimal impact on our conclusions. Moreover, the power analysis conducted showed that relatively small effects in the population could be detected with high probabilities. Thus, the null findings in the real-life study are probably not a result of low statistical power. Sixth, attrition from the TOPP study was mainly due to refusal to participate. Other reasons for attrition from population-based studies, such as death or failure to locate, may have other consequences for generalizability of findings. Finally, the sample size used in the current simulation study was similar to the baseline sample size in the real-life study. Different sample sizes can provide different confidence intervals and thus different results regarding statistical significance. Therefore, further simulation studies are needed to examine the effect of attrition under several different conditions.
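The Bonferroni point raised among the limitations above can be made concrete with a small sketch. The Bonferroni correction divides the significance level by the number of tests, so a predictor can be significant before correction but not after it. The p-values below are hypothetical and are not taken from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a significance flag for each p-value."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

# Hypothetical p-values for five predictors (not results from the study).
p_values = [0.001, 0.020, 0.040, 0.300, 0.700]
threshold, significant = bonferroni(p_values)

for p, flag in zip(p_values, significant):
    print(f"p = {p:.3f}  uncorrected: {p < 0.05}  Bonferroni: {flag}")
```

In this example the second and third predictors are significant at the uncorrected 0.05 level but not at the corrected threshold, which is the kind of change reported for working status.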
Conclusions
Together, the findings from the TOPP study and the simulation study suggest that even if estimates of means can be seriously biased in longitudinal studies, estimates of associations seem to be far more robust to selective attrition. Attrition rate affected mean estimates but not regression estimates.
Even at moderate to strong dependencies between attrition and follow-up variables, estimates of associations between variables seem to be generalizable. However, when attrition is dependent on both baseline and follow-up variables, regression estimates tend to be biased. Researchers should therefore use modern missing data techniques to account for attrition related to baseline variables to reduce the negative effects of attrition on regression estimates.
These are important findings both because attrition is common in longitudinal studies and because public health research often aims to study associations between risk/protective factors and health.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
KG performed the statistical analyses and drafted the manuscript. TvS, ER, and EK contributed to designing the questionnaires and collecting the data. All authors contributed to the design of this specific study and the interpretation of results, and helped to draft or critically revise the manuscript. All authors read and approved the final manuscript.