Methods
We carried out a scoping review, a methodological strategy that enables the results of exploratory research to be summarized. In this type of review, unlike other systematic reviews, the application of quality filters is not an initial priority [10]. We performed and reported our study based on the methodological guidance for the conduct of a scoping review from the Joanna Briggs Institute [11] and the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) Extension guideline for Scoping Reviews [12]. The protocol for this scoping review is available on request from the corresponding author.
Data sources and search strategy
An automated search of bibliographic databases was performed, with an initial search in MEDLINE subsequently supplemented by EMBASE and Web of Science. To avoid duplicate results, in EMBASE and Web of Science we used the option to exclude journals indexed in MEDLINE. The same free-text search strategy was applied in all three databases: (clinical–data* OR health–data* OR medical–data* OR prescription–data* OR administrative–data* OR epidemiologic–data* OR health–claim* OR administrative–claim* OR insurance–claim* OR claims–data* OR health–record* OR medical–record*) AND (confounding OR bias* OR missing–data OR misclassification) AND (observational OR epidemiolog* OR pharmacovigilance OR challenge*) AND drug, from January 1, 2000 to January 1, 2018. All types of research design were considered. Adding restrictive MeSH (Medical Subject Headings) terms according to type of publication was not deemed suitable, since this was found to reduce search sensitivity excessively.
Once the references were identified, the titles and abstracts, when available, were used as a preliminary screening filter, and potentially relevant full-text articles were retrieved. Other relevant references were identified by manually cross-checking the reference lists of selected articles and using the “related articles” option. This full screening was performed by two reviewers (GP-R, AF). Discrepancies were discussed between the two reviewers to reach consensus; in cases of persistent disagreement, a third author (BT) adjudicated.
Article selection and data abstraction
We included opinion essays, methodological reviews, analyses/reanalyses, simulation studies, letters to the editor and retractions whose principal objective, as described in the abstract, was to highlight the existence of some type of bias in pharmacoepidemiologic studies that used secondary health care databases.
To reduce the number of identified references and thus simplify the presentation of the results, references were excluded and classified into subgroups according to the following criteria: (1) the principal objective was to describe, compare, evaluate, validate or develop a bias-control strategy for a known bias or limitation (e.g. analytical method, study design, algorithm, framework); (2) the study estimated a measure of association (e.g. a treatment effect) or identified risk factors for a disease, with the existence of bias mentioned only as a limitation, regardless of whether strategies for its control were used; (3) the reference had characteristics different from those indicated above (e.g. studies with different objectives, not based on secondary databases, with no drug involved, no bias mentioned) or was a conference paper with no abstract/full text available.
A data charting form was jointly developed by two reviewers (GP-R and AF) to determine which variables had to be extracted. One person (GP-R) extracted the information from the articles (i.e. first author, publication date, category under which the journal was indexed (when a journal was indexed under more than one category, the best-ranked one was used), type of article, and type of bias(es) mentioned); when further clarification was needed, articles were checked and validated by additional reviewers as a form of quality control (AF and BT). The three reviewers discussed the results and continuously updated the data charting form.
The synthesis included both quantitative analysis (i.e. publication trend of identified/included articles and frequency analysis of the biases mentioned) and qualitative analysis (i.e. content analysis) of the components of the research purpose.
Discussion
This is, to our knowledge, the first structured review to explore potential biases in observational studies of pharmacoepidemiologic databases. The results suggest growing concern in the scientific literature about identifying, describing and controlling such biases. This should not be overlooked, since observational epidemiologic database studies currently afford an excellent opportunity for medical research. For the results of these studies to be valid and applicable to decision-making about safety and effectiveness, it is of paramount importance that these biases be properly accounted for and correctly controlled.
Confounding bias as such, or in any of its diverse forms of presentation, is mentioned in almost two-thirds of the articles included in the scoping review (see Table 1 for references). Adequate control of confounding poses a challenge in studies that use health care databases, since these were not designed for undertaking epidemiologic studies. The absence or poor quality of data on potential confounding factors in secondary databases (e.g. over-the-counter drugs, frailty of the subject, smoking habit) is a frequent phenomenon [14-17], which renders it difficult or even impossible to adjust for such factors in order to control for confounding [18].
If data on confounding variables have been collected, the reviewed articles propose different control methods: (1) in the design stage, through the application of restriction criteria, matching methods, or implementation of a new-user design (see below, depletion of susceptibles); and (2) in the analysis stage, through stratification of patients across treatment groups according to relevant factors, or multivariate regression techniques that include these confounding factors as independent variables. When the number of variables is very high, adjusting for the disease risk score [19] or the propensity score to receive treatment may be of interest [20, 21].
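As a concrete illustration of control by stratification, the contrast between a crude estimate and a stratum-adjusted one can be sketched as follows. This is a minimal Python sketch with entirely made-up counts; the hypothetical confounder ("disease severity") and all numbers are ours, not drawn from any reviewed study:

```python
# Minimal sketch of confounding control by stratification, using
# hypothetical counts for a binary confounder ("disease severity").
# Each stratum holds (exposed_cases, exposed_total, unexposed_cases, unexposed_total).
strata = {
    "low_severity":  (10, 100, 18, 200),
    "high_severity": (40, 100, 20, 50),
}

def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Crude (pooled) risk ratio ignores the confounder entirely.
a = sum(s[0] for s in strata.values())
n1 = sum(s[1] for s in strata.values())
c = sum(s[2] for s in strata.values())
n0 = sum(s[3] for s in strata.values())
crude_rr = risk_ratio(a, n1, c, n0)

# Mantel-Haenszel risk ratio pools the stratum-specific estimates,
# so the confounder no longer distorts the exposure-outcome contrast.
mh_num = sum(ae * nu / (ne + nu) for ae, ne, cu, nu in strata.values())
mh_den = sum(cu * ne / (ne + nu) for ae, ne, cu, nu in strata.values())
mh_rr = mh_num / mh_den

print(round(crude_rr, 2), round(mh_rr, 2))  # crude 1.64 vs adjusted 1.03
```

Here the exposed group contains more severe patients, so the crude risk ratio suggests harm (1.64) while the stratum-adjusted estimate is close to null (1.03): the apparent effect was confounding by severity.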
Among the studies dealing with the issue of confounding in pharmacoepidemiology, the most commonly described type is confounding by indication for treatment (the treatment decision is associated with an indication, which is in turn a risk factor for the disease), mentioned in one-third of the articles reviewed (see Table 1). Confounding by indication, often also referred to as channeling bias, is closely related to selection bias [22]. Useful analytical control methods proposed include separating the effects of a drug taken at different times [23], sensitivity analysis for unmeasured confounding factors (see below), and the use of instrumental variables [24]. Furthermore, according to the literature reviewed, there seems to be general agreement that conventional methods for control of confounding factors are inadequate for controlling time-dependent confounding (mentioned in 6.0% of the articles reviewed, see Table 1). G-estimation [25] and marginal structural models [26] are alternative methods for achieving such control.
More than a quarter of the articles included in the scoping review consider the absence of quality data to control for potential confounding variables an important limitation of observational pharmacoepidemiologic studies using secondary databases (see Table 1). Proposed strategies for the control of unmeasured variables include the performance of sensitivity analyses and the use of information external to the database [27-29]. Instrumental variable techniques, proxy measures and propensity scores (excluding from the analysis treated and untreated subjects with extreme values) have also been used [30]. In the design stage, case-crossover study designs, in which each study participant receives all treatments under investigation but at different times [31], and restriction to an active comparison group can be useful. The active comparator design emulates a head-to-head randomized controlled trial: instead of using a non-user group, the drug of interest is compared with another drug commonly used for the same indication. By ensuring that treatment groups have similar characteristics, this design potentially helps to mitigate both measured and unmeasured confounding [32]. In any event, with the exception of crossover designs, where the order in which a study participant receives the treatments is randomized, control for unmeasured variables will never be optimal or, at best, one could never be sure that it is. Even then, the crossover design may still be affected by time-dependent confounding.
In this context, Hernán has proposed a new approach based on the use of observational data from a large health care database to emulate a hypothetical randomized trial (the target trial) [33]. Although the emulated target trial helps avoid common methodologic pitfalls, appropriate adjustment for time-dependent confounders remains critical [34].
In contrast to clinical trials, an advantage of observational pharmacoepidemiologic studies whose populations are drawn from large health care databases is the inclusion of frail patients. However, some authors have argued that, because frailty is difficult to measure and is a strong risk factor for unfavorable outcomes, it will lead to unmeasured and residual confounding, and possibly to paradoxical results [35, 36]. Frailty is thus an example of an unmeasured confounding variable [14, 15].
About 5% of the reviewed articles deal with the healthy user effect (see Table 1), a type of confounding that arises because patients with healthier behaviors generally demand medical attention more frequently for preventive treatments or asymptomatic chronic diseases. These patients are also more likely to be better adherers. Accordingly, part of the apparent efficacy/safety of the treatment will be due not to the treatment per se, but rather to the healthier behaviors associated with those taking it [18, 37]. In observational studies of pharmacoepidemiologic databases, these types of behavior are seldom measured, making it very difficult to control for their effect [38].
Almost half of the articles included in the scoping review mention some type of selection bias. Within this category, protopathic bias is worth highlighting. Although this bias is not widely mentioned in our review (3.4%, see Table 1), possibly because it is unusual for the treatment to be associated with subclinical states and/or early symptoms of the disease, its impact may be important. Controlling protopathic bias is not easy, since it is not a confounding bias and adjustment techniques are thus useless. Instead, one must restrict the exposure group to patients with indications unrelated to the initial states of the disease under study. Another option is to use the concept of lag-time to define the etiologic window in which exposure to the drug is assessed [39].
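The lag-time idea can be sketched in a few lines. This is our own hypothetical rule for illustration, not one drawn from the reviewed articles: prescriptions issued shortly before the outcome are discarded when classifying exposure, since they may have been triggered by early symptoms of the disease itself.

```python
from datetime import date, timedelta

# Hypothetical lag-time rule: prescriptions issued within `lag_days` before
# the outcome are ignored when classifying exposure, since they may have
# been prompted by early (protopathic) symptoms of the disease itself.
def classify_exposure(prescription_dates, outcome_date, lag_days=180):
    window_end = outcome_date - timedelta(days=lag_days)
    return any(d <= window_end for d in prescription_dates)

outcome = date(2017, 6, 1)
# A prescription one year earlier falls inside the etiologic window -> exposed.
print(classify_exposure([date(2016, 6, 1)], outcome))  # True
# A prescription one month before the outcome is discarded by the lag -> unexposed.
print(classify_exposure([date(2017, 5, 1)], outcome))  # False
```

The choice of `lag_days` is itself an assumption about how long the disease can remain subclinical, and would normally be varied in a sensitivity analysis.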
Consumption of medicines under real-world conditions is subject to important variations (e.g. changes in dose, treatment interruptions, dropouts), especially in the management of chronic diseases. This variability may be due to changes in the disease (increasing or decreasing severity) or in the effect of the drug (adverse events or interactions). The traditional approach, an “as-treated” analysis in which subjects who interrupt their treatment during follow-up are censored, may introduce bias, since censored subjects (losses to follow-up) are systematically at higher or lower risk of developing the outcome [40, 41]. In practice, this informative censoring (mentioned in only 2.6% of the articles reviewed, see Table 1) leads to a selection bias: for example, if the expected clinical effects are not achieved, the treatment is suspended or modified, so the analysis selects data from patients for whom the treatment produces the expected outcome [42]. This bias may be identified through sensitivity analyses. In this regard, the use of databases represents an important advantage, as information on the outcome may be available even after the treatment is suspended. To control the bias introduced by an exposure that varies with time, it can be useful to model that exposure as a time-dependent variable in an appropriate multivariate regression model. Procedures based on inverse probability of censoring weighting have also been proposed [43].
Judging by the number of articles that mention it (10.3%), greater importance has been given to another type of selection bias known as depletion of susceptibles, which is caused by the inclusion in the study of both prevalent and incident treatment users (see Table 1). Prevalent users (“survivors” of the first treatment period) may not have the same risk of an adverse event as incident (new) users: those who tolerate the medication continue using it, while those who do not tolerate it (those susceptible to the adverse event) have stopped. This bias can be prevented in the design stage by limiting follow-up to new users [44]. The new-user design allows potential confounding factors to be measured just before the start of follow-up, so that they are not affected by the treatment; adjustment for differences between treatment groups then uses the baseline values of the confounders [45].
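A new-user restriction can be sketched in a few lines. This is a minimal illustration under our own assumptions about the data layout (dispensing dates per patient plus the start of their observable time in the database), with a 365-day washout:

```python
from datetime import date

# Sketch of a new-user (incident user) restriction: a patient qualifies only
# if their first dispensing occurs at least `washout_days` after the start of
# their observable time in the database, so prior use can be ruled out.
def is_new_user(obs_start, dispensing_dates, washout_days=365):
    if not dispensing_dates:
        return False  # never dispensed the drug; not a user at all
    first_dispensing = min(dispensing_dates)
    return (first_dispensing - obs_start).days >= washout_days

# First dispensing ~17 months after entering the database: incident user.
print(is_new_user(date(2010, 1, 1), [date(2011, 6, 1), date(2011, 9, 1)]))  # True
# First dispensing 2 months after entry: prior use cannot be excluded.
print(is_new_user(date(2010, 1, 1), [date(2010, 3, 1)]))  # False
```

Follow-up for a qualifying patient would then start at the first dispensing, with confounders measured just before that date.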
Apart from ensuring an appropriate adjustment for confounding, the new-user design potentially reduces immortal time bias (see below) when combined with the active comparator design, by implementing similar definitions of the index date across comparison groups [32]. The new-user design combined with the active comparator design can also reduce confounding by indication and other unmeasured patient characteristics (e.g. frailty, healthy user effect) at the design stage [46].
As our results suggest, one of the major challenges in the analysis of observational data is the missing data issue [47], which is mentioned in almost one in every five articles included in the scoping review (see Table 1). If the probability of missing an observation is independent of both observed and missing data, complete cases can be assumed to be a random sample of the full dataset (i.e. missing completely at random [48]). In this case, dropping cases with missing data may give unbiased estimates. However, in multivariate analysis, observations (or subjects) are eliminated whenever data are missing for any variable included in the model; as a consequence, missing values may lead to a substantial attrition of the sample size. If this lack of information is associated with an important characteristic (e.g. severity, frailty), an effect equivalent to selection bias is produced.
Sometimes it is assumed that the probability of missing an observation can be predicted by previously measured variables and does not further depend on unmeasured variables (i.e. missing at random [48]); that is, the probability of dropout depends only on observed values. Although standard analysis of the available cases is potentially biased in this case, methods that provide valid analyses are available, though they require additional, appropriate statistical modeling.
In both circumstances described above, likelihood-based methods (e.g. mixed models), in which missing data can be estimated using the conditional distribution of the other variables, can be useful for controlling bias [49]. Alternative techniques, such as multiple imputation, preserve the natural variability of the data [50] and incorporate the uncertainty due to missing data [51], yielding similar results. Inverse probability weighting, in which complete cases are weighted by the inverse of their probability of being a complete case, is also commonly used to reduce this bias. While multiple imputation requires a model for the distribution of the missing data given the observed data, inverse probability weighting requires a model for the probability that an individual is a complete case [52]. In any case, it is important that all covariates on which missingness depends be included in the model.
Conversely, if the fact that an observation is missing is predicted by unmeasured variables, such as the outcome of interest (i.e. missing not at random, sometimes called “non-ignorable non-response” or “informative missingness”), then no statistical approach can give unbiased estimates. When missingness cannot be empirically modelled, the recommended approach is to conduct sensitivity analyses to determine the extent of missingness [53].
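The contrast between a naive complete-case analysis and inverse probability weighting can be shown with a toy example. All numbers here are hypothetical; in practice the probability of being a complete case would itself be estimated, e.g. with a logistic model fitted on fully observed covariates:

```python
# Toy data: (outcome, prob_complete, observed). The outcome is more often
# missing among high-risk patients (y = 5, observed with probability 0.25),
# so the complete-case mean is pulled toward the low-risk group (y = 1).
records = [(1, 1.00, True)] * 4 + [(5, 0.25, True)] + [(5, 0.25, False)] * 3

complete = [(y, p) for y, p, observed in records if observed]

# Naive complete-case mean: biased because missingness depends on risk.
cc_mean = sum(y for y, _ in complete) / len(complete)

# Inverse probability weighting: each complete case stands in for the
# 1/p cases like it (observed or not), recovering the full-sample mean.
ipw_mean = sum(y / p for y, p in complete) / sum(1 / p for _, p in complete)

print(cc_mean, ipw_mean)  # 1.8 (biased) vs 3.0 (true mean of all 8 records)
```

Note that the correction works only because missingness is fully explained by the modeled probability; under missing not at random, no reweighting of the observed data can recover the truth.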
After confounding by indication and unmeasured/residual confounding, our results show that the bias most frequently described in studies using secondary health care databases is that due to systematic misclassification errors which distort the association between treatment and outcome.
Exposure or outcome misclassification, mentioned in almost half of the articles included in the scoping review (see Table 1), can give rise to measurement biases and heterogeneity [17, 54, 55]. To prevent this, a validation study of these variables should first be conducted, followed by a sensitivity analysis or the application of regression techniques [56]. Medical records are normally considered the gold standard or reference for intermediate and final outcome variables, but have limitations in recording all medications taken by patients [57]. Dispensing records measure exposure in more detail (though they do not record over-the-counter or out-of-pocket consumption at an individual level), but lack outcome variables [1, 3, 58, 59]. It is therefore important to link both types of data source [60, 61] and to consider, when necessary, the use of additional data collected expressly for research purposes [15, 62, 63], to avoid errors that may generate misleading conclusions [64, 65].
The last category of bias identified was that related to time. It must be borne in mind, however, that the mechanism underlying a time-related bias may be closely related to the other, larger categories described (i.e. confounding, selection or measurement bias). By far the most frequently described time-related bias is immortal time bias, mentioned in one in every four articles reviewed (see Table 1). Immortal time bias, in which follow-up includes a period during which the study event cannot occur or is excluded from the analysis because of an incorrect definition of the start of follow-up, resurged with a number of observational studies that reported surprisingly beneficial effects of drugs [66, 67] and is increasingly being described in cohort studies of pharmacoepidemiologic databases [68-70]. Suissa warns of the risk of reporting absurd conclusions if inappropriate data-analysis methods are used [69-75]. To prevent this, the entire follow-up time, including that preceding the start of exposure, must be considered, and exposure during immortal time must be correctly classified [76]. Applying a Cox model with time-dependent exposures yields more reliable estimates [69, 77, 78].
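The correct classification of immortal person-time amounts to splitting each patient's follow-up at treatment start, producing the (start, stop, exposed) intervals that a time-dependent analysis expects. A minimal sketch under our own assumptions (day counts measured from cohort entry; interval layout is illustrative, not tied to any particular software):

```python
# Sketch: split follow-up at treatment start so that the "immortal"
# pre-treatment person-time is classified as unexposed rather than
# being credited to the treated group.
def split_follow_up(entry_day, treatment_day, end_day):
    """Return (start, stop, exposed) intervals in days since cohort entry."""
    if treatment_day is None or treatment_day >= end_day:
        return [(entry_day, end_day, 0)]  # never (effectively) treated
    intervals = []
    if treatment_day > entry_day:
        intervals.append((entry_day, treatment_day, 0))  # immortal time: unexposed
    intervals.append((treatment_day, end_day, 1))        # exposed thereafter
    return intervals

# Treated on day 120 of 400 days of follow-up: days 0-120 count as unexposed.
print(split_follow_up(0, 120, 400))   # [(0, 120, 0), (120, 400, 1)]
print(split_follow_up(0, None, 300))  # [(0, 300, 0)]
```

Misclassifying the first interval as exposed, or dropping it, is precisely what guarantees the treated group an event-free head start and produces the spuriously beneficial effects described above.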
Limitations
This scoping review presents the limitations inherent to this type of study design. In contrast to classical systematic reviews, which aim to answer a clearly defined research question, scoping studies are less likely to address very specific research questions or, consequently, to assess the quality of the included studies [79]. In this sense, potential reviewer bias in the assessment of the restriction criteria cannot be ruled out, since these criteria are not based on a measurable quality of the identified references. However, we do not believe this undermines the purpose or the conclusions of the review.
Owing to the exploratory nature of this review, its purpose was not to obtain all available evidence on a specific topic, but rather to capture evidence from a subset of the literature on a broad topic (bias in observational pharmacoepidemiologic studies using secondary data sources) to which many different study designs might apply (opinion essays, methodological reviews, analyses, letters to the editor or retractions). Although a broad search strategy was employed, some relevant studies may have been missed, so the existence of some selection bias cannot be ruled out. Furthermore, the search strategy itself, intentionally designed to identify articles that highlight the limitations of secondary databases, does not allow an unbiased comparison with articles that may show their advantages.
Given these limitations, and the fact that information on bias was extracted from the descriptions provided by the original authors, a further limitation concerns the quantification of each type of bias. These figures should be interpreted as an approximate measure of the impact of each bias on the published literature (i.e. what is prominently discussed), not as an estimate of the probability of occurrence (or detection) of the bias in the population of pharmacoepidemiologic studies using secondary databases, since the counts may be influenced by how easily a specific bias can be described or by the interest it has raised among the most prolific authors in the field (e.g. immortal time bias). It is therefore possible that some biases have been misclassified to a certain degree.