Background
As evidence-based practice has grown over the past two decades, new randomized controlled trials (RCTs) and systematic reviews have been generated at a steady pace in medicine and dentistry. Currently, thousands of RCTs and meta-analyses of these trials are published every year to guide healthcare professionals in their evidence-based decisions in clinical practice. In the field of dentistry alone, nearly 50 new clinical trials and 20 systematic reviews are published every month [1-3]. These trials and systematic reviews, in turn, underpin many of the treatment modalities and recommendations in dental practice based on the current best available evidence. While RCTs, the building blocks of systematic reviews and meta-analyses, are considered to provide reliable evidence for dental decision making, they are susceptible to bias (underestimation or overestimation of treatment effect size (ES) estimates) arising from limitations in their design, conduct, and reporting [4,5]. For the results and outcomes of RCTs to be valid and generalizable to specific patient subsets, trials must be properly designed, carefully conducted, and accurately reported to a standard that warrants the implementation of their results [4,6].
Blinding (or “masking”) has been recognized as an important criterion of high methodological quality, particularly with respect to the internal validity of RCTs [7]. Blinding is broadly used in a trial to prevent performance bias (blinding of participants and care providers) and detection bias (blinding of assessors) [8-10]. Blinding can be applied at numerous levels of a trial, including participants, outcome assessors, care providers, data analysts, and other personnel; thus, several terms (e.g., single-, double-, or triple-blind) have been used to describe blinding types [6,11,12]. However, the use of these terms has been inconsistent among research groups, which has contributed to conceptual and operational ambiguity. While appropriate blinding can reduce performance and detection biases, it is not always feasible to apply blinding in a trial, particularly in an RCT involving surgical or device interventions, such as in oral surgery and orthodontics, where participants are often aware of the type of intervention they are receiving. The appropriateness of blinding depends on factors such as the type of outcome examined (e.g., objective vs. subjective) [10] and the type of intervention applied (e.g., surgical vs. drug), among others. For example, it is more difficult to implement blinding in RCTs of surgical interventions than in RCTs of drug interventions, where trial investigators can use placebo medications to attain adequate blinding [13].
Published meta-epidemiological studies focused on the blinding domain have found potential associations between treatment ESs and blinding of participants [14-19], care providers [15-17,19], assessors [15,17-21], and “double blinding” [22,23]. While those meta-epidemiological investigations were conducted within numerous health fields, the value of their conclusions may be limited when generalized to other healthcare fields, for several reasons: a failure to evaluate continuous outcomes because of a preference for assessing dichotomous outcomes [15,21,23], inconsistent methodological findings associated with treatment ESs [15,17,22], and studies being “underpowered” [24] through lack of the sample size needed to properly quantify bias in RCTs. More notably, meta-epidemiological studies have reported that the extent of bias in the treatment ES associated with blinding varied across different medical fields as well as across different types of intervention [17,24].
To date, no meta-epidemiological study has examined bias related to blinding in RCTs within any oral health subspecialties or scope of practice in dentistry. Therefore, it is unclear whether the previously mentioned conclusions hold true in the field of oral health research where blinding is sometimes difficult or not feasible, especially in oral health RCTs involving surgical or device interventions, such as orthodontic trials.
Thus, our specific research questions were: (1) Do oral health RCTs with adequate blinding of participants, outcome assessors, and health care providers yield different treatment ESs than trials with absent or unclearly reported blinding? (2) Do specific nonmethodological meta-analysis characteristics (e.g., dental specialty, type of treatment, type of outcome [objective vs. subjective], magnitude of the treatment ES estimate, heterogeneity of meta-analysis) modify the association between blinding and treatment ES estimates? Findings generated from this work could be used to improve the conduct and reporting of oral health RCTs.
Discussion
Our investigation provides empirical evidence of the impact of bias associated with nine blinding-based criteria (related to patient, assessor, care-provider, and principal-investigator blinding) on the treatment ES estimate. This analysis is important to methodologists and researchers in dental, oral, and craniofacial research. To our knowledge, this is the first meta-epidemiological study in any medical or dental field to use continuous outcomes to examine the impact on treatment ES estimates of blinding of both patients and assessors, and of patients, assessors, and care providers, in randomized trials.
Our study shows significant differences in treatment ES estimates in oral health RCTs based on different types of blinding. For example, RCTs lacking patient and assessor blinding had significantly larger treatment ES estimates than trials in which patients and assessors were blinded. Lack of patient blinding and lack of assessor blinding were each associated with inflated treatment ES estimates (significantly so for patient blinding), while lack of care-provider and principal-investigator blinding was not. Interestingly, lack of blinding of both assessors and patients was associated with the largest overestimation of the treatment ES estimate (0.19). This measured magnitude of bias represents approximately one third of common treatment ES estimates reported in oral health research [50], such as clinical outcomes in periodontology [46]. The possibility that treatment ES estimates in oral health trials have been biased by lack of blinding is concerning, as clinical decision making about recommended dental treatments and modalities may therefore not be based on valid findings.
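The "one third" relationship is simple arithmetic; as a sketch, assuming a typical standardized treatment ES of about 0.6 (an illustrative value, not a figure from this study):

```python
# Illustrative arithmetic only. The bias of 0.19 comes from the text;
# the "typical" effect size of 0.6 is an assumed, hypothetical value.
bias = 0.19        # overestimation associated with lack of patient and assessor blinding
typical_es = 0.60  # assumed common standardized treatment effect size

relative_bias = bias / typical_es
print(f"A bias of {bias} is {relative_bias:.0%} of a typical effect of {typical_es}")
```

Under that assumption, roughly a third of an apparent treatment effect could be attributable to bias rather than to the intervention itself.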
The stratified analyses showed that the extent of bias associated with lack of blinding was not significantly associated with any other factor considered at the meta-analysis level. This agrees with a recent study conducted in the area of physical therapy, and is contrary to other meta-epidemiological studies [51], which showed that trials with subjective outcomes exaggerated treatment ES estimates compared to trials with objective outcomes. This could be due to the small number of trials with objective outcomes in our study, or to differences between interventions in different medical disciplines.
Reports examining the impact of lack of blinding of patient, therapist, or assessor on treatment ES estimates have been conducted in particular medical fields, such as physical therapy [19], thrombosis and cardiovascular disease [15,21], pediatrics [18], osteoarthritis [45], and low-back pain [16], and have reported inconsistent findings. The treatment ES estimate was smaller in trials that employed patient blinding [15] or assessor blinding [20,21] in some studies, whereas in other studies the treatment ES estimate was smaller in trials that lacked patient [17] or assessor blinding [15]. However, an association between the treatment ES estimate and the presence or lack of blinding was not confirmed in some studies [16,45]. Furthermore, while the definition of double blinding varied largely among the meta-epidemiological studies with respect to the level of blinding (patient, assessor, and care-provider blinding), a lack of double blinding was generally found to be associated with exaggerated treatment ES estimates [22,23,52]. The inconsistent findings might be due to the examination of different types of outcome, intervention, and population, to the implementation of different definitions of quality assessment, and to the application of various statistical and modeling approaches [24]. For example, Schulz et al. [52] applied a multiple logistic regression model to analyze data on binary outcomes from 250 trials included in 33 meta-analyses; their definition of double blinding was based on whether the trial’s conduct was claimed to be double-blinded. Egger et al. [22] defined “double blinding” based on whether the trial was described as double-blind or included at least assessor blinding; that study analyzed data from 304 trials included in 39 meta-analyses with binary outcomes in several medical fields (infectious diseases and neurology, among others).
Two recent studies [18,19] that examined the association between lack of blinding of patient, therapist, or assessor and treatment ES using continuous outcomes also reported inconsistent findings. One study assessed the adequacy of patient and assessor blinding in 287 pediatric trials from 17 meta-analyses [18] and showed no significant difference in treatment ESs between studies based on potential bias related to lack of blinding. Another study assessed 165 physical therapy trials included in 17 meta-analyses and found that trials lacking patient or assessor blinding tended to underestimate treatment ES estimates compared with trials with appropriate blinding (although the differences were not statistically significant) [19]. It should be noted that in both studies, the lack of significant results might be accounted for by the small number of trials, the precision of the analyses performed, and/or the examination of interventions where blinding is not crucial (i.e., outcomes are objective or automated, with no assessor involvement).
Because the concept of blinding is implemented at multiple levels of a trial (e.g., patients, assessors, care providers, data analysts, investigators), confusion arises when describing the level of blinding implemented. For example, “double blinding” or “triple blinding” may refer to blinding at any two or three of these levels, and failure to clearly report which levels such terms refer to compounds the confusion. Investigators of RCTs conducted in the field of dentistry need to implement and clearly report blinding of patients, assessors, care providers, data analysts, and other personnel when applicable, and explicitly report the mechanisms used to achieve and assure successful blinding, as recommended by the Consolidated Standards of Reporting Trials (CONSORT) statement. In addition, investigators of RCTs should state the levels (e.g., patients, assessors, care providers) and components (e.g., allocation, outcomes assessed, details of interventions) they are referring to when they describe blinding of a trial. They should also avoid the terms “double” or “triple” blind when reporting trial findings, and instead report who was blinded and for which components blinding was achieved, so the reader can evaluate potential associated bias. As well, editors and peer reviewers of dental journals should require authors of randomized trials to adhere to the CONSORT guidelines and insist on adequate conduct and reporting of blinding in submitted randomized trials.
When we examined the association between double blinding and treatment ES, we performed the analysis on two different criteria: reporting of “double blinding” as a term in a trial, and actual conduct of blinding of both assessors and patients. Haahr and Hróbjartsson [53], who examined a random sample of RCTs from the Cochrane Central Register of Controlled Trials, suggested that it is incorrect to assume blinding of a trial participant based only on the term “double blind.” Their study found that blinding of patients, care providers, and assessors was clearly described in only three (2%) of 200 blinded RCTs, while 56% of trials failed to describe the blinding status of any individual involved; they concluded that patients, care providers, or assessors were not blinded in one in five “double blind” RCTs. Another study [54] showed that inadequate reporting of blinding was common in some medical journals, and that inadequate reporting does not necessarily entail a lack of actual blinding: RCT authors frequently use blinding but fail to describe its methods. For instance, authors of RCTs failed to report the blinding status of patients in 26% of trials, yet patients were actually blinded in 20% of the trials in which patients were not reported to be blinded. Similar results were found in a recent study by Kahan et al. [55], who reported that blinding of outcome assessors was uncommonly used and inadequately reported in a cohort of 258 trials published in four high-impact medical journals.
An implication that can be drawn from our meta-epidemiological work is that authors of systematic reviews of oral health interventions should consider excluding dental RCTs that lack blinding from meta-analyses, or at least performing sensitivity analyses on included trials based on the adequacy of blinding. In all instances, authors should consider the likely level of bias associated with reported (or unreported) blinding status when interpreting the findings of a quantitative analysis.
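As a minimal sketch of such a sensitivity analysis, the following fixed-effect inverse-variance pooling compares the pooled estimate from all trials with the estimate from adequately blinded trials only; all effect sizes, standard errors, and blinding labels here are hypothetical:

```python
import math

# (effect size, standard error, adequately_blinded) -- hypothetical trials
trials = [
    (0.62, 0.15, False),
    (0.55, 0.20, False),
    (0.40, 0.12, True),
    (0.35, 0.18, True),
    (0.44, 0.16, True),
]

def pool(subset):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1 / se**2 for _, se, _ in subset]
    est = sum(w * es for w, (es, _, _) in zip(weights, subset)) / sum(weights)
    return est, math.sqrt(1 / sum(weights))

all_est, all_se = pool(trials)
blinded_est, blinded_se = pool([t for t in trials if t[2]])

print(f"All trials:        {all_est:.2f} (SE {all_se:.2f})")
print(f"Adequate blinding: {blinded_est:.2f} (SE {blinded_se:.2f})")
print(f"Difference:        {all_est - blinded_est:.2f}")
```

In this invented example, including the unblinded trials inflates the pooled estimate; a review team would report both figures and discuss the discrepancy.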
The above-mentioned implications should be considered with caution, particularly in oral health trials involving surgical or device interventions (such as orthodontic trials), where patient blinding is not feasible; in these cases, informing patients of the details of the intervention is required, and sometimes ethically compulsory. While such RCTs are prone to biases, particularly when they examine self-reported outcomes, implementation of blinding in their conduct is often unacceptable for ethical and practical reasons. For example, in trials comparing surgical with nonsurgical interventions (e.g., surgical removal of wisdom teeth versus retention or conservative management), patients and surgeons cannot be blinded. However, trialists may consider using an “expertise-based” trial design, whereby patients are allocated to multiple surgeons and each surgeon performs a single treatment [56]. While this design helps to minimize performance bias related to surgeon blinding, it does not ensure patient blinding [57]. Furthermore, in trials where patients cannot be blinded (e.g., comparison of manual versus electric toothbrushing), trialists may consider using objective outcomes with established validity and reliability [56], or blinding patients to the trial’s hypothesis. When blinding is feasible, trialists should blind as many trial components (participants, assessors, care providers, statisticians, investigators) as ethically and practically possible.
Based on this evidence, investigators of systematic reviews conducted in dental, oral, and craniofacial research should perform sensitivity analyses based on the adequacy of blinding in included trials. The potential impact of blinding on bias in treatment ES suggests that dental journal editors and reviewers should insist on adequate blinding (when feasible) in the conduct and reporting of published trials.
Strengths and limitations of the study
This meta-epidemiological study provides an empirical analysis of the association between treatment ES estimates and bias in the domain of oral health research. Nevertheless, the study has several limitations.
First, we examined published studies only (bias was based on reported methodological characteristics), and did not evaluate actual conduct of the RCTs. Accordingly, data extraction and analyses were based on information given by authors in published reports. This approach, although widely used, limits the identification of actual bias if trial authors do not adequately report study elements.
Second, while there are many ways for an RCT planned as blinded to become unblinded [58], our study did not use specific mechanisms to look for evidence of unblinding, such as differential (across treatment groups) incidences of specific adverse events that would give away which patients received which interventions, or large baseline imbalances indicative of the type of selection bias that may occur with unsuccessful allocation concealment [59,60]. Also, our study did not examine how many RCTs reported a valid and reliable method of assessing the success of blinding, such as the Berger-Exner test of selection bias [58]. Accordingly, future RCTs should routinely conduct and report the results of a valid and reliable method of assessing the success of blinding (such as the Berger-Exner test) based on the extent to which any unblinding led to selection bias [61,62].
Third, certain levels of heterogeneity are expected in any meta-epidemiological examination of the impact of bias on treatment ES estimates, as such studies analyze numerous entities (meta-analyses, trials, and participants), each with a distinct potential for heterogeneity [24]. By applying a cautious methodology to data collection and analysis, and by assembling a large number of meta-analyses and trials, we increased study power and reduced heterogeneity.
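The heterogeneity in question is conventionally quantified with Cochran's Q and the I² statistic; as a brief sketch using entirely hypothetical trial-level effect sizes and standard errors:

```python
# Hypothetical standardized effect sizes and their standard errors
effects = [0.30, 0.55, 0.10, 0.45]
ses     = [0.12, 0.15, 0.10, 0.20]

# Fixed-effect inverse-variance pooled estimate
weights = [1 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled estimate
Q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1
I2 = max(0.0, (Q - df) / Q)  # proportion of variability beyond chance

print(f"pooled = {pooled:.2f}, Q = {Q:.2f}, I^2 = {I2:.0%}")
```

An I² well above zero, as in this invented example, signals between-trial variability beyond sampling error, which is why assembling many meta-analyses and trials matters for stable estimates.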
Fourth, because our study did not compare the same treatment with different degrees of blinding, the identified evidence could lead to the conclusion that trials of interventions where blinding is not feasible, such as surgical or device trials, generally have higher treatment ES estimates. Future meta-epidemiological studies should investigate this possibility further.
Finally, this study did not assess the likely effects of interactions with other design biases; such an assessment would require a multivariate analysis with a larger number of meta-analyses and trials [17]. Future meta-epidemiological studies that assemble a greater number of meta-analyses and trials by synthesizing results from different disciplines and datasets should take other design biases into account.