
Clinical assessment to screen for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults

This is not the most recent version


Abstract


Background

The early detection and excision of potentially malignant disorders (PMD) of the lip and oral cavity that require intervention may reduce malignant transformations (though will not totally eliminate malignancy occurring), or if malignancy is detected during surveillance, there is some evidence that appropriate treatment may improve survival rates.

Objectives

To estimate the diagnostic accuracy of conventional oral examination (COE), vital rinsing, light‐based detection, biomarkers and mouth self examination (MSE), used singly or in combination, for the early detection of PMD or cancer of the lip and oral cavity in apparently healthy adults.

Search methods

We searched MEDLINE (OVID) (1946 to April 2013) and four other electronic databases (the Cochrane Diagnostic Test Accuracy Studies Register, the Cochrane Oral Health Group's Trials Register, EMBASE (OVID), and MEDION) from inception to April 2013. The electronic databases were searched on 30 April 2013. There were no restrictions on language in the searches of the electronic databases. We conducted citation searches, and screened reference lists of included studies for additional references.

Selection criteria

We selected studies that reported the diagnostic test accuracy of any of the aforementioned tests in detecting PMD or cancer of the lip or oral cavity. Diagnosis of PMD or cancer was made by specialist clinicians or pathologists, or alternatively through follow‐up.

Data collection and analysis

Two review authors independently screened titles and abstracts for relevance. Eligibility, data extraction and quality assessment were carried out by at least two authors independently and in duplicate. Studies were assessed for methodological quality using QUADAS‐2. We reported the sensitivity and specificity of the included studies.

Main results

Thirteen studies, recruiting 68,362 participants, were included. These studies evaluated the diagnostic accuracy of COE (10 studies) and MSE (two studies). One randomised controlled trial of test accuracy directly evaluated COE and vital rinsing. There were no eligible diagnostic accuracy studies evaluating light‐based detection or blood or salivary sample analysis (which tests for the presence of bio‐markers of PMD and oral cancer). Given the clinical heterogeneity of the included studies in terms of the participants recruited, setting, prevalence of target condition, the application of the index test and reference standard and the flow and timing of the process, the data could not be pooled. For COE (10 studies, 25,568 participants), prevalence in the diagnostic test accuracy sample ranged from 1% to 51%. For the eight studies with prevalence of 10% or lower, the sensitivity estimates were highly variable, and ranged from 0.50 (95% confidence interval (CI) 0.07 to 0.93) to 0.99 (95% CI 0.97 to 1.00) with uniform specificity estimates around 0.98 (95% CI 0.97 to 1.00). Estimates of sensitivity and specificity were 0.95 (95% CI 0.92 to 0.97) and 0.81 (95% CI 0.79 to 0.83) for one study with prevalence of 22% and 0.97 (95% CI 0.96 to 0.98) and 0.75 (95% CI 0.73 to 0.77) for one study with prevalence of 51%. Three studies were judged to be at low risk of bias overall; two were judged to be at high risk of bias resulting from the flow and timing domain; and for five studies the overall risk of bias was judged as unclear resulting from insufficient information to form a judgement for at least one of the four quality assessment domains. Applicability was of low concern overall for two studies, of high concern overall for three studies because of a high risk population, and unclear overall for five studies.
Estimates of sensitivity for MSE (two studies, 34,819 participants) were 0.18 (95% CI 0.13 to 0.24) and 0.33 (95% CI 0.10 to 0.65); specificity for MSE was 1.00 (95% CI 1.00 to 1.00) and 0.54 (95% CI 0.37 to 0.69). One study (7975 participants) directly compared COE with COE plus vital rinsing in a randomised controlled trial. This study found a higher detection rate for oral cavity cancer in the conventional oral examination plus vital rinsing adjunct trial arm.

Authors' conclusions

The prevalence of the target condition both between and within index tests varied considerably. For COE, estimates of sensitivity over the range of prevalence levels varied widely. Observed estimates of specificity were more homogeneous. Index tests at a prevalence reported in the population (between 1% and 5%) were better at correctly classifying the absence of PMD or oral cavity cancer in disease‐free individuals than at classifying the presence in diseased individuals. Incorrectly classifying disease‐free individuals as having the disease would have clinical and financial implications following inappropriate referral; incorrectly classifying individuals with the disease as disease‐free will mean PMD or oral cavity cancer will only be diagnosed later when the disease will be more severe. General dental practitioners and dental care professionals should remain vigilant for signs of PMD and oral cancer whilst performing routine oral examinations in practice.

Plain language summary

The detection of oral cavity cancers and potentially malignant disorders in apparently healthy adults

Cancer of the mouth is a serious condition and only half of those that develop the disease manage to survive after five years. It is commonly preceded by visible lesions, which if identified early, can be treated and could result in simpler surgery and much better outcomes. As a result, there is a need to understand how good different types of tests are at the early detection of oral cancer and the lesions that precede it. The most common method is an oral visual inspection by a clinician, but other tests include the use of a blue 'dye', illumination with a special light and self examination by the individual. The review found a lot of variety in the ability of the different tests to differentiate between healthy mouths and non‐referable lesions and more serious lesions or oral cancer. Overall, visual examination by a front‐line health worker proved to be the best method. Between 59% and 99% of mouth cancers were detected, although sometimes normal tissue was mistaken for oral cancer. The remaining techniques examined were not as good at detecting mouth cancer and identified less than a third of cases.

Authors' conclusions

Implications for practice

There are known clinical and methodological difficulties associated with screening for potentially malignant disorders (PMDs) and cancer of the lip or oral cavity. These include the relatively low incidence rates, the reluctance of screened positive individuals to attend for follow‐up, a lack of linear transition between pre‐malignant and malignant states (Reibel 2003), disagreement over disease management (Warnakulasuriya 2009) and the relative cost‐effectiveness of mass, selective and opportunistic screening programmes (Brocklehurst 2011).

A recent systematic review examined whether screening programmes for oral cancer could detect the disease and reduce the associated mortality of the condition (Brocklehurst 2013). One cluster randomised clinical trial was identified from Kerala in India. The screening programme comprised four cycles over a 15‐year period and involved 13 clusters with 191,873 participants. There was no statistically significant difference in the oral cancer mortality rates between the screened group (15.4/100,000 person‐years) and the control group (17.1/100,000 person‐years). However, when only high risk individuals were included in the analysis (users of tobacco or alcohol or both), there was a reported reduction of 24% in the mortality rate. A statistically significant reduction was also found in the number of individuals diagnosed with late stage disease in the screened group (risk ratio 0.81; 95% confidence interval 0.70 to 0.93). No harms were reported but the study was assessed to be at high risk of bias. Across the four cycles (15 years) of the programme, the reported sensitivity of the visual examination in detecting oral cancer was 67.4% (188/279). No information on the specificity or the positive predictive value of the programme was recorded. However, the latter was calculated based on the published data from the study as the number of screen‐selected oral cancers as a proportion of total screened positive subjects (confirmed by biopsy), which was 86.5% for oral cancer. The cost‐effectiveness of this study was considered to meet the standards of the World Health Organization (Subramanian 2009). Selective screening of high risk groups and opportunistic screening may reduce costs (Speight 2006), but many high risk patients do not attend general dental practices (Netuveli 2006).
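As a worked illustration of the sensitivity figure quoted above, the following minimal sketch recomputes 188/279 and attaches an exact (Clopper‐Pearson) binomial confidence interval. The choice of interval method is an assumption made for illustration; the trial report and this review may have used a different method, and the helper names `binom_cdf` and `exact_ci` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""

    def solve(f, target):
        # f decreases as p increases on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Counts taken from the text: 188 oral cancers detected out of 279.
sensitivity = 188 / 279
lower, upper = exact_ci(188, 279)
print(f"sensitivity = {sensitivity:.1%}, 95% CI {lower:.3f} to {upper:.3f}")
```

The same function can be applied to specificity by passing the number of true negatives and the number of disease‐free participants.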

The lack of any formal registration for PMD, in contrast to malignancy, makes it difficult to estimate possible reductions in mortality due to a screening programme aimed at precursor lesions. In addition, the efficacy of the early management of PMDs is a controversial area and the evidence base has recently been challenged (Holmstrup 2007; Holmstrup 2009). Holmstrup has demonstrated that even if lesions are surgically removed, the risk of malignant change may remain since the lesion represents only a small area in a field of damaged mucosa, any part of which may progress to malignancy.

The results of this review suggest that using the conventional oral examination (COE) for screening for PMD and oral cancer has a variable degree of sensitivity (greater than 0.70 in six of the 10 studies) and a consistently high value for specificity (greater than 0.90 in all eight studies). However, there was considerable clinical heterogeneity in the participants forming the study samples, the application of the index test and reference standard and the flow and timing of the process. Exploring the primary studies for sources of heterogeneity has not shown any single factor to consistently influence the accuracy of the screening test. Evidence on test accuracy is limited for each of the different settings and for each type of examiner (clinician or non‐clinician), although COE has been shown to have good estimates of both sensitivity and specificity in some studies. Further, even though the evidence of accuracy is not consistently strong, there is some evidence (Brocklehurst 2013) that implementing COE as a component of a population screening programme can reduce mortality and produce stage‐shift in a high risk population. Should such findings be replicated in other studies, it could be argued that explicit testing of the accuracy of the COE per se is unnecessary, given the positive outcomes on mortality. Emphasis could instead be placed on the effectiveness of screening programmes, of which COE is a component, in reducing morbidity and mortality. This should be supplemented with information on the consequences of false negative and false positive screens.
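To make the consequences of false negative and false positive screens concrete, the sketch below converts a sensitivity, specificity and prevalence into expected counts per 10,000 people screened. The inputs are illustrative assumptions, not pooled estimates from this review: a sensitivity of 0.70 and a specificity of 0.98 are round figures within the ranges reported for COE, and prevalences of 1% and 5% span the population range mentioned in the abstract. The function name `screening_outcomes` is hypothetical.

```python
def screening_outcomes(sensitivity, specificity, prevalence, n_screened=10_000):
    """Expected 2x2 counts and predictive values for a screened population."""
    diseased = n_screened * prevalence
    healthy = n_screened - diseased
    true_pos = sensitivity * diseased
    false_neg = diseased - true_pos    # missed cases, diagnosed later
    true_neg = specificity * healthy
    false_pos = healthy - true_neg     # unnecessary referrals
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Illustrative inputs only (assumptions, see the paragraph above).
for prevalence in (0.01, 0.05):
    result = screening_outcomes(0.70, 0.98, prevalence)
    print(
        f"prevalence {prevalence:.0%}: "
        f"{result['false_pos']:.0f} false positives, "
        f"{result['false_neg']:.0f} false negatives, "
        f"PPV {result['ppv']:.2f}"
    )
```

Under these assumed inputs, most positive screens at 1% prevalence would be false positives even with high specificity, which is why the text asks for accuracy estimates to be read alongside the downstream costs of referral.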

There is insufficient evidence to deviate from the conclusions of the American Dental Association that oral cancer screening may detect PMDs and cancer of the lip or oral cavity (Rethman 2010). General dental practitioners and dental care professionals should remain vigilant for signs of PMD and oral cancer whilst performing routine oral examinations in practice.

The sensitivity estimates for mouth self examination were lower than for COE, though these studies were on different participant samples and should not be directly compared. There is insufficient evidence to satisfactorily determine the diagnostic test accuracy of mouth self examination as part of an organised screening programme.

Implications for research

It is clear that there are some methodological shortcomings in the studies included in this review. The QUADAS‐2 tool has provided a robust means of assessing the methodological quality of the included studies. There is now an opportunity to use this framework, along with the guidance from the Cochrane Diagnostic Test Accuracy Editorial Group, to ensure that future studies are conducted in a robust manner, with particular attention paid to the design of the study in the four domains of the QUADAS‐2 tool. It is imperative that studies are reported with sufficient information to allow judgement of the merits of the study and its applicability to the review being undertaken. Reporting according to the STARD checklist should facilitate this process. In particular, results have been promising in the workplace setting, and for some opportunistic screening studies.
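The way per‐domain QUADAS‐2 judgements roll up into an overall judgement can be sketched as follows. The four domain names are those of the QUADAS‐2 tool; the aggregation rule (any "high" gives "high", otherwise any "unclear" gives "unclear", otherwise "low") is an assumption that matches how the overall judgements are described in this review's results, not a rule quoted from the tool itself. The function name is hypothetical.

```python
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")


def overall_risk_of_bias(judgements):
    """Combine per-domain judgements ('low', 'high', 'unclear') into one.

    Assumed rule: one 'high' domain makes the study high risk; failing that,
    one 'unclear' domain makes the overall judgement unclear.
    """
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing judgement for: {', '.join(missing)}")
    values = [judgements[d] for d in DOMAINS]
    if "high" in values:
        return "high"
    if "unclear" in values:
        return "unclear"
    return "low"


# Hypothetical study: sound in three domains, high risk in flow and timing.
example = {
    "patient selection": "low",
    "index test": "low",
    "reference standard": "low",
    "flow and timing": "high",
}
print(overall_risk_of_bias(example))
```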

The population and participant selection should be clearly stated and carried out to reduce the possibility of sampling bias, preferably using a consecutive sample. The index test should be undertaken by trained and calibrated screeners, whose threshold for agreement should be stated a priori. The reference standard should be both accurate and pragmatic to account for the practical considerations involved in establishing the initial diagnostic test accuracy component of large population screening programmes. For such programmes it is not necessary to apply the reference standard to all of the programme's participants; rather, an initial evaluation of test accuracy should be established on a sizeable number of participants prior to commencement of the screening programme proper. It is also important to utilize reference standards that capture all the target conditions under question, not just those that are likely to be identified through cancer registries. Finally, the flow and timing of the diagnostic test accuracy study should ensure that the reference standard is applied after the index test, to avoid introducing bias, and within a short time frame, given the potential for PMD to undergo malignant transformation. Where long‐term follow‐up is used as a reference standard, measures should be taken to minimise attrition. Further research on ways to maximise initial participation rates and also follow‐up rates for those screened positive is warranted.

Summary of findings

Summary of findings 1.

What is the performance of conventional oral examination for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults?

Population: Oral cavity cancer or potentially malignant disorder symptom‐free individuals screened opportunistically, or through an organised screening programme

Index test: Conventional oral examination

Target condition: Oral cavity cancer or potentially malignant disorder

Reference standard: Examination and clinical evaluation by a physician with specialist knowledge or training. Long‐term follow‐up was accepted as a suitable reference standard for those participants screened negative

Studies: Cross‐sectional (consecutive sample) (9) or validation sample in a randomised controlled trial of screening intervention (1)

No. of participants (studies)

Effect (95% CI)

Population: Individuals attending for opportunistic screening (2), organised screening programme (4), validation as part of an organised screening programme or randomised controlled trial (3), screening as part of a routine surveillance appointment (1)

Index test: Conventional oral examination

Prevalence: Range from 1.4% to 50.9%

25,568 (10)

No pooled analysis

Range:

Sensitivity 0.50 (95% CI 0.07 to 0.93) specificity 0.98 (95% CI 0.92 to 1.00)

Sensitivity 0.99 (95% CI 0.97 to 1.00) specificity 0.99 (95% CI 0.99 to 0.99)

CI = confidence interval

Summary of findings 2.

What is the performance of mouth self examination for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults?

Population: Oral cavity cancer or potentially malignant disorder symptom‐free individuals screened through an organised screening programme

Index test: Mouth self examination

Target condition: Oral cavity cancer or potentially malignant disorder

Reference standard: Examination and clinical evaluation by a physician with specialist knowledge or training or trained health worker

Studies: Cross‐sectional studies (or consecutive series) (2)

No. of participants (studies)

Effect (95% CI)

Population: Individuals attending for organised screening programme (2)

Index test: Mouth self examination

Prevalence: 0.6% and 22.6%

34,819 (2)

No pooled analysis

Sensitivity 0.18 (95% CI 0.13 to 0.24) specificity 1.00 (95% CI 1.00 to 1.00)

Sensitivity 0.33 (95% CI 0.10 to 0.65) specificity 0.54 (95% CI 0.37 to 0.69)

CI = confidence interval

Summary of findings 3.

What is the performance of vital rinsing (Toluidine blue) as an adjunct to conventional oral examination compared to conventional oral examination alone?

Population: Oral cavity cancer or potentially malignant disorder symptom‐free individuals with tobacco habits

Index test: Conventional oral examination plus vital rinsing (Toluidine blue) compared to conventional oral examination alone

Target condition: Oral pre‐malignant lesions and malignant lesions

Reference standard: Biopsy. Long‐term follow‐up through the National Cancer Registry

Studies: RCT (1)

No. of participants (studies)

Effect (95% CI)

Population: Individuals attending an organised screening programme

Study: Direct RCT

Index tests: Conventional oral examination plus vital rinsing (Toluidine blue) compared with conventional oral examination alone

Prevalence: 4.6% and 4.4%

7975 (1)

Detection rate of oral pre‐malignant lesions and malignant lesions after referral was 4.6% in conventional oral examination plus vital rinsing arm; 4.4% in conventional oral examination alone. Rate ratio of 1.05 (95% CI 0.74 to 1.41). Incidence rate of oral cancer (× 10⁻⁵) of 28 compared to 35.4. Relative incidence rate of 0.79 (95% CI 0.24 to 1.23)

* Initial screen positive rate higher in the vital rinsing arm (9.5% and 8.3%)

CI = confidence interval; RCT = randomised controlled trial
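As a quick consistency check on Summary of findings 3, the sketch below recomputes the two ratios from the rates reported in the table. It works from the rounded published figures, so it can only confirm agreement to two decimal places, and it cannot reproduce the confidence intervals, which need the underlying counts. The ordering of the two incidence rates (adjunct arm first, COE alone second) is an assumption based on the order in which the arms are listed.

```python
# Rounded figures as reported in the table.
detection_adjunct = 4.6   # % detected, COE plus vital rinsing
detection_coe = 4.4       # % detected, COE alone
incidence_first = 28.0    # oral cancer incidence x 10^-5 (assumed: adjunct arm)
incidence_second = 35.4   # oral cancer incidence x 10^-5 (assumed: COE alone)

rate_ratio = detection_adjunct / detection_coe
relative_incidence = incidence_first / incidence_second

print(round(rate_ratio, 2))          # 1.05
print(round(relative_incidence, 2))  # 0.79
```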

Background

Target condition being diagnosed

The target conditions of interest are oral cavity cancer and potentially malignant disorders (PMD) of the lip and oral cavity. PMD is a term used to describe a range of lesions that present in the mouth and have the potential for malignant transformation. These include: erythroplakia, non‐homogeneous leukoplakia, erosive lichen planus, oral submucous fibrosis and actinic keratosis (van der Waal 2009; Warnakulasuriya 2007).

The natural history of oral cancer is not fully understood (Napier 2008; Scully 2009). Carcinogenesis is a complex disease process; not all oral cancers will be preceded by PMD and not all PMD undergoes malignant transformation. Erythroplakia, non‐homogeneous leukoplakia, erosive lichen planus, oral submucous fibrosis and actinic keratosis are the most important PMDs (Warnakulasuriya 2007) to proceed to carcinoma. Oral leukoplakia is the most common form of PMD and is defined as "white plaques of questionable risk having excluded (other) known diseases or disorders that carry no increased risk for cancer" (Warnakulasuriya 2007). Between < 1% and 18% of oral leukoplakias undergo malignant transformation. The presence of epithelial dysplasia can help predict malignant development in oral leukoplakia but the process is not linear; some mild dysplastic lesions undergo malignant transformation, whilst some severe lesions resolve (Jaber 2003; Reibel 2003). Carcinoma can also develop from lesions in which epithelial dysplasia was not previously diagnosed (Jaber 2003; Reibel 2003). As a result, most authorities regard leukoplakia as a dynamic rather than a static process (Napier 2008). In contrast, PMDs that are red or predominantly red in colour (e.g. erythroplakia and erythroleukoplakias) undergo malignant transformation more readily (Mashberg 1988; Mashberg 1995; Scully 2009).

Estimates of malignant transformation rates (MTR) vary enormously, from site to site within the mouth, from population to population and from study to study (Napier 2008). The MTR of hospital‐based surveys are consistently higher than community‐based studies because of sampling bias. Petti 2003 calculated a global MTR of oral leukoplakia of 1.36% per year (95% confidence interval (CI) 0.69% to 2.03%) based on the prevalence of oral leukoplakia, but this far exceeds the numbers of actual cases of malignancy. Virtually all studies emphasize the chronicity of oral PMD, with an increasing tendency to malignant change in the first five years. For example, the incidence of oral squamous cell carcinoma (OSCC) arising from leukoplakia in Californians was greatest in the second year of follow‐up (11 out of 45; 24%) (Silverman 1984). The proportion of PMD that will develop OSCC is uncertain but low; best estimates suggest a rate of less than 2% per annum (Napier 2008).

The early detection and management of potentially malignant disorders of the lip and oral cavity that require intervention may reduce malignant transformations (though will not totally eliminate malignancy occurring), or if malignancy is detected during surveillance, there is some evidence that appropriate treatment may improve survival rates (Brocklehurst 2010; van der Waal 2009; Warnakulasuriya 2007). However, Lodi 2006 investigated the effectiveness of different management strategies for oral leukoplakia and found a lack of evidence for surgical interventions, including laser therapy and cryotherapy. Vitamin A, retinoids, beta carotene, carotenoids, bleomycin, mixed tea and ketorolac have also been tried, but none of the treatments tested showed a benefit when compared with placebo. Lodi et al concluded that there was no evidence of effective treatment in preventing the malignant transformation of leukoplakia (Lodi 2006). There is also debate in the literature about the impact of "field change" (Holmstrup 2006; Holmstrup 2009). Holmstrup argues that even if early lesions are surgically removed, the risk of malignant change can remain as a result of the lesion representing a small area of a wider field of damaged mucosa (Holmstrup 2006; Holmstrup 2009).

Technologies to treat and manage oral cancer have progressed substantially, as shown by systematic reviews of randomised controlled trials of interventions (e.g. Bessell 2011; Furness 2011; Glenny 2010). Once frank malignancy has been detected, the traditional management of oral cancer is through surgery and radiotherapy. More recently, systemic chemotherapy has been included as part of the treatment regimen before or during radiotherapy. Surgery for the treatment of oral cancer is followed by exacting reconstructive surgery to restore form and function. Debilitating side effects can occur as a result of both the surgery and radiotherapy and chemotherapy, adversely affecting an individual's quality of life. The five‐year survival following diagnosis has remained at around 50% for the past 30 years in most countries (Parkin 2001; Warnakulasuriya 2009). Recent US data show a statistically significant improvement among patients treated for oral squamous cell carcinoma from 55% in 1984 to 1986 to 60% in the 1996 to 2003 time frame (Jemal 2008). This is in marked contrast to the improved survival rates in many other cancers, such as those of the breast and the colon (Cancer Research UK), but may be explained at least in some part by the fact that oral cancer is more often diagnosed at a late stage of the disease, when prognosis is poorer and the risks of significant morbidity and mortality are substantially higher (Rogers 2009; Rusthoven 2010).

Index test(s)

Reviews of primary studies of diagnostic test accuracy in this area have identified a number of index tests which could be used as adjuncts to the conventional visual and tactile oral examination (COE) to improve earlier detection of lip and oral cavity cancer and PMD (Fedele 2009; Leston 2010; Lingen 2008; Patton 2008; Rethman 2010). These include:

  • vital rinsing or staining (Toluidine blue, Tolonium chloride)

  • light‐based detection (such as ViziLite and ViziLite Plus, Microlux/DL, VELscope, Orascoptic DK, Identafi 3000)

  • mouth self examination

  • blood and saliva analyses.

Vital rinsing and oral cytology have long been available as adjuncts to a conventional oral examination (Leston 2010; Lingen 2008). Other tests such as light‐based detection systems have become commercially available only more recently. Mouth self examination is a simple technique with worldwide application. Blood analysis and saliva analysis are more novel tests at an early stage of evaluation. It is worth noting that for an index test to obtain the US Food and Drug Administration (FDA) 'clearance' (the term reserved for non‐invasive devices) a demonstration of efficacy is not required, only a demonstration of safety.

Of the index tests listed above, vital rinsing, light‐based detection, mouth self examination and blood and saliva analyses could be used as screening adjuncts to the COE (Additional Table 1). Where access to general dental practitioners or general medical practitioners is limited, either as a result of geographical location or barriers to uptake of healthcare provision, screening using the index tests listed above could, in principle, be undertaken by trained healthcare workers; all have the potential to be used as adjuncts to the COE by healthcare workers or clinicians undertaking screening of the general population. Added to the COE, any one of the proposed index tests could have a triage role in detecting lesions of uncertain significance, with referral where appropriate. For instance, traumatic keratoses are common, and referring each patient with a white patch to a specialist to undergo a scalpel biopsy is excessive, incurs increased financial cost and patient worry, and potentially delays the more urgent referrals being seen. A non‐invasive index test or combination of tests adjunctive to the COE that provides a frontline clinician with a high degree of accuracy would not only reduce the number of patients with benign disease being referred, but could avoid the need for invasive biopsy in patients testing negative.

Table 1. Screening tests for PMDs and oral cavity cancer

Test

Characteristics

Classification of response

Other information

Conventional oral examination (COE)

A standard visual and tactile examination of the oral mucosa under normal (incandescent) light

The presence of an oral mucosal abnormality is classified as a positive test result; the absence of any oral mucosal abnormalities is classified as a negative test result

Has traditionally been used as an oral cancer screen, but its utility is debated (Lingen 2008)

Advantages: quick and easy once trained, minimally invasive
Disadvantages: oral mucosal abnormalities are not necessarily clinically or biologically malignant; only a small percentage of leukoplakias are progressive or become malignant; COE cannot distinguish between those that are or are not; some pre‐cancerous lesions may exist within oral mucosa that appears clinically normal by COE alone (Lingen 2008)

Vital rinsing (e.g. Toluidine blue, Tolonium chloride)

Vital rinsing refers to the use of dyes such as Toluidine blue or Tolonium chloride to stain oral mucosa tissues for PMD or malignancy (Leston 2010; Lingen 2008; Patton 2008). The procedure is as follows

  • Pre‐rinse with acetic acid

  • Rinse with water

  • Apply Toluidine blue

  • Post‐rinse with acetic acid

  • Rinse with water

  • Observe mucosa to check for staining

The result of the test is classified as positive if tissue is stained and negative if no tissue is stained, or equivocal if no definitive result can be obtained

Advantages: ability to define areas that could be malignant or abnormal but cannot be seen; assess the extent of the PMD for excision
Disadvantages: benign inflammatory lesions subject to stain; failure of some cancerous lesions to stain; variation in test performance depending on how thoroughly the test procedures are followed; contraindicated in those who are known to be allergic to iodine

Light‐based detection (e.g. ViziLite and ViziLite plus, Microlux/DL, VELscope, Identafi 3000)

Light‐based systems to identify pre‐malignant and malignant lesions and to highlight their presence through tissue autofluorescence or reflectance (Leston 2010; Lingen 2008; Patton 2008). E.g. using ViziLite Plus or Microlux/DL, the procedure is as follows (Lingen 2008)

  • Pre‐rinse with acetic acid

  • Use blue‐light source to visually assess the oral cavity

ViziLite Plus also provides a tolonium chloride solution (TBlue) to aid in the marking of the lesion for biopsy once the light source is removed

The result of the test is classed as negative if the appearance of the epithelium is lightly bluish white and positive if the appearance of the epithelium is distinctly white (acetowhite)

For systems based on autofluorescence the result of the test is classed as negative if fluorescence is maintained and positive if fluorescence is lost

Advantages: simple to use; non‐invasive; do not require consumable re‐agents; provide real time results; can be performed by a wide range of operators after a short training period
Disadvantages: the necessity of a dark environment; high initial set up (for VELscope) or recurrent costs (for ViziLite in low‐income countries); lack of permanent record unless photographed; inability to objectively measure visualisation results

Mouth self examination

Self examination, usually in the home setting in accordance with instructional material

Usually the presence of any lesion

Advantages: simple to carry out and low cost. Can be carried out in an individual's own home

Disadvantages: target condition is the presence or absence of oral lesions. Cannot differentiate between potentially malignant and non‐malignant lesions

Blood and saliva analyses

These novel technologies are at an early stage of development and evaluation
Analysis of blood or saliva samples which tests for the presence of bio‐markers of PMD and oral cancer (Brinkmann 2011; Lee 2009; Li 2006)

Cut‐off probabilities vary widely and are dependent on the individual bio‐marker or combination of bio‐markers examined
Molecular markers for diagnosis include changes in cellular DNA, altered mRNA transcripts, altered protein levels

Advantages: non‐invasive (saliva tests) or minimally invasive (blood tests)
Disadvantages: there is a tendency for the estimated diagnostic accuracy of new health technologies to decline over time as evidence from independent evaluations accumulates (Wyatt 1995). This bias, which can be substantial, has been demonstrated in other domains, e.g. acute abdominal pain (Liu 2006) and clinical decision support systems (Garg 2005). Promising bio‐marker tests in several clinical areas were eventually shown to be disappointing (Buchen 2011). It remains to be seen whether this is the case with oral cancer and PMDs

PMDs = potentially malignant disorders

A companion Cochrane systematic review evaluates the diagnostic accuracy of index tests in individuals presenting with clinically evident lesions (Liu 2012).

Clinical pathway

The standard process of screening apparently healthy adults for PMD and cancer of the lip or oral cavity is a systematic and thorough visual inspection of the oral mucosa under normal (incandescent) light, together with palpation of the neck for lymphadenopathy. In most instances this is carried out opportunistically by a frontline clinician, typically a dentist, as part of a routine recall examination. This conventional visual and tactile oral examination (COE) can be conducted with the minimum of effort and distress to the individual (Additional Table 1). Screening can be carried out opportunistically, for instance when an individual presents to their dentist for a check‐up, or as part of an organised screening programme. The COE is usually followed by referral for further investigation if this is deemed necessary. The form that further investigation takes varies nationally and internationally; it could be an examination or biopsy by a specialist in oral medicine or oral surgery at a secondary or tertiary clinic.

Rationale

Oral cancer is a significant global health problem with increasing incidence and mortality rates (Ferlay 2010; Warnakulasuriya 2009). Cancer of the lip or oral cavity is a relatively common cancer worldwide, with an estimated 263,000 new cases and 127,000 deaths in 2008, and an increasing incidence in recent years (Ferlay 2010). There is wide geographic variation in disease incidence and mortality, with almost double the incidence in low‐income and middle‐income countries as in high‐income countries, and a threefold increase in mortality. Tobacco use, alcohol consumption, betel quid chewing and low socio‐economic status have traditionally been thought to be the most important risk factors for oral cancer (Conway 2008; Faggiano 1997; La Vecchia 1997; Macfarlane 1995; Ogden 2005). Men have had a higher incidence of oral cancer than women (Ferlay 2010), but this disparity can be explained by men having a higher exposure to the above risk factors (Freedman 2007). The gender difference has narrowed in recent decades from a ratio of five males to one female diagnosed with oral cancers in the 1960s to less than two to one in 2008 (Ferlay 2010). Although traditionally the risk of oral cancer increases with age, the incidence among younger adults has been increasing in the European Union and the United States (Warnakulasuriya 2009). In the United Kingdom, one in 10 cases now occurs in people below the age of 45 years (Cancer Research UK). The five‐year survival rate depends on multiple factors, including patient and tumour characteristics, treatment received and stage at diagnosis. Oral cancer incidence and mortality can be reduced using three approaches: (i) primary prevention, (ii) secondary prevention, screening and early detection, and (iii) improved treatment (Scully 2000b).

Successful early detection of oral cavity cancer or PMD is highly dependent on whether individuals with the disease present for a screen. Early detection relies on the awareness and motivation of the clinician or patient in identifying a suspicious lesion or symptom while it is still at an early stage. Whilst many organisations advocate cancer‐related checks, including the American Cancer Society for individuals of all risk groups (American Cancer Society 1992) and the US Preventive Health Services Task Force for high risk individuals (US Preventive Services Task Force in discussion), there is much global variation in the provision and promotion of routine oral cancer examinations. Currently, no national population‐based screening programmes for oral cancer have been implemented in the high‐income countries, although opportunistic screening has been advocated (Brocklehurst 2013). Consequently, individuals will often present for examination at a later stage of the disease, when the risks of significant morbidity and mortality are substantially higher. The British Columbia Oral Cancer Prevention Program (BC OCPP) is addressing this challenge in several ways: by linking community dental practices and referral centres, by creating partnerships between scientists and clinicians that already have resulted in new technologies to enhance early diagnosis, by involving a broad range of stakeholders to ensure population‐based screening and by engaging in provincial, national and international outreach (Rosin 2006). Brocklehurst et al's systematic review identified only one randomised controlled trial using visual examination with a follow‐up period of 15 years which was carried out in India. The authors of the review concluded that opportunistic screening of high risk groups may potentially improve outcomes, although the risk of bias of the included study was high (Brocklehurst 2013).

There is some debate in the literature on anticipated differences in diagnostic accuracy of prospective population‐based invitational screening programmes and a more opportunistic approach (when patients attend their general (dental) practitioner for routine examination or for treatment). In Downer et al's systematic review of test performance in screening for oral cancer and PMD, only prospective investigations of population screening with specified reference standards were included. The pooled sensitivities and specificities were 0.85 (95% CI 0.730 to 0.919) and 0.97 (95% CI 0.930 to 0.982) respectively (Downer 2004). An opportunistic approach that focuses on high risk groups is also possible (McGurk 2010; Sankaranarayanan 1997). A simulation study which used neural network and machine learning techniques suggested opportunistic screening aimed at high risk groups may be both effective and cost‐effective (Speight 2006). However, many individuals with risk factors may not attend the dentist and are therefore not amenable to an opportunistic approach (Netuveli 2006; Yusof 2006).

Reviews assessing the test accuracy of a conventional oral examination as a population screening tool (e.g. Downer 2004; Moles 2002) have highlighted methodological flaws in the primary diagnostic test accuracy studies, although explicit methodological quality assessment of these studies using a validated and widely used checklist was not undertaken.

In this review we have identified screening tests for PMD and cancer of the lip or oral cavity to evaluate the diagnostic accuracy of the COE and the accuracy of the other index tests (Additional Table 1) used as adjuncts to the oral examination in asymptomatic adults. The index tests proposed for evaluation in this review are suitable for use in the community or as part of a dental examination in a general dental practitioners' office. The review includes both prospective investigations of organised screening programmes and prospective opportunistic screening. It is important that this review considered both as individuals screened opportunistically are self selecting and may not be representative of the population of interest. In either scenario, screening may be carried out by dental professionals or healthcare workers. The purpose of the screening is to identify the presence or absence of PMD which require referral to secondary care for definitive diagnosis and possibly treatment. The proposed index tests cannot confirm whether a PMD is cancerous before deciding on referral to secondary care; biopsy with histopathology is currently the only confirmatory method of oral cancer diagnosis.

The Cochrane Oral Health Group has undertaken a number of intervention reviews in the field of treatment of oral and oropharyngeal cancers (Bessell 2011; Furness 2011; Glenny 2010) and screening programmes for the early detection and prevention of oral cancer (Brocklehurst 2013). This screening test accuracy review complements the intervention reviews.

Objectives

To estimate the diagnostic accuracy of conventional oral examination (COE), vital rinsing, light‐based detection, biomarkers and mouth self examination (MSE), used singly or in combination, for the early detection of potentially malignant disorders (PMD) or cancer of the lip and oral cavity in apparently healthy adults.

Secondary objectives

To estimate the accuracy of the different index tests with COE, when compared with each other.

Methods

Criteria for considering studies for this review

Types of studies

We included studies of cohorts of apparently healthy adults which evaluated the diagnostic accuracy of the conventional oral examination (COE) used singly or in combination with an index test listed in Additional Table 1, in screening for potentially malignant disorders (PMD) and cancer of the lip or oral cavity. These included cross‐sectional studies (or consecutive series) and randomised controlled trials (RCTs) of test accuracy. We excluded case series and case‐control studies which could lead to inflated estimates of prevalence and test accuracy (Whiting 2004). We also excluded studies reported in abstract form alone, uncontrolled reports and randomised controlled trials of the effectiveness of screening programmes (intervention studies). Where randomised or paired comparative designs were available these were included in the review and analysed separately. Only studies reporting data for true positives, false positives, true negatives and false negatives at an individual level (as opposed to a lesion level) for each test were included. No language restrictions were imposed.

Participants

Apparently healthy adults not reporting symptoms who attended an organised screening programme or were screened during attendance at a dental or other clinical practice examination. We did not exclude specific subgroups of patients in this review, such as high risk cohorts or cohorts with a previous suspicion of PMD or cancer of the lip or oral cavity.

Index tests

The COE used as a screen, alone or in combination with any other screening tests previously listed (Additional Table 1). The COE (the conventional test) was the initial point of the screen, which all individuals received. The index test was used as an adjunct following the COE irrespective of whether oral cancer or PMD was suspected by the COE alone (i.e. a positive test result is a positive result from either the COE or the index test or both).
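
The positivity rule described above can be illustrated with a short sketch. This is a minimal illustration rather than code used in the review; the function name and the boolean inputs are hypothetical.

```python
def combined_screen_positive(coe_positive: bool, adjunct_positive: bool) -> bool:
    """Return True when the COE, the adjunct index test, or both are positive."""
    return coe_positive or adjunct_positive


# An individual cleared by the COE but flagged by the adjunct test
# still counts as screen positive under this rule.
print(combined_screen_positive(False, True))
print(combined_screen_positive(False, False))
```

Because a positive result on either test counts, adding an adjunct under this rule can only keep or increase the number of screen positives relative to the COE alone.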

Target conditions

Following the consensus views of the expert working group of the World Health Organization (WHO) Collaborating Center for Oral Cancer and Precancer, the target conditions of the lip or oral cavity of interest are as follows.

‐ Carcinoma of the lip or oral cavity.

‐ Potentially malignant disorders.

  • Leukoplakia.

  • Erythroplakia.

  • Lichen planus.

  • Lupus erythematosus.

  • Submucous fibrosis.

  • Actinic keratosis.

  • Hereditary disorders such as dyskeratosis congenita or epidermolysis bullosa.

Reference standards

The reference standard was examination and clinical evaluation by a physician with specialist knowledge or training, working to the current diagnostic guidelines of their locality. At the most experienced level this would be an oral and maxillofacial pathologist or oral medicine specialist possibly utilising biopsy with histology where considered clinically appropriate. More commonly this was expected to include general dental physicians in receipt of supplementary training in the detection and identification of PMD and carcinoma of the lip or oral cavity or other physicians with dedicated training. We included studies where confirmation of individuals screened negative by the index test was done by extended follow‐up. To be eligible for inclusion in the review, at least a proportion of the screened negatives were required to be verified. Where reported, for each study we noted the diagnostic protocol, guidelines or registry used for follow‐up in the Characteristics of included studies table. Studies with confirmatory biopsy of individuals who screened negative by the index test were eligible for inclusion although ethically questionable (Downer 2004).

Search methods for identification of studies

Electronic searches

We searched the following electronic databases.

  • The Cochrane Diagnostic Test Accuracy Studies Register (to 30 April 2013).

  • The Cochrane Oral Health Group's Trials Register (to 30 April 2013).

  • MEDLINE via OVID (1946 to 30 April 2013).

  • EMBASE via OVID (1980 to 30 April 2013).

  • MEDION (2003 to 30 April 2013).

See Appendix 1 for the search strategies used. There were no restrictions on language in the searches of the electronic databases.

We constructed a single electronic search strategy to serve both this review and a companion Cochrane diagnostic test accuracy review (Liu 2012), which was undertaken concurrently by the same review team.

Searching other resources

We sought to locate further studies through citation searches and reference lists of key articles, and by contacting authors of identified articles to request information of any unpublished or ongoing studies.

Data collection and analysis

Selection of studies

We did not limit the screening of the search results by publication language or status. Non‐English articles were translated. Two review authors independently assessed the titles and abstracts of all articles identified from the searches. Full reports were obtained for articles that appeared to meet the inclusion criteria, or where a clear decision could not be made from the title and abstract alone. Disagreements were resolved by discussion with the review team.

Data extraction and management

Two review authors independently extracted data using a piloted data collection form. Discrepancies were resolved through discussion with the review team. Study authors were contacted to obtain relevant missing data if these were not available in the printed report.

From each study, we extracted the following data.

  • Sample characteristics (age, sex, socio‐economic status, risk factors (e.g. human papillomavirus (HPV) status, prevalence of tobacco use and alcohol consumption), number of participants/lesions).

  • Setting (country, disease prevalence, type of screening).

  • Type of index test(s) (category, name, positivity threshold).

  • Study information (design, reference standard, case definition, training and calibration of personnel).

  • Study results (true positive, true negative, false positive, false negative, any equivocal results, withdrawal or exclusions).

This information was documented in the Characteristics of included studies table for each study.

Assessment of methodological quality

We used the QUADAS‐2 tool (Whiting 2011) to assess the quality of the included studies over four key domains: patient selection, index test, reference standard and flow and timing of participants through the study. The QUADAS‐2 tool was tailored specifically for this review (Additional Table 2). Review specific guidance was used to facilitate documentation of the pertinent descriptive information contained in the studies. Customised instructions to aid judgement of the signalling questions were given (following Patton 2008). Two core signalling questions were removed: 'Was a case‐control design avoided?' (this study design was excluded from the review); 'Did all patients receive a reference standard?' (this was a criterion for inclusion). Two additional signalling items relating to commercial funding and multiple index tests were added to the core signalling questions. Responses to the signalling questions, risk of bias and applicability judgements are presented in the Characteristics of included studies tables and summarised graphically (Figure 1).


Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study.


Table 2. Indicators for the assessment of methodological quality

Domain: Patient selection

Description

Describe methods of patient selection. Describe included patients (characteristics, prior testing, presentation, intended use of index test and setting)

Signalling questions (Yes/No/Unclear)

  • Was a consecutive or random sample of patients enrolled?
    Classify as Yes if consecutive patients or a random sample of individuals were recruited
    Classify as No if non‐consecutive patients or a non‐random sample of individuals were recruited
    Classify as Unclear if patient selection was not clearly described

  • Did the study avoid inappropriate exclusions?
    Classify as Yes if the sample consisted of apparently healthy individuals
    Classify as No if only individuals with existing PMDs were recruited
    Classify as Unclear if exclusions were not clearly described

Risk of bias (High/Low/Unclear)

Could the selection of individuals have introduced bias?

Concerns regarding applicability (High/Low/Unclear)

Are there concerns that the included individuals do not match the review question?

Domain: Index test

Description

Describe the index test and how it was conducted and interpreted. Describe the sequence of tests, any training or calibration of assessors (levels of agreement should be reported. Where this is measured by the kappa statistic*, acceptable values range from 0.61 (substantial agreement) to 1.00 (almost perfect agreement) (Landis 1977)), any procedures taken to ensure blinding of examiners, post‐hoc or a priori threshold specification, any conflict of interest or commercial funding

*This statistic is a measure of inter‐rater agreement of observations measured at a categorical level

Signalling questions (Yes/No/Unclear)

  • Were the index test results interpreted without knowledge of the results of the reference standard?
    Classify as Yes if interpreters of index test results clearly do not know results of reference standard
    Classify as No if interpreters of index test results clearly know results of reference standard
    Classify as Unclear if study did not provide any information on whether interpreters of index tests were blinded to reference standard

  • If a threshold was used, was it pre‐specified?
    Classify as Yes if the threshold was pre‐specified
    Classify as No if the threshold was not pre‐specified
    Classify as Unclear if it is unclear whether the threshold was pre‐specified

  • Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test?
    Classify as Yes if index test results were interpreted without knowledge
    Classify as No if the index test results were interpreted with knowledge
    Classify as Unclear if it is unclear whether the results of the second index test were interpreted without knowledge of the results of the first index test

  • Were any conflicts of interest stated?
    Classify as Yes if the study declared no conflict of interest
    Classify as No if the study declared a conflict of interest
    Classify as Unclear if there was no information on conflict of interest

Risk of bias (High/Low/Unclear)

Could the conduct or interpretation of the index test have introduced bias?

Concerns regarding applicability (High/Low/Unclear)

Are there concerns that the index test, its conduct, or interpretation differ from the review question?

Domain: Reference standard

Description

Describe the reference standard and how it was conducted and interpreted. Any measures taken to ensure assessors were blinded to the results of the index tests should be documented, along with the sequence of reference and index tests

Signalling questions (Yes/No/Unclear)

  • Is the reference standard likely to correctly classify the target condition? The reference standard is an examination and clinical evaluation by a physician with specialist knowledge which if stated as such should be acceptable. Ideally this should be undertaken independently by more than one specialist. Alternatively an acceptable reference standard is extended follow‐up
    Classify as Yes if the test is examination and clinical evaluation by a physician with specialist knowledge and/or training, or a non‐specialist with dedicated training to an acceptable standard
    Classify as No if the test result is examination and clinical evaluation by a non‐specialist physician in the absence of dedicated training
    Classify as Unclear if the study does not report the experience and training of those carrying out the reference standard

  • Were the reference standard results interpreted without knowledge of the results of the index test?
    Classify as Yes if personnel clearly do not know index test results when performing the examination and clinical evaluation or evaluating follow‐up data
    Classify as No if personnel clearly know index test results when performing the examination and clinical evaluation or evaluating follow‐up data
    Classify as Unclear if study did not provide any information on whether personnel were blinded to the index test results

Risk of bias (High/Low/Unclear)

Could the reference standard, its conduct, or its interpretation have introduced bias?

Concerns regarding applicability (High/Low/Unclear)

Are there concerns that the target condition as defined by the reference standard does not match the review question?

Domain: Flow and timing

Description

Describe the characteristics and proportion of patients who did not receive the index test(s) and/or reference standard, who received a reference standard other than examination and clinical evaluation by a specialist physician, or who were excluded from the 2 x 2 table (refer to flow diagram). Describe the time interval and any interventions between index test(s) and reference standard. The length of time between the index test and reference standard should be short in the majority of cases. If the period elapsed between initial screening and reference standard (examination and clinical evaluation) is greater than 6 weeks then this was considered an unacceptable delay

Signalling questions (Yes/No/Unclear)

  • Was there an appropriate time interval between the index test(s) and reference standard?
    Classify as Yes if the delay between the index test(s) and reference standard is considered acceptable for the majority of participants
    Classify as No if the delay between the index test(s) and reference standard is considered unacceptable for the majority of participants
    Classify as Unclear if the delay between the index test(s) and reference standard is not explicitly stated

  • Did all patients receive the same reference standard?
    Classify as Yes if the same reference standard was used in all participants
    Classify as No if the same reference standard was not used in all participants
    Classify as Unclear if it is unclear whether different reference standards were used

  • Were all patients included in the analysis?
    Classify as Yes if all patients were included in the analysis
    Classify as No if only some patients were included in the analysis
    Classify as Unclear if it is unclear whether all patients were included in the analysis

Risk of bias (High/Low/Unclear)

Could the patient flow have introduced bias?

Assessment of overall risk of bias and applicability

An overall judgement of risk of bias and applicability to the review (high, low or unclear) was undertaken based on the judgements given to each domain. If the answers to all signalling questions within a domain were judged as yes, indicating low risk of bias, then the domain was judged to be at low risk of bias. A no response to a signalling question was taken as an indication of the potential for risk of bias, and the authors considered this risk within the context of the study before making a decision on whether the study was at high or low risk of bias for that domain.

If any of the 4 domains was judged to be at high risk of bias then the study was judged to have a high risk of bias overall. If any of the 3 applicability domains was judged to be at high concern regarding applicability then the study was judged to be of high concern regarding applicability overall.

PMDs = potentially malignant disorders.
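
The rule for combining domain judgements into an overall judgement can be sketched in a few lines of Python. This is an illustrative sketch only, not a tool used by the review authors: the function name and dictionary layout are hypothetical, and the "unclear" step reflects how overall judgements are reported in the Results (an unclear judgement in at least one domain, with no domain at high risk, gave an overall judgement of unclear). It works at the domain level and does not model the contextual consideration given to individual "No" responses.

```python
from typing import Dict


def overall_judgement(domains: Dict[str, str]) -> str:
    """Combine per-domain judgements ('high', 'low', 'unclear') into one."""
    values = [value.lower() for value in domains.values()]
    if "high" in values:
        # Any domain at high risk (or high concern) makes the study high overall.
        return "high"
    if "unclear" in values:
        # No high domain, but at least one unclear domain.
        return "unclear"
    return "low"


# Hypothetical study, for illustration only.
example = {
    "patient selection": "low",
    "index test": "low",
    "reference standard": "unclear",
    "flow and timing": "low",
}
print(overall_judgement(example))
```

The same function applies to applicability if it is passed the three applicability domains instead of the four risk of bias domains.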

Statistical analysis and data synthesis

Data for the true positive, true negative, false positive and false negative values for each test in each study were entered into Review Manager (RevMan 2012). For each index test, estimates of the diagnostic accuracy were expressed as sensitivity and specificity with 95% confidence intervals. This information was displayed as coupled forest plots (Figure 2), and plotted in receiver operating characteristic (ROC) space.
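
As an illustration of how sensitivity and specificity with 95% confidence intervals follow from a 2 x 2 table, a minimal Python sketch is given below. It is not the RevMan implementation: the use of an exact (Clopper‐Pearson) binomial interval is an assumption about the interval method, the function names are hypothetical, and the counts are invented for illustration rather than taken from any included study.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p). Intended for modest n only;
    large counts need a log-space or library implementation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple:
    """Clopper-Pearson interval for x events out of n, found by bisection."""

    def bisect(f, target, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


def accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity, each paired with its 95% interval."""
    return {
        "sensitivity": (tp / (tp + fn), exact_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), exact_ci(tn, tn + fp)),
    }


# Hypothetical counts, for illustration only.
result = accuracy(tp=2, fp=0, fn=2, tn=10)
print(result)
```

Sensitivity uses only the individuals with the target condition (true positives and false negatives), and specificity only those without it, which is why very few diseased individuals in a low prevalence sample produce wide intervals for sensitivity.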


Forest plot of 1. Conventional oral examination.


For the primary analysis we had intended to undertake a meta‐analysis to combine the results of the studies for each index test. However, the substantial diversity of characteristics of the included studies meant that this was not appropriate.

We were only able to include one study (Su 2010) that directly evaluated the comparative accuracy of more than one index test with the reference standard, i.e. randomising individuals to different index tests. This study was reported separately.

Investigations of heterogeneity

We planned to explore possible sources of heterogeneity through meta‐regression including the following covariates: characteristics of the study sample (prevalence of carcinoma or PMD in the study (> 50% prevalence); inclusion of HPV‐positive adults; tobacco users/high alcohol consumption); target condition (oral squamous cell carcinoma alone or oral squamous cell carcinoma and PMD); aspects of study design (prospective organised or opportunistic); type of reference standard (examination and clinical evaluation by physician with specialist knowledge or extended follow‐up) and operator (dental or general medical practice professionals or other healthcare workers). Given the diversity of the studies this was not undertaken.

Sensitivity analyses

No sensitivity analyses were undertaken.

Assessment of reporting bias

Tests for reporting bias were not conducted because current tests are misleading when applied to systematic reviews of diagnostic test accuracy (Leeflang 2008).

Results

Results of the search

After de‐duplication the initial electronic search conducted in April 2013 retrieved 4220 records. These were screened independently and in duplicate according to eligibility criteria; 33 records were considered potentially eligible for inclusion. Of this number, 17 records reporting 13 studies were included in the review. The main reasons for exclusion were ineligible study design or no reference standard data for individuals screened negative. Ten studies reported on the diagnostic accuracy of conventional oral examination (COE) alone; two studies reported on mouth self examination and one randomised controlled trial directly compared COE alone with COE plus a vital rinsing agent (Toluidine blue). No diagnostic test accuracy studies meeting the review inclusion criteria evaluating any other pre‐specified index test were found.

Four studies are still awaiting classification and one is ongoing.

Methodological quality of included studies

The assessment of methodological quality is presented graphically in Figure 1.

Conventional oral examination

The nature of the screening of participants can be broadly categorised into opportunistic screening (Chang 2011; Julien 1995), organised screening programmes (Downer 1995; Julien 1995a; Warnakulasuriya 1990; Warnakulasuriya 1991), validation as part of an organised screening programme or randomised controlled trial (Ikeda 1995; Mathew 1997; Mehta 1986) and screening as part of a routine surveillance appointment (Sweeny 2011).

The accuracy of detecting potentially malignant disorders (PMDs) and oral cavity cancer was evaluated in a variety of different settings: In Tokoname, Japan, all residents of 60 years of age were invited by mail to attend a dental screening programme at a health centre (Ikeda 1995). In Kerala, India, basic healthcare workers incorporated screening into their routine house visits (Mathew 1997; Mehta 1986) as in Sri Lanka (Warnakulasuriya 1990; Warnakulasuriya 1991). In the United Kingdom, the feasibility and accuracy of workplace screening was evaluated in one study (Downer 1995), of screening patients at a medical practice in another (Julien 1995a), and opportunistically in patients attending a dental hospital for an out‐patient appointment (Julien 1995). In Taiwan, screening was offered to individuals attending a tertiary referral centre (Chang 2011). In the USA, screening was part of the routine surveillance visit of patients attending an otolaryngology clinic (Sweeny 2011).

Risk of bias for the patient selection domain was low for all studies with one exception (Julien 1995). This study was judged as unclear as the method of patient selection for this opportunistic screening study was not reported. Two studies were judged to be of low concern for applicability (Julien 1995; Julien 1995a); five studies of unclear applicability as a result of not fully reporting the participant characteristics or risk factors of the study sample or both (Downer 1995; Ikeda 1995; Mathew 1997; Warnakulasuriya 1990; Warnakulasuriya 1991). Three studies were selective in their sampling, targeting a 'high risk' population. These were all male patients attending the otolaryngology or dental department (Chang 2011), previous cancer patients attending the otolaryngology clinic for a routine surveillance visit (Sweeny 2011) and individuals over 35 years of age with "tobacco habits" (Mehta 1986).

The COE index test was carried out by clinicians (general dental practitioners, community dental officers, otolaryngologists) in six studies (Chang 2011; Downer 1995; Ikeda 1995; Julien 1995; Julien 1995a; Sweeny 2011) and by health workers in the studies in India and Sri Lanka (Mathew 1997; Mehta 1986; Warnakulasuriya 1990; Warnakulasuriya 1991). Risk of bias for this domain was judged to be low in nine studies, in which the index test was carried out prior to the reference standard and a positivity threshold for the target condition was specified a priori. One study (Sweeny 2011) was judged to be at unclear risk of bias as the target condition and the positivity threshold were not clearly defined. All studies were judged to be at low concern regarding applicability.

Four studies (Downer 1995; Ikeda 1995; Julien 1995; Julien 1995a) were judged to be at low risk of bias for the reference standard domain. In these studies the reference standard was carried out by experienced specialist physicians and the results were interpreted without knowledge of the results of the index tests. For the remaining studies it was unclear whether the reference standard personnel were unaware of the results of the index test when interpreting the reference standard. One study (Sweeny 2011) was judged to be at unclear concern regarding applicability as the target definition was recurrence of head and neck cancer; all other studies were judged as low concern.

For the flow and timing domain, two studies were judged to be at high risk of bias as a result of attrition following positive screen (37.5% of screen positive) and differential verification (Chang 2011) and time from screen positive to receiving reference standard (Warnakulasuriya 1990). Two studies were judged to be at unclear risk of bias (Sweeny 2011; Warnakulasuriya 1991), the remainder at low risk of bias (Downer 1995; Ikeda 1995; Julien 1995; Julien 1995a; Mathew 1997; Mehta 1986).

Two studies (Chang 2011; Warnakulasuriya 1990) were judged as being at overall high risk of bias resulting from the flow and timing domain; three studies were at overall low risk of bias (Downer 1995; Ikeda 1995; Julien 1995a). For the remaining five studies an unclear risk of bias for at least one of the four domains resulted in an overall risk of bias judgement of unclear (Julien 1995; Mathew 1997; Mehta 1986; Sweeny 2011; Warnakulasuriya 1991).

Three studies (Chang 2011; Mehta 1986; Sweeny 2011) were judged as having high overall concerns regarding applicability, arising from patient selection of high‐risk groups. Two studies (Julien 1995; Julien 1995a) were judged as having low overall concerns regarding applicability. For the remaining five studies an unclear concern regarding applicability in the patient selection domain resulted in an overall applicability judgement of unclear (Downer 1995; Ikeda 1995; Mathew 1997; Warnakulasuriya 1990; Warnakulasuriya 1991).

Mouth self examination

Two studies (Elango 2011; Scott 2010) evaluated mouth self examination as part of an organised screening programme. Risk of bias for patient selection was judged to be low for both studies. Concerns regarding applicability for this domain were judged as low for one study (Elango 2011) and high for the other (Scott 2010). In this study, the study sample consisted of participants older than 45 years of age with tobacco habits.

We gave a judgement of unclear risk of bias to both studies for the index test domain, as it was not reported whether the results of the index test were interpreted without knowledge of the reference test. We gave a judgement of low concerns regarding applicability for this domain.

The risk of bias judgement for the reference standard domain was low for one study (Scott 2010), in which the reference standard was evaluated by a trained dentist and carried out prior to the index test. We judged the other study (Elango 2011) as unclear risk of bias as it was unclear whether the conduct of the reference standard would be likely to correctly classify the condition and also whether the reference standards were interpreted without knowledge of the index test. The manuscript states that "the competence of the health workers [reference standard] was confirmed by a trained oral cancer specialist" but no further details were reported. Consequently the judgements of concerns regarding applicability for this domain were low (Scott 2010) and unclear (Elango 2011).

Risk of bias for the flow and timing domain was judged to be low for one study (Scott 2010) and high for the other (Elango 2011), due to a significant number of withdrawals and exclusions for non‐compliance.

The overall risk of bias was judged to be unclear (Scott 2010) and high (Elango 2011). Concern regarding the overall applicability of the studies to the review question was high (Scott 2010) due to patient selection and unclear (Elango 2011) due to the reference standard being carried out by general health workers specifically trained for the study rather than a specialist or experienced clinician.

Conventional oral examination compared to conventional oral examination plus vital rinsing (Toluidine blue)

We judged this study (Su 2010), which directly compared two index tests in a randomised controlled trial, to be at low risk of bias for patient selection and index test. Concerns regarding applicability were judged as high for the patient selection domain as individuals who "lacked oral habits" such as smoking or betel quid chewing were ineligible for the trial. We judged that there were low concerns regarding applicability of the index tests. We judged the study to be at unclear risk of bias for the reference standard domain, as it was unclear whether the reference standard was interpreted without knowledge of the results of the index tests. There was low concern regarding applicability of the reference standard. Risk of bias for the flow and timing domain was judged as low.

Overall risk of bias for this study was judged as unclear, based on the interpretation of the reference standard. Concern regarding the overall applicability of the study was high, arising from patient selection.

Findings

Conventional oral examination

Diagnostic accuracy of COE by a non‐specialist compared to a reference standard was evaluated in 10 studies including 25,568 participants in total, where the target condition was PMD and cancer of the lip or oral cavity. Pooling of the studies was considered inappropriate due to the diversity of study and participant characteristics. The prevalence of PMD or oral cavity cancer in the diagnostic test accuracy study samples ranged from 1.4% to 50.9%. For the eight studies with prevalence of 10% or lower, the sensitivity estimates were highly variable, and ranged from 0.50 (95% confidence interval (CI) 0.07 to 0.93) to 0.99 (95% CI 0.97 to 1.00) with uniform specificity estimates around 0.98 (95% CI 0.97 to 1.00). Estimates of sensitivity and specificity were 0.95 (95% CI 0.92 to 0.97) and 0.81 (95% CI 0.79 to 0.83) for one study with prevalence of 21.6% and 0.97 (95% CI 0.96 to 0.98) and 0.75 (95% CI 0.73 to 0.77) for one study with prevalence of 51%.
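As a reminder of how estimates of this kind are derived, the following minimal Python sketch computes sensitivity, specificity and sample prevalence from a 2 x 2 table. The function name `accuracy_from_2x2` and all counts are hypothetical and chosen only for illustration; they are not taken from any included study.

```python
def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (sensitivity, specificity, prevalence) from 2 x 2 counts."""
    diseased = tp + fn   # reference standard positive
    healthy = fp + tn    # reference standard negative
    sensitivity = tp / diseased
    specificity = tn / healthy
    prevalence = diseased / (diseased + healthy)
    return sensitivity, specificity, prevalence

# Hypothetical counts (assumption, not data from the review)
sens, spec, prev = accuracy_from_2x2(tp=45, fp=20, fn=5, tn=930)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} prevalence={prev:.3f}")
```

Sensitivity depends only on the diseased participants and specificity only on the disease‐free participants, which is why a low sample prevalence leaves few observations for estimating sensitivity.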

Study prevalence is shown in the coupled forest plot (Figure 2) along with estimates of sensitivity and specificity and also plotted in ROC space (Figure 3). All studies for this index test used a common threshold, the presence of PMDs and oral cancer.


Figure 3. Summary ROC plot of 1. Conventional oral examination.

A summary is given in the summary of findings Table 1.

Mouth self examination

Two studies (Elango 2011; Scott 2010) provided data from 34,819 individuals. The prevalence was very different in the two studies: 0.6% and 22.6% respectively. Values of sensitivity were low in both studies (0.18 (95% CI 0.13 to 0.24) (Elango 2011) and 0.33 (95% CI 0.10 to 0.65) (Scott 2010)) but values of specificity were higher (1.00 (95% CI 1.00 to 1.00) (Elango 2011) and 0.54 (95% CI 0.37 to 0.69) (Scott 2010)) (Figure 4; Figure 5).


Figure 4. Forest plot of 2. Mouth self examination.


Figure 5. Summary ROC plot of 2. Mouth self examination.

A summary is given in the summary of findings Table 2.

Conventional oral examination compared to conventional oral examination plus vital rinsing (Toluidine blue)

We included one randomised controlled trial which directly compared the performance of COE alone (3895 individuals) with COE plus vital staining (4080 individuals), with biopsy and long‐term follow‐up through a National Cancer Registry as the reference standard (Su 2010). This study found a higher detection rate for oral cavity cancer in the conventional oral examination plus vital rinsing adjunct trial arm. The detection rate of oral pre‐malignant lesions and malignant lesions after referral was 4.6% in the conventional oral examination plus vital rinsing arm and 4.4% in the conventional oral examination alone arm. This resulted in a rate ratio of 1.05 (95% CI 0.74 to 1.41), an incidence rate of oral cancer (×10⁻⁵) of 28 compared to 35.4, and a relative incidence rate of 0.79 (95% CI 0.24 to 1.23). However, the initial screen positive rate was higher in the vital rinsing arm of the trial (9.5% and 8.3%).
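The point estimates of the two ratios can be checked against the reported rates with simple arithmetic. The sketch below is only a consistency check on the rounded figures quoted above; it assumes that the first value in each pair refers to the vital rinsing arm, and it cannot reproduce the confidence intervals, which require the underlying counts.

```python
# Rounded values as reported in the text (assumption: first value = vital rinsing arm)
detection_rinse, detection_coe = 4.6, 4.4     # detection rate after referral, %
incidence_rinse, incidence_coe = 28.0, 35.4   # oral cancer incidence per 100,000

detection_ratio = detection_rinse / detection_coe
incidence_ratio = incidence_rinse / incidence_coe

print(round(detection_ratio, 2), round(incidence_ratio, 2))  # prints: 1.05 0.79
```

Both rounded ratios agree with the reported rate ratio and relative incidence rate.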

When we consider the trial arms independently, the estimates of sensitivity and specificity for the target condition of oral cancer in the trial arm of COE alone were 0.50 (95% CI 0.12 to 0.88) and 0.92 (95% CI 0.91 to 0.93) with a prevalence of 0.15%; the corresponding values for the COE with vital rinsing adjunct were 0.40 (95% CI 0.05 to 0.85) and 0.91 (95% CI 0.90 to 0.91) with a prevalence of 0.13%.

A summary is given in the summary of findings Table 3.

Discussion

Summary of main results

Thirteen studies were identified for inclusion evaluating the diagnostic accuracy of conventional oral examination (COE), vital rinsing and mouth self examination. The studies were diverse in nature with substantial variations in sample prognostic risk factors, nature of screening test, the experience of personnel conducting the index test, verification of screen negative and screen positive individuals, exclusion of individuals from the analysis and large variation in incidence of disease (including register‐based studies) across included studies. Consequently, the decision was taken that a meta‐analysis of the included studies by index test was inappropriate. This is in contrast to some previously published systematic reviews (e.g. Downer 2004; Moles 2002).

Taken as a body of evidence, the overall quality of the studies was variable both within and between index tests, with only one study (Julien 1995a) of COE being judged as overall low risk of bias and overall low concern regarding applicability (Figure 1). Many of the studies did not fully report on the characteristics and risk factors of the study sample, which is particularly important when assessing the applicability of the results. In five studies the participants could be considered as 'high risk' individuals and consequently there is concern regarding the applicability of their findings to the review question.

Prevalence of potentially malignant disorders (PMD) or malignancy in the diagnostic test accuracy study samples ranged from 1.4% to 50.9% over the different index tests. Estimates should be interpreted with respect to the diagnostic test accuracy study prevalence levels. A low prevalence of the target condition effectively results in a lower sample size for diseased participants and hence for the calculation of sensitivity. For COE, sensitivity estimates were highly variable for study level prevalence analogous to those in the population, and ranged from 0.50 (95% confidence interval (CI) 0.07 to 0.93) to 0.99 (95% CI 0.97 to 1.00). The comparably lower specificity estimates observed in the two studies where prevalence was significantly higher than would normally be observed (20% and 50%) can be explained at least in part by the higher prevalence. The variation in prevalence is reflective of the flow and timing of participants through these two studies, particularly the process of investigation, which was quite different from the flow and timing of the remaining included studies. All screened positive participants were offered the reference standard and all participants who attended the referral centre for subsequent verification received the reference standard. A random sample of participants screened negative received differential verification by the project dentist (diagnostic test accuracy evaluation samples of 2193 screen positive and 1350 screen negative (Warnakulasuriya 1991) and 660 screen positive and 1212 screen negative (Warnakulasuriya 1990)). For the two studies of mouth self examination, sensitivity values were 0.18 (95% CI 0.13 to 0.24) and 0.33 (95% CI 0.10 to 0.65). The one study that directly compared COE with COE plus vital rinsing in a randomised controlled trial found a higher detection rate for PMD in the trial arm with the vital rinsing adjunct.
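The effect of a small number of diseased participants on the precision of a sensitivity estimate can be illustrated with a short Python sketch. It assumes an exact (Clopper‐Pearson) binomial interval, which may or may not be the method used for the intervals reported in this review, and the counts and function names are hypothetical. The same observed sensitivity of 0.50 is computed from 4 and from 40 diseased participants.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials, by bisection."""
    def bisect(f, target):
        # f must be increasing in p on [0, 1]
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit solves P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else bisect(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

# Hypothetical counts: same sensitivity (0.50), different numbers of diseased participants
small = exact_ci(2, 4)
large = exact_ci(20, 40)
print(f"2/4:   {small[0]:.2f} to {small[1]:.2f}")
print(f"20/40: {large[0]:.2f} to {large[1]:.2f}")
```

With only four diseased participants the interval spans most of the possible range, whereas the interval from 40 diseased participants is much narrower.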

Index tests at a prevalence reported in the population (between 1% and 5%) were better at correctly classifying the absence of PMD or oral cavity cancer in disease‐free individuals than classifying the presence in diseased individuals. A false negative result from a screening programme would mean that the individuals with PMD or oral cavity cancer would not be referred for further investigations; a false positive result would mean a number of individuals without PMD or oral cavity cancer would receive a positive screening result, possibly resulting in further excisional investigations for the patient. Whereas the false positive results could and would no doubt have financial and other resource implications following inappropriate referral, the false negative results indicate that people with PMD or oral cavity cancer will be missed, possibly to be diagnosed at a later date when the disease will be more severe. For mouth self examination, the evidence is equivocal, with poor values of both sensitivity and specificity in one study. In the other study, a high value of specificity was accompanied by a very low sensitivity value. The prevalence of PMD or oral cavity cancer differed markedly between the two studies (0.6% and 22.6%).
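The trade‐off between false negative and false positive results can be made concrete by projecting expected outcomes onto a screened cohort. The following Python sketch is illustrative only: the cohort size of 10,000 and the prevalence of 2% are assumptions (the prevalence lies within the 1% to 5% range quoted above), the sensitivity of 0.50 and specificity of 0.98 are taken from the lower end of the range reported for COE, and the function name is hypothetical.

```python
def expected_outcomes(n_screened, prevalence, sensitivity, specificity):
    """Expected 2 x 2 counts and positive predictive value for a screened cohort."""
    diseased = n_screened * prevalence
    healthy = n_screened - diseased
    tp = diseased * sensitivity   # correctly referred
    fn = diseased - tp            # missed at screening
    tn = healthy * specificity    # correctly reassured
    fp = healthy - tn             # referred unnecessarily
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn, "PPV": tp / (tp + fp)}

# Assumed inputs: 10,000 screened, 2% prevalence, sensitivity 0.50, specificity 0.98
result = expected_outcomes(10_000, prevalence=0.02, sensitivity=0.50, specificity=0.98)
for name, value in result.items():
    print(f"{name}: {value:.3f}" if name == "PPV" else f"{name}: {value:.0f}")
```

Under these assumptions about 100 of 200 diseased individuals would be missed, and roughly two in three positive screens would be false positives.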

Strengths and weaknesses of the review

The utility of this review is limited in part by the number of included studies. A small number of potentially eligible studies were excluded on the basis that the screened negative individuals did not receive or report a reference standard. As a result, the number of false negatives could not be determined. Primary studies of more recently developed index tests were case‐control studies and consequently ineligible for inclusion through study design. We took the decision to exclude case‐control studies at the protocol stage owing to the potential for overestimation of diagnostic accuracy with this design. However, this has meant that the index tests evaluated in this review do not include those based on newer technologies. We would anticipate that those index tests showing promise at the present time would be further evaluated with a more robust study design and therefore be eligible for inclusion in updates of this review.

Following on from previous systematic reviews in this area (e.g. Downer 2004), a further five diagnostic accuracy studies have been identified and were eligible for inclusion in this review. The main strength of this review is that it evaluated the diagnostic accuracy of conventional oral examination, vital rinsing and mouth self examination. All included studies were assessed for methodological quality using the QUADAS‐2 tool which we specifically adapted for this review. This enabled the quality of the evidence to be considered in conjunction with the diagnostic estimates.

Due to the substantial diversity in the nature of the included studies and the characteristics of the participants it was not appropriate to pool the data. Whilst this is not a weakness of the review, the failure to provide summary estimates of sensitivity and specificity, in contrast to previous systematic reviews, could be regarded as a limitation. The range of sensitivity values is likely to have been influenced by the considerable heterogeneity across the studies. In future updates should more homogeneous studies be included in the review, it would be informative to evaluate the influence of risk factors on estimates of diagnostic accuracy. However, we acknowledge that there was a lack of reported detail in a number of the included studies regarding the presence or absence of important risk factors such as smoking, betel quid chewing and alcohol consumption.

Participants were recruited into studies that had used a wide range of criteria, from opportunistic screening programmes in company headquarters to mass screening programmes in South East Asia. The World Health Organization defines screening as "the application of a test or tests to people who are apparently free from the disease in question in order to distinguish between those that have the disease from those who probably do not" (Wilson 1968). A difficulty with a number of the included studies was determining how representative the screened population was, given the settings for recruitment such as a company's headquarters, hospital out‐patient departments and tertiary treatment centres. It could be argued that the latter sample represents a distinct population with a much higher risk of developing new disease due to field change and one where clinicians are likely to have a higher index of suspicion. Prevalence of the included studies was in line with what would be expected; Napier 2008 argues that most authorities agree that this lies between 1% and 5%. However, the sample prevalence was particularly high in two studies of COE (Mathew 1997 10.3%, Ikeda 1995 9.7%) and one study of mouth self examination (Scott 2010 22.6%). In two studies of COE (Warnakulasuriya 1990; Warnakulasuriya 1991) the sample prevalence calculated from the two by two table evaluating the diagnostic test accuracy was particularly high at 21.6% and 50.9%. The screen positive prevalence for these studies was more in line with population prevalence at 2.25% and 6.23%.
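One way a verified sample can show a much higher prevalence than the screened population is when all screen positive participants, but only a random sample of screen negative participants, receive the reference standard. The Python sketch below illustrates this mechanism with entirely hypothetical counts. The re‐weighting of verified screen negatives by the inverse of their sampling fraction is an assumption made for the illustration, not a method described in the review.

```python
def prevalence_estimates(pos_verified, pos_diseased, neg_total, neg_verified, neg_diseased):
    """Prevalence in the verified sample versus a re-weighted population estimate.

    Assumes every screen positive participant was verified and that the verified
    screen negatives are a simple random sample of all screen negatives.
    """
    sample_prev = (pos_diseased + neg_diseased) / (pos_verified + neg_verified)
    weight = neg_total / neg_verified   # each verified negative stands for this many
    weighted_prev = (pos_diseased + weight * neg_diseased) / (pos_verified + neg_total)
    return sample_prev, weighted_prev

# Hypothetical programme: 10,000 screened, 500 screen positive (all verified),
# 9,500 screen negative of whom a random 950 were verified
sample_prev, weighted_prev = prevalence_estimates(
    pos_verified=500, pos_diseased=150,
    neg_total=9500, neg_verified=950, neg_diseased=10,
)
print(f"verified sample: {sample_prev:.1%}, re-weighted: {weighted_prev:.1%}")
```

In this hypothetical example the prevalence in the verified sample is several times higher than the re‐weighted estimate for the whole screened population.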

The definition of a positive lesion was relatively consistent across all the studies, although in some studies (e.g. Mehta 1986), a positive screen could include 'growths suggestive of oral cancer' or referable lesions that were neither oral cavity cancer nor PMD. Similarly, the definition of the target condition in the index test differed from that in the reference standard in some studies. In another study there was a lack of consistent definition and use of the target condition for the index and reference tests. As a potential source of bias, it was not always clear whether the reference standard had been interpreted with or without knowledge of the index test.

The use of cancer registries or other registries as a reference standard (e.g. Chang 2011; Su 2010) can be methodologically problematic, particularly if there is a mismatch in the target condition being evaluated and the outcome documented in the registry. For example cancer registries are unlikely to hold data on PMDs that have not undergone malignant transformation, inducing a mismatch in the target condition being detected by the index test and the outcome recorded in the registry. Differential verification bias can occur if screened positive participants receive biopsy as a reference standard whilst the screened negative participants are assessed through a national cancer registry alone. If there is potential for malignant transformation within the duration of follow‐up then follow‐up through registry could be appropriate. Careful thought should be given to the target condition of the index and reference standard and whether this information will be adequately recorded in the registry.

Applicability of findings to the review question

Concerns regarding applicability arose from targeted patient selection of high risk groups for the patient selection domain, where participants in five of the 13 studies either had a previous history of head and neck cancer or were older tobacco smokers. For example, participants in one study conducted in a tertiary care clinic (Chang 2011) were all males, and another study recruited former head and neck cancer patients undergoing routine surveillance visits (Sweeny 2011). Studies with unclear concerns in this domain were those that had omitted important information on patient or study characteristics, which meant that we were unable to determine whether the participants and settings matched the review question. There was low concern regarding applicability for the index test domain for all studies. An unclear judgement for applicability for the reference standard was given to one study where six people had been identified from the target population to act as the reference standard (Elango 2011). Although exposed to training, it is questionable whether trained lay people could act as a reference standard and there was some concern that the index test and reference test may have been conducted simultaneously for those who had not responded initially. A second study (Sweeny 2011) was also judged to be at unclear applicability on this domain. There was low concern regarding applicability for the remaining 11 studies.

Figure 1. Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study.

Figure 2. Forest plot of 1. Conventional oral examination.

Figure 3. Summary ROC plot of 1. Conventional oral examination.

Figure 4. Forest plot of 2. Mouth self examination.

Figure 5. Summary ROC plot of 2. Mouth self examination.

Test 1. Conventional oral examination.

Test 2. Mouth self examination.

What is the performance of conventional oral examination for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults?

Population: Oral cavity cancer or potentially malignant disorder symptom‐free individuals screened opportunistically, or through an organised screening programme

Index test: Conventional oral examination

Target condition: Oral cavity cancer or potentially malignant disorder

Reference standard: Examination and clinical evaluation by a physician with specialist knowledge or training. Long‐term follow‐up was accepted as a suitable reference standard for those participants screened negative

Studies: Cross‐sectional (consecutive sample) (9) or validation sample in a randomised controlled trial of screening intervention (1)

Population: Individuals attending for opportunistic screening (2), organised screening programme (4), validation as part of an organised screening programme or randomised controlled trial (3), screening as part of a routine surveillance appointment (1)

Index test: Conventional oral examination

Prevalence: Range from 1.4% to 50.9%

No. of participants (studies): 25,568 (10)

Effect (95% CI): No pooled analysis. Range: from sensitivity 0.50 (95% CI 0.07 to 0.93), specificity 0.98 (95% CI 0.92 to 1.00) to sensitivity 0.99 (95% CI 0.97 to 1.00), specificity 0.99 (95% CI 0.99 to 0.99)

CI = confidence interval

What is the performance of mouth self examination for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults?

Population: Oral cavity cancer or potentially malignant disorder symptom‐free individuals screened through an organised screening programme

Index test: Mouth self examination

Target condition: Oral cavity cancer or potentially malignant disorder

Reference standard: Examination and clinical evaluation by a physician with specialist knowledge or training or trained health worker

Studies: Cross‐sectional studies (or consecutive series) (2)

Population: Individuals attending for organised screening programme (2)

Index test: Mouth self examination

Prevalence: 0.6% and 22.6%

No. of participants (studies): 34,819 (2)

Effect (95% CI): No pooled analysis. Sensitivity 0.18 (95% CI 0.13 to 0.24), specificity 1.00 (95% CI 1.00 to 1.00); sensitivity 0.33 (95% CI 0.10 to 0.65), specificity 0.54 (95% CI 0.37 to 0.69)

CI = confidence interval

What is the performance of vital rinsing (Toluidine blue) as an adjunct to conventional oral examination compared to conventional oral examination alone?

Population: Oral cavity cancer or potentially malignant disorder symptom‐free individuals with tobacco habits

Index test: Conventional oral examination plus vital rinsing (Toluidine blue) compared to conventional oral examination alone

Target condition: Oral pre‐malignant lesions and malignant lesions

Reference standard: Biopsy. Long‐term follow‐up through the National Cancer Registry

Studies: RCT (1)

Population: Individuals attending an organised screening programme

Study: Direct RCT

Index tests: Conventional oral examination plus vital rinsing (Toluidine blue) compared with conventional oral examination alone

Prevalence: 4.6% and 4.4%

No. of participants (studies): 7975 (1)

Effect (95% CI): Detection rate of oral pre‐malignant lesions and malignant lesions after referral was 4.6% in conventional oral examination plus vital rinsing arm; 4.4% in conventional oral examination alone. Rate ratio of 1.05 (95% CI 0.74 to 1.41). Incidence rate of oral cancer (×10⁻⁵) of 28 compared to 35.4. Relative incidence rate of 0.79 (95% CI 0.24 to 1.23)*

* Initial screen positive rate higher in the vital rinsing arm (9.5% and 8.3%)

CI = confidence interval; RCT = randomised controlled trial

Table 1. Screening tests for PMDs and oral cavity cancer

Test: Conventional oral examination (COE)

Characteristics: A standard visual and tactile examination of the oral mucosa under normal (incandescent) light

Classification of response: The presence of an oral mucosal abnormality is classified as a positive test result; the absence of any oral mucosal abnormalities is classified as a negative test result

Other information: Has traditionally been used as an oral cancer screen, but its utility is debated (Lingen 2008)

Advantages: quick and easy once trained, minimally invasive
Disadvantages: oral mucosal abnormalities are not necessarily clinically or biologically malignant; only a small percentage of leukoplakias are progressive or become malignant; COE cannot distinguish between those that are and those that are not; some pre‐cancerous lesions may exist within oral mucosa that appears clinically normal by COE alone (Lingen 2008)

Test: Vital rinsing (e.g. Toluidine blue, Tolonium chloride)

Characteristics: Vital rinsing refers to the use of dyes such as Toluidine blue or Tolonium chloride to stain oral mucosa tissues for PMD or malignancy (Leston 2010; Lingen 2008; Patton 2008). The procedure is as follows:

  • Pre‐rinse with acetic acid

  • Rinse with water

  • Apply Toluidine blue

  • Post‐rinse with acetic acid

  • Rinse with water

  • Observe mucosa to check for staining

Classification of response: The result of the test is classified as positive if tissue is stained, negative if no tissue is stained, or equivocal if no definitive result can be obtained

Other information:

Advantages: ability to define areas that could be malignant or abnormal but cannot be seen; assess the extent of the PMD for excision
Disadvantages: benign inflammatory lesions subject to stain; failure of some cancerous lesions to stain; variation in test performance depending on how thoroughly the test procedures are followed; contraindicated in those who are known to be allergic to iodine

Test: Light‐based detection (e.g. ViziLite and ViziLite plus, Microlux/DL, VELscope, Identafi 3000)

Characteristics: Light‐based systems to identify pre‐malignant and malignant lesions and to highlight their presence through tissue autofluorescence or reflectance (Leston 2010; Lingen 2008; Patton 2008). For example, using ViziLite Plus or Microlux/DL, the procedure is as follows (Lingen 2008):

  • Pre‐rinse with acetic acid

  • Use blue‐light source to visually assess the oral cavity

ViziLite Plus also provides a tolonium chloride solution (TBlue) to aid in the marking of the lesion for biopsy once the light source is removed

Classification of response: The result of the test is classed as negative if the appearance of the epithelium is lightly bluish white and positive if the appearance of the epithelium is distinctly white (acetowhite)

For systems based on autofluorescence the result of the test is classed as negative if fluorescence is maintained and positive if fluorescence is lost

Other information:

Advantages: simple to use; non‐invasive; do not require consumable re‐agents; provide real time results; can be performed by a wide range of operators after a short training period
Disadvantages: the necessity of a dark environment; high initial set up (for VELscope) or recurrent costs (for ViziLite in low‐income countries); lack of permanent record unless photographed; inability to objectively measure visualisation results

Test: Mouth self examination

Characteristics: Self examination, usually in the home setting in accordance with instructional material

Classification of response: Usually the presence of any lesion

Other information:

Advantages: simple to carry out and low cost. Can be carried out in an individual's own home
Disadvantages: target condition is the presence or absence of oral lesions. Cannot differentiate between potentially malignant and non‐malignant lesions

Test: Blood and saliva analyses

Characteristics: These novel technologies are at an early stage of development and evaluation
Analysis of blood or saliva samples which tests for the presence of bio‐markers of PMD and oral cancer (Brinkmann 2011; Lee 2009; Li 2006)

Classification of response: Cut‐off probabilities vary widely and are dependent on the individual bio‐marker or combination of bio‐markers examined
Molecular markers for diagnosis include changes in cellular DNA, altered mRNA transcripts, altered protein levels

Other information:

Advantages: non‐invasive (saliva tests) or minimally invasive (blood tests)
Disadvantages: there is a tendency for the estimated diagnostic accuracy of new health technologies to decline over time as evidence from independent evaluations accumulates (Wyatt 1995). This bias, which can be substantial, has been demonstrated in other domains, e.g. acute abdominal pain (Liu 2006) and clinical decision support systems (Garg 2005). Promising bio‐marker tests in several clinical areas have eventually been shown to be disappointing (Buchen 2011). It remains to be seen whether this is the case with oral cancer and PMDs

PMDs = potentially malignant disorders

Table 2. Indicators for the assessment of methodological quality

Domain

Patient selection  

Index test 

Reference standard

Flow and timing 

Description

Describe methods of patient selection. Describe included patients (characteristics, prior testing, presentation, intended use of index test and setting)

Describe the index test and how it was conducted and interpreted. Describe the sequence of tests, any training or calibration of assessors, any procedures taken to ensure blinding of examiners, post‐hoc or a priori threshold specification, any conflict of interest or commercial funding. Levels of agreement should be reported. Where this is measured by the kappa statistic*, acceptable values range from 0.61 (substantial agreement) to 1.00 (almost perfect agreement) (Landis 1977)

*This statistic is a measure of inter‐rater agreement of observations measured at a categorical level

Describe the reference standard and how it was conducted and interpreted. Any measures taken to ensure assessors were blinded to the results of the index tests should be documented, along with the sequence of reference and index tests

Describe the characteristics and proportion of patients who did not receive the index test(s) and/or reference standard, who received a reference standard other than examination and clinical evaluation by a specialist physician, or who were excluded from the 2 x 2 table (refer to flow diagram). Describe the time interval and any interventions between index test(s) and reference standard. The length of time between the index test and reference standard should be short in the majority of cases. If the period elapsed between initial screening and reference standard (examination and clinical evaluation) was greater than 6 weeks, this was considered an unacceptable delay

Signalling questions

(Yes/No/Unclear)

Was a consecutive or random sample of patients enrolled?

Classify as Yes if consecutive patients or a random sample of individuals were recruited

Classify as No if non‐consecutive patients or a non‐random sample of individuals were recruited

Classify as Unclear if patient selection was not clearly described

Were the index test results interpreted without knowledge of the results of the reference standard?

Classify as Yes if interpreters of index test results clearly do not know results of reference standard
Classify as No if interpreters of index test results clearly know results of reference standard
Classify as Unclear if study did not provide any information on whether interpreters of index tests were blinded to reference standard

Is the reference standard likely to correctly classify the target condition? The reference standard is an examination and clinical evaluation by a physician with specialist knowledge which if stated as such should be acceptable. Ideally this should be undertaken independently by more than one specialist. Alternatively an acceptable reference standard is extended follow‐up

Classify as Yes if the test is examination and clinical evaluation by a physician with specialist knowledge and/or training, or a non‐specialist with dedicated training to an acceptable standard

Classify as No if the test result is examination and clinical evaluation by a non‐specialist physician in the absence of dedicated training

Classify as Unclear if the study does not report the experience and training of those carrying out the reference standard

Was there an appropriate time interval between the index test(s) and reference standard?

Classify as Yes if the delay between the index test(s) and reference standard is considered acceptable for the majority of participants

Classify as No if the delay between the index test(s) and reference standard is considered unacceptable for the majority of participants

Classify as Unclear if the delay between the index test(s) and reference standard is not explicitly stated

Did the study avoid inappropriate exclusions?

Classify as Yes if the sample consisted of apparently healthy individuals

Classify as No if only individuals with existing PMDs were recruited

Classify as Unclear if exclusions were not clearly described

If a threshold was used, was it pre‐specified?

Classify as Yes if the threshold was pre‐specified

Classify as No if the threshold was not pre‐specified

Classify as Unclear if it is unclear whether the threshold was pre‐specified

Were the reference standard results interpreted without knowledge of the results of the index test?

Classify as Yes if personnel clearly do not know index test results when performing the examination and clinical evaluation or evaluating follow‐up data

Classify as No if personnel clearly know index test results when performing the examination and clinical evaluation or evaluating follow‐up data

Classify as Unclear if study did not provide any information on whether personnel were blinded to the index test results

Did all patients receive the same reference standard?

Classify as Yes if the same reference standard was used in all participants

Classify as No if the same reference standard was not used in all participants

Classify as Unclear if it is unclear whether different reference standards were used

Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test?

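Classify as Yes if the results of the second index test were interpreted without knowledge of the results of the first index test

Classify as No if the results of the second index test were interpreted with knowledge of the results of the first index test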

Classify as Unclear if it is unclear whether the results of the second index test were interpreted without knowledge of the results of the first index test

Were all patients included in the analysis?

Classify as Yes if all patients were included in the analysis

Classify as No if only some patients were included in the analysis

Classify as Unclear if it is unclear whether all patients were included in the analysis

Were any conflicts of interest stated?

Classify as Yes if the study declared no conflict of interest

Classify as No if the study declared a conflict of interest

Classify as Unclear if there was no information on conflict of interest

Risk of bias: High/Low/Unclear

Could the selection of individuals have introduced bias?

Could the conduct or interpretation of the index test have introduced bias?

Could the reference standard, its conduct, or its interpretation have introduced bias?

Could the patient flow have introduced bias?

Concerns regarding applicability: High/Low/Unclear

Are there concerns that the included individuals do not match the review question?

Are there concerns that the index test, its conduct, or interpretation differ from the review question?

Are there concerns that the target condition as defined by the reference standard does not match the review question?

Assessment of overall risk of bias and applicability

An overall judgement of risk of bias and applicability to the review (high, low or unclear) was made based on the judgements given to each domain. If the answers to all signalling questions within a domain were Yes, indicating low risk of bias, the domain was judged to be at low risk of bias. A No response to a signalling question was taken to indicate a potential risk of bias, and the authors considered this risk in the context of the study before deciding whether the study was at high or low risk of bias for that domain.

If any of the four risk of bias domains was judged to be at high risk of bias, the study was judged to have a high risk of bias overall. If any of the three applicability domains was judged to be of high concern regarding applicability, the study was judged to be of high concern regarding applicability overall.

PMDs = potentially malignant disorders.

Table 2. Indicators for the assessment of methodological quality
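The overall judgement rule described in Table 2 can be sketched in a few lines of Python. This is an illustrative sketch only, not the review authors' procedure: the function names are hypothetical, and the table states explicitly only the rule that any high domain gives a high overall judgement. Returning low when every domain is low, and unclear otherwise, is an assumption made here for completeness. The within-domain handling of a No response relied on the authors' judgement and is not automated.

```python
VALID_JUDGEMENTS = {"high", "low", "unclear"}


def domain_low_by_default(answers):
    """Return True when every signalling question in a domain was answered Yes.

    Per Table 2, this is the case in which the domain is judged low risk
    without further consideration. Any other pattern needs author judgement.
    """
    normalised = [a.strip().lower() for a in answers]
    return len(normalised) > 0 and all(a == "yes" for a in normalised)


def overall_judgement(domain_judgements):
    """Combine per-domain judgements into one overall judgement.

    Works for the four risk of bias domains or the three applicability domains.
    """
    levels = [j.strip().lower() for j in domain_judgements]
    if not levels or any(j not in VALID_JUDGEMENTS for j in levels):
        raise ValueError("each judgement must be 'high', 'low' or 'unclear'")
    # Stated in Table 2: any domain at high risk makes the study high overall.
    if "high" in levels:
        return "high"
    # Assumption (not stated in the source): all low gives low overall.
    if all(j == "low" for j in levels):
        return "low"
    # Assumption (not stated in the source): any remaining mix is unclear.
    return "unclear"
```

For example, a study judged low, low, high and unclear across the four risk of bias domains would be rated high overall under this sketch.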
Table Tests. Data tables by test

Test | No. of studies | No. of participants
1 Conventional oral examination | 10 | 25568
2 Mouth self examination | 2 | 34819
