CSF tau and the CSF tau/ABeta ratio for the diagnosis of Alzheimer's disease dementia and other dementias in people with mild cognitive impairment (MCI)

Summary of findings Performance of CSF biomarkers in early diagnosis of dementia

What is the diagnostic accuracy of CSF biomarker levels for detecting Alzheimer's disease pathology in people with mild cognitive impairment (MCI), and for identifying those MCI participants who would convert to Alzheimer’s disease dementia or other forms of dementia over time?
Descriptive
Patient population	Participants diagnosed with MCI at baseline using any of the Petersen criteria, or CDR = 0.5, or any of the 16 definitions included by Matthews (Matthews 2008)
Sampling procedure	Consecutive or random (n = 5); not consecutive or random (n = 3); unclear (n = 7)
Sources of recruitment	University memory clinic (n = 8); European multicentre memory clinics (n = 2); inpatients (n = 2); General Hospital memory clinic (n = 1); Research centre outpatient memory clinic (n = 1); not reported (n = 1)
Prior testing	The only testing prior to performing the plasma and CSF biomarkers was the application of diagnostic criteria for identifying participants with MCI.
MCI criteria	Petersen criteria (n = 14); Global Deterioration Scale (GDS) (n = 1)
Index tests	CSF t‐tau or CSF p‐tau or CSF p‐tau/ABeta ratio or CSF t‐tau/ABeta ratio
Reference standard	NINCDS‐ADRDA and/or DSM and/or ICD criteria for Alzheimer's disease dementia (n = 12); Global Deterioration Scale (GDS) and research criteria (n = 1); CDR = 1 criteria (n = 1); not specified (n = 1). McKeith criteria for Lewy body dementia; Lund criteria for frontotemporal dementia; and NINDS AIREN criteria for vascular dementia
Target condition	Alzheimer’s disease dementia or any other types of dementia
Included studies	Prospectively well‐defined cohorts of MCI participants (n = 7), nested case‐control studies with a prospectively defined MCI group (n = 6) and studies with a retrospectively defined MCI group with longitudinal data (n = 2). Fifteen studies (N = 1282 participants) were included. Number included in analysis: 1172
Quality concerns	Patient selection and conduct of the reference standard were poorly reported. Applicability concerns were generally low. Regarding the inclusion criteria set in the review, the majority of included studies did match the review question: 'Could CSF t‐tau and CSF t‐tau/ABeta ratio biomarkers identify those MCI participants with Alzheimer’s disease pathology at baseline who would convert clinically to dementia at follow‐up?' However, due to the limited number of included studies and the levels of heterogeneity, it is difficult to determine to what extent the findings from a meta‐analysis can be applied to clinical practice.
Limitations	Limited investigation of heterogeneity due to insufficient number of studies. There was a lack of common thresholds.
Target condition	Test	Studies	Cases/participants	Median specificity (%) from included studies	Sensitivity, % (95% CI)¹ at median specificity	Consequences in a cohort of 100: median percentage converting²	Consequences in a cohort of 100: missed cases	Consequences in a cohort of 100: overdiagnosed
Alzheimer's disease dementia	CSF t‐tau	7	436/709	72	77 (67, 85)	37	9	18
Alzheimer's disease dementia	CSF p‐tau	6	164/492	47.5	81 (64, 91.5)	37	7	33
Alzheimer's disease dementia	CSF p‐tau/ABeta ratio	5	140/433	No meta‐analysis	No meta‐analysis
All types of dementia	CSF t‐tau	4	166/319	No meta‐analysis	No meta‐analysis
Investigation of heterogeneity: the planned investigations were not possible due to the limited number of studies available for each analysis. We were unable to investigate the effect of duration of follow‐up due to substantial variation in length and reporting.
Conclusions: Given the insufficient evidence to evaluate the diagnostic value in MCI of CSF t‐tau, CSF p‐tau, CSF t‐tau/ABeta ratio and CSF p‐tau/ABeta ratio for Alzheimer's disease dementia and other forms of dementia examined in this review, particular attention should be paid to the risk of misdiagnosis and overdiagnosis of dementia (and therefore overtreatment) in clinical practice. Future studies with more uniform approaches to thresholds, analysis and study conduct may provide a more homogeneous estimate than the one available from the included studies we have identified.
¹Meta‐analytic estimate of sensitivity derived from the HSROC model at a fixed value of specificity. Summary estimates of sensitivity and specificity were not computed because the studies that contributed to the estimation of the summary ROC curve used different thresholds. ²The median percentage converting was calculated using all the studies that reported 'conversion from MCI to Alzheimer's disease dementia' (Table 2)
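The 'Consequences in a cohort of 100' figures in the table above can be reproduced with simple arithmetic. The following is a minimal sketch, not the review authors' code: the function name `cohort_consequences` and its parameters are hypothetical, and it assumes that missed cases are false negatives among converters and overdiagnosed cases are false positives among non‐converters.

```python
def cohort_consequences(prevalence_pct, sensitivity_pct, specificity_pct, cohort_size=100):
    """Return (converters, missed_cases, overdiagnosed) as unrounded expected counts.

    Assumption: 'missed cases' = false negatives, 'overdiagnosed' = false positives.
    """
    converters = cohort_size * prevalence_pct / 100
    non_converters = cohort_size - converters
    missed = converters * (1 - sensitivity_pct / 100)             # false negatives
    overdiagnosed = non_converters * (1 - specificity_pct / 100)  # false positives
    return converters, missed, overdiagnosed


# Values taken from the Summary of findings rows for Alzheimer's disease dementia:
# median percentage converting 37%, sensitivity at the median specificity.
for test, sens, spec in [("CSF t-tau", 77, 72), ("CSF p-tau", 81, 47.5)]:
    conv, missed, over = cohort_consequences(37, sens, spec)
    print(f"{test}: missed {round(missed)}, overdiagnosed {round(over)} per 100")
```

Rounded to whole people, this gives 9 missed and 18 overdiagnosed for CSF t‐tau, and 7 missed and 33 overdiagnosed for CSF p‐tau, matching the table.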

Background

Dementia is a progressive syndrome of global cognitive impairment with resultant functional decline. In the United Kingdom (UK), it affects 5% of the population over 65 and 25% of those over 85 (Knapp 2007). Worldwide, there were estimated to be 36 million people living with dementia in 2010 (Wilmo 2010), and this will increase to over 115 million by 2050 (Prince 2013). The greatest increases in prevalence are likely to be seen in the developing regions. By 2040, China and its western‐Pacific neighbours are predicted to have 26 million people living with dementia (Ferri 2005).

Dementia encompasses a group of neurodegenerative disorders that are characterised by progressive loss of cognitive function and ability to perform activities of daily living, that can be accompanied by neuropsychiatric symptoms and challenging behaviours of varying type and severity. The underlying pathology is usually degenerative and subtypes of dementia include Alzheimer’s disease dementia, vascular dementia, dementia with Lewy bodies, and frontotemporal dementia. There may be considerable overlap in the clinical and pathological presentations (MRC CFAS 2001), and there is often coexistence of Alzheimer’s disease dementia, vascular dementia and other causes of neuronal atrophy (Matthews 2009; Savva 2009).

Alzheimer’s disease dementia is an incurable, progressive, neurodegenerative condition which accounts for over 50% of all dementias, afflicting 5% of men and 6% of women over the age of 60 worldwide (World Health Organization 2010). Its prevalence increases exponentially with age, with Alzheimer’s dementia affecting fewer than 1% of people aged from 60 to 64 years, but 24% to 33% of those over the age of 85 (Ferri 2005).

More than a dozen different definitions have been used to describe cognitive impairment that is somehow qualitatively different from so‐called ‘normal’ ageing. The first complaints in people on the Alzheimer’s disease spectrum are often cognitive problems, such as difficulties with planning and judgement, as well as the more characteristic memory complaints. This may lead to a diagnosis of Mild Cognitive Impairment (MCI) if formal testing reveals objective evidence of cognitive impairment. It has not been previously mandated which psychometric tests should be used to objectively define cognitive impairment. However, the objectivity of the cognitive impairment diagnosis is critical, as it differentiates this population from a group with subjective cognitive impairment, which is more likely to have a non‐neurodegenerative aetiology. MCI is a heterogeneous condition, the diagnosis of which holds very little prognostic significance. There are four outcomes for those within an MCI population: progression to Alzheimer’s disease dementia, progression to another dementia, maintaining stable MCI, and recovery. Currently, 16 different classifications are used to define MCI (Matthews 2008). In this review, MCI refers to this extended definition of MCI, or to the clinical criteria defined by the Petersen or revised Petersen criteria (Petersen 1999; Petersen 2004; Winbald 2004), or to a score of 0.5 on the Clinical Dementia Rating (CDR) scale (Morris 1993).

Studies indicate that an annual average of 5% to 15% of people with MCI progress to Alzheimer’s disease dementia (Petersen 1999; Bruscoli 2004; Mattson 2009; Petersen 2009). The rate depends on the clinical profile, the setting, and the extent of investigation for vascular disease. At present, there is no clinical method to determine accurately which people with MCI will develop Alzheimer’s disease dementia or other forms of dementia.

Recent consensus guidelines have been developed, e.g. the second iteration of the International Working Group (IWG2) criteria on 'prodromal dementia', which seek to improve prognostic accuracy in the prodromal phase of Alzheimer's dementia by incorporating Alzheimer's disease‐related biomarkers into the criteria (Dubois 2014). It is in this context that reviews such as this one become especially relevant and timely.

Research suggests that measurable change in positron emission tomography (PET), magnetic resonance imaging (MRI) and cerebrospinal fluid (CSF) biomarkers occurs years in advance of the onset of clinical symptoms (Beckett 2010). In this review, we aimed to assess the ability of CSF total tau (t‐tau), CSF phosphorylated tau (p‐tau), the CSF t‐tau/ABeta ratio, and the CSF p‐tau/ABeta ratio to detect Alzheimer’s dementia and other forms of dementia in people with MCI. These biomarkers were chosen because they are considered to be the most intimately expressed biomarkers of the Alzheimer's disease core pathology; namely, the aggregation and fibrillisation of the amyloid plaque and hyperphosphorylation of tau. Consequently, these biomarkers have been proposed as important in new criteria for Alzheimer's disease dementia that incorporate biomarker abnormalities. PET imaging of amyloid is now approved by both the FDA and EMA to rule out Alzheimer's disease as the aetiology of MCI, especially in individuals with unusual clinical presentations. However, manufacturers of these tracers have ongoing 'appropriate use criteria' post‐marketing studies to learn where these tests have the greatest usage and utility for accurate diagnosis. Recent improvements to CSF sampling and the relatively inexpensive nature of this test compared with PET scanning mean that it will remain the test of choice for documenting CSF protein abnormalities in neurodegenerative disease. Side effects are increasingly rare but include headache and local reactions at the site of the lumbar puncture. Most practitioners consider patients on anticoagulative therapies (except aspirin) to be at too high a risk to undergo this procedure for the diagnosis of Alzheimer’s dementia.

Target condition being diagnosed

In this review, there are two target conditions: i) Alzheimer's disease dementia and ii) other forms of dementia, both of which were assessed at follow‐up.

We compared the index test results obtained at baseline with the results of the reference standard (clinical criteria) obtained at follow‐up (delayed verification of clinical diagnosis).

Index test(s)

This review is part of a suite of reviews for assessing the accuracy of CSF ABeta (Ritchie 2014), PET Amyloid (Zhang 2014; Smailagic 2015), MMSE (Arevalo‐Rodriguez 2015), and other index tests in identifying those people with MCI without clinical onset of dementia, who would develop Alzheimer's disease dementia or other forms of dementia during follow‐up. We planned to consider the following:

Total tau (t‐tau) and phosphorylated tau (p‐tau) CSF biomarker tests

Tau is a microtubule‐associated protein located primarily in neuronal axons. There are six different human isoforms, each of which has multiple phosphorylation sites. Physiologically, tau interacts with tubulin and plays an important role in the organisation and stabilisation of microtubules. Independent of phosphorylation status, slightly increased levels of CSF total tau (t‐tau) have been associated with ageing, vascular dementia, multiple sclerosis, AIDS dementia, head injury and tauopathy; significant increases have been seen with Creutzfeldt‐Jakob disease and meningoencephalitis; and a threefold increase has been seen in Alzheimer’s disease compared to normal controls (Shoji 2002). A systematic review of CSF biomarkers for Alzheimer’s disease, analysing 41 studies of CSF t‐tau, demonstrated a specificity of 90% and sensitivity of 81% in diagnosing the condition (Blennow 2003).

The p‐tau protein also has a number of potential phosphorylation sites (Billingsley 1997) and abnormal hyperphosphorylation has been shown to be associated with microtubule disruption and the formation of neurofibrillary tangles, dystrophic neurites surrounded by neuritic plaques, and neuropil threads, major components of Alzheimer’s disease pathophysiology (see Mandelkow 1998). A systematic review of 11 studies of CSF p‐tau in Alzheimer’s disease indicated a diagnostic specificity and sensitivity of 92% and 80% respectively (Blennow 2003).

There is great interest around the use of biomarkers and imaging techniques for the prediction of progression from MCI populations to Alzheimer’s disease dementia and other forms of dementia. The international consortium study Alzheimer Disease Neuroimaging Initiative (ADNI), performed between 2004 and 2009, has so far been a key cohort study for predicting the progression from MCI to Alzheimer’s disease using biomarkers, and demonstrated a sensitivity and specificity of CSF t‐tau of 70% and 92% and CSF p‐Tau181 of 68% and 73% respectively (Petersen 2010).

T‐tau/ABeta ratio and p‐tau/ABeta ratio CSF biomarker tests

ABeta is produced mainly by neurons, secreted into the CSF and then cleared through the blood‐brain barrier and degraded by the reticuloendothelial system. ABeta levels are thus regulated in strict equilibrium between the brain, CSF and blood (Shoji 1992), but, in Alzheimer’s disease patients, ABeta42 forms insoluble amyloid and accumulates as intracerebral fibrils, resulting in decreased levels of CSF ABeta42 (Shoji 2001).

ABeta in CSF has only modest potential as a test for delayed verification of Alzheimer’s disease (Ritchie 2014), with meta‐analysis of studies being hampered by poor methodological quality (Noel‐Storr 2013) and multiple thresholds being reported between studies (Ritchie 2011).

In 2001, the American Academy of Neurology produced practical guidelines for dementia, including three Class II or III reports in a systematic review of a combination study of ABeta42 and t‐tau CSF levels. The sensitivity and specificity for diagnosis of Alzheimer’s disease were 85% and 87% (Knopman 2001), supported by the 2001 systematic review revealing 83% to 100% sensitivity and 85% to 95% specificity for the CSF ABeta42 and t‐tau combination assay (Blennow 2003). Again, the ADNI cohort study demonstrated that the t‐tau/ABeta42 ratio could be used to predict conversion from MCI to Alzheimer’s disease dementia, revealing a sensitivity of 86% and specificity of 85% (Petersen 2010).

Clinical pathway

Dementia develops over several years and there is a presumed period when people are asymptomatic, although disease pathology may have accumulated. Individuals or their relatives may first notice subtle impairments of short‐term memory when the completion of complex tasks such as management of finances or medications becomes increasingly difficult. In the UK, people usually present to their general practitioner who may then refer them to a specialist following a brief cognitive test, clinical examination and exclusion of relevant physical illness. The biomarkers may then be administered by a specialist. There is, however, much regional variability in this, with Spain and Nordic countries favouring CSF sampling in their routine clinical work‐up, whereas other countries, such as the UK, do not. However, many people with dementia do not present until much later in the disorder and they will, therefore, follow a different pathway to diagnosis, for example, being identified during an admission to general hospital for a physical illness. Thus, the pathway influences the accuracy of the diagnostic test. The accuracy of the test will vary with the experience of the administrator, and the accuracy of the subsequent diagnosis will vary with the history of referrals to the particular healthcare setting. Diagnostic assessment pathways may vary in other countries and diagnoses may be made by a variety of specialists including psychiatrists, neurologists, and geriatricians.

Role of index test(s)

The sampling of CSF and assay for levels of tau and ABeta could have a role when applied in specialist clinics. Due to the costs, risks, and complexity of the testing, CSF tests will not be applied in a primary care setting. These index tests serve as add‐on biomarker tests, which have been proposed in new research diagnostic criteria to complement clinical examination and cognitive tests.

Alternative test(s)

We did not include alternative tests in this review, because there are currently no standard practice tests available for the diagnosis of dementia.

Rationale

Recently proposed research diagnostic criteria for ‘prodromal dementia’/‘pre‐dementia stage’/‘MCI due to Alzheimer's disease pathology’, for 'Alzheimer’s disease', and for the 'preclinical states of Alzheimer's disease' (Albert 2011; Dubois 2010; Dubois 2014) incorporate biomarkers based on imaging or CSF measures within the diagnostic rubric. These tests are core to the criteria, on the assumption that they will improve the specificity of the traditional, solely clinical criteria. It is crucial that each of these biomarkers is assessed for its diagnostic accuracy before it is adopted as a routine test in clinical practice. It is worth noting that in each of these criteria, a single abnormality in any of the proposed biomarker/imaging tests is considered sufficient to make a diagnosis of prodromal Alzheimer’s disease dementia.

Underpinning the new criteria is the assumption that if Alzheimer’s disease pathology can be diagnosed at an earlier, pre‐dementia stage, this could open critical windows for interventions that will have a greater likelihood of success in affecting disease pathways and thereby improving clinical symptoms. Earlier accurate diagnosis will also help people with pre‐dementia cognitive impairment, their families and potential carers make timely plans for the future. Coupled with appropriate contingency planning, proper recognition of the disease may also help to prevent inappropriate and potentially harmful admissions to hospital or institutional care (Bourne 2007). In addition, the accurate early identification of a dementia syndrome may improve opportunities for the use of newly evolving interventions designed to delay or prevent progression to more debilitating stages of dementia.

Objectives

Secondary objectives

To investigate the amount and associations of heterogeneity in the included studies of test accuracy.

We expected heterogeneity to be an important component of the review. We planned to use target population, index test, target disorder and study quality as a framework for the investigation of heterogeneity.

Methods

Criteria for considering studies for this review

Types of studies

We considered longitudinal cohort studies in which index test results were obtained at baseline and the reference standard results at follow‐up (see Index tests; Reference standards). These studies necessarily employ delayed verification of conversion to dementia and are sometimes labelled as ‘delayed verification cross‐sectional studies’ (Bossuyt 2008; Knottnerus 2002). This approach recognises the challenges of concurrent application of the reference test and index test. In reality, the reference standard for dementia is tissue sampling and histological examination, either at post mortem or from brain biopsy. Brain biopsy is not undertaken in any setting and a post mortem is so distant an event from the index test being conducted that there is the possibility that disease may have developed in the years after the index test. The Dementia DTA group chose to use later diagnosis of dementia (using standardised criteria) as evidence of delayed verification. This methodology has been published by our group (Mason 2010) and also reflects the approach taken in most of the primary research in this area.

We included nested case‐control studies if they incorporated a delayed verification design. We believe this can only occur in the context of a cohort study, so these studies are invariably diagnostic nested cohort studies. We only included data on performance of the index test to discriminate between people with MCI who converted to dementia and those who remained stable from those studies. We did not consider data from healthy controls or any other control group.

Participants

Participants recruited and clinically classified as those with mild cognitive impairment (MCI) at baseline were eligible for inclusion in this review. The diagnosis for MCI was established using the Petersen criteria or revised Petersen criteria (Petersen 1999; Petersen 2004; Winbald 2004) and/or Matthews criteria (Matthews 2008) and/or 'CDR = 0.5' (Morris 1993). These criteria include: subjective complaints; a decline in memory objectively verified by neuropsychological testing in combination with a history from the patient; a decline in other cognitive domains; no or minimal impairment of activities of daily living; and not meeting the criteria for dementia. Therefore, the eligible participants had a number of tests, e.g. neuropsychological tests for cognitive deficit and checklists for activities of daily living, before study entry. Participants were defined either as amnestic single domain, amnestic multiple domain, non‐amnestic single domain, non‐amnestic multiple domain, or nonspecified MCI participants.

We included participants from secondary and tertiary settings. Although demographic and clinical characteristics of MCI, as well as sources of recruitment, might differ in those settings, we decided not to limit our review by setting; instead, we planned to look for variation within and between settings, and examined the potential influence of the setting on diagnostic performance of the index test in the analyses.

We excluded those studies that included people with MCI possibly caused by: i) current or past alcohol/drug abuse; ii) central nervous system (CNS) trauma (e.g. subdural haematoma), tumour, or infection; iii) other neurological conditions, e.g. Parkinson’s or Huntington’s diseases.

Details of the causes of study dropouts are crucial: if such data are missing, the reliability of the conclusions must be questioned. We planned to take this into consideration.

Index tests

Studies that assessed the accuracy of measurements of CSF t‐tau, CSF p‐tau, CSF t‐tau/ABeta ratio, or CSF p‐tau/ABeta ratio were included.

There are currently no generally accepted standards for the plasma or CSF ABeta test threshold, and therefore it was not possible to prespecify what constituted a positive or negative result. We used the criteria which were applied in each included primary study to classify participants as either test positive or test negative.

Measure of index test: t‐tau, p‐tau and ABeta level in CSF (ng/L or pg/mL)

The assays most commonly used were the conventional Innogenetics (Ghent, Belgium) kit, the INNOTEST Phospho‐Tau₍₁₈₁₎ kit, the INNOTEST ABeta₄₂ kit, or the multiplexing INNO‐BIA AlzBio3 kit for CSF.

We did not include a comparator test because there are currently no standard practice tests available for the diagnosis of dementia. We compared the index tests with a reference standard.

Target conditions

There were two target conditions in this review:

Alzheimer’s disease dementia (conversion from MCI to Alzheimer’s disease dementia)
Any other forms of dementia (conversion from MCI to any other forms of dementia)

Reference standards

For the purpose of this review, several definitions of Alzheimer’s disease dementia were acceptable. Included studies could apply probable or possible NINCDS‐ADRDA (National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association) criteria (McKhann 1984). The Diagnostic and Statistical Manual of Mental Disorders (DSM) (DSMIII 1987; DSMIV 1994) and International Classification of Diseases (ICD) (World Health Organization 2010) definitions for Alzheimer’s disease dementia were also acceptable. It should be noted that different iterations of these standards may not be directly comparable over time (e.g. DSM‐IIIR versus DSM‐IV). Moreover, the validity of the diagnoses may vary with the degree or manner in which the criteria have been operationalised (e.g. individual clinician versus algorithm versus consensus determination). We planned to consider all these issues in interpreting the results, using sensitivity analyses as appropriate.

Similarly, differing clinical definitions of other forms of dementia were acceptable. For Lewy body dementia, the reference standard is the McKeith criteria (McKeith 1996; McKeith 2005). For frontotemporal dementia, the reference standard is the Lund criteria (LMG 1994; Neary 1998; Boxer 2005). DSM (DSMIII 1987; DSMIV 1994) and ICD (World Health Organization 2010) were also acceptable for frontotemporal and vascular dementias.

The time interval over which progression from MCI to Alzheimer’s disease dementia or other forms of dementia happened is also important. As age is the principal risk factor for Alzheimer's dementia and other forms of dementias, the longer the duration of follow‐up, the more likely the possibility of generating false positive findings for the index test. To this end, no limits were put on the length of follow‐up in the included studies, though this important variable was captured so we could examine between‐study variations. This change reflected an alteration to the original thinking in the published protocol and is noted in the Differences between protocol and review section of this review.

We planned to segment analyses into separate follow‐up periods for the delay in verification: less than one year; one year to less than two years; two to less than four years; and more than four years.
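The planned follow‐up bands could be applied to each study's follow‐up duration as in the sketch below. The function name `follow_up_band` is hypothetical. The text leaves a follow‐up of exactly four years unassigned; placing it in the last band is an assumption made here purely for illustration.

```python
def follow_up_band(years):
    """Assign a follow-up duration (in years) to one of the planned bands."""
    if years < 0:
        raise ValueError("follow-up duration cannot be negative")
    if years < 1:
        return "less than one year"
    if years < 2:
        return "one year to less than two years"
    if years < 4:
        return "two to less than four years"
    # Assumption: exactly 4.0 years is grouped with 'more than four years',
    # because the stated bands do not say where that boundary value belongs.
    return "more than four years"


# Hypothetical follow-up durations, for illustration only.
for duration in (0.5, 1.0, 3.9, 6.0):
    print(duration, "->", follow_up_band(duration))
```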

Search methods for identification of studies

Electronic searches

The main search for this review was performed in January 2013, and we ran a top‐up search in December 2015. We searched MEDLINE (OvidSP), Embase (OvidSP), BIOSIS Previews (Thomson Reuters Web of Science), Web of Science Core Collection, including Conference Proceedings Citation Index (Thomson Reuters Web of Science), PsycINFO (OvidSP), and LILACS (BIREME) (see Appendix 1 for details of the sources searched, the search strategies used, and the number of hits that were retrieved for the search carried out in January 2013). The results of the top‐up search that was carried out in December 2015 have not yet been fully incorporated into the review (please see Results of the search for more details).

We also requested a search of the Cochrane Register of Diagnostic Test Accuracy Studies (managed by the Cochrane Renal Group).

We did not apply any language or date restrictions to the electronic searches. We did not use methodological search filters (collections of terms aimed at reducing the number needed to screen by filtering out irrelevant records and retaining only those that are relevant) in the main bibliographic databases (MEDLINE, Embase and PsycINFO) as a single‐stranded method to restrict the search overall, because available filters have not yet proved sensitive enough for systematic review searches (Beynon 2013). Instead, we used a multi‐stranded approach to maximise sensitivity, including some searches run in parallel that included specific terms designed to capture diagnostic studies (see search narrative in Appendix 1).

Searching other resources

We checked the reference lists of all relevant studies for additional studies. We also conducted searches in the MEDION database (Meta‐analyses van Diagnostisch Onderzoek) at www.mediondatabase.nl, Database of Abstracts of Reviews of Effects (DARE) at http://www.crd.york.ac.uk/CRDWeb, Health Technology Assessment Database (HTA Database) at http://www.crd.york.ac.uk/CRDWeb, and Aggressive Research Intelligence Facility (ARIF) database at www.arif.bham.ac.uk for other related systematic diagnostic accuracy reviews; we searched for systematic reviews of diagnostic studies from the International Federation of Clinical Chemistry and Laboratory Medicine Committee for Evidence‐based Laboratory Medicine database (C‐EBLM). We checked reference lists of any relevant systematic reviews for additional studies. We also contacted researchers involved in relevant studies for applicable and usable but unpublished data.

Data collection and analysis

Selection of studies

Two researchers (EL and AN‐S) screened all titles and abstracts generated by the electronic database searches for relevance.

Two researchers (EL and AN‐S) independently reviewed the remaining abstracts of selected titles and selected all potentially‐eligible studies for full text review. Four researchers (NS, AN‐S, SM and EL) independently further assessed full manuscripts against the inclusion criteria (see Criteria for considering studies for this review). Where necessary, a third arbitrator (CWR) resolved disagreements that the two researchers could not resolve through discussion.

Where a study included useable data but these were not presented in the published manuscript, we contacted the authors directly to request further information. If the same data set was presented in more than one paper, we included only the primary paper.

We detailed the numbers of studies selected at each point in a study flow diagram (Figure 1).

Figure 1. Study flow diagram

Note: a top‐up search performed in December 2015 revealed 6134 records; 85 records were retained after de‐duplication and assessment by one experienced reviewer; 81 records were excluded after further assessment performed by two review authors; and 4 studies were identified for possible inclusion (Characteristics of studies awaiting classification).

Data extraction and management

We extracted data onto a study‐specific form which included the following:

Author, year of publication, and journal.
The index test and assay type used (thresholds used to define positive and negative tests).
The criteria used for clinical definition for the baseline population.
Baseline demographics of the study population (age, gender, apolipoprotein E (ApoE) status, MMSE and clinical setting).
The duration of follow‐up (mean, minimum, maximum, and median).
The proportion of participants developing the outcome of interest (Alzheimer's disease dementia using NINCDS‐ADRDA criteria) as well as other forms of dementias where standard criteria were used.
The sensitivity and specificity of the index test in defining Alzheimer's disease dementia (these were used to back‐translate into a 2 x 2 table (Appendix 2)).
Other data relevant for creating 2 x 2 tables (TP = true test positive; FP = false test positive; FN = false test negative; TN = true test negative) e.g. the number of 'abnormal' and 'normal' tests and baseline variables; the number of disease 'presence' and disease 'absence' at follow‐up, as well as through scrutiny of scatter plots.
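As an illustration of the back‐translation step mentioned in the list above, a reported sensitivity and specificity can be turned into approximate 2 x 2 counts when the numbers of converters and non‐converters are known. This is a minimal sketch under that assumption and is not necessarily the procedure described in Appendix 2; the function name `two_by_two` and the example study values are hypothetical.

```python
def two_by_two(n_converters, n_non_converters, sensitivity, specificity):
    """Back-calculate TP, FP, FN, TN from sensitivity and specificity (proportions).

    Counts are rounded to whole participants, so the rebuilt table may differ
    slightly from the true table when the published estimates were rounded.
    """
    tp = round(sensitivity * n_converters)      # converters who tested positive
    fn = n_converters - tp                      # converters who tested negative
    tn = round(specificity * n_non_converters)  # non-converters who tested negative
    fp = n_non_converters - tn                  # non-converters who tested positive
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical study: 40 converters, 60 non-converters,
# reported sensitivity 80% and specificity 70%.
table = two_by_two(40, 60, 0.80, 0.70)
print(table)
```

Because the row totals are fixed by the number of converters and non‐converters, only TP and TN need to be rounded; FN and FP follow by subtraction, which keeps the table internally consistent.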

We also extracted data necessary for the assessment of quality as defined below.

Data extraction was performed independently by two blinded review authors (NS and AN‐S). Disagreement in data extraction was resolved by discussion, with the potential to involve a third author (CWR) as arbitrator, if necessary.

Assessment of methodological quality

Two review authors (NS and AN‐S), blinded to each other’s scores, independently performed methodological quality assessments of each study using the QUADAS‐2 tool (Whiting 2011), as recommended by the Cochrane Collaboration. Disagreement was resolved by further review and discussion with the potential to involve a third author (CWR) as arbitrator, if necessary.

The tool is made up of four domains: participant selection, index test, reference standard and participant flow. Each domain was assessed in terms of risk of bias, with the first three domains also considered in terms of applicability concerns (QUADAS‐2) (Appendix 3). The components of each of these domains, and a rubric detailing how judgments concerning risk of bias were made, are given in Appendix 4. The key areas of quality assessment for this review were participant selection, index test, and blinding.

We did not use QUADAS‐2 data to form a summary quality score. We produced a narrative summary describing numbers of studies that were found to have high/low/unclear risk of bias, as well as concerns regarding applicability.

Statistical analysis and data synthesis

We evaluated test accuracy according to the target condition. There are no accepted thresholds to define what constitutes a positive or negative CSF index test for identifying those people with MCI who would convert to Alzheimer’s disease dementia or other forms of dementia over time. Therefore, the estimates of diagnostic accuracy reported in primary studies were likely to be based on data‐driven threshold selection (Leeflang 2008). We conducted exploratory analyses by plotting estimates of sensitivity and specificity from each study on forest plots and in receiver operating characteristic (ROC) space. We did not meta‐analyse pairs of sensitivity and specificity using the bivariate model, as originally planned, because the results were not clinically interpretable when studies with mixed thresholds were included in the analysis. Instead, we fitted HSROC meta‐analysis models to estimate summary ROC curves using SAS (Statistical Analysis Software), version 9.2 (SAS Institute 2011). For illustrative purposes, we derived estimates of sensitivity and likelihood ratios from the HSROC models at a fixed value of specificity (chosen a priori as the median specificity for the studies that were analysed when fitting the model). Confidence intervals for sensitivity and the likelihood ratios were calculated using the delta method (Davison 2003), using the 'estimate' command after fitting the HSROC models in SAS. HSROC models were only fitted for analyses where data for 2 x 2 tables were provided by at least six studies, given the need to estimate five parameters. Where HSROC models were fitted, we summarised the post‐test probability of conversion from MCI to dementia given a positive test result and given a negative test result for a range of prevalences of conversion (pretest probabilities).
This was done by plotting the post‐test probabilities against the pretest probabilities, calculating the former based on the pretest probabilities and the likelihood ratios estimated from the HSROC model at the median of the observed specificity values from the included studies. A positive predictive value (PPV) and a negative predictive value (NPV) were also reported, based on the median prevalence (pretest probability) of conversion across studies. We caution that these post‐test probabilities and PPV and NPV values related to likelihood ratios for hypothetical values of sensitivity and specificity for which the true threshold value of the index test was not known.
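
The conversion from pretest to post‐test probability described above can be sketched as below. This is an illustrative calculation only, not the SAS code used for the review; the function name and the chosen pretest values are assumptions, and the likelihood ratios are those reported in the Results for CSF t‐tau at the median specificity.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Likelihood ratios reported for CSF t-tau at the median specificity of 72%.
lr_pos, lr_neg = 2.72, 0.32

# Illustrative range of pretest probabilities (hypothetical choice of values;
# 0.37 is the median prevalence of conversion reported in the review).
for pretest in (0.10, 0.20, 0.37, 0.50):
    after_positive = post_test_probability(pretest, lr_pos)
    after_negative = post_test_probability(pretest, lr_neg)
    print(f"pretest {pretest:.0%}: "
          f"after positive test {after_positive:.0%}, "
          f"after negative test {after_negative:.0%}")

# PPV and NPV at the median prevalence of 37%.
ppv = post_test_probability(0.37, lr_pos)
npv = 1.0 - post_test_probability(0.37, lr_neg)
```

At the median prevalence of 37% this gives a PPV of about 62% and an NPV of about 84%, in line with the values reported for CSF t‐tau.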

Investigations of heterogeneity

Heterogeneity was investigated through visual examination of forest plots of sensitivities and specificities and through visual examination of the ROC plot of the raw data. The main sources of heterogeneity were thought likely to be reference standards used, participant sampling, index test methodology and aspects of study quality (particularly inadequate blinding).

Because there were insufficient studies, we did not perform meta‐regression (by including each potential source of heterogeneity as a covariate in the HSROC model) as planned (Differences between protocol and review).

Sensitivity analyses

We planned to investigate the effect of quality items (such as pre‐specifying threshold) on the accuracy of index tests by undertaking sensitivity analyses. Due to the limited number of studies, we did not perform any sensitivity analyses (Differences between protocol and review).

Assessment of reporting bias

We did not investigate reporting bias because of current uncertainty about how it operates in test accuracy studies and the interpretation of existing analytical tools such as funnel plots.

Results

Results of the search

The total number of records identified by the searches up to January 2013 was 20,446. After de‐duplication, a small team of assessors performed a first assessment of the remaining records. After a second assessment, 255 records were retained, of which 178 were excluded after assessment performed by two review authors. Seventy‐seven references were identified as possibly eligible studies and were assessed for inclusion (Figure 1). Fifteen papers were included, and 40 were discarded for the following reasons: i) data not suitable for analysis or insufficient data for creating 2 x 2 tables (n = 28) (Characteristics of excluded studies); ii) not a delayed verification study (n = 2); iii) not MCI participants at baseline (n = 2); iv) unsuitable index test (n = 2); v) reference not obtained (n = 5). In addition, twenty‐two papers were identified as multiple publications. One paper was not in English (Urakami 2004). No extra studies were found through reference checking. We obtained usable data for five papers (Amlien 2013; Galluzzi 2010; Hansson 2006; Visser 2009; Vos 2013) through contacting the authors.

We ran a top‐up search in December 2015. The results of this search will be fully incorporated into the review at update. However, readers may wish to know that this search identified a total of 6314 results. After screening, four new studies were identified for inclusion within the review (please see Additional Tables: Table 1 for more details). The characteristics of these four new studies and their heterogeneity were all consistent with the fully incorporated studies.

Table 1. Studies awaiting classification

Conversion from MCI to Alzheimer’s disease dementia
Study	Participants n/N (included in analysis)	Index test (number and % of positive tests)	Threshold (test abnormal) (prespecified Yes/No)	Number of converters (%) FP and FN	Test accuracy at study level		Duration of follow‐up
					Sensitivity	Specificity
*Balasa 2014	51/51	CSF ABeta42/p‐tau ratio 25/51 (49%)	< 6.43 (Yes)	24/51 (47%) FP = 1; FN = 0	100%	96%	41 months for MCI‐AD; 30 months for MCI‐MCI
*Ewers 2012	130/130	CSF t‐tau 65/130 (50%)	Not reported	58/130 (45%) FP = 30; FN = 23	60.7%	58.9%	24 months
		CSF p‐tau 67/130 (51.5%)	Not reported	58/130 (45%) FP = 30; FN = 21	63.9%	58.9%
*Leuzy 2015	33/33	CSF t‐tau 15/33 (45%)	˃ 400 pg/mL (Yes)	12/33 (36%) FP = 7; FN = 4	67%	67%	Not reported
		CSF t‐tau/ABeta ratio 12/33 (36%)	< 1.14 (Yes)	12/33 (36%) FP = 6; FN = 6	50%	71%
Conversion from MCI to all dementias
*Eckerstrom 2015	73/73	CSF p‐tau 15/73 (20.5%)	73 pg/mL (No)	27/73 (36.9%) FP = 3; FN = 15	75%	92%	43.1 ± 23 months MCI‐stable; 33.7 ± 24 months MCI converters
Study awaiting translation
Urakami 2004

^{AD: Alzheimer's disease; FN: false negative; FP: false positive; MCI: mild cognitive impairment}

^{*Authors need to be contacted in order to obtain missing data/relevant information. Data presented are provisional.}

Included Studies

The Characteristics of included studies table lists the characteristics of the 15 included studies containing a total of 1282 participants with MCI at baseline of whom 1172 had analysable data. Two studies (Buchhave 2012; Hansson 2006) involved the same cohort. Buchhave 2012 reported the data for the CSF p‐tau/ABeta ratio index test from a new follow‐up period.

Study designs were seven prospectively well‐defined cohorts of participants with MCI (Buchhave 2012; Fellgiebel 2007; Galluzzi 2010; Herukka 2007; Kester 2011; Palmqvist 2012; Vos 2013), six nested case‐control studies with a prospectively defined MCI group (Amlien 2013; Hansson 2006; Koivunen 2008; Monge‐Argiles 2011; Parnetti 2012; Visser 2009) and two studies with a retrospectively defined MCI group with longitudinal data (Eckerstrom 2010; Hampel 2004).

A majority of studies (n = 9) were published between 2010 and 2013. The remaining six studies were published from 2004 to 2009. All of the included studies were conducted in Europe (five in Sweden, two in Italy, two in Finland, one in the Netherlands, one in Spain, one in Norway, one in Germany and two were European multi‐centre studies). They used one version or another of the Petersen criteria for MCI. Twelve studies applied NINCDS‐ADRDA criteria or NINCDS‐ADRDA and DSM criteria as a reference standard for Alzheimer’s disease dementia. Amlien 2013 used Global Deterioration Scale (GDS) & Research criteria, Fellgiebel 2007 used 'CDR = 1 criteria' and Parnetti 2012 did not specify the reference standard at follow‐up.

Study sizes ranged from 15 (Koivunen 2008) to 231 participants (Vos 2013). Nine papers included participants with a mean age of 70 years or under. The mean (range) age of the youngest sample was 64 years (45 to 76) (Amlien 2013) and the mean (SD) age of the oldest sample was 73.4 (6.6) years (Monge‐Argiles 2011). Sampling procedure and APOE ɛ4 carrier status were poorly reported. Participants were mainly recruited from university memory clinics (n = 8), while one study did not report sources of recruitment (Koivunen 2008).

Different CSF biomarker level values were used as thresholds in the included studies (Additional tables: Table 2). The threshold was prespecified in only five studies (Amlien 2013; Herukka 2007; Kester 2011; Koivunen 2008; Vos 2013). The percentage of converters to Alzheimer’s disease dementia ranged from 22% (Visser 2009) to 56% (Hampel 2004). CSF index test positivity ranged from 23% (Amlien 2013) to 69% (Vos 2013). Duration of follow‐up was reported as mean and standard deviation (SD), or median, or range. Most studies had follow‐up between 12 and 36 months. Some participants were followed up for less than one year in three of the included studies (Fellgiebel 2007; Hampel 2004; Monge‐Argiles 2011), and for more than four years in five of the included studies (Buchhave 2012; Hansson 2006; Herukka 2007; Palmqvist 2012; Parnetti 2012). Participants in the remaining seven studies (Amlien 2013; Eckerstrom 2010; Galluzzi 2010; Kester 2011; Koivunen 2008; Visser 2009; Vos 2013) were followed up from one to three years.

Table 2. Conversion from MCI to Alzheimer's disease dementia

Included studies, index test and test accuracy at study level for conversion from MCI to Alzheimer’s disease dementia
Study	Participants n/N (included in analysis)	Index test (number and % of positive tests)	Threshold (test abnormal) (prespecified Yes/No)	Number of converters (%) FP and FN	Test accuracy at study level		Duration of follow‐up
					Sensitivity	Specificity
Amlien 2013	49/39	CSF t‐tau 9/39 (23%)	≥ 300 ng/L for age younger than 50 years; ≥ 450 ng/L for age 50 to 69 years; ≥ 500 ng/L for age older than 70 years (Sjogren 2001) (Yes)	9/39 (23%); FP = 4; FN = 4	56%	87%	mean 2.6 ± 0.5 years (range 1.6 to 4 years)
Buchhave 2012*	137/134	CSF p‐tau/ABeta ratio 69/134 (51%)	˂ 6.2 ng/L (No)	72/134 (54%) FP = 6; FN = 9	88%	90%	median: 9.2 years (range 4 to 12 years)
Fellgiebel 2007	16/16	CSF p‐tau 12/16 (75%)	≥ 50 pg/mL (No)	4/16 (25%) FP = 8; FN = 0	100%	33%	mean 19.6 ± 9.0 months
Hampel 2004	52/52	CSF t‐tau 38/52 (73%)	≥ 479 ng/L (No)	29/52 (56%); FP = 12; FN = 3	90%	48%	mean 8.4 ± 5.1 months (range 2 to 24 months)
Hansson 2006*	137/134	CSF t‐tau 38/134 (28%)	> 350 ng/L (No)	57/134 (42%); FP = 9; FN = 28	51%	88%	Total sample: median 5.2 years (range 4.0 to 6.8 years); MCI‐AD: median 4.3 years (range 1.1 to 6.7 years); MCI‐other dementias: median 4.2 years (range 1.5 to 6.3 years)
		CSF p‐tau 50/134 (37%)	≥ 60 ng/L (No)	57/134 (42%); FP = 11; FN = 18	68%	86%
		CSF p‐tau/ABeta ratio 74/134 (55%)	˂ 6.5 ng/L (No)	57/134 (42%); FP = 19; FN = 2	96%	75%
Kester 2011	153/100	CSF t‐tau 64/100 (64%)	> 356 pg/mL (Yes)	42/100 (42%) FP = 29; FN = 7	83%	50%	median 18 months (IQR 13 ‐ 24)
Koivunen 2008	15/14	CSF p‐tau 9/14 (64%)	≥ 70 pg/mL (Yes)	5/14 (36%) FP = 7; FN = 3	40%	22%	2 years
Koivunen 2008	15/14	CSF p‐tau/ABeta ratio 9/14 (64%)	˂ 6.5 pg/mL (Yes)	5/14 (36%) FP = 6; FN = 1	80%	33%	2 years
Monge‐Argiles 2011	37/37	CSF t‐tau 16/37 (43%)	≥ 77.5 pg/mL (No)	11/37 (28%) FP = 8; FN = 3	73%	69%	6 months
		CSF p‐tau 20/37 (54%)	≥ 54.5 pg/mL (No)	11/37 (28%) FP = 11; FN = 2	82%	58%
		CSF p‐tau/ABeta ratio 18/37 (49%)	0.17 (No)	11/37 (28%) FP = 9; FN = 2	82%	66%
		CSF t‐tau/ABeta ratio 23/37 (62%)	0.18 (No)	11/37 (28%) FP = 13; FN = 1	91%	50%
Palmqvist 2012	133/133	CSF t‐tau 65/133 (49%)	> 87 pg/mL (No)	52/133 (39%) FP = 23; FN = 10	81%	72%	mean 5.9 years (range 3.2 to 8.8 years)
Palmqvist 2012	133/133	CSF p‐tau 46/133 (34%)	> 39 pg/mL (No)	52/133 (39%) FP = 11; FN = 17	67%	86%	mean 5.9 years (range 3.2 to 8.8 years)
Parnetti 2012	90/90	CSF p‐tau/ABeta ratio 29/90 (32%)	1074.0 (No)	32/90 (35%) FP = 3; FN = 6	81%	95%	maximum: 4 years; mean 3.40 ± 1.01 years
Visser 2009	168/158	CSF p‐tau 108/158 (68%)	≥ 51 pg/mL (used in clinical practice) (No)	35/158 (22%) FP = 77; FN = 4	88%	37%	range 1 to 3 years for MCI
		CSF p‐tau 45/158 (28%)	≥ 85 pg/mL (> 90th percentile of controls after correction for age) (No)	35/158 (22%) FP = 25; FN = 15	57%	80%
		CSF p‐tau/ABeta ratio 77/158 (49%)	˂ 9.92 (< 10th percentile of reference group after correction for age) (No)	35/158 (22%); FP = 49; FN = 7	80%	60%
Vos 2013	231/214	CSF t‐tau 93/214 (43%)	> 450 pg/mL for age less than 70 years; > 500 pg/mL for age older than 70 years (Yes)	91/214 (42%) FP = 28; FN = 26	71%	77%	mean 2.5 ± 1.0 years
Vos 2013	231/214	CSF t‐tau/ABeta ratio 147/214 (69%)	ABeta1–42/(240 + [1.18 × t‐tau]) ˂ 1.0 (Yes)	91/214 (42%) FP = 60; FN = 4	96%	51%	mean 2.5 ± 1.0 years

^{AD: Alzheimer's disease; FN: false negative; FP: false positive; MCI: mild cognitive impairment}

^{*Studies involved the same participants. Only Hansson 2006 is included in the meta‐analysis}

Excluded studies

Twenty‐nine studies, nine of which were ADNI studies, were excluded because they failed to meet the inclusion criteria for participants, index test, or target condition, or because they did not report diagnostic accuracy data (Characteristics of excluded studies). We contacted the authors of two of the ADNI studies (Landau 2010; Westman 2012) in order to obtain additional data for creating 2 x 2 tables. Further information was not available for the Landau 2010 study at the time this review was prepared. The author of the Westman 2012 study informed us that the accuracy of combined, not individual, CSF biomarkers was assessed in their study.

Studies awaiting classification

The Characteristics of studies awaiting classification table lists the characteristics of four studies which might be considered for inclusion in an updated review. The authors of all those studies need to be contacted in order to obtain missing data/relevant information. Regarding the target condition ‘Conversion from MCI to Alzheimer’s disease’, provisional data from two studies (Ewers 2012; Leuzy 2015) might be used for the analysis of CSF t‐tau; data for the analysis of the CSF p‐tau and ABeta42/p‐tau ratio index tests might be available only from Ewers 2012 and Balasa 2014, respectively.

Additional Tables: Table 1 shows that the percentage of converters to Alzheimer’s disease dementia ranged from 36% to 47%. Duration of follow‐up was between 24 and 41 months. Leuzy 2015 did not report duration of follow‐up and Ewers 2012 did not report a threshold value. The heterogeneity of results in these four studies was consistent with that observed in the fully incorporated studies.

Methodological quality of included studies

Methodological quality was assessed using the QUADAS‐2 tool (Whiting 2011).

Review authors’ judgements about each methodological quality item for each included study are presented in the Characteristics of included studies table and Figure 2. The overall methodological quality of included study cohorts is summarised in Figure 3.

Figure 2

Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

Figure 3

Risk of bias and applicability concerns graph: review authors' judgements about each domain presented as percentages across included studies

In the participant selection domain, we considered five studies (Eckerstrom 2010; Hampel 2004; Herukka 2007; Kester 2011; Koivunen 2008) to be at high risk of bias because the participants were not consecutively or randomly enrolled or both the sampling procedure and exclusion criteria were not described. We stated that all included studies avoided a case‐control design because we only considered data on performance of the index test to discriminate between people with MCI who converted to dementia and those who remained stable. We considered four studies (Amlien 2013; Buchhave 2012; Galluzzi 2010; Hansson 2006) to be at low risk of bias. We considered the remaining six studies to be at unclear risk of bias, due to poor reporting on sampling procedure or exclusion criteria.

In the index test domain, we considered eight studies (Buchhave 2012; Eckerstrom 2010; Fellgiebel 2007; Hampel 2004; Hansson 2006; Monge‐Argiles 2011; Palmqvist 2012; Parnetti 2012) to be at high risk of bias because the threshold used was not prespecified and the optimal cutoff level was determined from ROC analyses; therefore, the accuracy of the CSF biomarkers reported in these studies appeared to be overestimated. We considered two studies (Amlien 2013; Galluzzi 2010) to be at unclear risk of bias, due to poor reporting. We considered the remaining five studies to be at low risk of bias.

In the reference standard domain, we considered nine studies (Amlien 2013; Eckerstrom 2010; Fellgiebel 2007; Galluzzi 2010; Hampel 2004; Kester 2011; Koivunen 2008; Monge‐Argiles 2011; Parnetti 2012) to be at unclear risk of bias, mainly because it was not reported whether clinicians conducting follow‐up were aware of initial CSF biomarker analysis results. Three of those nine studies did not clearly report the reference standards used for diagnosing Alzheimer’s disease dementia. We were not able to obtain the information about how the reference standard was obtained and by whom, due to poor reporting. We considered the remaining six studies to be at low risk of bias.

In the flow and timing domain, we judged nine studies (Amlien 2013; Eckerstrom 2010; Fellgiebel 2007; Galluzzi 2010; Koivunen 2008; Monge‐Argiles 2011; Parnetti 2012; Visser 2009; Vos 2013) to be at unclear risk of bias because not all participants were included in the analysis and/or the follow‐up period was shorter than one year and/or reporting was poor. We judged three studies (Galluzzi 2010; Hansson 2006; Kester 2011) to be at high risk of bias because a large number of participants with non‐Alzheimer's disease dementia were excluded from the analysis. We considered the remaining three studies to be at low risk of bias.

For assessment of applicability concerns, for the majority of the studies there was no concern that the included participants and setting, the conduct and interpretation of the index test, and the target condition (as defined by the reference standard) in each of the included studies did not match the review question. We judged two studies (Eckerstrom 2010; Koivunen 2008) to be of unclear applicability because of concerns regarding the participant characteristics or setting. We also judged four studies (Amlien 2013; Eckerstrom 2010; Fellgiebel 2007; Parnetti 2012) to be of unclear applicability because of concerns with respect to the reference standard.

It should be noted that the lack of concern about applicability of the three domains mentioned above was based on the inclusion criteria set in the review, and therefore the judgment about applicability may be overstated.

Findings

The key characteristics of each study are summarised in Additional Tables: Table 2 and Table 3. Included studies used a range of different thresholds. The number of positive CSF index tests at baseline varied across studies. The summary of main results for the fifteen included studies is presented in summary of findings Table.

Table 3. Conversion from MCI to all dementias

Included studies, index test and test accuracy at study level for conversion from MCI to All dementias
Study	Participants n/N (included in analysis)	Index test (Number and % of positive tests)	Threshold (test abnormal) (pre‐specified Yes / No)	Number of converters (%) FP and FN	Test accuracy at study level		Duration of follow‐up
					Sensitivity	Specificity
Eckerstrom 2010	42/42	CSF t‐tau 15/42 (36%)	≥ 500 ng/L (No)	21/42 (50%) FP = 1 FN = 7	67%	95%	Total sample: 19.6 ± 9.0 months; MCI‐MCI: 19.5 ± 9.3 months; MCI‐progressive: 17.6 ± 8.8 months (4/8 MCI‐AD: 23.7 ± 2.0 months)
Galluzzi 2010	90/64	CSF t‐tau 24/64 (37.5%)	> 450 pg/mL for subjects with an age range between 51 and 70 determined; > 500 pg/mL for subjects with an age range between 71 and 93 (Yes)	34/64 (53%) FP = 5 FN = 15	56%	83%	Total sample: 8.4 ± 5.1 months (range 2 to 24 months); follow‐up interval for converters was 9.6 ± 5.4, and for non‐converters 7.0 ± 4.3 months
Hansson 2006	137/134	CSF t‐tau 38/134 (28%)	> 350 pg/mL (No)	78/134 (58%) FP = 5 FN = 45	42%	91%	Total sample: median 5.2 years (range 4.0 to 6.8); MCI‐AD: median: 4.3 years (range 1.1 to 6.7); MCI‐other dementias: median 4.2 (1.5 to 6.3)
Herukka 2007	79/79	CSF t‐tau 43/79 (54%)	> 400 pg/mL (Yes)	33/79 (42%) FP = 17 FN = 7	79%	63%	Mean 3.52 ± 1.95 years in MCI converters; mean 4.56 ± 3.09 years in MCI‐stable

^{AD: Alzheimer's disease; FN: false negative; FP: false positive; MCI: mild cognitive impairment}

CSF t‐tau for Alzheimer's disease dementia

Individual study estimates of sensitivity and specificity are shown in Figure 4 for each of the seven studies (291 cases and 418 non‐cases) that evaluated Alzheimer’s disease dementia. The sensitivity values ranged from 51% to 90% while the specificity values ranged from 48% to 88%. The thresholds used ranged from ≥ 77 to ≥ 500 pg/mL (ng/L).

Figure 4

Forest plot of 1 CSF t‐tau conversion to AD dementia.

The summary ROC curve summarising the accuracy of CSF t‐tau across the seven studies is shown in Figure 5. Because of the variation in thresholds, we did not estimate a summary sensitivity and specificity. However, we derived estimates of sensitivity and likelihood ratios at fixed values of specificity from the HSROC model we fitted to produce the summary ROC curve. At the median specificity of 72%, the estimated sensitivity was 77% (95% CI 67 to 85), the positive likelihood ratio was 2.72 (95% CI 2.43 to 3.04), and the negative likelihood ratio was 0.32 (95% CI 0.22 to 0.47).
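
As a rough check on how the likelihood ratios follow from a sensitivity and specificity pair, the standard definitions can be applied to the rounded figures above. This is a minimal sketch with a hypothetical function name; it does not reproduce the HSROC model or the delta‐method confidence intervals.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded CSF t-tau estimates: sensitivity 77% at a specificity of 72%.
lr_pos, lr_neg = likelihood_ratios(0.77, 0.72)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The rounded inputs give a positive likelihood ratio of about 2.75 and a negative likelihood ratio of about 0.32. The small difference from the reported 2.72 is presumably due to the model working with unrounded estimates, which is an assumption rather than something stated in the review.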

Figure 5

Summary ROC Plot of 1 CSF t‐tau conversion to AD dementia.

At the median specificity (72%) and the median prevalence of Alzheimer's disease dementia (37%) (pretest probability, Figure 6), the positive predictive value was 62%, which means on average 62 out of 100 people with MCI and a positive index test result would convert to Alzheimer's disease dementia, but 38 would not. The negative predictive value of 84% means that on average 84 out of 100 people with MCI and with a negative index test result would not convert to Alzheimer's disease dementia, but 16 would.

Figure 6

Post‐test probability plots (Analysis 1): Conversion from MCI to Alzheimer’s disease for CSF t‐tau as a diagnostic test

In a hypothetical cohort of 100 people with MCI taking the CSF t‐tau test, there would be on average nine false negatives (participants who convert but incorrectly tested negative) and 18 false positives (participants who did not convert but incorrectly tested positive) (summary of findings Table).
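
The hypothetical cohort figures can be reproduced from the estimates above with simple arithmetic. The sketch below is illustrative only; the function name is an assumption, and the inputs are the rounded sensitivity (77%), specificity (72%) and median prevalence (37%) reported for CSF t‐tau.

```python
def expected_errors(sensitivity: float, specificity: float,
                    prevalence: float, cohort_size: int = 100) -> tuple[float, float]:
    """Expected false negatives and false positives in a cohort."""
    converters = prevalence * cohort_size
    non_converters = cohort_size - converters
    false_negatives = converters * (1.0 - sensitivity)
    false_positives = non_converters * (1.0 - specificity)
    return false_negatives, false_positives


# CSF t-tau: sensitivity 77%, specificity 72%, prevalence of conversion 37%.
fn, fp = expected_errors(0.77, 0.72, 0.37)
print(f"false negatives: {fn:.1f}, false positives: {fp:.1f}")
```

Rounded to whole people, this gives nine false negatives and 18 false positives per 100 people tested.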

CSF p‐tau for Alzheimer's disease dementia

Six studies (164 cases and 328 non‐cases) evaluated the accuracy of CSF p‐tau for conversion to Alzheimer’s disease dementia (Figure 7). The sensitivities were between 40% and 100%, while the specificities were between 22% and 86%. The thresholds used ranged from ≥ 39 to ≥ 85 pg/mL (ng/L).

Figure 7

Forest plot of 2 CSF p‐tau conversion to AD dementia.

Figure 8 shows the summary ROC space. We derived the summary estimates at different points on the fitted HSROC curve. At the median specificity of 48%, the estimated sensitivity was 81% (95% CI 64 to 91), the positive likelihood ratio was 1.55 (95% CI 1.31 to 1.84), and the negative likelihood ratio was 0.39 (95% CI 0.19 to 0.82).

Figure 8

Summary ROC Plot of 2 CSF p‐tau conversion to AD dementia.

At the median specificity (48%) and the median prevalence of Alzheimer's disease dementia (37%) (pretest probability, Figure 9), the positive predictive value was 48%, which means on average 48 out of 100 people with MCI and with a positive index test result would convert to Alzheimer's disease dementia but 52 would not. The negative predictive value of 81% means that on average 81 out of 100 people with MCI with a negative index test result would not convert to Alzheimer's disease dementia, but 19 would.

Figure 9

Post‐test probability plots (Analysis 2): Conversion from MCI to Alzheimer’s disease for CSF p‐tau as a diagnostic test

In a hypothetical cohort of 100 people with MCI taking the CSF p‐tau test, there would be on average seven false negatives (participants who convert but incorrectly tested negative) and 33 false positives (participants who did not convert but incorrectly tested positive) (summary of findings Table).

CSF p‐tau/ABeta ratio for Alzheimer's disease dementia

Five studies (140 cases and 293 non‐cases) evaluated the accuracy of the CSF p‐tau/ABeta ratio for conversion to Alzheimer’s disease dementia (Figure 10). The sensitivities were between 80% and 96%, while the specificities were between 33% and 95%. We were not able to report the range of thresholds because the ratio was measured on different scales: < 6.5 pg/mL (ng/L); 0.17; 1074.0; < 9.92. Figure 11 shows the summary ROC space. We did not conduct a meta‐analysis because the studies were few and small.

Figure 10

Forest plot of 3 CSF p‐tau/ABeta ratio to AD dementia.

Figure 11

Summary ROC Plot of 3 CSF p‐tau/ABeta ratio to AD dementia.

CSF t‐tau/ABeta ratio for Alzheimer's disease dementia

Only two studies (Monge‐Argiles 2011; Vos 2013) evaluated the accuracy of the CSF t‐tau/ABeta ratio for conversion to Alzheimer’s disease dementia. The sensitivities were 91% and 96%, and the specificities were 50% and 51%, respectively. We were not able to conduct the meta‐analysis.

CSF t‐tau for all forms of dementia (combined Alzheimer's disease dementia and non‐Alzheimer's disease dementia)

Only four studies (166 cases and 153 non‐cases) evaluated the accuracy of CSF t‐tau for conversion to all forms of dementia (Figure 12 and Figure 13). The sensitivity values ranged from 42% to 79%, while the specificity values ranged from 63% to 95%. The thresholds used ranged from ˃ 350 to ≥ 500 pg/mL (ng/L). As above, we did not conduct a meta‐analysis because the studies were few and small.

Figure 12

Forest plot of 4 CSF t‐tau conversion to all forms of dementia.

Figure 13

Summary ROC Plot of 4 CSF t‐tau conversion to All dementias.

Investigation of heterogeneity

We were not able to formally assess the effects of each potential source of heterogeneity as planned, due to the small number of studies available to be included.

Sensitivity analyses

Due to the limited number of studies evaluating each of the four CSF biomarkers for Alzheimer’s disease dementia or other types of dementia, we did not perform the planned sensitivity analyses.

Discussion

We performed a review of the available evidence on the diagnostic accuracy of CSF biomarker levels for detecting Alzheimer's disease pathology in people with MCI, and identifying those MCI participants who would convert to Alzheimer’s disease dementia or other forms of dementia over time. In the absence of a contemporaneous reference standard for Alzheimer's disease diagnosis relative to the application of the index test, the decision to use a delayed verification design was taken for all DTA reviews by our group. This, however, creates problems when the length of follow‐up in studies varies, as a longer study, in a chronic disorder where age is the principal risk factor, could create false positive findings. To address this, length of follow‐up was collected to help interpret between‐study variations in accuracy.

There is, however, a paucity of evidence in relation to the accuracy of CSF biomarkers. Where data were available for conversion to Alzheimer’s disease dementia, there was a wide range of sensitivity (51% to 90%; 40% to 100%; 80% to 96%) and specificity (48% to 88%; 22% to 86%; 33% to 95%) values for the CSF t‐tau, CSF p‐tau and CSF p‐tau/ABeta ratio index tests, respectively.

Due to the wide variations in thresholds, we did not estimate a summary sensitivity and specificity. However, to illustrate the potential strengths and weaknesses of CSF biomarker levels, and subject to the considerable uncertainty of this statistical approach, we estimated from the fitted summary ROC curves that the sensitivity was 77% (95% CI 67 to 85) for CSF t‐tau and 81% (95% CI 64 to 91) for CSF p‐tau, at the included‐study median specificities of 72% and 48%, respectively. Assuming a conversion rate of MCI to Alzheimer’s dementia of 37%, for every 100 people tested with CSF t‐tau, nine with a negative test would progress and 18 with a positive test would not progress to Alzheimer’s dementia; for every 100 people tested with CSF p‐tau, seven with a negative test would progress and 33 with a positive test would not progress to Alzheimer’s dementia. The estimates of predictive values and of consequences in a cohort of 100 (‘missed cases’ and ‘over‐diagnosed’) were based on hypothetical sensitivity and specificity values for which the threshold of the test is unknown; therefore, these findings should be interpreted with caution.

We were not able to evaluate the accuracy of CSF biomarkers for conversion from MCI to other forms of dementia (non‐Alzheimer’s disease dementia). As a result of the information available from four studies (Eckerstrom 2010; Galluzzi 2010; Hansson 2006; Herukka 2007), we evaluated the accuracy of CSF t‐tau for conversion to all types of dementia (combined Alzheimer's disease dementia and non‐Alzheimer’s disease dementia). The sensitivity values ranged from 42% to 79% while the specificity values ranged from 63% to 95%. We did not conduct a meta‐analysis because the studies were few and small.

Previous reviews of tests of amyloid in CSF and plasma (Ritchie 2014) and evidenced through PET imaging (Zhang 2014) have been published. They highlighted that as a test, there was consistently better sensitivity than specificity whereby the absence of evidence of amyloid pathology (low levels in CSF and high levels in the cortices) was likely to exclude a later diagnosis of Alzheimer's disease dementia, whereas the presence of amyloid pathology did not add much incremental benefit to diagnostic accuracy. Considering the findings of this systematic review, we have demonstrated again that the NPV is greater than the PPV which is a reflection of the higher sensitivity of these tests compared to their specificity. That is, a test indicating absence of biomarker abnormality and hence suggesting absence of disease is of more value than a positive biomarker indicating disease. CSF biomarkers are better at ruling out Alzheimer’s disease than ruling it in as a cause of the clinical symptoms, and therein progression to Alzheimer’s dementia in people described as having MCI. However, the reported optimal thresholds in individual papers tended to yield better sensitivities than specificities and this was reflected in our sROC analysis; therefore, those results should be interpreted with caution.

Given the insufficient evidence to evaluate the diagnostic value in MCI of CSF t‐tau, CSF p‐tau and the CSF p‐tau/ABeta ratio for Alzheimer's disease dementia and other dementias examined in this review, particular attention should be paid to the risk of misdiagnosis and overdiagnosis of dementia (and therefore overtreatment) in clinical practice. Our findings are consistent with the expert opinion conveyed by Molinuevo and colleagues (Molinuevo 2014), who recognised that negative test results were more clinically useful than positive ones. They still saw a routine use for these tests in clinical practice, and our review will help describe the degree of accuracy to inform clinicians using these tests in their current practice. As the sensitivity of these tests was better than their specificity, the risk of a missed diagnosis (a false‐negative test) was lower. False reassurance given to a patient that they do not have or will not get Alzheimer's dementia would also have serious clinical consequences; however, appropriate pretest counselling about what can and cannot be revealed through CSF testing would mitigate the risk of an inappropriate level of salience being afforded to this particular test.

Summary of main results

In total, 1282 participants with MCI at baseline were identified in the fifteen included studies, of whom 1172 had analysable data; 430 participants converted to Alzheimer’s disease dementia and 130 participants to other forms of dementia at follow‐up. It was possible to undertake a summary analysis of the CSF t‐tau and p‐tau markers but not the ratio, as too few studies presented results for the ratio. Consistent with the findings from the amyloid reviews, CSF t‐tau and p‐tau were reasonably sensitive tests for later diagnosis of Alzheimer's disease dementia, but had poor specificity. This is illustrated in Figure 6 and Figure 9, where the small positive likelihood ratio for both CSF t‐tau and p‐tau has very little impact on the change from pretest probability to post‐test probability. With respect to the CSF t‐tau/ABeta ratio, it was not possible to generate likelihood ratios because only one study (Monge‐Argiles 2011) reported data. However, from Figure 10, it can be seen that for all but one study (Parnetti 2012), the sensitivity exceeded the specificity for the p‐tau/ABeta ratio. Figure 12, however, demonstrates across four studies that the specificity of CSF t‐tau is improved when the outcome is 'all forms of dementia', suggesting that the elevation of tau is a nonspecific marker of neurodegeneration and not tightly tethered to Alzheimer's disease pathology.
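
To make the link between likelihood ratios and post‐test probability concrete, the standard odds calculation can be sketched as below. This is an illustrative sketch only: it assumes the 37% pretest probability and the sensitivity and specificity values taken from the summary ROC curve, the function names are hypothetical, and the results are not the values plotted in Figure 6 or Figure 9.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given accuracy."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pretest, lr):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


PRETEST = 0.37  # assumed conversion rate from MCI to Alzheimer's dementia

# (sensitivity, specificity) pairs assumed from the summary ROC estimates
tests = {"CSF t-tau": (0.77, 0.72), "CSF p-tau": (0.81, 0.48)}
for name, (sens, spec) in tests.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    after_pos = post_test_probability(PRETEST, lr_pos)
    after_neg = post_test_probability(PRETEST, lr_neg)
    print(f"{name}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, "
          f"post-test {after_pos:.0%} if positive, {after_neg:.0%} if negative")
```

Under these assumed inputs, the positive likelihood ratios are roughly 2.8 for t‐tau and 1.6 for p‐tau, which would move a 37% pretest probability to about 62% and 48% respectively after a positive test; a negative test would lower it to about 16% and 19%.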

Our findings were based on studies with poor reporting and most included studies had an unclear risk of bias, mainly for reference standard and participant selection domains. Nine studies (56%) had unclear risk of bias for the flow and timing domain, mainly due to not including all participants in the analysis or inappropriate duration of the follow‐up period. According to the assessment of the index test domain, 50% of studies were of poor methodological quality.

The main sources of heterogeneity were thought likely to be index test thresholds, reference standards used for the target disorders, sources of recruitment, participant sampling and aspects of study quality (particularly inadequate blinding). We were not able to formally assess the effects of each potential source of heterogeneity, as planned, due to the small number of studies available to be included.

Strengths and weaknesses of the review

There were a number of strengths to this review. This review was conducted in adherence to the inclusion criteria and methods described in a published protocol (Ritchie 2011). We searched a number of electronic databases, using an extensive range of appropriate database indexing terms and equivalent text words covering the index test, how it was measured, and the target condition. The multi‐stranded search approach that we adopted to combine different search concepts in searches run in parallel, some including a more specific diagnostic component, has successfully increased the overall sensitivity of the search and is a strength of this review. Our searches were not limited by language. We contacted 12 study authors and usable data were obtained for five studies (Amlien 2013; Galluzzi 2010; Hansson 2006; Visser 2009; Vos 2013).

There were, however, also a number of limitations to this review. There was limited published information and substantial variation in the quality of the papers, and caution is needed when interpreting these findings. Most included studies provided little data on participants at baseline. Several studies reported high or unclear dropout and withdrawal rates. Studies also contained wide variations in thresholds. It is also a weakness of the review that variability in length of follow‐up in the various cohorts was so great. It would stand to reason that a longer follow‐up period would be more likely to yield more cases of dementia, given that age is the principal risk factor for dementia. On the other hand, short follow‐up periods might increase false negative results. This topic is of great interest to the field, where determination of proximal and distal biomarkers is being considered. In an MCI population presenting to a clinician, it is the question of proximity to a decline to dementia which is the most relevant; in this regard, follow‐up periods of over five years lose clinical meaningfulness. Standardisation of the follow‐up period would help reviews like this; this has been suggested in our group's recent STARDdem proposals (Noel‐Storr 2014). In our review, we were unable to formally test what effect length of follow‐up had on the accuracy of the test. The various contributors to the heterogeneity across the studies may affect the study results. Given the poor reporting within the included studies, it is difficult to determine the underlying difference or differences among the included studies. This highlights a shortfall of large‐scale, high‐quality empirical research conducted in this area. Future studies should provide clearer reporting of the participants, equipment, usage and the implications of implementing the tests.
As the current research area is rapidly changing, further research exploring the impact of the CSF p‐tau/ABeta ratio on clinical outcomes is needed. To this end, we conducted a very recent literature review which revealed four new studies that will be fully incorporated in our next planned update. These four studies demonstrated the same between‐study heterogeneity in results and methodology that we had observed in the included studies, with the implication that incorporating them will not change our existing conclusions.

Applicability of findings to the review question

These findings can be considered a reasonable answer to the question being set in this review. Caution, though, should still apply because of the quality and reporting issues highlighted from the included papers and the small data set. This is especially true when drawing conclusions from the analysis of the p‐tau/ABeta ratio. This is particularly important as it is this ratio that is often favoured in clinical practice as being most accurate. However, this review and the previous published reviews of amyloid tests and Alzheimer's disease pathology consistently demonstrate reasonable sensitivity and poor specificity; accordingly, it is likely that the ratio of two sensitive tests will generate greater sensitivity than specificity.

Figure 1

Study flow diagram

Note: a top‐up search performed in December 2015 revealed 6134 records

85 records retained after de‐duplication and assessment by one experienced reviewer

81 records excluded after further assessment performed by two review authors

4 studies identified for possible inclusion (Characteristics of studies awaiting classification)

Figure 2

Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

Figure 3

Risk of bias and applicability concerns graph: review authors' judgements about each domain presented as percentages across included studies

Figure 4

Forest plot of 1 CSF t‐tau conversion to AD dementia.

Figure 5

Summary ROC Plot of 1 CSF t‐tau conversion to AD dementia.

Figure 6

Post‐test probability plots (Analysis 1): Conversion from MCI to Alzheimer’s disease for CSF t‐tau as a diagnostic test

Figure 7

Forest plot of 2 CSF p‐tau conversion to AD dementia.

Figure 8

Summary ROC Plot of 2 CSF p‐tau conversion to AD dementia.

Figure 9

Post‐test probability plots (Analysis 2): Conversion from MCI to Alzheimer’s disease for CSF p‐tau as a diagnostic test

Figure 10

Forest plot of 3 CSF p‐tau/ABeta ratio to AD dementia.

Figure 11

Summary ROC Plot of 3 CSF p‐tau/ABeta ratio to AD dementia.

Figure 12

Forest plot of 4 CSF t‐tau conversion to all forms of dementia.

Figure 13

Summary ROC Plot of 4 CSF t‐tau conversion to All dementias.

Test 1

CSF t‐tau conversion to AD dementia.

Test 2

CSF p‐tau conversion to AD dementia.

Test 3

CSF p‐tau/ABeta ratio to AD dementia.

Test 4

CSF t‐tau conversion to All dementias.

Summary of findings Performance of CSF biomarkers in early diagnosis of dementia

What is the diagnostic accuracy of CSF biomarker levels for detecting Alzheimer's disease pathology in people with mild cognitive impairment (MCI), and identifying those MCI participants who would convert to Alzheimer’s disease dementia or other forms of dementia over time
Descriptive
Patient population	Participants diagnosed with MCI at baseline using any of the Petersen criteria or CDR = 0.5 or any 16 definitions included by Matthews (Matthews 2008)
Sampling procedure	Consecutive or random (n = 5) Not consecutive or random (n = 3) Unclear (n = 7)
Sources of recruitment	University memory clinic (n = 8); European multicentre memory clinics (n = 2); inpatients (n = 2); General Hospital memory clinic (n = 1); Research centre outpatient memory clinic (n = 1); not reported (n = 1)
Prior testing	The only testing prior to performing the plasma and CSF biomarkers was the application of diagnostic criteria for identifying participants with MCI.
MCI criteria	Petersen criteria (n = 14) Global Deterioration Scale (GDS) (n = 1)
Index tests	CSF t‐tau or CSF p‐tau or CSF p‐tau/ABeta ratio or CSF t‐tau/ABeta ratio
Reference standard	NINCDS‐ADRDA and/or DSM and/or ICD criteria for Alzheimer's disease dementia (n = 12); Global Deterioration Scale (GDS) and research criteria (n = 1); CDR = 1 criteria (n = 1); not specified (n = 1); McKeith criteria for Lewy body dementia; Lund criteria for frontotemporal dementia; and NINDS‐AIREN criteria for vascular dementia
Target condition	Alzheimer’s disease dementia or any other types of dementia
Included studies	Prospectively well‐defined cohorts of MCI participants (n = 7), nested case‐control studies with a prospectively defined MCI group (n = 6) and studies with a retrospectively defined MCI group with longitudinal data (n = 2). Fifteen studies (N = 1282 participants) were included. Number included in analysis: 1172
Quality concerns	Patient selection and conduct of the reference standard were poorly reported. Applicability concerns were generally low. Regarding the inclusion criteria set in the review, the majority of included studies did match the review question: 'Could CSF t‐tau and CSF t‐tau/ABeta ratio biomarkers identify those MCI participants with Alzheimer’s disease pathology at baseline who would convert clinically to dementia at follow‐up?' However, due to a limited number of included studies and levels of heterogeneity, it is difficult to determine to what extent the findings from a meta‐analysis can be applied to clinical practice.
Limitations	Limited investigation of heterogeneity due to insufficient number of studies. There was a lack of common thresholds.
Test	Studies	Cases/participants	Median specificity from included studies	Sensitivity (95% CI)¹ at median specificity	Consequences in a cohort of 100
					Median percentage converting²	Missed cases	Overdiagnosed
Alzheimer's disease dementia
CSF t‐tau	7	436/709	72	77 (67, 85)	37	9	18
Alzheimer's disease dementia
CSF p‐tau	6	164/492	47.5	81 (64, 91.5)	37	7	33
Alzheimer's disease dementia
CSF p‐tau/ABeta ratio	5	140/433	No meta‐analysis	No meta‐analysis
All types of dementia
CSF t‐tau	4	166/319	No meta‐analysis	No meta‐analysis
Investigation of heterogeneity: the planned investigations were not possible due to the limited number of studies available for each analysis. We were unable to investigate the effect of duration of follow‐up due to substantial variation in length and reporting.
Conclusions: Given the insufficient evidence to evaluate the diagnostic value in MCI of CSF t‐tau, CSF p‐tau, CSF t‐tau/ABeta ratio and CSF p‐tau/ABeta ratio for Alzheimer's disease dementia and other forms of dementia examined in this review, particular attention should be paid to the risk of misdiagnosis and overdiagnosis of dementia (and therefore overtreatment) in clinical practice. Future studies with more uniform approaches to thresholds, analysis and study conduct may provide a more homogeneous estimate than the one that has been available from the included studies we have identified.
¹Meta‐analytic estimate of sensitivity derived from the HSROC model at a fixed value of specificity. Summary estimates of sensitivity and specificity were not computed because the studies that contributed to the estimation of the summary ROC curve used different thresholds. ²The median percentage converting was calculated using all the studies that reported 'conversion from MCI to Alzheimer's disease dementia' (Table 2)

Table 1. Studies awaiting classification

Conversion from MCI to Alzheimer’s disease dementia
Study	Participants n/N (included in analysis)	Index test (number and % of positive tests)	Threshold (test abnormal) (prespecified Yes/No)	Number of converters (%) FP and FN	Sensitivity	Specificity	Duration of follow‐up
*Balasa 2014	51/51	CSF ABeta42/p‐tau ratio 25/51 (49%)	< 6.43 (Yes)	24/51 (47%) FP = 1; FN = 0	100%	96%	41 months for MCI‐AD; 30 months for MCI‐MCI
*Ewers 2012	130/130	CSF t‐tau 65/130 (50%)	Not reported	58/130 (45%) FP = 30; FN = 23	60.7%	58.9%	24 months
		CSF p‐tau 67/130 (51.5%)	Not reported	58/130 (45%) FP = 30; FN = 21	63.9%	58.9%
*Leuzy 2015	33/33	CSF t‐tau 15/33 (45%)	> 400 pg/mL (Yes)	12/33 (36%) FP = 7; FN = 4	67%	67%	Not reported
		CSF t‐tau/ABeta ratio 12/33 (36%)	< 1.14 (Yes)	12/33 (36%) FP = 6; FN = 6	50%	71%
Conversion from MCI to all dementias
*Eckerstrom 2015	73/73	CSF p‐tau 15/73 (20.5%)	73 pg/mL (No)	27/73 (36.9%) FP = 3; FN = 15	75%	92%	43.1 ± 23 months MCI‐stable; 33.7 ± 24 months MCI converters
Study awaiting translation
Urakami 2004
^{AD: Alzheimer's disease; FN: false negative; FP: false positive; MCI: mild cognitive impairment} ^{*Authors need to be contacted in order to obtain missing data/relevant information. Data presented are provisional.}

Table 2. Conversion from MCI to Alzheimer's disease dementia

Included studies, index test and test accuracy at study level for conversion from MCI to Alzheimer’s disease dementia
Study	Participants n/N (included in analysis)	Index test (number and % of positive tests)	Threshold (test abnormal) (prespecified Yes/No)	Number of converters (%) FP and FN	Sensitivity	Specificity	Duration of follow‐up
Amlien 2013	49/39	CSF t‐tau 9/39 (23%)	≥ 300 ng/L for age younger than 50 years; ≥ 450 ng/L for age 50 to 69 years; ≥ 500 ng/L for age older than 70 years (Sjogren 2001) (Yes)	9/39 (23%); FP = 4; FN = 4	56%	87%	mean 2.6 ± 0.5 years (range 1.6 to 4 years)
Buchhave 2012*	137/134	CSF p‐tau/ABeta ratio 69/134 (51%)	< 6.2 ng/L (No)	72/134 (54%) FP = 6; FN = 9	88%	90%	median: 9.2 years (range 4 to 12 years)
Fellgiebel 2007	16/16	CSF p‐tau 12/16 (75%)	≥ 50 pg/mL (No)	4/16 (25%) FP = 8; FN = 0	100%	33%	mean 19.6 ± 9.0 months
Hampel 2004	52/52	CSF t‐tau 38/52 (73%)	≥ 479 ng/L (No)	29/52 (56%); FP = 12; FN = 3	90%	48%	mean 8.4 ± 5.1 months (range 2 to 24 months)
Hansson 2006*	137/134	CSF t‐tau 38/134 (28%)	> 350 ng/L (No)	57/134 (42%); FP = 9; FN = 28	51%	88%	Total sample: median 5.2 years (range 4.0 to 6.8 years); MCI‐AD: median 4.3 years (range 1.1 to 6.7 years); MCI‐other dementias: median 4.2 years (range 1.5 to 6.3 years)
		CSF p‐tau 50/134 (37%)	≥ 60 ng/L (No)	57/134 (42%); FP = 11; FN = 18	68%	86%
		CSF p‐tau/ABeta ratio 74/134 (55%)	< 6.5 ng/L (No)	57/134 (42%); FP = 19; FN = 2	96%	75%
Kester 2011	153/100	CSF t‐tau 64/100 (64%)	> 356 pg/mL (Yes)	42/100 (42%) FP = 29; FN = 7	83%	50%	median 18 months (IQR 13 ‐ 24)
Koivunen 2008	15/14	CSF p‐tau 9/14 (64%)	≥ 70 pg/mL (Yes)	5/14 (36%) FP = 7; FN = 3	40%	22%	2 years
Koivunen 2008	15/14	CSF p‐tau/ABeta ratio 9/14 (64%)	< 6.5 pg/mL (Yes)	5/14 (36%) FP = 6; FN = 1	80%	33%	2 years
Monge‐Argiles 2011	37/37	CSF t‐tau 16/37 (43%)	≥ 77.5 pg/mL (No)	11/37 (28%) FP = 8; FN = 3	73%	69%	6 months
		CSF p‐tau 20/37 (54%)	≥ 54.5 pg/mL (No)	11/37 (28%) FP = 11; FN = 2	82%	58%
		CSF p‐tau/ABeta ratio 18/37 (49%)	0.17 (No)	11/37 (28%) FP = 9; FN = 2	82%	66%
		CSF t‐tau/ABeta ratio 23/37 (62%)	0.18 (No)	11/37 (28%) FP = 13; FN = 1	91%	50%
Palmqvist 2013	133/133	CSF t‐tau 65/133 (49%)	> 87 pg/mL (No)	52/133 (39%) FP = 23; FN = 10	81%	72%	mean 5.9 years (range 3.2 to 8.8 years)
Palmqvist 2013	133/133	CSF p‐tau 46/133 (34%)	> 39 pg/mL (No)	52/133 (39%) FP = 11; FN = 17	67%	86%	mean 5.9 years (range 3.2 to 8.8 years)
Parnetti 2012	90/90	CSF p‐tau/ABeta ratio 29/90 (32%)	1074.0 (No)	32/90 (35%) FP = 3; FN = 6	81%	95%	maximum: 4 years; mean 3.40 ± 1.01 years
Visser 2009	168/158	CSF p‐tau 108/158 (68%)	≥ 51 pg/mL (used in clinical practice) (No)	35/158 (22%) FP = 77; FN = 4	88%	37%	range 1 to 3 for MCI
		CSF p‐tau 45/158 (28%)	≥ 85 pg/mL (> 90th percentile of controls after correction for age) (No)	35/158 (22%) FP = 25; FN = 15	57%	80%
		CSF p‐tau/ABeta ratio 77/158 (49%)	< 9.92 (< 10th percentile of reference group after correction for age) (No)	35/158 (22%); FP = 49; FN = 7	80%	60%
Vos 2013	231/214	CSF t‐tau 93/214 (43%)	> 450 pg/mL for age less than 70 years; > 500 pg/mL for age older than 70 years (Yes)	91/214 (42%) FP = 28; FN = 26	71%	77%	mean 2.5 ± 1.0 years
Vos 2013	231/214	CSF t‐tau/ABeta ratio 147/214 (69%)	ABeta1–42/(240 + [1.18 × t‐tau]) < 1.0 (Yes)	91/214 (42%) FP = 60; FN = 4	96%	51%	mean 2.5 ± 1.0 years
^{AD: Alzheimer's disease; FN: false negative; FP: false positive; MCI: mild cognitive impairment} ^{*Studies involved the same participants. Only Hansson 2006 is included in the meta‐analysis}

Table 3. Conversion from MCI to All dementia

Included studies, index test and test accuracy at study level for conversion from MCI to All dementias
Study	Participants n/N (included in analysis)	Index test (Number and % of positive tests)	Threshold (test abnormal) (pre‐specified Yes / No)	Number of converters (%) FP and FN	Sensitivity	Specificity	Duration of follow‐up
Eckerstrom 2010	42/42	CSF t‐tau 15/42 (36%)	≥ 500 ng/L (No)	21/42 (50%) FP = 1 FN = 7	67%	95%	Total sample: 19.6 ± 9.0 months; MCI‐MCI: 19.5 ± 9.3 months; MCI‐progressive: 17.6 ± 8.8 months (4/8 MCI‐AD: 23.7 ± 2.0 months)
Galluzzi 2010	90/64	CSF t‐tau 24/64 (37.5%)	> 450 pg/mL for subjects aged between 51 and 70; > 500 pg/mL for subjects aged between 71 and 93 (Yes)	34/64 (53%) FP = 5 FN = 15	56%	83%	Total sample: 8.4 ± 5.1 months (range 2 to 24 months); follow‐up interval for converters was 9.6 ± 5.4, and for non‐converters 7.0 ± 4.3 months
Hansson 2006	137/134	CSF t‐tau 38/134 (28%)	> 350 pg/mL (No)	78/134 (58%) FP = 5 FN = 45	42%	91%	Total sample: median 5.2 years (range 4.0 to 6.8); MCI‐AD: median: 4.3 years (range 1.1 to 6.7); MCI‐other dementias: median 4.2 (1.5 to 6.3)
Herukka 2007	79/79	CSF t‐tau 43/79 (54%)	> 400 pg/mL (Yes)	33/79 (42%) FP = 17 FN = 7	79%	63%	Mean 3.52 ± 1.95 years in MCI converters; mean 4.56 ± 3.09 years in MCI‐stable
^{AD: Alzheimer's disease; FN: false negative; FP: false positive; MCI: mild cognitive impairment}
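
The study‐level sensitivity and specificity in Table 3 can be recovered from the reported counts. The following is a minimal sketch, assuming that the analysed sample size, number of converters, FP and FN are as tabulated above; the function name `accuracy_from_counts` is hypothetical and is used only for illustration.

```python
def accuracy_from_counts(n_analysed, n_converters, fp, fn):
    """Return (sensitivity %, specificity %) rounded to whole percentages.

    True positives are converters with a positive test; true negatives
    are non-converters with a negative test.
    """
    n_non_converters = n_analysed - n_converters
    tp = n_converters - fn
    tn = n_non_converters - fp
    sensitivity = 100 * tp / n_converters
    specificity = 100 * tn / n_non_converters
    return round(sensitivity), round(specificity)


# (N analysed, converters, FP, FN) as reported in Table 3 for CSF t-tau
table_3 = {
    "Eckerstrom 2010": (42, 21, 1, 7),
    "Galluzzi 2010": (64, 34, 5, 15),
    "Hansson 2006": (134, 78, 5, 45),
    "Herukka 2007": (79, 33, 17, 7),
}
for study, counts in table_3.items():
    sens, spec = accuracy_from_counts(*counts)
    print(f"{study}: sensitivity {sens}%, specificity {spec}%")
```

Run on the four rows above, the sketch reproduces the tabulated values, which span the ranges quoted in the text (sensitivity 42% to 79%, specificity 63% to 95%).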

Table Tests. Data tables by test

Test	No. of studies	No. of participants
1 CSF t‐tau conversion to AD dementia	7	709
2 CSF p‐tau conversion to AD dementia	6	492
3 CSF p‐tau/ABeta ratio to AD dementia	5	433
4 CSF t‐tau conversion to All dementias	4	319
