Background
Tyrosinemia type 1 (TYR1), also known as fumarylacetoacetase deficiency (Enzyme Commission Number 3.7.1.2), is an autosomal recessive disorder of amino acid metabolism. It is caused by a deficiency in the activity of fumarylacetoacetic hydrolase, the final enzyme in the tyrosine degradation pathway, which leads to a toxic build-up of fumarylacetoacetate, maleylacetoacetate, and succinylacetone (SUAC) [1]. TYR1 is characterised by progressive liver, kidney, and neurological disease [2]. Acute (presenting before six months of age), sub-acute (presenting between six and 12 months) and chronic (presenting after one year) forms of the disease have been described [2]. Without treatment, the prognosis for individuals with TYR1 is poor, with high rates of death during childhood due to liver failure, recurrent bleeding, hepatocellular carcinoma, and porphyria-like syndrome with respiratory failure [3]. However, treatment with nitisinone and dietary restriction is associated with reductions in morbidity and mortality [4–6]; liver transplantation is indicated if these treatments fail or if hepatocellular carcinoma develops [2]. The incidence of TYR1 is estimated to be approximately 1:100,000 live births, but reported values range from 1:1,846 [7] to 1:781,144 live births [8]. The incidence of TYR1 is higher in Quebec, Canada, possibly due to a founder effect for tyrosinemia and high gene frequency [7]; in Asian children in the West Midlands of the UK [9]; and in North Africa and the Middle East [10], possibly due to parental consanguinity [9].
Screening for TYR1 amongst newborn babies is conducted in many countries around the world. While tyrosine levels have been used as the primary screening marker for TYR1, they are not consistently raised in individuals who have TYR1 [11], and they can be elevated in individuals with other conditions and in unaffected babies [12, 13]. In 2004, Allard and colleagues developed an alternative method to screen for TYR1 using tandem mass spectrometry (MS/MS) to determine SUAC in dried blood spots (DBS) [14]. A rapid review of literature published up to 2012 reported that “Screening programmes using succinylacetone as a marker have reported 100% sensitivity and 100% specificity. However, other studies have reported the identification of false positives.” [15]. The aim of the current review was to use full systematic review methods to examine the range of test accuracy indicators (sensitivity, specificity, and predictive values) of SUAC measurement in DBS using MS/MS for TYR1 screening.
Methods
Search strategy
We conducted searches in the following electronic databases: Medline, Medline In-Process & Other Non-Indexed Citations, Embase, Web of Science (All Databases), and the Cochrane Library. We searched using text words and MeSH terms relating to “Tyrosinemia type 1 OR inborn errors of metabolism”, AND “succinylacetone OR DBS OR (tandem mass spectrometry AND neonatal screening)”. Full details of the search strategy are provided in Additional file 1: supplement 1. The search was conducted on 26th January 2016. We examined reference lists of included studies and previous reviews. Experts in the field and organisations were contacted for studies not in the public domain.
Eligibility criteria
We included English language journal articles which investigated screening for TYR1 by MS/MS analysis of SUAC from DBS in newborns. The reference standard was urine testing for SUAC, clinical detection of TYR1 or two-year follow-up. Outcomes included were any reported test accuracy measures from cross-sectional studies, case–control studies, or studies reporting screening experiences. We excluded non-human studies, papers not available in English, letters, editorials, communications, grey literature, conference abstracts, and studies published before 2004 (the year the first paper was published on SUAC measurement in DBS using MS/MS for TYR1) from our review.
Screening and data extraction
Screening of titles and abstracts of all retrieved records, and subsequently of full texts, was undertaken independently by two reviewers. Data extraction was performed by a single reviewer, with all data extraction forms checked by a second reviewer. Disagreements were resolved by discussion between the two reviewers or further discussion with a third reviewer, leading to a consensus on inclusion/exclusion.
Quality appraisal
Quality of included studies was assessed independently by two reviewers using the Quality Assessment Tool for Diagnostic Accuracy Studies 2 (QUADAS-2) [16], which was tailored to the research as recommended. Tailoring of the QUADAS-2 tool included adding a topic-specific signalling question and defining appropriate reference standards and cut-offs for participant exclusions, as well as guidance on how many positive signalling questions are required for an overall positive rating in terms of bias and applicability concerns (see Additional file 1: supplement 2 for signalling questions and Additional file 1: supplement 3 for guidance notes). Disagreements were resolved by discussion between the two reviewers or through discussion with a third reviewer, leading to a consensus on study quality.
Data summary and synthesis
Meta-analysis was not possible due to incomplete 2x2 tables and heterogeneity in study design. Therefore, a narrative synthesis of results is provided.
Discussion
We examined the test accuracy of SUAC measurement in DBS using MS/MS to screen for TYR1 in newborns. Ten studies were identified which reported test accuracy data: five studies reporting screening experiences and five case–control studies. PPV in the studies reporting screening experiences ranged from 67% (two true positive cases and one false positive case out of ~500,000 babies screened) to 100% (eight true positive cases and no false positive cases out of 856,671 people screened). We were unable to calculate sensitivity, specificity, or negative predictive value in these studies due to a lack of follow-up of babies who screened negative. Case–control studies reported clear discrimination between SUAC levels of newborns with and without TYR1.
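The arithmetic behind these PPV figures, and the reason the remaining metrics cannot be derived, can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the only inputs are the true positive and false positive counts quoted above. The number of false negatives is represented as unknown (None) because screen-negative babies were not followed up.

```python
from typing import Optional


def ppv(tp: int, fp: int) -> Optional[float]:
    """Positive predictive value: TP / (TP + FP).

    Returns None if there were no screen-positive results.
    """
    screen_positives = tp + fp
    return tp / screen_positives if screen_positives else None


def sensitivity(tp: int, fn: Optional[int]) -> Optional[float]:
    """Sensitivity: TP / (TP + FN).

    Returns None when FN is unknown, which is the situation when
    screen-negative babies have not been followed up.
    """
    if fn is None or tp + fn == 0:
        return None
    return tp / (tp + fn)


# Counts quoted in the text for the lowest and highest reported PPV.
lowest_ppv = ppv(tp=2, fp=1)
highest_ppv = ppv(tp=8, fp=0)
print(f"PPV range: {lowest_ppv:.0%} to {highest_ppv:.0%}")

# The 2x2 table is incomplete: FN is unknown, so sensitivity is undefined.
unknown_sensitivity = sensitivity(tp=8, fn=None)
print("Sensitivity:", unknown_sensitivity)
```

With only the screen-positive row of the 2x2 table filled in, PPV is the single metric that can be computed; sensitivity, specificity, and negative predictive value all need counts from the screen-negative row.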
No consistent test accuracy metric was available. Papers reporting screening experiences suggested that using SUAC to screen for TYR1 resulted in no false negative results, and reported test sensitivity and specificity of up to 100%. However, these conclusions were based on a lack of awareness of false negative results rather than following up babies who had screened negative. Without proper follow-up of the population who have been tested, for an appropriate amount of time, it is not possible to know if the absence of awareness of false negatives reflects an actual absence of false negative results.
While case–control studies showed no overlap in SUAC levels between newborns with and without TYR1, the cut-offs used varied between studies and were specified retrospectively, and the assessors were not blinded to the disease status, which can result in overestimation of test accuracy. The included case–control studies were also at high or unclear risk of differential verification bias because TYR1 cases and healthy controls received different reference standards, because the reference standards used were not reported in sufficient detail to assess whether their accuracy was comparable, or because they were not reported at all. The use of multiple reference standards across participants of a single study might have resulted in an overestimation of accuracy [25]. In addition, studies evaluating diagnostic tests in a diseased population and a separate healthy control group can overestimate the diagnostic performance compared with studies that use the index test in a clinical population covering the full range of patients without knowing their disease status [25].
Our understanding of the appropriateness of screening for TYR1 using SUAC is limited by heterogeneity in study design, the methods used for SUAC determination on DBS, and the SUAC cut-off values. For example, the SUAC cut-offs used in the screening test to identify possible cases of TYR1 ranged from 1.29 μmol/l [23] to 10 μmol/l [19]. Proficiency testing results for SUAC in DBS have shown large differences among screening laboratories in SUAC recovery, reflecting analytic biases, which might explain the wide variation in cut-off values of the studies in our review [26]. Differences in recovery could be explained by the method used (kit TMS vs. non-kit TMS; butyl ester derivatisation vs. non-derivatisation), the DBS extraction strategy (freshly punched DBS, residual DBS, or co-extraction of amino acids (AA), acylcarnitines (AC), and SUAC), the internal standard used (¹³C-SUAC, 5,7-dioxooctanoic acid, or TMS kit internal standard), or the calibration strategy used (DBS calibrators, TMS internal standard/other liquid standard, or kit internal standard only). Laboratories that measured low quantitative SUAC results usually used lower cut-off values to avoid misclassifications [26]. This highlights an important issue in how screening tests are evaluated. In this paper, we examine test accuracy, meaning the association between results from the test under investigation and the presence or absence of the target disease. However, the term ‘accuracy’ has multiple meanings. Within method validation (the process used to confirm that tests are suitable for their intended purpose), ‘analytical’ accuracy refers to the degree to which test results and the true value of the measured quantity agree and how reproducible and reliable the test is [27]. The analytical performance of the SUAC assays used has been described in some of the included studies. The recovery of SUAC was assessed in five studies by assaying DBS specimens enriched with predetermined (low to high) SUAC concentrations and was reported to be 51% [23], 72–80% [19], 75–78% [14], 75–86% [21], and 97–100% [22] of the expected value. The quantification limit (the lowest amount of SUAC in a sample which can be reliably quantified) was reported in four studies and was 0.4 μmol/l [22], 0.5 μmol/l [19, 23] and 1 μmol/l [14]. The calibration was reported to be linear up to 50 μmol/l [14], 100 μmol/l [19, 22, 24], 240 μmol/l [21], and 250 μmol/l [23]. Precision (the ability to consistently reproduce a result when sub-samples are taken from the same specimen) results were presented in seven studies, with inter-assay coefficients of variation (CV) at different SUAC concentrations of 10.0–12.2% [14], 7.1–8.5% [21], 3.50–4.49% [22], 5.8–13% [19], 15.8–16.7% [24], 17.29–19.00% [23] and 30% in a pooled sample assay [20]. Taken together, the analytical performance of the screening tests used in the included studies was in agreement with previously reported proficiency testing outcomes [26, 28, 29], showing large between-laboratory differences in SUAC recovery (mostly incomplete recovery), depending on the method used, but reproducible within-laboratory recovery. There is a need to harmonise quantitative results among laboratories. Despite differences among methods in SUAC recovery (analytical bias), each method seems to have acceptable precision and might therefore still be able (when using a cut-off value appropriate for the selected method) to reliably sort asymptomatic newborns into probable TYR1 cases and non-cases. De Jesus et al. [29] and Adam et al. [26] stress in their papers that bias in quantitative results can be tolerated if the screening test reliably sorts people into those who (probably) do have the disease of interest and those who (probably) do not. Any differences in the test accuracy between studies might be due to the timing of the test, the SUAC assay used, the cut-off used for classifying the disease status, use of repeat testing in samples with borderline SUAC levels, or variation in normal SUAC values in the tested newborn population.
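As a rough illustration of the analytical quantities discussed here, the sketch below computes percentage recovery and an inter-assay CV from replicate measurements, and then shows why a constant recovery bias need not change how samples are classified if the cut-off is scaled to the method. All numbers are hypothetical and chosen only for the example; none come from the included studies. The function names are also assumptions, and the classification example is idealised in that it ignores measurement imprecision.

```python
from statistics import mean, stdev


def recovery_percent(measured: list[float], expected: float) -> float:
    """Mean measured concentration as a percentage of the spiked (expected) value."""
    return 100.0 * mean(measured) / expected


def cv_percent(values: list[float]) -> float:
    """Coefficient of variation: sample standard deviation / mean, as a percentage."""
    return 100.0 * stdev(values) / mean(values)


def classify(values: list[float], cutoff: float) -> list[bool]:
    """Flag each sample as screen-positive when it is at or above the cut-off."""
    return [v >= cutoff for v in values]


# Hypothetical replicate results (umol/l) for a DBS spiked to 10 umol/l.
replicates = [7.2, 7.8, 7.5, 8.1, 7.4]
recovery = recovery_percent(replicates, expected=10.0)
cv = cv_percent(replicates)
print(f"recovery = {recovery:.1f}%, inter-assay CV = {cv:.1f}%")

# Hypothetical true concentrations (umol/l) and a method with 50% recovery.
true_conc = [0.3, 0.8, 1.1, 12.0, 35.0]
method_recovery = 0.5
measured = [method_recovery * c for c in true_conc]

# A cut-off scaled by the same recovery sorts the samples identically.
reference_cutoff = 2.0
same_sorting = classify(true_conc, reference_cutoff) == classify(
    measured, method_recovery * reference_cutoff
)
print("Same classification with method-specific cut-off:", same_sorting)
```

In practice imprecision matters as well: the wider the CV, the more samples near the cut-off risk being misclassified, which is one reason repeat testing of borderline results is used.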
Our review has a number of limitations. First, we were unable to synthesise our findings numerically due to incomplete 2x2 tables for studies reporting screening experiences, and heterogeneity in study design, the MS/MS method used, and the SUAC cut-off values. Second, we restricted our search to English language papers; non-English-language papers may be available and add further information. Third, we tailored the applicability questions for the QUADAS-2 to newborn screening in the UK. For example, in the UK newborn screening takes place five to eight days after birth, so studies in which samples were taken before or after this were rated as having high concerns regarding applicability. None of the studies we identified were conducted in the UK, and the usual time at which screening takes place varies by country; in many European countries newborn screening is conducted three days after birth. Therefore, the criteria for a high applicability concern might be different outside the UK.
While results from case–control studies are promising, they are not definitive, as case–control designs tend to overestimate test accuracy [25]. A research project using MS/MS measurement of SUAC from DBS with follow-up of screen-negatives for at least two years would considerably strengthen the test accuracy data. This could be achieved by following up one of the existing cohorts described in this review by searching hospital/primary care databases for cases of TYR1 that were identified symptomatically. While this approach would not provide a definitive answer, it would provide an estimate of the number of false-negative cases, which is currently missing from the literature.
Acknowledgements
This research was commissioned by the UK National Screening Committee. Sian Taylor-Phillips, Aileen Clarke, Chris Stinton, and Hannah Fraser are supported by the NIHR CLAHRC West Midlands initiative. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, the UK National Screening Committee, Public Health England or the Department of Health. Any errors are the responsibility of the authors.