Background
Acute lymphoblastic leukaemia (ALL) is a heterogeneous disease characterized by several subtypes of prognostic relevance. These subtypes can be distinguished on the basis of immunophenotype, differentiation status, and chromosomal and molecular abnormalities. The identification of different ALL subtypes, the characterization of prognostic features, and the finding that ALL subtypes differ in their response to therapy have greatly facilitated the development of treatments tailored to specific subgroups [1–3]. Current National Cancer Institute (NCI) criteria for risk assignment use age and white blood cell count (WBC) at diagnosis to stratify patients into standard risk (SR; 1–9.99 years of age and WBC < 50,000/μl) and high risk (HR; ≥ 10 years of age or WBC ≥ 50,000/μl) [4]. In addition, several structural and numerical chromosomal abnormalities are known to be independent prognostic factors. For example, the t(9;22) translocation is strongly associated with poor prognosis, whilst both the t(12;21) translocation and a high hyperdiploid karyotype (>50 chromosomes) confer a favourable prognosis [5–7]. Although detection accuracies for chromosomal abnormalities can be as high as 90%, the success rate varies greatly, and cytogenetic analysis remains a challenge owing to the low mitotic index and poor quality of the metaphases associated with ALL [7,8]. Cytogenetic interpretation can be particularly difficult for complex karyotypes, for cryptic translocations such as TEL-AML1, and for loci targeted by multiple different rearrangements, as is the case for chromosomal abnormalities involving the MLL gene. Thus, multiple complementary technologies, such as fluorescence in situ hybridization (FISH), spectral karyotyping (SKY), Southern blot analysis and RT-PCR, are often required for the accurate identification of chromosomal abnormalities, adding to the extremely time-consuming and expensive process of cytogenetic analysis [5–7,9].
Recent advances in microarray technology have shown that subgroups of ALL, as well as acute myeloid leukaemia (AML), can be accurately distinguished based on their gene expression profiles [10–16]. Two of the largest childhood ALL microarray studies published so far demonstrated the presence of distinct gene expression patterns in six known prognostic subgroups [13,14]. Using supervised learning algorithms to assign ALL samples to their respective subgroups, the study conducted at the St. Jude Children's Research Hospital achieved an overall prediction accuracy of about 96% [14]. The findings from this and other studies raised the prospect of developing a standardized diagnostic gene expression platform to enhance accurate diagnosis and risk stratification. One of the major challenges that lies ahead is how the information gained through microarray experiments can be applied to clinical diagnostics, including the question of whether to employ microarrays themselves as the testing platform. Here, we explored the possibility of using a small number of genes in such a test, which would allow the exploitation of quantitative reverse transcriptase polymerase chain reaction (qRT-PCR) as an alternative method for diagnostic screening. Compared to microarray technology, qRT-PCR has the advantage of being less expensive, rapid, already established in many laboratories and independent of extensive computational analysis. We examined the ALL microarray data set published by Ross et al. [14], focusing on 104 specimens from ALL patients representing six different subgroups defined by cytogenetic features and immunophenotypes. Using the decision-tree-based supervised learning algorithm Random Forest (RF), we determined a small set of genes for optimal subgroup distinction and subsequently validated their predictive power in an independent patient cohort. We showed that only 26 genes are required to accurately discriminate the six known prognostic subgroups of paediatric ALL, a number small enough to allow their expression levels to be measured by modern qRT-PCR technology in a clinical setting.
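To make the approach concrete, the following minimal Python sketch illustrates the general idea of ranking genes by Random Forest importance. It uses simulated data and scikit-learn's RF implementation rather than the actual data set and RF software used in this study; all dimensions are stand-ins.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(104, 500))      # 104 specimens x 500 probe sets (simulated)
    y = rng.integers(0, 6, size=104)     # six hypothetical subgroup labels

    forest = RandomForestClassifier(n_estimators=1000, random_state=0)
    forest.fit(X, y)

    # Rank probe sets by mean decrease in impurity and keep the top 20.
    top20 = np.argsort(forest.feature_importances_)[::-1][:20]
    print(top20)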
Discussion
The objective of many microarray studies is the improvement of diagnosis, with the aim of accurately assigning patients to specific risk categories that facilitate risk-adapted therapy. Recent studies have demonstrated the great potential of gene expression profiling for the classification of clinically relevant subtypes of paediatric leukaemia [11–16]. The results from these studies are promising and suggest that standardized gene expression-based diagnostic tests can provide at least equivalent, if not superior, diagnostic accuracy compared to conventional analysis methods. To critically assess whether findings from gene expression profiling in paediatric ALL could be adapted to diagnostic tests, we asked a set of fundamental and very specific questions: 1) Can array data be successfully applied to independent patient data, i.e. are microarray findings robust and generally applicable? 2) Is the selection of discriminating genes governed by the approach chosen for the analysis of microarray data and, if so, is the selection of a different set of genes critical for accurate class assignment? 3) Is it possible to drastically reduce the number of discriminating genes without compromising predictive performance? To answer these questions, we chose the leukaemia microarray data set published by Ross et al. [14] and re-analyzed the data using RMA [20,21] for data extraction and normalization and RF [23,27] as a supervised learning algorithm for the selection of informative genes and class assignment.
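RMA itself comprises background correction, quantile normalization and median-polish summarization of probe intensities. As a point of reference only, the quantile-normalization step can be sketched in a few lines of Python with numpy; this is not the RMA implementation used here, and ties between ranks are broken arbitrarily in this simplified version.

    import numpy as np

    def quantile_normalise(X):
        # Rows are probes, columns are arrays. Each value is replaced by the
        # mean of the values holding the same rank across all arrays, so
        # every array ends up with an identical intensity distribution.
        ranks = np.argsort(np.argsort(X, axis=0), axis=0)
        reference = np.sort(X, axis=0).mean(axis=1)
        return reference[ranks]

    X = np.array([[5.0, 4.0, 3.0],
                  [2.0, 1.0, 4.0],
                  [3.0, 4.0, 6.0],
                  [4.0, 2.0, 8.0]])
    print(quantile_normalise(X))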
To address the first question, we used our RMA/RF analysis approach to identify the top 20 discriminating probe sets for each of the six ALL subgroups represented in this data set. It is important to note that the patient cohort studied by Ross and colleagues [14] was purposefully chosen to represent all six subgroups in almost equal numbers. To independently test the discriminators identified by RMA/RF in a less "idealized" cohort, we used our own microarray data set, obtained by testing 68 paediatric ALL specimens. This analysis validated the top subgroup-discriminating genes identified by RMA/RF in an independent cohort of ALL specimens, achieving overall prediction accuracies of up to 92.6%. Since our specimens were assessed using HG-U133A arrays, several top-ranked discriminators represented only on the HG-U133B array could not be included, which may account for the less precise classification of some samples. This is particularly evident for the hyperdiploid subgroup, for which one of the original top five discriminators is not present on the HG-U133A array. Similarly, the single misclassification of a case with MLL rearrangements might be due to the highest-ranked discriminator not being part of the HG-U133A array. Despite this, our findings clearly demonstrate that the genes identified by RMA/RF as discriminators of the six ALL subgroups are robust and can be applied to an independent cohort of patients. Importantly, this set of genes accurately classified samples from an independent cohort obtained from a different institution, in which subgroups were not artificially represented in equal numbers, and its performance was not affected by laboratory-specific differences in sample handling and data generation.
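The validation step can be pictured as follows: restrict the classifier to the probe sets shared between the two array designs, train on one cohort and predict the other. The Python sketch below, with simulated data and hypothetical probe-set names, conveys only the structure of such a cross-platform check, not our actual analysis.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(1)
    probes_ab = [f"probe_{i}" for i in range(600)]  # hypothetical HG-U133A+B probe sets
    probes_a = set(probes_ab[:450])                 # subset also present on HG-U133A
    shared_idx = [i for i, p in enumerate(probes_ab) if p in probes_a]

    X_train = rng.normal(size=(104, 600))[:, shared_idx]  # training cohort
    y_train = rng.integers(0, 6, size=104)
    X_valid = rng.normal(size=(68, len(shared_idx)))      # independent 68-sample cohort
    y_valid = rng.integers(0, 6, size=68)

    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    clf.fit(X_train, y_train)
    print("validation accuracy:", accuracy_score(y_valid, clf.predict(X_valid)))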
Unexpectedly, our RMA/RF analysis revealed a high degree of discrepancy between the genes it identified as the most important discriminators for the six ALL subgroups and those identified in the study published by Ross et al. [14]: only 35–65% of probe sets and 35–71.4% of genes were identified in both analyses. This finding highlights the existence of a large number of genes with the potential to discriminate between subtypes of paediatric ALL. Thus, the selection of discriminators, critical for achieving the most accurate classification and for the design of diagnostic tests, appears to depend on the chosen analysis approach. While many microarray studies report that similar classifications are obtained with different supervised learning algorithms [13,14,28,29], so far little attention has been paid to this critical aspect of selecting discriminating genes [30–32]. Not surprisingly, we found the highest degree of variability among the discriminators identified for cases with more than 50 chromosomes: only 7 of the top 20 discriminating genes were common to our analysis and that conducted by Ross et al. [14]. These discrepancies coincide with relatively low expression levels and particularly low fold-changes observed for the genes defining this subgroup, most likely reflecting its documented heterogeneity. Interestingly, the top-ranked gene PHB has not previously been identified as an important discriminator for this subgroup. Although the precise function of PHB has yet to be clarified, it has been found to play a role in several cellular processes, such as proliferation and apoptosis [33]. Other prominent subgroup-discriminating genes identified by RMA/RF were ABL1 for the BCR-ABL subgroup, and several B cell-specific genes with very low expression levels in T-ALL samples, including the transcription factor EBF; PAX5, a potential downstream target of EBF [34]; and the transcription factor TFEB. Furthermore, WNT16, a downstream target of the E2A-Pbx1 fusion protein [35], was found to be the second most important discriminator for cases with E2A-PBX1 rearrangements. These results highlight that the selection of the genes that best distinguish between ALL subgroups is strongly influenced by the methods used to analyze gene expression profiles, which in turn may have profound implications for clinical applications. While RMA has recently become a popular choice as a data-extraction method [20,21], only a few studies have reported the use of RF as a supervised learning algorithm [23,27,36]. RF is a decision-tree-based algorithm that has been proposed as particularly suitable for the high dimensionality of microarray data sets. Comparisons with other commonly used supervised learning algorithms have shown that the RF algorithm constructs far more precise classification rules [23]. Besides improved prediction accuracies, a reduction in the number of genes required for classification has also been reported when using decision-tree-based methods [27].
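One way to picture how a decision-tree-based method supports gene reduction is to rank probe sets by RF importance and re-estimate cross-validated accuracy with progressively smaller panels. The sketch below (Python, simulated data) shows the shape of such a procedure, not the exact cross-validation scheme used in this study; note that a rigorous estimate would repeat the ranking inside each fold to avoid selection bias.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    X = rng.normal(size=(104, 500))
    y = rng.integers(0, 6, size=104)

    # Rank all probe sets once by RF importance.
    ranker = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    order = np.argsort(ranker.feature_importances_)[::-1]

    for k in (120, 60, 30):              # progressively smaller probe-set panels
        clf = RandomForestClassifier(n_estimators=500, random_state=0)
        acc = cross_val_score(clf, X[:, order[:k]], y, cv=5).mean()
        print(f"top {k} probe sets: mean CV accuracy = {acc:.3f}")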
Another critical issue that remains to be addressed is the optimal platform for a diagnostic test measuring gene expression profiles, i.e. low-density custom microarrays or PCR-based assays. Many studies, including our own, have shown that expression levels determined by microarray can be accurately reproduced by qRT-PCR [13,22,37,38]. Compared to microarrays, qRT-PCR technology has the advantage of being readily available in most laboratories, more cost-efficient, and free of extensive statistical and computational data analysis. However, a qRT-PCR-based diagnostic platform would require a drastic reduction in the number of genes measured. The comprehensive cross-validation procedures performed in this study revealed that as few as 30 probe sets are sufficient to achieve accurate class assignment. By comparison, a previous study reported that a single gene could identify T-ALL and E2A-PBX1 cases, while 7–20 genes were needed to predict each of the other four classes [13]. The 30 probe sets determined to be required for accurate class prediction in our study represent 26 genes, a number that could easily be analyzed in a routine qRT-PCR test. Remarkably, only eight of these 26 genes were listed among the top five subgroup-defining probe sets of the study published by Ross et al. [14]. It is foreseeable that further optimization towards a generic and robust classifier will reduce the number of discriminators required to predict some subgroups even further, while additional discriminators may be needed to detect distinct subtypes within the MLL subgroup [16,39]. Furthermore, the inclusion of additional translocation-specific assays should enhance the accurate classification of cases expressing BCR-ABL.
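For a 26-gene qRT-PCR panel, expression read-outs would typically be quantified relative to a housekeeping gene and a calibrator sample, for example with the widely used 2^(-ΔΔCt) method. The following sketch uses invented Ct values purely for illustration and assumes this standard quantification scheme rather than any protocol from this study.

    def fold_change(ct_target, ct_housekeeper, ct_target_cal, ct_housekeeper_cal):
        # 2^(-ΔΔCt): normalise the target gene to the housekeeper within each
        # sample, then compare the patient sample to the calibrator.
        ddct = (ct_target - ct_housekeeper) - (ct_target_cal - ct_housekeeper_cal)
        return 2.0 ** (-ddct)

    # Hypothetical Ct values: target 24.1 vs housekeeper 18.0 in the patient
    # sample; 26.5 vs 18.2 in the calibrator.
    print(fold_change(24.1, 18.0, 26.5, 18.2))   # ~4.6-fold higher expression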
Using our RMA/RF approach, discriminant analysis of the six ALL subgroups based on 120 probe sets achieved a very high average prediction accuracy of 98.2%. This was slightly higher than the previously reported average prediction accuracy of 94.6% [14], which included misclassifications of cases with more than 50 chromosomes and cases with TEL-AML1 rearrangements; in contrast, our analysis classified these specimens with virtually 100% accuracy. TEL-AML1 rearrangements and a hyperdiploid karyotype with more than 50 chromosomes represent two of the most frequent genetic abnormalities, found in 22% and 25%, respectively, of children diagnosed with ALL [3]. Given their prognostic significance, the correct identification of these two subgroups is of great importance: the presence of either feature in paediatric ALL is significantly associated with a favourable prognosis [9,40–43]. Importantly, because the TEL-AML1 translocation is undetectable by conventional cytogenetic analysis, more sophisticated molecular techniques, such as FISH, are required to confirm the presence of this fusion gene [6,7]. Hence, our results further emphasize the advantage of using gene expression profiles to identify this prognostically important ALL subgroup.
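Per-subgroup accuracies of the kind discussed above can be read off a confusion matrix computed from cross-validated predictions. As before, the Python sketch below runs on simulated data and simply illustrates the bookkeeping, not our reported results.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(3)
    X = rng.normal(size=(104, 120))      # 120 probe sets, simulated
    y = rng.integers(0, 6, size=104)     # six hypothetical subgroups

    clf = RandomForestClassifier(n_estimators=500, random_state=0)
    pred = cross_val_predict(clf, X, y, cv=5)
    cm = confusion_matrix(y, pred)
    per_class_acc = cm.diagonal() / cm.sum(axis=1)   # per-subgroup recall
    print(per_class_acc)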
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
KH was responsible for designing the study, analysing, collating, and interpreting the data, and preparing the manuscript. MJF carried out the statistical analysis. AHB, MJF and NHdK assisted with data analysis, experimental design, and data interpretation. URK supervised all aspects of the study and preparation of the manuscript. All authors read and approved the final manuscript.