Defining consensus leukemia-associated immunophenotypes for detection of minimal residual disease in acute myeloid leukemia in a multicenter setting

Feller, N; van der Velden, V H J; Brooimans, R A; Boeckx, N; Preijers, F; Kelder, A; de Greef, I; Westra, G; te Marvelde, J G; Aerts, P; Wind, H; Leenders, M; Gratama, J W; Schuurhuis, G J

doi:10.1038/bcj.2013.27

Original Article
Open access
Published: 02 August 2013

N Feller¹,
V H J van der Velden²,
R A Brooimans³,
N Boeckx^4,5,
F Preijers⁶,
A Kelder¹,
I de Greef⁷,
G Westra¹,
J G te Marvelde²,
P Aerts³,
H Wind²,
M Leenders⁶,
J W Gratama³ &
G J Schuurhuis¹

Blood Cancer Journal volume 3, page e129 (2013)

Abstract

Flow-cytometric detection of minimal residual disease (MRD) has proven in several single-institute studies to have an independent prognostic impact. We studied whether this relatively complex approach could be performed in a multicenter clinical setting. Five centers developed common protocols to accurately define the leukemia-associated (immuno)phenotypes (LAPs) at diagnosis that are required to establish MRD during/after treatment. List mode data files were exchanged, and LAPs were designed by each center. One center, with extensive MRD experience, served as the reference center and coordinator. In quarterly meetings, consensus LAPs were defined, and the performance of each center was compared against them. In a learning (29 patients) and a test phase (35 patients), a mean of 2.2 aberrancies/patient was detected, and only 1/64 patients (1.6%) had no consensus LAP(s). For the four centers without (extensive) MRD experience, clear improvement could be shown: in the learning phase, 39–63% of all consensus LAPs were missed, resulting in a median 30% of patients (range 21–33%) for whom no consensus LAP was reported; in the test phase, 27–40% of consensus LAPs were missed, resulting in a median 16% (range 7–18%) of ‘missed’ patients. The quality of LAPs was extensively described. Immunophenotypic MRD assessment in its current setting needs extensive experience and should be limited to experienced centers.

Introduction

The interest in minimal residual disease (MRD) assessment as a prognostic tool in acute myeloid leukemia (AML) is growing. Molecular markers, although allowing a greater sensitivity, enable MRD monitoring only in a minority of cases.¹ Immunophenotypic MRD detection using flow cytometry has turned out to offer an attractive alternative. Using immunophenotyping, several early studies have shown the prognostic value of the frequency of MRD cells for survival of AML patients both in adults^{2, 3, 4, 5, 6} and in children.^{7, 8, 9} Meanwhile, many other studies have confirmed these data.^{1, 10} Immunophenotypic detection of MRD using flow cytometry can be performed by defining aberrant expression of markers or marker combinations on the leukemic cells at diagnosis. These so-called leukemia-associated (immuno)phenotypes (LAPs) are absent, or are present at very low frequencies, in normal blood or bone marrow (BM) cells. This approach can be successfully applied in 80–100% of AML patients.^{2, 3, 4, 5, 6, 7, 8, 9, 11}

Apart from serving as a prognostic marker, MRD may ultimately turn out to offer a tool to guide patient-tailored therapy, an approach already adopted once for childhood AML.¹² Even when categorized as low risk by cytogenetics/molecular markers and/or other conventional prognostic parameters, AML patients may be identified to be at high risk based on MRD cell frequency after induction therapy and stratified to treatment intensification such as allogeneic transplantation. Inversely, intensive treatment in AML patients defined as ‘poor risk’ by conventional factors, but as ‘low risk’ based on MRD assessment, might ultimately be avoided. Furthermore, at follow-up, increments in MRD cell frequency by sequential MRD assessments may identify forthcoming relapses, in turn urging therapy to be administered before overt relapse occurs.

For these reasons, it is important to establish the prognostic value of MRD cell frequency assessment in clinical studies in prospective, multicenter settings. A first requirement to enable adequate performance of such studies is the standardization of all procedures necessary for definition of adequate LAPs in newly diagnosed AML, as well as those necessary for identification and quantification of MRD in BM follow-up material. Standardizing LAP technology for MRD detection is complicated because aberrant immunophenotypes may not only differ from patient to patient but also within a patient, often resulting in many different LAPs (illustrated in Supplementary Figure 1). There are large differences in the suitability of LAPs. This suitability may be defined by at least four factors. First, the percentage of blasts that bear a particular aberrancy at diagnosis determines the sensitivity of subsequent detection and quantification of AML cells. Second, the extent to which equivalent cells in control normal BM bear the aberrancy determines LAP specificity: the lower the frequencies of such ‘background’ expression, the higher the LAP specificity.^{5, 13, 14} Third is the stability of LAPs at follow-up—that is, the possibility that expression of markers, which constitute the LAP, may be higher or lower at follow-up as compared with diagnosis,^{5, 14, 15, 16} thereby resulting in overestimation or false negativity, respectively, of MRD. Finally, the quality of monoclonal antibody conjugates, as well as the stability and resolution of the flow cytometer, is important. All these factors contribute to, often, large differences in the applicability of specific LAPs for different leukemia cases.

To standardize all the procedures that are needed for LAP definition as well as for quantification of MRD, five centers in Belgium and the Netherlands joined forces in 2004. These centers had ample experience in (at least) four-color flow cytometry. One of these centers had extensive experience with MRD detection in adult AML and served as the reference and coordinating center.

The primary goal of this study was to standardize the identification of LAPs at diagnosis. First, we defined a standardized antibody panel and standard operating procedures based on the MRD experience in the group, the published data and extensive knowledge of the performance of antibody conjugates. Second, we evaluated whether this standardized antibody panel could identify robust LAPs in the vast majority of AML patients. The quality of LAPs was determined on the basis of their presumed specificity, sensitivity and stability. The coordinating center served as reference for the definition of LAPs. Finally, the (improvements in) performance of individual laboratories in identifying LAPs in an initial learning phase and a test phase was evaluated.

Materials and methods

Patients and cells

Sixty-four patients with AML, consecutively presenting during a period of 18 months (July 2004–December 2005) in the participating institutes, were included. The median age was 59 years (range: 9–85; two children of 9 and 15 years were included). FAB (French-American-British) classification distribution was 6 M0, 11 M1, 16 M2, 2 M3, 6 M4, 5 M5, 3 M6, 1 M7, 6 RAEB (3 RAEB and 3 RAEB-t), 1 secondary AML (out of CML), 1 AML with non-Hodgkin’s lymphoma and 6 cases with unknown classification. BM of patients with AML, and normal BM from patients with cardiac disease, was obtained after informed consent and according to the institutional protocols. LAPs in normal BM controls were defined as percentages of white blood cells (WBCs).

Participating centers and working plan

The participating centers were required to meet the following criteria: (i) longstanding experience in immunophenotyping of leukemia using at least four-color flow cytometry; (ii) access to clinical samples; (iii) experience in MRD assessment and/or in quantification of low-frequency cell subpopulations; (iv) participation for at least 5 years in external quality control assessment programs for CD34 counting and leukemia/lymphoma immunophenotyping;^{17, 18} and (v) extensive experience with various antibody–fluorochrome combinations. Five centers from the Netherlands and Belgium participated (for details, see Supplementary Files). Amsterdam (indicated as no. 1) served as the coordinating and reference center, and the other centers were referred to anonymously, in arbitrary sequence, as participants no. 2–5.

Four centers were equipped with a FACSCalibur (Becton Dickinson, San Jose, CA, USA) with an argon and a red diode laser. CellQuest software was used for analyses of the list mode data (LMD). One participant was equipped with a Beckman Coulter five-color FC500 flow cytometer (Miami, FL, USA) with an argon laser. LMD were analyzed with CXP software (Miami, FL, USA).

The study period was subdivided into two phases: a learning phase with 29 patients and a test phase with 35 patients (for details, see Supplementary Files).

Immunophenotypic labeling

The first goal of the working group was to develop a standard panel of monoclonal antibody combinations to define putative LAPs at diagnosis. This panel was composed of monoclonal antibodies and fluorochromes in four-color combinations based on previous experience with LAP definitions and MRD assessments in AML,^{5, 19} as well as extensive knowledge of antibody–fluorochrome performance. MoAbs were conjugated with fluorescein isothiocyanate, phycoerythrin (PE), peridinyl chlorophyllin or allophycocyanin for FACSCalibur users, and with fluorescein isothiocyanate, PE, phycoerythrin-Texas Red conjugate (energy coupled dye, ECD) and PE-Cy5 for the Coulter FC500 user (Table 1). The backbone of a LAP consists of CD45, a primitive marker (CD34, CD133 or CD117), a myeloid marker (usually CD13 or CD33) and one of the aberrant markers listed in Table 1. The antibody–fluorochrome combinations were centrally purchased, titrated and diluted to the user concentration, and subsequently distributed to all centers.
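As an illustration only, the LAP backbone described above can be written down as a small data structure. This is a minimal sketch, not the study's software: the class name LapDefinition and its methods are hypothetical, the aberrant markers of Table 1 are not reproduced, and the myeloid check covers only the two markers the text names as usual.

```python
from dataclasses import dataclass

# Marker sets taken from the text; Table 1 itself is not reproduced here.
PRIMITIVE_MARKERS = {"CD34", "CD133", "CD117"}
USUAL_MYELOID_MARKERS = {"CD13", "CD33"}  # "usually" CD13 or CD33


@dataclass(frozen=True)
class LapDefinition:
    """Hypothetical record of one four-color LAP: CD45 plus three markers."""
    primitive: str
    myeloid: str
    aberrant: str  # one of the aberrant markers listed in Table 1

    def markers(self):
        """Return the four markers of the tube, CD45 first."""
        return ("CD45", self.primitive, self.myeloid, self.aberrant)

    def problems(self):
        """Return reasons why this combination departs from the backbone."""
        issues = []
        if self.primitive not in PRIMITIVE_MARKERS:
            issues.append(f"{self.primitive} is not CD34, CD133 or CD117")
        if self.myeloid not in USUAL_MYELOID_MARKERS:
            issues.append(f"{self.myeloid} is not CD13 or CD33")
        if len(set(self.markers())) != 4:
            issues.append("a marker is used more than once")
        return issues


# CD7 is used as the aberrant marker purely as an example.
lap = LapDefinition(primitive="CD34", myeloid="CD13", aberrant="CD7")
print(lap.markers(), lap.problems())
```

A non-empty list from problems() only flags a departure from the usual backbone; it does not by itself mean that the LAP is invalid.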

Table 1 Standard immunophenotypic panel

LAPs were established in newly diagnosed AML in a two-step approach. In the first step, the panel was used to define the immunophenotype of the blasts in whole BM or peripheral blood at diagnosis. In this so-called first run, aberrant expression patterns of individual markers were defined, and, from these, relevant markers for the putative LAPs for MRD assessment (one LAP per tube) were defined. As these markers may originate from different tubes, the finally chosen combinations had to be tested in a so-called second run. Because of the large number of LAPs, central distribution of conjugate combinations for the second run was not feasible.

For all relevant monoclonal antibodies, the experience of all centers (advantages, disadvantages and pitfalls) was evaluated in regular meetings of the study group. The final list of antibody–fluorochrome combinations for use in the second run is shown on the website http://www.vumc.nl/afdelingen/hematologie/behandelaars/onderzoek/ under ‘clones of monoclonal antibodies’.

Data exchange and data interpretation

When a patient was included in the study in one of the five institutes, the procedure of data exchange was as follows: the LMD of the first run were uploaded by that institute to a website (termed ‘X-drive’), with accessibility restricted to the working group members (Figure 1). Using this site, each of the five institutes analyzed these first run LMD files of that particular patient and, independent of each other, defined putative LAPs. These were communicated to the coordinator. From all putative LAPs, the coordinator, according to its experience, defined so-called ‘proposed LAPs’, fulfilling as many essential criteria as possible. In the quarterly group meetings, these ‘proposed LAPs’, upon consensus, were termed ‘consensus LAPs’. On some occasions, the consensus LAPs deviated from the proposed LAPs. Also, all other putative LAPs were discussed, with performance evaluated for all individual centers.

Statistics

Student’s t-test was used to evaluate the significance of improvements in performance between the learning and test phases of the study.
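For readers unfamiliar with the test, the statistic can be sketched as follows. The text does not state whether a paired or an unpaired variant was used; the sketch below assumes the classic unpaired, pooled-variance form, and the input numbers are invented for illustration and are not study data.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes independent samples with
    equal variances; this choice of variant is an assumption.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical per-center percentages of missed LAPs (NOT the study data).
learning = [60.0, 50.0, 45.0, 40.0]
test = [40.0, 35.0, 30.0, 28.0]
t_stat, df = student_t(learning, test)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The statistic would then be compared with the t distribution for the stated degrees of freedom to obtain a P-value.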

Results

Standard immunophenotypic panel (first run)

After the learning phase, tube 6 was modified: TdT was replaced by CD36, as TdT had not contributed to suitable independent LAPs, did not function appropriately as an exclusion marker for M4 and M5 AML types, and, as an intracellular marker, hampered rapid application of the panel.

In the next five sections, the performance of the panel (number and nature of LAPs detected and the sensitivity, specificity and stability of consensus LAPs) is described. This part covers the final results after the consensus meetings. Finally, performance of individual centers in both phases is described.

Number and nature of consensus LAPs

In the learning phase, 55 consensus LAPs were found in 28 of the 29 patients (Tables 2a and b, third column). No LAPs were found in one patient (3.5%). In the test phase, 88 consensus LAPs were found in 35 patients (Tables 2a and b, fourth column); LAPs were found in all of these patients. For details on numbers of LAPs per patient per study phase, see legends of Table 2. Taken together, one or more LAPs could be identified in almost all AML patients (63/64: 98.4%).

Table 2 Consensus LAPs in terms of sensitivity and specificity

Table 3 shows a detailed overview of consensus LAPs, divided into the well-known four major groups. Important are LAPs that were chosen most frequently as the best (that is, most sensitive and most specific, and stable, as outlined in the next paragraphs). Table 3 shows the leading aberrancies in the first column, with, in the fourth column (printed in boldface), the percentage of patients for whom these aberrancies were chosen as best consensus LAPs. These included cross-lineage expression with CD7 as the leading marker (36% of the patients), asynchronous expression of CD34 and CD133 (20%), asynchronous expression of CD15 combined with a progenitor marker (28%) and absence of HLA-DR (20%) or CD33 (11%) in combination with a progenitor marker. All other aberrancies were considered suitable in <10% of the patients.

Table 3 Summary of consensus LAPs

In 60 of the 64 patients, at least one primitive marker (CD34, CD117 or CD133; more details about distribution in legends of Table 3) was present in the consensus LAPs. The remaining four cases had LAPs without a primitive marker (three patients), or no LAP at all (one patient).

Sensitivity of the consensus LAPs

LAPs were categorized into the following three groups based on the percentage of LAP-covered leukemic cells (Table 2): (1) >50% (sensitivity category ‘high’); (2) 20–50% (‘intermediate’); and (3) 10–20% (‘low’). LAPs present on <10% of blast cells were not considered suitable for MRD monitoring.
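The three sensitivity categories can be expressed as a simple lookup. The sketch below is illustrative; the function name is hypothetical, and the handling of values that fall exactly on a boundary (10, 20 or 50%) is an assumption, because the text gives only the ranges.

```python
def sensitivity_category(blast_coverage_pct):
    """Map the % of diagnostic blasts carrying the LAP to a category.

    Boundary handling at exactly 10, 20 and 50% is an assumption; the
    text only gives the ranges >50%, 20-50%, 10-20% and <10%.
    """
    if not 0 <= blast_coverage_pct <= 100:
        raise ValueError("coverage must be a percentage between 0 and 100")
    if blast_coverage_pct > 50:
        return "high"
    if blast_coverage_pct >= 20:
        return "intermediate"
    if blast_coverage_pct >= 10:
        return "low"
    return "unsuitable"  # <10%: not considered suitable for MRD monitoring


for pct in (75, 35, 15, 5):
    print(pct, sensitivity_category(pct))
```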

Table 2a shows for all consensus LAPs that the best sensitivity category (‘high’) contained 64% (51+13%) and 50% (42+8%) of the LAPs in the learning phase and test phase, respectively. Taking the high and intermediate categories together, these figures were 92% and 90%, respectively.

On performing the same analysis for the best consensus LAP (total number is 63, similar to the number of patients with LAP), it was seen that the high category contained in the learning phase 71% of the LAPs and in the test phase 60% of the LAPs. Taking the high and intermediate categories together, these figures were 93% and 97%, respectively (Table 2b).

Specificity of the consensus LAPs

The specificity of a LAP depends on the expression of the aberrant immunophenotype on normal myeloid progenitor and/or mature cells. All participants evaluated the background of the consensus LAPs in normal BM. The median expression on normal myeloid cells usually was <0.05% (% of WBC), with maximum values generally <0.1% (Figure 2). This background resulted mainly from aberrant expression on the immature population: LAPs with an immature marker, which represent the vast majority of all consensus LAPs (87%) and of all best consensus LAPs (95%), had up to 10% background aberrancies on the immature population (Figure 2b). CD36 on WBC and on primitive cells was discovered during the study to have higher percentages (Figures 2a and b) and thereafter excluded from further analysis. In LAPs without primitive markers (only 13% of all consensus LAPs and 5% of best consensus LAPs), median LAP expression on normal BM WBC was usually <0.1% (data not shown).

The specificity of a consensus LAP was now categorized as (1) ‘high’ (<0.1% background) or (2) ‘low’ (⩾0.1% background). The two specificity categories are presented in Table 2a and distributed over the three sensitivity categories. The high-specificity category contained in the learning phase 82% (51+24+7%) and in the test phase 82% (42+33+7%) of all consensus LAPs.

On performing the same analysis for the best consensus LAP (total number 63), it was seen that the highest specificity category represented 86% of the patients in the learning phase and 88% in the test phase (Table 2b).

Combined analysis of sensitivity and specificity

Ultimately, the quality of a consensus LAP has to be judged by the combined evaluation of sensitivity and specificity. The best LAP possesses the highest sensitivity (>50% expression) and the best specificity (<0.1% background). The order of quality of other LAP categories is presented in the legends of Table 2. Assuming that the high- and intermediate-sensitivity categories, together with high specificity, will offer the best consensus LAPs (63 in total), Table 2b shows that in 82% (64+18%) of cases in the learning phase and in 85% (51+34%) in the test phase, LAPs met this demand (further details are in Supplementary Files). It can thus be concluded that, in the majority of the 63 evaluable patients, relatively high-quality (>20% expression and <0.1% background reactivity) LAPs can be designed.
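The combined criterion can be sketched as a pair of small checks. The function names are hypothetical, and treating exactly 20% coverage as intermediate (and therefore acceptable) is an assumption. The full ranking of the remaining categories is given in the legends of Table 2 and is not reproduced here.

```python
def specificity_category(background_pct_wbc):
    """'high' if background in normal BM is <0.1% of WBC, otherwise 'low'."""
    if background_pct_wbc < 0:
        raise ValueError("background must be a non-negative percentage")
    return "high" if background_pct_wbc < 0.1 else "low"


def is_relatively_high_quality(blast_coverage_pct, background_pct_wbc):
    """High or intermediate sensitivity combined with high specificity.

    Coverage of exactly 20% is counted as intermediate (an assumption).
    """
    return (blast_coverage_pct >= 20
            and specificity_category(background_pct_wbc) == "high")


# Illustrative values only, not taken from the study tables.
print(is_relatively_high_quality(60, 0.05))   # high coverage, low background
print(is_relatively_high_quality(60, 0.10))   # background too high
print(is_relatively_high_quality(15, 0.01))   # coverage too low
```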

Stability of LAPs

Scoring for a good LAP based on stability did not include a learning and test phase, but it was performed using a list of expression of markers in paired diagnosis–relapse samples. We have performed an extensive comparison between diagnosis and relapse in 33 AML patients (18 different antigens in 125 diagnosis–relapse comparisons; Supplementary Table 1). In general, LAPs are relatively stable in most cases with a notable exception for CD19. For details, see Supplementary Results and Supplementary Table 1.

Performance of individual centers

The data discussed above constitute the final outcome of all consensus meetings. Next, we describe the performance of individual centers.

Missed LAPs

In the learning phase, centers 2–5 missed 39–63% of the consensus LAPs, whereas the coordinator (no. 1) missed 11% (Table 4). In the test phase, these percentages dropped to 27–40% and 3%, respectively (Table 4). Thus, the improvement was 12–33% for centers 2–5 and 8% for the coordinator. Supplementary Figure 2 shows a detailed overview of the numbers of missed LAPs per institute per phase.
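The percentages above follow from a simple set comparison between the consensus LAPs and the LAPs a center reported. The sketch below assumes that LAPs are identified by text labels (hypothetical here); note that the improvement is an absolute difference in percentage points, as in the coordinator's 11% versus 3%.

```python
def percent_missed(consensus_laps, reported_laps):
    """Percentage of consensus LAPs that a center did not report."""
    consensus = set(consensus_laps)
    if not consensus:
        raise ValueError("no consensus LAPs to compare against")
    missed = consensus - set(reported_laps)
    return 100.0 * len(missed) / len(consensus)


def improvement(learning_pct, test_pct):
    """Absolute drop in missed-LAP percentage, in percentage points."""
    return learning_pct - test_pct


# Hypothetical labels: four consensus LAPs, one of which was reported.
print(percent_missed({"A", "B", "C", "D"}, {"A"}))
# Coordinator figures quoted in the text: 11% (learning) and 3% (test).
print(improvement(11, 3))
```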

Table 4 Percentages of missed LAPs by the different centers in learning and test phase

Number of patients with no second LAP defined; comparison between the institutes

We then studied how many patients would not have been eligible for follow-up evaluation because of the missing of all LAPs present. For a patient to be eligible for MRD evaluation, at least one consensus LAP should have been defined. The answer, based on the data in Supplementary Figure 2, has been summarized in Figure 3a: in the learning phase, the percentages of patients apparently ineligible for MRD detection ranged from 21 to 33% for centers 2–5, whereas the coordinator identified at least one LAP in all. In the test phase, results clearly improved: the percentages of patients without any identified LAP ranged from 7 to 18%, whereas the coordinator had no misses.

The best policy to avoid missing MRD due to immunophenotypic shifts may be to define, whenever possible, at least two different LAPs. The performance of all centers to define a second LAP in patients where such LAP had indeed been formulated at the consensus meetings is shown in Figure 3b. In the learning phase, such a second LAP was missed in 8–29% of patients by centers 2–5 and in 10% of patients by the coordinator. These percentages were slightly lower in the test phase: a second LAP was missed in 9–20% of patients by centers 2–5 and in 6% by the coordinator. Thus, in addition to patients in which all LAPs had been missed (Figure 3a), a second LAP may be missed in some instances, where a first LAP had been successfully identified (Figure 3b).
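The two situations distinguished here (all LAPs missed versus only a second LAP missed) can be sketched per patient as follows. The LAP labels and the three example patients are hypothetical, and LAPs that a center reported but that are not consensus LAPs are simply ignored in this sketch.

```python
from collections import Counter


def classify_patient(consensus_laps, reported_laps):
    """Compare one center's LAPs for a patient with the consensus LAPs."""
    consensus = set(consensus_laps)
    found = consensus & set(reported_laps)
    if not consensus:
        return "no consensus LAP"
    if not found:
        return "all LAPs missed"      # patient not evaluable for MRD
    if len(consensus) >= 2 and len(found) == 1:
        return "second LAP missed"    # evaluable, but no back-up LAP
    return "adequate"


def summarize(patients):
    """Count outcomes over (consensus_laps, reported_laps) pairs."""
    return Counter(classify_patient(c, r) for c, r in patients)


# Hypothetical LAP labels for three patients at one center (not study data).
example = [
    ({"CD34+CD7+", "CD34+HLA-DR-"}, {"CD34+CD7+"}),
    ({"CD133+CD34-"}, set()),
    ({"CD34+CD7+"}, {"CD34+CD7+"}),
]
counts = summarize(example)
print(dict(counts))
```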

Types of missed LAPs

In a next analysis, we investigated which types of LAPs are missed. LAPs missed by more than one center in relatively high percentages were those with asynchronous antigen expression (31–48%), with the exception of LAPs with an absence of marker(s) (for example, CD133⁺CD34⁻, CD15⁻HLA-DR⁻, CD33⁻ and HLA-DR⁻; see Table 3), LAPs with marker overexpression (25–88%) and LAPs without immature markers (14–50%) (Supplementary Table 2). The identification of LAPs with cross-lineage marker expression was generally most successful (7–32% misses). Performance of the individual LAPs is shown in Supplementary Table 3. Although, upon this further subdivision, numbers become too low to reliably evaluate, institutes were observed to perform well for the cross-lineage LAP with CD7, and also for LAPs with aberrant absence of marker expression (see above).

Irrespective of the type of LAP, it may be argued that the sensitivity and specificity category may also affect the success of identification of a LAP. Figure 4 shows the results for sensitivity only, because the specificity categories harbored too low numbers for reliable conclusions. The main conclusion (details in Supplementary Results) is that the centers performed much better for high-sensitivity category compared with intermediate and low-sensitivity category. However, substantial improvement was seen from the learning phase to the test phase, ending with almost similar performance in the high and intermediate category in the test phase.

Incorrectly assigned LAPs

Although missed LAPs may prevent evaluation of a patient’s MRD, incorrect assignment of a LAP may result in unreliable assessments of MRD cell frequency. Incorrectly designed LAPs mainly resulted from underestimation of background LAP expression in normal BM. In the learning phase, centers performed experiments on background staining in normal BM and evaluated specificity. Specificity of LAPs was an important issue at the consensus meetings. Incorrectly designed LAPs were only scored as such during the test phase and are shown (per sample and per institute) in Supplementary Figure 2 (green symbols). The percentages of incorrectly designed LAPs were low in institutes 1 and 2, but considerably higher in institutes 3 and 4. The number of LAPs was too low for reliable conclusions for institute 5. Such incorrect designs included the omission of exclusion markers, incorrect interpretation of levels of expression and definition of incorrect LAPs (for detailed information, see Supplementary Results). No specific type of incorrectly designed LAPs, common to all participants, could be defined, except for the fact that, for LAPs defined by the absence of expression of markers such as HLA-DR, inclusion of an exclusion marker is recommended.

Discussion

The current study was undertaken to establish criteria necessary to enable assessment of MRD in clinical studies in a multicenter setting. Such studies should establish whether or not MRD has the prognostic value reported by several single-institute retrospective studies, and at what time points and for which specific patient groups MRD assessment can be a part of clinical decision making. Such an approach has meanwhile been applied in childhood AML, although in a single-center setting.¹²

When defining criteria to be fulfilled for a multicenter approach, we defined two distinct studies: first, definition of LAPs at diagnosis and, second, the use of such LAPs to detect and quantify MRD during follow-up. The present study deals with the diagnostic part. Altogether, the final success rate of defining at least one consensus LAP was high (only 1 out of 64 samples had no LAPs at all), whereas in two-thirds of the samples two or more independent consensus LAPs could be defined. However, this result was achieved after multiple analyses and extensive group discussions and was based for a large part on the experience of the coordinating center. The difficulty that relatively inexperienced centers have in defining LAPs is evidenced by the high percentages of LAPs missed by centers 2–5, even in the second, testing phase. Some cross-lineage aberrancies, as well as those asynchronous aberrancies that are characterized by complete lack of a marker, were apparently the easiest to define. Other asynchronous aberrancies, antigen overexpression and aberrancies on mature cells were more difficult to establish. In addition, in the initial learning phase, the success rate decreased with decreasing sensitivity (that is, with decreasing blast coverage percentage of LAPs).

With time, a learning effect, evidenced by the success rate of defining LAPs, was seen. This result was partly because of the thorough discussions at the group meetings. Ultimately, in the test phase this resulted in a success rate (that is, at least one LAP defined) of 82–93% for the relatively inexperienced institutes 2–5 in the 35 evaluable samples. Improvement was especially evident in the intermediate-sensitivity category; in the test phase, LAPs were identified with equal success rate in the high-sensitivity and the intermediate-sensitivity category. However, the overall results imply that, because of missed LAPs, 7–18% of the patients would not have been evaluable for MRD according to these centers.

Furthermore, in another 9–20% of cases, a center defined only a single LAP, whereas at least two LAPs were finally defined in the consensus meetings. These patients would be more likely to become false negative for MRD if immunophenotypic shifts would have occurred. It has been reported by others and by us^{5, 11, 16} that such shifts occur frequently, although it has not always led to the disappearance of markers. In case only one aberrant marker is present, it is advised to design a second staining in which the nonaberrant markers that are included in the LAP are replaced by other, nonaberrant, markers. In this way, the risk of loss of these normal markers, which could also result in false negativity, is minimized. For example, a ‘primitive’ marker, CD133, may occasionally become negative; replacement by, for example, CD34 would then be adequate. This condition was met in the current study because in 18% of cases at least two primitive markers among CD34, CD133 and CD117 were present. In most cases, it is possible to replace a myeloid marker by another. As the expression of CD33, CD13 and CD117 is quite stable, the design of the same LAPs with alternative myeloid markers is not urgent.

Our group had agreed that LAPs should cover at least 10% of the blast cells at diagnosis: first, because ‘background’ levels in normal BM may amount to this percentage and, second, because lower frequencies will significantly hamper the sensitivity necessary for MRD detection.

Another important factor for the quality of a LAP is specificity, which is determined by LAP expression on control normal or regenerating BM. Cell populations with the immunophenotype of LAPs are present in frequencies varying from 0.001 to 0.1% in control BM. In practice, the contribution of these ‘LAP-like’ events to specific AML-derived, LAP+ cells will have to be defined in the setting of MRD. This interaction will strongly depend on the level and intensity of expression of the aberrant markers on the normal versus AML cell populations. AML cells at diagnosis may occupy the so-called ‘empty spaces’—that is, positions of antigen expression absent in normal cells; however, because of even moderate immunophenotypic changes during or after therapy, AML-specific LAP+ events may interfere with normal LAP+ events. Even if stringent precautions have been taken to standardize antibody–fluorochrome combinations, as well as instrumental setup and calibration over time, there is no guarantee that the AML LAP+ population will not be ‘contaminated’ by normal events with ‘LAP-like’ immunophenotypes. The impact of ‘background LAP’ on the reliable detection of specific (AML) LAP+ events is considerable: up to 0.1% of WBC and up to 10% of primitive marker compartment is possible; as a result, specific detection of MRD <0.1% is compromised (Figure 2). On the basis of its very high background levels, at the end of the learning phase, CD36 was concluded not to represent a valuable marker for MRD.

The type of LAP and their distribution over patient groups differ within the literature. In the present study, the frequency of LAPs with asynchronous expression was twice as high as for cross-lineage aberrancies, which is in agreement with some other adult studies performed with these two types of major aberrancies.¹⁴ However, in two previous studies on adult AML⁵ and childhood AML¹¹ in which we were involved, the frequencies of both LAP types were more or less equal.

Our study also indicated that MRD studies may be performed using different instruments: in our case, four Becton Dickinson and one Beckman Coulter (Miami, FL, USA) flow cytometer were used. This is of importance in view of international clinical studies in which MRD assessment may be dealt with by (core) centers that may use different brands of flow cytometers.

The large differences in specificity, as well as in LAP coverage on blasts, represent a problem when cutoff levels have to be defined for clinical decision making: although in cases with high specificity and high sensitivity MRD levels of 0.01% (that is, 1:10 000) can be accurately measured, in other cases a level of only 0.1%, or an even less sensitive one, may be the best attainable sensitivity. Improvements can be expected from the use of more colors, as shown by Voskova et al.¹³ in the transition from four to five colors. The use of at least eight-color flow cytometry, now feasible in many institutes, will further improve flow-cytometric MRD analysis. The use of novel software tools (for example, Infinicyt (Cytognos, Salamanca, Spain)) for defining LAPs in an objective manner may further improve flow-cytometric MRD analysis.
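The arithmetic behind these cutoff levels can be sketched in a few lines. Both helper names are hypothetical, and the comparison of a target MRD level with the LAP background in normal BM is a deliberate simplification of the specificity argument made earlier in the Discussion, not a validated decision rule.

```python
def percent_to_ratio(percent):
    """Express a frequency given in % as '1 in N' and return N."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return round(100 / percent)


def detection_compromised(target_mrd_pct, background_pct):
    """True if the target MRD level lies below the LAP background.

    Simplified reading: MRD below the background of 'LAP-like' normal
    cells cannot be specifically detected.
    """
    return target_mrd_pct < background_pct


print(percent_to_ratio(0.01))             # 0.01% corresponds to 1:10 000
print(detection_compromised(0.01, 0.1))   # target below a 0.1% background
```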

In conclusion, our study has shown that immunophenotypic MRD assessment is a complex process that requires specific experience that is not covered by a large experience in only standard diagnostic immunophenotyping. The quality of MRD assessment can only be guaranteed after a substantial period of training under the guidance of highly experienced center(s). Therefore, we recommend that, currently, the number of centers that participate in large-scale multicenter MRD studies be restricted to a few core centers with MRD experience.

References

1. Béné MC, Kaeda JS. How and why minimal residual disease studies are necessary in leukemia: a review from WP10 and WP12 of the European LeukaemiaNet. Haematologica 2009; 94: 1135–1150.
2. San Miguel JF, Vidriales MB, López-Berges C, Díaz-Mediavilla J, Gutierrez N, Canizo C et al. Early immunophenotypical evaluation of minimal residual disease in acute myeloid leukemia identifies different patient risk groups and may contribute to postinduction treatment stratification. Blood 2001; 98: 1746–1751.
3. San Miguel JF, Martínez A, Macedo A, Vidriales MB, López-Berges C, González M et al. Immunophenotyping investigation of minimal residual disease is a useful approach for predicting relapse in acute myeloid leukemia patients. Blood 1997; 90: 2465–2470.
4. Venditti A, Buccisano F, Del Poeta G, Maurillo L, Tamburini A, Cox C et al. Level of minimal residual disease after consolidation therapy predicts outcome in acute myeloid leukemia. Blood 2000; 96: 3948–3952.
5. Feller N, van der Pol MA, van Stijn A, Weijers GW, Westra AH, Evertse BW et al. MRD parameters using immunophenotypic detection methods are highly reliable in predicting survival in acute myeloid leukaemia. Leukemia 2004; 18: 1380–1390.
6. Kern W, Voskova D, Schoch C, Hiddemann W, Schnittger S, Haferlach T. Determination of relapse risk based on assessment of minimal residual disease during complete remission by multiparameter flow cytometry in unselected patients with acute myeloid leukemia. Blood 2004; 104: 3078–3085.
7. Sievers EL, Lange BJ, Alonzo TA, Gerbing RB, Bernstein ID, Smith FO et al. Immunophenotypic evidence of leukemia after induction therapy predicts relapse: results from a prospective Children’s Cancer Group study of 252 patients with acute myeloid leukemia. Blood 2003; 101: 3398–3406.
8. Campana D, Coustan-Smith E. Detection of minimal residual disease in acute leukemia by flow cytometry. Cytometry 1999; 38: 139–152.
9. Langebrake C, Creutzig U, Dworzak M, Hrusak O, Mejstrikova E, Briesinger F et al. Residual disease monitoring in childhood acute myeloid leukemia by multiparameter flow cytometry: the MRD-AML-BFM Study Group. J Clin Oncol 2006; 24: 3686–3692.
10. Buccisano F, Maurillo L, Del Principe MI, Del Poeta G, Sconocchia G, Lo-Coco F et al. Prognostic and therapeutic implications of minimal residual disease detection in acute myeloid leukemia. Blood 2012; 119: 332–341.
11. van der Velden VH, van der Sluijs-Geling A, Gibson BE, te Marvelde JG, Hoogeveen PG, Hop WC et al. Clinical significance of flowcytometric minimal residual disease detection in pediatric acute myeloid leukemia patients treated according to the DCOG ANLL97/MRC AML12 protocol. Leukemia 2010; 24: 1599–1606.
12. Rubnitz JE, Inaba H, Dahl G, Ribeiro RC, Bowman WP, Taub J et al. Minimal residual disease-directed therapy for childhood acute myeloid leukaemia: results of the AML02 multicentre trial. Lancet Oncol 2010; 11: 543–552.
13. Voskova D, Schnittger S, Schoch C, Haferlach T, Kern W. Use of five-color staining improves the sensitivity of multiparameter flow cytometric assessment of minimal residual disease in patients with acute myeloid leukemia. Leuk Lymphoma 2007; 48: 80–88.
14. Al-Mawali A, Gillis D, Lewis I. The role of multiparameter flow cytometry for detection of minimal residual disease in acute myeloid leukemia. Am J Clin Pathol 2009; 131: 16–26.
15. Macedo A, San Miguel JF, Vidriales MB, López-Berges MC, García-Marcos MA, Gonzalez M et al. Phenotypic changes in acute myeloid leukaemia: implications in the detection of minimal residual disease. J Clin Pathol 1996; 49: 15–18.
16. Baer MR, Stewart CC, Dodge RK, Leget G, Sulé N, Mrózek K et al. High frequency of immunophenotype changes in acute myeloid leukemia at relapse: implications for residual disease detection (Cancer and Leukemia Group B Study 8361). Blood 2001; 97: 3574–3580.
17. Van Wering ER, van Lochem EG, Leenders M, van der Sluijs-Gelling AJ, Wind H, Gratama JW et al. Three-color flowcytometric analysis of mature and immature hematological malignancies. A guideline of the Dutch Foundation for Immunophenotyping of Hematological Malignancies (SIHON). Biol Regul Homeost Agents 2004; 18: 313–326.
18. Levering WH, Preijers FW, van Wieringen WN, Kraan J, van Beers WA, Sintnicolaas K et al. Flow cytometric CD34+ stem cell enumeration: lessons from nine years external quality assessment within the Benelux countries. Cytometry B Clin Cytom 2007; 72: 178–188.
19. Feller N, Schuurhuis GJ, van der Pol MA, Westra G, Weijers GW, van Stijn A et al. High percentage of CD34-positive cells in autologous AML peripheral blood stem cell products reflects inadequate in vivo purging and low chemotherapeutic toxicity in a subgroup of patients with poor clinical outcome. Leukemia 2003; 17: 68–75.

Acknowledgements

The European Society for Clinical Cell Analysis (ESCCA) supported the study with a grant from BD Biosciences (CA, USA) for the screening panel of monoclonal antibodies for the FACSCalibur users. We thank Ellen van Lochem and Jaco Kraan for their contributions during the initial phase of the study and Jennita Slomp (Medisch Spectrum Twente, The Netherlands) for critically reading the manuscript.

Author information

Authors and Affiliations

Department of Hematology, VU University Medical Center, Amsterdam, The Netherlands
N Feller, A Kelder, G Westra & G J Schuurhuis
Department of Immunology, Erasmus MC, Erasmus University Medical Center, Rotterdam, The Netherlands
V H J van der Velden, J G te Marvelde & H Wind
Department of Medical Oncology, Laboratory of Clinical Tumor Immunology, Erasmus MC Daniel den Hoed Cancer Center, Rotterdam, The Netherlands
R A Brooimans, P Aerts & J W Gratama
Department of Laboratory Medicine, UZ Leuven, Leuven, Belgium
N Boeckx
Department of Oncology, KUL, Leuven, Belgium
N Boeckx
Department of Laboratory Medicine, Laboratory for Hematology, Radboud University Nijmegen Medical Center (RUNMC), Nijmegen, The Netherlands
F Preijers & M Leenders
Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
I de Greef

Corresponding author

Correspondence to G J Schuurhuis.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Supplementary Information accompanies this paper on the Blood Cancer Journal website.

Supplementary information

Supplementary Figure 1 (PDF 7 kb)

Supplementary Figure 2 (PDF 38 kb)

Supplementary Table 1 (PDF 64 kb)

Supplementary Table 2 (PDF 22 kb)

Supplementary Table 3 (PDF 29 kb)

Supplementary Information (DOC 23 kb)

Supplementary Information (DOC 38 kb)

Rights and permissions

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

About this article

Cite this article

Feller, N., van der Velden, V., Brooimans, R. et al. Defining consensus leukemia-associated immunophenotypes for detection of minimal residual disease in acute myeloid leukemia in a multicenter setting. Blood Cancer Journal 3, e129 (2013). https://doi.org/10.1038/bcj.2013.27

Received: 18 June 2013
Accepted: 25 June 2013
Published: 02 August 2013
Issue Date: August 2013
DOI: https://doi.org/10.1038/bcj.2013.27
