Background
Non-small cell lung carcinoma (NSCLC) has been categorized into several distinct entities by molecular characterization of the genetic alterations that occur during epithelial cell transformation. These alterations arise through point mutations, small deletions or insertions and, more rarely, amplifications, and lead mainly to the activation of oncogenes such as EGFR, KRAS, NRAS, BRAF, or ERBB2 [1–3]. Several other key drivers have been implicated in lung carcinogenesis through other mechanisms. Indeed, chromosomal rearrangements involving the receptor tyrosine kinase genes ALK [4], ROS1 [5], RET [6–8], and NTRK1 [9] have been described more recently, extending the repertoire of molecular alterations found in NSCLC. These fusion events, which involve a variety of partner genes, produce chimeric fusion kinases capable of oncogenic transformation and of inducing oncogene dependency in the neoplastic cells. Individually, each of these chromosomal rearrangements has a prevalence of 1–7% in NSCLC [4, 6, 10, 11]; altogether, they can be identified in approximately 5–9% of NSCLC cases [7, 12, 13].
The development of drugs that specifically target the fusion proteins encoded by these rearrangements [9, 11, 14] has driven the need for systematic, sensitive assays to detect them. Lung cancer fusions have traditionally been detected using FISH, IHC, or RT-PCR. While FISH is considered the gold standard, especially for ALK testing because of the availability of an FDA-approved ALK FISH assay, FISH analysis of multiple targets per sample can be costly. The massively parallel nature of next generation sequencing (NGS) allows rapid characterization of point mutations and small insertions and deletions. Additionally, NGS can detect chromosome rearrangements across a large set of genes, either by targeted sequencing of the fusion junctions or by paired-end mapping methods. In this study, we validated a new library kit, the Ion AmpliSeq™ RNA Fusion Lung Cancer Research Panel, for NGS-based characterization of the most frequent chromosome rearrangements in lung adenocarcinoma. This library kit relies on highly multiplexed PCR and targets 72 different fusion transcripts. We report the sensitivity and specificity of this assay for the detection of gene fusions implicated in NSCLC.
Methods
Samples
A total of 138 clinical research samples previously tested for ALK, ROS1, and/or RET rearrangements were collected from 10 participating laboratories. Each clinical research sample was studied in its laboratory of origin. All samples were formalin-fixed, paraffin-embedded (FFPE) resections or biopsies, with the exception of three fresh frozen samples (one resection and two pleural effusions). These included 128 samples previously tested for ALK rearrangements by fluorescence in situ hybridization (FISH). Sixty-five of these samples had also been tested for ALK rearrangements by another method: immunohistochemistry (IHC), reverse transcription (RT)-PCR, and/or mass spectrometry (performed on the MassARRAY System from Agena Bioscience, San Diego, CA). The ALK-tested samples were categorized as positive, negative, or inconclusive according to the FISH results, as this methodology is considered the gold standard for ALK testing. For samples previously tested by multiple methods, any discrepancies between the methodologies were noted. Thirteen of the ALK samples had also been tested for ROS1 and/or RET rearrangements. An additional 10 clinical research samples previously tested for ROS1 and/or RET, but for which ALK results were unavailable, were also included. The ROS1 and RET samples were categorized on the basis of results from any available method, including FISH, IHC, RT-PCR, and/or mass spectrometry, since there is no established gold standard for detecting these rearrangements.
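Because FISH defines the reference category for the ALK-tested samples, sensitivity and specificity of the index assay follow from paired calls. The following is a minimal sketch, assuming that samples with an inconclusive reference result are simply excluded; the function name `sensitivity_specificity` and the toy calls are hypothetical illustrations, not study data.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(pairs: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of an index assay against a reference.

    Each pair is (reference_call, index_positive), where reference_call is
    "positive", "negative" or "inconclusive". Inconclusive reference calls are
    skipped (an assumption of this sketch, not a rule stated in the study).
    """
    tp = fn = tn = fp = 0
    for reference, index_positive in pairs:
        if reference == "inconclusive":
            continue  # no reference category to compare against
        if reference == "positive":
            if index_positive:
                tp += 1
            else:
                fn += 1
        elif reference == "negative":
            if index_positive:
                fp += 1
            else:
                tn += 1
        else:
            raise ValueError(f"unknown reference call: {reference!r}")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical toy calls (not study data): 4 FISH-positive, 3 FISH-negative, 1 inconclusive.
toy = [
    ("positive", True), ("positive", True), ("positive", True), ("positive", False),
    ("negative", False), ("negative", False), ("negative", True),
    ("inconclusive", True),
]
sens, spec = sensitivity_specificity(toy)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.75 specificity=0.67
```

For the ROS1 and RET samples, the same calculation would apply with the composite reference described above in place of FISH.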
RNA was extracted from each of the clinical research samples by the participating laboratories using their respective standard extraction procedures. Six of the ten laboratories used the RecoverAll Total Nucleic Acid Isolation Kit for FFPE (Thermo Fisher Scientific, Waltham, MA); the remaining laboratories used the Qiagen RNeasy FFPE Kit (Qiagen, Hilden, Germany), the Qiagen AllPrep DNA/RNA FFPE Kit, or the Maxwell LEV RNA FFPE Purification Kit (Promega, Madison, WI). Eight of the laboratories quantified RNA using Qubit RNA assay kits (Thermo Fisher); the Quant-iT RiboGreen RNA Assay Kit (Thermo Fisher) and the NanoDrop 2000 instrument (Thermo Scientific) were also used for quantification.
In addition to the clinical research samples, a cocktail of RNA isolated from the ALK fusion-positive H2228 (ATCC CRL-5935), ROS1 fusion-positive HCC-78 (DSMZ ACC 563), and RET fusion-positive LC-2/ad (ECACC LC-2/ad) cell lines was prepared by Thermo Fisher Scientific and supplied to each of the participating laboratories. Select laboratories also prepared and tested RNA isolated from FFPE versions of these cell lines and RNA isolated from the ALK fusion-positive cell line H3122 (ECACC NCI-H322) and the NTRK1 fusion-positive cell line KM-12.
Ion AmpliSeq RNA Fusion Lung Cancer Research Panel design
Primers spanning 72 fusions (37 ALK, 9 RET, 15 ROS1, and 11 NTRK1) were designed by a research team at Thermo Fisher. The primers were designed to span all ALK, ROS1, RET, and NTRK1 fusions that had been described in lung cancer at the time of development. Sources used to curate known fusions included the COSMIC and NCBI databases as well as a review of the current medical literature. Targeted fusion genes are shown in Table 1. The multiplex primer mix also included primers for the amplification of five housekeeping genes: HMBS, ITGB7, LMNA, MYC, and TBP.
Table 1
Targeted Partners for ALK, RET, ROS1, and NTRK1
| ALK | RET | ROS1 | NTRK1 |
|---|---|---|---|
| EML4 | KIF5B | CD74 | CEL |
| KIF5B | CCDC6 | SDC4 | NFASC |
| KLC1 | CUX1 | SLC34A2 | IRF2BP2 |
| HIP1 | | EZR | TFG |
| TPR | | TPM3 | SQSTM1 |
| | | LRIG3 | SSBP2 |
| | | GOPC | CD74 |
| | | | DYNC2H1 |
| | | | MPRIP |
Additionally, primers designed to amplify 5′ and 3′ regions of ALK, ROS1, RET, and NTRK1 were included in the primer mix. Amplifying both regions of each gene of interest allows the expression level of the 3′ end of the gene, which is retained in the resulting fusion, to be compared with that of the 5′ end, which is not involved in the fusion. A list of all targets in the multiplex PCR, including targeted fusions (genes and exons), expression control genes, and 3′ and 5′ regions, is available in Additional file 1: Table S1.
Detection of fusions
A minimum of 10 ng of total RNA was reverse transcribed using the SuperScript VILO cDNA Synthesis Kit, followed by library generation using the Ion AmpliSeq Library Kit 2.0 and the Ion AmpliSeq RNA Fusion Lung Cancer Research Panel (hereafter, AmpliSeq Fusion Lung Panel). Libraries were barcoded during library generation using the Ion Xpress Barcode Adapters. Libraries were quantified using the Qubit DNA assay, the 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA), or the Ion Library Quantitation Kit, then pooled in equimolar amounts for sequencing. Eight to sixteen libraries were multiplexed and templated using the Ion OneTouch 2 System with the Ion PGM Template OT2 200 Kit. Libraries were sequenced using the Ion PGM Sequencing 200 v2 Kit on an Ion 316 v2 or 318 v2 chip on the Ion PGM instrument. (All reagents and instrumentation above are from Thermo Fisher Scientific, with the exception of the BioAnalyzer.) Typically, eight samples were sequenced per 316 chip and sixteen samples per 318 chip.
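Pooling in equimolar amounts means that every barcoded library contributes the same number of molecules to the sequencing pool. A minimal sketch of the volume calculation, assuming each library's molar concentration (in pM) is already known, for example from a qPCR-based quantitation; the function name `equimolar_volumes`, the target amount, and the concentrations are hypothetical.

```python
from typing import Dict


def equimolar_volumes(conc_pm: Dict[str, float], amol_per_library: float) -> Dict[str, float]:
    """Return the volume (uL) of each library that delivers the same molar amount.

    Because 1 pM x 1 uL = 1 amol, volume_uL = amol / concentration_pM.
    """
    volumes = {}
    for name, conc in conc_pm.items():
        if conc <= 0:
            raise ValueError(f"library {name} has a non-positive concentration")
        volumes[name] = amol_per_library / conc
    return volumes


# Hypothetical concentrations for four barcoded libraries (not study data).
libs = {"BC01": 500.0, "BC02": 250.0, "BC03": 1000.0, "BC04": 400.0}
vols = equimolar_volumes(libs, amol_per_library=500.0)
for name, vol in vols.items():
    print(f"{name}: {vol:.2f} uL")
# prints one line per library: BC01: 1.00 uL, BC02: 2.00 uL, BC03: 0.50 uL, BC04: 1.25 uL
```

Working in attomoles rather than mass keeps the calculation independent of amplicon length, which matters when libraries are quantified by mass (e.g. Qubit) and must first be converted to molarity.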
After sequencing, unaligned BAM files were transferred to Ion Reporter Software 4.2 and analyzed using the AmpliSeq Lung Fusion single-sample workflow. This workflow uses a BED file composed of chimeric sequences for the targeted fusion transcripts, along with sequences for the expression control genes and the 3′ and 5′ regions of ALK, ROS1, RET, and NTRK1. The alignment consists of three main steps. In the first step, the aligner requires that reads align end to end (i.e., reads that are trimmed, or soft clipped, at the ends are not allowed). Each read is assigned its best primary alignment, and filtering criteria are then applied: alignments to fusion targets are counted only if the read overlaps at least 70% of the expected fusion insert with a high local alignment score, whereas alignments to the imbalance and control targets are counted if the read overlaps at least 50%. In the second step, all unaligned reads, as well as reads that aligned but were filtered out, are split into two fragments, which are then re-aligned to the same reference file. Trimming of reads is allowed in this step, and all alignments of every read (not just the primary alignments) are kept in the alignment files. This step recovers additional counts for the targets in the reference file and also finds non-targeted fusion isoforms that are absent from the original target list. A novel fusion isoform involving existing primers is reported in the output if it is supported by at least 100 different pairs of fragments. Lastly, counts from steps one and two are aggregated, and all fusion targets with counts above the threshold are reported as “fusion present.” The algorithm also generates a 3′/5′ expression imbalance metric for each driver gene based on the individual counts of its 5′ and 3′ assays. This metric is calculated by subtracting the count of 5′ reads from the count of 3′ reads and dividing the result by the sum of counts of all control targets. It can be used to confirm the detection of a known fusion or to predict a fusion in the sample that is not covered by the isoforms in the panel.
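The counting, calling, and imbalance logic described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the Ion Reporter implementation: each alignment is reduced to a hypothetical `Alignment` record carrying the target name, target type, and fraction of the target insert covered by the read; target names such as `EML4-ALK.E13A20` and `ALK_3p` are hypothetical; and `min_reads` is a placeholder because the fusion-call threshold is not given here. The imbalance follows the definition in the text, imbalance = (N(3′) − N(5′)) / Σ N(control).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Minimum fraction of the expected insert a read must overlap, per target type
# (70% for fusions, 50% for imbalance and control targets, as described above).
MIN_OVERLAP = {"fusion": 0.70, "imbalance": 0.50, "control": 0.50}


@dataclass
class Alignment:
    target: str              # hypothetical target name, e.g. "ALK_3p"
    target_type: str         # "fusion", "imbalance" or "control"
    overlap: float           # fraction of the target insert covered by the read
    high_score: bool = True  # stands in for "high local alignment score"


def count_targets(alignments: Iterable[Alignment]) -> Counter:
    """Count alignments that pass the overlap filter (and, for fusions, the score filter)."""
    counts: Counter = Counter()
    for aln in alignments:
        if aln.overlap < MIN_OVERLAP[aln.target_type]:
            continue
        if aln.target_type == "fusion" and not aln.high_score:
            continue
        counts[aln.target] += 1
    return counts


def call_fusions(step1: Counter, step2: Counter, fusion_targets: Iterable[str],
                 min_reads: int) -> List[str]:
    """Aggregate counts from both alignment steps; report fusions above the threshold."""
    total = step1 + step2
    return [t for t in fusion_targets if total[t] > min_reads]


def imbalance(counts: Dict[str, int], gene: str, controls: Iterable[str]) -> float:
    """(3' count - 5' count) / sum of control-gene counts."""
    control_sum = sum(counts.get(c, 0) for c in controls)
    if control_sum == 0:
        raise ValueError("no control reads; imbalance is undefined")
    return (counts.get(f"{gene}_3p", 0) - counts.get(f"{gene}_5p", 0)) / control_sum


CONTROLS = ["HMBS", "ITGB7", "LMNA", "MYC", "TBP"]
# Hypothetical toy alignments (not study data).
step1 = count_targets([
    Alignment("EML4-ALK.E13A20", "fusion", 0.95),
    Alignment("EML4-ALK.E13A20", "fusion", 0.60),  # fails the 70% rule
    Alignment("ALK_3p", "imbalance", 0.80),
    Alignment("ALK_3p", "imbalance", 0.55),
    Alignment("ALK_5p", "imbalance", 0.40),        # fails the 50% rule
    Alignment("TBP", "control", 0.90),
    Alignment("MYC", "control", 0.90),
])
step2 = Counter({"EML4-ALK.E13A20": 2})  # counts recovered by split-read re-alignment
print(call_fusions(step1, step2, ["EML4-ALK.E13A20"], min_reads=2))  # ['EML4-ALK.E13A20']
print(imbalance(step1 + step2, "ALK", CONTROLS))  # 1.0
```

Normalizing by the control-gene total makes the metric comparable across samples with different sequencing depth; a value near zero is expected when the 3′ and 5′ regions are expressed at similar levels.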
Discussion
The advent of therapies targeting the fusion proteins arising from ALK, ROS1, and RET gene fusions makes the routine detection of these events important in patients with lung adenocarcinoma. We have described here an international, multi-institutional study using a multiplex RT-PCR, next generation sequencing-based method that enables simultaneous detection of ALK, RET, ROS1, and NTRK1 gene fusion transcripts in a single assay. The simultaneous detection of these fusions has important implications for turnaround time and cost. Further, it can be performed with very little input RNA. This is particularly attractive for an assay targeted at lung cancers, as these samples are often biopsies with limited available tissue. Lung cancer fusions have traditionally been detected using FISH, IHC, or RT-PCR. While FISH is considered the gold standard, especially for ALK testing because of the availability of an FDA-approved ALK FISH assay, FISH analysis of multiple targets per sample can be costly. These analyses are often performed in a step-wise fashion, which can reduce the overall cost of performing multiple FISH assays but may extend the time needed to rule out all relevant gene rearrangements. IHC offers a cheaper alternative; however, this methodology is subjective, which sometimes makes interpretation difficult [20]. RT-PCR, on the other hand, can offer precise detection of fusions, including identification of both partner genes and the exons involved. The main limitation of traditional RT-PCR is that it typically targets only the most common fusion events and is thus limited in detecting rare exon combinations [21].
In contrast to FISH or IHC, the AmpliSeq design combines the detection of ALK, ROS1, RET, and NTRK1 fusions in a single assay. Among the 70 clinical research samples that had previously been determined to be ALK-negative by FISH, we detected two ROS1 fusions and three RET fusions. Both ROS1 fusions and two of the RET fusions were confirmed by orthogonal methods; tissue for additional testing was not available for the third RET-positive sample. Further, NGS-based fusion detection is a timely methodology that can also be designed to detect point mutations, insertions, and deletions in the DNA of relevant genes in the same assay. Analysis of these types of mutations, particularly in EGFR and KRAS, is typically part of the work-up of patients with lung adenocarcinoma. Methods that detect both DNA mutations and fusion events in a timely manner are particularly important in these patients because of the aggressive nature of the disease. While combined analysis of DNA and RNA was not the focus of this study, it is currently being performed by many of the participating institutions.
The methodology described in this paper relies on RT-PCR for the initial amplification of fusion events; however, the design of this assay circumvents limitations of traditional RT-PCR. The AmpliSeq Fusion Lung Panel multiplexes primers for 72 different fusion combinations and thus is not limited to the most common fusions. A second limitation of traditional RT-PCR is that it requires prior knowledge of all relevant fusions. The AmpliSeq assay addresses this issue in two ways. First, during analysis of the sequenced reads, all reads that initially fail to align to the reference sequence are split in half and allowed to re-align. This step enables the detection of novel fusions involving existing primers. Second, the assay includes a method for detecting fusions involving unknown partners using the 3′/5′ imbalance calculation, which compares the expression levels of the 3′ and 5′ ends of each driver gene. For genes involved in a fusion event, the 3′ end of the gene comes under different regulatory control and shows overexpression relative to the 5′ end. Another recently described methodology using NanoString technology also exploits this phenomenon of 3′ overexpression [22]. That study found that evaluating the imbalance between 3′ and 5′ expression works relatively well for ALK and RET, which are normally not expressed in lung tissue, but is more difficult for ROS1, as this gene is normally expressed at high levels. Because a positive imbalance result is suggestive of a fusion event but does not by itself identify the exact fusion, we suggest using the AmpliSeq imbalance calculation to flag possible fusions, which can then be followed up with orthogonal testing methods if desired.
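The split-read rescue and the rule for reporting a novel isoform can be illustrated with a toy model. This is a minimal sketch, assuming each read is cut at its midpoint and that the re-alignment of each half is already available as a hypothetical (read_id, half, gene, exon) record. It keeps a single alignment per half, a simplification of the workflow, which retains all alignments, and it counts each read once per junction as our reading of "different pairs of fragments". The 100-pair minimum comes from the Methods; the gene name `GENE_X` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

MIN_SUPPORTING_PAIRS = 100  # evidence required to report a novel isoform

# (read_id, half, gene, exon); half is 0 for the 5' fragment, 1 for the 3' fragment.
FragmentHit = Tuple[str, int, str, int]
Junction = Tuple[str, int, str, int]  # (5' gene, exon, 3' gene, exon)


def split_read(seq: str) -> Tuple[str, str]:
    """Split a read into two fragments at its midpoint."""
    mid = len(seq) // 2
    return seq[:mid], seq[mid:]


def novel_junctions(hits: Iterable[FragmentHit],
                    min_pairs: int = MIN_SUPPORTING_PAIRS) -> Dict[Junction, int]:
    """Return junctions whose two fragments map to different genes in >= min_pairs reads."""
    halves: Dict[str, Dict[int, Tuple[str, int]]] = defaultdict(dict)
    for read_id, half, gene, exon in hits:
        halves[read_id][half] = (gene, exon)  # keeps one alignment per half
    support: Dict[Junction, Set[str]] = defaultdict(set)
    for read_id, parts in halves.items():
        if 0 in parts and 1 in parts and parts[0][0] != parts[1][0]:
            support[(*parts[0], *parts[1])].add(read_id)
    return {j: len(ids) for j, ids in support.items() if len(ids) >= min_pairs}


print(split_read("ACGTTGCA"))  # ('ACGT', 'TGCA')
# Hypothetical toy data: 120 reads whose halves map to GENE_X exon 7 and ALK exon 20.
hits = [(f"r{i}", half, gene, exon)
        for i in range(120)
        for half, gene, exon in ((0, "GENE_X", 7), (1, "ALK", 20))]
print(novel_junctions(hits))        # {('GENE_X', 7, 'ALK', 20): 120}
print(novel_junctions(hits[:100]))  # {} (only 50 supporting reads)
```

Requiring many independent reads guards against chimeric artifacts from library preparation, which would otherwise masquerade as novel junctions.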
Further analysis of discordant samples within our study found that some samples had either low levels of rearranged cells by FISH or discordant results between FISH and IHC. One sample for which FISH testing showed 10% rearranged cells was positive for a HIP1-ALK fusion upon repeat testing with the AmpliSeq assay. In the repeat test, the number of fusion reads fell just above the cut-off, whereas the initial, negative test had identified the same fusion with a read count below the cut-off, indicating that the sample was likely approaching the limit of detection of the assay. Discordance between ALK FISH and other methods has been noted previously [23–25] and raises the question of what constitutes a true “gold standard.” Three of the ALK FISH-positive samples for which the AmpliSeq assay was negative were also negative by IHC. Additionally, five of the discordant samples displayed single red signals by FISH. A single red signal represents a likely deletion of the 5′ end of ALK and is not unusual for this structural variant; however, previous studies have also shown similar discordance between ALK FISH-positive results displaying a deletion of the 5′ ALK probe and IHC [24] or PCR [26]. The exact nature of these fusion events may be of interest for future studies. We also observed a discordant result for one of the ALK FISH-negative samples. In this case, the AmpliSeq assay identified an EML4-ALK fusion with a high number of reads, and the sample was also positive by IHC. While this sample was officially classified as an AmpliSeq “false positive,” it likely represents a true positive in which FISH testing failed to detect the fusion.
A recent study using the AmpliSeq method for fusion detection reported 100% concordance between this and other methodologies [27]. Although this is not stated, it is probable that the testing in that study was performed at a single institution. The difference between a single- or limited-institution study and a larger study (in this case, ten institutions) may explain the difference in concordance between the Pfarr study [27] and the study described here. The international, multi-institutional nature of this study presented many challenges. Scoring criteria often vary between laboratories even for well-established reference methods; for example, some samples in this study were deemed FISH-positive yet fell below the 15% cut-off used by other institutions. A lack of concordance between institutions in the detection of ALK rearrangements has been observed previously [20, 28], and this phenomenon may have contributed to the lower concordance between methods in this study. A further challenge was the lack of material for follow-up of discrepant samples, as the samples came not only from the participating institutions but in some cases from additional laboratory partners. However, we believe that the advantages of this multi-institutional study far outweigh the disadvantages. Reproducibility across laboratories using cell line mixtures was 100%, despite potential differences in laboratory practices and personnel. Additionally, an international, multi-institutional study such as this allows the inclusion of more varied samples and explores the performance of the assay more fully.
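The effect of laboratory-specific FISH scoring criteria on the final call can be shown with a small calculation. A minimal sketch, assuming a break-apart assay in which a nucleus counts as rearranged if it shows split signals or an isolated red (3′) signal, and that a sample is called positive when the percentage of rearranged nuclei meets or exceeds the laboratory's cut-off (the exact comparison is an assumption). The 15% value is the cut-off cited above; the 10% cut-off, the pattern labels, and the cell counts are hypothetical.

```python
from typing import List

# Hypothetical labels for nuclear signal patterns scored as rearranged.
REARRANGED_PATTERNS = {"split", "single_red"}


def percent_rearranged(patterns: List[str]) -> float:
    """Percentage of scored nuclei showing a rearranged signal pattern."""
    if not patterns:
        raise ValueError("no nuclei scored")
    hits = sum(1 for p in patterns if p in REARRANGED_PATTERNS)
    return 100.0 * hits / len(patterns)


def fish_call(patterns: List[str], cutoff_percent: float) -> str:
    """Call a sample positive if the rearranged fraction meets the laboratory cut-off."""
    return "positive" if percent_rearranged(patterns) >= cutoff_percent else "negative"


# Hypothetical sample: 50 nuclei scored, 3 split and 2 single-red patterns (10%).
nuclei = ["split"] * 3 + ["single_red"] * 2 + ["fused"] * 45
print(percent_rearranged(nuclei))            # 10.0
print(fish_call(nuclei, cutoff_percent=15))  # negative
print(fish_call(nuclei, cutoff_percent=10))  # positive
```

The same nuclear counts yield opposite calls under the two cut-offs, which is one way identical samples can be classified differently across institutions.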
Acknowledgements
The authors wish to thank Xiao Zhang (Queen’s University, Kingston Ontario, Canada), Miguel Silva (Ipatimup, Porto, Portugal) and Ana Justino (Ipatimup, Porto, Portugal).