2 Recent Advances in Nanopore Sequencing Technology
A significant challenge for nanopore sequencing is its relatively high error rate, which in the early days reached 15% and consisted mostly of insertions and deletions (indels). To improve accuracy, researchers often use hybrid error correction, which integrates high-accuracy short reads (e.g., Illumina short reads) with long reads from nanopore sequencing. The method uses the precision of the short reads to rectify errors in the long reads [23], so that the length advantage of nanopore reads is combined with the high accuracy of short-read sequencing.
Nanopore sequencing technology continues to evolve, with steady improvements in data quality and accuracy. Recently, ONT introduced the R10.4 chip, which has a higher sequencing accuracy than its predecessor, R9.4.1, primarily owing to its enhanced ability to resolve homopolymers. Homopolymers, which are runs of the same base repeated consecutively, are ubiquitous in microbial genomes and frequently caused errors in sequencing runs with earlier versions of the technology. The R10.4 chip optimizes identification algorithms and hardware design to generate accurate microbial genomes without the need for short-read or reference genome correction [24].
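To make the notion of a homopolymer concrete, the following minimal Python sketch locates runs of a single repeated base in a sequence. The function name `homopolymer_runs`, the minimum run length of 4, and the example sequence are all illustrative assumptions and are not taken from any ONT software.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for every run of one base of at least min_len.

    min_len=4 is an arbitrary illustrative threshold, not an ONT definition.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)  # size of the current run
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Hypothetical sequence fragment, for illustration only.
example = "ACGTTTTTTACGGGGAAC"
print(homopolymer_runs(example))
```

Regions flagged in this way are the positions where indel errors would be expected to concentrate with earlier pore versions.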
The advent of the R10.4 chip marks a significant breakthrough in the accuracy and complex sequence processing of nanopore sequencing technology. This development will greatly improve the reliability of microbial genome sequencing, thereby more accurately revealing the characteristics and functions of microbial genomes. These technological advances carry significant implications for scientific research and open new possibilities for the diagnosis and treatment of microbial diseases.
Q20+ is a sequencing approach that combines the latest Q20+ chemistry, a new reagent kit (Kit12) supporting duplex sequencing, and the most recent R10.4 chip. It uses a new reaction buffer and a new data processing algorithm to achieve longer read lengths and higher coverage, thereby improving its ability to rectify errors that occur during sequencing. In Q20+ mode, multiple reads covering the same DNA/RNA region are stacked so that random errors are averaged out, yielding a more accurate consensus sequence. For example, in bacterial genome sequencing, a quality score of Q50 is difficult to achieve in standard mode, even at high depth, whereas Q20+ can reach Q50 (99.999% accuracy) at a sequencing depth of 20×. With Q20+, nanopore sequencing has reduced its average error rate to less than 1%, achieving > 99% raw read (single-strand) accuracy, or approximately Q30 duplex accuracy, and improving the precision of consensus sequencing and variant identification. As such, nanopore sequencing has reached an important milestone in accuracy [25].
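The quality scores quoted here follow the standard Phred convention, Q = −10·log10(p), where p is the per-base error probability. The short Python sketch below shows the conversion behind statements such as "Q50 (99.999% accuracy)". The function names are illustrative.

```python
import math


def phred_to_error(q):
    """Per-base error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q):
    """Per-base accuracy (1 - error probability) for a Phred quality score q."""
    return 1 - phred_to_error(q)


def error_to_phred(p):
    """Phred quality score for a per-base error probability p."""
    return -10 * math.log10(p)


# Scores mentioned in the text: Q20 (simplex), Q30 (duplex), Q50 (consensus).
for q in (20, 30, 50):
    print(f"Q{q}: accuracy = {phred_to_accuracy(q):.5%}")
```

On this scale, every 10-point increase corresponds to a tenfold reduction in error rate, which is why the step from Q20 to Q30 matters more than the two percentages (99% versus 99.9%) suggest at first glance.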
Q20+ allows high-resolution bacterial typing and genetic variation analysis through highly accurate long reads, an aspect that traditional nanopore sequencing technology struggles to achieve. This improvement not only enhances the accuracy of sequencing results but also broadens the application scope of nanopore sequencing, enabling it to compete with short-read sequencing in tasks that require high resolution and accuracy. Real-time nanopore Q20+ sequencing can be used for rapid and precise bacterial pathogen monitoring, including core genome multi-locus sequence typing (cgMLST), virulence-factor screening, and antibiotic-resistance gene screening. Analyzing the genetic relationships and evolutionary history among different strains can help clinicians better understand the epidemiological characteristics, transmission routes, and drug resistance of pathogenic microbes, thereby guiding the formulation of clinical treatments and preventive measures. For example, cgMLST can identify genetic differences and relationships among different strains, subsequently determining whether a cluster infection or epidemiological link exists, and promptly implementing appropriate isolation, disinfection, and treatment measures. Additionally, by sequencing and comparing the genomes of different strains, important genes, such as antibiotic-resistance genes and virulence factors, can be identified, and their possible expression levels and functions can be predicted, providing a basis for personalized treatment.
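As a rough illustration of how cgMLST profiles are compared, the sketch below counts the core-genome loci at which two isolates carry different allele numbers and flags pairs that fall under a distance threshold. The locus names, allele numbers, and the threshold of 1 are hypothetical; real cluster thresholds are species- and scheme-specific and are not given in this text.

```python
def allele_distance(profile_a, profile_b):
    """Count loci with different allele numbers in two cgMLST profiles.

    Loci that are absent or uncalled (None) in either profile are skipped.
    """
    distance = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue  # ignore loci missing in either isolate
        if allele_a != allele_b:
            distance += 1
    return distance


def possibly_linked(profile_a, profile_b, threshold):
    """True if the allele distance does not exceed the chosen threshold."""
    return allele_distance(profile_a, profile_b) <= threshold


# Hypothetical profiles: locus name -> allele number (None = not called).
isolate_1 = {"locus_1": 3, "locus_2": 7, "locus_3": 1, "locus_4": 12}
isolate_2 = {"locus_1": 3, "locus_2": 7, "locus_3": 2, "locus_4": None}
isolate_3 = {"locus_1": 5, "locus_2": 9, "locus_3": 4, "locus_4": 12}

print(allele_distance(isolate_1, isolate_2))
print(possibly_linked(isolate_1, isolate_2, threshold=1))
print(possibly_linked(isolate_1, isolate_3, threshold=1))
```

Because the comparison operates on allele calls rather than individual bases, a single miscalled base can change an allele number, which is one reason read accuracy matters for this type of typing.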
Although the emerging Q20+ nanopore sequencing technology has improved read length, coverage, and accuracy, it cannot yet fully replace short-read technologies. Its enhanced read length and accuracy are undeniably beneficial for whole-genome sequencing, chromosomal structural variation detection, and similar applications, where Q20+ has the potential to become the preferred sequencing method. However, although the Q20+ error rate has been reduced to less than 1%, it is still higher than that of some second-generation sequencing platforms (e.g., Illumina) [26]. Moreover, how to select the most appropriate data analysis strategy and how to resolve potential host DNA interference require further investigation. Therefore, in applications that require very high accuracy or large-scale sample sequencing, such as single-nucleotide variation detection or extensive population genetic studies, short-read sequencing may still be the preferred option.
Table 1 outlines the operational principles of first-, second-, and third-generation sequencing technologies, showing the field's technical progression. Table 2 then compares these generations in detail, focusing on Sanger's 3730xl, Illumina's MiSeq, and ONT's MinION. The comparison is structured around critical metrics, such as read length, maximum throughput, data volume, cost, time to data, run duration, device size, and consensus sequence accuracy. Lastly, Table 3 details the features and specifications of four Oxford Nanopore sequencing platforms: Flongle, MinION, GridION, and PromethION. The compared parameters include read length, number of flow cells, independence of flow cells, maximum theoretical output, real-time data availability, run time for maximum output, machine size, weight, test cost, and hardware requirements.
Table 1
Comparative overview of the working principle of the sequencing technologies
Generation | Working principle
First generation: Sanger sequencing | Sanger sequencing, based on the selective incorporation of chain-terminating ddNTPs during in vitro DNA replication, allows sequencing of the targeted single-stranded DNA. Each ddNTP is labeled with a distinctive fluorescent tag for detection
Second generation: next-generation sequencing | Next-generation sequencing uses a reversible terminator-based method. The target DNA is amplified using PCR to create a sufficient number of fragments for detection. However, the quality of base sequencing can be affected as the read length increases due to the potential misalignment of replicated DNA clusters
Third generation: nanopore sequencing | In nanopore sequencing, sequencing occurs without the need for DNA polymerase. DNA or RNA molecules are read directly as they pass through nanopores, providing truly real-time sequencing. This process is achieved by applying a voltage across a biological membrane that contains nanopores, which drives the translocation of the single-stranded molecules to be sequenced through the nanopores
Table 2
Comparison of sequencing platforms
Platform | Read length | Maximum throughput | Data volume | Accuracy | Time to data | Run duration | Device size | Weight | Cost
Sanger (3730xl) | 400–900 bp | 1 per reaction | 96 kb/run | 99.99% | Several hours to days | Several hours to days | W 100 cm, H 65 cm, D 93 cm | 180 kg | $50,000–$200,000
Illumina (MiSeq) | 2 × 300 bp | 22–25 million per run | 13.2–15 Gb/run | > 70% bases higher than Q30 | Hours to days | 56 h | W 55 cm, H 70 cm, D 55 cm | 93.6 kg | $1000–$5000
ONT (MinION) | > 4 Mb | Thousands to millions of reads per run | 50 Gb/run | Kit 14 Chemistry > Q20 (99%) for simplex reads, ~ Q30 (99.9%) for duplex reads | Real-time | 72 h | W 10.5 cm, H 2.3 cm, D 3.3 cm | 87 g | $1000
Table 3
Comparison of key parameters of Oxford nanopore sequencing platforms
Platform | Read length | Number of flow cells | Independent flow cells | Maximum theoretical output | Real-time data availability | Run time for maximum output | Size | Weight | Cost | Hardware requirements
Flongle | Up to > 4 Mb | 1 | No | Up to 2.8 Gb | Yes | 16 h | W 10.5 cm, H 2.3 cm, D 0.8 cm | 20 g | $90 per flow cell | MinION or GridION
MinION | Up to > 4 Mb | 1 | Yes | Up to 50 Gb | Yes | 72 h | W 10.5 cm, H 2.3 cm, D 3.3 cm | 87 g | Starting at $1000 | Laptop/tablet
GridION | Up to > 4 Mb | 1–5 | Yes | Up to 250 Gb | Yes | 72 h | W 37 cm, H 22 cm, D 36.5 cm | 11 kg | Starting at $49,955 | Integrated high-capacity computation
PromethION | Up to > 4 Mb | 1–48 (P24), 1–96 (P48) | Yes | Up to 14 Tb | Yes | 72 h | W 59 cm, H 19 cm, D 43 cm | 28 kg | Starting at $10,455 | Powerful computational capability
Processing the data generated by nanopore sequencing, especially for metagenomic analysis, involves an array of bioinformatics tools and methods. The raw sequencing data first require quality control and filtering, which can be accomplished with tools such as Cramino or Kyber [33]. In addition, Nanopolish [34] works directly with the raw current signal from nanopore sequencing to improve sequence accuracy.
After initial quality control and filtering, aligning and assembling reads against reference databases is a critical step. This step can be performed efficiently with minimap2 [35] or NGMLR [36], both of which are designed to handle the characteristics of long-read data from nanopore sequencing.
Unicycler [37] is an assembly tool designed specifically for bacterial genomics. It combines short, accurate reads from Illumina sequencing with long reads from nanopore sequencing to produce high-quality hybrid assemblies. The algorithm conducts multiple rounds of read alignment and assembly graph cleaning, which allows it to resolve repeats in bacterial genomes accurately and makes Unicycler a valuable tool for nanopore sequencing data processing.
For read assembly, other tools, such as GoldRush [38] and Shasta [39], offer effective solutions, while gene prediction and annotation, a crucial part of nanopore sequencing data processing, can be performed rapidly and accurately with tools such as Prokka [40].
In addition to the above, BacWGSTdb 2.0 [41] serves as a comprehensive platform for bacterial whole-genome sequence analysis and source tracking. This user-friendly tool integrates an extensive range of bacterial genome sequencing data and associated metadata. It incorporates specialized features for multiple genome analysis, bacterial isolate characterization, and user-uploaded sequence comparison, making it an invaluable resource for downstream analysis following nanopore sequencing.
For metagenomic analysis, specialized tools, such as MetaBAT [42], are essential. These tools separate the genomes of individual species from metagenomic sequence data, thus elucidating the composition of microbial communities.
For de novo assembly of nanopore sequencing data, Flye and Canu [43] are widely used tools. Flye specializes in long-read assembly and excels at constructing high-quality bacterial genome sequences, particularly circular ones. Canu, on the other hand, is designed for noisy long reads and corrects and trims them before assembly. Despite differences in their strengths and operational requirements, both tools cope with the high error rates and variable read lengths of nanopore data and deliver robust genome assemblies suited to the specific research objectives and data types.
Lastly, for the taxonomic classification of metagenomic data, Kraken 2 [44] is an exceptionally useful tool. Its speed, accuracy, and flexible parameter settings make it a key component in classifying and identifying sequences during nanopore sequencing data analysis.
6 Application of Nanopore Sequencing in Metagenomics
Second-generation sequencing, with its high throughput, is widely used in metagenomics, an application known as metagenomic NGS (mNGS). Metagenomic sequencing involves sequencing all microbial nucleic acids in environmental samples. When applied clinically, its greatest advantage lies in its unbiased detection of all pathogens, including bacteria, fungi, viruses, and atypical pathogens, and its potential to discover unknown pathogens. Because the total genome volume is large, the shotgun method is used to break the target DNA into fragments, which are then sequenced and assembled computationally [94]. Eukaryotic genomes often contain a large number of repeat sequences, and second-generation sequencing, with its short read length, produces more misalignments in complex repeat regions and makes the assembly process more time-consuming. Nanopore sequencing has shown great potential in metagenomic studies. Its long read length can provide a more complete and contiguous genome assembly [95], and it also has advantages in sequencing repeat regions and structural variant regions of the genome.
However, for samples containing host material, the removal of host DNA poses a major challenge. For example, when pathogenic microorganisms infect the lungs, 99% of the nucleic acids extracted from samples taken from the infected area come from the human host, vastly outnumbering the DNA of the pathogenic microorganisms and limiting the sensitivity of detection [94, 96]. Although this inherent drawback can be mitigated by methods such as host DNA depletion and differential lysis [97, 98], removing or reducing host DNA to enrich pathogenic microbial DNA may itself cause loss of, or bias in, the microbial DNA in the sample, thereby affecting subsequent genomic analysis results [99].
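The effect of a 99% host background on detection sensitivity can be seen with simple arithmetic. The Python sketch below is a deliberately simplified model: it assumes that the share of reads equals the share of nucleic acid in the extract and that depletion removes only host DNA, which, as noted above, is not guaranteed in practice. The read count of 100,000 and the 90% depletion efficiency are hypothetical values chosen for illustration.

```python
def expected_microbial_reads(total_reads, host_fraction):
    """Reads expected to be microbial if read share mirrors nucleic acid share."""
    return total_reads * (1 - host_fraction)


def host_fraction_after_depletion(host_fraction, host_removed):
    """Host share after removing a fraction of host DNA.

    Simplifying assumption: no microbial DNA is lost during depletion.
    """
    host = host_fraction * (1 - host_removed)
    microbial = 1 - host_fraction
    return host / (host + microbial)


total_reads = 100_000   # hypothetical sequencing yield
host_fraction = 0.99    # host share cited in the text

before = expected_microbial_reads(total_reads, host_fraction)
new_fraction = host_fraction_after_depletion(host_fraction, host_removed=0.90)
after = expected_microbial_reads(total_reads, new_fraction)

print(f"Microbial reads without depletion: {before:.0f}")
print(f"Host fraction after 90% depletion: {new_fraction:.3f}")
print(f"Microbial reads after depletion:   {after:.0f}")
```

Under these assumptions, even an efficient depletion step leaves the host as the dominant source of reads, which illustrates why host background remains a limiting factor.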
Compared with second-generation sequencing technology, traditional nanopore sequencing has a relatively high error rate, which may lead to the misidentification or omission of certain microbial groups when host and microbial DNA regions are extremely similar [100]. However, the recent R10.4 chip developed by ONT improves the recognition of homopolymers and can generate accurate microbial genomes without the need for short-read or reference genome correction.
Although nanopore sequencing faces some technical and data analysis challenges in dealing with host DNA removal in metagenomic samples, the latest technological advancements, especially the emergence of the R10.4 chip, provide new possibilities for solving these problems [24].
Mu et al. [101] collected BALF and sputum specimens from hospitalized patients with suspected lower respiratory tract infections, performed metagenomic testing using nanopore sequencing, and confirmed the results using qPCR and Sanger sequencing. The turnaround time from sampling to results was approximately 6 h, compared with approximately 94 h for conventional culture. Compared with conventional culture and real-time PCR diagnostic tests, rapid metagenomics achieved 96.6% sensitivity and 88.0% specificity. Among the five diseases caused by lower respiratory tract infections, diagnostic accuracy was highest in patients with community-acquired pneumonia, with 97.6% sensitivity and 90.2% specificity. The investigators also identified 63 pathogens in 161 culture-negative samples. Wang et al. [80] performed nanopore sequencing on culture-negative pulmonary tissue biopsy specimens from patients with severe pneumonia who had been treated with empirical antibiotics. K. pneumoniae was identified within 1 min through a specific sequence of 823 bp. In this study, the use of antibiotics prior to sample acquisition reduced the sensitivity of culture, whereas nanopore sequencing could still rapidly detect the pathogen from a small number of sequences. These studies demonstrate that nanopore sequencing-based metagenomic testing has advantages over conventional culture in terms of turnaround time and sensitivity.
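For readers less familiar with the diagnostic metrics reported in these studies, the sketch below shows how sensitivity and specificity are derived from counts of true and false results against a reference method. The counts used are hypothetical and are not the data of Mu et al. or Wang et al.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of reference-positive samples that the test also calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of reference-negative samples that the test also calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only (not taken from the cited studies).
tp, fn = 90, 10   # reference-positive samples: detected / missed
tn, fp = 40, 10   # reference-negative samples: correctly negative / falsely positive

print(f"Sensitivity: {sensitivity(tp, fn):.1%}")
print(f"Specificity: {specificity(tn, fp):.1%}")
```

When culture is the reference method, pathogens found by sequencing in culture-negative samples count as false positives unless confirmed by another assay, which is one reason orthogonal confirmation such as qPCR or Sanger sequencing is used.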
7 Comparison of Nanopore Sequencing with the BioFire System
The BioFire system is a product series from BioFire Diagnostics (Salt Lake City, USA), which includes the FilmArray Pneumonia Panel, FilmArray Respiratory Panel, and several other testing chips. This system is a rapid molecular diagnostic system based on PCR, equipped with pre-designed nested multiplex PCR kits that can perform simultaneous detection of up to 20 respiratory infection pathogens within 1 h [102].
The BioFire system also has limitations. First, its kits are pre-designed, mainly for the detection of specific pathogens or combinations of pathogens. Its target detection range is therefore restricted and cannot cover all possible pathogens. Second, although the BioFire system can detect multiple pathogens simultaneously, its throughput is relatively low. Nanopore sequencing, by contrast, has higher throughput and can handle a larger number of samples. Finally, the BioFire system focuses on the detection of specific gene fragments and cannot provide complete genomic information, whereas nanopore sequencing provides more comprehensive genomic sequencing and analysis, including the detection of unknown sequences and structural variants [63].
However, the higher sequencing error rate of nanopore sequencing technology might affect certain analyses that require high-precision sequences. In addition, nanopore sequencing data processing and analysis require certain bioinformatics knowledge. Compared with the BioFire system, the operation and data processing procedures of nanopore sequencing are more complex.
In summary, the BioFire system is better suited to rapid screening of known and common pathogens in clinical environments, whereas nanopore sequencing is more appropriate for identifying unknown or rare pathogens and for in-depth, comprehensive genome analyses, such as drug-resistance detection and gene-expression analysis.
8 Summary and Vision
The exceptional read length of nanopore sequencing technology offers new possibilities for optimizing metagenomic sequencing and 16S rRNA targeted sequencing protocols. Particularly in dealing with the diagnosis and treatment of rare and emerging pulmonary infections, it provides clinicians with richer and more in-depth information. The relatively low startup cost and rapid turnaround time render it particularly suitable for the detection of pathogens in acute and severe pulmonary infections in clinical settings. In addition, its portability enables bedside testing and rapid detection in resource-limited environments, including in field investigations of infectious disease outbreaks.
However, nanopore sequencing technology also faces certain challenges. Although recent technological advancements, such as the optimization of algorithms and nanopore structures and improvements in reagents, have significantly reduced the high error rates of its early iterations, it currently cannot entirely replace short-read technologies, which have higher accuracy. Appropriate diagnostic tools should be selected according to research needs.
Looking forward, nanopore sequencing technology has the potential to overcome current bottlenecks in molecular diagnostics. Combined with metagenomics, amplicon sequencing, PCR, and mass spectrometry, among other technologies, nanopore sequencing could jointly promote the development of diagnostic and therapeutic technologies for pulmonary infections.