Background
Coronaviruses (CoVs) belong to the family Coronaviridae, order Nidovirales [1, 2]. CoVs infect humans and animals and have caused widespread and costly diseases, such as COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [3–6]. CoVs have among the largest known viral RNA genomes, approximately 30 kilobases (kb) in length. The genome consists of a 5’ cap, a 5’ untranslated region (UTR), open reading frames (ORFs), intergenic spaces, a 3’ UTR and a 3’ poly(A) tail. Nonstructural proteins (nsps) are derived from the 5’ two-thirds of the genome, which contains two ORFs (ORF1a and ORF1b). The structural and accessory proteins, on the other hand, are translated from subgenomic mRNAs (sgmRNAs), which are synthesized from the remaining 3’ one-third of the genome during coronavirus transcription [7].
Defective viral genomes (DVGs) are truncated versions of the viral genome and are found in most RNA viruses during infection [8–10]. Because DVGs have been shown to affect tumor cells [11], virus replication [12] and pathogenicity [13], research on DVGs has regained attention in recent years. In addition to genomes and sgmRNAs, coronaviruses also synthesize DVGs. Before the development of next-generation sequencing (NGS), only 9 coronavirus DVG species, from mouse hepatitis viruses (MHVs), bovine coronavirus (BCoV), transmissible gastroenteritis virus (TGEV) and infectious bronchitis virus (IBV), had been experimentally identified [14]. Because these DVGs contain the cis-acting elements required for gene expression in their 5’ and 3’ termini, they have been extensively used as surrogates for the ~30 kb full-length genome in studies of coronavirus gene expression [15–21]. With the development of NGS, more coronavirus DVG species have been discovered. However, the basic biological characteristics, and thus the biological relevance, of DVGs in coronavirus gene expression and pathogenesis remain to be defined.
It has been suggested that in Brome mosaic virus, AU-rich sequences are hot spots for recombination and for the synthesis of smaller viral RNAs [22]. Although coronavirus DVGs are thought to be synthesized through copy-choice template-switching recombination [14], whether the coronavirus full-length genome bears sequence features that could promote such recombination, and thus DVG synthesis, has not been analyzed.
In the current study, in addition to the well-known coronavirus genomes and sgmRNAs, coronavirus DVGs were comprehensively and experimentally analyzed both in vitro and in vivo by RT-PCR with the assistance of nanopore direct RNA sequencing. Furthermore, the biological features of coronavirus DVGs, including their structure, classification, abundance, origin and reproducibility, and the changes in DVG species and amounts under different infection environments, were determined. We expect that the characteristics of coronavirus DVGs revealed here may serve as a database for studies of coronavirus gene expression and pathogenesis and thus help the coronavirus community develop antiviral strategies.
Methods
Viruses, cells and animals
The plaque-purified Mebus strain of BCoV (GenBank: U00735.2) and MHV-A59 (GenBank: NC_048217.1) were used in this study. BCoV-p95 (GenBank: OP296992.1) is a BCoV variant carrying 106 nucleotide mutations that was obtained from the supernatant of HRT-18 cells persistently infected with BCoV. Human rectal tumor (HRT)-18 cells, mouse L (ML) cells, adenocarcinomic human alveolar basal epithelial (A549) cells and baby hamster kidney (BHK) cells were grown in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum (HyClone, UT, USA) at 37 °C with 5% CO₂. Mice were maintained according to the guidelines of the “Guide for the Care and Use of Laboratory Animals” prepared by the Committee for the Care and Use of Laboratory Animals of the Institute of Laboratory Animal Resources Commission on Life Sciences, National Research Council, USA. The animal study was reviewed and approved (IACUC No.: 108–110) by the Institutional Animal Care and Use Committee of National Chung Hsing University, Taiwan.
Nanopore direct RNA sequencing and data analyses
For nanopore direct RNA sequencing, HRT-18 cells and ML cells were infected with BCoV and MHV-A59, respectively, at a multiplicity of infection (MOI) of 0.1, and total cellular RNA was collected at 24 h (BCoV) or 20 h (MHV-A59) postinfection. In addition, 3-week-old male specific-pathogen-free BALB/c mice (BioLASCO Taiwan Co., Ltd.) were infected by intraperitoneal inoculation of 10⁶ PFU of MHV-A59 in 500 µl of DMEM, and total cellular RNA was harvested from the liver at 3 days postinfection. Total cellular RNA was extracted with TRIzol (Thermo Fisher Scientific, Waltham, USA), and 500 ng of poly(A)-containing RNA was used for library preparation according to the manufacturer’s instructions (SQK-RNA001, Oxford Nanopore Technologies). ENO2 mRNA, supplied with the SQK-RNA001 kit and added during library preparation, served as an RNA calibrant strand (RCS) for assessing RNA degradation during library preparation based on read coverage [23, 24]. Two biological replicates were sequenced. The commands used for basecalling, alignment, file conversion and primary-alignment filtering were: (i) guppy_basecaller --recursive --flowcell FLO-MIN106 --kit SQK-RNA002 -x cuda:0 --u_substitution 0 -i [Input.fast5] -s [output.fastq] --compress_fastq --disable_pings --num_callers 32 --min_qscore 7, (ii) minimap2 -Y -k 14 -w 1 --splice -g 30000 -G 30000 -F 40000 -N 32 --splice-flank=no --max-chain-skip=40 -u n --MD -a -t 10 --secondary=no [ref] [query] and (iii) samtools view [Input.sam] -b -f 0 | samtools -@ 10 | bedtools bamtobed -split > [output.bed]. During basecalling, raw reads were filtered with a quality-score cutoff of 7: reads with an average quality score above 7 were kept for further analysis and low-quality reads were removed. To recover viral recombination reads for BCoV and MHV-A59, the reads were aligned with minimap2 in splice mode. Secondary alignments (inferior alternative alignments) and supplementary alignments (potentially chimeric reads) were then removed, so that only primary alignments were retained for further analysis. Reads were classified in the following order: (i) the number of fragments in the RNA transcript, (ii) whether the transcript contains the 3’ UTR, (iii) whether it contains the 5’ UTR and (iv) whether it is associated with a transcription-regulating sequence (TRS). The detailed classification methods are described in Figures S1 and S2 and the associated figure legends. The BAM files were used for (i) visualization of the 5’ and 3’ terminal sequences of DVGs, (ii) analyses of the structures and amounts of coronavirus transcripts, (iii) analyses of the sequences flanking DVG recombination points and (iv) reproducibility analyses. For reproducibility, RNA transcripts with read counts of ≥ 5 were included; transcript abundance was expressed in reads per kilobase per million mapped reads (RPKM), and reproducibility was determined by Spearman’s correlation coefficient [25].
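The filtering and fragment-counting steps above rest on two pieces of SAM bookkeeping: FLAG bits that mark secondary (0x100) and supplementary (0x800) alignments, and CIGAR `N` operations that mark skipped reference regions in spliced alignments. The pure-Python sketch below illustrates both, assuming each alignment is available as a hypothetical (FLAG, POS, CIGAR) tuple taken from a SAM record. It splits only at `N`, analogous to `bedtools bamtobed -split`, so the number of blocks corresponds to the number of fragments. The function names are illustrative and this is not the pipeline used in the study.

```python
import re

# SAM FLAG bits
UNMAPPED = 0x4
SECONDARY = 0x100      # inferior alternative alignment
SUPPLEMENTARY = 0x800  # part of a chimeric alignment

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_primary(flag: int) -> bool:
    """True for mapped alignments that are neither secondary nor supplementary."""
    return not (flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY))


def split_blocks(pos: int, cigar: str) -> list[tuple[int, int]]:
    """Return 0-based, half-open reference blocks, splitting only at N.

    pos is the 1-based SAM POS field. Deletions (D) stay inside a block,
    as with bedtools bamtobed -split (without -splitD).
    """
    blocks = []
    ref = pos - 1  # convert to 0-based coordinate
    start = ref
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=XD":      # consume reference within the current block
            ref += n
        elif op == "N":       # recombination junction: close block, skip gap
            if ref > start:
                blocks.append((start, ref))
            ref += n
            start = ref
        # I, S, H and P do not consume the reference
    if ref > start:
        blocks.append((start, ref))
    return blocks


# Hypothetical records (FLAG, POS, CIGAR): a two-fragment read,
# a secondary alignment and a supplementary alignment.
records = [(0, 1, "50S200M20000N300M10S"), (256, 1, "100M"), (2048, 500, "100M")]
for flag, pos, cigar in records:
    if is_primary(flag):
        blocks = split_blocks(pos, cigar)
        print(len(blocks), blocks)  # prints: 2 [(0, 200), (20200, 20500)]
```

Only the first record survives the primary-alignment filter, and its 20,000-nt skipped region yields two fragments.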
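The reproducibility measure combines two standard formulas: RPKM = reads × 10⁹ / (transcript length in bp × total mapped reads), and Spearman’s correlation, which is the Pearson correlation of the ranked values. The sketch below shows both in plain Python. The read-count threshold is applied per transcript as in the text, but requiring ≥ 5 reads in both replicates is an assumption, the function names are hypothetical, and the counts in the example are made up for illustration.

```python
from statistics import mean


def rpkm(count: int, length_bp: int, total_mapped: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def replicate_correlation(rep1: dict, rep2: dict, total1: int, total2: int,
                          min_reads: int = 5) -> float:
    """rep1/rep2 map transcript ID -> (read count, transcript length in bp).

    Assumption: a transcript is kept only if it has >= min_reads in both replicates.
    """
    shared = [t for t in rep1
              if t in rep2 and rep1[t][0] >= min_reads and rep2[t][0] >= min_reads]
    x = [rpkm(*rep1[t], total1) for t in shared]
    y = [rpkm(*rep2[t], total2) for t in shared]
    return spearman(x, y)


# Toy, made-up counts for illustration only (not data from this study).
rep1 = {"DVG_a": (10, 2000), "DVG_b": (50, 1500), "DVG_c": (5, 3000), "DVG_d": (3, 1000)}
rep2 = {"DVG_a": (12, 2000), "DVG_b": (40, 1500), "DVG_c": (8, 3000), "DVG_d": (20, 1000)}
print(round(replicate_correlation(rep1, rep2, 1_000_000, 1_200_000), 3))
```

Here DVG_d is dropped by the read-count filter. Because RPKM normalizes for both transcript length and sequencing depth, replicates with different total read numbers can be compared directly.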
Preparation of RNA for biological characterization of noncanonical transcripts
To examine the synthesis of BCoV DVGs over time, HRT-18 cells were infected with BCoV at an MOI of 0.1, and total cellular RNA was collected at 2, 8, 24 and 48 h postinfection. To determine the origin of DVGs, we used the reverse-genetics system of the infectious clone MHV-A59-1000 (icMHV), developed by Dr. Ralph Baric and colleagues, in which the viral genome is divided into 7 cDNA fragments [26]. After assembly of the 7 cDNA fragments, full-length viral RNA was transcribed in vitro from the assembled full-length cDNA template using the T7 mMessage mMachine kit (AM1344, Thermo Fisher Scientific, Waltham, USA). The in vitro-transcribed full-length viral genome was transfected into BHK-MHVR cells. At 48 h posttransfection, the supernatant (designated MHVVP0) was collected and total cellular RNA was harvested (designated VP0RNA). The virus titer was determined by plaque assay, and MHVVP0 was used to infect fresh BHK-MHVR cells at an MOI of 0.1, after which total cellular RNA was collected (designated VP1RNA). The passage step was repeated until VP2RNA was collected.
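Infecting at a defined MOI requires converting the plaque-assay titer into an inoculum volume: PFU needed = MOI × number of cells, and volume = PFU needed / titer. The sketch below encodes that arithmetic, together with the standard Poisson estimate of the fraction of cells that receive at least one infectious unit, 1 − e^(−MOI). The cell number and stock titer in the example are hypothetical, as are the function names.

```python
import math


def inoculum_volume_ml(moi: float, cells: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock (mL) that delivers `moi` PFU per cell."""
    pfu_needed = moi * cells
    return pfu_needed / titer_pfu_per_ml


def fraction_infected(moi: float) -> float:
    """Poisson estimate of the fraction of cells receiving >= 1 infectious unit."""
    return 1 - math.exp(-moi)


# Hypothetical numbers: 1e6 BHK-MHVR cells per well, stock titer 2e6 PFU/mL.
vol = inoculum_volume_ml(0.1, 1e6, 2e6)
print(f"{vol * 1000:.0f} uL of stock")                              # prints: 50 uL of stock
print(f"{fraction_infected(0.1):.1%} of cells initially infected")  # prints: 9.5% of cells initially infected
```

At an MOI of 0.1, only about one cell in ten is infected initially, so the virus recovered from each passage has typically gone through more than one round of replication.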
To evaluate whether the species and amounts of DVGs differed among cell types, HRT-18, BHK, ML and A549 cells were infected with BCoV or BCoV-p95 at an MOI of 0.1, and total cellular RNA was collected at 24 h postinfection. To determine whether the species and amounts of DVGs were altered under antiviral selection pressure, HRT-18 cells were infected with BCoV at an MOI of 0.1 and, at 1 h postinfection, treated with the antiviral remdesivir (GS-5734) at final concentrations of 125, 250, 500 or 1000 nM; total cellular RNA was collected after 48 h of remdesivir treatment. To evaluate whether the species and amounts of MHV-A59 DVGs were altered under IFN β treatment, ML cells in 2 ml of DMEM were treated with IFN β at final concentrations of 10³, 10⁴ or 10⁵ U/mL for 16 h, infected with MHV-A59 at an MOI of 0.1, and harvested for total cellular RNA at 16 h postinfection. To experimentally determine DVG synthesis in mice, 3-week-old male specific-pathogen-free BALB/c mice (BioLASCO Taiwan Co., Ltd.) were infected by intraperitoneal inoculation of 10⁶ PFU of MHV-A59 in 500 µl of DMEM; livers were collected at 3 days postinfection and total cellular RNA was prepared.
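Reaching a stated final concentration means adding a volume v of stock to the medium such that C_stock × v / (V_medium + v) = C_final, which gives v = C_final × V_medium / (C_stock − C_final). The sketch below applies this to both treatment series. Only the 2 ml medium volume for the IFN β wells comes from the text. The stock concentrations (IFN β at 10⁶ U/mL, a remdesivir working stock of 100 µM) and the 2 ml volume for the remdesivir wells are hypothetical.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, medium_ml: float) -> float:
    """Volume of stock to add to `medium_ml` so the mixture reaches `final_conc`.

    Solves stock_conc * v / (medium_ml + v) = final_conc for v. The two
    concentrations only need to share a unit (U/mL, nM, ...).
    """
    if final_conc >= stock_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * medium_ml / (stock_conc - final_conc)


# Hypothetical stocks: IFN beta at 1e6 U/mL; remdesivir working stock at 100 uM (1e5 nM).
for target in (1e3, 1e4, 1e5):        # U/mL, the IFN beta series
    ul = stock_volume_ml(target, 1e6, 2.0) * 1000
    print(f"IFN beta {target:.0e} U/mL: add {ul:.1f} uL")
for target in (125, 250, 500, 1000):  # nM, the remdesivir series
    ul = stock_volume_ml(target, 1e5, 2.0) * 1000
    print(f"remdesivir {target} nM: add {ul:.2f} uL")
```

For the highest IFN β dose, the exact formula gives about 222 µl rather than the 200 µl predicted by the simpler C₁V₁ = C₂V₂ approximation, because the added volume is no longer negligible relative to 2 ml.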
Detection of DVGs by RT-PCR
The total cellular RNA collected in the procedures above was used for cDNA synthesis: 10 µg of total cellular RNA was reverse transcribed (RT) with SuperScript III reverse transcriptase (Thermo Fisher Scientific, Waltham, USA). The resulting cDNA was used for PCR detection of DVGs with the primers listed in Table S1; reactions were heated to 94 °C for 2 min and subjected to 35 cycles of 30 s at 94 °C, 30 s at 55 °C and 90 s at 72 °C. The same cDNA was used to detect 18S rRNA, the coronavirus genome and sgmRNA, with reactions heated to 94 °C for 2 min and subjected to 25 cycles of 30 s at 94 °C, 30 s at 55 °C and 20 s at 72 °C.
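The two cycling programs differ in cycle number (35 vs 25) and extension time (90 s vs 20 s). The small sketch below encodes both programs and sums their stated hold times. It ignores ramping between temperatures, which is instrument-specific, and the class name is hypothetical. One plausible reading, not stated in the text, is that the longer extension accommodates longer DVG amplicons and the extra cycles compensate for the low abundance of individual DVG species.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """PCR program as stated in the Methods (hold times only, in seconds)."""
    initial_denaturation_s: int
    cycles: int
    step_times_s: tuple[int, ...]  # denature, anneal, extend

    def hold_time_s(self) -> int:
        # Ignores ramp rates, which are instrument-specific.
        return self.initial_denaturation_s + self.cycles * sum(self.step_times_s)


dvg_pcr = CyclingProgram(120, 35, (30, 30, 90))      # DVG detection
control_pcr = CyclingProgram(120, 25, (30, 30, 20))  # 18S rRNA, genome, sgmRNA

for name, prog in (("DVG", dvg_pcr), ("control", control_pcr)):
    print(f"{name}: {prog.hold_time_s() / 60:.1f} min of hold time")
```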
Discussion
Coronavirus DVGs are presumably synthesized through a copy-choice template-switching mechanism [14]; however, the factors affecting their synthesis remain unclear. The current study suggests that DVGs can be derived from the full-length genome (Fig. 6) and that the sequences flanking DVG recombination points are AU-rich. These features are consistent with those identified in other RNA viruses, in which AU-rich sequences are associated with DVG synthesis [22]. A previous study also suggests that secondary structures near the recombination point, as well as protein factors, play important roles in facilitating recombination [30] and thus DVG synthesis. By the same mechanism, recombination may also use a longer DVG as a template, yielding a shorter DVG. This may increase the diversity of DVG species, and possibly of the proteins they encode, contributing to coronavirus pathogenesis. It is therefore important to determine the synthesis mechanism of coronavirus DVGs: the identified structural features, including AU-rich sequences and secondary structures, as well as the proteins involved, are all potential antiviral targets for disease control.
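A simple way to express the AU-richness of recombination-point flanks is to compare the AU fraction in fixed windows on either side of each junction with the AU fraction of the whole genome. The sketch below does this, assuming junctions are given as (donor end, acceptor start) coordinates, for example the end of one aligned block and the start of the next in a spliced read. The 20-nt window, the coordinate convention and the function names are assumptions, and the toy sequence is synthetic. A rigorous test would compare against randomly sampled genomic positions rather than the genome-wide mean.

```python
def au_fraction(seq: str) -> float:
    """Fraction of A and U (or T, for cDNA) in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AUT" for base in seq) / len(seq)


def junction_flanks(genome: str, donor_end: int, acceptor_start: int,
                    window: int = 20) -> tuple[str, str]:
    """Sequences flanking a deletion junction.

    donor_end: 0-based exclusive end of the retained 5' block.
    acceptor_start: 0-based start of the retained 3' block.
    """
    left = genome[max(0, donor_end - window):donor_end]
    right = genome[acceptor_start:acceptor_start + window]
    return left, right


def flank_au_enrichment(genome: str, junctions: list[tuple[int, int]],
                        window: int = 20) -> tuple[float, float]:
    """Mean AU fraction of all junction flanks, and the whole-genome AU fraction."""
    flanks = []
    for donor_end, acceptor_start in junctions:
        flanks.extend(junction_flanks(genome, donor_end, acceptor_start, window))
    flank_au = sum(au_fraction(f) for f in flanks) / len(flanks)
    return flank_au, au_fraction(genome)


# Synthetic 100-nt sequence with AU-rich stretches around one junction.
genome = "GC" * 10 + "AUAU" * 5 + "GC" * 10 + "UAUA" * 5 + "GC" * 10
print(flank_au_enrichment(genome, [(40, 60)]))
```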
Coronavirus RNA transcripts have been defined and classified in various ways. The differences in definition and classification between the current study and others [28, 31] are clarified as follows. Nomburg et al. [31] defined non-canonical subgenomic RNAs (nc-sgRNAs) as deleted versions of the coronavirus genome that contain recombination points and are not associated with the TRS. Under this definition, and according to the classification criteria for RNA transcripts illustrated in Figures S1 and S2 and the associated figure legends, nc-sgRNAs correspond to the DVGs with two or more fragments (Δ5’3’DVG, Δ3’DVG, Δ5’DVG and 5’3’DVG) in the current study, not to the noncanonical sgmRNAs. Note that the noncanonical sgmRNAs defined in the current study are associated with the TRS (Figures S1 and S2). In contrast, Girgis et al. [28] defined defective interfering (DI) RNAs in DI particles as coronavirus RNA transcripts that retain the ability to replicate and to be packaged. Because these DI RNAs can replicate, they must contain the genome-derived 5’ and 3’ UTR sequences essential for replication. Since DI RNAs contain both the 5’ and 3’ UTR sequences and are not associated with the TRS, they belong to the 5’3’DVG category under the classification criteria of the current study (Figures S1 and S2).
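The category names used above follow from the ordered criteria in the Methods: fragment number, presence of the 3’ UTR, presence of the 5’ UTR and TRS association. The sketch below is a hypothetical simplification of that decision order, written only to make the naming logic explicit (for example, a DVG lacking the 5’ UTR is a Δ5’DVG). It is not the study’s classifier; the authoritative criteria are in Figures S1 and S2, and the `Read` fields, the "unclassified" label and the TRS test are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    n_fragments: int    # aligned blocks after splitting at junctions
    has_5utr: bool      # first block overlaps the genomic 5' UTR
    has_3utr: bool      # last block overlaps the genomic 3' UTR
    trs_junction: bool  # junction coincides with a TRS (hypothetical test)


def classify(read: Read) -> str:
    # (i) number of fragments: a single block is not a recombinant transcript
    if read.n_fragments < 2:
        return "genome" if (read.has_5utr and read.has_3utr) else "unclassified"
    # (ii) 3' UTR and (iii) 5' UTR content determine the DVG subtype name
    utr = (read.has_5utr, read.has_3utr)
    # (iv) TRS relevance: TRS-associated junctions are sgmRNAs, not DVGs
    if read.trs_junction:
        return "TRS-associated sgmRNA"
    return {
        (True, True): "5'3'DVG",     # both termini retained, e.g. DI RNAs
        (True, False): "Δ3'DVG",     # lacks the 3' UTR
        (False, True): "Δ5'DVG",     # lacks the 5' UTR
        (False, False): "Δ5'3'DVG",  # lacks both UTRs
    }[utr]


# A replicating DI RNA as defined by Girgis et al.: both UTRs, not TRS-associated.
print(classify(Read(n_fragments=2, has_5utr=True, has_3utr=True, trs_junction=False)))
```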
It has been suggested that DVGs of Sendai virus can stimulate innate immunity [32]. Whether coronavirus DVGs bear structures that stimulate innate immunity remains unclear. For example, it remains to be determined whether all coronavirus DVG species carry a 5’ cap. If coronavirus DVGs lack a cap and instead bear a 5′ triphosphate, they may stimulate innate immunity during DVG synthesis and thereby affect pathogenesis. Conversely, if coronavirus DVGs are capped, they may be able to encode proteins, because the identified DVG species contain ORF(s) from one or more portions of the full-length genome (Fig. 5 and nanopore direct RNA sequencing data). It is also possible that some DVG species are capped and others are not. In either case, such diverse structural features may play important roles in coronavirus pathogenesis. Notably, because infected cells contain a very large number of DVG species, the read count for each individual species is low, although DVGs are collectively abundant and more abundant than canonical sgmRNAs (Fig. 1C–1E). We therefore propose that DVGs may function as a population rather than as individual species, either through their structures or through their encoded proteins. Thus, understanding the biological characteristics of DVGs described in the current study is also a critical step toward exploring the mechanism of coronavirus pathogenesis.
Based on the results above, DVG species and their amounts are altered under different infection conditions. Such alteration may be a way for coronaviruses to respond to environmental changes and may also contribute to coronavirus pathogenesis. This may be one reason why infection of different cells or organs with the same coronavirus leads to different pathological outcomes. We speculate that the alteration in DVG species and amounts may reflect a related regulatory structure or molecule. Alternatively, the alteration may be caused by stochastic variation in different environments. However, how DVG species and their amounts change in response to different infection conditions remains unclear and needs to be elucidated. Furthermore, because (i) the synthesized DVG species may differ depending on the infection environment (Fig. 8) and (ii) some coronavirus DVGs can replicate and be packaged into virus particles [15, 28], the DVG species in new host cells could be carried over from the previous passage or newly synthesized in the new cells. Consequently, the DVG species in virus particles transmitted among hosts may also differ and may have different effects on infection. Lastly, a DVG population selected under a given pressure may help the coronavirus develop resistance to that same pressure, posing a concern for controlling coronavirus diseases.
Conclusions
With the assistance of nanopore direct RNA sequencing, the current study experimentally revealed the fundamental characteristics of coronavirus DVGs both in vitro and in vivo. The biological features of coronavirus DVGs in terms of abundance, reproducibility and variety extend the current model of coronavirus gene expression, and these features, together with the diversity of DVG structures and their protein-coding potential, may contribute to pathogenesis. In addition, the finding that DVG species and amounts are altered under different infection environments and selection pressures suggests that DVGs may further contribute to virus fitness and thus pathogenesis. The current study may therefore inform a variety of biomedical studies, including those on the synthesis mechanism of DVGs and their role in pathogenesis, and contribute to the development of antiviral strategies.
Acknowledgements
We thank Dr. Wei-Li Hsu at National Chung Hsing University, Taiwan, for the BHK cells, and Ruey-Yi Chang at National Dong Hwa University, Taiwan, for A549 cells. We thank Dr. Ralph Baric and colleagues at University of North Carolina, Chapel Hill, for the reverse-genetics system of MHV-A59 infectious clone MHV-A59-1000. We also thank Dr. David A. Brian at University of Tennessee, Knoxville, for providing HRT-18 cells, ML cells, BCoV and MHV-A59.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (
http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.