Background
Diarrhoeal diseases caused by enteric infections continue to pose a major threat to global health [
1,
2]. With respect to the overall impact on human health, diarrhoea ranks second among all infectious diseases [
3,
4], with approximately 2 billion incidences of diarrhoeal diseases reported annually in China alone [
5]. The World Health Organization (WHO)’s Global Burden of Disease study lists diarrhoeal diseases as one of the leading causes of preventable deaths worldwide [
6].
One of the greatest challenges in the diagnosis of diarrhoeal diseases is that a large number of aetiological agents are associated with generally non-specific clinical symptoms [
7]. Traditional detection methods for these pathogens in faecal specimens are culture-based methods, immunological detection assays, and molecular diagnostic methods [
8]. The culture-based methods, while considered the “gold standard” options for routine diagnosis, are time consuming [
9,
10]. Furthermore, infections caused by microbes with very stringent culture requirements are likely to be underdiagnosed, especially in cases of polymicrobial infection. The development of molecular diagnostic methods, such as real-time PCR, has increased detection sensitivity, but such testing is typically restricted to a single pathogen per test. Finally, metagenomic approaches using next-generation sequencing, while capable of generating vast amounts of information, require sophisticated analytical tools and time-consuming analysis and are very costly; thus, these methods cannot be broadly applied in the everyday diagnosis of common diarrhoeas. Due to these limitations, precise aetiological diagnosis of enteric infections is not routinely accomplished, resulting in poor therapeutic efficacy and increased risks of the development of drug-resistant bacteria and of imbalance in the intestinal microbiota (dysbiosis) [
11,
12].
Studies conducted worldwide demonstrate wide variations in the prevalence and composition of causative agents of acute diarrhoea [
13‐
16]. These discrepancies can arise due to non-uniform diagnostic approaches and preferential detection of some pathogens. These broad concerns resulted in the publication of clinical guidelines by the Expert Consensus on Diagnosis and Treatment of Infectious Diarrhoea in Chinese Adults (2013), which recommend aetiological diagnosis to promote rational treatment of diarrhoeal diseases, proper epidemiological studies of these diseases, and prevention of antimicrobial drug resistance [
17,
18].
Here, we redesign and adapt a high-throughput multiplex genetic detection system (HMGS) to screen faecal specimens for 19 major diarrhoeal pathogens (DPs) that cause acute diarrhoeal infections [
19]. A total of 613 faecal specimens were analysed by the DP-HMGS assay, sequencing, and conventional methods (culture-based methods and singleplex real-time PCR) in parallel, and the methods were compared for accuracy and applicability in the generation of epidemiological data.
Methods
Ethics statement
This study was carried out in accordance with the recommendations of the Ethics Committee for Human Studies of Huadong Hospital and registered under Ethics Approval Number 2013-077. All subjects provided written informed consent in accordance with the Declaration of Helsinki.
Diarrhoeal pathogens
Based on epidemiological investigations, the DPs that most commonly cause diarrhoeal diseases in the Shanghai area were selected as candidates for the DP-HMGS screening assay [
20‐
22]. The six most common viral pathogens that cause outbreaks of gastroenteritis, namely, human astrovirus (HASV), norovirus II (NorV), human adenovirus (HADV), rotavirus A (RoVA), rotavirus B (RoVB), and rotavirus C (RoVC), as well as sapovirus as a negative control, were isolated from clinical specimens at Huadong Hospital, Shanghai, China, and verified by sequencing of species-specific conserved genes. A group of bacterial species that most commonly cause enteritis and/or induce severe forms of enteritis was obtained from the Shanghai Municipal Center for Disease Control & Prevention (CDC). Standard ATCC strains of the following bacterial species were used:
Campylobacter jejuni (C. jejuni) ATCC33560, Shigella ATCC12022, pathogenic Clostridium difficile (C. difficile) ATCC9689, Salmonella enteritidis (S. enteritidis) ATCC31194, Salmonella typhimurium (S. typhimurium) ATCC14028, Vibrio parahaemolyticus (V. parahaemolyticus) ATCC17802, and Yersinia enterocolitica (Y. enterocolitica) ATCC23715. From the CDC, we obtained the standard control microbes, namely, Helicobacter pylori (H. pylori) ATCC43504, Pseudomonas aeruginosa ATCC27853, Staphylococcus aureus (S. aureus) ATCC29213, and Escherichia coli (E. coli) ATCC35218. Other common bacterial DP strains selected for DP-HMGS testing, including 6 major pathogenic strains of E. coli, namely enterotoxigenic E. coli (ETEC), enterohemorrhagic E. coli (EHEC), enteropathogenic E. coli (EPEC), enteroaggregative E. coli (EAEC), enteroinvasive E. coli (EIEC), and E. coli O157 strain (E. coli O157), as well as the non-pathogenic E. coli strain DH5α (E. coli DH5α) and Plesiomonas shigelloides, were isolated from clinical specimens at Huadong Hospital, Shanghai, China, and verified by sequencing of species-specific conserved genes.
Faecal specimen collection
Six hundred and thirteen faecal specimens were obtained from outpatients diagnosed with diarrhoea from January 2016 to November 2017, and 30 faecal specimens were obtained from healthy volunteers from Renji Hospital and Children’s Hospital, affiliated with Shanghai Jiaotong University; Tongji Hospital, affiliated with Tongji University; and the Centers for Disease Control in Songjiang district in Shanghai. Patients of all ages with symptoms of acute diarrhoea were considered to be eligible for enrolment. As per the ACG clinical guidelines, acute diarrhoea was defined as the occurrence of defecation 3 or more times per 24 h, with abnormal faecal characteristics, such as loose stool, watery stool, mushy stool, mucosal stool and bloody stool, lasting for less than 14 days [
23]. The exclusion criteria were diarrhoea caused by medicines, poisons, food allergies, food intolerance or other diseases. Patients undergoing antibiotic treatment were also excluded. Fresh whole faecal specimens (10 g) were collected in sterilized containers containing 2 mL of normal saline supplemented with recombinant RNase inhibitor (TaKaRa, Japan) to prevent degradation of genetic material from RNA viruses and stored at −20 °C within 2 h.
Total nucleic acid was extracted from a 200 μL faecal suspension using the Whole Genome Extraction Kit (Zhongding Biotech Co., Ltd., Ningbo, China) according to the manufacturer’s instructions. The extracts were eluted with 100 μL of DNase/RNase-free H2O (ddH2O). The concentration of each extract was determined using a Thermo Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA). The extracts were stored at −80 °C until further analysis.
Cloning and sequencing
The genomic targets of the selected pathogens were amplified, and the resulting products were purified using the High Pure PCR Product Purification Kit (Roche, Basel, Switzerland) and subsequently ligated into the pMD18-T simple vector. The constructs were transformed into
E. coli DH5α, followed by sequencing by Shanghai RuiDi Biological Technology Company. Sequencing was performed using the Sanger method with an ABI 3730XL automated DNA analyser (Applied Biosystems Inc., California, USA). The DNA sequences were verified by a BLAST search of the National Center for Biotechnology Information (NCBI) nucleotide database (
http://www.ncbi.nlm.nih.gov/blast) using DNASTAR Lasergene analysis software (DNASTAR Inc., WI, USA).
Bacterial culture and identification
Faecal specimen suspensions were mixed briefly, and 1 mL was transferred into a TissueLyser (Jingxin Co., Ltd., Shanghai, China) to obtain a uniform suspension, which was cultured on Salmonella-Shigella (SS) agar and Columbia blood agar at 37 °C for 24 h for Shigella and S. typhimurium. C. jejuni was cultured on charcoal cefoperazone deoxycholate agar (CCDA) under microaerophilic conditions at 37 °C for 24 h. The specimens were also inoculated into selective enrichment broth at 37 °C for 24 h, followed by subculturing on thiosulphate-citrate-bile salts-sucrose (TCBS) agar for culturing Vibrio species. Colonies of Vibrio parahaemolyticus and Vibrio mimicus (green colonies on TCBS) and Vibrio cholerae and Vibrio fluvialis (yellow colonies on TCBS) were identified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (BioMerieux, Lyon, France). Different serotypes of E. coli were inoculated onto Columbia blood agar and cultured at 37 °C for 24 h. Y. enterocolitica was inoculated onto MacConkey agar and cultured at 28 °C for 48 h. C. difficile was inoculated onto cycloserine cefoxitin fructose agar (CCFA) and cultured under anaerobic conditions at 37 °C for 24 h.
Viral identification by singleplex real-time PCR
The Real-Time PCR Kit (BioPerfectus Technologies, Taizhou, China) was used to detect adenovirus, which is a DNA virus. Three reverse transcription PCR kits (BioPerfectus Technologies, Taizhou, China; including reverse transcriptase) were used to detect the RNA viruses norovirus, rotavirus and astrovirus. Four singleplex PCRs were conducted in a real-time PCR system (7500 real-time PCR system; ABI, California, USA) with software version 2.3. All procedures were conducted following the manufacturer’s instructions.
Primer design
The 22 pairs of primers targeting the species-specific conserved genomic fragments of the selected DPs were designed to cover 13 bacterial DPs (listed above), 6 viral DPs (listed above), a human internal RNA control gene (hum_RNA), beta-2 microglobulin (B2M), a human internal DNA control gene (hum_DNA), ribonuclease P (RNaseP), and a systematic internal control (IC). Hundreds of sequences were downloaded from NCBI and analysed using Vector NTI to identify the most highly conserved gene targets specific for each individual DP type. The primers for amplification of the highly conserved regions were designed using DNASTAR software (DNASTAR Inc., Madison, WI, USA) and Primer Premier 5.0 software (Premier Biosoft International, Palo Alto, CA, USA). All the primers were synthesized and purified by Invitrogen™, China. These gene-specific primers were designed and optimized by applying the following criteria: homogeneity of primer sequences; amplification product sizes ranging from 100 to 350 bp, with a size difference of at least 3 bp between any two fragments; absence of significant dimer formation between different primers; and absence of non-specific products with each pair of gene-specific primers.
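The two size-related design criteria (amplicons of 100 to 350 bp, with at least 3 bp between any two fragments) lend themselves to a simple automated check. The following is a minimal sketch only; the function name and the example amplicon sizes are hypothetical and are not taken from Table S1.

```python
from itertools import combinations


def check_amplicon_sizes(sizes, min_bp=100, max_bp=350, min_gap=3):
    """Return a list of violations of the size criteria stated in the text.

    sizes: dict mapping a target name to its amplicon size in bp.
    """
    problems = []
    # Criterion 1: every amplicon must fall within the allowed size window.
    for name, bp in sizes.items():
        if not min_bp <= bp <= max_bp:
            problems.append(f"{name}: {bp} bp is outside {min_bp}-{max_bp} bp")
    # Criterion 2: any two amplicons must differ by at least min_gap bp.
    for (a, bp_a), (b, bp_b) in combinations(sizes.items(), 2):
        gap = abs(bp_a - bp_b)
        if gap < min_gap:
            problems.append(f"{a}/{b}: sizes differ by only {gap} bp")
    return problems


# Hypothetical amplicon sizes for illustration (not the values in Table S1).
example = {"target_A": 120, "target_B": 123, "target_C": 124, "target_D": 360}
for line in check_amplicon_sizes(example):
    print(line)
```

The dimer and non-specific product criteria require sequence-level analysis and experimental verification, so they are not covered by this sketch.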
The specific primer sets used in the DP-HMGS molecular detection assay and the corresponding amplicon sizes are all listed in Additional file
1: Table S1. The specificity of each single pair of primers was verified by singleplex PCR using templates containing all the corresponding extracted nucleic acids from each DP, and confirmed by Sanger sequencing. All the primers used for Sanger sequencing are listed in Additional file
2: Table S2. The primer pairs that generated amplification products with a single specific DP-HMGS peak but no non-specific peaks were selected (shown in Additional file
3: Figure S1A–Q).
Setup of the DP-HMGS assay
Each DP-HMGS reaction contained 2 µL of 5× PCR buffer, 0.35 µL of 10 µM dNTPs, 0.25 µL of 25 mM MgCl2, 0.4 µL of 5 U/µL enzyme mix (Taq polymerase and reverse transcriptase), 0.1 µL of 1 U/µL anti-contamination enzyme UDG (uracil DNA glycosylase; TaKaRa, Japan) [
24], 1 µM each of the forward and reverse primers, and 2.5 µL of plasmid; the amount of plasmid template for each target pathogen in the HMGS assay ranged from 5 to 50 ng. ddH2O was added to the reaction mixture to attain a final volume of 10 µL. The PCR mixture was incubated as follows: 25 °C for 5 min; 50 °C for 30 min; 95 °C for 15 min; 35 cycles of 94 °C for 30 s, 60 °C for 30 s, and 72 °C for 1 min; and 72 °C for 15 min.
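As a rough planning aid, the thermal programme above can be written down as data and its total programmed hold time summed. This is a sketch only: the variable and function names are illustrative, and instrument ramping time between temperatures is not included.

```python
# Each step is (temperature in °C, hold time in seconds), as given in the text.
INITIAL_STEPS = [(25, 5 * 60), (50, 30 * 60), (95, 15 * 60)]
CYCLE_STEPS = [(94, 30), (60, 30), (72, 60)]
N_CYCLES = 35
FINAL_STEPS = [(72, 15 * 60)]


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum the programmed hold times; temperature ramping is ignored."""
    one_cycle = sum(seconds for _, seconds in cycle)
    return (
        sum(seconds for _, seconds in initial)
        + n_cycles * one_cycle
        + sum(seconds for _, seconds in final)
    )


total = total_hold_seconds(INITIAL_STEPS, CYCLE_STEPS, N_CYCLES, FINAL_STEPS)
print(f"Programmed hold time: {total / 60:.0f} min")  # prints 135 min
```

The 35 cycles contribute 70 min and the fixed steps 65 min, so the amplification step alone takes a little over two hours before capillary electrophoresis.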
Separation by capillary electrophoresis and fragment analysis
Following the amplification step, 1 µL of the reaction product was added to 9 µL of highly deionized (Hi-Di) formamide along with 0.25 µL of DNA Size Standard 500 (AB Sciex, USA). The Applied Biosystems 3500DX genetic analysis system (Applied Biosystems, California, USA) was then used to analyse the PCR products based on size separation using high-resolution capillary gel electrophoresis. The peak height for each PCR product was reported in the electropherogram, and the reaction was considered to be positive when the dye signal was greater than 300 relative fluorescence units (rfu). ddH2O was used as a negative control throughout the experimental process.
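The positivity rule described above (a peak above 300 rfu at the expected fragment size) can be illustrated with a short sketch. The expected sizes, the 1 bp matching tolerance and the function name are assumptions made for illustration; the actual amplicon sizes are those listed in Table S1, and the analysis software used in the study may apply different matching rules.

```python
RFU_THRESHOLD = 300  # from the text: positive when the signal exceeds 300 rfu


def call_targets(peaks, expected_sizes, tolerance_bp=1.0, threshold=RFU_THRESHOLD):
    """Call each target positive or negative from an electropherogram peak list.

    peaks: list of (size_bp, height_rfu) tuples.
    expected_sizes: dict mapping target name to expected amplicon size in bp.
    tolerance_bp: assumed sizing tolerance (hypothetical; smaller than the
        3 bp minimum spacing between amplicons).
    """
    calls = {}
    for target, size in expected_sizes.items():
        heights = [h for s, h in peaks if abs(s - size) <= tolerance_bp]
        # Positive only if a matching peak exists and exceeds the threshold.
        calls[target] = bool(heights) and max(heights) > threshold
    return calls


# Hypothetical peak list and expected sizes, for illustration only.
peaks = [(150.2, 1200), (201.0, 250), (310.5, 301)]
expected = {"X": 150, "Y": 201, "Z": 310, "W": 260}
print(call_targets(peaks, expected))
```

In this example, target Y has a peak at the right size but below the threshold, and target W has no peak at all, so both are called negative.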
Establishment and optimization of the DP-HMGS assay
Multiple sets of primers and reaction parameters were tested to optimize the performance of the DP-HMGS assay in a single reaction. The main optimization principle was to retain primer sets whose amplicons showed similar amplification efficiencies and corresponded to the gene-specific targets. Primer sequences, concentrations and ratios were optimized so that each DP signature could be amplified specifically without cross-interaction. Additionally, the annealing temperature was optimized using the temperature gradient descent method (using chimeric primers, with temperatures from 50 to 65 °C). Other reaction parameters, such as buffer, enzyme, and reaction time, were also systematically optimized. The primers for the human internal RNA control gene B2M and the human internal DNA control gene RNaseP were included in the DP-HMGS PCR primer mix. Detection of these two genes in a sample indicated that no significant nucleic acid degradation had occurred during specimen handling or storage. Additionally, a modified fragment of the kanamycin resistance gene (Kanr) was inserted into the pcDNA3.1 vector to generate a fusion plasmid that served as the internal control for the detection system. The fusion plasmid (1.5 × 10^5 copies in 3 µL) was added to the 200 µL faecal suspension immediately prior to nucleic acid extraction to monitor the extraction and the DP-HMGS reaction. The appearance of all 3 internal control peaks in the DP-HMGS trace confirmed that the sample RNA and DNA had good integrity and underwent efficient extraction, processing, and amplification.
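The role of the three internal controls can be sketched as a simple gate on result interpretation. Treating a run as uninterpretable when any control peak is missing is an assumption made here for illustration; the text states only that the presence of all three peaks confirms sample integrity and efficient processing. The function name and the example calls are hypothetical.

```python
INTERNAL_CONTROLS = ("hum_RNA", "hum_DNA", "IC")  # control labels used in the text


def interpret_run(calls):
    """Report pathogens only when all three internal control peaks are present.

    calls: dict mapping target name to True (peak called) or False.
    """
    missing = [c for c in INTERNAL_CONTROLS if not calls.get(c, False)]
    if missing:
        # Assumed policy: without all controls, pathogen calls are not reported.
        return {"valid": False, "missing_controls": missing, "pathogens": []}
    pathogens = sorted(
        target
        for target, positive in calls.items()
        if positive and target not in INTERNAL_CONTROLS
    )
    return {"valid": True, "missing_controls": [], "pathogens": pathogens}


# Hypothetical calls for one specimen.
example_calls = {"hum_RNA": True, "hum_DNA": True, "IC": True, "EPEC": True, "RoVA": False}
print(interpret_run(example_calls))
```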
Sensitivity, specificity and accuracy of the DP-HMGS assay
The sensitivity of the DP-HMGS assay for each pathogen was tested using serial tenfold dilutions of the corresponding plasmid. Serial tenfold dilutions of a mixture of all 22 plasmids, with equal amounts of each template, were used to test the simultaneous detection limit of the DP-HMGS assay for all pathogens. The specificity of the DP-HMGS assay in detecting pathogens in a microbiologically diverse gastrointestinal environment was tested using plasmids from the 19 positive DPs in our panel combined with DNA from 7 negative control species expected to be present in the GI tract, namely, H. pylori, E. coli DH5α, Pseudomonas aeruginosa, S. aureus, Plesiomonas shigelloides, E. coli and sapovirus (Additional file 4: Figure S2). To assess the accuracy, different amounts of three pathogen-associated plasmids randomly selected from the 19 types of DPs (S. enteritidis, 1 × 10^3 copies; HADV, 1 × 10^5 copies; EHEC, 1 × 10^4 copies) were mixed for testing with the DP-HMGS assay, and the results were compared with those of the single-template HMGS assay. The plasmids containing genes of the 7 negative control species were then mixed with plasmids containing genes from the selected pathogenic species (S. enteritidis, HADV and EHEC) to further test the ability of the DP-HMGS assay to identify polymicrobial infections in microbiologically diverse environments. The reaction system setup and detection were performed as described above.
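The serial tenfold dilution approach to estimating a detection limit can be sketched as follows. The starting concentration, the detection outcomes and the rule used here (the lowest concentration at which the target and all higher concentrations were detected) are assumptions made for illustration, not values or criteria reported in this study.

```python
def tenfold_series(start_copies_per_ul, n_steps):
    """Return n_steps concentrations, each tenfold lower than the previous one."""
    return [start_copies_per_ul / 10**i for i in range(n_steps)]


def detection_limit(results):
    """Estimate the detection limit from dilution results.

    results: dict mapping concentration (copies/µL) to True if detected.
    Returns the lowest concentration such that it and every higher
    concentration were detected, or None if the highest one failed.
    """
    limit = None
    for conc in sorted(results, reverse=True):
        if not results[conc]:
            break
        limit = conc
    return limit


# Hypothetical outcome of one dilution series, for illustration only.
series = tenfold_series(1e5, 5)
detected = [True, True, True, False, False]
print(detection_limit(dict(zip(series, detected))))
```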
Data analysis and statistics
The sensitivity (SE) and specificity (SP) of the diagnostic tests were calculated according to the following formulas: SE = TP/(TP + FN) × 100; SP = TN/(TN + FP) × 100. The positive predictive value (PPV) and negative predictive value (NPV) were calculated as follows: PPV = TP/P; NPV = TN/N (FN: false negative; FP: false positive; N: negative; P: positive; TN: true negative; TP: true positive). Sanger sequencing served as the reference method. TP refers to the number of samples that were positive by both the test method (conventional methods, i.e. culture-based methods and singleplex real-time PCR, or DP-HMGS) and Sanger sequencing. TN refers to the number of samples that were negative by both the test method and Sanger sequencing. FP refers to the number of samples that were positive by the test method but negative by Sanger sequencing. FN refers to the number of samples that were negative by the test method but positive by Sanger sequencing. The data were statistically analysed by the χ² test using the Stata statistical software package, version 12.0 (StataCorp, College Station, TX, USA). The DP distribution of different groups was analysed by the Mann–Whitney rank-sum test for two groups and the Kruskal–Wallis H test for more than two groups. All of the above hypothesis tests were two-sided, and a p-value of 0.05 or less was considered to indicate statistical significance.
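The four performance measures defined above can be computed directly from the counts. The sketch below assumes that P and N denote all test-positive and test-negative samples (P = TP + FP, N = TN + FN) and reports every measure as a percentage; the counts in the example are invented for illustration and are not results of this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute SE, SP, PPV and NPV (all in %) against the reference method."""

    def pct(numerator, denominator):
        # Return NaN rather than raising when a denominator is zero.
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "SE": pct(tp, tp + fn),   # sensitivity = TP / (TP + FN)
        "SP": pct(tn, tn + fp),   # specificity = TN / (TN + FP)
        "PPV": pct(tp, tp + fp),  # assumes P = TP + FP (all test positives)
        "NPV": pct(tn, tn + fn),  # assumes N = TN + FN (all test negatives)
    }


# Invented counts for illustration only.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```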
Discussion
Diarrhoeal infections represent a class of highly infectious diseases that rapidly spread and significantly impact the health of large populations, with the most severe cases leading to death [
5]. Gastroenteritis is predominantly infectious, triggered by multiple classes of bacterial, viral and parasitic pathogens, although non-infectious factors can also be responsible [
27]. Detection of DPs in these cases is not routinely conducted or is conducted using conventional detection methods, such as culture-based methods and singleplex real-time PCR. These methods are relatively time consuming, costly (due to the need for the application of multiple approaches), and labour intensive and are usually limited to the detection of a single pathogen or a group of closely related pathogens per test [
28,
29]. In this study, we established and optimized a rapid, sensitive, specific and well-controlled DP identification and screening assay, DP-HMGS, which allowed the simultaneous detection of 19 DPs in faecal specimens. Systematic analysis of 613 clinical specimens with this assay revealed that (1) the DP-HMGS method was more sensitive and could detect more pathogenic bacteria than the culture-based method while maintaining sensitivity levels comparable to those of singleplex PCR-based detection; (2) several major aetiological agents of acute infectious diarrhoea remained frequently underdiagnosed when assessed solely using conventional methods, leading to incorrect conclusions regarding the major causes of infectious diarrhoea; (3) the DP frequency distribution detected by DP-HMGS varied significantly with age; and (4) approximately 1/3 of the cases of infectious diarrhoea were co-induced by multiple pathogens, with some DPs preferentially occurring as co-infecting agents.
Culture-based methods are the most conventional methods and continue to be commonly used for the detection of bacterial causes of diarrhoea as a “gold standard” for aetiological diagnosis in a vast majority of health centres. However, culture-based methods have several significant limitations that restrict the use of these methods as the “first line” of DP screening [
28,
29]. These limitations include the requirement of up to several days of growth before analysis, requirement of variable media and culture conditions for various species, problems associated with overgrowth of non-pathogenic bacteria that are abundant in the gut, and limited sensitivity due to technical limitations of incubation environments. Finally, a number of strictly anaerobic but important bacterial species, such as
C. difficile, are difficult to culture under routine laboratory conditions because they require specific equipment [
30,
31]. Detection of viruses is commonly conducted by singleplex real-time PCR. However, the use of individual PCRs to screen multiple pathogens is tedious and costly [
29,
32,
33]. Our present study demonstrates that the detection levels for most bacterial pathogens using culture-based methods alone were unacceptably low. Indeed, for most pathogens, the detection levels were 30% or lower (Table
1). Furthermore, low and non-uniform culture detection rates resulted in a significant bias, with the epidemiological data showing a very high impact of viral infection and greatly underestimating that of bacteria (Fig.
5). DP-HMGS allowed us to overcome the significant disparity in the sensitivities of the conventional detection methods, bringing the accuracy of the epidemiological map of DPs to the level obtained by sequencing methods (Table
2, Fig.
5). The major advantages of the DP-HMGS platform that allowed its successful implementation were high sensitivity, specificity and accuracy for the identification of DPs and, most importantly, relatively uniform performance across all DPs. One potential drawback of DNA/RNA-based molecular detection methods (including DP-HMGS) is that these methods do not reveal whether an infectious agent is viable. However, considering the detection limit (10^2–10^3 copies/µL), DP-HMGS detected only significantly abundant microbes in the GI tract, in turn strongly suggesting the association of the microbes with the disease. Further studies are needed to definitively address whether the detection of microbes at this level always signifies the presence of viable microbes in the GI tract.
In terms of epidemiological findings, we showed that EPEC was the DP most frequently associated with infectious diarrhoeas in Shanghai, accounting for 24.6% of cases (Table
3); this finding was consistent with previous reports from China [
34] and Singapore [
35]. In contrast, a report by Moreno et al. concluded that EAEC rather than EPEC was the major cause of diarrhoea [
36]. While its relative contributions to diarrhoea epidemiology have often been inconsistent between reports [
37,
38], EPEC continues to be the most prevalent type of pathogenic
E. coli found in industrialized countries [
39,
40]. A strong association between EPEC and diarrhoea in children has been reported [
39], and yet, we also found a relatively high prevalence of EPEC in older patients in our study (Fig.
6c). In the study of the age distribution of pathogens, the predominant DP frequencies varied among all age groups, as exemplified by the “top 3” DPs in each age interval (Fig.
6d), which might be attributed to the different lifestyles, food preferences and immune statuses of the patients in different age groups. However, these disparities were likely influenced by different research periods and the variety of methods applied for detection. Rotavirus was the most frequently found pathogen and was highly prevalent among young patients (0–19 years of age) but less prevalent in other age groups, which was consistent with the results of most epidemiological studies [
17,
41]. Another infection that predominantly impacted patients ≤ 19 years old was pathogenic
C. difficile. This result is consistent with a report by Buss et al. that showed a high proportion of
C. difficile in the infectious DPs detected from paediatric faecal specimens using the FilmArray gastrointestinal panel [
42]. The clinical practice guidelines for
C. difficile infection in adults and children do not recommend testing for
C. difficile in children less than 2 years old unless other causes of disease have been explicitly excluded [
43]. In our study, we specifically detected pathogenic
C. difficile in 0–2-year-old patients with diarrhoea symptoms without a confirmed clinical diagnosis of other causes of disease. These outcomes provided useful diagnostic clues, suggesting that in most of these cases, the major cause of diarrhoea was pathogenic
C. difficile. Thus, future studies are needed to establish whether the detection of pathogenic
C. difficile in young children in these circumstances is important for clinical diagnosis and treatment.
The final important finding in our study was a high frequency of polymicrobial infection with two or more DPs relative to previously reported data [
44,
45]. This finding reveals another level of complexity in determining the aetiology of diarrhoea, which could be addressed by the use of multiplex detection systems such as DP-HMGS. Because of the limitations of conventional methods, polymicrobial infections usually go undetected, and none of the polymicrobial infections could be identified by culture-based methods in our study [
46]. The polymicrobial infections identified by DP-HMGS accounted for 1/3 of the positive specimens detected (Fig.
7a), and the
E. coli subgroups were much more common in polymicrobial infections than as the cause of single DP infections (Fig.
7b). This result may be due to the
E. coli pathogroup requiring partner pathogens to cause severe diarrhoeal disease [
46‐
48]. Among the polymicrobial infections, we found that EPEC and
Vibrio presented the highest ratio (Additional file
6: Table S3). One possible explanation for this observation is that EPEC had the highest isolation rate (24.6%), while
Vibrio is the pathogen most frequently isolated from seawater and seafood, and the consumption of contaminated seafood appears to be one of the major causes of acute diarrhoea in Shanghai [
49]. Finally, we noticed that the proportion of polymicrobial infections in the 20–39-year-old age group was significantly higher than that in the 40–59-year-old age group (Fig.
7c), which may be attributed to the different lifestyles of younger adults, including more frequent social activity, travelling and relocation compared with the older group.
However, these findings regarding polymicrobial infections could be a result of colonization by another DP because differentiation between colonization and true polymicrobial infections is relatively difficult (regardless of the detection methods) [
50]. One advantage of DP-HMGS was that specific pathogenic gene sequences were used for the detection of
C. difficile, S. typhimurium, Shigella and
E. coli pathogenic strains (EPEC, ETEC, EAEC, EIEC, EHEC and
E. coli O157). Thus, DP-HMGS could be used to distinguish between colonizing and pathogenic bacteria. Nevertheless, for the remaining DPs, which did not have specific pathogenicity-associated genes, colonization could not be distinguished from an active pathogenic process.
Authors’ contributions
SW, FY, DL and JQ contributed equally to this work. ZB, MAO, HZ and YZ contributed to the design and coordinated the study. SW, YW, YZ, YM, LX and JC designed the primers and optimized the conditions of the DP-HMGS assay. DL, WH, LJ, YF and FZ collected and verified the bacteria and viruses. SW, DL, MK, ZB, MAO, HZ and YZ wrote the manuscript. All authors read and approved the final manuscript.