The evolutionary history of CV-B3 was reconstructed using maximum likelihood and Bayesian inference. A total of 236 complete VP1 nucleotide sequences with known sampling dates worldwide (up to December 2017) were selected for phylogenetic analysis, including sequences obtained in this study and sequences retrieved from GenBank (Additional file 1: Table S1). To investigate the epidemiological pattern in mainland China, 134 complete VP1 nucleotide sequences were used to analyze phylogenetic characteristics. Sequences were aligned using MUSCLE as implemented in MEGA (version 7.0.26) [27]. The maximum likelihood phylogenetic tree was constructed with IQ-TREE under the best-fit nucleotide substitution model identified by ModelFinder, GTR + F + Γ₃. This model combines the general time-reversible model (GTR), empirical base frequencies (F) and gamma-distributed rate heterogeneity with three rate categories (Γ₃) [28, 29]. Phylogenetic trees were also inferred with the Bayesian method implemented in BEAST (version 1.7.5) [30], under the GTR + I + Γ substitution model selected by jModelTest (version 2) [31]. The tree topology was further confirmed with MrBayes (version 3.2.6) and RAxML (version 8) [32, 33]. The Markov chain Monte Carlo (MCMC) chain was run for 1.5 × 10⁸ generations to achieve convergence of all parameters. Convergence and effective sample sizes (ESS > 200) were checked in Tracer (version 1.6) [34]. The sampled trees were summarized as a maximum clade credibility (MCC) tree in TreeAnnotator (version 1.8.4), discarding the first 10% of sampled trees as burn-in. The trees were then visualized and annotated in FigTree (version 1.4.2). The sampling times of the sequences were used to calibrate the molecular clock. To assess the temporal signal in the data, we performed date randomization tests in R (version 3.4.3) with the TipDatingBeast package [35]. The package generated 20 replicates with randomized sampling dates. A dataset was considered to have sufficient temporal signal when the 95% credibility interval of the substitution rate estimated from the real data did not fall within the 95% credibility intervals of the rates estimated from the date-randomized replicates. This test of temporal structure is what allows the evolutionary timescale of CV-B3 to be estimated reliably. Bayes factor analysis was used to compare demographic models and select the best-fitting one.
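The acceptance criterion of the date randomization test can be expressed as a short check. This is a minimal sketch, not TipDatingBeast code. It assumes that the 95% credibility intervals of the rate have already been extracted (for example from Tracer) for the real data and for each of the 20 date-randomized replicates. It also reads "does not fall within" in the stricter sense that the real-data interval must not overlap any replicate interval. The function names and numbers are hypothetical.

```python
from typing import List, Tuple

# (lower, upper) bounds of a 95% credibility interval for the substitution rate
Interval = Tuple[float, float]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two closed intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def has_temporal_signal(real: Interval, randomized: List[Interval]) -> bool:
    """Hypothetical check (assumed strict reading of the criterion):
    the real-data rate interval must not overlap the interval
    estimated from any date-randomized replicate."""
    return not any(overlaps(real, rep) for rep in randomized)


# Illustrative, made-up values in substitutions/site/year
real_ci = (4.0e-3, 6.5e-3)
replicates = [(1.0e-4, 1.2e-3)] * 20  # 20 date-randomized replicates
print(has_temporal_signal(real_ci, replicates))  # True
```

A single overlapping replicate is enough to reject the temporal signal under this reading. A looser reading, where only the point estimate must lie outside the replicate intervals, would accept more datasets.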
Once sufficient temporal signal in the CV-B3 sequences had been confirmed, their times of origin were estimated, adding a timescale to the phylogenetic history. The times to the most recent common ancestor (tMRCA) were also calculated [36], based on inference under a relaxed uncorrelated exponential growth coalescent model and a relaxed uncorrelated lognormal growth coalescent model. To determine the extent to which the viral population was structured by geography, a phylogeny-trait association analysis was performed in BaTS (version 2.0). This analysis computed the association index, parsimony score and maximum monophyletic clade statistics [37].
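The parsimony score (PS) statistic counts the minimum number of changes in a trait, such as sampling location, needed to explain its distribution on a tree. A small PS relative to randomized tip assignments indicates geographic clustering. This is a minimal sketch of that idea using Fitch's algorithm on a single fixed binary tree. It is not the BaTS implementation, which integrates over the posterior sample of trees. The tree, tip names, locations and function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a tip name, or a (left, right) pair of subtrees
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, traits: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass: return the candidate state set and change count."""
    if isinstance(tree, str):
        return {traits[tree]}, 0
    left_states, left_cost = fitch(tree[0], traits)
    right_states, right_cost = fitch(tree[1], traits)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one extra trait change is needed at this node
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree: Tree, traits: Dict[str, str]) -> int:
    return fitch(tree, traits)[1]


def tips(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])


def ps_p_value(tree: Tree, traits: Dict[str, str], n_null: int = 1000, seed: int = 1) -> float:
    """Share of tip-shuffled trait assignments whose PS is <= the observed PS.
    A small value means locations cluster on the tree more than expected by chance."""
    rng = random.Random(seed)
    names = tips(tree)
    labels = [traits[n] for n in names]
    observed = parsimony_score(tree, traits)
    hits = 0
    for _ in range(n_null):
        rng.shuffle(labels)  # permute locations across tips, keeping the tree fixed
        if parsimony_score(tree, dict(zip(names, labels))) <= observed:
            hits += 1
    return hits / n_null


# Hypothetical 8-tip tree and sampling locations
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
locations = {t: ("North" if t in "abcd" else "South") for t in tips(tree)}
print(parsimony_score(tree, locations))  # 1
print(ps_p_value(tree, locations))
```

Shuffling tip labels while keeping the tree fixed mirrors the null hypothesis of no association between phylogeny and location. The same permutation scheme would apply to the association index and maximum monophyletic clade statistics.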
For all three statistics, P < 0.05 was considered significant. Natural selection pressure on the complete VP1 region of CV-B3 was assessed by estimating the ratio of nonsynonymous to synonymous substitution rates (dN/dS). This ratio was estimated with PAML 4.7 [38] and on the Datamonkey web server [39, 40]. In PAML, likelihood ratio tests were used to compare nested models (M0 vs. M3, M1a vs. M2a, and M7 vs. M8) and select the one that best fitted the data. On Datamonkey, the MEME (Mixed Effects Model of Evolution) and FEL (Fixed Effects Likelihood) methods were applied, and sites with p < 0.05 were considered to be under positive selection.
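Each PAML comparison is a likelihood ratio test. It compares twice the log-likelihood difference between the nested pair of models (2ΔlnL) with a χ² distribution. This sketch assumes the conventional degrees of freedom: 4 for M0 vs. M3 with three site classes, and 2 for M1a vs. M2a and for M7 vs. M8. The text does not state these values. The log-likelihoods are made up for illustration. Only the standard library is used, through the closed-form χ² upper tail for even degrees of freedom.

```python
import math
from typing import Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with an even
    number of degrees of freedom: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> Tuple[float, float]:
    """Return (2 * delta lnL, p value) for a nested pair of codon models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods for M7 (null) vs. M8 (alternative, allows omega > 1)
stat, p = likelihood_ratio_test(lnl_null=-5210.4, lnl_alt=-5202.1, df=2)
print(f"2dlnL = {stat:.1f}, p = {p:.4f}")
```

A significant M7 vs. M8 or M1a vs. M2a result indicates that a class of sites with ω > 1 improves the fit. Individual sites would then be identified from the alternative model's posterior probabilities or, as here, with MEME and FEL.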