Background
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a novel β-coronavirus, namely, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [
1,
2]. It has evolved into an ongoing pandemic since 11 March 2020 [
3,
4]. According to the World Health Organization (WHO), over 215 million confirmed cases with nearly 4.5 million related deaths have been reported globally as of the end of August 2021 [
5]. Straddling Southeast Asia and Oceania, Indonesia is the world's largest archipelago nation and the world’s fourth most populous nation. Despite some degree of underestimation, Indonesia has been bearing the highest cumulative burden of COVID-19 in the region of the Association of Southeast Asian Nations (ASEAN) since mid-October 2020 [
6]. As of 6 June 2021, this populous tropical country had only 5.51% of its citizens fully vaccinated and was once considered to be Asia’s new epi-center of the pandemic after India [
7,
8].
With the worldwide spread of the virus, its continuous evolution yielded the emergence of multiple variants, which has posed an increased risk to global public health [
9]. In this context, sequencing-based genomic surveillance, whether at the national, regional or global level, has become the key to formulating effective intervention strategies [
10]. The Pango nomenclature offers a dynamic proposal for the classification and subtyping of SARS-CoV-2 strains based on genetic relatedness [
11]. Under this system, a multitude of SARS-CoV-2 lineages and sub-lineages are identified and named accurately, including four variants of concern (VOCs: B.1.1.7, B.1.351, P.1 and B.1.617.2 corresponding to Alpha, Beta, Gamma and Delta labelled by WHO) and several variants of interest (VOIs) [
12]. Since 2020, several teams and institutions (e. g., the Yogyakarta-Central Java COVID-19 Study Group and the West Java Health Laboratory) have conducted whole-genome sequencing (WGS) of SARS-CoV-2 strains circulating in various parts of Indonesia, and have shared genetic data and related metadata via the Global Initiative on Sharing Avian Influenza Data (GISAID) platform [
13‐
17]. These efforts make it possible to gain insights into Indonesia's COVID-19 epidemic from a micro perspective.
One of the WHO-designated variants under monitoring (VUMs), SARS-CoV-2 lineage B.1.466.2, was first documented in Indonesia in November 2020 and was reported to contribute to 31.7% of all cases nationwide as of the third week of August 2021 (second only to the Delta variant) [
12,
18]. During the pre-Delta outbreak period, the B.1.466.2 lineage was the most dominant variant in Indonesia whereas the Alpha variant was more dominating in other parts of Asia [
19]. Although the B.1.466.2 variant has been introduced into neighbouring countries (e. g., Malaysia) [
20], it still mainly circulates in Indonesia and might have accumulated unique mutations in indigenous transmission. Due to its overly dispersed territory and relatively limited health resources, extensive and continuous SARS-CoV-2 genomic surveillance remains a daunting task for Indonesia [
21]. Two bioinformatic analyses of SARS-CoV-2 full-length sequences from Indonesia confirmed the dominance of GH clade with marker mutations of D614G and Q57H [
22,
23]. A more representative recent study found that in Indonesia, the B.1.466.2 variant shared some missense mutations with both Alpha and Delta VOCs, indicating a significant impact on transmission and clinical severity; the N439K mutation showing antibody resistance was prevalent in the Indonesian B.1.466.2 variant [
19]. Other related studies mainly focused on a certain province or part of Indonesia (especially the Java Island) [
13‐
17]. In West Java, the B.1.466.2 variant dominated the local SARS-CoV-2 infections from January through April 2021, which was generally consistent with the frequency trend of Q57H mutation [
17]. In Central Java and Yogyakarta, around 13% of SARS-CoV-2 samples (collected from May 2020 to June 2021) were clustered into the B.1.466.2 lineage, ranking second among non-Delta variants [
16]. As an important local variant, there is a lack of nationwide genomic surveillance targeting the B.1.466.2 lineage. In particular, the trajectory of its population size over time and the picture of how it transmitted between different islands in Indonesia remain unclear.
In the present study, we retrospectively analyzed the whole-genome sequencing data of SARS-CoV-2 lineage B.1.466.2 from Indonesia to systematically reveal its evolutionary features and transmission patterns in such a tropical archipelago setting.
Discussion
As the COVID-19 pandemic continues to progress globally, the evolution and transmission dynamics of various SARS-CoV-2 variants have aroused great interest among virologists, clinicians, epidemiologists, and even health decision-makers. This study revealed the prevalence, evolution and spatiotemporal transmission of an endemic variant (the lineage B.1.466.2) in Indonesia using publicly available complete genomes and associated metadata. Such findings provide valuable reference for responding to future outbreaks in archipelagic countries.
More than 80% of the SARS-CoV-2 lineage B.1.466.2 genomes are from Indonesia, and other countries with a contribution rate of more than 1% (except Japan) are adjacent to Indonesia. The geographic distribution supported the local prevalence of this variant with the Indonesian Archipelago as its global epi-center [
17]. Furthermore, the number of samples contributed by each region in Indonesia was generally consistent with its population size and laboratory diagnostic capacity [
21]. It should be pointed out that sampling may have been biased by the availability of sequencing centers and resources, which was relatively low in Eastern Indonesia (especially before the establishment of the nationwide network for SARS-CoV-2 genomic surveillance) [
37]. The B.1.466.2 variant was first documented in Indonesia in November 2020 [
12]. However, the collection time of one Indonesian B.1.466.2 sample submitted to GISAID can be traced back to 6 August 2020. Hence, we inferred that this variant was likely to have been circulating in Indonesia before August 2020. In the study, we observed a noticeable shift in the dominant strain of Indonesian SARS-CoV-2 infections, which was from the B.1.466.2 variant to the B.1.617.2 or Delta variant and took place in late May 2021. Interestingly, the prevalence of these two variants in Indonesia was almost complementary in time series. This phenomenon reflected the fierce competition between different variant populations of the same virus, and in this case, the highly contagious Delta variant eventually gained a competitive advantage [
38]. As the virus continues to evolve, similar population succession is likely to be repeated, which calls for real-time genomic surveillance [
10].
The time-scaled phylogeny revealed the most recent common ancestor and heterogeneous evolution of this lineage. According to the root state in the evolutionary tree, the B.1.466.2 variant circulating in Indonesia was estimated to have originated in Java in mid-June 2020. The estimate of its MRCA was logically consistent with the sampling location and date of the earliest B.1.466.2 sequence. To comprehend the emergence and evolution of pathogenic viruses, the complex "host–pathogen-environment" interaction provides an analytical framework [
39]. As the world's most populous island, Java is home to 147.8 million people (over 50% of the Indonesia’s total population) [
40]. Java's very high population density (~ 1200/km
2) means a high SARS-CoV-2 transmission rate [
41], superimposed on its large number of potential hosts, which can lead to large viral population size and intense selection pressure—two major evolutionary drivers [
42]. Therefore, it seemed unsurprising that this local variant of Indonesia originated from Java. In addition to the root, the trunks to which side branches belong were also assigned to Java, indicating that the SARS-CoV-2 lineage B.1.466.2 strains in other parts of Indonesia were all introduced from Java. Like other forms of life on earth, the evolution of viruses is subjected to environmental selection. Among the side branches of the evolutionary tree, we observed the clustering preference of sequences derived from Sumatra and Kalimantan in the B clade (sub-lineage). The equator traverses Sumatra and Kalimantan, and both large islands are known for their dense jungles and diverse species. At least 88 and 97 species of tropical bats have been identified in Sumatra and Kalimantan respectively, including those that potentially serve as natural hosts for SARS-CoV-2 [
43‐
45]. Interestingly, a viral ecology study in the islands of the Western Indian Ocean found a strong signal of co-evolution between coronaviruses and their bat host species and further revealed the effect of the endemism resulted from geographical isolation on viral diversification within bat families [
46]. Therefore, we speculate that the unique jungle ecology of Sumatra and Kalimantan may contribute to the local circulation of the B.1.466.2 variant through certain species of bats (e. g.
Rhinolophus spp.) [
47,
48].
The evolutionary rate of Southeast Asian SARS-CoV-2 strains (isolated before September 2020) was estimated to be 1.446 × 10
−3 substitutions per site per year [
49], while the corresponding estimate for the Indonesian B.1.466.2 variant was almost halved. The decline in the rate of evolution probably occurred under its RNA proofreading mechanism and partly reflected that this variant was more adaptable to the local environment than the original strain [
50,
51]. Several high-frequency non-synonymous mutations were identified in the SARS-CoV-2 lineage B.1.466.2 from Indonesia, most of which occurred in the non-structural protein 3 (NSP3) and the spike protein. NSP3 is the largest viral protein containing multiple domains and participates in the formation of the replication/transcription complex [
52]
. Notably, the S944L mutation (only observed in the B sub-lineage) is located in the papain-like protease domain of NSP3. However, whether this mutation can improve viral fitness remains to be verified in further experiments [
53]. Co-mutations of S-D614G and ORF3a-Q57H are the genetic markers for the GH clade, to which the Pango lineage B.1.466.2 belongs according to GISAID [
54]. The D614G mutation in the spike protein confers a selective advantage on SARS-CoV-2 and eventuates in all VOIs and VOCs [
55]. Previous studies have proven that the D614G strain has higher infectivity, primarily by altering the conformation of the RBD (receptor binding domain) and enhancing the affinity with the ACE2 (angiotensin-converting enzyme 2) receptor [
56‐
58]. The Q57H mutation causes major truncation of ORF3b and structural instability of ORF3a [
59]. SARS-CoV-2 strains with this mutation are reported to evade innate immune activation and inflammatory responses [
60,
61]. As the second most common RBD mutation worldwide, the S-N439K mutation has been shown not only to enhance the infectivity of the virus (probably via creating a new salt bridge with ACE2), but also to increase the resistance to the neutralizing antibody [
62,
63]. Located in the furin cleavage site of the spike protein, the P681R mutation is one of the defining mutations for the current dominant strain globally—the Delta variant [
64]. It was reported that this highly conserved mutation could facilitate furin-mediated S1–S2 cleavage, accelerate membrane fusion and internalization, enhance replication efficiency and promote resistance to neutralizing antibodies [
65‐
67]. In this study, we found that the sub-lineage B of the Indonesian B.1.466.2 variant has evolved the S-P681R mutation, and the numerical advantage of the B sub-lineage over the A lineage (approximately 7:1) supported the potential higher infectivity of the strain with this mutation. Although the effect of a single amino acid mutation on viral infectivity, virulence and immunological properties is generally limited, the accumulation of some significant mutations in a background of multiple mutations may profoundly alter the pandemic patterns and the clinical outcomes, as we have seen with the Delta variant [
15,
16,
68]. Given the mutation profiles observed (especially the D614G/N439K/P681R co-mutations in the spike protein), continuous monitoring of the B.1.466.2 variant remains essential.
In epidemiological contexts, Ne (a measure of genetic diversity) is generally considered to be proportional to the number of infected individuals [
69]. For the COVID-19 pandemic, Ne may reflect the incidence of SARS-CoV-2 infection among the population due to the short transmission window of the virus [
70]. Considering the decline in susceptible individuals and the implementation of intervention strategies, Rt is commonly used to assess the real-time dynamics of infectious disease transmission [
71]. In the initial phase after the emergence of the B.1.466.2 variant, its effective population expanded the fastest and corresponded to a very high Rt, suggesting that the local hosts were highly susceptible to the new variant. An exponential growth in Ne with a median Rt > 1 was observed during October 2020–February 2021, which documented the widespread of SARS-CoV-2 lineage B.1.466.2 in communities (especially in the form of infection clusters). Moreover, the peak of estimated Rt overlapped with the peak of total reported cases nationwide in January 2021 [
72], indicating that the B.1.466.2 variant could be responsible for the surge of total SARS-CoV-2 infections at the period. From early March 2021 to early May 2021, the Ne of the B.1.466.2 variant showed a decreasing trend, which may have resulted from the competitive pressure brought by the Delta variant, the enforcement of community-level public activity restrictions (locally known as PPKM) and the launching of Indonesian nationwide vaccination program against COVID-19 [
73]. Similar reasons could be used to explain the lower Rt. In the final phase of the study, the median Rt estimate < 1 and the plateaued Ne suggested that SARS-CoV-2 lineage B.1.466.2 seemed to be gradually extinct in Indonesia. However, in this case we could not exclude the possibility that the B.1.466.2 variant was still circulating in certain regions due to the lag in SARS-CoV-2 genome submissions to GISAID [
74]. Similar to some of its relatives (e. g., OC43 and HKU1), the circulation pattern of SARS-CoV-2 also exhibits a certain seasonality [
75,
76]. During the period of December 2021 to March 2022 (just in another rainy season), the B.1.466.2 variant was detected again in several regions of Indonesia [
77], which may in part reflect seasonal outbreaks.
We assessed the inter-regional transmission and the relative advantage of intra-regional transmission over inter-regional transmission of SARS-CoV-2 lineage B.1.466.2 in Indonesia in the four phases (divided based on the inferred Ne trajectory). Java was estimated to be the initial epi-center for this SARS-CoV-2 variant, which was consistent with the estimated origin located in Java. In addition, a certain degree of intra-regional transmission was likely to have already existed in Java before the first infection with this variant was officially documented. This finding highlighted the importance of improved genomic surveillance for the timely identification of new SARS-CoV-2 variants. Corresponding to the exponential growth of Ne, the transmission of the B.1.466.2 variant in Indonesia reached its peak during the second phase, which coincided with the country’s rainy season (October to March) and several important festivals. The rainy season usually means more water vapour in air and less ultraviolet radiation from sunlight. In flow physics, higher ambient relative humidity contributes to longer lifetime of respiratory droplets [
78]. Previous studies demonstrated that ultraviolet light has the strongest correlation with lower COVID-19 growth [
76,
79,
80], probably due to the inactivation of the virus by ultraviolet light and its promotion of the vitamin D (as an immune regulator) synthesis. In addition, Indonesia is a country prone to flooding, and serious flood disasters mostly occur in the rainy season every year and mainly affect populated areas like Java and Sumatra [
81]. Flooding can cause both difficulties in practicing social distancing during evacuation and interruptions in the supply of medical protective materials, facilitating the local transmission of the virus [
82,
83]. Thus, strengthening the prevention, preparedness and readiness for COVID-19 in the rainy season is needed. A large number of studies have confirmed the contribution of population mobility to the spread of COVID-19 [
84‐
86]. To transition to a healthy, safe and productive society, the Indonesian government changed the strictest intervention policy of large-scale social restrictions (PSBB) into the “PSBB transisi” policy [
87]. This allowed some citizens (especially those living in the capital, Jakarta) to return home to visit their relatives during Christmas 2020 and the following New Year holiday, partly contributing to the spread across regions [
88]. The sea barrier makes inter-regional passenger traffic in Indonesia largely dependent on aviation, and air travel is sometimes the only way to reach certain parts of the archipelago. Based on previous research evidence, we speculated that air travel provided a path for the relocation diffusion of the B.1.466.2 variant from Java and Sumatra to the remote Papua [
89,
91]. In the socio-economic dimension, Java is the center of gravity of Indonesia, followed by Sumatra. These two regions in Western Indonesia contribute about 80% of the national economy and interact frequently with convenient transportation conditions [
90]. The phylogeographic reconstruction suggested that Java and Sumatra successively acted as epi-centers and formed stable transmission loops between them, reflecting the significant influence of the spatial structure of socio-economic factors on the diffusion pattern of the pandemic [
91,
92].
Targeting an indigenous SARS-CoV-2 variant (B.1.466.2) circulating in Indonesia, we identified its evolutionary characteristics and transmission patterns by retrospective genomic surveillance. Although an epidemiological paradigm that combines molecular and native perspectives was provided, unevenly distributed and relatively inadequate genomic samples may bias the analytic results to some extent. In fact, sampling bias is inevitable in genomic surveillance and related research [
93]. Therefore, in the process of deriving conclusions, we should try to combine the real-world pandemic situations and be cautious. As Indonesia improves the use of WGS for SARS-CoV-2 and strengthens international health cooperation, more accurate and real-time genomic surveillance for this variant and even comparative genomic analysis along with other variants are expected in the future.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.