Scientists have found that several viruses belonging to the Coronaviridae family can infect a wide range of hosts, including birds, humans, and other mammals. These are positive-sense single-stranded RNA viruses with genomes of 27 to 32 kb. They are divided into four genera, namely alpha, beta, gamma and delta.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of the ongoing coronavirus disease 2019 (COVID-19) pandemic, was first identified in Wuhan, China, in December 2019. Owing to its high mortality rate, the World Health Organization declared COVID-19 a pandemic on March 11, 2020.
Because viruses undergo genomic mutation, identifying the mutated sites is very important for vaccine development. Several tree-based phylogenetic analyses have been performed to understand the evolutionary relationship of SARS-CoV-2 with other beta coronaviruses. A previous study constructed a phylogenetic tree and revealed that the genomic sequence of SARS-CoV-2 is 88% identical to BAT-CoV. In another study, scientists isolated about 70 SARS-CoV-2 genomic sequences from COVID-19 patients and studied the spike glycoprotein gene. This study also reported that the BetaCoV-bat-Yunnan-RaTG13-2013 virus is almost identical to SARS-CoV-2.
Although a comparative study of the genomic sequences of SARS-CoV, MERS-CoV, and SARS-CoV-2 is available, there is a gap in research regarding the comparison between four types of coronavirus, namely SARS-CoV, MERS-CoV, BAT-CoV and SARS-CoV-2. A new study comparing the genomic sequences of these four coronaviruses has been published in the Journal of Medical Virology. The study used several genetic markers, including single nucleotide polymorphisms (SNPs), whole-genome sequence phylogeny, protein mutations, and microsatellites. These were compared against the SARS-CoV-2 reference genomic sequence known as the Wuhan strain (Wuhan-Hu-1). All sequences were obtained from NCBI GenBank.

The SARS-CoV, MERS-CoV, and SARS-CoV-2 sequences were obtained from human (Homo sapiens) hosts, while the BAT-CoV sequences were collected from eight different types of bats. The results of this study are described below.
Phylogenetic analysis
For the phylogenetic analysis of the different coronavirus sequences, a maximum likelihood approach with 1000 bootstrap replicates was used. The analysis revealed distinct coronavirus lineages. Whole-genome-based phylogenetic analysis showed that MERS-CoV formed the outgroup, while the other three were classified as ingroup species. Within the ingroup, two lineages were found: one consisting of SARS-CoV-2 and another consisting of SARS-CoV and BAT-CoV. The branches of the phylogenetic tree indicated that SARS-CoV had diverged very early from BAT-CoV. The tree also showed that SARS-CoV-2 diverged independently from BAT-CoV. The phylogeny further showed that SARS-CoV-2 was more closely related to BAT-CoV and SARS-CoV than to MERS-CoV. SimPlot software was used to visualize the similarity plot between the four selected species. It revealed about 98% homology between BAT-CoV and the reference sequence, i.e. the Wuhan strain of SARS-CoV-2. By comparison, SARS-CoV showed 92% similarity to the reference sequence, and MERS-CoV showed 58% similarity.
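To give a sense of what a similarity plot measures, the following is a minimal sketch of sliding-window percent identity between a reference and a query sequence. It assumes the two sequences have already been aligned to the same length, and it uses plain identity rather than the distance models offered by dedicated similarity-plot software. The function name `window_identity` and its parameters are hypothetical and are not taken from the study.

```python
def window_identity(ref, query, window=200, step=20):
    """Percent identity per sliding window over two pre-aligned sequences.

    Returns a list of (window_start, percent_identity) tuples.
    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(ref) - window + 1, step):
        # Keep only the columns where both sequences have a base.
        pairs = [
            (a, b)
            for a, b in zip(ref[start:start + window], query[start:start + window])
            if a != "-" and b != "-"
        ]
        if pairs:
            identity = 100.0 * sum(a == b for a, b in pairs) / len(pairs)
            profile.append((start, identity))
    return profile


# Toy example with very short sequences and a 4-base window.
print(window_identity("ACGTACGT", "ACGTTCGA", window=4, step=4))
```

Plotting the identity values against the window start positions gives a curve of the kind a similarity plot displays, with dips marking the regions where the query diverges from the reference.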
Analysis of genetic variants
A variant-based analysis showed that the MERS-CoV genome differed from the Wuhan reference strain by 134.21 sites, the BAT-CoV genome differed by 136.72 sites, the SARS-CoV genome differed by 26.64 sites and the SARS-CoV-2 genome differed by 0.66 sites. In addition, the study found that the probability of mutations at missense sites is higher in MERS-CoV and SARS-CoV-2 than in SARS-CoV and BAT-CoV. The authors attribute this to the reduced number of missense variations in SARS-CoV and BAT-CoV, which results from selection pressure at missense sites.
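A missense variant is a nucleotide substitution that changes the amino acid encoded by a codon, as opposed to a synonymous change that leaves it unchanged. The sketch below illustrates that distinction using the standard genetic code. It is an illustration only, not the study's variant-calling pipeline, and the function name `classify_substitution` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as synonymous, missense, nonsense or stop_lost."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # a premature stop codon is introduced
    if ref_aa == "*":
        return "stop_lost"  # the original stop codon is replaced
    return "missense"       # a different amino acid is encoded


print(classify_substitution("GAT", "GGT"))  # Asp -> Gly
print(classify_substitution("TTT", "TTC"))  # Phe -> Phe
```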
The number of mutations in the structural proteins, namely the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins, was calculated. SNPs were filtered from the S, M, E, and N gene regions using a Python script. These genes showed varying numbers of SNPs. The online tool MultAlin was used to detect similarities between the four coronaviruses selected for the study.
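The study's own script is not reproduced in this article. As a rough sketch of how SNPs might be filtered by gene region, the example below groups variants by the structural gene they fall in. The gene coordinates are an assumption based on the annotation of the SARS-CoV-2 reference genome (GenBank NC_045512.2) and should be checked against the record actually used. The input format, a list of (position, reference base, alternate base) tuples, and the name `snps_by_gene` are also hypothetical.

```python
# Assumed 1-based inclusive gene coordinates on the SARS-CoV-2 reference
# genome (NC_045512.2); verify against the annotation before relying on them.
GENE_REGIONS = {
    "S": (21563, 25384),
    "E": (26245, 26472),
    "M": (26523, 27191),
    "N": (28274, 29533),
}


def snps_by_gene(variants, regions=GENE_REGIONS):
    """Group single-nucleotide variants by the gene region they fall in.

    `variants` is an iterable of (position, ref, alt) tuples.
    Insertions and deletions are skipped, since they are not SNPs.
    """
    grouped = {gene: [] for gene in regions}
    for pos, ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep only single-base substitutions
        for gene, (start, end) in regions.items():
            if start <= pos <= end:
                grouped[gene].append((pos, ref, alt))
    return grouped


variants = [(23403, "A", "G"), (28881, "G", "A"), (241, "C", "T"), (26300, "AT", "A")]
for gene, hits in snps_by_gene(variants).items():
    print(gene, len(hits), hits)
```

In this toy input, one variant falls in S and one in N, while the variant at position 241 lies outside all four genes and the deletion at 26300 is dropped because it is not a single-base substitution.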
Microsatellite analysis
Microsatellite analysis is used to identify the repetitive sequences of a genome. These sequences have a significant impact on the onset of diseases and their evolution. In this study, microsatellite analysis was performed using the online tools IMEX (Imperfect Microsatellite Extractor) and FMSD (Fast Microsatellite Discovery). No significant microsatellite presence was found using IMEX. However, FMSD revealed more microsatellites in MERS-CoV than in the other genomes. The SARS-CoV-2 genome showed the highest incidence of compound microsatellites.
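A microsatellite is a short motif, typically 1 to 6 bases long, repeated in tandem. The following is a minimal sketch of how perfect microsatellites can be located with a regular expression. It does not reproduce the algorithms of IMEX or FMSD, it does not handle imperfect or compound microsatellites, and the function name `find_microsatellites` and its thresholds are hypothetical.

```python
import re


def find_microsatellites(seq, min_repeats=3, max_motif=6):
    """Find perfect tandem repeats of motifs 1..max_motif bases long.

    Returns a list of (start, motif, repeat_count) tuples, with 0-based starts.
    `min_repeats` must be at least 2.
    """
    # The lazy quantifier makes the regex try the shortest motif first;
    # the backreference then requires at least min_repeats - 1 further copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


print(find_microsatellites("GGATATATATCC"))
```

In the toy sequence, the motif AT is repeated four times starting at index 2. Real tools typically apply different minimum repeat counts for each motif length and can also merge neighbouring repeats into compound microsatellites.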
In summary, the phylogenetic tree showed that SARS-CoV-2 is most closely related to BAT-CoV and that its second closest relative is SARS-CoV. All MERS-CoV strains showed a distant relationship to SARS-CoV-2. In the analysis of genetic variants, more mutations were found in MERS-CoV than in SARS-CoV and BAT-CoV. Taken together, the phylogenetic analysis, the study of genetic variation, the multiple sequence alignment, and the microsatellite analysis indicated that bats are the natural host of SARS-CoV-2 and that BAT-CoV is closely related to SARS-CoV-2. An intermediate host may have been involved in initiating the transmission of COVID-19 from bats to humans. However, more research is needed to validate this assumption. The FMSD tool, for its part, indicated that SARS-CoV is more closely associated with SARS-CoV-2 than BAT-CoV is.
Journal reference:
- Rehman, H. A. et al. (2021). Comprehensive comparative genomic and microsatellite analysis of SARS, MERS, BAT-SARS, and COVID-19 coronaviruses. Journal of Medical Virology, https://doi.org/10.1002/jmv.26974, https://onlinelibrary.wiley.com/doi/10.1002/jmv.26974