Viruses are submicroscopic infectious agents that can only multiply in living cells. They are not free-living and although they can survive and remain infectious briefly on a variety of surfaces, they cannot reproduce without a host. Viruses reproduce by infecting other cells, where they take over the cellular machinery for producing proteins. All viruses contain genetic material, either DNA or RNA, and one or more proteins, with megaviruses predicted to contain upward of 1,000 proteins. The number of known viruses is relatively large, with 6,828 species described to date, but the virosphere is believed to be much bigger, with perhaps hundreds of thousands of species yet to be discovered. Some estimates suggest there may be millions of undiscovered virus strains. Each virus is characterized by its own unique genetic composition. Upon infecting human cells, viruses cause a multitude of diseases ranging from influenza to rabies. COVID-19 is the respiratory disease caused by the coronavirus SARS-CoV-2, the cause of the current crisis.
There are 39 identified species of coronavirus. Seven of these are known to infect humans, three of which can cause severe disease, namely SARS-CoV, MERS-CoV and SARS-CoV-2, and four of which are associated with much milder symptoms.1 SARS-CoV, which causes severe acute respiratory syndrome (SARS), first appeared in 2002 in southern China. The virus likely emerged in bats, then moved into nocturnal mammals called civets, before finally infecting humans. SARS spread to 26 countries around the world, infecting more than 8,000 people and killing more than 770 over the course of two years. SARS-CoV-2 (previously known as 2019-nCoV), which causes COVID-19, is much more serious than SARS-CoV due to its high infectivity. SARS-CoV-2 was first identified in December 2019 in the Chinese city of Wuhan. To date, it has infected ~900,000 people worldwide, with almost 50,000 deaths. Although its origins are not fully understood, genomic analysis of SARS-CoV-2 suggests that it probably emerged from a coronavirus strain found in bats; there is no reliable evidence that the virus arose from non-natural sources. The SARS-CoV-2 genome is closely related to that of SARS-CoV. Due to its high infectivity, the severity of symptoms, the relatively high risk of death, and the lack of a vaccine, the spread of SARS-CoV-2 has led to the current unprecedented travel bans, lockdowns, quarantines, and economic meltdowns in a world that was unprepared for such a global catastrophe. It should be noted that as long ago as 2007, the presence of such viruses in horseshoe bats combined with southern China’s culture of eating exotic mammals was considered a ticking time bomb and the likely cause of a future pandemic.2
Viruses are classified according to the type of nucleic acid they use as genetic material. DNA viruses usually contain double-stranded DNA and replicate using an enzyme known as DNA-dependent DNA polymerase, which can be part of the virus genome. RNA viruses, such as SARS-CoV-2, typically contain single-stranded RNA, allowing their genome to be directly translated by the host cell upon infection (at least in the case of positive-sense single-stranded RNA viruses such as SARS-CoV-2). The genome of SARS-CoV-2 shares about 80% identity with that of SARS-CoV. The SARS-CoV-2 genome contains 29,891 nucleotides, encoding 9,860 amino acids, which results in 27 proteins.3 There are some significant differences between SARS-CoV-2 and SARS-CoV or other SARS-like coronaviruses, with 380 amino acid residues differing between the proteins these viral genomes encode. By way of example, the 8a protein present in SARS-CoV is absent in SARS-CoV-2; the 8b protein is 84 amino acids long in SARS-CoV, but 121 amino acids in SARS-CoV-2. Presumably, these and other molecular differences are responsible for the functional and pathogenic divergence of SARS-CoV-2 from other coronaviruses, although how this relates to the higher infectivity of SARS-CoV-2 is not known.
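The genome figures quoted above lend themselves to a quick arithmetic check. The following is a minimal sketch in Python, not taken from the cited work: it assumes three nucleotides per amino acid and ignores stop codons, overlapping reading frames, and untranslated regions. The function names and the two ten-letter toy sequences are hypothetical, chosen only to illustrate how a percent-identity figure such as "about 80%" is computed over aligned positions.

```python
def coding_nucleotides(amino_acids: int) -> int:
    """Nucleotides needed to encode a number of amino acids (3 per codon).

    Simplifying assumption: stop codons and overlapping reading frames
    are ignored.
    """
    return amino_acids * 3


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Figures quoted in the text for SARS-CoV-2.
GENOME_LENGTH = 29_891
AMINO_ACIDS = 9_860

coding = coding_nucleotides(AMINO_ACIDS)
coding_fraction = coding / GENOME_LENGTH
print(f"{coding} of {GENOME_LENGTH} nucleotides are coding "
      f"({coding_fraction:.1%} of the genome)")

# Hypothetical toy RNA fragments, NOT real viral sequence:
# they differ at 2 of 10 aligned positions.
toy_identity = percent_identity("AUGGCUAGCU", "AUGGCAAGGU")
print(f"toy identity: {toy_identity}%")
```

Under these assumptions, nearly the whole genome is accounted for by protein-coding sequence, which is consistent with the compact organization typical of RNA viruses. Real identity figures are computed from full genome alignments rather than short equal-length strings.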
Of the proteins encoded by the SARS-CoV-2 genome, four are the so-called major structural proteins of the virus, namely the spike surface glycoprotein, the small envelope protein, the matrix protein, and the nucleocapsid protein. The spike surface glycoprotein plays an essential role in binding to receptors on the host cell and determines host tropism: i.e., the way in which SARS-CoV-2 preferentially targets specific species, host tissues, or host cells.
In terms of their general shape and appearance, coronaviruses are spherical, with diameters of about 125 nm. By way of comparison, a typical bacterium could be ~1,000 nm in size. The characteristic features of coronaviruses are the club-shaped spike projections emanating from the surface of the virion which give them the appearance of a solar corona, prompting the name coronavirus. The three-dimensional structure of the entire SARS-CoV-2 virion shows that the nucleic acid and nucleocapsid protein are found underneath a lipid bilayer.4 Hence, SARS-CoV-2 is known as an enveloped virus, which appropriates lipids from the host cell when it buds off to form a new virion.
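To put the size comparison in perspective, here is a rough calculation in Python. It assumes, purely for illustration, that both the virion and the bacterium are perfect spheres with the diameters quoted above; real bacteria are often rod-shaped, so the result is an order-of-magnitude estimate only.

```python
import math

# Diameters quoted in the text, in nanometres.
CORONAVIRUS_DIAMETER_NM = 125
BACTERIUM_DIAMETER_NM = 1_000


def sphere_volume(diameter_nm: float) -> float:
    """Volume of a sphere in cubic nanometres, given its diameter."""
    radius = diameter_nm / 2
    return (4 / 3) * math.pi * radius ** 3


diameter_ratio = BACTERIUM_DIAMETER_NM / CORONAVIRUS_DIAMETER_NM
# For spheres, volume scales with the cube of the diameter.
volume_ratio = diameter_ratio ** 3

print(f"diameter ratio: {diameter_ratio:.0f}x")
print(f"volume ratio:   {volume_ratio:.0f}x")
```

On this idealized model, a bacterium eight times wider than a coronavirus occupies roughly five hundred times the volume.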
The critical point in the life cycle of all viruses is gaining entry into a host cell such that the virus can deliver its genetic material for replication and reproduction. Enveloped viruses use their surface glycoproteins to catalyze membrane fusion, an essential step in cell entry. Host cell components prime these viral surface glycoproteins to catalyze membrane fusion. Among these priming components are proteases, which cleave the viral surface glycoproteins allowing them to refold in ways that catalyze virus–cell membrane fusion.
For SARS-CoV-2, a highly specific binding interaction occurs between one of the four major structural proteins of the virus, the viral spike surface glycoprotein, and the host receptor, which is human angiotensin-converting enzyme 2 (ACE2), the same host receptor by which SARS-CoV gains entry to cells. The precise molecular details of this interaction have recently emerged.5
Internalization of SARS-CoV-2 occurs through a number of steps that depend entirely on the interaction between the spike protein and ACE2. The SARS-CoV-2 spike protein consists of two main domains: a receptor-binding domain (S1) that engages with ACE2, and the S2 domain, which mediates fusion between the viral and host cell membranes. The amino acid sequence of the receptor-binding domain of the SARS-CoV-2 spike protein is similar to that of other SARS-CoV-related viruses, some of which also use ACE2 for host cell entry. A comparison of the interaction interfaces of the SARS-CoV-2 receptor-binding domain and that of SARS-CoV with ACE2 reveals some variations that could, in principle, strengthen the interactions between SARS-CoV-2 and ACE2, or alternatively could weaken them. Understanding the molecular and atomic details of the mode of binding is likely to pave the way for structure-based rational design of molecules with enhanced affinities to either ACE2 or the SARS-CoV-2 spike protein, which will facilitate development of drugs or antibodies that interfere with or neutralize this interaction and thereby block virus entry and subsequent infection.
Binding of the SARS-CoV-2 spike protein to ACE2 is not sufficient to allow viral entry to the host cell. A further step is required, namely protein priming by host cell proteases. In the case of SARS-CoV-2, the protease appears to be TMPRSS2, although other candidates have been suggested. An inhibitor of TMPRSS2 blocks the ability of SARS-CoV-2 to infect lung cells in a culture dish; likewise, additional drugs that target proteases and thus proteolytic cleavage of the spike protein have shown some benefit in blocking virus entry into cells. Despite these encouraging studies, there is still a long way to go before these preliminary research findings can be translated to human clinical trials. It should also be stressed that the pace of research into SARS-CoV-2 entry into cells is extraordinarily rapid, to say the least, and published findings will need to be reproduced by multiple studies in multiple laboratories before investment in any particular therapeutic approach is warranted.
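The two-step requirement described above, receptor binding followed by protease priming, can be expressed as a deliberately simple logical model. The Python sketch below is a toy illustration only: the class and function names are hypothetical, and it reduces a complex biochemical process to two yes/no conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryConditions:
    """Toy yes/no description of the conditions at the cell surface."""
    spike_bound_to_ace2: bool   # spike protein has engaged the ACE2 receptor
    protease_active: bool       # a priming protease (e.g. TMPRSS2) is active


def can_enter(conditions: EntryConditions) -> bool:
    """Entry requires BOTH receptor binding and protease priming."""
    return conditions.spike_bound_to_ace2 and conditions.protease_active


def with_protease_inhibitor(conditions: EntryConditions) -> EntryConditions:
    """Return the same conditions with the priming protease blocked."""
    return EntryConditions(
        spike_bound_to_ace2=conditions.spike_bound_to_ace2,
        protease_active=False,
    )


untreated = EntryConditions(spike_bound_to_ace2=True, protease_active=True)
treated = with_protease_inhibitor(untreated)

print("untreated cell, entry possible:", can_enter(untreated))
print("inhibitor-treated cell, entry possible:", can_enter(treated))
```

The model makes one point explicit: blocking either condition is enough to prevent entry, which is why both the spike–ACE2 interface and the priming protease are candidate drug targets.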
Most basic science studies on SARS-CoV-2 have focused on its mode of entry into cells via the interaction of the spike protein and ACE2; less is known about how the virus works once it gains access to the host’s cellular machinery. However, much can be inferred from previous work performed on other coronaviruses.6 Even though the fine details may differ among them, the intracellular mode of SARS-CoV-2 action is likely to be quite similar. For instance, the nucleocapsid protein in other coronaviruses functions primarily by binding to the coronavirus RNA genome to form the nucleocapsid, and it is also involved in the host response to the virus. The same inference applies to the matrix protein, which is involved in the completion of viral assembly. The function of the small envelope protein is less well understood, except that it is involved in virus production and maturation within cells. Further details of the life cycle of SARS-CoV-2 within cells are likely to emerge in the next year or so as more studies focus on aspects of virus function unrelated to the spike protein.
In closing, I would like to make a few personal observations that may be of interest to readers of Inference. While the sudden appearance of COVID-19 has led to a dramatic change in how we go about our daily lives, there is hope that vaccines and drugs will eventually be developed that help alleviate the dreadful effects of infection by SARS-CoV-2, and possibly of other coronaviruses that are likely to emerge in the future. This will only occur if governments and federal agencies worldwide support basic, empirical science programs over the long term. For example, biochemical studies on protein–protein interactions might seem esoteric to some, but determining the mode of interaction between the SARS-CoV-2 spike protein and its receptor, ACE2, is critical for drug development for COVID-19. How can such interactions be studied? The tools used by basic scientists might seem obscure, but they are the bread and butter of biochemistry, cell biology, and structural biology. As a result of such approaches, basic science has led to enormous advances over the past century: compare the response to SARS-CoV-2 with the response to the Spanish flu. Additionally, basic science has an inbuilt mechanism for self-policing, known as peer review, which can often be a frustratingly lengthy process. Recently, with the need for rapid publication of research data on SARS-CoV-2, a number of websites and journals have appeared in the biological sciences that are publishing data prior to peer review. While this is laudable if it leads to more rapid dissemination of scientific knowledge, care must be taken so that fake scientific data, which have not been or cannot be reproduced, do not proliferate to the detriment of the scientific enterprise.
Finally, we also need to be aware of the current prevalence of anti-science agendas in some circles, which could severely limit progress in scientific research. Even if we are people of faith who believe that there is more to life than atoms and molecules, I trust that we will give credit to science, and to the scientific method and enterprise, where it is due. This will allow the scientific community to develop innovative ways to treat the effects of this devastating virus.