Symmetry is ubiquitous in nature. It occurs in snowflakes, crystals, and molecules, and it is central to our understanding of subatomic particles. Although the importance of symmetry in chemistry and physics is widely appreciated, its significance in biology is often underestimated. Viruses are notable examples of biological systems in which symmetry plays a pivotal role. Mathematical techniques directed at understanding their structure and symmetry have transformative potential, both within virology and for biology as a whole.
Viral Symmetry
Viral genomes are packaged in protein containers known as viral capsids. These containers function like Trojan horses: they facilitate the release of the viral genome into host cells and protect it from host defense mechanisms between rounds of infection. In their simplest form, viral capsids are less than 20 nanometers (nm) in diameter and encapsulate short genomic sequences that code for only a few viral proteins. In larger and more complex viruses, capsids can exceed 500 nm in diameter and accommodate genomes that are more than a thousand times longer and code for many different viral components. Despite variations in size and complexity, the majority of these molecular Trojan horses share a fascinating feature: like tiny soccer balls, virus capsids exhibit the 2-fold, 3-fold, and 5-fold rotational symmetries characteristic of icosahedral symmetry.
The principle of genetic economy, introduced by Francis Crick and James Watson, explains the occurrence of symmetry in virology.1 Small viral genomes have limited coding capacity, so viruses minimize the sequence coding for capsid building blocks by synthesizing multiple identical copies of a capsid protein from the same genetic message. Identical building blocks interacting in the same way across the capsid surface self-assemble into symmetric capsids. Recognizing this, Crick and Watson concluded that viruses must be either helical or have cubic symmetry: the same symmetries as the convex regular polyhedra known as Platonic solids.2 These are the only convex polyhedra formed from identical copies of a single regular polygon with the same number of faces meeting at every vertex: the cube from 6 squares, the dodecahedron from 12 pentagons, and the tetrahedron, octahedron, and icosahedron from 4, 8, and 20 equilateral triangles, respectively.
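Why there are exactly five such solids follows from a short counting argument. The Python sketch below assumes convex polyhedra (so that Euler's formula $V - E + F = 2$ applies) whose faces are identical regular $p$-gons with $q$ faces meeting at every vertex, and enumerates the possible pairs $\{p, q\}$. The function name `platonic_solids` and the search bound `max_sides` are illustrative choices, not part of the original argument.

```python
from fractions import Fraction

NAMES = {(3, 3): "tetrahedron", (3, 4): "octahedron", (3, 5): "icosahedron",
         (4, 3): "cube", (5, 3): "dodecahedron"}


def platonic_solids(max_sides=12):
    """Return (p, q, V, E, F) for every regular convex polyhedron {p, q}.

    Each edge borders two faces and joins two vertices, so pF = 2E and qV = 2E.
    Substituting into V - E + F = 2 gives 1/p + 1/q - 1/2 = 1/E, which must be > 0.
    """
    solids = []
    for p in range(3, max_sides + 1):
        for q in range(3, max_sides + 1):
            inv_e = Fraction(1, p) + Fraction(1, q) - Fraction(1, 2)
            if inv_e > 0:
                e = 1 / inv_e
                solids.append((p, q, int(2 * e / q), int(e), int(2 * e / p)))
    return solids


for p, q, v, e, f in platonic_solids():
    print(f"{NAMES[(p, q)]:>12}: {f} faces, {e} edges, {v} vertices")
```

The icosahedron {3, 5} and the dodecahedron {5, 3} exchange their face and vertex counts while sharing 30 edges, reflecting their duality.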
Both the icosahedron and its dual, the dodecahedron, possess icosahedral symmetry, and their common group of 60 rotations is the largest of the rotation groups of the Platonic solids.3 Given a set of identical building blocks under a repeating pattern of assembly, an icosahedrally symmetric shape optimizes container volume. Since viruses are under selective pressure to package their genetic material efficiently, it is no surprise that evolution has driven them to converge on icosahedrally symmetric shapes.
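The 60 rotations can be generated and counted numerically. The Python sketch below builds the icosahedron from the standard coordinates $(0, \pm 1, \pm\varphi)$ and their cyclic permutations, generates the group from one 5-fold and one 3-fold rotation, checks that every element maps the vertices onto themselves, and tallies the elements by order. The choice of generators, the rounding tolerance, and all function names are assumptions made for illustration.

```python
import itertools
import math
from collections import Counter

PHI = (1 + math.sqrt(5)) / 2  # golden ratio
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))


def rotation(axis, angle):
    """Matrix of the rotation by `angle` radians about `axis` (axis-angle formula)."""
    norm = math.sqrt(sum(v * v for v in axis))
    x, y, z = (v / norm for v in axis)
    c, s, t = math.cos(angle), math.sin(angle), 1 - math.cos(angle)
    return ((t * x * x + c, t * x * y - s * z, t * x * z + s * y),
            (t * x * y + s * z, t * y * y + c, t * y * z - s * x),
            (t * x * z - s * y, t * y * z + s * x, t * z * z + c))


def key(m):
    """Rounded, hashable form of a matrix so floating-point noise does not create duplicates."""
    return tuple(round(v, 6) + 0.0 for row in m for v in row)


def generate_group(generators):
    """Close a set of rotation matrices under multiplication, breadth-first."""
    elements = {key(IDENTITY): IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        new = []
        for m in frontier:
            for g in generators:
                p = matmul(g, m)
                if key(p) not in elements:
                    elements[key(p)] = p
                    new.append(p)
        frontier = new
    return list(elements.values())


def order(m, max_order=12):
    """Smallest n >= 1 such that m**n is the identity."""
    p = m
    for n in range(1, max_order + 1):
        if key(p) == key(IDENTITY):
            return n
        p = matmul(p, m)
    raise ValueError("not a rotation of finite order <= max_order")


def icosahedron_vertices():
    """The 12 vertices (0, ±1, ±φ) and their cyclic permutations."""
    verts = []
    for s1, s2 in itertools.product((1, -1), repeat=2):
        base = (0.0, float(s1), s2 * PHI)
        verts += [base[k:] + base[:k] for k in range(3)]
    return verts


def permutes(g, verts, tol=1e-9):
    """True if rotation g maps the vertex set onto itself."""
    images = [tuple(sum(g[i][k] * v[k] for k in range(3)) for i in range(3)) for v in verts]
    return all(any(all(abs(a - b) < tol for a, b in zip(w, u)) for u in verts) for w in images)


five_fold = rotation((0, 1, PHI), 2 * math.pi / 5)  # axis through the vertex (0, 1, φ)
three_fold = ((0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))  # (x, y, z) -> (y, z, x)
group = generate_group([five_fold, three_fold])
verts = icosahedron_vertices()

print(len(group))                                        # 60
print(all(permutes(g, verts) for g in group))            # True
print(sorted(Counter(order(g) for g in group).items()))  # [(1, 1), (2, 15), (3, 20), (5, 24)]
```

The tally 1 + 15 + 20 + 24 = 60 corresponds to the identity, one half-turn about each of the 15 two-fold axes, two rotations about each of the 10 three-fold axes, and four rotations about each of the 6 five-fold axes.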
Icosahedral Symmetry
Unlike the tetrahedral and octahedral symmetries, icosahedral symmetry is noncrystallographic: no three-dimensional lattice is invariant under its 5-fold rotations, and it is not possible to tessellate three-dimensional space using a single icosahedrally symmetric shape without creating gaps or overlaps. Icosahedral tessellations are possible, however, by means of quasiperiodic tilings constructed from more than one shape. Penrose tilings, which cover the plane with two tile shapes and display fivefold symmetry, are a prominent two-dimensional example. Such quasiperiodic structures occur in alloys formed from aluminum and manganese, a discovery for which Dan Shechtman was awarded the 2011 Nobel Prize in Chemistry. Icosahedral symmetry also plays a central role in carbon chemistry. In 1996, Sir Harry Kroto, together with Robert Curl and Richard Smalley, was awarded the Nobel Prize in Chemistry for discovering an icosahedrally symmetric carbon cage structure formed from 60 carbon atoms. The molecule was named buckminsterfullerene after the architect Buckminster Fuller, whose geodesic domes it resembles. Carbon cage structures also occur as nested cages, known as carbon onions, in which icosahedrally symmetric shells of decreasing size sit one inside another in the manner of a matryoshka doll. Under special conditions, even water molecules can form such nested cage structures with icosahedral symmetry.
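The lattice obstruction can be checked with the crystallographic restriction theorem. A rotation that maps a lattice onto itself has an integer matrix in a lattice basis, so its trace, which does not depend on the basis, must be an integer; for a three-dimensional rotation by $2\pi/n$ the trace is $1 + 2\cos(2\pi/n)$. The Python sketch below applies this test to small $n$; the function name `is_crystallographic` is ours, for illustration.

```python
import math


def is_crystallographic(n: int) -> bool:
    """Necessary condition for an n-fold rotation to preserve a 3D lattice.

    In a lattice basis the rotation matrix has integer entries, so its trace must be
    an integer. The trace is basis-independent and equals 1 + 2*cos(2*pi/n).
    """
    trace = 1 + 2 * math.cos(2 * math.pi / n)
    return math.isclose(trace, round(trace), abs_tol=1e-9)


allowed = [n for n in range(1, 13) if is_crystallographic(n)]
print(allowed)  # [1, 2, 3, 4, 6]: 5-fold rotations are excluded
```

Every order that passes the test is realized by some lattice, so the list is exactly the set of crystallographic rotation orders; 5 is missing, which is why no lattice can carry icosahedral symmetry in three dimensions.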
Icosahedral symmetry alone cannot account for the range of viral architectures found in nature. Most viruses are formed from more than 60 capsid proteins, but the icosahedral group contains only 60 rotations, so it cannot act transitively on capsids with more than 60 protein components. In other words, for most viruses there is no set of symmetry operations that maps a given capsid protein onto every other protein in the capsid surface. Additional principles are needed to explain the positions of the capsid proteins in these larger capsids.
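The counting behind this statement follows from the orbit-stabilizer theorem. For the icosahedral rotation group $G$, with $|G| = 60$, and a protein subunit at position $x$, the orbit $G \cdot x$ (all positions to which the symmetry operations map $x$) satisfies

$$|G \cdot x| = \frac{|G|}{|G_x|} \le 60,$$

where the stabilizer $G_x$ is the subgroup of rotations that leave $x$ fixed. A subunit lying off the symmetry axes is fixed only by the identity, so $|G_x| = 1$ and its orbit contains exactly 60 subunits. A capsid of $N$ such subunits, with $N$ a multiple of 60, therefore splits into $N/60$ distinct orbits, and only for $N = 60$ does a single orbit cover the whole capsid.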
In 1962, Donald Caspar and Sir Aaron Klug introduced the concept of quasi-equivalence.4 Larger capsids formed from identical protein units, they argued, must exhibit similar local configurations of proteins across the entire capsid surface. Such capsids should be organized according to surface lattices that exhibit icosahedral symmetry and are formed from only one type of shape. In their paper, Caspar and Klug investigated hexagonal icosahedral tessellations by embedding the planar net of an icosahedron into a hexagonal, or honeycomb, lattice. Only embeddings for which the hexagonal tiles close up without gaps or overlaps when the net is folded are possible as models of viral geometry. According to Euler's theorem, in order to create a closed polyhedral surface, 12 hexagons must be replaced by pentagons in the transition from the planar representation to a three-dimensional shape. The polyhedral models of viral architecture described by Caspar and Klug theory (CKT) must therefore have 12 pentagonal and $10(T-1)$ hexagonal faces,5 where $T = n^2 + nk + k^2$ is the triangulation number and $n$ and $k$ are nonnegative integers, not both zero. By interpreting pentagons and hexagons as clusters of five and six protein subunits, CKT not only indicates the positions of protein subunits in the capsid surface, but also predicts that viral capsids must be composed of precisely $12 \times 5 + 10(T-1) \times 6 = 60T$ proteins. This provides a classification of viral architectures in which capsid protein numbers are quantized.
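The CKT bookkeeping can be reproduced in a few lines. The Python sketch below lists the allowed triangulation numbers and, for a given $T$, counts faces, edges, vertices, and proteins, assuming that exactly three faces meet at every vertex of the pentagon-hexagon polyhedron. It checks Euler's formula along the way. The function names are illustrative.

```python
def triangulation_numbers(max_t):
    """CKT triangulation numbers T = n^2 + nk + k^2 <= max_t, for integers n, k >= 0 not both 0."""
    values = set()
    for n in range(max_t + 1):
        for k in range(max_t + 1):
            t = n * n + n * k + k * k
            if 0 < t <= max_t:
                values.add(t)
    return sorted(values)


def ckt_counts(t):
    """Pentagon, hexagon, and protein counts for a CKT polyhedron with triangulation number t."""
    pentagons, hexagons = 12, 10 * (t - 1)
    edges = (5 * pentagons + 6 * hexagons) // 2  # every edge borders two faces
    vertices = 2 * edges // 3                    # three faces meet at every vertex
    assert vertices - edges + (pentagons + hexagons) == 2  # Euler's formula holds
    proteins = 5 * pentagons + 6 * hexagons      # pentamers plus hexamers
    return {"pentagons": pentagons, "hexagons": hexagons, "proteins": proteins}


print(triangulation_numbers(30))  # [1, 3, 4, 7, 9, 12, 13, 16, 19, 21, 25, 27, 28]
print(ckt_counts(7))              # {'pentagons': 12, 'hexagons': 60, 'proteins': 420}
```

Because 6 is not a triangulation number, 360 = 60 × 6 is not an allowed CKT protein count. The T = 7 layout has 72 capsomers (12 pentamers and 60 hexamers); papilloma- and polyomaviruses occupy these 72 positions with pentamers only.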
Viral Tilings
This prediction was thought to hold universally for several decades, until improvements in the resolution of virus imaging techniques began revealing increasing numbers of outliers. The cancer-causing papilloma- and polyomaviruses were early examples. Although their capsids are formed from multiple identical copies of a capsid protein, these viruses have 360 capsid proteins, a number not among the allowed values (360 = 60 × 6, and 6 is not a triangulation number).6 In these capsid architectures, the proteins are organized into 72 pentamers, whereas CKT allows only 12 pentamers in any capsid. Neither virus exhibits any of the hexamers, or six-fold clusters, that are characteristic of larger CKT geometries. The conflict was resolved when I revisited the assumption that identical capsid proteins must interact in the same way across the capsid surface.7 In this study, tilings composed of rhombs and kites were used to identify the positions of the capsid proteins in papilloma- and polyomaviruses and to represent the two types of interactions between them. This reflects the fact that in these viruses, some of the capsid proteins interact in pairs (dimer interactions), while others interact in groups of three (trimer interactions), in an arrangement resembling a daisy chain. This was the first use of tilings in virology, and it provided the foundation for viral tiling theory (VTT).8
The capsid models used in VTT predict not only the numbers and relative positions of capsid proteins, but also the sites and types of interactions that stabilize the capsid. These models capture structural features that determine the biophysical properties of viruses, which in turn influence how viruses form, evolve, and infect their hosts. Some viruses undergo capsid rearrangements that open pores through which the viral genome is released. These rearrangements can be modeled as transitions between different lattice models in VTT, revealing the expansion pathways and mechanisms underpinning genome release.9
Hidden Geometries
Both CKT and VTT assume that viral capsids are formed from a single type of capsid protein. Although this appears to be true for smaller viruses, more than one type of capsid protein is the rule rather than the exception in larger and more complex viruses. As a result, quasi-equivalence in its original form is not universally applicable in virology. This issue has been addressed by a new, generalized theory of quasi-equivalence in which each capsid protein must have the same mode of interaction with proteins of the same type.10 This approach retains quasi-equivalence within a protein species but allows for distinct interactions with other types of proteins. Interactions within each protein species are represented by a type of polygon characteristic of that species. Consider the herpes simplex virus, formed from a major and a minor capsid protein. The major capsid proteins self-organize locally in groups of six and are represented by hexagons in models of viral geometry. The minor capsid proteins form trimer interactions and are modeled using triangles. To classify all possible polyhedral layouts, each icosahedral surface must be embedded in a lattice formed from regular polygons. Because each protein species forms the same type of interaction across the capsid surface, the polygons representing the different species should meet in the same local configuration throughout the surface lattice. These lattices must therefore be constructed from regular polygons with a single vertex type. Lattices with this property were studied by Johannes Kepler in his Harmonices Mundi and are known as Archimedean lattices. Using the same embedding method as CKT, Antoni Luque and I constructed and classified polyhedral models from the Archimedean lattices, enumerating all the possible ways in which capsids can be organized.11 Since the hexagonal lattice is one of the Archimedean lattices, our theory contains the polyhedral models of CKT as a special case. It also encompasses new capsid architecture models that explain outliers to CKT, as well as models providing layouts for viruses with previously forbidden capsid protein numbers.
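The single-vertex-type condition is restrictive enough to enumerate by hand, as Kepler did. The Python sketch below lists every multiset of regular polygons whose interior angles add up to a full turn around a vertex, a necessary condition for such a lattice; only some of these candidates extend to a tiling of the whole plane. The set `ARCHIMEDEAN` of vertex multisets of the 11 Archimedean lattices is hard-coded from the standard classification rather than derived here, and all names are illustrative.

```python
from fractions import Fraction


def vertex_multisets(max_sides=42):
    """Nondecreasing tuples of polygon side counts whose interior angles sum to 360 degrees.

    A regular n-gon has an interior angle of (n - 2) / (2n) of a full turn.
    """
    found = []

    def extend(prefix, remaining):
        if remaining == 0:
            found.append(tuple(prefix))
            return
        for n in range(prefix[-1] if prefix else 3, max_sides + 1):
            angle = Fraction(n - 2, 2 * n)
            if angle > remaining:  # angles grow with n, so larger polygons cannot fit either
                break
            extend(prefix + [n], remaining - angle)

    extend([], Fraction(1))
    return found


# Vertex multisets of the 11 Archimedean lattices; 3.3.3.4.4 and 3.3.4.3.4 share one multiset.
ARCHIMEDEAN = {
    (3, 3, 3, 3, 3, 3), (4, 4, 4, 4), (6, 6, 6),  # the three regular lattices
    (3, 3, 3, 3, 6), (3, 3, 3, 4, 4),             # 3^4.6; 3^3.4^2 and 3^2.4.3.4
    (3, 3, 6, 6), (3, 4, 4, 6),                   # 3.6.3.6; 3.4.6.4
    (3, 12, 12), (4, 6, 12), (4, 8, 8),
}

candidates = vertex_multisets()
print(len(candidates))                 # 17
print(ARCHIMEDEAN <= set(candidates))  # True
print((6, 6, 6) in ARCHIMEDEAN)        # True: the hexagonal lattice used in CKT
```

The angle condition admits 17 multisets, but only 10 of them (covering the 11 lattices) extend to a plane tiling in which every vertex looks the same.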
Although successful in explaining virus structure, the generalized theory leaves open the question of why viruses adopt geometries with more than one type of capsid protein. Such geometries come at a significant additional coding cost, but they also give capsids new biophysical properties. They may confer different degrees of stability on the viruses that adopt them and support different strategies for genome release. Such capsids may therefore be better adapted to their specific function in the viral life cycle, with their increased coding costs offset by a gain in fitness.12
Affine Extensions
Viral symmetry is more than skin deep. Indeed, symmetry manifests itself differently across the radial levels of a virus, from the inner capsid surface to the epitopes on the capsid exterior, which are important for immune recognition. In 2016, Aloysio Janner argued that so-called encasing forms might represent the entire three-dimensional structures of viral capsids.13 These forms were derived by embedding virus structures into three-dimensional lattices, adding a radial dimension to the description of the virus and its material boundaries. Because icosahedral symmetry is noncrystallographic, quasiperiodic structures lend themselves well to the construction of encasing forms. To describe the vertices of these forms in the context of aperiodic lattices, our team developed affine extensions of noncrystallographic groups. Similar affine extensions had long been used in mathematical physics for crystallographic lattices, but my team was the first to develop this concept for the noncrystallographic case. We matched the affine extensions with material boundaries in a number of viruses and, in the process, revealed a molecular scaling principle that relates viral features at different radial levels.14 We have shown that the atomic positions of carbon onions can also be described with this approach,15 extending the scope of the new mathematical structures from virology into carbon chemistry.
Despite these achievements, a fundamental issue remained unresolved. Quasilattices are infinite structures, whereas viral capsids are finite. As a result, quasilattices contain more information than is needed to describe a virus. The same issue could in principle apply to the quasicrystals realized in Shechtman's aluminum and manganese alloys, but those structures can, at least in theory, extend through space by repetition of the same local construction. With this in mind, we refined our group-theoretical approach. Instead of working with affine extensions of symmetry groups, we constructed orbits: the sets of points obtained by applying every element of a group to a given starting point. We worked in six dimensions, the minimal dimension in which the icosahedral group acts crystallographically while leaving a three-dimensional subspace invariant. Viral geometry can then be modeled as a projection from six dimensions into three, in a way that mimics the shadows in Plato's allegory of the cave.16 Multi-shell models that exhibit icosahedral symmetry at every radial level were generated by constructing orbits of lattice groups in six dimensions. The precise structure of each radial level in these models, and the spacing between levels, are obtained by projecting these orbits onto a three-dimensional subspace that is invariant under the icosahedral group.17
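The mechanism behind these "shadows" can be illustrated with the standard six-dimensional construction used for icosahedral quasicrystals. Taking one vertex from each antipodal pair of an icosahedron gives six five-fold axes. Every icosahedral rotation permutes these axes up to sign, so it acts on the lattice $\mathbb{Z}^6$ by an integer matrix, and the projection sending the $j$-th basis vector of $\mathbb{Z}^6$ to the $j$-th axis commutes with this action. The Python sketch below checks this for two generators of the group, then projects the finite patch $\{-1, 0, 1\}^6$ and groups the images into shells by radius. It illustrates the principle only; the patch and all names are assumptions, and it is not the construction of the published multi-shell models.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


# Six five-fold axes of an icosahedron: one vertex from each antipodal pair.
AXES = [unit(v) for v in [(0, 1, PHI), (0, -1, PHI), (1, PHI, 0),
                          (-1, PHI, 0), (PHI, 0, 1), (PHI, 0, -1)]]


def rotation(axis, angle):
    """Matrix of the rotation by `angle` radians about `axis` (axis-angle formula)."""
    x, y, z = unit(axis)
    c, s, t = math.cos(angle), math.sin(angle), 1 - math.cos(angle)
    return ((t * x * x + c, t * x * y - s * z, t * x * z + s * y),
            (t * x * y + s * z, t * y * y + c, t * y * z - s * x),
            (t * x * z - s * y, t * y * z + s * x, t * z * z + c))


def apply(matrix, v):
    """Multiply a square matrix (nested sequences) by a vector."""
    return tuple(sum(matrix[i][k] * v[k] for k in range(len(v))) for i in range(len(matrix)))


def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))


def lift_to_6d(R):
    """Integer 6x6 matrix M with R(AXES[j]) = sum_i M[i][j] * AXES[i]."""
    M = [[0] * 6 for _ in range(6)]
    for j, axis in enumerate(AXES):
        image = apply(R, axis)
        for i, other in enumerate(AXES):
            if close(image, other):
                M[i][j] = 1
            elif close(image, tuple(-x for x in other)):
                M[i][j] = -1
    if any(sum(abs(M[i][j]) for i in range(6)) != 1 for j in range(6)):
        raise ValueError("R does not permute the five-fold axes up to sign")
    return M


def project(n):
    """Projection Z^6 -> R^3 sending the j-th basis vector to AXES[j]."""
    return tuple(sum(n[j] * AXES[j][i] for j in range(6)) for i in range(3))


five_fold = rotation(AXES[0], 2 * math.pi / 5)
three_fold = ((0, 1, 0), (0, 0, 1), (1, 0, 0))  # (x, y, z) -> (y, z, x)
patch = list(itertools.product((-1, 0, 1), repeat=6))  # a finite piece of Z^6

for R in (five_fold, three_fold):
    M = lift_to_6d(R)  # integer matrix: the rotation acts crystallographically on Z^6
    # Acting in 6D and then projecting equals projecting and then rotating in 3D.
    print(all(close(project(apply(M, n)), apply(R, project(n))) for n in patch))  # True

# Group the projected patch into shells by distance from the origin.
radii = [round(math.sqrt(sum(x * x for x in project(n))), 6) for n in patch]
shells = sorted(set(radii))
print(len(shells), "shells with sizes", [radii.count(r) for r in shells])
```

Because the patch is mapped onto itself by every lifted matrix, each shell of projected points is a union of orbits of the icosahedral group.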
Assembly Code
Viral genomes code for the proteins required to build their capsids, as well as for others that perform vital functions in the viral life cycle. For simple viruses, these include basic functions such as genome replication and lysis. For larger viruses, the proteins and their functions are more complex. It turns out that viral genomes can also encode virus assembly instructions, a discovery that I made with Peter Stockley.18 Many viruses, including major human pathogens, contain an assembly code tightly embedded within the genetic code. The assembly code is composed of secondary structure elements, termed packaging signals (PSs), that are dispersed across the viral genome and serve to regulate and promote virus assembly. The existence of an assembly code had long been overlooked, and viral genomes had been assumed to be passive during capsid formation. This is indeed true for many larger viruses, in particular bacteriophages that store their genetic material in the form of DNA. But single-stranded RNA viruses, the largest group of viruses that includes major human pathogens, appear to use PS-mediated assembly in the arms race against host defense mechanisms.
In many viruses, cryo-electron microscopy reveals multiple contacts between the packaged genome and the inner capsid surface. These contacts were thought not to be sequence specific, but instead to be mediated by electrostatic interactions between the genomic RNA and capsid protein. A repeat pattern, which would point toward sequence-specific interactions, was difficult to discern in the genomic RNA. Viral geometry provided a new perspective.19 The icosahedral symmetry of the capsid shell imposes constraints on the relative positions of contacts in the linear genomic sequence. These constraints became key to finding repeat patterns that are too sparse to be detected easily by sequence analysis alone. A reevaluation of the multi-shell models of capsid geometry revealed a radial level of points located at the contact sites between genomic RNA and capsid protein.20 These points can be connected to form a polyhedron whose vertices mark potential PS-binding sites at the inner capsid shell. By numbering the PSs in order along the linear genomic sequence and connecting the vertices at which they bind, we were able to model constraints on genome organization as paths on the polyhedron. Because each binding site can be occupied only once, these paths are self-avoiding. A path that occupies all possible PS-binding sites, visiting every vertex exactly once, is known as a Hamiltonian path.21 It is also possible that only a subset of the sites is occupied.
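Enumerating such paths is a standard backtracking search. The Python sketch below uses a regular dodecahedron, built from its standard coordinates, as a stand-in for the polyhedron of PS-binding sites; that choice, the starting vertex, and all names are assumptions for illustration. The actual analysis also considers paths that occupy only a subset of the sites and combines the geometry with sequence information, neither of which is attempted here.

```python
import itertools
import math

PHI = (1 + 5 ** 0.5) / 2  # golden ratio


def dodecahedron_graph():
    """Adjacency sets of a regular dodecahedron, with edges found by edge length."""
    pts = list(itertools.product((-1, 1), repeat=3))  # the 8 cube vertices
    for a, b in itertools.product((-1, 1), repeat=2):
        pts += [(0, a / PHI, b * PHI), (a / PHI, b * PHI, 0), (a * PHI, 0, b / PHI)]
    edge = 2 / PHI
    adj = {i: set() for i in range(len(pts))}
    for i, j in itertools.combinations(range(len(pts)), 2):
        if math.isclose(math.dist(pts[i], pts[j]), edge):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hamiltonian_paths(adj, start):
    """Yield every self-avoiding path from `start` that visits all vertices (backtracking)."""
    path, visited = [start], {start}

    def extend():
        if len(path) == len(adj):
            yield list(path)
            return
        for nxt in sorted(adj[path[-1]]):
            if nxt not in visited:  # each binding site may be occupied only once
                path.append(nxt)
                visited.add(nxt)
                yield from extend()
                path.pop()
                visited.remove(nxt)

    yield from extend()


adj = dodecahedron_graph()
paths = list(hamiltonian_paths(adj, start=0))
print(len(paths), "Hamiltonian paths start at vertex 0; one of them:", paths[0])
```

Allowing partial occupancy would amount to also yielding shorter self-avoiding paths instead of only those of full length.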
In order to identify putative PSs in the bacteriophage MS2, we combined this graph-theoretical representation of genome organization with bioinformatics.22 This method is now known as Hamiltonian path analysis.23 A surprising conclusion from this work was that the genomic RNA on the 5' side and on the 3' side of a specific secondary structure element, known as the translational repressor, occupies separate hemispheres of the capsid. This has been confirmed by a recent asymmetric cryo-electron microscopy reconstruction of MS2. All of the PSs identified in that reconstruction are contained in the list of PSs identified via Hamiltonian path analysis.
Packaging Signal–Mediated Assembly
Viral geometry has also played a key role in the discovery of PS-mediated assembly. How do viruses navigate the complex landscape of possible assembly pathways to find the most efficient route to capsid formation? Cyrus Levinthal raised an analogous question for proteins, which fold quickly into their biologically functional state despite an astronomically large number of possible conformations. To answer this question for viruses, our team used Hamiltonian path analysis in combination with stochastic simulations of capsid assembly around coarse-grained models of their genomes.24 This work revealed a surprising result: the high variation in PS sequence around a sparse capsid protein recognition motif, which is the very reason why a PS consensus motif was so difficult to detect, is essential for the mechanism to function. This is because the variation confers different affinities for capsid protein on distinct PSs. Mutation and selection in viral evolution tune PS affinities via such sequence variation toward values that make for efficient assembly. This effect can only be observed at the low protein concentrations characteristic of the early stages of viral replication after infection of a host cell, when capsid protein is still being synthesized; it would be masked at high protein concentrations, such as those traditionally used in in vitro virus assembly experiments. Strong variations around a minimal and noncontiguous sequence motif led virologists astray, and so did the previously unrecognized importance of studying low capsid protein concentrations.
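Why differences in PS affinity show up only at low concentration can be seen in a deliberately simplified picture. The Python sketch below is not the stochastic assembly simulation described above; it swaps in plain equilibrium (Langmuir) binding of capsid protein to three packaging signals whose dissociation constants are hypothetical values chosen for illustration, and all names are ours.

```python
def occupancy(concentration, kd):
    """Equilibrium fraction of a binding site that is occupied (simple 1:1 binding)."""
    return concentration / (concentration + kd)


def occupancy_spread(concentration, kds):
    """Ratio between the most and the least occupied site at a given concentration."""
    occupied = [occupancy(concentration, kd) for kd in kds]
    return max(occupied) / min(occupied)


# Hypothetical dissociation constants (arbitrary units) for three packaging signals.
KDS = [1.0, 10.0, 100.0]

for c in (1.0, 1e4):  # low vs high capsid protein concentration, in the same units as KDS
    print(f"concentration {c:g}: occupancies",
          [round(occupancy(c, kd), 3) for kd in KDS],
          f"spread {occupancy_spread(c, KDS):.2f}x")
```

At the low concentration the strongest site is about 50 times more occupied than the weakest, so binding follows an affinity hierarchy; at the high concentration all three sites are more than 99% occupied and the differences almost vanish.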
The discovery of PS-mediated assembly has important implications for the development of new antiviral therapies. Following the identification of PSs in MS2, a research effort was initiated to identify PSs in a wide range of viruses, including hepatitis B virus, picornaviruses, and, most recently, coronaviruses.25 Novel antiviral strategies are now being explored that inhibit interactions between PSs and capsid proteins by means of low-molecular-weight compounds. Since PSs are dispersed across the genome, resistance is less likely to arise through mutation in response to PS-targeting drugs than in response to therapies directed against traditional, localized protein targets.26 Because different strains within a viral family exhibit the same PS consensus motif, PS-targeting drugs offer an enticing prospect for broad-spectrum antiviral therapy. Our team has demonstrated that it is possible to isolate and optimize the PS-mediated assembly code, resulting in assembly substrates that outcompete wild-type genomes in experiments.27 This has, in turn, paved the way for new technologies that repurpose the PS-mediated assembly code to design viruslike particles for applications in vaccination and drug delivery.
Modeling the Future
Imaging viruses without icosahedral averaging has revealed asymmetric components that break the icosahedral symmetry of the capsid. Asymmetric capsid components play a role in the packaging and release of the genome, and they affect the dynamics and stability of capsids. Modeling the functional consequences of such symmetry breaking will require new developments in mathematics along the lines of our recent work.28 A detailed understanding of viruses at the single-particle level is only a first step in the modeling of viral infections. Viral geometry informs infection models at the intracellular level in a way that can be integrated into multiscale models comprising intercellular and between-host dynamics, as we have recently demonstrated for COVID-19.29 Such models further our understanding of viral infections across different scales and provide a computational platform to study the merits of different antiviral therapies. They also provide insights into the evolution of viral quasispecies: the populations of genetically related mutant viruses that collectively infect their hosts. These models are not only an important step toward understanding how viruses spread and evolve; they also point toward novel antiviral solutions.