
The principles of protein assembly
A knowledge of protein structure greatly enhances our understanding of protein function. In many cases, function depends on oligomerization. Ahnert et al. used mass spectrometry data together with a large-scale analysis of structures of protein complexes to examine the fundamental steps of protein assembly. Systematically combining assembly steps revealed a large set of quaternary topologies that were organized into a periodic table. Based on this table, the authors accurately predicted the expected frequencies of quaternary structure topologies.
Science, this issue p. 10.1126/science.aaa2245
Structured Abstract
INTRODUCTION
The assembly of proteins into complexes is crucial for most biological processes. The three-dimensional structures of many thousands of homomeric and heteromeric protein complexes have now been determined, and this has had a broad impact on our understanding of biological function and evolution. Despite this, the organizing principles that underlie the great diversity of protein quaternary structures observed in nature remain poorly understood, particularly in comparison with protein folds, which have been extensively classified in terms of their architecture and evolutionary relationships.
RATIONALE
In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization. Our approach was to consider protein complexes in terms of their assembly. Many protein complexes assemble spontaneously via ordered pathways in vitro, and these pathways have a strong tendency to be evolutionarily conserved. Furthermore, there are strong similarities between protein complex assembly and evolutionary pathways, with assembly pathways often being reflective of evolutionary histories, and vice versa. This suggests that it may be useful to consider the types of protein complexes that have evolved from the perspective of what assembly pathways are possible.
RESULTS
We first examined the fundamental steps by which protein complexes can assemble, using electrospray mass spectrometry experiments, literature-curated assembly data, and a large-scale analysis of protein complex structures. We found that most assembly steps can be classified into three basic types: dimerization, cyclization, and heteromeric subunit addition. By systematically combining different assembly steps in different ways, we were able to enumerate a large set of possible quaternary structure topologies, or patterns of key interfaces between the proteins within a complex. The vast majority of real protein complex structures lie within these topologies. This enables a natural organization of protein complexes into a “periodic table,” because each heteromer can be related to a simpler symmetric homomer topology. Exceptions are mostly the result of quaternary structure assignment errors, or cases where sequence-identical subunits can have different interactions and thus introduce asymmetry. Many of these asymmetric complexes fit the paradigm of a periodic table when their assembly role is considered. Finally, we implemented a model based on the periodic table, which predicts the expected frequencies of each quaternary structure topology, including those not yet observed. Our model correctly predicts quaternary structure topologies of recent crystal and electron microscopy structures that are not included in our original data set.
CONCLUSION
This work explains much of the observed distribution of known protein complexes in quaternary structure space and provides a framework for understanding their evolution. In addition, it can contribute considerably to the prediction and modeling of quaternary structures by specifying which topologies are most likely to be adopted by a complex with a given stoichiometry, potentially providing constraints for multi-subunit docking and hybrid methods. Lastly, it could help in the bioengineering of protein complexes by identifying which topologies are most likely to be stable, and thus which types of essential interfaces need to be engineered.
Abstract
Structural insights into protein complexes have had a broad impact on our understanding of biological function and evolution. In this work, we sought a comprehensive understanding of the general principles underlying quaternary structure organization in protein complexes. We first examined the fundamental steps by which protein complexes can assemble, using experimental and structure-based characterization of assembly pathways. Most assembly transitions can be classified into three basic types, which can then be used to exhaustively enumerate a large set of possible quaternary structure topologies. These topologies, which include the vast majority of observed protein complex structures, enable a natural organization of protein complexes into a periodic table. On the basis of this table, we can accurately predict the expected frequencies of quaternary structure topologies, including those not yet observed. These results have important implications for quaternary structure prediction, modeling, and engineering.
Figures
Protein assembly steps lead to a periodic table of protein complexes and can predict likely quaternary structure topologies. Three main assembly steps are possible: cyclization, dimerization, and subunit addition. By combining these in different ways, a large set of possible quaternary structure topologies can be generated. These can be arranged on a periodic table that describes most known complexes and that can predict previously unobserved topologies.
Fig. 1 Mass spectrometry characterization of heteromer (dis)assembly pathways. For each characterized complex, the known three-dimensional structure is shown with a representative mass spectrum, accompanied by graph representations of the full complex and subcomplexes. In all cases, the full complex is represented by the rightmost graph. A full list of subcomplexes is provided in table S1. The structures of 3DVA, 3O8O, and 4B7Y shown here differ from those in the PDB: 3DVA is missing the γ subunit, because it was not present in our sample, and the 4:4 model of 3O8O and the 4:2 model of 4B7Y were built from the unit cell to match the mass spectrometry data. Colors in the graph representations indicate homomeric isologous (green), homomeric heterologous (blue), and heteromeric heterologous (red) interfaces; shapes indicate different subunit types.
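The three interface classes used in the graph representations can be related to one another with a minimal Python sketch that assigns a class to a contact between two subunits. The `Contact` type and the patch labels (`p1`, `p2`) are hypothetical illustrations and are not part of the authors' method. The sketch assumes only the standard definitions: an isologous interface uses the same surface on both partners, and a heterologous interface uses different surfaces.

```python
from dataclasses import dataclass

# Colour scheme taken from the Fig. 1 legend.
COLOURS = {
    "homomeric isologous": "green",
    "homomeric heterologous": "blue",
    "heteromeric heterologous": "red",
}


@dataclass(frozen=True)
class Contact:
    """One side of an interface: the subunit type and the surface patch it uses.

    Patch labels are hypothetical placeholders; the source does not define them.
    """
    subunit_type: str
    patch: str


def classify_interface(a: Contact, b: Contact) -> str:
    """Classify an interface into the three categories used in Fig. 1."""
    if a.subunit_type != b.subunit_type:
        # Two different proteins cannot present the same surface patch,
        # so heteromeric interfaces are always heterologous.
        return "heteromeric heterologous"
    if a.patch == b.patch:
        # Same surface on both copies: a symmetric, two-fold contact.
        return "homomeric isologous"
    # Different surfaces of the same protein: the head-to-tail contact
    # that allows rings (cyclic complexes) to form.
    return "homomeric heterologous"


kind = classify_interface(Contact("A", "p1"), Contact("A", "p2"))
print(kind, COLOURS[kind])  # homomeric heterologous blue
```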
Fig. 2 Types of assembly steps observed in homomeric and heteromeric complexes. (A) The five possible types of assembly steps. (B and C) Distribution of observed assembly steps for homomers and heteromers from mass spectrometry experiments, from assembly pathways identified in the literature, and from complexes with varying quaternary structures in the PDB. Error bars represent 68% Clopper-Pearson confidence intervals.
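The 68% Clopper-Pearson intervals in Fig. 2 are exact binomial confidence intervals. A 68% level corresponds roughly to one standard error. The sketch below computes them from first principles by bisection on the binomial tail probabilities, so it needs only the standard library. The function names are illustrative, and in practice a statistics package would normally be used instead.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target: float, increasing: bool, iters: int = 100) -> float:
    """Find p in [0, 1] with f(p) == target for a function f monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        too_high = f(mid) > target
        # For increasing f, overshooting means the root lies to the left.
        if too_high == increasing:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, confidence: float = 0.68) -> tuple[float, float]:
    """Exact (Clopper-Pearson) interval for k successes out of n trials."""
    alpha = 1 - confidence
    if k == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= k) = alpha / 2; this tail grows with p.
        lower = _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    if k == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= k) = alpha / 2; this tail shrinks with p.
        upper = _bisect(lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper
```

For example, `clopper_pearson(12, 40)` gives the 68% interval for a hypothetical case in which 12 of 40 observed assembly steps are of one type. These counts are made up for illustration.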
Fig. 3 Three assembly transitions give rise to the topological space of protein complexes. These transitions are cyclization via a homomeric heterologous interface (blue), dimerization via a homomeric isologous interface (green), and subunit addition via a heteromeric heterologous interface (red). We enumerated all possible topologies arising from these steps by calculating all ways in which a cyclic or dihedral interface can be distributed across a heteromer with 1:1 stoichiometry. For heterodimers, there are two such ways for both the cyclization and the dimerization steps. For heterotrimers, there are four such ways for each step. In the graph representation of the enumeration step, the possible locations of the distributed interfaces are indicated by colored dots.
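The three transitions can be illustrated as operations on a labelled graph, with subunits as nodes and interfaces as typed edges. This is a simplified sketch under stated assumptions and is not the authors' enumeration procedure. The `site` arguments play the role of the coloured dots in Fig. 3, marking where the new interface is placed. Cyclization is restricted to rings of three or more copies, and dimerization pairs each chosen site with its counterpart in the second copy. The sketch does not attempt to reproduce the counts given in the caption. The names `Topology`, `cyclize`, `dimerize` and `add_subunit` are hypothetical.

```python
from dataclasses import dataclass, field

ISO = "homomeric isologous"               # green in the figures
HETERO = "homomeric heterologous"         # blue
HETEROMERIC = "heteromeric heterologous"  # red


@dataclass
class Topology:
    types: list[str]  # subunit type of each node
    edges: set[tuple[int, int, str]] = field(default_factory=set)

    def copies(self, k: int) -> "Topology":
        """k disconnected copies; node i of copy c gets index c * len(types) + i."""
        n = len(self.types)
        return Topology(
            types=self.types * k,
            edges={(a + c * n, b + c * n, kind) for c in range(k) for a, b, kind in self.edges},
        )


def add_subunit(t: Topology, partner: int, new_type: str) -> Topology:
    """Subunit addition: attach one subunit of a new type via a heteromeric interface."""
    assert new_type not in t.types, "subunit addition introduces a new type"
    out = Topology(t.types + [new_type], set(t.edges))
    out.edges.add((partner, len(t.types), HETEROMERIC))
    return out


def cyclize(t: Topology, site: int, n: int) -> Topology:
    """Cyclization: n copies joined head-to-tail through `site` (simplified: n >= 3)."""
    assert n >= 3, "this sketch only builds rings of three or more copies"
    size = len(t.types)
    out = t.copies(n)
    for c in range(n):
        out.edges.add((site + c * size, site + ((c + 1) % n) * size, HETERO))
    return out


def dimerize(t: Topology, sites: list[int]) -> Topology:
    """Dimerization: two copies paired by a two-fold isologous contact at each site."""
    size = len(t.types)
    out = t.copies(2)
    for s in sites:
        out.edges.add((s, s + size, ISO))
    return out


monomer = Topology(["A"])
ring = cyclize(monomer, site=0, n=4)       # four-membered ring
d4 = dimerize(ring, sites=list(range(4)))  # two stacked rings
print(len(d4.types), len(d4.edges))  # 8 12
```

Cyclizing a monomer four times and then dimerizing every subunit of the ring yields an eight-subunit graph with eight heterologous and four isologous edges, a D4-like arrangement. Calling `add_subunit` before `cyclize` gives a heteromeric ring in which only the A subunits carry the cyclic interface.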
Fig. 4 Frequencies of protein complex types and their quaternary structure assignment error rates. Among nonbijective heteromers, we further distinguished between those with even stoichiometry and those with uneven stoichiometry. The former are much more likely to be the result of quaternary structure assignment errors. The latter are more likely to represent a biologically relevant quaternary structure. In the last column, we give alternative error rates in brackets that exclude the PiQSi (17) error assignments “probably yes” and “probably no” from the analysis. These error rates follow the same pattern for nonbijective heteromers of both even and uneven stoichiometries.
Fig. 5 Periodic table of protein complexes. All bijective protein complex topologies can be arranged according to the number of different subunit types (s) and the number of times these subunits are repeated (r). Isologous interfaces between the same subunits (dihedral interfaces) are shown in green, and heterologous interfaces between subunits of the same type (cyclic interfaces) are shown in blue. Heteromeric interfaces are shown in red, apart from those that correspond to a symmetric dimerization (yellow) or to higher-order cyclization (purple). The topologies in the s = 1 row are the equivalent homomers of the heteromeric structures in the s > 1 rows. To clarify this equivalence, subunits in the heteromers are grouped according to the repeated subcomplexes. In addition, the yellow and purple interfaces of the heteromeric complexes highlight interfaces that are dihedral (green) and cyclic (blue) in the equivalent homomers. The ratio in the bottom right of each cell indicates the number of topologies that have been observed and the total number of possible topologies of this type. The table shown here is an excerpt (s < 5; r < 13) of the full table. An interactive version of this table with information on the structures represented by each topology can be found at http://www.periodicproteincomplexes.org/. [Inset (A)] Number of discovered topologies as a function of time, which has been increasing steadily at a rate of about four topologies per year for the past two decades. [Inset (B)] An illustration of observed topologies versus all possible topologies with six repeats and two subunits (r = 6; s = 2). Three of the five possible topologies have been observed thus far.
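Bijective complexes have every subunit type present in the same number of copies, so the cell of the table that a stoichiometry falls into can be read off directly. The helper functions below are hypothetical illustrations. They map a stoichiometry to (s, r), return None for nonbijective cases, and report the s = 1 cell in the same column, which holds the equivalent homomer. They locate the cell only and do not determine which topology within that cell a complex adopts.

```python
from typing import Dict, Optional, Tuple


def periodic_table_cell(stoichiometry: Dict[str, int]) -> Optional[Tuple[int, int]]:
    """Return (s, r) for a bijective complex, or None if it is nonbijective.

    `stoichiometry` maps each subunit type to its copy number, e.g. {"A": 2, "B": 2}.
    """
    copy_numbers = set(stoichiometry.values())
    if len(copy_numbers) != 1:
        # Empty input, or subunit types present in unequal numbers (nonbijective).
        return None
    r = copy_numbers.pop()   # number of times each subunit type is repeated
    s = len(stoichiometry)   # number of distinct subunit types
    return s, r


def equivalent_homomer_cell(cell: Tuple[int, int]) -> Tuple[int, int]:
    """The s = 1 cell in the same column, i.e. the equivalent homomer."""
    _, r = cell
    return 1, r


def in_excerpt(cell: Tuple[int, int]) -> bool:
    """True if the cell lies in the excerpt shown in Fig. 5 (s < 5; r < 13)."""
    s, r = cell
    return s < 5 and r < 13


cell = periodic_table_cell({"A": 6, "B": 6})  # the r = 6, s = 2 cell of Inset (B)
print(cell, equivalent_homomer_cell(cell), in_excerpt(cell))  # (2, 6) (1, 6) True
```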
Fig. 6 The top 20 most likely quaternary structure topologies from our model that are not observed in the main data set. Of these top 20, six are observed in the extended data set, validating the power of the model (P = 2 × 10⁻⁶). The other 14 topologies in the top 20 are also expected to occur relatively frequently in nature and thus to be observed soon in experimentally determined structures. The distribution of all new topologies observed in the extended data set compared with the expected frequencies of all predicted topologies is shown in fig. S7.
Additional Files
- Principles of assembly reveal a periodic table of protein complexes
Sebastian E. Ahnert, Joseph A. Marsh, Helena Hernández, Carol V. Robinson, Sarah A. Teichmann