1. Introduction
Multiple amino acid mutations can interact in biological systems, leading to nonadditive effects termed epistasis. Although the concept of epistasis has been understood in general terms for many years, its prevalence and its importance in biological systems are still a matter of debate1–5. Some argue that it is a major force in evolution, whether by constraining the pathways available for systems to evolve, by counteracting fitness-reducing mutations through compensatory effects, or by contributing to a more rugged fitness landscape6–18. Others have explored the epistatic effects between sets of beneficial mutations, finding that epistasis is pervasive and a key aspect of adaptation, but that it leads to diminishing returns or negative epistasis10,19–22. Studies of RNA viruses have shown that epistasis is prevalent and likely a mechanism for their evolution23–28. Epistasis has also been shown to be a likely contributing factor to drug and antibody resistance in influenza A, HIV-1 and other pathogens12,24,29,30, and to general disease susceptibility in humans31. Finally, the complexity that epistasis adds to understanding mutational effects must be accounted for in protein engineering and design32–35.
For pairs of simultaneous mutations in proteins (we will refer to these as “double mutations”), epistasis can be expressed in terms of free energy differences:
ϵ = ΔΔG₁,₂ − (ΔΔG₁ + ΔΔG₂) (EQ 1)
where ΔΔG₁,₂ is the change in the folding or binding free energy due to the double mutation, and ΔΔG₁ + ΔΔG₂ is the sum of the free energy changes of the constituent single mutations. This nonadditivity can be caused by direct interactions between mutational sites, or by indirect effects such as conformational perturbations. Epistasis is positive when the double mutant is more stabilizing than the sum of the constituent singles (ϵ < 0) and negative when the double mutant is more destabilizing than the sum of the constituent singles (ϵ > 0). Note that, under this convention, the sign of the epistasis is opposite to the sign of ϵ.
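As an illustration of EQ 1 and the sign convention above, the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the procedure used in this study: the names `DoubleMutant`, `epistasis` and `classify` are hypothetical, the free energies are assumed to be in kcal/mol, and the additivity tolerance `tol` is an arbitrary illustrative threshold rather than a value taken from this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleMutant:
    """Free energy changes for one double mutation (assumed units: kcal/mol)."""
    ddg_1: float   # single mutation 1
    ddg_2: float   # single mutation 2
    ddg_12: float  # double mutation


def epistasis(m: DoubleMutant) -> float:
    """EQ 1: double-mutant free energy change minus the sum of the singles."""
    return m.ddg_12 - (m.ddg_1 + m.ddg_2)


def classify(eps: float, tol: float = 0.5) -> str:
    """Label epistasis using the sign convention of the text.

    `tol` is a hypothetical cutoff below which a pair is treated as additive.
    """
    if abs(eps) <= tol:
        return "additive"
    # eps < 0: double mutant more stabilizing than the sum of the singles.
    return "positive" if eps < 0 else "negative"


if __name__ == "__main__":
    example = DoubleMutant(ddg_1=1.0, ddg_2=0.5, ddg_12=3.0)
    eps = epistasis(example)
    print(eps, classify(eps))
```

In this example the double mutant is more destabilizing than the sum of its constituent singles, so ϵ > 0 and the pair is labeled as negative epistasis.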
Despite its importance to understanding biological systems, a comprehensive mechanistic picture of the drivers of epistasis in proteins is still lacking. An early attempt to explain epistasis mechanisms is the study by Wells36, which concluded that features like separation distance, electrostatic interactions, and conformational perturbations were likely contributors. However, this conclusion was based on a small data set containing a total of 12 folding and binding systems, with fewer than 75 total multiple mutations. More recent studies have examined specific protein systems like TEM-1 β-lactamase37,38 and the IgG-binding domain of protein G39, finding pervasive negative epistasis. Long-range epistasis has also received attention. Gromiha et al. proposed that distant residues that are part of a specific local group (which they defined as a rigid cluster) could lead to epistasis40. Other researchers have used tools like molecular dynamics to analyze whether networks of interactions can mediate long-range epistasis41. Classification systems have also been developed: Jemimah et al. used structural features to build a model that classifies whether mutational pairs are additive (i.e., not epistatic)42. These previous studies provide a basis for understanding possible contributors to epistasis, and some even offer predictive capability; however, they do not provide a complete understanding of epistasis mechanisms and their interactions.
In this study, we determine biophysical drivers of pairwise epistasis in protein systems and rank their contribution to the observed epistasis, ϵ (EQ 1). We used protein structural data, protein-protein binding affinities, and protein folding stabilities from the largest, most diverse datasets currently available. We explored possible relationships between the observed epistasis and features that are intrinsic to both the proteins and the mutated residues. A statistical model selection procedure was then used to determine the features that are most important for explaining the observed epistasis. The resulting models for binding and folding have similar, modest predictive power, and both contain similar features, including separation distance and charge interactions. Our work serves as a stepping stone toward a better understanding of the biophysical drivers of epistasis and toward future models with more complex features and interactions.