1. Introduction
Multiple amino acid mutations can interact in biological systems,
leading to nonadditive effects termed epistasis. While a general
understanding of the concept of epistasis has existed for many years,
the prevalence of epistasis, or its importance in biological systems, is
still a matter of
debate1–5.
Some believe it is a major force in evolution, either by constraining
the available pathways for systems to evolve, by counteracting mutations
that reduce fitness through compensatory effects, or by contributing to
a more rugged fitness
landscape6–18.
Others have explored epistasis among sets of beneficial
mutations, finding that it is pervasive and a key aspect of
adaptation, but that it leads to diminishing returns or negative
epistasis10,19,20,20–22.
Other studies using RNA viruses have shown that epistasis is prevalent
and likely a mechanism for their
evolution23–28.
Epistasis has also been shown to be a likely contributing factor to drug
and antibody resistance of influenza A, HIV-1 and other
pathogens12,24,29,30,
and for general disease susceptibility in
humans31.
Finally, the complexity that epistasis adds to understanding
mutation effects must be accounted for in protein engineering and
design32–35.
For pairs of simultaneous mutations in proteins (we will refer to these
as “double mutations”), epistasis can be expressed in terms of free
energy differences:
$$\epsilon = \Delta\Delta G_{1,2} - (\Delta\Delta G_{1} + \Delta\Delta G_{2}) \qquad \text{(EQ 1)}$$
where $\Delta\Delta G_{1,2}$ corresponds to the change in the folding or
binding free energy due to the double mutation, and $\Delta\Delta G_{1} + \Delta\Delta G_{2}$ refers to the sum of the constituent single
mutation free energy changes. This nonadditivity can be caused by direct
interactions between mutational sites, or by indirect effects such as
conformational perturbations. With a negative ΔΔG taken as stabilizing,
epistasis is positive when the double mutant is more stabilizing than
the sum of the constituent singles (ϵ < 0) and negative when the double
mutant is more destabilizing than the sum of the constituent singles
(ϵ > 0).
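To make EQ 1 and its sign convention concrete, the following is a minimal Python sketch. The names `DoubleMutant`, `epistasis`, and `classify`, the kcal/mol unit, and the additivity tolerance `tol` are illustrative assumptions and do not come from the datasets or methods used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleMutant:
    """Free energy changes for a double mutant and its two singles.

    Units are assumed to be kcal/mol; negative values are stabilizing.
    """
    ddg_12: float  # ΔΔG of the double mutant
    ddg_1: float   # ΔΔG of the first single mutant
    ddg_2: float   # ΔΔG of the second single mutant


def epistasis(m: DoubleMutant) -> float:
    """EQ 1: deviation of the double mutant from the additive expectation."""
    return m.ddg_12 - (m.ddg_1 + m.ddg_2)


def classify(eps: float, tol: float = 0.5) -> str:
    """Label a pair using the sign convention in the text.

    tol is a hypothetical threshold below which a pair is called additive;
    it is not a value taken from the source.
    """
    if abs(eps) <= tol:
        return "additive"
    # eps < 0: the double mutant is more stabilizing than the sum of singles.
    return "positive" if eps < 0 else "negative"


# Two destabilizing singles whose combination is less destabilizing
# than their sum.
pair = DoubleMutant(ddg_12=1.0, ddg_1=1.5, ddg_2=1.0)
eps = epistasis(pair)
print(eps, classify(eps))
```

In this example the singles sum to 2.5 while the double mutant measures 1.0, so ϵ is negative and the pair is labeled as positive epistasis.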
Despite its importance to understanding biological systems, a
comprehensive mechanistic picture of the drivers of epistasis in
proteins is still lacking. An early attempt to explain the mechanisms of
epistasis was a study by
Wells36,
which concluded that features like separation distance, electrostatic
interactions, and conformational perturbations were likely contributors.
However, this conclusion was based on a small dataset containing a
total of 12 folding and binding systems, with fewer than 75 total
multiple mutations. More recent studies have examined specific protein
systems like TEM-1
β-lactamase37,38 and the IgG-binding domain of protein
G39,
finding pervasive negative epistasis. Long-range epistasis has also
received attention: Gromiha et al. proposed that distant residues that
are part of a specific local group (which they defined as a rigid
cluster) could lead to
epistasis40.
Other researchers have used tools like molecular dynamics to analyze whether
networks of interactions can mediate long-range
epistasis41.
Classification systems have also been developed. Jemimah et al. used
structural features to build a model to classify whether mutational
pairs would be additive (i.e., not
epistatic)42.
These previous studies provide a basis for understanding possible
contributors to epistasis, and some even offer predictive capability;
however, they do not provide a complete understanding of epistasis
mechanisms and their interactions.
In this study, we determine biophysical drivers of pairwise epistasis in
protein systems and rank their contribution to the observed epistasis, ϵ
(EQ 1). We used protein structural data, protein-protein binding
affinities, and protein folding stabilities from the largest, most
diverse datasets currently available. We explored possible relationships
between the observed epistasis and features that are intrinsic to both
the proteins and the mutated residues. We performed a statistical model
selection procedure to determine the features that are most important
for explaining the observed epistasis. The resulting models for binding
and folding have similar, modest predictive power, and both include
features such as separation distance and charge interactions. Our work
serves as a stepping stone to further our
understanding of the biophysical drivers of epistasis, and to build
future models with more complex features and interactions.
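As a rough illustration of what a statistical model selection over candidate features can look like, the sketch below runs greedy forward selection on ordinary least squares fits scored by AIC. This is an assumption chosen for illustration only: this section does not state which selection procedure or criterion was used, and the feature names and synthetic data are hypothetical.

```python
import numpy as np


def ols_aic(X: np.ndarray, y: np.ndarray) -> float:
    """AIC of an ordinary least squares fit with intercept (Gaussian errors)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X]) if X.shape[1] else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k


def forward_select(X: np.ndarray, y: np.ndarray, names: list[str]) -> list[str]:
    """Greedily add the feature that lowers AIC most; stop when none does."""
    chosen: list[int] = []
    best = ols_aic(X[:, chosen], y)  # intercept-only baseline
    while len(chosen) < X.shape[1]:
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        scores = {j: ols_aic(X[:, chosen + [j]], y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        chosen.append(j_best)
        best = scores[j_best]
    return [names[j] for j in chosen]


# Synthetic stand-in data: epsilon depends on two of three hypothetical features.
rng = np.random.default_rng(0)
features = ["separation_distance", "charge_product", "unrelated_feature"]
X = rng.normal(size=(200, 3))
eps = 0.8 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
print(forward_select(X, eps, features))
```

Features are returned in the order they were added, which gives a simple ranking by marginal contribution; other criteria (for example BIC or cross-validated error) would slot into the same loop.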