2.1: Curating experimental data
Experimental binding affinity data were obtained from SKEMPI v2.0 [43] and folding stability data from ProTherm [44].
Because our study focuses on pairwise epistasis, we extracted the subset of
data points for which free energy values were available for both a double
mutant and each of its two constituent single mutants. For both the folding
and binding data, values were converted to kcal/mol, and a temperature of
298 K was assumed wherever the dataset did not specify one. Where a mutation
had multiple reported free energy values, those values were averaged.
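The unit handling and averaging steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes three things: binding data arrive as mutant and wild-type dissociation constants (K_d) and are converted with ddG = RT ln(K_d,mut / K_d,wt); folding values reported in kJ/mol are divided by 4.184; and records are keyed by a (PDB ID, mutation) pair. The helper names `binding_ddg`, `kj_to_kcal` and `average_duplicates`, and the PDB ID `1ABC` in the usage note, are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)
DEFAULT_T = 298.0      # K, assumed when a record reports no temperature
KJ_PER_KCAL = 4.184    # 1 kcal = 4.184 kJ


def binding_ddg(kd_mut, kd_wt, temperature=None):
    """Hypothetical helper: binding ddG (kcal/mol) from mutant and wild-type Kd.

    ddG = RT ln(Kd_mut / Kd_wt); a positive value means the mutant binds
    more weakly. Falls back to 298 K when no temperature is given.
    """
    t = DEFAULT_T if temperature is None else temperature
    return R_KCAL * t * math.log(kd_mut / kd_wt)


def kj_to_kcal(value_kj):
    """Convert a free energy value from kJ/mol to kcal/mol."""
    return value_kj / KJ_PER_KCAL


def average_duplicates(records):
    """Average all free energy values reported for the same (pdb_id, mutation).

    `records` is an iterable of (pdb_id, mutation, ddg_kcal) tuples; the
    result maps each (pdb_id, mutation) key to the mean of its values.
    """
    grouped = defaultdict(list)
    for pdb_id, mutation, ddg in records:
        grouped[(pdb_id, mutation)].append(ddg)
    return {key: mean(values) for key, values in grouped.items()}
```

For example, `average_duplicates([("1ABC", "Y33A", 1.0), ("1ABC", "Y33A", 2.0)])` returns `{("1ABC", "Y33A"): 1.5}`. Keying on the (PDB ID, mutation) pair keeps the same substitution in different complexes from being merged.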
The attributes of the resulting curated folding and binding datasets
include the PDB ID, the protein complex name, the mutation(s), and the
binding or folding free energy value. In total, the binding dataset
contains 572 double-mutant data points (each with its constituent single
mutants) from 58 protein-protein complexes, and the folding dataset
contains 204 such data points from 30 protein systems. Epistasis was
calculated for each double-mutant data point using Eq. 1, that is, as the
difference between the free energy change due to the double mutation and
the sum of the free energy changes due to the two constituent single
mutations. Protein structures used for the analysis were obtained from the
RCSB Protein Data Bank (PDB) [45].
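The subset extraction and the epistasis calculation described above can be combined in one pass. For a double mutant AB with constituent singles A and B, the calculation is ε = ΔΔG_AB − (ΔΔG_A + ΔΔG_B).

The sketch below rests on several assumptions. It takes the averaged ΔΔG values (kcal/mol) in a dictionary keyed by (PDB ID, mutation string). It assumes, hypothetically, that double mutants are written as two single mutations joined by a comma (e.g. `Y33A,K45A`). The helper names `split_mutation` and `epistasis_table` and the PDB ID `1ABC` are illustrative only.

```python
def split_mutation(mutation):
    """Split a mutation string into its single mutations.

    Assumes (hypothetically) that single mutations are comma-separated,
    e.g. 'Y33A,K45A' -> ('Y33A', 'K45A').
    """
    return tuple(part.strip() for part in mutation.split(","))


def epistasis_table(ddg):
    """Compute epistasis for every double mutant whose two singles are present.

    `ddg` maps (pdb_id, mutation_string) -> averaged ddG in kcal/mol.
    Returns {(pdb_id, double_mutation): epistasis}, where
    epistasis = ddG_double - (ddG_single_1 + ddG_single_2).
    Double mutants lacking data for either single mutant are skipped.
    """
    result = {}
    for (pdb_id, mutation), ddg_double in ddg.items():
        parts = split_mutation(mutation)
        if len(parts) != 2:
            continue  # keep only pairwise (double) mutants
        singles = [(pdb_id, part) for part in parts]
        if all(single in ddg for single in singles):
            ddg_sum = ddg[singles[0]] + ddg[singles[1]]
            result[(pdb_id, mutation)] = ddg_double - ddg_sum
    return result
```

With `{("1ABC", "Y33A"): 1.0, ("1ABC", "K45A"): 0.5, ("1ABC", "Y33A,K45A"): 2.0, ("1ABC", "Y33A,D50A"): 1.2}`, only `("1ABC", "Y33A,K45A")` is kept, with epistasis 0.5 kcal/mol. The double mutant containing D50A is dropped because that single mutant has no measurement. This mirrors the requirement that every retained double mutant have data for both constituent singles.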