Amino Acid Properties
To investigate the effect of electrostatics on epistasis, we classified amino acids as positively charged (+), negatively charged (-), or neutral (0). Incorporating every wildtype-mutant pair state would be infeasible because of overparameterization: it would result in 3^4 = 81 possible categories (++;--, ++;-+, ++;+-, +-;--, …). To avoid this, we explored various abstractions of the data as part of our model selection process (detailed below). The resulting charge contribution was given by a simplified charge-interaction scheme in which each pair belongs to one of three categories: attractive (+- or -+, denoted “A”), repulsive (-- or ++, denoted “R”), and neutral (all other cases, denoted “0”). A wildtype-mutant state and its reverse were classified as the same (e.g. 0;A = A;0), resulting in four categories that capture all possible electrostatic interactions: 0A, 0R, AR, and 00. Note that the AR case was not present in either dataset.
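The classification above can be sketched in Python as follows. This is a minimal illustration, not the code used in the analysis. The function names `pair_interaction` and `charge_category` are hypothetical, and residue charges are passed in directly as '+', '-', or '0', since the assignment of charges to specific amino acids is not restated here. Because the text does not say how same-state pairs such as A;A are binned, the sketch simply returns the sorted two-letter label.

```python
from itertools import product

def pair_interaction(c1: str, c2: str) -> str:
    """Classify one residue pair from its two charges ('+', '-' or '0')."""
    if "0" in (c1, c2):
        return "0"  # any pair containing a neutral residue is neutral
    return "R" if c1 == c2 else "A"  # like charges repel, unlike charges attract

def charge_category(wt: str, mut: str) -> str:
    """Order-independent label for a wildtype;mutant pair, e.g. ('+-', '00') -> '0A'."""
    # Sorting makes 0;A and A;0 collapse to the same label.
    states = sorted([pair_interaction(*wt), pair_interaction(*mut)])
    return "".join(states)

# Size of the unabstracted state space: 3 charges at each of 4 positions.
n_full = len(list(product("+-0", repeat=4)))
```

Sorting the two interaction states is one simple way to make the label independent of which state is wildtype and which is mutant, matching the 0;A = A;0 convention in the text.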
To include the change in size of the constituent amino acids, we used the van der Waals volume in Å³. To capture the net effect of the change in size at both sites, we used the following metric, referred to as size_net:
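$$\mathrm{size}_{\mathrm{net}} = \left|\mathrm{size}_{m1} - \mathrm{size}_{wt1}\right| + \left|\mathrm{size}_{m2} - \mathrm{size}_{wt2}\right| \qquad \text{(EQ 2)}$$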
where wt and m correspond to the wildtype and mutant amino acids, respectively, and 1 and 2 denote the two amino acid sites in arbitrary order. Under this scheme, the metric is large when one or both sites undergo a large change in volume and small when both changes are small. Because absolute values are summed, changes in opposing directions (one site growing while the other shrinks) do not cancel.
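As a small worked illustration of EQ 2, the sketch below computes the metric from a lookup table of volumes. The function name `size_net`, the `volume` argument, and the residue labels X, Y, and Z are hypothetical. The numbers are placeholders chosen to make the arithmetic easy to follow and are not real van der Waals volumes.

```python
def size_net(wt1, m1, wt2, m2, volume):
    """EQ 2: summed absolute volume change at the two mutated sites."""
    return abs(volume[m1] - volume[wt1]) + abs(volume[m2] - volume[wt2])

# Placeholder volumes for three made-up residue types (NOT real values).
toy_volume = {"X": 50.0, "Y": 100.0, "Z": 150.0}

# Site 1 grows by 50 (X -> Y) and site 2 shrinks by 50 (Z -> Y):
# the two changes add rather than cancel.
opposing = size_net("X", "Y", "Z", "Y", toy_volume)

# No change in volume at either site.
unchanged = size_net("X", "X", "Z", "Z", toy_volume)
```

Swapping which site is labelled 1 and which is labelled 2 leaves the result unchanged, consistent with the sites being taken in arbitrary order.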
To include the effect of hydrophobicity, each residue was classified as either “H” (hydrophobic) or “P” (polar). Using all 16 possible wildtype-mutant categories would be feasible but would risk overfitting. We instead adopted the following abstraction: a boolean value (“0” or “1”) that denotes whether the net hydrophobicity of the pair changed upon mutation. For example, HP;PH gives 0, since the net hydrophobicity remains the same. By contrast, PP;HP and PP;HH both give 1, since the net hydrophobic state changes upon mutation.
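The boolean abstraction can be sketched as below. The sketch assumes that "net hydrophobicity" means the number of hydrophobic residues in the pair, which is an interpretation consistent with the three examples given but is not stated explicitly in the text. The function name `hydrophobicity_changed` is hypothetical.

```python
from itertools import product

def hydrophobicity_changed(wt: str, mut: str) -> int:
    """Return 1 if the count of hydrophobic ('H') residues differs
    between the wildtype and mutant pairs, else 0.
    Assumption: net hydrophobicity = number of 'H' residues in the pair."""
    return int(wt.count("H") != mut.count("H"))

# Size of the unabstracted state space: H or P at each of 4 positions.
n_full_hp = len(list(product("HP", repeat=4)))
```

Under this reading, HP;PH maps to 0 because both pairs contain one hydrophobic residue, while PP;HP and PP;HH map to 1.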