Amino Acid Properties
To investigate the effect of electrostatics on epistasis, we classified
amino acids as positively charged (+), negatively charged (-), or
neutral (0). Incorporating every wildtype-mutant pair state would be
infeasible due to overparameterization, as it would result in
3⁴ = 81 possible categories (++;--, ++;-+, ++;+-,
+-;--, …). To avoid overparameterization, we explored various
abstractions of these data and incorporated them into our model selection
process (detailed below). The resulting charge contribution was given by
a simplified charge-interaction scheme in which each pair belongs to one of
three categories: attractive (+- or -+, denoted “A”), repulsive (--
or ++, denoted “R”), and neutral (all other cases, denoted “0”). The
order of the wildtype and mutant states was ignored (e.g.
0;A = A;0), resulting in four categories, 0A, 0R, AR, and 00, which capture
all possible electrostatic interactions. Note that the AR case was not
present in either dataset.
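A minimal Python sketch of this charge abstraction, assuming each wildtype or mutant pair is encoded as a two-character string over “+”, “-”, and “0” (a hypothetical encoding chosen for illustration). The names pair_interaction and charge_category are likewise hypothetical. Each pair is first classified as A, R, or 0, and the two labels are then sorted in a fixed order so that 0;A and A;0 yield the same category.

```python
from itertools import product

# Raw wildtype;mutant charge states: 3 symbols at 4 residues gives 3**4 = 81.
RAW_CHARGE_STATES = list(product("+-0", repeat=4))

# Fixed display order so labels come out as "0A", "0R", "AR", "00".
_ORDER = "0AR"


def pair_interaction(pair: str) -> str:
    """Classify a two-residue charge state such as '+-' or '0-' as A, R, or 0."""
    if len(pair) != 2 or any(c not in "+-0" for c in pair):
        raise ValueError(f"expected two symbols from '+-0', got {pair!r}")
    a, b = pair
    if a == "0" or b == "0":
        return "0"  # at least one neutral residue: no charge-charge interaction
    return "A" if a != b else "R"  # opposite charges attract, like charges repel


def charge_category(wildtype: str, mutant: str) -> str:
    """Order-independent label for the wildtype;mutant interaction pair.

    Note: the text lists only 0A, 0R, AR and 00; it does not say how states
    such as +-;-+ (AA) or ++;-- (RR) are labeled, so this sketch returns
    'AA' or 'RR' for them unchanged.
    """
    labels = sorted((pair_interaction(wildtype), pair_interaction(mutant)), key=_ORDER.index)
    return "".join(labels)


print(len(RAW_CHARGE_STATES))        # 81
print(charge_category("+-", "00"))   # 0A
print(charge_category("00", "-+"))   # 0A (order ignored)
print(charge_category("--", "+-"))   # AR
```

Sorting by a fixed key, rather than storing both orderings, is one simple way to make the category symmetric in wildtype and mutant.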
To include the change in size for the constituent amino acids, we used
the van der Waals volume in Å³. To capture the net effect of the
change in size at both sites, we used the following metric, referred to as
size_net:
$$\mathrm{size}_{\mathrm{net}} = \left|\mathrm{size}_{m1} - \mathrm{size}_{wt1}\right| + \left|\mathrm{size}_{m2} - \mathrm{size}_{wt2}\right| \tag{2}$$
where wt and m denote the wildtype and mutant amino acids,
respectively, and 1 and 2 denote the two amino acid sites in arbitrary
order. Because the absolute changes at the two sites are summed, size_net
is large whenever one or both sites undergo a large change in volume and
small only when both changes are small; changes in opposite directions
(one residue growing while the other shrinks) do not cancel.
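A direct transcription of EQ 2 as a Python function, assuming the four van der Waals volumes have already been looked up. The function name and the example volumes are hypothetical placeholders, not values from the datasets. The example contrasts size_net with a signed sum to show why opposing changes do not cancel.

```python
def size_net(wt_volumes: tuple[float, float], mut_volumes: tuple[float, float]) -> float:
    """EQ 2: summed absolute volume change over the two sites (site order arbitrary)."""
    (wt1, wt2), (m1, m2) = wt_volumes, mut_volumes
    return abs(m1 - wt1) + abs(m2 - wt2)


# Placeholder volumes in cubic angstroms (illustrative numbers, not real residue data):
# site 1 grows by 50 while site 2 shrinks by 50.
wt = (100.0, 150.0)
mut = (150.0, 100.0)
print(size_net(wt, mut))                      # 100.0
print((mut[0] - wt[0]) + (mut[1] - wt[1]))    # 0.0, a signed sum would cancel
```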
To include the effect of hydrophobicity, each residue was classified as
either “H” (hydrophobic) or “P” (polar). Using all 16 possible
categories (2⁴ wildtype-mutant pair states) would be possible but would
risk overfitting. We instead adopted the following abstraction: a boolean
value (“0” or “1”) that denotes whether the net hydrophobicity of the pair
changed upon mutation. For example, HP;PH gives 0 because the net
hydrophobicity remains the same, whereas PP;HP and PP;HH both give 1
because the net hydrophobic state changes upon mutation.
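A minimal Python sketch of this boolean feature, assuming “net hydrophobicity” means the number of H residues in the two-site pair. This is an interpretation assumed here; it reproduces the three examples above. The function name and the two-character string encoding are hypothetical.

```python
from itertools import product

# Raw wildtype;mutant hydrophobicity states: 2 symbols at 4 residues gives 2**4 = 16.
RAW_HYDRO_STATES = list(product("HP", repeat=4))


def hydrophobicity_changed(wildtype: str, mutant: str) -> int:
    """Return 1 if the number of hydrophobic residues in the pair changes, else 0.

    Assumes 'net hydrophobicity' means the count of 'H' residues in the pair.
    """
    for pair in (wildtype, mutant):
        if len(pair) != 2 or any(c not in "HP" for c in pair):
            raise ValueError(f"expected two symbols from 'HP', got {pair!r}")
    return int(wildtype.count("H") != mutant.count("H"))


print(len(RAW_HYDRO_STATES))                 # 16
print(hydrophobicity_changed("HP", "PH"))    # 0
print(hydrophobicity_changed("PP", "HP"))    # 1
print(hydrophobicity_changed("PP", "HH"))    # 1
```

Under this reading the 16 raw states collapse to a single binary feature, which is what keeps the parameter count low.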