Defining the rules for rheostatic modulation of protein function (marked for deletion)
Although dozens of computer algorithms have been developed to predict the outcomes of protein mutations, many improvements are still needed.1,2 This deficit significantly limits protein design, which still relies on brute-force screening of massively mutated libraries. Further, since any two unrelated people have >10,000 amino acid differences, inaccurate prediction algorithms are one of the factors that limit genomic diagnosis rates in personalized medicine to less than 25%.3
Rather than devise another algorithm, we examined several assumptions that underlie many predictions. First, most algorithms use sequence alignments of evolutionarily related proteins as part of their input. Positions that show few changes (conserved positions) are presumed to be critical for function; the rationale is that mutations at conserved positions must have been so damaging that they were selected against. Second, a set of substitution rules (Box 1) derived from decades of laboratory experiments is explicitly or implicitly included in most analyses. These rules are reasonably successful at predicting mutational outcomes at conserved positions, perhaps because laboratory experiments have been highly biased toward conserved positions.2
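To make the conservation assumption concrete, the sketch below scores each column of a sequence alignment by Shannon entropy, one common (but not the only) measure of conservation, and labels columns as conserved or nonconserved. It is an illustration only, not the method used by any particular prediction algorithm; the function names, the 1.0-bit cutoff, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters ('-') are ignored; an all-gap column returns 0.0.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def label_positions(alignment: list[str], threshold: float = 1.0) -> list[str]:
    """Label each alignment column 'conserved' or 'nonconserved'.

    The 1.0-bit cutoff is a hypothetical choice for illustration only.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    labels = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        labels.append("conserved" if column_entropy(column) < threshold else "nonconserved")
    return labels


# Toy alignment of four made-up sequences (not real LacI/GalR data).
toy = ["MKLVA", "MKIVE", "MRLVD", "MKVVQ"]
labels = label_positions(toy)
print(labels)
print(f"nonconserved fraction: {labels.count('nonconserved') / len(labels):.2f}")
```

In this toy alignment, the invariant columns and the mostly invariant K/R column fall below the cutoff, while the L/I/V and A/E/D/Q columns are labeled nonconserved. A conservation-based predictor would treat substitutions at the latter columns as likely tolerated, which is exactly the assumption questioned here.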
In contrast, nonconserved positions are seldom mutated in the lab, even though they often comprise >50% of the positions in a sequence alignment. Moreover, nonconserved positions can make important contributions to function: protein paralogs evolve functional variation via changes at nonconserved positions, and changes at nonconserved positions also cause disease.ref PYK We hypothesized that, for a subset of nonconserved positions, the assumptions above give rise to false mutation predictions.
This hypothesis is supported by our experiments with LacI/GalR homologs.4 Using these proteins, we identified a class of important, nonconserved amino acid positions that does not follow any of the rules in Box 1. Our ongoing experiments show that predictions of substitution outcomes have nearly zero predictive power at these positions. We named this group "rheostat positions" after their most prominent characteristic: when multiple amino acids were substituted at one rheostat position, the functions of the mutant proteins could be rank-ordered to show a progressive effect (Fig. 1A, page 3).
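The rank-ordering described above can be sketched as follows, assuming the function of each variant at a single position has been measured and normalized so that wild-type equals 1.0. This is a hypothetical illustration, not the authors' analysis pipeline: the function names, the "near-dead" and "wild-type-like" cutoffs, and the six values are invented for demonstration and are not measured data.

```python
def rank_substitutions(outcomes: dict[str, float]) -> list[tuple[str, float]]:
    """Return (substituted amino acid, measured function) pairs, lowest to highest."""
    return sorted(outcomes.items(), key=lambda item: item[1])


def intermediate_fraction(outcomes: dict[str, float], wild_type: float = 1.0,
                          dead: float = 0.1, tol: float = 0.2) -> float:
    """Fraction of substitutions that are neither near-dead nor wild-type-like.

    'dead' and 'tol' are hypothetical cutoffs; a high value is one rough signal
    that outcomes spread across a range rather than clustering at the extremes.
    """
    values = list(outcomes.values())
    spread = [v for v in values
              if v > dead and abs(v - wild_type) > tol * wild_type]
    return len(spread) / len(values)


# Hypothetical function values for six substitutions at one position,
# normalized so wild-type = 1.0 (illustrative numbers, not measured data).
outcomes = {"A": 0.95, "G": 0.6, "S": 0.75, "L": 0.3, "D": 0.05, "K": 1.4}
for aa, value in rank_substitutions(outcomes):
    print(aa, value)
print(f"intermediate fraction: {intermediate_fraction(outcomes):.2f}")
```

Sorting the variants exposes the progressive series (D, L, G, S, A, K), and the intermediate fraction summarizes how many variants fall between the extremes. Any real classification would need cutoffs justified by the experimental error of the assay.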
Thus, to improve predictions, we must explicitly consider contributions from rheostat positions. To that end, the following objectives must be addressed: (i) develop a library of rheostat positions to guide and benchmark future computational studies; (ii) develop general methods to identify rheostat positions; (iii) compare and contrast the contributions of rheostat positions in proteins subject to different structural constraints; and (iv) generate the biophysical data required to formulate new substitution rules for rheostat positions. Work is ongoing to study rheostat positions in globular proteins (described in Section 2, below). To enhance the impact of our studies and to expand the range of objectives addressed, we will use the high-risk/high-reward protein targets described below. Aims 1 and 2 will generate a database of experimentally identified rheostat positions and their mutational outcomes in proteins that evolved under varied structural constraints. Aim 3 will evaluate whether the locations of rheostat positions correlate with patterns of change found in sequence alignments.