| Title: | Protein flexibility and rigidity predicted from sequence |
| Author: | Avner Schlessinger & Burkhard Rost |
| Quote: | Proteins, 2005, 61:115-126 |
Protein flexibility and rigidity predicted from sequence
| 1 | CUBIC, Dept. of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA |
| 2 | Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA |
| 3 | North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA |
| * | Corresponding author: as2067@columbia.edu URL http://cubic.bioc.columbia.edu/ Tel: +1-212-305-4018, fax: +1-212-305-7932 |
This article is published in (Proteins, issue, 2005 and pages) © copyright Proteins: Structure, Function, and Bioinformatics Wiley (2005). Wiley is the only authorized source. All copying of this article including placing on another website requires the written permission of the copyright owner.
Structural flexibility has been associated with various biological processes such as molecular recognition and catalytic activity. In silico studies of protein flexibility have attempted to characterize and predict flexible regions based on simple principles. B-values derived from experimental data are widely used to measure residue flexibility. Here, we present the most comprehensive large-scale analysis of B-values. We used this analysis to develop a neural network-based method that predicts flexible/rigid residues from amino acid sequence. The system uses both global and local information, i.e. features from the entire protein such as secondary structure composition, protein length, and fraction of surface residues, and features from a local window of sequence-consecutive residues. The most important local feature was the evolutionary exchange profile reflecting sequence conservation in a family of related proteins. To illustrate its potential, we applied our method to four different case studies, each of which related our predictions to aspects of function. The first two were the prediction of regions that undergo conformational switches upon environmental changes (switch II region in Ras) and the prediction of surface regions the rigidity of which is crucial for their function (tunnel in propeller folds). Both were correctly captured by our method. The third study established that residues in active sites of enzymes are predicted by our method to have unexpectedly high B-values. The final study demonstrated how well our predictions correlated with NMR order parameters to reflect motion. Our method had not been set up to address any of the tasks in those four case studies. Therefore, we expect that this method will assist in many attempts at inferring aspects of function.
Key words: flexibility prediction, protein dynamics, protein motion, protein structure prediction, solvent accessibility, multiple alignments, secondary structure prediction, protein function prediction, enzyme active sites, conformational switch.
| 1D structure | one-dimensional (e.g. sequence or string of residue secondary structure or numbers for residue solvent accessibility) |
| 3D structure | three-dimensional structure, i.e. co-ordinates of all residues/atoms in a protein |
| Angstroem | =0.1 nm |
| B-value or B-factor | 'Temperature'- or 'Debye-Waller'-factor that describes the degree to which the electron density in an X-ray image of a residue is dispersed |
| Bnorm | normalized B-value (eqn. 1 for experimental and eqn. 2 for predictions) |
| Dunker-disorder | residues that are not visible in the X-ray electron density map are defined as "disordered" regions by the Dunker group [1] |
| DSSP | automatic assignment of secondary structure and solvent accessibility from 3D coordinates [2] |
| NORS | long regions with no regular secondary structure [3, 4] |
| PDB | protein data bank of protein structures [5, 6] |
| PROFacc | profile-based neural network prediction of solvent accessibility [7, 8, 9] |
| PROFsec | profile-based neural network prediction of secondary structure [7, 8, 9] . |
Protein flexibility is related to function. Proteins are dynamic molecules that are in constant motion. The structural flexibility that enables this motion has been associated with various biological processes such as molecular recognition and catalytic activity [10, 11, 12, 13, 14, 1, 15, 16, 17, 18, 3, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]. In fact, even such a coarse-grained aspect of protein structure as the secondary structure assigned from X-ray crystals of proteins captures flexibility relevant for protein function [29].
Flexible regions can be predicted from sequence. In silico studies have attempted to characterize and predict flexible regions from the amino acid sequence. Different groups used different definitions for flexibility. On a very coarse-grained level, all regions with high net charge and low hydrophobicity were considered to be natively unfolded [30]. The rationale for this assumption is that repulsion between like charges and the reduced "folding driving force" in regions of low hydrophobicity account for flexibility. Dunker and his group introduced another radical approach that considers all regions with missing coordinates in X-ray structures as "disordered", and applied neural networks to predict such regions [31, 1, 32]. Other groups have used the same definition to develop related methods to predict such "disorder" [33, 34, 35]. Our group took a much simpler angle to identify long regions with no regular secondary structure (NORS), i.e. stretches of 70 or more sequence-consecutive residues depleted of helices and strands [3, 4]. Analyzing all proteins in entirely sequenced organisms, i.e. full proteomes, we found [3] that many more proteins have such regions in full proteomes than in the proteins of known three-dimensional (3D) structure deposited in today's PDB [5, 6], that eukaryotes have at least five times more such proteins than other organisms, and that NORS regions are over-represented in regulatory and promiscuously interacting proteins. Similar findings were reported based on the Dunker-disorder [36, 35].
The Debye-Waller factor (B-value) measures local residue flexibility. B-values (also referred to as B-factors) reported in experimental atomic-resolution structures provide information about local mobility. They represent the decrease of intensity in diffraction due to both the dynamic disorder caused by the temperature-dependent vibration of the atoms and the static disorder, which is related to the orientation of the molecule [37]. The B-value is related by B = 8π²<u²> to the unidirectional mean-square displacement, <u²>, averaged over the lattice [38]. B-values of C-alpha atoms are commonly used to represent motion of the backbone [39, 40]. However, the experimentally determined B-value is not an absolute quantity; instead it depends on other factors such as the overall resolution of the structure, crystal contacts and, importantly, the particular refinement procedures [41, 42]. B-values from different structures can therefore not be reasonably compared without some normalization [42, 43, 44]. Typically, the following normalization is applied [43]:

$$B_{norm} = \frac{B - \langle B \rangle}{s} \qquad (1)$$
where s is the standard deviation and <B> is the average over the B-values in a given structure. Generally, data of higher resolution provide more reliable B-values. We mainly analyzed structures with resolutions below 2 Angstroem (0.2 nm); however, in order to significantly increase the size of our data set and to cover a larger fraction of sequence space we also included structures down to 2.5 Angstroem (0.25 nm). At this resolution limit, the correlation between resolution and the log of the mean diffraction intensity is close to linear and B-value assignment is still quite reliable [38]. Furthermore, our results were qualitatively rather similar for data sets with different cut-offs (1.5A, 2.0A, and 2.5A).
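The per-structure normalization can be sketched in a few lines of Python. This is only an illustration: the function name and the example B-values are made up, and the use of the population standard deviation is an assumption, since the text does not state which estimator was applied.

```python
import statistics


def normalize_b_values(b_values):
    """Return (B - <B>) / s for the B-values of a single structure."""
    mean_b = statistics.fmean(b_values)
    # Assumption: population standard deviation; the text does not say
    # whether the population or the sample estimate was used.
    s = statistics.pstdev(b_values)
    if s == 0:
        raise ValueError("all B-values are identical; normalization is undefined")
    return [(b - mean_b) / s for b in b_values]


# Made-up C-alpha B-values for a five-residue toy structure.
example_b = [10.0, 20.0, 30.0, 40.0, 50.0]
b_norm = normalize_b_values(example_b)
```

After normalization each structure has mean 0 and standard deviation 1, which is what makes values from different crystals comparable.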
Flexibility is informative about protein structure and function. Experimental data about B-values and predictions of flexibility were shown to be useful for predicting residues that cannot be crystallized [34, 45]. More importantly, packing density is inversely proportional to thermal motion [46]; therefore, the prediction of flexibility may help to unravel protein function. While our manuscript was under review, a recent study showed that B-values can be predicted from sequence [47]. However, that study contained no direct evidence as to whether or not the functionally important residues were predicted correctly. In addition, Andersen et al. [29] suggested that secondary structure assignments can be continuous. For instance, not all helices are the same: some are 'fuzzy' and some are well defined. Therefore, combining flexibility and secondary structure prediction can be a useful tool for predicting these local structures.
Aims of this work. Firstly, we hoped that an analysis of B-values based on a recent comprehensive and unbiased data set of high-resolution structures might unravel correlations between sequence and flexibility that had previously been overlooked. Secondly, we wanted to develop a method that predicts normalized B-values as accurately as possible from sequence. Thirdly, we wanted to demonstrate that this method could be applied to solving two different, yet related problems, namely the prediction of conformational switches and related folds.
Data set. All proteins were taken from the PDB [6]. Sequence redundancy was reduced at HSSP-values [48, 49, 50] of 0 (corresponding to less than 22% pairwise sequence identity for long alignments). In other words, for no pair of proteins used for training and testing could we predict their structural similarity from sequence. Note that we also removed any pair between training and testing set that could be aligned at PSI-BLAST [51] E-values <10^-3 according to our standard procedure of three automated iterations [52]. Technically, the largest sequence-unique subset was taken from the EVA server [53, 54], which maintains a sequence-unique subset of PDB that is updated every week. We included only structures with resolutions ≤0.25 nm, i.e. better than 2.5A, and normalized the B-values according to eqn. 1. Our final sequence-unique subset contained 1513 X-ray structures. We split this set into three sets: one used for training, i.e. for optimizing the bulk of all free parameters, one for validation, i.e. for choosing additional free parameters (such as the number of hidden units and the stop of training), and the third for testing. We then repeated this experiment three times such that each protein in our data set was used for testing exactly once. Note that all estimates for performance that we report are valid for the testing set; in particular, we never reported any values that had been subject to any optimization. We argued that no prediction method will ever be as accurate as experiments; therefore, we compiled a data set of 716 unique proteins each of which had more than one experimental structure, i.e. more than one answer for what the real B-values are. We considered the difference between these alternative experimental solutions to be the 'upper limit' of prediction.
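The rotation of training, validation, and testing sets can be illustrated as follows. This is a minimal sketch: the function name, the protein identifiers, and the way proteins are assigned to the three parts (a simple stride) are assumptions made for the example and are not taken from the original procedure.

```python
def rotate_splits(protein_ids, n_parts=3):
    """Yield (train, validation, test) lists, rotating the role of each part.

    Assumption: proteins are assigned to parts by a fixed stride; the text
    only states that every protein is used for testing exactly once.
    """
    parts = [protein_ids[i::n_parts] for i in range(n_parts)]
    for r in range(n_parts):
        test_idx = r
        validation_idx = (r + 1) % n_parts
        train = [
            p
            for i, part in enumerate(parts)
            if i not in (test_idx, validation_idx)
            for p in part
        ]
        yield train, parts[validation_idx], parts[test_idx]


# Hypothetical identifiers standing in for the 1513 sequence-unique structures.
ids = [f"protein_{i}" for i in range(1513)]
rounds = list(rotate_splits(ids))
```

Because the test part never overlaps with the training or validation part of the same round, performance measured on it has not been subject to any parameter optimization.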
Data set of enzymes. In order to study whether our B-value predictions were correlated somehow with active sites in enzymes, we included 69 high resolution (≤2.5A) X-ray structures of apo-enzymes; the set was sequence-unique in the sense that no pair of enzymes had >25% pairwise sequence identity. Additionally, these structures had R-factors below 0.2 and did not contain any disordered regions (except for the termini by 'Dunker-disorder'). The active sites of these enzymes were taken from the lines annotated with SITE in their corresponding PDB files. If there was no annotation of the active site, the information was obtained from a different structure of the same protein from the PDB. This data set had originally been used to establish the observation that active sites in enzymes are unexpectedly rigid [26].
Data set of NMR order parameters. NMR spin relaxation spectroscopy experiments are widely used to detect and characterize internal motions in proteins [12, 55, 56, 57, 28] . Specifically, S2 is the square of the generalized order parameter that represents motions on the pico- to nano-second time scales. S2 values range from 0 to 1; lower values indicate larger amplitudes of internal motions. For this study, we obtained a very refined experimental data set of the order parameter values for Escherichia coli Ribonuclease H (RNase H) from a study done in the Palmer group [58] . Due to resonance overlap, internal motions and chemical exchange broadening (caused by slower motions), reliable data could not be collected for all residues; the final data set included order parameter values for 81 of the 149 residues in RNase H [58] .
Prediction method. A straightforward back-propagated feed-forward artificial neural network, similar to networks used to predict secondary structure, solvent accessibility, nuclear localization and protein-protein interaction sites [59, 60, 7, 61, 62], was used to predict residue flexibility. We sampled the "space of network parameters" by testing networks with 5-35 hidden nodes. The input nodes used both local and global information in the following way. Local information: for a window of w sequence-consecutive residues, we used the evolutionary profile (below) for each residue (w=9, i.e. 9 x 21 input units), the 3-state secondary structure predicted by PROFsec ([9]; w=5, i.e. 5 x 3 units), and the 2-state solvent accessibility predicted by PROFacc ([9]; w=5, i.e. 5 x 2 units). Global information: the entire protein was represented by its predicted secondary structure content (3 units), its predicted fraction of surface residues (2 units), and by its length (3 units). To facilitate "learning", a 10-state description that corresponds to different B-values was assigned to the output nodes. The 10-state description reduced the problem of identical samples with similar, but not identical, Bnorm values. The problem that we tried to address here was that the distribution of normalized B-values has no clear cut-off between 'flexible' and 'rigid'. Instead, most residues have normalized B-values somewhere between the two extremes, i.e. the distribution has a peak (Fig. 2). Deciding on a cut-off right inside of this peak raises the problem that, against the physical reality, the neural networks have to learn that one residue with a value b1 is flexible and another with a value b1+epsilon is rigid. In analogy to our work on predicting solvent accessibility [60], we tried to reduce this problem by introducing 10 states, partitioning the observed distribution into 10 identical intervals.
Furthermore, we trained two different networks: one on all residues, the other on the subset of residues predicted with high reliability by PROFacc (cutoff =6 in Fig. 1).
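As a rough check of the input layout, the unit counts listed above can be added up, and the residues covered by a window can be enumerated. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, positions beyond the termini are marked with `None` as an assumed padding convention, and only the units enumerated in this paragraph are counted.

```python
# (window width, units per residue) for each local feature named in the text.
LOCAL_FEATURES = {
    "profile": (9, 21),
    "secondary_structure": (5, 3),
    "solvent_accessibility": (5, 2),
}
# Units per global feature named in the text.
GLOBAL_FEATURES = {
    "secondary_structure_content": 3,
    "surface_fraction": 2,
    "protein_length": 3,
}


def count_input_units():
    """Sum local (width x units) and global units from the tables above."""
    local = sum(width * units for width, units in LOCAL_FEATURES.values())
    return local + sum(GLOBAL_FEATURES.values())


def window_positions(center, width, seq_len):
    """Indices covered by a window centered on `center`.

    Assumption: positions outside the sequence are returned as None so
    that a caller can fill them with a padding encoding.
    """
    half = width // 2
    return [
        i if 0 <= i < seq_len else None
        for i in range(center - half, center + half + 1)
    ]


n_inputs = count_input_units()
first_window = window_positions(0, 5, 10)
```

With the numbers given in the text this amounts to 9 x 21 + 5 x 3 + 5 x 2 + 3 + 2 + 3 = 222 input units per residue.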
|
Fig. 1 : Prediction system.
|
Evolutionary information. We obtained multiple sequence alignments by searching with PSI-BLAST [51] against all known sequences contained in SWISS-PROT [63], TrEMBL [63], and PDB [5, 6]. All hits below a PSI-BLAST E-value of 10^-3 were subsequently filtered [49, 52] and included in the sequence profile. The profiles were further filtered to remove sequences that were extremely similar to each other by simply removing all proteins with levels of pairwise sequence identity >80% to any previously added sequence (this value was chosen by intuition, instead of by optimization). This was done in order to maximize the number of aligned sequences that are not nearly identical to the query sequence and therefore obtain evolutionary information from more distant sequences. Note that while we know that this level of sequence similarity implies that all proteins included in the profile have similar structures, the thresholds did not guarantee that all members of the sequence-structure family also had similar B-values.
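The redundancy filter on the profile can be pictured as a greedy pass over the aligned sequences. The following sketch is not the original implementation: the function names are hypothetical, and the identity measure (fraction of identical residues over positions where neither row has a gap) is an assumption, since the exact definition is not given here.

```python
def pairwise_identity(row_a, row_b, gap="-"):
    """Fraction of identical residues over positions aligned in both rows.

    Assumption: positions with a gap in either row are ignored.
    """
    compared = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    return sum(a == b for a, b in compared) / len(compared)


def filter_profile(aligned_rows, max_identity=0.80):
    """Keep a row only if it is at most `max_identity` identical
    to every row kept before it (greedy, order-dependent)."""
    kept = []
    for row in aligned_rows:
        if all(pairwise_identity(row, k) <= max_identity for k in kept):
            kept.append(row)
    return kept


# Toy alignment: the second row is 90% identical to the first and is dropped.
rows = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFAAAAA"]
filtered = filter_profile(rows)
```

A greedy filter of this kind depends on the order in which hits are visited, so the rows that survive are not unique.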
Conversion of neural network output into values for predicted normalized B-values. The raw neural networks have two output values: one coding for flexible, the other for rigid residues. Mostly, we evaluated the accuracy in the prediction of the extremes, i.e. flexible or rigid residues. However, our predictions contained information beyond this. In order to convert the raw output into values similar to the experimental normalized B-values (eqn. 1), we simply applied this formula to the difference between the two output units:

$$\Delta = out_1 - out_2, \qquad B_{norm}^{pred} = \frac{\Delta - \langle \Delta \rangle}{s} \qquad (2)$$

where out1 and out2 were the values for the raw network output for the units coding for flexible and rigid, respectively; <Δ> was the average over all residue predictions in one protein and s was the standard deviation of the corresponding distribution of predicted Δ.
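The conversion of the two raw outputs into a predicted normalized B-value can be sketched as below. The function name and the example outputs are invented for illustration, and the population standard deviation is again an assumption.

```python
import statistics


def predicted_b_norm(out_flexible, out_rigid):
    """Normalize the per-residue difference (out1 - out2) within one protein."""
    delta = [f - r for f, r in zip(out_flexible, out_rigid)]
    mean_delta = statistics.fmean(delta)
    # Assumption: population standard deviation over the residues of the protein.
    s = statistics.pstdev(delta)
    if s == 0:
        raise ValueError("all differences are identical; normalization is undefined")
    return [(d - mean_delta) / s for d in delta]


# Made-up raw outputs for a four-residue protein.
out1 = [0.9, 0.6, 0.3, 0.1]  # unit coding for 'flexible'
out2 = [0.1, 0.4, 0.6, 0.8]  # unit coding for 'rigid'
pred = predicted_b_norm(out1, out2)
```

The residue with the largest difference between the two units receives the highest predicted value, so the ranking of residues within a protein is preserved.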
Evaluating the results. In order to simplify the prediction task from one of continuous values (normalized B-values range from -3.13 to 12.46) to a two-state problem (flexible/rigid), we defined a residue to be flexible according to the following thresholds: (i) strict: if the normalized Bnorm value ≥ 0.03 and (ii) non-strict: if the normalized Bnorm ≥ -0.3. All residues in between these two extreme states were ignored for both training and testing. This training method turned out to be suboptimal; the best performance was obtained using balanced training [59], i.e. the neural network 'saw' the samples from the extremes about twice as often as the samples from the peak in the distribution (Fig. 2). Note that for testing we always used all values (residues at the extremes were predicted much more accurately). In the non-strict threshold most of the residues are flexible; therefore, if we find a rigid residue on the surface it is likely to have a functional role. On the other hand, in the strict mode only about a third of the residues are flexible; therefore, if we find a flexible stretch of residues this stretch might be functionally important. We evaluated our results by calculating the accuracy/specificity and coverage/sensitivity according to:

$$ACC = \frac{TP}{TP + FP}, \qquad COV = \frac{TP}{TP + FN}$$

where TP (true positives) is the number of residues correctly predicted to be flexible, TN (true negatives) is the number of residues correctly predicted not to be flexible, FP (false positives) is the number of residues predicted to be flexible and observed to be rigid, and FN (false negatives) is the number of residues predicted to be rigid and observed to be flexible. We combined these two values through their harmonic mean:

$$F = \frac{2 \cdot ACC \cdot COV}{ACC + COV}$$
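The evaluation measures can be sketched as follows. The code assumes that accuracy/specificity means TP/(TP+FP) and coverage/sensitivity means TP/(TP+FN), with F as their harmonic mean; the function name and the toy labels are hypothetical.

```python
def evaluate_two_state(observed_flexible, predicted_flexible):
    """Return (accuracy, coverage, F) for per-residue flexible/rigid labels.

    Assumption: accuracy = TP / (TP + FP) and coverage = TP / (TP + FN).
    """
    pairs = list(zip(observed_flexible, predicted_flexible))
    tp = sum(obs and pred for obs, pred in pairs)
    fp = sum((not obs) and pred for obs, pred in pairs)
    fn = sum(obs and (not pred) for obs, pred in pairs)
    accuracy = tp / (tp + fp) if tp + fp else 0.0
    coverage = tp / (tp + fn) if tp + fn else 0.0
    if accuracy + coverage == 0:
        return accuracy, coverage, 0.0
    f_measure = 2 * accuracy * coverage / (accuracy + coverage)
    return accuracy, coverage, f_measure


# Toy labels for eight residues (True = flexible).
observed = [True, True, True, False, False, False, True, False]
predicted = [True, True, False, True, False, False, True, False]
acc, cov, f = evaluate_two_state(observed, predicted)
```

In this toy case three flexible residues are found, one is missed, and one rigid residue is wrongly called flexible, so accuracy, coverage, and F are all 0.75.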
Different amino acids have preferences for different B-values. Previous studies have shown that hydrophobic residues, which are usually buried, tend to be more rigid whereas charged residues tend to be more flexible [39, 40, 30, 45] . Here, we performed a large-scale analysis of residues with different Bnorm values. In order to prove the significance of our findings, we used the find-self test that is applicable to related entities such as proteins [64] . This test basically measures the degree of consistency between data sets with different labels (here: flexible, rigid, other). As expected, prolines are significantly over-represented in regions of high flexibility (Fig. 3), while cysteines are over-represented in non-flexible regions (supposedly due to forming rigid disulfide-bridges). Previous studies confirmed the intuition that glycines are flexible [40, 45] . In contrast, we find in more detail that glycines are over-represented in both extremes, i.e. very flexible and very rigid regions (Fig. 3). Supposedly, this preference is explained by hydrophobic glycines embedded into the protein cores. This is not unexpected because as the only amino acid without side chains glycines can adopt very unusual conformations that may be both rather flexible and rigid and that are often structurally important [65, 66] .
|
Fig. 2: Distribution of normalized B-values.
|
|
Fig. 3: Amino acid preferences for rigid and flexible residues.
|
Secondary structure and Bnorm values correlate. It is well known that long loops tend to be more flexible than regular secondary structures such as helices and strands. Here, we verified this difference in a quantitative way on a large set of proteins (Fig. 4). Interestingly, residue mobility is correlated in a different way with beta-strand-related structures (states E and B in DSSP [2]) than alpha-helix-related structures (states H, G, and I in DSSP [2], Fig. 4). In particular the following observations were remarkable. First, strands were, on average, more rigid than helices: while both had similar percentages for B-values<-1 (most rigid, Fig. 4), only 20% of all residues are in strands, while about 30% are in helices, i.e. the same overall fraction for very rigid residues over-represented residues in strands. Second, the transition from rigid to flexible was very different for helix (smooth) and strand (almost sigmoid). In other words, very few strands are partially flexible. This may originate from the simple fact that strands are only half as long as helices, on average, i.e. breaking a few hydrogen bonds is statistically more likely to break strands than helices. It may also originate from the particular DSSP definition that we used: two overlapping helices that fulfill the minimum hydrogen bond requirements can overlap forming a longer helix that is missing hydrogen-bonds [2]. The correlation of irregular helices with normalized B-values might be different from the correlation of perfect helices. Finally, our observation might also be explained by the fact that strands are more often found in the protein core than helices.
|
Fig. 4: Correlation between secondary structure state and normalized B-values.
|
Upper boundary for predictions of extreme B-values flexibility: 77-83%. B-values are influenced by crystal contacts [41] , by the experimental resolution [67, 38] , by interactions with ligands [68] , by packing density [46] , and by intrinsic properties of the crystal that result from particular refinement methods [42] . Arguably then the upper boundary on methods that predict B-values is given by the ranges of differences in B-values between different experimental results for the structures of the same proteins. We compared 716 such pairs with resolutions better than 2.5A and found that they agreed for 77-83% of their residues in the assignment of extreme B-values (flexible/rigid). The interval for this upper boundary is explained by using different thresholds to decide which residues are flexible, in particular 0.03 yielded 77%, while -0.3 yielded 83%.
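The comparison of two experimental structures of the same protein can be illustrated with a small helper that applies a threshold to both sets of normalized B-values and counts matching assignments. This is a simplified sketch: the function name and the example values are invented, and measuring agreement as the plain fraction of residues with identical two-state labels is an assumption about how the comparison was carried out.

```python
def two_state_agreement(b_norm_a, b_norm_b, threshold):
    """Fraction of residues given the same flexible/rigid label in two
    experiments, where 'flexible' means normalized B-value >= threshold."""
    if len(b_norm_a) != len(b_norm_b):
        raise ValueError("both experiments must cover the same residues")
    same = sum(
        (a >= threshold) == (b >= threshold)
        for a, b in zip(b_norm_a, b_norm_b)
    )
    return same / len(b_norm_a)


# Made-up normalized B-values for six residues from two crystal structures.
experiment_a = [-1.2, -0.5, -0.1, 0.0, 0.4, 1.5]
experiment_b = [-1.0, -0.2, 0.1, -0.4, 0.5, 1.1]
strict_agreement = two_state_agreement(experiment_a, experiment_b, 0.03)
non_strict_agreement = two_state_agreement(experiment_a, experiment_b, -0.3)
```

Residues whose values lie close to a threshold are the ones that flip between experiments, which is why the agreement depends on where the cut-off is placed.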
| Prediction method b | Non-strict a: ACC c | Non-strict a: COV c | Non-strict a: F d | Strict a: ACC c | Strict a: COV c | Strict a: F d |
| 2 experiments | 83.0+/-0.18 | 83.9+/-0.18 | 83.4+/-0.14 | 77.5+/-0.24 | 78.2+/-0.23 | 77.9+/-0.19 |
| PROFacc | 69.7+/-0.32 | 61.3+/-0.34 | 65.3+/-0.27 | 62.9+/-0.55 | 36.7+/-0.40 | 46.4+/-0.41 |
| sequence+1D | 70.0+/-0.18 | 66.4+/-0.19 | 68.1+/-0.15 | 63.1+/-0.32 | 37.5+/-0.24 | 47.0+/-0.25 |
| profile+1D | 70.0+/-0.17 | 73.3+/-0.17 | 71.5+/-0.14 | 63.0+/-0.25 | 45.7+/-0.18 | 52.9+/-0.20 |
| PROFbval | 70.1+/-0.18 | 73.9+/-0.17 | 71.9+/-0.14 | 63.1+/-0.25 | 46.2+/-0.18 | 53.3+/-0.20 |
a Non-strict / strict refers to different thresholds in the classification of normalized B-values into the two classes flexible/rigid (arrows in Fig. 2)
b Methods: 2 experiments marked the agreement between proteins for which B-factors were determined by more than one experiment (note: in this case we consider one of the experiments as the 'truth', the other as the prediction); PROFacc marked the success in simply considering all residues that are predicted as solvent accessible (by PROFacc [60]) as flexible; sequence+1D marked a simple neural network that uses only single sequences and all the predicted 1D structure and sequence features described in methods; profile+1D marked a network that uses profiles instead of single sequences, otherwise as 'sequence+1D'; and PROFbval marked our final prediction system using two layers of networks and profiles (Methods).
c Accuracy (ACC) and Coverage (COV) as defined in eqn. 1; note that the +/- values mark the standard errors compiled over our data set
d F-measure as defined in eqn. 3
Lower boundary provided by predicted solvent accessibility: 46%-65%. Trivially, solvent accessibility is correlated with flexibility [41, 43], i.e. flexible residues are found more often on the surface than in the core of proteins. Surprisingly, no group that developed methods to predict flexibility compared their results to simple accessibility predictions. Given the correlation between accessibility and flexibility and the accuracy in predicting accessibility, we could consider predictions of solvent accessibility to constitute a lower threshold for predicting flexibility. We found that relative solvent accessibility predicted by PROFacc correlates rather well with normalized B-values (Table 1). When we used solvent accessibility prediction information only, we achieved almost 63% accuracy at about 37% coverage in the strict mode (corresponding to F~46%, Table 1). In fact, when analyzing in detail the results from networks that explicitly used predicted solvent accessibility as input, we found that these networks basically predicted all buried residues to be rigid, and all exposed residues to be flexible. Additionally, we found that extreme incorrect predictions for flexibility often resulted from mistakes in the accessibility predictions. We reduced the seriousness of the second problem by including the reliability of the accessibility prediction as explicit input units. We addressed the first problem by training a separate set of networks only on reliably predicted buried residues (Fig. 1). The rationale for this solution is that the factors that determine a buried residue to be flexible might differ from those that determine the flexibility of an exposed residue. For example, a buried charged residue will usually be in a salt bridge and adopt a rigid conformation, while when exposed it forms contacts with the water and is more flexible.
Sequence + 1D structure improves marginally over accessibility-based predictions. The local features from sequence and predicted 1D structure that we found to be correlated with flexibility, together with global information improved the performance of our neural network-based prediction method by three percentage points (F-measure in non-strict mode, Table 1) over predictions that use only predicted solvent accessibility to predict flexibility. However, evaluated in the strict mode performance was almost identical.
Evolutionary information significantly improves prediction. The use of alignment profiles instead of raw sequence significantly improves predictions of secondary structure [59, 7, 9], solvent accessibility [60], and nuclear localization [61]. To improve our predictions we used profiles from alignments produced by PSI-BLAST [51] (Fig. 1). These profiles along with all other features significantly improved performance in both non-strict and strict modes (Table 1). To put these values into perspective: the upper boundary (different experiments) was F=83 and F=78 (non-strict and strict, respectively); the lower boundary (prediction based on solvent accessibility) was F=65 and F=46. The difference between these two extremes was ΔF=18 (83-65, non-strict) and ΔF=32 (78-46, strict). On this scale our prediction method (PROFbval in Table 1) covered about seven percentage points.
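The arithmetic behind this comparison can be reproduced from the F-measures in Table 1. The snippet below is only a worked illustration; the dictionary layout and variable names are made up, while the numbers are the F-values listed in the table.

```python
# F-measures (in percent) copied from Table 1.
F_TABLE = {
    "2 experiments": {"non-strict": 83.4, "strict": 77.9},  # upper boundary
    "PROFacc": {"non-strict": 65.3, "strict": 46.4},        # lower boundary
    "PROFbval": {"non-strict": 71.9, "strict": 53.3},       # final method
}

summary = {}
for mode in ("non-strict", "strict"):
    upper = F_TABLE["2 experiments"][mode]
    lower = F_TABLE["PROFacc"][mode]
    method = F_TABLE["PROFbval"][mode]
    summary[mode] = {
        "span": upper - lower,   # distance between lower and upper boundary
        "gain": method - lower,  # improvement over the accessibility baseline
    }
```

With the unrounded table values, the span is about 18 (non-strict) and 32 (strict) percentage points, and the gain over the accessibility baseline is a little under seven points in both modes.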
Very high accuracy for most strongly predicted residues. The strength of our final prediction system (compiled as the difference between the two output units) correlated very well with the reliability of the prediction (Fig. 5). This feature will enable users to focus on more reliably predicted regions. For example, users can focus on the subset of all residues predicted at 90% accuracy; these corresponded to about 10% of all residues in the non-strict and to about 1% of all residues in the strict mode.
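How a user might focus on the most strongly predicted residues can be sketched as follows. The function name and the toy data are hypothetical, and ranking residues by a single strength value per residue is an assumption about how the reliability would be used in practice.

```python
def accuracy_of_strongest(strengths, is_correct, fraction):
    """Fraction of correct predictions among the most strongly predicted
    residues, where `fraction` is the share of residues retained."""
    ranked = sorted(zip(strengths, is_correct), key=lambda p: p[0], reverse=True)
    n_kept = max(1, round(fraction * len(ranked)))
    kept = ranked[:n_kept]
    return sum(correct for _, correct in kept) / n_kept


# Toy data: prediction strength and whether the prediction was correct.
strengths = [0.9, 0.8, 0.7, 0.2, 0.1]
is_correct = [True, True, True, False, True]
top_40_percent = accuracy_of_strongest(strengths, is_correct, 0.4)
all_residues = accuracy_of_strongest(strengths, is_correct, 1.0)
```

Lowering the retained fraction trades coverage for accuracy, which is the trade-off described above.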
Correlation between observed and predicted normalized B-values. Eqn. 2 describes a simple way of transferring the output of our final network into "real-valued" predictions for B-values. The Pearson correlation coefficient between these predicted and the experimentally observed normalized B-values over all the proteins in our test sets was 0.44. The best prediction method published by Yuan et al. was reported to reach a correlation of 0.53 [47]. On the same data set, our method reached an unusually high level of 0.50. This was still slightly below the method from Yuan et al. that, in contrast to our method, had been optimized on that data set and on the objective of optimizing the correlation coefficient rather than the binary accuracy (as our method had). How high a correlation above 0.4 is, is illustrated by the results presented in Fig. 7.
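For reference, the Pearson correlation coefficient between observed and predicted values can be computed as in the sketch below. The function name and the example vectors are illustrative only and do not reproduce any value reported in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up observed and predicted normalized B-values for five residues.
observed_b = [-1.0, -0.5, 0.0, 0.5, 1.0]
predicted_b = [-0.8, -0.1, -0.3, 0.6, 0.9]
r = pearson(observed_b, predicted_b)
```

Because the coefficient is invariant to shifting and rescaling either variable, it measures how well the ranking and relative spacing of residues is reproduced rather than the absolute values.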
Case study I: switch II region in Ras. Structure and function are closely linked, i.e. the 3D structure of a protein determines its function. However, for some biological functions the ability of a protein or a region within a protein not to adopt a rigid structure but, instead, to be in motion is crucial [13, 14, 15, 16, 18, 19, 21]. The switch II region in the Ras protein is known to be critical for the GTPase activity of Ras; this region is defined by 11 consecutive residues with the sequence GQEEYSAMRDQ (residues 60 to 70 in SWISS-PROT file RASH_HUMAN). When bound to GTP the switch II region is rigid; upon GTP hydrolysis the switch II region becomes flexible. The key residue in this 11-residue segment is Gln61, which is the catalytic residue of the reaction [69, 70]. A recent study showed that the "hyper-flexibility" of this conserved residue is critical for the GTPase reaction [23]. Although the whole switch II region is exposed to solvent, our prediction method singled out the crucial residue Gln61 and its three sequence neighbors as the most flexible residues; this prediction is confirmed by experimental results (Fig. 6 A). We did not select Ras because our method worked for this protein; instead we chose it because it is a common oncogene that has been studied in great detail. In particular, the biochemical significance of the Ras flexibility has been established not only by crystallographic B-values. Methods such as heteronuclear NMR [71], time-resolved crystallography [72], molecular dynamics simulations [73, 74, 75, 76], and the use of a fluorescent GTP analogue [77] have shown how important the flexibility of the switch II region is for the function of Ras.
|
Fig. 6: Functional flexibility and rigidity. Colors: the more red, the higher the B-value (flexible); the more blue, the lower (rigid); the GTP analogue is colored in CPK. (A) PROFbval correctly identified the key residues in the switch II region of ras as highly flexible. (B) PROFbval also correctly predicted the residues in the tunnel of a beta-propeller fold to be rigid although these residues are surface loops and strands (structure from [89] ). |
Case study II: identification of functionally important rigid regions in propeller folds. Beta-propeller folds (Fig. 6 B) are characterized by 4-8 symmetrical repeats of four-stranded anti-parallel and twisted beta-sheets that are arranged around a central 'tunnel' [78, 79]. The majority of these proteins use the tunnel or the entrance to it for the coordination of a ligand or as the site of catalytic activity. The structural rigidity of the propeller domain has been suggested as crucial for the function of these proteins [79]. A combination of repeat detection, secondary structure prediction, and fold recognition can be exploited to predict beta-propeller folds from sequence [78, 79]. When we applied our prediction method to propeller folds, we correctly detected the regions around the tunnel as rigid although this region is exposed (and correctly predicted as such). Again, we did not choose the propeller folds because our method worked on these; in fact, we only ever looked at the two cases of Ras and propeller folds in more detail. Instead, this class of proteins is currently under investigation by our colleagues in the group of Wayne Hendrickson (Columbia). This choice is also rationalized by the fact that a PubMed search with the keywords "protein structural rigidity" brought propeller folds up as the first relevant hit.
Case study III: active sites in enzymes predicted as more rigid. Over the past decades the number of enzyme structures in the PDB has increased significantly. These data prompted several attempts to characterize the residues that participate in the catalytic reactions, i.e. the active site residues [80, 81, 82, 83, 26]. Specifically, two groups showed that active site residues have, on average, lower normalized B-values than non-active site residues and are therefore more rigid [81, 26]. Yuan et al. investigated the differences in normalized B-values between active and non-active site residues in 69 sequence non-identical enzyme apo-structures. They confirmed that active site residues are less flexible than non-active site residues [26]. Using the same data set as Yuan et al. (Table 2), we compared active and non-active site residues according to the observed normalized B-values (as pioneered by Yuan et al.) as well as according to our predicted B-values (eqn. 2, Table 2). We found that the difference 'normalized B-values of active site residues - normalized B-values of non-active site residues' was similar for the observed and predicted B-values (-0.39 and -0.48, respectively). In other words, our method correctly predicted that the active site residues were considerably more rigid than all other residues. Interestingly, our method also correctly solved a more difficult task, namely predicting that the exposed active site residues, too, are more rigid than the non-active site surface residues (Table 2).
| Structure subset c | Average normalized B-values a | | | Average predicted normalized B-values b | | |
| | All residues | REL ≥ 5% | REL ≥ 16% | All residues | PREL ≥ 5% | PREL ≥ 16% |
| Active-site residues | -0.39 | -0.21 | -0.04 | -0.48 | 0.18 | 0.09 |
| Non-active site residues | 0.00 | 0.23 | 0.35 | 0.01 | 0.36 | 0.39 |
a Normalized B-values were calculated using Eqn 1 and then averaged over the residues from all 69 enzyme structures used in a previous study [26]; REL marked the relative solvent accessibility from X-ray structures (in percentage as compiled from DSSP [2] using the normalization with maximal values observed in isolation [60]).
b Predicted normalized B-values were calculated through Eqn 2; here, PREL marked the relative solvent accessibility predicted by PROFacc [60].
c Data set of enzymes: active-site residues marked the residues from the lines annotated with SITE in the PDB files. If there was no annotation of the active site, the information was obtained from a different structure of the identical protein from the PDB; non-active site residues marked all the residues that are not annotated on the SITE line.
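The comparison summarized in Table 2 can be sketched in a few lines. Eqn 1 is not reproduced in this section, so the sketch below assumes a per-protein z-score normalization, (B - mean) / standard deviation; the use of the population standard deviation, the function names, and the toy B-values are likewise assumptions made only for illustration.

```python
from statistics import mean, pstdev


def normalize_b_values(b_values):
    """Z-score normalization within one protein (ASSUMED form of Eqn 1)."""
    mu = mean(b_values)
    sigma = pstdev(b_values)
    if sigma == 0:
        raise ValueError("all B-values identical; cannot normalize")
    return [(b - mu) / sigma for b in b_values]


def active_site_difference(proteins):
    """Mean normalized B-value of active-site residues minus that of all
    other residues, pooled over proteins.

    proteins: iterable of (b_values, active_site_indices), 0-based indices.
    """
    active, other = [], []
    for b_values, active_idx in proteins:
        active_set = set(active_idx)
        for i, z in enumerate(normalize_b_values(b_values)):
            (active if i in active_set else other).append(z)
    return mean(active) - mean(other)


# Invented toy data: two "proteins" with raw B-values and active-site indices.
toy_proteins = [
    ([10.0, 12.0, 30.0, 28.0, 11.0, 29.0], [0, 1]),
    ([20.0, 22.0, 40.0, 21.0, 38.0], [0, 3]),
]
diff = active_site_difference(toy_proteins)
print(f"active - non-active: {diff:.2f}")
```

A negative difference means that the active-site residues are, on average, more rigid than the rest; the same function could be applied to predicted instead of observed values.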
Case study IV: predicted B-values correlate with NMR order parameters. Solution NMR spectroscopy methods are commonly used to characterize the dynamic properties of proteins [56, 57, 28]. It has previously been shown that order parameters obtained from NMR spin relaxation experiments are, to an extent, correlated with experimental B-values [84, 85]. However, unlike B-values from X-ray structures, order parameters are independent of crystal packing; they probe dynamics on time scales that are relevant for biological function [85]. Here, we compared 15N nuclear magnetic spin relaxation order parameters of ribonuclease HI (RNase H) to experimental normalized B-values taken from the apo X-ray structure (PDB identifier 2RN2 [86]) and to the normalized output of our network for the same protein (eqn. 2; Fig. 7). This example also illustrated more explicitly to what extent our predictions correlated with the experimental values; note that RNase H was representative of the overall performance. The correlation between experimental and predicted B-values was higher than that between experimental B-values and order parameters and that between predicted B-values and order parameters (Fig. 7). However, the correlations were clearly higher for the functionally most important residues. These include the segment of residues 11-23, which is known to bind a DNA/RNA hybrid, and the three active site residues (D10, E48, D70) that bind Mg2+ ions [86, 87]. Interestingly, the active site residues are harder to predict as rigid because they are partly accessible to solvent. In fact, when we ranked all residues predicted to be on the surface (predicted relative solvent accessibility ≥5%) by the strength of our rigidity prediction, E48 ranked first. This was probably because we used evolutionary conservation as an input to the neural network, and E48 is highly conserved.
Our prediction method 'misplaced' one peak, namely the one between residues 120-130, which was shifted by five residues in our prediction. This was probably due to an incorrect prediction of accessibility: W118 and W120 were incorrectly predicted as completely buried (note that our prediction method used the predicted solvent accessibility of the residue and its neighbors as input).
Fig. 7: Predicted and observed flexibility in RNase H.
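The comparisons in case study IV amount to per-residue correlations, computed either over the whole chain or over a functionally important segment. The sketch below is a minimal illustration under stated assumptions: it uses the Pearson correlation coefficient (the section does not state which coefficient was used), it converts order parameters S2 into a mobility measure 1 - S2 so that all three quantities point in the same direction, and all numbers are invented rather than taken from RNase H.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(vx * vy)


def segment_correlation(x, y, first, last):
    """Correlation restricted to residues first..last (inclusive, 1-based)."""
    return pearson(x[first - 1:last], y[first - 1:last])


# Invented toy values for six residues (NOT measured or predicted data).
observed_b = [0.9, 0.2, -0.5, -0.8, 0.1, 1.2]     # normalized experimental B-values
predicted_b = [0.7, 0.3, -0.4, -0.6, 0.4, 0.9]    # normalized network output
s2 = [0.70, 0.82, 0.90, 0.92, 0.85, 0.65]         # NMR order parameters
mobility = [1.0 - s for s in s2]                  # assumed conversion: high = flexible

print(round(pearson(observed_b, predicted_b), 2))
print(round(pearson(observed_b, mobility), 2))
print(round(segment_correlation(predicted_b, mobility, first=2, last=4), 2))
```

Restricting the correlation to a segment, as in the last call, mirrors the observation that agreement was higher for the functionally most important residues.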
How well can we predict flexibility? Our final flexibility predictions achieved F values (eqn. 3) between 53% (strict) and 72% (non-strict). Adding all the attributes found to correlate with high B-values (predicted solvent accessibility, secondary structure, protein length, and multiple sequence alignments as input), together with our unique training method, made the final method, PROFbval, a rather complex new prediction method; the orchestration of all features was required to make the method significantly more accurate than predictions based on predicted accessibility alone. The performance was clearly worse than the agreement between different experimental determinations of B-values. In the absence of comparable methods, we cannot conclude whether the performance of PROFbval will suffice to make the method an important milestone on the way toward predicting protein function. However, our two case studies suggested that the method could contribute important information that was not available through other means. Combining our predictions with experimental data, e.g. from NMR experiments, could substantially increase the fraction of sequence space covered and hence the performance of the prediction system.
Recent studies have shown that protein flexibility and protein function are strongly linked [13, 14, 1, 88, 15, 16, 17, 18, 3, 19, 21, 24, 35]. Numerous proteins have regions that adopt different conformations under different conditions, allowing them to take part in cellular and molecular regulation [3]. In this study we first showed that flexible residues (i.e. residues with high normalized B-values) differ from regular and rigid residues in local features such as secondary structure, solvent accessibility and amino acid preferences. For our analysis of B-values we used the largest unbiased data set that had so far been explored. Possibly due to this representative data set, the only results that could be phrased in the form of simple rules, such as that glycines populate the extremes of very high and very low B-values, were not very surprising. Interestingly, we showed that local sequence features alone did not suffice to develop a good prediction method. Instead, global features and evolutionary profiles significantly improved performance. We improved further by adding another neural network, specialized for reliably predicted buried residues. Lastly, we presented two applications of this method, namely to the flexible switch II region in Ras and to the rigid region in propeller folds. Both the flexible region in Ras and the rigid region in propeller folds are indicative of function and were correctly predicted by our method. Lacking a large data set of such examples, we cannot guarantee that these two case studies are fully representative. However, the two cases were not chosen because "they worked" but because of the readily available experimental data for both. Therefore, we hypothesized that a flexibility/rigidity prediction method, together with other methods, can serve as a tool both to identify functional residues in proteins and to identify specific folds. Our results for a large set of enzymes and for the order parameter measurements of RNase H added evidence to strengthen this hypothesis.
Thanks to Jinfeng Liu and Megan Restuccia (both Columbia) for computer assistance; to Sven Mika, Andrew Kernytsky and Dariusz Przybylski (all Columbia) for providing preliminary information and programs; particular thanks to Mickey Kosloff, Marco Punta and Yanay Ofran (all Columbia) for very valuable suggestions. Particular thanks to Art Palmer and Joel Butterwick (Columbia) for providing the data set of order parameters for RNase H. Thanks to both anonymous reviewers for their helpful suggestions. The work was supported by the grants RO1-GM63029-01 from the National Institutes of Health (NIH) and R01-LM07329-01 from the National Library of Medicine (NLM). Last, not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.
| 1. | Dunker, A. K. & Obradovic, Z. (2001). The protein trinity-linking function and disorder. Nat. Biotechnol., 19, 805-806. |
| 2. | Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637. |
| 3. | Liu, J., Tan, H. & Rost, B. (2002). Loopy proteins appear conserved in evolution. J. Mol. Biol., 322, 53-64. |
| 4. | Liu, J. & Rost, B. (2003). NORSp: predictions of long regions without regular secondary structure. Nucl. Acids Res., 31, 3833-3835. |
| 5. | Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D. et al. (1977). The Protein Data Bank: a computer based archival file for macromolecular structures. J. Mol. Biol., 112, 535-542. |
| 6. | Berman, H. M., Battistuz, T., Bhat, T. N., Bluhm, W. F., Bourne, P. E. et al. (2002). The Protein Data Bank. Acta Crystallogr D Biol Crystallogr, 58, 899-907. |
| 7. | Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile based neural networks. Meth. Enzymol., 266, 525-539. |
| 8. | Rost, B. (2001). Protein secondary structure prediction continues to rise. J. Struct. Biol., 134, 204-218. |
| 9. | Rost, B. (2005). How to use protein 1D structure predicted by PROFphd. In The Proteomics Protocols Handbook (Walker, J. E., eds.), pp. 875-901, Humana, Totowa NJ. |
| 10. | Tainer, J. A., Getzoff, E. D., Alexander, H., Houghten, R. A., Olson, A. J. et al. (1984). The reactivity of anti-peptide antibodies is a function of the atomic mobility of sites in a protein. Nature, 312, 127-134. |
| 11. | Carr, P. A., Erickson, H. P. & Palmer, A. G., 3rd (1997). Backbone dynamics of homologous fibronectin type III cell adhesion domains from fibronectin and tenascin. Structure, 5, 949-59. |
| 12. | Akke, M., Liu, J., Cavanagh, J., Erickson, H. P. & Palmer, A. G., 3rd (1998). Pervasive conformational fluctuations on microsecond time scales in a fibronectin type III domain. Nat. Struct. Biol., 5, 55-59. |
| 13. | Wright, P. E. & Dyson, H. J. (1999). Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol., 293, 321-331. |
| 14. | Demchenko, A. P. (2001). Recognition between flexible protein molecules: induced and assisted folding. J Mol Recognit, 14, 42-61. |
| 15. | Dunker, A. K., Brown, C. J., Lawson, J. D., Iakoucheva, L. M. & Obradovic, Z. (2002). Intrinsic disorder and protein function. Biochem., 41, 6573-82. |
| 16. | Dunker, A. K., Brown, C. J. & Obradovic, Z. (2002). Identification and functions of usefully disordered proteins. Adv Protein Chem, 62, 25-49. |
| 17. | Dyson, H. J. & Wright, P. E. (2002). Coupling of folding and binding for unstructured proteins. Curr. Opin. Str. Biol., 12, 54-60. |
| 18. | Iakoucheva, L. M., Brown, C. J., Lawson, J. D., Obradovic, Z. & Dunker, A. K. (2002). Intrinsic disorder in cell-signaling and cancer-associated proteins. J. Mol. Biol., 323, 573-84. |
| 19. | Tompa, P. (2002). Intrinsically unstructured proteins. TIBS, 27, 527-33. |
| 20. | Uversky, V. N. (2002). What does it mean to be natively unfolded? Eur. J. Biochem., 269, 2-12. |
| 21. | Uversky, V. N. (2002). Natively unfolded proteins: a point where biology waits for physics. Prot. Sci., 11, 739-56. |
| 22. | Daniel, R. M., Dunn, R. V., Finney, J. L. & Smith, J. C. (2003). The role of dynamics in enzyme activity. Annu. Rev. Biophys. Biomol. Struct., 32, 69-92. |
| 23. | Kosloff, M. & Selinger, Z. (2003). GTPase catalysis by Ras and other G-proteins: Insights from substrate directed superimposition. J. Mol. Biol., 331, 1157-1170. |
| 24. | Teague, S. J. (2003). Implications of protein flexibility for drug discovery. Nat Rev Drug Discov, 2, 527-41. |
| 25. | Uversky, V. N. (2003). Protein folding revisited. A polypeptide chain at the folding-misfolding-nonfolding cross-roads: which way to go? Cell Mol Life Sci, 60, 1852-71. |
| 26. | Yuan, Z., Zhao, J. & Wang, Z. X. (2003). Flexibility analysis of enzyme active sites by crystallographic temperature factors. Protein Eng., 16, 109-114. |
| 27. | Dyson, H. J. & Wright, P. E. (2004). Unfolded proteins and protein folding studied by NMR. Chem Rev, 104, 3607-22. |
| 28. | Dyson, H. J. & Wright, P. E. (2005). Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol, 6, 197-208. |
| 29. | Andersen, C. A. F., Palmer, A. G., Brunak, S. & Rost, B. (2002). Continuum secondary structure captures protein flexibility. Structure, 10, 175-185. |
| 30. | Uversky, V. N., Gillespie, J. R. & Fink, A. L. (2000). Why are "natively unfolded" proteins unstructured under physiologic conditions? Proteins, 41, 415-427. |
| 31. | Romero, P., Obradovic, Z. & Dunker, A. K. (1999). Folding minimal sequences: the lower bound for sequence complexity of globular proteins. FEBS Lett., 462, 363-367. |
| 32. | Obradovic, Z., Peng, K., Vucetic, S., Radivojac, P., Brown, C. J. et al. (2003). Predicting intrinsic disorder from amino acid sequence. Proteins, 53, 566-572. |
| 33. | Jones, D. T. & Ward, J. J. (2003). Prediction of disordered regions in proteins from position specific score matrices. Proteins, 53, 573-578. |
| 34. | Linding, R., Jensen, L. J., Diella, F., Bork, P., Gibson, T. J. et al. (2003). Protein disorder prediction: implications for structural proteomics. Structure, 11, 1453-1459. |
| 35. | Ward, J. J., Sodhi, J. S., McGuffin, L. J., Buxton, B. F. & Jones, D. T. (2004). Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J. Mol. Biol., 337, 635-645. |
| 36. | Romero, P., Obradovic, Z., Kissinger, C., Villafranca, J. E., Garner, E. et al. (1998). Thousands of proteins likely to have long disordered regions. Pac. Symp. Biocomput., 3, 437-448. |
| 37. | Creighton, T. (1993). Proteins: structures and molecular properties. W. H. Freeman, New York. |
| 38. | Blow, D. (2002). Outline of Crystallography for biologists. Oxford University Press, New York. |
| 39. | Karplus, P. A. & Schultz, G. E. (1985). Prediction of chain flexibility of peptide antigens. Naturwissenschaften, 72, 212-213. |
| 40. | Vihinen, M., Torkkila, E. & Riikonen, P. (1994). Accuracy of protein flexibility predictions. Proteins, 19, 141-149. |
| 41. | Sheriff, S., Hendrickson, W. A., Stenkamp, R. E., Sieker, L. C. & Jensen, L. H. (1985). Influence of solvent accessibility and intermolecular contacts on atomic mobilities in hemerythrins. Proc. Natl. Acad. Sci. U.S.A., 82, 1104-1107. |
| 42. | Tronrud, D. E. (1996). Knowledge-based B-factor restraints for the refinement of proteins. J. Appl. Cryst., 29, 100-104. |
| 43. | Carugo, O. & Argos, P. (1997). Correlation between side chain mobility and conformation in protein structures. Prot. Engin., 10, 777-787. |
| 44. | Smith, D. K., Radivojac, P., Obradovic, Z., Dunker, A. K. & Zhu, G. (2003). Improved amino acid flexibility parameters. Prot. Sci., 12, 1060-1072. |
| 45. | Radivojac, P., Obradovic, Z., Smith, D. K., Zhu, G., Vucetic, S. et al. (2004). Protein flexibility and intrinsic disorder. Prot. Sci., 13, 71-80. |
| 46. | Halle, B. (2002). Flexibility and packing in proteins. Proc. Natl. Acad. Sci. U.S.A., 99, 1274-1279. |
| 47. | Yuan, Z., Bailey, T. L. & Teasdale, R. D. (2005). Prediction of protein B-factor profiles. Proteins, 58, 905-12. |
| 48. | Sander, C. & Schneider, R. (1991). Database of homology-derived structures and the structural meaning of sequence alignment. Proteins, 9, 56-68. |
| 49. | Rost, B. (1999). Twilight zone of protein sequence alignments. Prot. Engin., 12, 85-94. |
| 50. | Mika, S. & Rost, B. (2003). UniqueProt: creating representative protein sequence sets. Nucl. Acids Res., 31, 3789-3791. |
| 51. | Altschul, S. F., Madden, T. L., Schaeffer, A. A., Zhang, J., Zhang, Z. et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res., 25, 3389-3402. |
| 52. | Przybylski, D. & Rost, B. (2002). Alignments grow, secondary structure prediction improves. Proteins, 46, 195-205. |
| 53. | Eyrich, V., Martí-Renom, M. A., Przybylski, D., Fiser, A., Pazos, F. et al. (2001). EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics, 17, 1242-1243. |
| 54. | Koh, I. Y. Y., Eyrich, V. A., Marti-Renom, M. A., Przybylski, D., Madhusudhan, M. S. et al. (2003). EVA: evaluation of protein structure prediction servers. Nucl. Acids Res., 31, 3311-3315. |
| 55. | Palmer, A. G., 3rd (2001). NMR probes of molecular dynamics: overview and comparison with other techniques. Annu. Rev. Biophys. Biomol. Struct., 30, 129-55. |
| 56. | Palmer, A. G., 3rd, Kroenke, C. D. & Loria, J. P. (2001). Nuclear magnetic resonance methods for quantifying microsecond-to-millisecond motions in biological macromolecules. Methods Enzymol, 339, 204-38. |
| 57. | Palmer, A. G., 3rd (2004). NMR characterization of the dynamics of biomacromolecules. Chem Rev, 104, 3623-40. |
| 58. | Kroenke, C. D., Rance, M. & Palmer, A. G., 3rd (1999). Variability of the 15N Chemical Shift Anisotropy in Escherichia coli Ribonuclease H in Solution. J. Am. Chem. Soc., 121, 10119-10125. |
| 59. | Rost, B. & Sander, C. (1993). Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599. |
| 60. | Rost, B. (1994). Conservation and prediction of solvent accessibility in protein families. Proteins, 20, 216-226. |
| 61. | Nair, R. & Rost, B. (2003). Better prediction of sub-cellular localization through evolution and structure. Proteins, 53, 917-930. |
| 62. | Ofran, Y. & Rost, B. (2003). Predicted protein-protein interaction sites from local sequence information. FEBS Lett., 544, 236-239. |
| 63. | Bairoch, A. & Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucl. Acids Res., 28, 45-48. |
| 64. | Ofran, Y. & Rost, B. (2003). Analysing Six Types of Protein-Protein Interfaces. J. Mol. Biol., 325, 377-387. |
| 65. | Brändén, C. & Tooze, J. (1991). Introduction to Protein Structure. Garland Publ., New York, London. |
| 66. | Lesk, A. M. (2004). Introduction to protein architecture - The structural biology of proteins. OUP, Oxford. |
| 67. | Drenth, J. (1999). Principles of protein X-ray crystallography. Springer-Verlag, New York. |
| 68. | Carugo, O. & Argos, P. (1998). Accessibility to internal cavities and ligand binding sites monitored by protein crystallographic thermal factors. Proteins, 31, 201-213. |
| 69. | Sprang, S. R. (1997). G proteins, effectors and GAPs: structure and mechanism. Curr. Opin. Str. Biol., 7, 849-856. |
| 70. | Vetter, I. R. & Wittinghofer, A. (2001). The Guanine Nucleotide-Binding Switch in Three Dimensions. Science, 294, 1299-1304. |
| 71. | Ito, Y., Yamasaki, K., Iwahara, J., Terada, T., Kamiya, A. et al. (1997). Regional polysterism in the GTP-bound form of the human c-Ha-Ras protein. Biochem., 36, 9109-19. |
| 72. | Schlichting, I., Almo, S. C., Rapp, G., Wilson, K., Petratos, K. et al. (1990). Time-resolved X-ray crystallographic study of the conformational change in Ha-Ras p21 protein on GTP hydrolysis. Nature, 345, 309-15. |
| 73. | Diaz, J. F., Wroblowski, B. & Engelborghs, Y. (1995). Molecular dynamics simulation of the solution structures of Ha-ras-p21 GDP and GTP complexes: flexibility, possible hinges, and levers of the conformational transition. Biochem., 34, 12038-12047. |
| 74. | Ma, J. & Karplus, M. (1997). Ligand-induced conformational changes in ras p21: a normal mode and energy minimization analysis. J. Mol. Biol., 274, 114-131. |
| 75. | Farrar, C. T., Ma, J., Singel, D. J. & Halkides, C. J. (2000). Structural changes induced in p21Ras upon GAP-334 complexation as probed by ESEEM spectroscopy and molecular-dynamics simulation. Structure, 8, 1279-87. |
| 76. | Kosztin, I., Bruinsma, R., O'Lague, P. & Schulten, K. (2002). Mechanical force generation by G proteins. Proc. Natl. Acad. Sci. U.S.A., 99, 3575-80. |
| 77. | Moore, K. J., Webb, M. R. & Eccleston, J. F. (1993). Mechanism of GTP hydrolysis by p21N-ras catalyzed by GAP: studies with a fluorescent GTP analogue. Biochem., 32, 7451-9. |
| 78. | Springer, T. A. (1997). Folding of the N-terminal, ligand-binding region of integrin α-subunits into a β-propeller domain. Proc. Natl. Acad. Sci. U.S.A., 94, 65-72. |
| 79. | Fulop, V. & Jones, D. T. (1999). Beta propellers: structural rigidity and functional diversity. Curr. Opin. Str. Biol., 9, 715-721. |
| 80. | Zvelebil, M. J. & Sternberg, M. J. (1988). Analysis and prediction of the location of catalytic residues in enzymes. Prot. Engin., 2, 127-38. |
| 81. | Bartlett, G. J., Porter, C. T., Borkakoti, N. & Thornton, J. M. (2002). Analysis of catalytic residues in enzyme active sites. J. Mol. Biol., 324, 105-21. |
| 82. | Todd, A. E., Orengo, C. A. & Thornton, J. M. (2002). Sequence and structural differences between enzyme and nonenzyme homologs. Structure, 10, 1435-51. |
| 83. | Todd, A. E., Orengo, C. A. & Thornton, J. M. (2002). Plasticity of enzyme active sites. TIBS, 27, 419-26. |
| 84. | Haliloglu, T. & Bahar, I. (1999). Structure-based analysis of protein dynamics: comparison of theoretical results for hen lysozyme with X-ray diffraction and NMR relaxation data. Proteins, 37, 654-67. |
| 85. | Wang, C., Karpowich, N., Hunt, J. F., Rance, M. & Palmer, A. G. (2004). Dynamics of ATP-binding cassette contribute to allosteric control, nucleotide binding and energy transduction in ABC transporters. J. Mol. Biol., 342, 525-37. |
| 86. | Katayanagi, K., Miyagawa, M., Matsushima, M., Ishikawa, M., Kanaya, S. et al. (1992). Structural details of ribonuclease H from Escherichia coli as refined to an atomic resolution. J. Mol. Biol., 223, 1029-52. |
| 87. | Goedken, E. R., Keck, J. L., Berger, J. M. & Marqusee, S. (2000). Divalent metal cofactor binding in the kinetic folding trajectory of Escherichia coli ribonuclease HI. Prot. Sci., 9, 1914-21. |
| 88. | Namba, K. (2001). Roles of partly unfolded conformations in macromolecular self-assembly. Genes Cells, 6, 1-12. |
| 89. | Beisel, H. G., Kawabata, S., Iwanaga, S., Huber, R. & Bode, W. (1999). Tachylectin-2: crystal structure of a specific GlcNAc/GalNAc-binding lectin involved in the innate immunity host defense of the Japanese horseshoe crab Tachypleus tridentatus. EMBO J., 18, 2313-2322. |
| Contact: cubic@cubic.bioc.columbia.edu | Version: Mar 5, 2005 |