
Title: Long membrane helices and short loops predicted less accurately
Author:Chien Peter Chen & Burkhard Rost
Quote: Protein Science, 2002, 11, 2766-73

Long membrane helices and short loops predicted less accurately

Chien Peter Chen 1 & Burkhard Rost 1, 2, 3,*

1 CUBIC, Dept. of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
2 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA
3 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
* Corresponding author:  email = rost@columbia.edu URL http://cubic.bioc.columbia.edu/  Tel: +1-212-305-3773, fax: +1-212-305-7932

 

This article is published in (Protein Science, issue, 2002 and pages) © copyright Cold Spring Harbor Laboratory Press (2002). CSHL Press is the only authorised source. All copying of this article including placing on another website requires the written permission of the copyright owner.

Abstract

Low-resolution experiments suggest that most membrane helices span over 17-25 residues and that most loops between two helices are longer than 15 residues. Both constraints have been used explicitly in the development of prediction methods. Here, we compared the largest possible sequence-unique data sets from high- and low-resolution experiments. For the high-resolution data, we found that only half of the helices fall into the expected length interval and that half of the loops were shorter than ten residues. We compared the accuracy of detecting short loops and long helices for 28 advanced and simple prediction methods: All methods predicted short loops less accurately than longer ones. In particular, loops shorter than seven residues appeared to be very difficult to detect by current methods. Similarly, all methods tended to be more accurate for longer than for shorter helices. However, helices with more than 32 residues were predicted less accurately than all other helices. Our findings may suggest particular strategies for improving prediction of membrane helices.

 

Key words: membrane proteins; protein structure prediction; predicting transmembrane helices; bioinformatics.

Abbreviations used

3D: three-dimensional
DSSP: program assigning secondary structure [1]
HMM: Hidden Markov Model
PDB: Protein Data Bank of experimentally determined 3D structures of proteins [2, 3]
SWISS-PROT: database of protein sequences [4]
TM: transmembrane
TMH: transmembrane helix
 
Terminology used:
 
 
advanced prediction methods: all methods that do not exclusively use a hydrophobicity scale.
simple prediction methods: membrane prediction methods exclusively based on hydrophobicity scales.
loop: throughout this paper we used the term loop to refer to the region that connects two transmembrane helices in sequence. Note that such loops could consist of entire structural domains.

 

Introduction

Predictions of membrane helices relatively successful. Despite the great biological and medical importance of helical membrane proteins, we still know few three-dimensional (3D) structures. Fortunately, bioinformatics can contribute substantially to bridging the gap between what we know and what we want to know by predicting membrane helices. In fact, predicting the locations of transmembrane helices appears to be a simpler problem than predicting globular helices [5, 6]. Nevertheless, while some estimated the levels of accuracy to reach an incredibly high value of 99% [7], recent re-evaluations of many prediction methods [8, 9, 10] somewhat dampened this optimism by concluding that only the very best advanced methods predict all membrane helices correctly for over 70% of all proteins, and that simple hydrophobicity scale-based methods tend to be about 20 percentage points less accurate.

Distribution of membrane helix length crucial parameter for prediction. Prediction methods typically exploit the observation that transmembrane (TM) helices are predominantly apolar and are believed to be between 17 and 25 residues long [11]. The upper and lower bounds for the length of membrane helices are explicitly used by most prediction methods in two ways. (1) Some methods identify as membrane helices only those hydrophobic regions that fall into the typical length interval [12, 13, 14, 15, 8, 7]. (2) Other methods search for the best path through some predicted membrane helix propensity landscape that is compatible with such upper and lower bounds [16, 17, 18, 19, 20]. James Bowie found that the length distribution of three high-resolution structures was shifted toward longer helices [21].

Here, we re-evaluated the distribution of the length of transmembrane helices and that of the loops between helices based on a significantly larger data set than previously used [21]. Then, we analysed 28 prediction methods in terms of their performance on short loops and long membrane helices.

 

 

Results and Discussion


Many long helices and short loops observed in high-resolution structures

Many membrane helices longer than 32 residues! Half of all membrane helices annotated by low-resolution experiments were 20-24 residues long, while only about one fourth of the high-resolution helices fell into this length interval (Fig. 1 inset). Helices of 17-27 residues accounted for less than half of the high-resolution and for 93% of the low-resolution data. The distribution of lengths was clearly shifted toward longer helices in the high-resolution data (Fig. 1). In particular, 12 high-resolution helices (9%) were longer than 34 residues, i.e. fell outside the range that the low-resolution experiments suggested as possible lengths for membrane helices. The following four proteins had the longest helices: (1) the cytochrome BC complex (1BGY:D 34 residues, 1BGY:G 43 residues [22]), (2) the calcium ATPase (1EUL:A, TMH 6, 39 residues [23]), (3) the cytochrome C oxidase (2OCC:I, TMH 1, 39 residues [24]), and (4) the fumarate reductase (1FUM:C, TMH 2, 38 residues [25]). Typically, the long helices were either slightly bent (1BGY:D, 1FUM:C) or extended into globular domains (1EUL:A, Fig. 2). Overall, the recent high-resolution data appeared to strongly challenge the assumption of many developers of prediction methods, namely that the vast majority of membrane helices are 17-25 residues long. This incorrect assumption has been implemented as a more or less rigid constraint in most existing prediction methods. In fact, implementing such a constraint is important because many regions in membrane proteins consist of 40-60 or more consecutive hydrophobic residues that usually form more than one membrane helix. These long hydrophobic regions have to be 'dissected' by prediction methods, not least to accurately predict topology. Thus, the unexpected reality observed in high-resolution structures (Fig. 1 and Fig. 2) complicates the prediction task.



Fig. 1: Length distributions for membrane helices. The lengths of the membrane helices were assigned using DSSP for the high-resolution data (36 unique chains; 131 helices), and using the annotation in SWISS-PROT for the low-resolution data (165 unique proteins; 339 helices). The inset gives the cumulative percentages of helices. About half the high-resolution helices (47%) are 17-27 residues long, while 93% of the low-resolution helices fall into this interval (grey line). Half of the low-resolution helices (50%) are 20-24 residues long, while 25% of the high-resolution helices fall into this interval (dashed line).
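The interval fractions and cumulative percentages quoted for Fig. 1 can be computed from a plain list of helix lengths. The following is a minimal sketch; the function names and the toy lengths are invented for illustration and are not the data used in this study.

```python
from collections import Counter


def fraction_in_interval(lengths, low, high):
    """Fraction of helices whose length lies in [low, high], bounds inclusive."""
    if not lengths:
        raise ValueError("need at least one helix length")
    hits = sum(1 for n in lengths if low <= n <= high)
    return hits / len(lengths)


def cumulative_percentages(lengths):
    """Map each observed length N to the percentage of helices with length <= N."""
    counts = Counter(lengths)
    total = len(lengths)
    running = 0
    cumulative = {}
    for n in sorted(counts):
        running += counts[n]
        cumulative[n] = 100.0 * running / total
    return cumulative


# Toy helix lengths (hypothetical, NOT the paper's data set).
toy_lengths = [15, 18, 21, 22, 24, 26, 29, 33, 36, 39]

print(fraction_in_interval(toy_lengths, 17, 27))   # share of helices with 17-27 residues
print(cumulative_percentages(toy_lengths)[24])     # percentage of helices with <= 24 residues
```

Applied separately to DSSP-derived and SWISS-PROT-derived lengths, the same two functions would yield the kind of comparison shown in the figure.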







Fig. 2: Long membrane helices in high-resolution structures. The plots were generated using the RASMOL program [63]. All transmembrane helices shown in black extend over more than 38 residues.





High-resolution structures revealed considerable proportion of short loops. Monne, von Heijne and co-workers experimentally established the propensities of amino acids to form tight turns (loops) between membrane helices [26, 27, 28]. They found that the charged and polar amino acids DEQNRK as well as the flexible P and G have the highest preferences to form tight turns. However, in their data set the authors found only very few proteins with loops shorter than 7 residues. Plotting the length distribution of loops, we noticed two important results (Fig. 3). (1) Low-resolution experiments tended to suggest significantly longer loops than high-resolution structures. (2) About half of all loops in high-resolution structures were 10 residues or shorter and over 20% of the high-resolution loops were ≤ 5 residues long. Obviously, we cannot expect that the 36 sequence-unique high-resolution chains used in our study (Methods) are fully representative of all helical membrane proteins. Given that we predict about 20,000 helical membrane proteins in the five entirely sequenced eukaryotes alone [29, 30], we also doubt that the 165 sequence-unique low-resolution proteins (Methods) are more representative. Clearly, the high-resolution data are more accurate than the low-resolution data. Thus, our data suggested that a considerable percentage of all loops between membrane helices are very short.



Fig. 3: Length distributions for loops between two membrane helices. The lower graph gives the percentage of all loops between two membrane helices that have N (shown 0-25) residues; the upper graph shows the cumulative data, e.g., 65% of all loops in high-resolution structures (black lines with solid triangles) are ≤ 15 residues long, while 65% of the loops in low-resolution experiments are ≤ 25 residues. Significantly more short loops are observed in the high- than in the low-resolution data: while about 40% of the high-resolution loops were shorter than 9 residues, only half as many loops in the low-resolution set were that short.
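Loop lengths of the kind plotted in Fig. 3 follow directly from a per-residue annotation. The sketch below assumes a hypothetical string encoding in which 'T' marks a residue in a membrane helix and any other character marks a non-membrane residue; only regions between two helices count as loops, so the N- and C-terminal tails are ignored.

```python
import re


def helix_segments(topology):
    """Return (start, end) pairs (0-based, end exclusive) for each run of 'T'."""
    return [(m.start(), m.end()) for m in re.finditer(r"T+", topology)]


def loop_lengths(topology):
    """Lengths of the regions connecting consecutive membrane helices."""
    helices = helix_segments(topology)
    return [nxt[0] - cur[1] for cur, nxt in zip(helices, helices[1:])]


def percent_at_most(lengths, n):
    """Percentage of loops with at most n residues (one point of a cumulative curve)."""
    if not lengths:
        raise ValueError("need at least one loop")
    return 100.0 * sum(1 for x in lengths if x <= n) / len(lengths)


# Hypothetical three-helix protein: tail, helix, 4-residue loop, helix, 30-residue loop, helix, tail.
toy_topology = "-" * 5 + "T" * 20 + "-" * 4 + "T" * 22 + "-" * 30 + "T" * 19 + "-" * 8

print(loop_lengths(toy_topology))
print(percent_at_most(loop_lengths(toy_topology), 10))
```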





 


Long helices and short loops challenge prediction methods

Short loops predicted at lower accuracy. As discussed above, about half of all loops connecting two membrane helices are shorter than 10 residues. Most prediction methods compile averages over windows of 13-25 consecutive residues. Thus, the signal from the flanking helices may override that for a short loop. If so, we expect short loops to be predicted less accurately. Our data clearly confirmed this suspicion: all methods predicted shorter loops less accurately than longer loops (Fig. 4, Table 1). The low-resolution data suggested that prediction accuracy dropped significantly for loops shorter than 10 residues, while the high-resolution data suggested that the significant decrease occurs for loops shorter than 7 residues (Fig. 4). For example, while about 90% of the loops longer than 15 residues were correctly detected by the advanced prediction methods, less than 60% of the loops ≤ 5 residues were identified (Fig. 4, left graph). These data suggest explicitly embedding loop preferences, such as the ones derived by the von Heijne group [26, 27, 28], into methods that predict membrane helices.
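The window effect described above can be illustrated with a toy calculation. The sketch below averages the Kyte-Doolittle hydropathy index [55] over a sliding window for an artificial sequence in which two poly-leucine stretches are joined by a three-residue turn. The sequence, the window lengths, and the cutoff of 1.6 are assumptions chosen for illustration; this is not the procedure of any particular method evaluated here.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter code).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(sequence, window):
    """Mean hydropathy of every stretch of `window` consecutive residues."""
    values = [KD[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Artificial sequence: two hydrophobic stretches joined by a 3-residue turn (G, N, P).
toy_sequence = "L" * 20 + "GNP" + "L" * 20
CUTOFF = 1.6  # illustrative cutoff, an assumption of this sketch

lowest_long = min(window_averages(toy_sequence, 19))   # long window, as in many methods
lowest_short = min(window_averages(toy_sequence, 5))   # short window for comparison

# With the long window the average never drops below the cutoff, so the turn is invisible.
print(lowest_long > CUTOFF)
# With the short window the turn produces a clear dip below the cutoff.
print(lowest_short < CUTOFF)
```

In this toy case the 19-residue window always contains at least 16 leucines, so the three polar or flexible residues cannot pull the average below the cutoff, which is one way the signal of flanking helices may override that of a short loop.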



Fig. 4: Accuracy in predicting short loops. Left: percentage of loops with N (0-30) residues that were correctly predicted by all advanced prediction methods; the bars indicate the error estimates for these values. Note that the high-resolution data set was too small to display non-cumulative distributions. Right: difference in prediction accuracy between loops shorter and longer than the respective loop length (Eq. 1). All values were negative, implying that longer loops were always predicted at higher accuracy than shorter ones.


 

Table 1: Performance for short loops +

Method          High-resolution data      Low-resolution data
                Qok        Qloop          Qok        Qloop
ERROR           ± 18       ± 17           ± 9        ± 8

DAS             88         93             76         83
TopPred2        61         60             40         43
TMHMM1          61         55             24         38
PRED-TMR        50         49             45         53
SOSUI           49         68             26         38
PHDpsiHtm08     38         44             13         19
PHDhtm08        32         33             14         18
HMMTOP2         30         40             40         48
PHDhtm07        28         33             13         19

Wolfenden       91         90             54         72
Ben-Tal         48         52             46         63
KD              29         38             18         26
WW              28         44             26         34
GES             21         23             26         33
Sweet           21         22             21         30
A-Cid           20         28             9          16
Bull-Breese     20         21             19         23
Lawson          18         20             12         17
EM              12         11             16         25
Eisenberg       10         17             24         27
Nakashima       10         11             24         33
Heijne          10         11             16         24
Roseman         10         11             17         28
Levitt          10         11             13         18
Radzicka        10         10             11         20
Fauchere        9          10             14         23
Av-Cid          0          11             12         20
Hopp-Woods      0          0              15         23





Prediction accuracy depended on helix length. When we correlated prediction accuracy with the length of the observed transmembrane helix, we observed three overall trends (Fig. 5). (1) For any threshold N ≤ 32 residues used to group membrane helices into short and long, helices shorter than N were predicted less accurately than helices longer than N. (2) The trend was inverted for helices longer than 32 residues (only available for high-resolution data). These very long helices were predicted less accurately than all other helices. (3) Helices shorter than 17-20 residues posed an even stronger challenge to prediction methods than shorter helices did in general (significant drop of accuracy in Fig. 5). At first sight, the drop in prediction accuracy for helices longer than 32 residues may appear irrelevant in the context of predicting helical membrane proteins for entire proteomes [31, 18, 32, 33, 34, 35, 36, 19, 29]. However, if we can generalise from the currently known high-resolution structures, we expect that almost 20% of all membrane helices are longer than 32 residues (Fig. 1). For the five entirely sequenced eukaryotic proteomes, this translates to about 5,000 proteins with a helix longer than 32 residues [29, 30].



Fig. 5: Accuracy in predicting long membrane helices. Given is the difference in prediction accuracy between membrane helices shorter and longer than N residues (Eq. 2). Negative values imply that short helices were predicted less accurately than longer ones.





 


Detailed analysis of the mistakes in predicting long helices

Advanced methods miss long helices, simple methods incorrectly split these. In order to explore why membrane protein prediction methods have trouble with long helices, we visually classified the predictions for long helices (≥ 33 residues) as being (1) correct, (2) incorrectly cut into two membrane helices, or (3) not predicted at all. Advanced methods predicted over 90% of these long helices correctly. However, when these methods failed, they were about three times more likely to miss a helix entirely than to incorrectly predict it as two shorter helices. This may suggest that advanced methods mostly distinguish correctly between the membrane and the non-membrane parts of long helices. Simple hydrophobicity-based methods identified only about 71% of the long helices correctly. In contrast to the advanced methods, simple methods were six times more likely to incorrectly split a long helix than to miss it. This may suggest that the difficulty of simple methods with long helices is primarily due to over-predicting helical regions. This is supported by the fact that simple methods have great sensitivity but poor specificity at detecting transmembrane helices [10]. In contrast, advanced methods have better specificity at detecting a transmembrane helix, as is indicated by their high accuracy in predicting even long helices. However, the price for being highly specific is that they can miss some transmembrane helices.



Fig. 6: Visual inspection of errors for long membrane helices. Prediction methods incorrectly split long helices about 17% of the time and miss them 5% of the time. When advanced methods predict long helices incorrectly, they are about three times more likely to miss a helix entirely than to split it. In contrast, simple hydrophobicity-based methods are six times more likely to incorrectly predict a long helix as two shorter ones than to miss the helix entirely.




 

Visual inspection of a few cases suggests combining advanced and simple methods. In the cytochrome BC complex, one of the helices (residues 197-231 of 1BGY:D) was not predicted by half of the advanced methods. One explanation for the difficulty in detecting this helix is that it contains no consecutive stretch of at least 17 hydrophobic residues. Even when the methods did predict its presence, they failed to predict residues at the N- and C-termini of the helix (6 residues or more at either end). The residues not predicted as TM were often either polar or charged amino acids. For instance, residues 197-203 are EHDHRKR and residues 223-231 are KRHKWSVLK. This theme of predicting the core hydrophobic region of a transmembrane helix but not detecting the more polar amino- and carboxyl-ends of long helices was repeated for most of the predictions by the advanced methods. The simple hydrophobicity-based methods tended to identify these long helices. For instance, for 1BGY:D, many of the simple hydrophobicity-based methods missed the first seven but correctly detected the last seven residues. In fact, three simple methods captured the entire length of the observed helix. This example might suggest a potential strategy to deal with long membrane helices. (1) Identify consensus regions for long membrane helices through simple hydrophobicity scales. (2) Determine the core of the membrane segment through advanced prediction methods. (3) Extend the predicted helix toward both the N- and C-termini, using a scoring matrix that is optimised for non-transmembrane residues and keeping the boundaries of the extension within the region defined by the simple methods.

 

 

Methods

Data sets. For the high-resolution data, we started with 105 chains from helical membrane proteins with high-resolution structures deposited in PDB [3]. Next, we reduced the bias in this data set resulting from multiple copies of similar proteins. This left a set of 36 high-resolution proteins that were sequence-unique in the sense that no pair in that list had an HSSP-distance above 0 [37] (see [10] for more details). We identified membrane regions through DSSP [1]. For the low-resolution data, we used a sequence-unique subset of the expert-curated set of helical membrane proteins for which good low-resolution experimental evidence about localisation was available [38]. The final sequence-unique subset contained 165 proteins.

Advanced prediction methods. We referred to prediction methods as 'advanced' when they implement more than 'simple' hydrophobicity scales. We tested the following programs: DAS, HMMTOP (version 2), PHDhtm, PHDpsihtm, PRED-TMR, SOSUI, TMHMM (version 2), and TopPred2. TopPred2 averages the GES-scale of hydrophobicity [39] using a trapezoid window [12, 40]. PHDhtm combines a neural network using evolutionary information with a dynamic programming optimisation of the final prediction [41, 18]. PHDpsihtm uses PSI-BLAST [42] alignments as input (Rost, unpublished). DAS optimises the use of hydrophobicity plots [43]. SOSUI [15] uses a combination of hydrophobicity and amphiphilicity preferences to predict membrane helices. TMHMM is the most advanced, and seemingly most accurate, current method to predict membrane helices [44]. It embeds a number of statistical preferences and rules into a Hidden Markov Model to optimise the prediction of the localisation of membrane helices and their orientation (note: similar concepts are used for HMMTOP [45]). PRED-TMR uses a standard hydrophobicity analysis with emphasis on detecting the ends and beginnings of membrane helices [46].

Simple methods exclusively based on hydrophobicity scales. We also implemented our in-house prediction methods that simply used various hydrophobicity scales for prediction. In particular, we tested the following scales. A-Cid: normalised hydrophobicity scale for alpha-proteins [47] , Av-Cid: normalised average hydrophobicity scale [47] , Ben-Tal: Hydrophobicity scale representing free energy of transfer of an amino acid from water into the centre of the hydrocarbon region of a model lipid bilayer [48] , Bull-Breese: Bull-Breese hydrophobicity scale [49] , Eisenberg: normalised consensus hydrophobicity scale [50] , EM: Solvation free energy [51] , Fauchere: hydrophobic parameter pi from the partitioning of N-acetyl-amino-acid amides [52] , GES: hydrophobicity property [39] , Heijne: transfer free energy to lipophilic phase [53] , Hopp-Woods: Hopp-Woods hydrophilicity value [54] , KD: Kyte-Doolittle hydropathy index [55] , Lawson: transfer free energy [56] , Levitt: Hydrophobic parameter [57] , Nakashima: normalised composition of membrane proteins [58] , Radzicka: transfer free energy from 1-octanol to water [59] , Roseman: solvation corrected side-chain hydropathy [60] , Sweet: optimal matching hydrophobicity [61] , Wolfenden: hydration potential [62] , and WW: Wimley-White scale [7] . Replacing the WW scale with each of the above-mentioned hydrophobicity indices, we used the WW algorithm to evaluate the predictive performance of each index.

Measuring accuracy. In order to establish whether or not short loops and long membrane helices pose particular problems for prediction methods, we had to deviate from the scores used to evaluate the performance of membrane prediction methods [10]. In particular, we introduced the following scores that describe the difference in performance between short and long loops (∆QL(N), Eq. 1), and that between short and long transmembrane helices (∆QT(N), Eq. 2).

(1) Short loops. We evaluated the performance of predicting short ‘loops’, i.e. regions connecting two membrane helices with ≤ N residues, by compiling the difference between the accuracy in predicting short and long loops:

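  $$\Delta Q_L(N) = 100 \times \left( \frac{N_{\mathrm{loop} < N}^{\mathrm{identified}}}{N_{\mathrm{loop} < N}^{\mathrm{observed}}} - \frac{N_{\mathrm{loop} \ge N}^{\mathrm{identified}}}{N_{\mathrm{loop} \ge N}^{\mathrm{observed}}} \right)$$  (Eq. 1)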

where N is the number of residues; ‘Nloop < N identified’ is the number of loops with < N residues that were ‘correctly predicted’ and ‘Nloop < N observed’ the number of loops observed to have < N residues. We considered a loop of N residues to be correctly predicted if at least one residue in that loop was predicted as loop, i.e. if the presence of a break between two helices was correctly identified. ∆QL(N) could adopt values between -100 and 100; negative values indicate that longer loops are predicted more accurately than shorter ones.
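The loop score can be sketched in a few lines. The code below assumes that ∆QL(N) is the percentage of correctly predicted loops with fewer than N residues minus the percentage of correctly predicted loops with N or more residues, which is how we read the definitions above. The string encoding ('T' for a predicted membrane residue), the function names, and the toy records are hypothetical.

```python
def loop_detected(observed_loop, predicted):
    """True if at least one residue of the observed loop is predicted as non-membrane.

    observed_loop: (start, end) indices, 0-based, end exclusive.
    predicted: per-residue string, 'T' = predicted membrane helix residue.
    """
    start, end = observed_loop
    return any(state != "T" for state in predicted[start:end])


def delta_q_loop(records, n):
    """Difference in percentage accuracy between loops < n and loops >= n residues.

    records: list of (loop_length, correctly_predicted) pairs.
    Returns None when one of the two groups is empty.
    """
    short = [ok for length, ok in records if length < n]
    long_ = [ok for length, ok in records if length >= n]
    if not short or not long_:
        return None
    return 100.0 * sum(short) / len(short) - 100.0 * sum(long_) / len(long_)


# A 3-residue loop swallowed by one long predicted helix is not detected.
predicted = "-" * 4 + "T" * 40 + "-" * 4
print(loop_detected((22, 25), predicted))

# Toy (length, correct) records, invented for illustration only.
toy_records = [(3, False), (4, True), (5, False), (6, True),
               (12, True), (20, True), (25, True), (30, False)]
print(delta_q_loop(toy_records, 7))
```

A negative result, as in the toy records, corresponds to short loops being detected less often than long ones.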

(2) Long helices. In analogy to the score describing the performance for short loops, we evaluated the performance of predicting long transmembrane helices by compiling the difference between the accuracy in predicting short and long helices:

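  $$\Delta Q_T(N) = 100 \times \left( \frac{N_{\mathrm{tm} \ge N}^{\mathrm{identified}}}{N_{\mathrm{tm} \ge N}} - \frac{N_{\mathrm{tm} < N}^{\mathrm{identified}}}{N_{\mathrm{tm} < N}} \right)$$  (Eq. 2)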

where ‘Ntm ≥ N identified’ is the number of transmembrane helices with ≥ N residues that were ‘correctly predicted’ and ‘Ntm ≥ N’ the number of transmembrane helices with ≥ N residues observed. We considered a helix to be correctly predicted if it overlapped by at least 3 residues with the observed helix and if it was predicted as one continuous helix (over the region of the observed helix). This measure is illustrated in the following example for a prediction (T = transmembrane, ‘-’ = loop):

Observed: ---------TTTTTTTTTTTTTTTTTTTTT--------------
Predict 1:--------------------------TTTTTTTTTT--------
Predict 2:---TTTTTTTTTTTTTT-TTTTTTTTTTTTTTTT----------

In this example ‘Predict 1’ is right, ‘Predict 2’ is wrong, since all we are trying to capture is whether or not methods tended to split long transmembrane helices. ∆QT(N) ranges from -100 to 100; it becomes negative if helices shorter than N residues are predicted more accurately than helices ≥ N.
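The correctness criterion for helices can be expressed as a small check. The sketch below encodes one reading of the rule: a helix counts as correctly predicted if exactly one predicted segment overlaps the observed helix and that overlap spans at least 3 residues. The strings are rebuilt with the same layout as the example above; the function names and the toy records are hypothetical, and the sign convention of delta_q_helix (long minus short) follows the last sentence above and should be flipped if the opposite convention is intended.

```python
import re


def helix_segments(states):
    """(start, end) pairs, 0-based and end exclusive, for each run of 'T'."""
    return [(m.start(), m.end()) for m in re.finditer(r"T+", states)]


def helix_correct(observed_helix, predicted, min_overlap=3):
    """True if exactly one predicted helix overlaps the observed one by >= min_overlap."""
    obs_start, obs_end = observed_helix
    overlaps = []
    for start, end in helix_segments(predicted):
        shared = min(end, obs_end) - max(start, obs_start)
        if shared > 0:
            overlaps.append(shared)
    return len(overlaps) == 1 and overlaps[0] >= min_overlap


def delta_q_helix(records, n):
    """Percentage accuracy for helices >= n residues minus that for helices < n."""
    long_ = [ok for length, ok in records if length >= n]
    short = [ok for length, ok in records if length < n]
    if not long_ or not short:
        return None
    return 100.0 * sum(long_) / len(long_) - 100.0 * sum(short) / len(short)


observed = "-" * 9 + "T" * 21 + "-" * 14
predict_1 = "-" * 26 + "T" * 10 + "-" * 8
predict_2 = "-" * 3 + "T" * 14 + "-" + "T" * 16 + "-" * 10

obs_helix = helix_segments(observed)[0]
print(helix_correct(obs_helix, predict_1))  # one helix, small but sufficient overlap
print(helix_correct(obs_helix, predict_2))  # observed helix split into two

# Toy (length, correct) records, invented for illustration only.
toy_records = [(18, True), (20, True), (22, False), (24, True), (35, False), (38, True)]
print(delta_q_helix(toy_records, 33))
```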

 

 

Acknowledgements

Thanks to Jinfeng Liu (Columbia) for computer assistance and the collection of genome data sets. The work of BR was supported by the grants 1-P50-GM62413-01 and RO1-GM63029-01 from the National Institutes of Health (NIH) and by the grant DBI-0131168 from the National Science Foundation (NSF). Last but not least, thanks to all those who deposit their experimental data in public databases, and to those who maintain these databases.

 

 

References

1. Kabsch, W. & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.
2. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D. et al. (1977). The Protein Data Bank: a computer based archival file for macromolecular structures. J. Mol. Biol., 112, 535-542.
3. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N. et al. (2000). The Protein Data Bank. Nucl. Acids Res., 28, 235-242.
4. Bairoch, A. & Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucl. Acids Res., 28, 45-48.
5. Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile based neural networks. Meth. Enzymol., 266, 525-539.
6. Rost, B. (2001). Protein secondary structure prediction continues to rise. J. Struct. Biol., 134, 204-218.
7. Jayasinghe, S., Hristova, K. & White, S. H. (2001). Energetics, stability, and prediction of transmembrane helices. J. Mol. Biol., 312, 927-934.
8. Ikeda, M., Arai, M., Lao, D. M. & Shimizu, T. (2001). Transmembrane topology prediction methods: A re-assessment and improvement by a consensus method using a dataset of experimentally-characterized transmembrane topologies. In Silico Biol., 1, http://www.bioinfo.de/isb/2001/02/0003/.
9. Möller, S., Croning, D. R. & Apweiler, R. (2001). Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics, 17, 646-653.
10. Chen, C. P., Kernytsky, A. & Rost, B. (2002). Transmembrane helix predictions revisited. Prot. Sci., in press.
11. von Heijne, G. (1996). Prediction of transmembrane protein topology. In Protein structure prediction (Sternberg, M. J. E., eds.), pp. 101-110, Oxford Univ. Press, Oxford.
12. von Heijne, G. (1992). Membrane protein structure prediction. J. Mol. Biol., 225, 487-494.
13. Casadio, R., Fariselli, P., Taroni, C. & Compiani, M. (1996). A predictor of transmembrane α-helix domains of proteins based on neural networks. European Journal of Biophysics, 24, 165-178.
14. Persson, B. & Argos, P. (1996). Topology prediction of membrane proteins. Prot. Sci., 5, 363-371.
15. Hirokawa, T., Boon-Chieng, S. & Mitaku, S. (1998). SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics, 14, 378-379.
16. Jones, D. T., Taylor, W. R. & Thornton, J. M. (1994). A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochem., 33, 3038-3049.
17. Rost, B., Casadio, R. & Fariselli, P. (1996). Refining neural network predictions for helical transmembrane proteins by dynamic programming. In Fourth International Conference on Intelligent Systems for Molecular Biology (States, D., Agarwal, P., Gaasterland, T., Hunter, L. & Smith, R. F., eds.), pp. 192-200, Menlo Park, CA: AAAI Press, St. Louis, M.O., U.S.A.
18. Rost, B., Casadio, R. & Fariselli, P. (1996). Topology prediction for helical transmembrane proteins at 86% accuracy. Prot. Sci., 5, 1704-1718.
19. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 305, 567-580.
20. Tusnady, G. E. & Simon, I. (2001). Topology of membrane proteins. J. Chem. Inf. Comput. Sci., 41, 364-368.
21. Bowie, J. U. (1997). Helix packing in membrane proteins. J. Mol. Biol., 272, 780-799.
22. Iwata, S., Lee, J. W., Okada, K., Lee, J. K., Iwata, M. et al. (1998). Complete structure of the 11-subunit bovine mitochondrial cytochrome BC1 complex. Science, 281, 64.
23. Toyoshima, C., Nakasako, M., Nomura, H. & Ogawa, H. (2000). Crystal structure of the calcium pump of sarcoplasmic reticulum at 2.6 Ångstrøm resolution. Nature, 405, 647.
24. Tsukihara, T., Aoyama, H., Yamashita, E., Tomizaki, T., Yamaguchi, H. et al. (1996). The whole structure of the 13-subunit oxidized cytochrome C oxidase at 2.8 Å. Science, 272, 1136.
25. Iverson, T. M., Luna-Chavez, C., Cecchini, G. & Rees, D. C. (1999). Structure of the E. coli fumarate reductase respiratory complex. Science, 284, 1961.
26. Monne, M., Hermansson, M. & von Heijne, G. (1999). A turn propensity scale for transmembrane helices. J. Mol. Biol., 288, 141-145.
27. Monne, M., Nilsson, I., Elofsson, A. & von Heijne, G. (1999). Turns in transmembrane helices: determination of the minimal length of a "helical hairpin" and derivation of a fine-grained turn propensity scale. J. Mol. Biol., 293, 807-814.
28. Monne, M. & von Heijne, G. (2001). Effects of 'hydrophobic mismatch' on the location of transmembrane helices in the ER membrane. FEBS Lett., 496, 96-100.
29. Liu, J. & Rost, B. (2001). Comparing function and structure between entire proteomes. Prot. Sci., 10, 1970-1979.
30. Liu, J. & Rost, B. (2002). Target space for structural genomics revisited. Bioinformatics, 18, 922-933.
31. Goffeau, A., Nakai, K., Slonimski, P. & Risler, J.-L. (1993). The membrane proteins encoded by yeast chromosome III genes. FEBS Lett., 325, 112-117.
32. Arkin, I. T., Brünger, A. T. & Engelman, D. M. (1997). Are there dominant membrane protein families with a given number of helices? Proteins, 28, 465-466.
33. Frishman, D. & Mewes, H. W. (1997). Protein structural classes in five complete genomes. Nat. Struct. Biol., 4, 626-628.
34. Jones, D. T. (1998). Do transmembrane protein superfolds exist? FEBS Lett., 423, 281-285.
35. Wallin, E. & von Heijne, G. (1998). Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Prot. Sci., 7, 1029-1038.
36. Gupta, R., Jung, E., Gooley, A. A., Williams, K. L., Brunak, S. et al. (1999). Scanning the available Dictyostelium discoideum proteome for O-linked GlcNAc glycosylation sites using neural networks. Glycobiology, 9, 1009-1022.
37. Rost, B. (1999). Twilight zone of protein sequence alignments. Prot. Engin., 12, 85-94.
38. Möller, S., Kriventseva, E. V. & Apweiler, R. (2000). A collection of well characterised integral membrane proteins. Bioinformatics, 16, 1159-1160.
39. Engelman, D. M., Steitz, T. A. & Goldman, A. (1986). Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem., 15, 321-353.
40. Sipos, L. & von Heijne, G. (1993). Predicting the topology of eukaryotic membrane proteins. Eur. J. Biochem., 213, 1333-1340.
41. Rost, B., Casadio, R., Fariselli, P. & Sander, C. (1995). Prediction of helical transmembrane segments at 95% accuracy. Prot. Sci., 4, 521-533.
42. Altschul, S., Madden, T., Shaffer, A., Zhang, J., Zhang, Z. et al. (1997). Gapped Blast and PSI-Blast: a new generation of protein database search programs. Nucl. Acids Res., 25, 3389-3402.
43. Cserzö, M., Wallin, E., Simon, I., von Heijne, G. & Elofsson, A. (1997). Prediction of transmembrane α-helices in prokaryotic membrane proteins: the dense alignment surface method. Prot. Engin., 10, 673-676.
44. Sonnhammer, E. L. L., von Heijne, G. & Krogh, A. (1998). A hidden Markov model for predicting transmembrane helices in protein sequences. In Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB98) (Glasgow, J., Littlejohn, T., Major, F., Lathrop, R., Sankoff, D. et al., eds.), pp. 175-182, AAAI Press, Montreal, Canada.
45. Tusnady, G. E. & Simon, I. (1998). Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol., 283, 489-506.
46. Pasquier, C., Promponas, V. J., Palaios, G. A., Hamodrakas, J. S. & Hamodrakas, S. J. (1999). A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Prot. Engin., 12, 381-385.
47. Cid, H., Bunster, M., Canales, M. & Gazitua, F. (1992). Hydrophobicity and structural classes in proteins. Prot. Engin., 5, 373-375.
48. Kessel, A. & Ben-Tal, N. (2002). Free energy determinants of peptide association with lipid bilayers. In Peptide-lipid interactions (Simon, S. & McIntosh, T., eds.), pp. in press, Academic Press, San Diego.
49. Bull, H. B. & Breese, K. (1974). Surface tension of amino acid solutions: A hydrophobicity scale of the amino acid residues. Arch. Biochem. and Biophys., 161, 665-670.
50. Eisenberg, D., Weiss, R. M. & Terwilliger, T. C. (1984). The hydrophobic moment detects periodicity in protein hydrophobicity. Proc. Natl. Acad. Sci. U.S.A., 81, 140-144.
51. Eisenberg, D. & McLachlan, A. D. (1986). Solvation energy in protein folding and binding. Nature, 319, 199-203.
52. Fauchere, J. L. & Pliska, V. (1983). Hydrophobic parameters pi of amino-acid side chains from the partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem., 18, 369-375.
53. von Heijne, G. & Blomberg, C. (1979). Trans-membrane translocation of proteins: The direct transfer model. Eur. J. Biochem., 97, 175-181.
54. Hopp, T. P. & Woods, K. R. (1981). Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. U.S.A., 78, 3824-3828.
55. Kyte, J. & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol., 157, 105-132.
56. Lawson, E. Q., Sadler, A. J., Harmatz, D., Brandau, D. T., Micanovic, R. et al. (1984). A simple experimental model for hydrophobic interactions in proteins. J. Biol. Chem., 259, 2910-2912.
57. Levitt, M. (1976). A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol., 104, 59-107.
58. Nakashima, H., Nishikawa, K. & Ooi, T. (1990). Distinct character in hydrophobicity of amino acid composition of mitochondrial proteins. Proteins, 8, 173-178.
59. Radzicka, A. & Wolfenden, R. (1988). Comparing the polarities of the amino acids: Side-chain distribution coefficients between the vapor phase, cyclohexane, 1-octanol, and neutral aqueous solution. Biochem., 27, 1664-1670.
60. Roseman, M. A. (1988). Hydrophilicity of polar amino acid side-chains is markedly reduced by flanking peptide bonds. J. Mol. Biol., 200, 513-522.
61. Sweet, R. M. & Eisenberg, D. (1983). Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. J. Mol. Biol., 171, 479-488.
62. Wolfenden, R., Andersson, L., Cullis, P. M. & Southgate, C. C. B. (1981). Affinities of amino acid side chains for solvent water. Biochem., 20, 849-855.
63. Sayle, R. A. & Milner-White, E. J. (1995). RASMOL: biomolecular graphics for all. TIBS, 20, 37.

Contact:    rost@columbia.edu Version:    Sep 12, 2002