Topology prediction for helical transmembrane proteins at 86% accuracy
Burkhard Rost 1, Piero Fariselli 2 & Rita Casadio 2
1 Corresponding author: then: EMBL
now: Dept. Biochemistry & Mol. Biophys., Columbia Univ., 650 West 168th Street BB217, New York, NY 10032, USA, rost@columbia.edu, http://cubic.bioc.columbia.edu/
2 then and now: Lab. of Biophysics, Dep. of Biology, Univ. of Bologna, 40126 Bologna, Italy
QUOTE:
Protein Science 1996 Vol. 5, pp. 1704-1718
Abstract
Previously, we introduced a neural network system predicting locations of transmembrane helices based on evolutionary profiles (PHDhtm, (Rost et al., 1995). Here, we describe an improvement and an extension of that system. The improvement is achieved by a dynamic programming-like algorithm that optimises helices compatible with the neural network output. The extension is the prediction of topology (orientation of first loop region with respect to membrane) by applying to the refined prediction the observation that positively charged residues are more abundant in extra-cytoplasmic regions. Furthermore, we introduce a method to reduce the number of false positives, i.e., proteins falsely predicted with membrane helices. The evaluation of prediction accuracy is based on a cross-validation and a double-blind test set (in total 131 proteins). The final method appears to be more accurate than other methods published. (1) For almost 89% (3%) of the test proteins all transmembrane helices are predicted correctly. (2) For more than 86% (3%) of the proteins topology is predicted correctly. (3) We define reliability indices which correlate with prediction accuracy: for one half of the proteins segment accuracy raises to 98%; and for two-thirds accuracy of topology prediction is 95%. (4) The rate of proteins for which transmembrane helices are predicted falsely is below 2% (1%). Finally, the method is applied to 1616 sequences of Haemophilus influenzae. We predict 19% of the genome sequences to contain one or more transmembrane helices. This appears to be lower than what we predicted previously for the yeast VIII chromosome (about 25%).
Introduction
Integral membrane proteins comprise an important class of proteins for which experimental techniques for three-dimensional (3D) structure determination are often not applicable. Fortunately, theoretical prediction of structural aspects is simpler for membrane proteins than it is for globular proteins as the lipid bilayer imposes strong constraints on the degrees of freedom for the 3D structure (von Heijne, 1981, Eisenberg et al., 1984, Engelman et al., 1986, von Heijne & Gavel, 1988, von Heijne, 1989, von Heijne, 1992, Taylor et al., 1994, Rost et al., 1995).
Prediction of transmembrane helices. 3D structures are experimentally determined for two types of membrane proteins: (1) helical proteins consisting of typically apolar helices of about 20 residues that cross the membrane perpendicular to its surface (photo-reaction centre (Deisenhofer et al., 1985); bacteriorhodopsin (Henderson et al., 1990); light harvesting complex II (Wang, 1994)), cytochrome C oxidase (Iwata et al., 1995); and (2) b proteins consisting of 16-stranded b-barrels (porin (Weiss & Schulz, 1992, Cowan & Rosenbusch, 1994, Kreusch & Schulz, 1994)). Methods for the prediction of transmembrane segments usually focus on helical transmembrane proteins for which more experimental data is available. Prediction methods were designed to predict the locations of transmembrane helices (von Heijne, 1981, Argos et al., 1982, Kyte & Doolittle, 1982, Engelman et al., 1986, von Heijne, 1986b, von Heijne, 1986a, Cornette et al., 1987, von Heijne & Gavel, 1988, Degli Esposti et al., 1990, von Heijne & Manoil, 1990, Landolt-Marticorena et al., 1992, von Heijne, 1992, Donnelly et al., 1993, Edelman, 1993, O'Hara et al., 1993, Sipos & von Heijne, 1993, Jones et al., 1994, Persson & Argos, 1994, Donnelly & Findlay, 1995, Casadio et al., 1996), and the orientation of transmembrane helices with respect to the cell (dubbed topology, Fig. 1, (von Heijne & Gavel, 1988, von Heijne, 1989, Nilsson & von Heijne, 1990, von Heijne, 1992, Sipos & von Heijne, 1993, Jones et al., 1994, Casadio & Fariselli, 1996)). If the locations of the transmembrane helices and the topology are known at sufficient accuracy, 3D structure can be successfully predicted for the membrane spanning segments by an exhaustive search of the entire possible structure space (Taylor et al., 1994).
Accuracy of prediction methods. One of the problems in predicting structure for helical transmembrane proteins is the lack of accurate experimental information. Most prediction methods designed for globular water-soluble proteins are typically based on more than 100 proteins (Rost & Sander, 1994, Rost & Sander, 1995) of known 3D structure as stored in PDB (Bernstein et al., 1977). To obtain sufficiently large data sets, prediction methods for membrane proteins use data from experimental sources other than crystallography or spectroscopy (Manoil & Beckwith, 1986, Park et al., 1992, Hennessey & Broome-Smith, 1993). There are numerous examples for proteins for which 'reliable experimental information' obtained from different groups is contradictory. To list a few controversial cases: (1) nicotinic acetylcholine receptor channel: four a-helices vs. two a-helices and two b-strands (Hucho et al., 1994); (2) P-type ATPases: eight vs. ten a-helices (Stokes et al., 1994); (3) a-subunit of the FO channel E. coli: topology out (Lewis et al., 1990) vs. topology in (Bjorbaek et al., 1990); (4) mitochondrial cytochrome B: 7-9 a-helices (Degli Esposti et al., 1993). One consequence of this is that prediction methods are likely to become more accurate as reliable experimental information about integral membrane proteins is being added to the databases. Another consequence, however, is the problem to adequately estimate prediction accuracy. Thus, estimates for expected accuracy have to be taken with caution.
Further improvement of prediction accuracy necessary? Advanced methods for the prediction of transmembrane helices (Jones et al., 1994, Persson & Argos, 1994, Rost et al., 1995) reach levels of about 90% accuracy (correctly predicted transmembrane helices). Thus, predictions of transmembrane helices are significantly more accurate than are two-state secondary structure predictions of, e.g., helix, non-helix for globular proteins (Rost & Sander, 1993b). Is there any need for improving 1D predictions for transmembrane proteins further? Indeed, two methods that start from 1D predictions of transmembrane helices to predict further aspects of 3D structure would presumably benefit from better 1D predictions. (1) Taylor and colleagues (Taylor et al., 1994) achieve to predict 3D structure for the membrane spanning helices when starting from the knowledge of the exact locations of the helices. In general, current 1D predictions are not accurate enough to provide the demanded precision in locating the helices. (2) A simple and successful technique to predict topology is the positive-inside rule (von Heijne & Gavel, 1988, Hartmann et al., 1989, von Heijne, 1989, Boyd & Beckwith, 1990, Dalbey, 1990, Nilsson & von Heijne, 1990, von Heijne, 1992, Sipos & von Heijne, 1993): positively charged residues occur more often in intra-cytoplasmic than in extra-cytoplasmic regions. Applying this rule for the prediction of topology relies crucially on a correct prediction of the non-transmembrane regions. We shall show, that relatively small improvements in 1D predictions of transmembrane helices can result in significantly better predictions of topology.
An improvement and extension of a previously described technique to predict locations of transmembrane helices (Rost et al., 1995) is presented, here. The initial method (PHDhtm) used information derived from multiple sequence alignments as input for a system of neural networks (Fig. 2: step 1). The neural network preferences were used in two ways. (1) A region of 18 adjacent residues was searched that had the highest propensity in the protein to be in a transmembrane helix (HTM) (Fig. 2 : step 2). Then two thresholds were applied (eq. 5 ) to decide whether or not the protein was predicted to contain, at least, one HTM. (2) The preferences for HTM and not-HTM were input to a dynamic programming algorithm that produced a model (locations and number of HTM's) that was optimally compatible with the neural network preferences and the assumption that the protein contains transmembrane helices of lengths 18-25 residues (Fig. 2 : step 3; Fig. 6 and Fig. 7). By working on the preferences for the entire protein, the refinement procedure introduced an aspect global in sequence, i.e., the resulting model was not as constrained to signals local in sequence (17 adjacent residues used as input to the neural networks) as the previous network prediction. Finally, the refinement model was used to predict topology (Fig. 1) by applying the positive-inside rule (Fig. 2 : step 4; Fig. 6). The main elements of the method are described in mathematical details elsewhere (Rost et al., 1996). Here, we focused on the new aspects (reduction of false positives; definition of reliability indices for the prediction) and present a thorough analysis of the performance of the novel method. Finally, the tool was applied to the first entirely sequenced genome of Haemophilus influenzae (Fleischmann et al., 1995) and particular aspects of the results were compared to an analysis of the yeast VIII chromosome (Rost et al., 1995).

Fig. 1. Topology for helical transmembrane proteins. In one class of membrane proteins, typically apolar helical segments are embedded in the lipid bilayer oriented perpendicular to the surface of the membrane. The helices can be regarded as more or less rigid cylinders. The orientation of the helical axes, i.e. the topology of the transmembrane protein, can be defined by the orientation of the first N-terminal residues with respect to the cell. The topology is defined as out when the protein N-term starts on extra-cytoplasmic region (protein A) and as in if the N-term starts on the intra-cytoplasmic side (proteins B and C).
|
|
Fig. 2. From sequence to topology prediction. Step 1: Sequences similar to the input were found in SWISS-PROT (Bairoch & Boeckmann, 1994) using BLAST (Altschul et al., 1990, Karlin & Altschul, 1990); likely homologues were picked re-aligned by MAXHOM (Sander & Schneider, 1991, Sander & Schneider, 1994); and the alignment was fed into the neural network system PHDhtm (Rost et al., 1995). The network preferences for each residue to be in a transmembrane helix (H) or to be outside of the lipid bilayer (L) were used as for the post-processing methods described here. Step 2: The region of 18 adjacent residues with maximal preference for H was picked, normalised by the preferences for L and a decision-threshold was applied to manage the distinction between proteins with and without transmembrane helices. Step 3: The network preferences were used as input to a dynamic programming algorithm that found the model (number and locations of HTM's) representing the best path through all possible models consisting of HTM's between 18 and 25 residues by optimising the compatibility of the model with the neural network outputs. Step 4: The final refined model output from the dynamic programming was used to apply the positive-inside rule (von Heijne & Gavel, 1988, von Heijne, 1992). |
Results
Correct prediction of all transmembrane helices for almost 90% of the proteins
Refinement procedure significantly better than original neural network. The refinement algorithm (PHDhtm_ref), used here, systematically optimised the transmembrane segments compatible with the output of the neural network system PHDhtm. The success was that the number of proteins for which all transmembrane helices (dubbed HTM's) were predicted correctly almost doubled (Table 1). More than 98% of all observed transmembrane helices were predicted correctly by PHDhtm_ref (337 of 341 observed; Table 1). Tendency was a marginal over-prediction (341 observed, 354 predicted; Table 1). Prediction accuracy was higher for proteins which were observed to contain more than one HTM (data not shown).
Table 1. Accuracy of predicting transmembrane helices and

(a) Results given for cross-validation set (83 proteins; electronic appendix or (Rost WWW, 1996b)), double-blind set (48 proteins; electronic appendix or (Rost WWW, 1996b)), and for the sum of these two. Methods: PHDhtm_nof , neural network results (no filter); PHDhtm_fil , neural network with empirical filter (Rost et al., 1995); PHDhtm_ref , refined version of PHDhtm described here; Jones et al., 1994 , prediction method of Jones et al., 1994. Scores and numbers: Nprot, number of proteins; Nobs , number of transmembrane helices (HTM's) observed; Nprd, number of HTM's predicted; Ncor, number of HTM's correctly predicted; Q2 , percentage of residues predicted correctly in either of the two states: HTM, or not-HTM; QM, percentage of proteins for which all HTM's were predicted correctly ; QT , percentage of proteins for which the topology and all HTM's were predicted correctly. Note: as a rule of thumb: for an evaluation set of 131 proteins and two standard deviations of 2 ¥ 3.1%, an improvement of > 0.6% would be significant.
(b) Results compiled from literature (Jones et al., 1994).
(c) Subsets with all eukaryotic and all prokaryotic proteins.
(d) Estimated error: x , where x was one standard deviation for a binomial distribution.
|
|
Fig. 3. Reliability of predicting correct model. A: Reliability of model vs. number of transmembrane helices observed. Note: to separate the points on the horizontal axis, we added a random number between 0 and 1 to the number of transmembrane helices, i.e. all entries between two grey vertical lines represent the same number of helices. Crosses mark proteins for which all segments were predicted correctly; open circles proteins for which some helices were predicted falsely. For example, the highest index for a falsely predicted protein was 5 (myp0_human). B: Percentage of proteins for which all transmembrane helices were correctly predicted versus the cumulative percentage of proteins predicted with a reliability index RiS(M) n, n = 0 (low), 1, ..., 8 (high) . RiS(M) 0 is the rightmost point representing 100% of the proteins. For example, more than 60% of all proteins were predicted with RiS(M) 2 ; for 95% of these all transmembrane helices were predicted correctly. |
Refinement procedure better at predicting segments than empirical filter. Trans-membrane helices predicted by PHDhtm alone were too long (266 predicted vs. 341 observed; Table 1). The reason is that loop regions between two transmembrane segments are often very hydrophobic. Since the neural network only 'sees' bio-chemical properties of amino acids, the second level of neural networks introduced to account for correlations between adjacent residues (Rost et al., 1995, Rost, 1996) frequently predicted helices extending over more than 40 residues. Thus, the network system could not learn external constraints imposed on the structure. Previously, we have corrected this shortcoming by introducing an empirical filter that simply chopped too long helices into several shorter ones (Rost et al., 1995); PHDhtm_fil: 340 HTM's predicted vs. 341 observed; Table 1). The refinement algorithm pursued systematically a similar goal. PHDhtm_ref predicted slightly less residues correctly than PHDhtm_fil, but was slightly better at predicting correctly transmembrane helices (Table 1).
Expected accuracy verified by double-blind test. After we had completed all tests with the cross-validation set of 83 membrane proteins, we tested all methods on the double-blind set of 48 proteins. The results corrected our previous estimates for prediction accuracy to higher values. In particular, PHDhtm_ref performed even better when applied to a set of proteins which had never been used before (Table 1). (Note: most results presented in the following hold for the entire set of 131 proteins, i.e., cross-validation plus double-blind set.)
Reliability index guide for expert-driven improve-ment of accuracy. The reliability index defined for the final best refined model (eq. 3 ) correlated well with prediction accuracy (Fig. 3). In practice, this allows to focus on the subset of proteins that were predicted more reliably. For example, 66 proteins were predicted at levels of RiSDO3( M) 3; for 65 of these 66 proteins all predicted HTM's were correct (Fig. 3; outlier: myp0_human for which the signal peptide was predicted as HTM; see electronic appendix or (Rost WWW, 1996a)).
Second best model occasionally correct. The dynamic programming-like algorithm yielded a list of possible models. Results reported refer to the best model (best according to eq. 1). However, the second best model was occasionally better: five of the 14 proteins (out of 131) predicted with errors (cox2_parde, ig1r_human, il2b_human, myp0_human, rfpb_salty; electronic appendix or (Rost WWW, 1996a)) were correctly predicted by the second best model. For another seven (of the 14), the second best model was more accurate than the best. Thus, additional expert information may have had reduced the error from 11% to 7% or even to 2%. Expert decisions could have been based on the reliability index that was > 2 for only one of the 14 proteins (myp0_human; for comparison: average reliability for all correctly predicted proteins = 3.4; Fig. 3; for details: electronic appendix or (Rost WWW, 1996a)).
Correct topology prediction for more than 85% of the proteins
Refinement most successful in predicting topology. The empirical filter was slightly superior to the refinement in predicting residues, and slightly inferior in predicting segments. Which was more crucial for using the resulting models (i.e. predictions of all HTM's) to predict topology? Using the refinement procedure as basis for the positive-inside rule, we correctly predicted topology (and all HTM's) for 86% of all proteins (vs. 82% for PHDhtm_fil; Table 1). Thus, PHDhtm_ref was significantly more useful as input for topology prediction than PHDhtm_fil. Furthermore, for more than 90% of the proteins the orientation of the first non-membrane region was correctly predicted (data not shown; note: a random prediction would be correct in about 52% of all cases).
Positive-inside rule not the limiting factor. For 117 proteins all HTM's were correctly predicted; for 113 of these the topology was correctly predicted. For three of the four proteins for which the predicted topology was not in accordance with the SWISS-PROT entries (4f2_human, lh4_rhoac, and ssrg_rat), the application of the positive-inside rule yielded the wrong topology even when starting from HTM locations annotated in SWISS-PROT. The simple positive-inside rule yielded the correct topology for almost 97% of the proteins given HTM locations annotated in SWISS-PROT. Thus, the simplicity of the positive-inside rule was not the limiting factor for prediction accuracy.
Reliability index correlates with prediction accuracy. The value of the charge difference between extra- and intra-cytoplasmic non-transmembrane regions correlated with prediction accuracy (Fig. 4). The reliability index RiSDO3( T) (eq. 4 ) was > 5 for only three falsely predicted proteins (myp0_human, iggb_strsp, and gaa4_bovin). For all three some HTM's were falsely predicted (electronic appendix or (Rost WWW, 1996a)). For only one of these three (myp0_human) the predicted model had, as well, a high reliability, and thus could not have been suspected as a wrong prediction by an expert. For two chains from the cytochrome C oxidase (cox1_parde and cox3_parde; electronic appendix or (Rost WWW, 1996a)) we trusted our prediction more than the SWISS-PROT annotations for a homologue. The X-ray determination of the structure for cytochrome C oxidase (Iwata et al., 1995) revealed the correctness of the prediction (and consequently the mistake in SWISS-PROT; details in electronic appendix or (Rost WWW, 1996a)).
Eukaryotic proteins predicted at higher accuracy. Separating the results for eukaryotic, prokaryotic and viral proteins revealed three results. (1) Topology and all transmembrane helices were predicted better than average for eukaryotic proteins (Table 1). (2) The positive-inside rule was about equally successful for both classes, i.e., given a correct prediction of all HTM's, the topology prediction was correct for 96.6% of the eukaryotic and for 96.0% of the prokaryotic proteins (Table 1). (3) The five viral proteins in our set were predicted correctly, although they all had single membrane spanning (expected accuracy below average; data not shown). However, five proteins are too few to justify any conclusion from this evidence. Why was prediction accuracy significantly higher for Eukaryotes than for Prokaryotes? We failed to find a satisfying answer. Several factors may have contributed to the higher accuracy for Eukaryotes: (i) the multiple sequence alignments were more informative for the Eukaryotes (20% of the alignments for Eukaryotes had less than 4; 30% less than 10 sequences aligned; the respective numbers for Prokaryotes: 40% and 70%!); (ii) eukaryotic HTM's are longer (on average 23 residues, vs. 21 for Prokaryotes; longer HTM's are predicted more reliably); (iii) there are marginally more hydrophobic residues in eukaryotic HTM's (subclass of residues for which prediction accuracy was highest) and slightly more charged residues in eukaryotic non-HTM regions (second best predicted class of residues).
Reliable discrimination between proteins with and without transmembrane helices
Significant reduction of false positives by evaluating strongest HTM. The usefulness of transmembrane predictions for the analysis of entire genomes depends crucially on the rate of false positives (i.e., proteins falsely predicted to contain transmembrane helices). Here, we introduced a method tailored to reduce false positives. The method based on the hypothesis that proteins with and without transmembrane helices separate most clearly when comparing a single region predicted with highest average propensity for HTM. Applying a strict decision threshold (eq. 5 ), the percentage of false positives was reduced below 2% (Table 2, note that the low rate of false positives was obtained at the expense of a higher false negative rate). False classifications occurred for proteins with very hydrophobic patches (for two of the falsely predicted seven proteins, HTM's were predicted for observed strands: TATA-box binding protein, 1ytbA and the Racemase, 2mnr).
Will the estimate for false classifications hold for entire genomes? The investigated set of 435 globular proteins resulted in more conservative estimates for the error rate than did smaller sets used previously (sets with 278, resp. 155 proteins; Table 2). The difference between the error rate for the maximal unique data set of 18 months ago and the maximal set used now (PHDhtm_fil for 238 vs. 435 proteins, Table 2) indicates that the estimated rate of false positives ought to be viewed with scepticism. Improved experimental techniques may determine structures for proteins with very hydrophobic regions that could be predicted falsely as HTM's. Furthermore, the analysis based on proteins contained in PDB which do not contain signal peptides, i.e., the problem that the refined prediction frequently confused transmembrane helices and signal peptides is not taken into account. Thus, an expected rate of less than 2% false positives (Table 2) may prove to be too optimistic.

Fig. 4: Reliability of topology prediction. A: Charge difference vs. number of transmembrane helices observed. Note: to separate the points on the horizontal axis, we added a random number between 0 and 1 to the number of transmembrane helices, i.e. all entries between two grey vertical lines represent the same number of helices. Crosses mark proteins for which all segments and the topology were predicted correctly; filled triangles proteins for which the topology prediction was wrong although all helices were correctly predicted; open circles proteins for which some helices and the topology were predicted falsely. High values for false topology predictions occurred only for proteins for which also the model was predicted falsely (circles). B: Accuracy of topology prediction versus the cumulative percentage of proteins predicted with a reliability index RiS(T) n, n = 0 (low), 1, ..., 9 (high) . RiS3(T) 0 is the rightmost point representing 100% of the proteins. For example, more than 60% of all proteins were predicted with RiS3(T) 5 ; for 95% of these the topology and model were predicted correctly. C: The number of proteins predicted with a certain reliability index is shown to indicate that the drop of accuracy for RiS3(T) 6 (B) is partly due to low count rates.
|
|
Fig. 5: Helical transmembrane proteins for Haemophilus influenzae. A: Cumulative percentage of helical transmembrane proteins vs. number of HTM's predicted (total number of proteins 1616). We separated between predictions based on multiple alignments (expected accuracy higher; filled diamonds) and predictions based on single sequence information only (expected accuracy lower; open diamonds; sums over all proteins given as dotted line). For example, 10% of the 1616 Haemophilus sequences were predicted with at 5 transmembrane helices. B: Number of proteins predicted vs. number of predicted HTM's. Open bars give all proteins predicted with topology 'in' ; filled grey bars all proteins predicted with topology 'out' ; dark bars give the sum over both. |
Table 2. Accuracy of distinguishing proteins with and without transmembrane helices

(a) Methods: PHDhtm, J(strict) = 0.8 : strict decision threshold applied to PHDhtm output (designed to reduce false positives; eq. 5 ); PHDhtm, J(loose) = 0.7 , loose decision threshold (designed to include all possible helical membrane proteins; eq. 5 ); PHDhtm_fil , PHDhtm plus empirical filter; Jones et al., 1994 , statistics-based method for predicting transmembrane helices (Jones et al., 1994); Edelmann, 1993 , statistics-based prediction method (Edelman, 1993), the question mark indicates that published results for predicting membrane proteins are not based on cross-validation tests and thus are not comparable. Scores: N3(glob) , number of globular proteins, i.e. proteins without HTM's; E(glob) , percentage of proteins without HTM's for which HTM's were falsely predicted; N(memb) , number of proteins with HTM's; E(memb) , percentage of proteins with HTM's for which no HTM's were predicted. The following proteins without HTM's were predicted to contain HTM's by the strict threshold: 1bmdA, Oxidoreductase; 1pfiA, Viral coat protein; 1ribA, Reductase; 1spf, Lipoprotein; 1ytbA, TATA-box binding protein; 2mnr, Racemase; 2ohxA, Oxidoreductase.
(b) Results taken from literature (Edelman, 1993, Jones et al., 1994, Rost et al., 1995).
(c) Estimated error: x , where x was one standard deviation for a binomial distribution.
Total number of false classifications lower for strict threshold. The two decision thresholds introduced allow to focus on either predicting as many helical transmembrane proteins as possible (loose threshold, eq. 5 ) or on minimising the rate of false positives (strict threshold, eq. 5 ). The strict threshold was better in classifying proteins without HTM's (lower rate of false positives; higher rate of false negatives); the loose threshold in classifying proteins with HTM's (lower rate of false negatives; higher rate of false positives; Table 2). The strict threshold yielded a higher total error rate (false positives + false negatives = 3.9%) than the loose threshold (3.7%). However, for analysing a large number of proteins by an automatic prediction service (Rost et al., 1994a, Rost, 1996), e.g. entire genomes, the total number of falsely classified proteins would be lower for the strict than for the loose threshold as the number of proteins without HTM's is supposedly below 30%.
Refined version of PHDhtm compared favourably with other methods
Better prediction of topology. The final topology predictions were more than eight percentage points superior to the best alternative method for prediction of topology published when evaluated on an identical data set of 83 proteins (Jones et al., 1994); Table 1). An empirically derived method was evaluated on 24 bacterial inner membrane proteins by von Heijne (1992). A crucial idea of that method was to choose the predicted HTM's such that the charge difference became maximal. A similar algorithm, in our hands, resulted in significantly worse predictions than those obtained by the methods described here. The result published by von Heijne (1992) suggests a prediction accuracy of 96% for the correct prediction of all HTM's and topology. Leaving out the three proteins for which the assignments of HTM's published by von Heijne did not correspond to the SWISS-PROT assignments (cyoa_ecoli, cyoe_ecoli, uhpt_ecoli), we achieved the same accuracy on this specially selected data set.
Lower rate of false positives. Judging from the results published, the method of Jones et al. (1994) is the best in distinguishing between proteins with and without transmembrane helices. Our method tailored to manage this distinction yielded a lower error rate although based on a larger and more conservative data set (Table 2).
Analysing the entire Haemophilus influenzae genome
Most predictions based on single sequence information Prediction accuracy is significantly higher if the evolutionary information contained in multiple alignments is used as input to the neural network system PHDhtm (Rost et al., 1995). For 332 out of 1616 H.i. proteins we predicted at least one HTM. For 129 of the 332 predicted HTM proteins (40%), the prediction was based on alignments; for only 76 (23%!) predictions were based on multiple alignments containing at least 4 sequences (results for the 37 of these predicted to contain at least two HTM's in Table 3; for more details see the electronic appendix or (Rost WWW, 1996a)). About 80% of predicted membrane-bound proteins (238) were predicted to contain more than a single transmembrane helix (electronic appendix or (Rost WWW, 1996b)).
Fewer helical membrane proteins in Haemophilus influenzae than in yeast VIII. When subtracting the expected error rate for false positives (1.60.7%; Table 2) and adding the expected under-prediction of membrane proteins (2.31.5%; Table 2), the results suggested that about 19% of all H.i. proteins contain transmembrane helices; and about 16% more than one HTM. A similar analysis of the yeast VIII chromosome with our previous prediction method (PHDhtm_fil), predicted transmembrane helices for about 25% of the proteins; and about 16% with more than one HTM. Given the higher error rate for false positives of our previous method (Table 2), the results suggested that there are slightly more proteins with transmembrane helices in yeast VIII than in Haemophilus influenzae.
More proteins predicted with topology 'in'. About 57% of the proteins predicted with HTM's were predicted with topology 'in' (Fig. 5). Significant exceptions were proteins predicted with five and seven transmembrane helices, for which the topology 'out' dominated (Fig. 5). Interestingly, a higher percentage of the proteins predicted with topology 'out' had both terminal non-transmembrane regions on the outside (86 out of 144) than proteins predicted with topology 'in' (79 out of 188). In other words, proteins predicted with topology 'out' were more often predicted with an odd number of HTM's.
Discussion
Significant improvement of prediction accuracy by refinement algorithm. The segment optimising refinement of the profile-based neural network system PHDhtm proved to be successful in four ways. (1) Prediction accuracy was significantly better than for the simple neural network prediction; for about 89% (3.1%, one standard deviation) of the proteins all HTM's were correctly predicted (Table 1). (2) The refined version of PHDhtm was significantly more accurate at predicting all HTM's correctly than was the previously implemented empirical filter (Table 1). (3) The refinement algorithm was less sensitive to the choice of free parameters (eq. 2 ) than the empirical filter as the results were better for the double-blind set that was used after the methods had been set up (Table 1). (4) The reliability index defined for the final prediction (eq. 3 ) correlated well with prediction accuracy: for 65 of the 66 proteins (i.e. 98%) predicted with RiS(M) 3 all transmembrane helices were predicted correctly (Fig. 3).
Table 3. Proteins with transmembrane helices predicted for Haemophilus influenzae

We listed all proteins for which we predicted more than one transmembrane helix based on multiple sequence alignments (information for all 332 protein predicted in the electronic appendix or (Rost WWW, 1996b). Sequence names as in Fleischmann et al. (1995); alignments from 'http://cubic.bioc.columbia.edu/) , number of transmembrane helices predicted; Top , predicted topology; N-term , first ten residues of sequence; Segments , positions of predicted HTM's.
Prediction of topology better than 86% by combining refinement and positive-inside rule. The success of the refined version of PHDhtm showed most clearly for the prediction of topology (Fig. 1). (1) For more than 86% (3.1%, one standard deviation) of the proteins all transmembrane helices and the topology were predicted correctly (Table 1). (2) The limiting step for topology prediction was not the simplicity of the positive-inside rule: for 97% of the proteins for which all transmembrane regions had been predicted correctly, the positive-inside rule yielded the correct topology (Table 1). (3) The predicted reliability correlated well with accuracy: 83 proteins were predicted with a reliability 5 (eq. 4 ); for 79 of these the prediction was correct (Fig. 4). (4) Prediction accuracy was better than average for eukaryotic proteins (Table 1). (5) The final prediction of topology was significantly more accurate than the best alternative method published on a set of 83 eukaryotic and prokaryotic proteins (Jones et al., 1994). (6) A minor improvement in 1D accuracy resulted in a major improvement when using the 1D prediction to predict other aspects of protein structure. A similar effect, although less marked, is observed for prediction-based threading (Rost, 1995). One of the reasons for this effect is that the refinement algorithm successfully used information not local in sequence, i.e., extending over the windows of 17 adjacent residues input to the neural network system.
Reduction of false positives below 2% by evaluating strongest HTM. The analysis of entire genomes requires an accurate distinction between proteins with and without transmembrane helices. Here we introduced an algorithm that distinguished the two classes based on a single helix for which PHDhtm predicted the highest average propensity (eq. 5 ). Less than 4% of the proteins were classified falsely by this procedure. In particular, for only 1.6% (0.7%, one standard deviation) from a large set of unique proteins (435) we falsely predicted transmembrane helices (false positives; Table 2). This was significantly better than our previous method and results published by others (Edelman, 1993, Jones et al., 1994). Lower rates of false positives implied higher rates of false negatives (proteins with HTM's that were not detected). The balance between the two can be shifted by switching between a strict threshold (1.6% false positives; 2.3% false negatives) and a loose threshold (3.7% false positives; 0% false negatives).
Method available by automatic prediction service. The refinement of PHDhtm and the topology prediction is available via an automatic prediction service (Rost et al., 1994a, Rost, 1996); for information send the word help to the internet address PredictProtein@columbia.edu , or use the World Wide Web (WWW) site http://cubic.bioc.columbia.edu/predictprotein/ ). Alternative models are provided to enable expert users to focus on more reliably predicted HTM's. Note that it may lead to errors in predicting topology if the sequence starts or ends with HTM regions.
Haemophilus influenzae, an organism with few helical transmembrane proteins? Finally, we scanned the entire Haemophilus influenzae genome (Fleischmann et al., 1995) for helical membrane proteins (CPU time for prediction: several hours on a SUN SPARC10). Given the error rate in distinguishing between proteins with and without HTM's (Table 2), the results suggested that about 19% of the H.i. proteins contain transmembrane helices; and about 16% more than one HTM. These numbers were clearly lower than those previously (Rost et al., 1995) obtained for the entire yeast VIII chromosome (>25%). Will the difference in the percentage of helical membrane proteins between yeast and Haemophilus influenzae hold up for the entire genomes? And how about the percentage of helical membrane proteins for other organisms? The tool to answer by dissecting genomes as they are being sequenced is set up.
Methods
Database and evaluation of method
Selection of proteins. We based our analyses on proteins for which experimental information about the locations of transmembrane helices is annotated in the SWISS-PROT database (Manoil & Beckwith, 1986, von Heijne & Gavel, 1988, von Heijne, 1992, Sipos & von Heijne, 1993, Bairoch & Boeckmann, 1994, Jones et al., 1994). The proteins were chosen to meet two criteria: (1) reliability: experimental information should be as reliable as possible (Manoil & Beckwith, 1986, von Heijne, 1992); (2) comparability: the data set should be similar to those used by others (Jones et al., 1994). For the few known 3D structures, locations of transmembrane helices were taken from DSSP (Kabsch & Sander, 1983). For all others, locations of transmembrane helices are often controversial. For making the results easily reproducible for others, we decided to always use the definitions found in SWISS-PROT (Bairoch & Boeckmann, 1994). Locations and topology used are listed in the electronic appendix and on WWW (Rost WWW, 1996a).
Cross-validation test. For the prediction of transmembrane propensities by the neural network system (PHDhtm, (Rost et al., 1995)), the cross-validation set of 83 transmembrane proteins was divided into 66 proteins used for training; and 17 for evaluating the results (test set). This was repeated five times (five-fold cross-validation), until each protein had been in a test set once. The sets were separated such that no protein in the multiple alignments used for training had more than 25% pairwise sequence identity to any protein in the multiple alignments of the test proteins. The cross-validation procedure yields estimates for prediction accuracy that are likely to hold for proteins of yet unknown topology (Rost & Sander, 1993a, Rost & Sander, 1995).
Double-blind set. Although rigorous cross-validation experiments may yield sufficiently reliable estimates prediction accuracy, prediction methods should always be evaluated additionally in a double-blind experiment which proceeds in the following manner. First, the prediction method is developed and evaluated in a cross-validation experiment. Second, all parameters are frozen and the method is tested on a new set (double-blind set) of proteins which were not used before (ideally until the day the paper is submitted). We implemented this concept by: (1) optimising free parameters on a subset of 10 proteins (chosen at random from cross-validation set); (2) compiling prediction accuracy by the cross-validation experiment; (3) evaluating the method on an additional double-blind set which was not used before. The double-blind set was selected by applying two criteria: (i) the entries for the locations of transmembrane helices and topology should be labelled as 'probable' by the SWISS-PROT notation; and (ii) for similar proteins of different species only one protein was taken. The 48 proteins used as double-blind set are listed in the electronic appendix or (Rost WWW, 1996a); all taken from SWISS-PROT release 32 (Bairoch & Boeckmann, 1994) that met those criteria and were not already contained in our cross-validation set.
Data for the Haemophilus influenzae genome. To illustrate the usefulness of our method, we report 332 'blind predictions' listing all proteins likely to contain transmembrane helices for the entire Haemophilus influenzae (H.i.) genome. The sequences of the H.i. genome were taken from the TIGR Internet server (Fleischmann et al., 1995). The multiple sequence alignments for some of the 1616 protein sequences of H.i. are publicly available (Casari et al., 1995).
Measuring prediction accuracy. In contrast to globular proteins for which the definition of segment-based scores for prediction accuracy is problematic (Rost et al., 1994b), evaluating methods predicting transmembrane helices is relatively straightforward. Here we regarded a transmembrane helix to be predicted correctly, if the overlap between observed and predicted helix was, at least, five residues.
Prediction methods
The dynamic programming-like algorithm and the prediction of topology are conceptually simple methods. Here we focused on describing the main idea of both methods and attempted to provide the details to the extent to make the work reproducible. A mathematically more explicit description is given elsewhere (Rost et al., 1996). The elements of the method introduced here were presented in more detail. These were the definitions of empirical reliability indices (1) for the prediction of the refined model, and (2) for the topology prediction; and (3) the new method to distinguish proteins with and without transmembrane helices.
Neural network predictions of transmembrane preferences. Input for the refinement algorithm was the output of the profile-based neural network system PHDhtm (Fig. 2; (Rost et al., 1995)). The output of the networks consists of two values for each residue, giving the preferences of that residue to be in a transmembrane helix (H) or in a region outside of the lipid bilayer (L).
Finding the optimal path through all predicted propensities (dynamic programming). The simplest way to derive predictions for helix locations from network preferences is to predict each residue to be in the state (H or L) with largest preference (winner-takes-all decision). The problem of this approach that resulting transmembrane helices were too long was corrected by an empirical filter chopping too long helices into several shorter ones (Rost et al., 1995). A less arbitrary alternative for generating predictions from preferences is to find the optimal positioning of transmembrane helices compatible with the network output (a similar dynamic programming method has been implemented for topology prediction by Jones et al., 1994). Since transmembrane helices are observed to extend over about 18-25 residues, all possible transmembrane helices (HTM's) can be enumerated. The dynamic programming-like algorithm was implemented by the following steps (Fig. 6).
(1) Convert network output to propensity: The preferences (from PHDhtm) were normalised to propensities to yield preference H + preference L = 1 for each residue.
(2) Compile pool of possible HTM's: The average propensity per helix was computed for all possible HTM's. Note that the number of possible helices is usually much larger than the number of residues (Fig. 7).
(3) Generation of models with increasing number of HTM's: Starting from the assumption that the protein contained no HTM (=0), we successively picked
|
|
Fig. 6. Refinement of PHDhtm and prediction of topology. Refinement: the dynamic programming algorithm comprised the following three steps. (1) Compilation of pool of possible HTM's: for all possible transmembrane helices the preferences from the neural network output were summed over 18-25 residues; the results were stored (shown for best eight HTM's). (2) Generation of models with increasing number of HTM's: all possible models containing successively more helices, i.e. = 0, 1, ..., n, were generated by selecting at each step that helix from the pool with maximal sum and no overlap to any of the helices added at previous steps ' < . (3) Selection of best model: finally the model with maximal sum over the network preferences was selected as prediction (here =3). Topology prediction: The number of positively charged residues (R: Arginine; K: Lysine) was summed separately over all odd (first, third, ...) and over all even (second, fourth, ...) non-transmembrane regions of the optimal model (highest sum over neural network preferences, here =3). The final prediction of topology was assigned according to the sign of the difference between the number of charged residues in odd and even regions. For example, for a positive difference, the first residues of the protein N-term were predicted as starting on the extra-cytoplasmic side. |

Fig. 7. Explicit example for the refinement algorithm. For simplicity the following unrealistic parameters were used: minimal length of HTM = 3 residues; maximal length of HTM = 4 residues (for the real implementation we used 18 residues for the minimal and 25 for the maximal length; eq. 2 ). A: Output from PHDhtm for a sequence of ten residues converted to the propensities for each residue to be in a transmembrane helix (H) or not (L). B: Pool of all possible HTM's (A-O) of length 3 and 4; given are the numbers for the N- and C-term and the average helix propensity for each HTM. C: Starting from the model with no HTM (=0), successively the best HTM's are added; given the number of helices, the final prediction for all ten residues and the resulting score for that model (eq. 1 ). D: Best model is the one with =2 HTM's.
(4) Selection of best model: The final prediction was the model with highest sum over all propensities. The score PSDO3() for the model with helices was defined by:

Predicting topology based on the positive-inside rule. Gunnar von Heijne established that membrane proteins of certain species contain more positively charged residues (Arginine and Lysine) on the intra-cytoplasmic side of the membrane than on the extra-cytoplasmic side (von Heijne & Gavel, 1988, von Heijne, 1989, Nilsson & von Heijne, 1990, von Heijne, 1992). Indeed, the rule was valid for more than 95% of the proteins in our data sets (data not shown). The application of this rule to the models obtained by PHDhtm (no filter), PHDhtm_fil, or the refined version of PHDhtm_ref, required three steps (Fig. 6).
(1) Compiling the positive charges: The positive charges C were compiled as percentages of positively charged residues (R and K) present in the entire sequence alignment of the protein. The percentages were summed separately for even and odd loop regions. (Note: for globular regions of more than 60 residues, we included only the 25 residues on the terminal sides.)
(2) Computing the charge difference: The charge difference was compiled by subtracting positive charges of odd loop regions from positive charges of even regions (C ).
(3) Prediction according to sign of charge difference: If the charge difference was negative (C 0 ) the first loop was predicted to be extra-cytoplasmic; if it was positive (C > 0 ) to be intra-cytoplasmic.
Reliability index for predicting topology. The underlying hypothesis for defining a reliability index for the predicted topology was that the reliability would be proportional to the charge difference. We empirically favoured the following definition:
( 4 )
where INT(x) was the integer value of x , min{x,y} the minimum of x and y , and |C| the absolute value of the charge difference. The definition normalises the reliability index to values between 0 (unreliable) and 9 (reliable).
Distinguishing proteins with and without HTM's based on strongest HTM. Predictions of transmembrane helices could be used to keep track with the flow of genome data (Oliver et al., 1992, Johnston et al., 1994, Fleischmann et al., 1995) by quickly scanning entire genomes for possible membrane associated proteins. For this purpose we need methods to distinguish between proteins with and without transmembrane helices. Previously, we used the empirical filter to accomplish the distinction (Rost et al., 1995, Rost, 1996). The segment-oriented refinement algorithm provided an alternative solution to the problem that was applied to PHDhtm network output by the following three steps.
(1) Converting output to propensities: For all residues the neural network output was converted to propensities (i.e., preference H + preference L = 1).
(2) Compiling propensity for best HTM: We scanned the protein for the segment of 18 (minimal length of HTM, eq. 2 ) consecutive residues with the maximal HTM propensity.
(3) Applying decision thresholds: Finally, we predicted the protein to be globular if the average propensity for the best HTM was below a decision threshold J.
We introduced two different thresholds for the decision to address two different possible goals of the user. (i) As many as possible helical membrane proteins should be found with as few as possible false positives (JSUP5(strict)). (ii) All helical membrane proteins should be found even at the expense of including many false positives in the list (JSUP5(loose)). The following values were used:
( 5 )
Results will be given for both constants (Table 2).
Acknowledgements
BR thanks Chris Sander (EBI Cambridge) for financial support; Reinhard Schneider (EMBL Heidelberg) for providing the latest version of the alignment program MaxHom; Antoine de Daruvar (EMBL Heidelberg) for his assistance in running the PredictProtein service; and the GeneQuiz consortium (Heidelberg-Madrid-Menlo Park-Cambridge) - especially Georg Casari and Reinhard Schneider (EMBL Heidelberg) - for the H.i. sequence alignments. Last, but not least, thanks to all those who deposit experimental results in public databases and who maintain these databases, in particular thanks to Amos Bairoch (Basel) and colleagues: the results of this work crucially depended on the quality of SWISS-PROT.
References