| Title: | The PredictProtein server |
| Author: | Burkhard Rost , Guy Yachdav & Jinfeng Liu |
| Quote: | Nucleic Acids Research 2004,32:W321-W326. |
| 1 | CUBIC, Dept. of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA |
| 2 | Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA |
| 3 | North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA |
| * | Corresponding author: cubic@cubic.bioc.columbia.edu URL http://cubic.bioc.columbia.edu/ Tel: +1-212-305-4018, fax: +1-212-305-7932 |
This article is published in (Nucleic Acids Research, issue, date and pages) © copyright Oxford University Press (2004). OUP is the only authorised source. All copying of this article including placing on another website requires the written permission of the copyright owner.
PredictProtein (PP, http://www.predictprotein.org) is an Internet service for sequence analysis and the prediction of protein structure and function. Users submit protein sequences or alignments; PredictProtein returns multiple sequence alignments, PROSITE sequence motifs, low-complexity regions (SEG), nuclear localisation signals, regions lacking regular structure (NORS) and predictions of secondary structure, solvent accessibility, globular regions, transmembrane helices, coiled-coil regions, structural switch regions, disulfide bonds, sub-cellular localization, and functional annotations. Upon request, fold recognition by prediction-based threading, CHOP domain assignments, and predictions of transmembrane strands and inter-residue contacts are also available. For all services, users can submit their query either by electronic mail or interactively through the World Wide Web.
Key words: sequence analysis, prediction of protein structure and function, globularity, protein domains, secondary structure, solvent accessibility, multiple alignments, transmembrane helices.
PredictProtein (PP) is an automatic service that searches up-to-date public sequence databases, creates alignments, and predicts aspects of protein structure and function. Users send a protein sequence and receive a single file with results from database comparisons and prediction methods. PP went online in 1992 at the European Molecular Biology Laboratory (EMBL, Heidelberg); since 1999 it has operated from Columbia University (New York). Although many servers have implemented particular aspects, PP remains the most widely used public server for structure prediction: over 1.5 million requests from users in 104 countries have been handled; over 13,000 users submitted ten or more different queries. PP web pages are mirrored in 17 countries on four continents. Our goal has always been to develop a system optimised to meet the demands of experimentalists not experienced in bioinformatics. This meant that we focused on incorporating only high-quality methods and tried to collate results, omitting less reliable or less important ones.
Attempt to simplify output by incorporating hierarchy of thresholds. The attempt to 'pre-digest' as much information as possible, so that the results become easier to interpret, is another unique pillar of PP. For example, by default PP returns only those proteins found in the database that are very likely to have a structure similar to that of the query protein [1]. Particular predictions, such as those for membrane helices, coiled-coil regions, signal peptides, and nuclear localization signals, are not returned if they fall below given probability thresholds. Over the years, we have added so many methods to the output of PP that our original goal of 'easy-to-interpret' output is challenged. We hope that a variety of improvements in the near future will reduce this problem.
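The thresholding idea described above can be illustrated with a minimal sketch. The method names and cut-off values below are hypothetical placeholders chosen for illustration; the actual thresholds used by PP are not given in this text.

```python
# Hypothetical per-method probability thresholds (illustrative values only;
# the real PP cut-offs are not stated in this article).
DEFAULT_THRESHOLDS = {
    "membrane_helix": 0.8,
    "coiled_coil": 0.9,
    "signal_peptide": 0.8,
    "nls": 0.9,
}


def filter_predictions(predictions, thresholds=DEFAULT_THRESHOLDS):
    """Keep only predictions whose probability reaches the method's threshold."""
    kept = []
    for method, probability in predictions:
        cutoff = thresholds.get(method)
        # Methods without a configured threshold are always reported.
        if cutoff is None or probability >= cutoff:
            kept.append((method, probability))
    return kept


raw = [
    ("membrane_helix", 0.95),
    ("coiled_coil", 0.40),
    ("secondary_structure", 0.70),
]
reported = filter_predictions(raw)
```

In this toy run the weak coiled-coil prediction is dropped from the output, while the confident membrane-helix call and the unthresholded secondary structure prediction are kept.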
Each request triggers the application of over 20 different methods. Currently, users receive a single output file with the following results (some of these are optional, Table 1). Database searches: similar sequences are reported and aligned by a standard, pairwise BLAST [2], an iterated PSI-BLAST search [3], and by the dynamic-programming method MaxHom [4]. While the pairwise BLAST searches are identical to those obtainable from the NCBI site, the iterated PSI-BLAST is performed on a carefully filtered database to avoid accumulating false positives during the iteration [5, 6]. The dynamic-programming method MaxHom is only available through PP. Additionally, the database searches comprise a standard BLAST-based search through ProDom [7] and a standard search for functional motifs in the PROSITE database [8]. PP now also identifies putative boundaries for structural domains through the CHOP procedure (below). Optionally, users can request searches for remotely similar proteins by the prediction-based threading method TOPITS+ [9, 10]. Structure prediction methods: secondary structure, solvent accessibility, and membrane helices are predicted by the PHD and PROF programs [11, 12, 13], membrane strands by PROFtmb [14], coiled-coil regions by COILS [15], bonded cysteine residues by CYSPRED [16], and inter-residue contacts by PROFcon08 [17]. Putative structural switching regions are detected by the program ASP [18, 19], low-complexity regions are marked by SEG [20], and long regions with no regular secondary structure are identified by NORSp [21, 22]. The PHD/PROF programs and TOPITS are only available through PP. The particular way in which PP automatically iterates PSI-BLAST searches and the way in which we decide what to include in sequence families are also unique to PP.
The particular aspects of function that are currently embedded explicitly into PP are all related in some way to sub-cellular localization: we detect nuclear localization signals through PredictNLS [23, 24] and endoplasmic reticulum and Golgi-related signals through another in-house data set [25], predict localization independent of targeting signals through LOCnet [26], and annotate homology to proteins involved in cell-cycle control [27].
| Method | Task | Main Author(s) | Quote |
| Database | |||
| Swiss-Prot * | annotated protein sequences | A Bairoch (SIB) & R Apweiler (EBI) | [47] |
| TrEMBL * | raw protein sequences | R Apweiler (EBI) | [47] |
| PDB * | protein structures | P Bourne (UCSD) | [48] |
| BIG | non-redundant combination of Swiss-Prot, TrEMBL, PDB | D Przybylski (Columbia) | [6] |
| Alignment | |||
| MaxHom | dynamic programming, multiple alignment | R Schneider (LION) and C Sander (Sloan Kettering) | [4] |
| BLASTP * | pairwise alignment | S Karlin & S F Altschul (NCBI) | [2] |
| PSI-BLAST * | profile based alignment | S F Altschul (NCBI) | [3] |
| HMMer * | Hidden Markov model search | S Eddy (Washington University) | [38] |
| TOPITS | prediction-based threading | B Rost | [9, 49, 50] |
| Protein domains and unusual regions | |||
| ProDom * | structural domain-like regions | F Corpet, F Servant, J Gouzy & D Kahn (Toulouse) | [51] |
| Pfam-A * | protein families | A Bateman (Sanger) et al. | [37] |
| CHOP | structural domain-like fragments | J Liu (Columbia) | [35] |
| SEG * | low-complexity regions | J C Wootton & S Federhen (NCBI) | [20] |
| NORSp | floppy regions | J Liu & B Rost | [21, 22] |
| Protein structure | |||
| PHDsec | secondary structure | B Rost | [52, 53, 11] |
| PHDacc | solvent accessibility | B Rost | [54, 11] |
| PHDhtm | membrane helices | B Rost | [55, 11, 56] |
| PROFsec | secondary structure | B Rost | [12] |
| PROFacc | solvent accessibility | B Rost | unpublished |
| GLOBE | globularity | B Rost | unpublished |
| COILS | coiled-coil regions | A Lupas (Tübingen) | [57] |
| CYSPRED * | disulfide bonds | P Fariselli & R Casadio (Bologna) | [16] |
| ASP | structural switches | M Young & S Highsmith (Sandia) | [19] |
| PROFcon08 | inter-residue contacts | M Punta (Columbia) | [17] |
| PROFtmb | membrane barrels | H Bigelow | [14] |
| Protein function | |||
| PredictNLS | nuclear localisation signals | R Nair, M Cokol & B Rost (Columbia) | [23, 24] |
| PROSITE * | functional sequence motifs | K Hofmann, P Bucher & A Bairoch (SIB) | [8] |
| LOCnet | prediction of sub-cellular localization | R Nair | [26] |
| Tools integrated into PP | |||
| MView * | HTML alignment viewer | N Brown | [58] |
| ESPript * | Ready-to-publish alignments and predictions | P Gouet & E Courcelle (IPS Toulouse) | [59] |
* Original URLs:
Swiss-Prot: http://www.expasy.org/sprot/
TrEMBL: http://www.ebi.ac.uk/trembl/
PDB: http://www.rcsb.org/pdb/
BLASTP/PSI-BLAST: http://www.ncbi.nlm.nih.gov/BLAST/
HMMer: http://hmmer.wustl.edu/
ProDom: http://protein.toulouse.inra.fr/prodom.html
Pfam-A: http://www.sanger.ac.uk/Software/Pfam/
SEG: http://trex.musc.edu/manuals/unix/seg.html
CYSPRED: http://prion.biocomp.unibo.it/cyspred.html
PROSITE: http://www.expasy.org/prosite/
MView: http://mathbio.nimr.mrc.ac.uk/~nbrown/mview/
ESPript: http://prodes.toulouse.inra.fr/ESPript
A detailed review about the strengths, weaknesses, and pitfalls of the many methods applied by PP is far beyond the scope of this description. We give only a brief overview over trends in the following.
(i) Alignment methods: while the dynamic programming method MaxHom still appears best in aligning pairs of proteins, the iterated PSI-BLAST tends to be more sensitive in unravelling more distantly related proteins and also in correctly aligning them provided the underlying profiles contain enough information. Note, however, that PSI-BLAST tends to over-estimate the relevance of short matches, and that PSI-BLAST expectation values have to be viewed with extreme caution when inferring similarity in function [28, 29, 30] .
(ii) Protein domains and unusual regions: like SMART [31], for instance, ProDom tends to identify regions that are significantly shorter than structural domains [32]; this is not the case for CHOP. However, CHOP misses many domain boundaries since it relies heavily on similarities to domains annotated by others (PrISM, Pfam-A). Note that short regions of low complexity (SEG) are fairly common and not necessarily informative.
(iii) Protein structure (see EVA [33] for an up-to-date evaluation of structure prediction): (a) PROFsec secondary structure prediction: on average, 76% of all residues are correctly predicted (only about 71% by PHDsec). (b) PROFacc accessibility prediction: almost 80% of all residues are correctly predicted as either buried or exposed, and over 80% of the surface residues are correct. (c) PHDhtm membrane helix prediction: about 80% of the membrane helices are correctly predicted; for about 66% of all tested proteins, all membrane helices and the topology were correctly predicted [34]; at the default threshold, membrane helices are incorrectly detected in about 2% of the tested globular proteins [34]; about one fourth of all signal peptides (for secreted proteins) are mistaken for membrane helices [34]. (d) PROFtmb membrane barrel prediction: at high levels of reliability PROFtmb never confuses proteins with and without membrane strands; over 80% of the membrane strand residues are correctly predicted; up- and down-strands are rarely confused. (e) GLOBE: not accurate enough to identify domain boundaries. GLOBE often correctly captures trends such as 'very unlike a globular protein'. However, multi-domain proteins with globular and non-globular domains, such as NORS regions, are misclassified. (f) COILS: perceived to be correct most of the time. (g) CYSPRED: most disulfide-bonding residues are correctly identified; however, most predicted bonds are wrong. (h) ASP: if the protein has a structural switching region, this is usually detected correctly. (i) PROFcon08: most predicted inter-residue contacts are wrong; in fact, even at a coverage of 10%, only 27-40% of the predicted contacts are correct. (j) NORSp predictions of non-regular regions: so far there is no example of a protein with regular structure that we predicted to be irregular.
Note that the PROF and PHD series and CYSPRED are all based on artificial neural networks (except for PROFtmb, which is based on a hidden Markov model).
(iv) Protein function: our signal-motif based predictions reach levels of accuracy from as high as close to 100% (NLS) to as low as 50% (Endoplasmic reticulum and Golgi apparatus). Homology-transfer and keyword based annotations are returned at levels above 70% accuracy. Our system for de novo prediction of sub-cellular localization reaches levels around 60% accuracy (extra-cellular space, cytoplasm, nucleus, mitochondria, other).
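The per-residue accuracies quoted in (iii), such as 76% for PROFsec, are percentages of residues predicted in the correct state. The sketch below shows how such a score is computed for a single protein; the state alphabet (H, E, L) and the example strings are assumptions made for illustration and are not taken from the PP output.

```python
def per_residue_accuracy(observed, predicted):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Toy example with three states: H = helix, E = strand, L = other.
observed = "HHHHEEEELLLL"
predicted = "HHHEEEEELLLH"
q3 = per_residue_accuracy(observed, predicted)  # 10 of 12 residues agree
```

Averages such as those reported above are taken over many proteins, so a single protein may score well above or below the mean.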
CHOP [35] is a hierarchical procedure that chops proteins into structural domain-like fragments through similarity to domains of known structure (taken from PrISM [36] ), or to Pfam-A domain-like fragments [37] (searches through HMMer [38] ), or to full-length natively expressed proteins taken from Swiss-Prot [39] . The major mistakes of CHOP result from incorrect original annotations (in PrISM or Pfam-A). The major shortcoming is that the procedure misses many domains that have no significant level of sequence similarity to known domain-like fragments. CHOP is currently an option, i.e. not run by default.
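The hierarchy described above (PrISM first, then Pfam-A, then Swiss-Prot) can be pictured as an ordered fall-through. The sketch below is a simplified illustration only: the search functions are hypothetical stand-ins that return fixed results, whereas the published CHOP procedure [35] involves real database searches and further processing of the remaining fragments.

```python
def chop_hierarchically(sequence, searchers):
    """Try each domain source in order; return the first source that yields hits.

    `searchers` is an ordered list of (source_name, search_function) pairs.
    Each search function takes a sequence and returns a list of
    (start, end) domain-like fragments, or an empty list if nothing is found.
    """
    for source_name, search in searchers:
        fragments = search(sequence)
        if fragments:
            return source_name, fragments
    # No source produced a significant hit: the query stays unchopped.
    return None, []


# Hypothetical stand-ins for the three levels of the hierarchy.
def search_prism(sequence):
    return []  # pretend: no similarity to a domain of known structure


def search_pfam_a(sequence):
    return [(1, 120), (121, len(sequence))]  # pretend: two Pfam-A-like fragments


def search_swissprot(sequence):
    return [(1, len(sequence))]


query = "M" * 250
source, fragments = chop_hierarchically(
    query,
    [("PrISM", search_prism), ("Pfam-A", search_pfam_a), ("Swiss-Prot", search_swissprot)],
)
```

Ordering the sources from most to least structurally reliable means that a lower-confidence source is consulted only when the better ones give no answer.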
PROFtmb predicts beta-barrel membrane proteins, their topology and the residues in membrane strands (in four states). The method is so accurate in distinguishing proteins with and without beta-membrane barrels that at the default threshold we do not expect any error [14]. Over 80% of the residues are correctly classified in one of the four states up- and down-strand, inner- and outer-loop. PROFtmb is currently not run by default.
PROFcon08 appears to be one of the most accurate existing methods for predicting inter-residue contacts [17]. However, this comes with a caveat: most non-local contacts predicted are not observed, and most observed contacts are not predicted. As a rule of thumb: if we predict 1/10 of the observed contacts, 1/3 of our predictions are right. PROFcon08 is currently not run by default.
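The rule of thumb above combines two different measures: coverage (the fraction of observed contacts that are predicted) and accuracy (the fraction of predicted contacts that are observed). The following sketch computes both for a toy protein; the contact pairs are invented for illustration and chosen so that the numbers match the rule of thumb.

```python
def accuracy_and_coverage(predicted, observed):
    """Return (accuracy, coverage) for two collections of residue-pair contacts."""
    predicted, observed = set(predicted), set(observed)
    correct = predicted & observed
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    coverage = len(correct) / len(observed) if observed else 0.0
    return accuracy, coverage


# Toy data: 30 observed contacts, 9 predicted contacts, 3 of them correct.
observed_contacts = {(i, i + 10) for i in range(30)}
predicted_contacts = {(0, 10), (1, 11), (2, 12)} | {(i, i + 20) for i in range(6)}

accuracy, coverage = accuracy_and_coverage(predicted_contacts, observed_contacts)
# accuracy is 3/9 (one third), coverage is 3/30 (one tenth)
```

Reporting both numbers together matters: predicting more contacts raises coverage but typically lowers accuracy.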
We built a database of proteins involved in cell cycle control (CellCycleDB [27] ). We used this database to estimate problem-specific levels of accuracy and coverage in homology-transfer of experimental information. These estimates allow the controlled, automatic search with proteins against CellCycleDB. This search is currently not run by default.
LOCnet appears to be the most accurate, general method for the de novo prediction of sub-cellular localization with a four-state accuracy around 65% [26] . Performance is best for extra-cellular and worst for mitochondrial proteins. LOCnet is currently not run by default.
CHOPnet is a neural network-based method for the de novo prediction of structural domains in fragments that could not be treated by CHOP [40] . The method correctly predicts about 55% of all known two-domain proteins to have two domains; for about one half of these the domain boundary is correctly placed within 20 residues of the observed boundary. Performance is worse for proteins with more than two domains. However, by pre-digesting the query with CHOP, in many cases the task for CHOPnet will resemble the prediction of single or two-domain proteins (for which the prediction accuracy is reasonably high).
ISIS is a method that specifically predicts residues involved in transient, external protein-protein interactions [41, 42] . The current system is based on neural networks that use information from alignments and other prediction methods. The method returns predictions at different levels of accuracy/coverage: at 5% coverage the accuracy reaches about 60%.
LOCi is a hierarchical system that predicts sub-cellular localization through a variety of sources, namely through homology to proteins of experimentally known localization (LOChom [29, 43]), through Swiss-Prot keyword searches (LOCkey [44]), localization signals (SignalP [45], TargetP [46], PredictNLS [23, 24]), and a combination of de novo prediction methods based on support vector machines and neural networks (Nair & Rost, unpublished). Prediction accuracy exceeds 70%, which makes the method the most comprehensive and most accurate means of predicting sub-cellular localization.
Default output. The output format is self-documenting. The output contains: (i) a list of likely homologues found in the protein database (BIG) and, upon request, the multiple alignments of these sequences (by default in 'HTML' format from MView). Note that we no longer return the entire PSI-BLAST alignments by default, since these are often of considerable size. (ii) If found: a list of the putative PROSITE motifs. (iii) If found: a list of ProDom and/or CHOP domain-like fragment assignments. (iv) If found: a prediction of coiled-coil regions. (v) Information about the expected levels of accuracy of structure predictions. (vi) Prediction of aspects of protein structure. These are grouped in the following way: (a) prediction of secondary structure for all residues, (b) prediction of secondary structure for reliably scored residues only, with an expected three-state accuracy for these residues of > 85%, (c) prediction of solvent accessibility for all residues, (d) prediction of solvent accessibility for reliably scored residues only, with an expected correlation between experimental observation and prediction of 0.69, (e) prediction of transmembrane helices and their topology (if any detected). Note: for the prediction of transmembrane helices and strands a conservative threshold is chosen. Thus, membrane segments may not be detected using the default parameter settings.
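Items (b) and (d) above report predictions only for reliably scored residues. A minimal sketch of that filtering step follows; the 0-9 reliability scale, the cut-off of 5 and the use of '.' for masked residues are assumptions made for this illustration and may differ from the actual PP output.

```python
def reliable_subset(prediction, reliability, min_reliability=5):
    """Mask residues whose reliability index is below the cut-off with '.'."""
    if len(prediction) != len(reliability):
        raise ValueError("prediction and reliability must have equal length")
    return "".join(
        state if index >= min_reliability else "."
        for state, index in zip(prediction, reliability)
    )


prediction = "HHHEELL"                 # predicted state per residue
reliability = [9, 8, 3, 6, 2, 7, 5]    # hypothetical per-residue reliability index
masked = reliable_subset(prediction, reliability)
```

The masked line trades coverage for accuracy: fewer residues carry a prediction, but those that do are expected to be correct more often.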
Advanced input options. By default, users submit a protein through its one-letter residue sequence. However, PP also accepts submissions in FASTA, PIR and Swiss-Prot format or through the Swiss-Prot identifier. Most prediction methods applied use the information from the multiple alignments created by PP; prediction accuracy increases with the quality of the alignment. PP's alignments are fully automated and thus may not be as accurate as alignments that experts have hand-edited. Therefore, users may also submit their favourite alignment directly. PP accepts alignments as FASTA lists, PIR lists, as well as in SAF and MSF format. The fold recognition/prediction-based threading method TOPITS uses predictions of secondary structure and solvent accessibility to search through a library of proteins of known structure. Predictions can be submitted through a simple column-based format.
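Users who hold their sequences in FASTA files may want to check the one-letter sequence before submitting it. The sketch below is a small stand-alone FASTA reader written for illustration; it is not part of PP, and the example record is invented.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">query example protein\nMKTAYIAK\nqrqisfvk\n"
records = read_fasta(example)
```

The reader joins wrapped sequence lines and upper-cases the residues, so the result is a single one-letter string per record.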
Advanced prediction/job options. Not all methods are executed by default; some methods (like the prediction of membrane helices) use particular 'conservative' thresholds when included automatically and different thresholds when requested explicitly. In particular, the following methods can be toggled (switched on or off): MaxHom, BLASTP, PSI-BLAST, SEG, PHDsec, PHDacc, PHDhtm, PROFsec, PROFacc, COILS, CYSPRED, ASP, PROSITE, ProDom, CHOP, NORSp, PROFtmb, PROFcon08, LOCkey, LOChom, PredictNLS, LOCnet. Users can also explicitly request TOPITS+ or can evaluate the prediction accuracy of a secondary structure prediction method (EvalSec). Note that switching off methods has two advantages: it speeds up the execution and it reduces the size of the output. However, bear in mind that the database searches and their results are the limiting factor for speed and bytes produced.
Advanced output options. The default output is now an HTML-formatted file, i.e. ready to display in any browser. Users can change this default to output in raw text in the following alignment formats: BLAST, no alignment, HSSP, HSSP profiles only, MSF, SAF, FASTA list. The results from the predictions are also available in a variety of machine-readable formats. (Developers: please do not write parsers for the human-readable PP output; if in doubt, contact us, we can write almost any reasonable format if need be!) Due to the size of multiple alignments, we no longer email the results by default; instead, the output is stored for a week on our web site (remember to download it in that period). Results can also be requested by email.
Interactive versus batch jobs. By default, the user submits requests to a batch queue and will be notified by email where to find the results (or will be sent these results). PP also has an interactive mode that writes the results directly into the requesting web browser, but this option restricts the length of time for which the web connection is kept open: if PP has not completed a request within five minutes, we automatically switch the job to batch mode and notify users by email. In practice, this implies that interactive jobs will only finish in time if (i) the PP queue is empty (it works on a first-come-first-served principle) and (ii) the request does not require more than five minutes of CPU (typically the case if an alignment is submitted, and/or the query protein is short, and/or has few homologues in today's databases). We have just upgraded the CPU resources for PP (now running on a LINUX farm); this has increased the probability of successful interactive queries.
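The interactive-to-batch fallback can be sketched as a timed wait on a running job. This is an illustration of the general pattern only, not the PP implementation; the function names are hypothetical, and the example uses sub-second limits so that it runs quickly, whereas the text states a five-minute limit.

```python
import concurrent.futures
import time

INTERACTIVE_LIMIT_SECONDS = 5 * 60  # five-minute limit stated in the text


def run_interactive_or_batch(job, limit_seconds=INTERACTIVE_LIMIT_SECONDS):
    """Wait for `job` up to the limit; otherwise hand back a pending future."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(job)
    try:
        result = future.result(timeout=limit_seconds)
        return "interactive", result
    except concurrent.futures.TimeoutError:
        # The job keeps running in the background; a real server would now
        # notify the user by email once the result is available.
        return "batch", future
    finally:
        # Do not block here: running work is allowed to finish on its own.
        executor.shutdown(wait=False)


def slow_job():
    time.sleep(0.2)
    return "late result"


mode_fast, value_fast = run_interactive_or_batch(lambda: "quick result", limit_seconds=1.0)
mode_slow, pending = run_interactive_or_batch(slow_job, limit_seconds=0.01)
```

In the second call the caller receives the label "batch" together with a future, from which the result can be collected later.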
Job queuing system. In order to maximise processing usage, requests to PP are queued and maintained by a mechanism that balances the work load by monitoring the status of the 10 CPUs currently dedicated to the server in normal operation. Users can query job and overall workload statuses through the web interface.
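The text does not describe how the balancing mechanism works internally. As one plausible illustration, the sketch below uses a greedy least-loaded assignment: jobs are taken in arrival order and each goes to the CPU with the smallest accumulated load. The job costs are hypothetical estimates, and this is a swapped-in technique rather than the actual PP scheduler.

```python
import heapq


def assign_jobs(job_costs, cpu_count=10):
    """Assign jobs in arrival order to the currently least-loaded CPU.

    `job_costs` is a list of (job_id, estimated_cost) pairs.
    Returns a dict mapping CPU index to the list of job ids it receives.
    """
    # Heap entries are (accumulated load, cpu index); ties go to the lower index.
    heap = [(0.0, cpu) for cpu in range(cpu_count)]
    heapq.heapify(heap)
    assignment = {cpu: [] for cpu in range(cpu_count)}
    for job_id, cost in job_costs:
        load, cpu = heapq.heappop(heap)
        assignment[cpu].append(job_id)
        heapq.heappush(heap, (load + cost, cpu))
    return assignment


# Toy example with two CPUs: one long job and three short ones.
jobs = [("a", 5), ("b", 1), ("c", 1), ("d", 1)]
schedule = assign_jobs(jobs, cpu_count=2)
```

Here the long job occupies the first CPU while the three short jobs are stacked on the second, which keeps the total load per CPU as even as this greedy rule allows.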
Portable versions. Most in-house programs are - or will be - available under general GNU licenses (free for academia). Porting the entire PP system is a more complicated enterprise. We are currently optimising the system to increase its portability. It is now available for local LINUX and IRIX installations. Furthermore, to make the system less bound to local OS and hardware constraints, future plans include decoupling some of the core services from the rest of the system and handling communication using innovative technologies such as XML-RPC or SOAP.
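To give a feel for what an XML-RPC based exchange might look like, the sketch below encodes and decodes a request with Python's standard `xmlrpc.client` module. The method name `predictprotein.submit` and the request fields are purely hypothetical; no such PP interface is described in this article, and no network call is made.

```python
import xmlrpc.client

# Hypothetical request: field names and method name are invented for illustration.
request = {
    "sequence": "MKTAYIAKQRQISFVK",
    "methods": ["PROFsec", "PROFacc"],
}

# Serialise the call into the XML document that would travel over HTTP.
payload = xmlrpc.client.dumps((request,), methodname="predictprotein.submit")

# A server would decode the same document back into parameters and method name.
decoded_params, decoded_method = xmlrpc.client.loads(payload)
```

Because the payload is plain XML over HTTP, client and server can run on different operating systems and be written in different languages, which is the decoupling the paragraph above has in mind.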
Acknowledgements
Making PredictProtein survive a decade was a major effort; many colleagues helped with hands and brains; thanks to all of them! For the first years at EMBL thanks to Antoine de Daruvar (Bordeaux Univ.), Reinhard Schneider (EMBL, Heidelberg), Sean O'Donoghue (LION Biosciences, Heidelberg), and Chris Sander (Sloan Kettering, New York). Thanks to Rolf Apweiler for his continued support at the European Bioinformatics Institute (EBI-EMBL, Hinxton, England), and to Volker Eyrich (Schrödinger, New York) for software support during the move to the USA. Further thanks to all who set up mirror pages and who consented to our using their software, in particular to Nigel Brown for MView, to Emmanuel Courcelle and Patrice Gouet (IPBS, Toulouse) for ESPript, to Florencio Pazos (London) for Threadlize, to Andrei Lupas (Max Planck, Tübingen) for COILS, to Piero Fariselli and Rita Casadio (Bologna Univ.) for CYSPRED, to Reinhard Schneider (EMBL, Heidelberg) for MaxHom, to Malin Young (Sandia Labs, Albuquerque) for ASP, to Rajesh Nair (Columbia Univ.) for his methods predicting sub-cellular localization, and to Dariusz Przybylski (Columbia) for his invaluable scripts optimising automatic PSI-BLAST searches. PredictProtein has attracted its first public support from the grant R01 LM07329-01 from the National Library of Medicine. Last but not least, thanks to Amos Bairoch (SIB, Geneva), Rolf Apweiler (EBI, Hinxton), Cathy Wu (PIR/PSD), Phil Bourne (San Diego Univ.), and their crews for maintaining excellent databases and to all experimentalists who enable computational biology!
| 1. | Rost, B. (1999). Twilight zone of protein sequence alignments. Prot. Engin., 12, 85-94. |
| 2. | Altschul, S. F. & Gish, W. (1996). Local alignment statistics. Meth. Enzymol., 266, 460-480. |
| 3. | Altschul, S., Madden, T., Shaffer, A., Zhang, J., Zhang, Z. et al. (1997). Gapped Blast and PSI-Blast: a new generation of protein database search programs. Nucl. Acids Res., 25, 3389-3402. |
| 4. | Sander, C. & Schneider, R. (1991). Database of homology-derived structures and the structural meaning of sequence alignment. Proteins, 9, 56-68. |
| 5. | Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202. |
| 6. | Przybylski, D. & Rost, B. (2002). Alignments grow, secondary structure prediction improves. Proteins, 46, 195-205. |
| 7. | Corpet, F., Gouzy, J. & Kahn, D. (1999). Recent improvements of the ProDom database of protein domain families. Nucl. Acids Res., 27, 263-7. |
| 8. | Hofmann, K., Bucher, P., Falquet, L. & Bairoch, A. (1999). The PROSITE database, its status in 1999. Nucl. Acids Res., 27, 215-219. |
| 9. | Rost, B. (1995). TOPITS: Threading One-dimensional Predictions Into Three-dimensional Structures. In Third International Conference on Intelligent Systems for Molecular Biology (Rawlings, C., Clark, D., Altman, R., Hunter, L., Lengauer, T. et al., eds.), pp. 314-321, Menlo Park, CA: AAAI Press, Cambridge, England. |
| 10. | Przybylski, D. & Rost, B. (2004). Improving fold recognition without folds. J. Mol. Biol., 2004-02-03. |
| 11. | Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile based neural networks. Meth. Enzymol., 266, 525-539. |
| 12. | Rost, B. (2001). Protein secondary structure prediction continues to rise. J. Struct. Biol., 134, 204-218. |
| 13. | Rost, B. (2004). How to use protein 1D structure predicted by PROFphd. Meth. Mol. Biol., submitted. |
| 14. | Bigelow, H., Petrey, D., Liu, J., Przybylski, D. & Rost, B. (2004). Prediction of transmembrane beta-barrels for entire proteomes. Nucl. Acids Res., submitted 2004-01-21. |
| 15. | Lupas, A., Van Dyke, M. & Stock, J. (1991). Predicting coiled coils from protein sequences. Science, 252, 1162-1164. |
| 16. | Fariselli, P., Riccobelli, P. & Casadio, R. (1999). Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins. Proteins, 36, 340-346. |
| 17. | Punta, M. & Rost, B. (2004). Toward good 2D predictions in proteins. Bioinformatics, 2004-01-14. |
| 18. | Kirshenbaum, K., Young, M. & Highsmith, S. (1999). Predicting allosteric switches in myosins. Prot. Sci., 8, 1806-1815. |
| 19. | Young, M., Kirshenbaum, K., Dill, K. A. & Highsmith, S. (1999). Predicting conformational switches in proteins. Prot. Sci., 8, 1752-1764. |
| 20. | Wootton, J. C. & Federhen, S. (1996). Analysis of compositionally biased regions in sequence databases. Meth. Enzymol., 266, 554-571. |
| 21. | Liu, J., Tan, H. & Rost, B. (2002). Loopy proteins appear conserved in evolution. J. Mol. Biol., 322, 53-64. |
| 22. | Liu, J. & Rost, B. (2003). NORSp: predictions of long regions without regular secondary structure. Nucl. Acids Res., 31, 3833-3835. |
| 23. | Cokol, M., Nair, R. & Rost, B. (2000). Finding nuclear localisation signals. EMBO Rep., 1, 411-415. |
| 24. | Nair, R., Carter, P. & Rost, B. (2003). NLSdb: database of nuclear localization signals. Nucl. Acids Res., 31, 397-399. |
| 25. | Wrzeszczynski, K. O. & Rost, B. (2004). Annotating proteins from Endoplasmic reticulum and Golgi apparatus in eukaryotic proteomes. CMLS, submitted. |
| 26. | Nair, R. & Rost, B. (2003). Better prediction of sub-cellular localization by combining evolutionary and structural information. Proteins, 53, 917-930. |
| 27. | Wrzeszczynski, K. O. & Rost, B. (2003). Cataloguing proteins in cell cycle control. In Cell cycle checkpoint control protocols (Lieberman, H., eds.), pp. 219-233, Humana Press, Totowa, NJ. |
| 28. | Devos, D. & Valencia, A. (2001). Intrinsic errors in genome annotation. Trends Genet., 17, 429-431. |
| 29. | Nair, R. & Rost, B. (2002). Sequence conserved for sub-cellular localization. Prot. Sci., 11, 2836-2847. |
| 30. | Rost, B. (2002). Enzyme function less conserved than anticipated. J. Mol. Biol., 318, 595-608. |
| 31. | Ponting, C. P., Schultz, J., Milpetz, F. & Bork, P. (1999). SMART: identification and annotation of domains from signalling and extracellular protein sequences. Nucl. Acids Res., 27, 229-232. |
| 32. | Liu, J. & Rost, B. (2003). Domains, motifs, and clusters in the protein universe. Curr. Opin. Chem. Biol., 7, 5-11. |
| 33. | Koh, I. Y. Y., Eyrich, V. A., Marti-Renom, M. A., Przybylski, D., Madhusudhan, M. S. et al. (2003). EVA: evaluation of protein structure prediction servers. Nucl. Acids Res., 31, 3311-3315. |
| 34. | Chen, C. P., Kernytsky, A. & Rost, B. (2002). Transmembrane helix predictions revisited. Prot. Sci., 11, 2774-2791. |
| 35. | Liu, J. & Rost, B. (2004). CHOP proteins into structural domains. Proteins, in press. |
| 36. | Yang, A. S. & Honig, B. (2000). An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments. J. Mol. Biol., 301, 691-711. |
| 37. | Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V. et al. (2004). The Pfam protein families database. Nucl. Acids Res., 32, D138-41. |
| 38. | Eddy, S. R. (1998). Profile hidden Markov models. Bioinformatics, 14, 755-763. |
| 39. | Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M. C., Estreicher, A. et al. (2003). The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucl. Acids Res., 31, 365-370. |
| 40. | Liu, J. & Rost, B. (2004). Sequence-based prediction of protein domains. Nucl. Acids Res., 2004-01-20. |
| 41. | Ofran, Y. & Rost, B. (2003). Analysing six types of protein-protein interfaces. J. Mol. Biol., 325, 377-387. |
| 42. | Ofran, Y. & Rost, B. (2003). Predict protein-protein interaction sites from local sequence information. FEBS Lett., 544, 236-239. |
| 43. | Nair, R. & Rost, B. (2003). LOC3D: annotate sub-cellular localization for protein structures. Nucl. Acids Res., 31, 3337-3340. |
| 44. | Nair, R. & Rost, B. (2002). Inferring sub-cellular localisation through automated lexical analysis. Bioinformatics, 18, S78-S86. |
| 45. | Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. (1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Prot. Engin., 10, 1-6. |
| 46. | Emanuelsson, O., Nielsen, H., Brunak, S. & von Heijne, G. (2000). Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol., 300, 1005-1016. |
| 47. | Bairoch, A. & Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucl. Acids Res., 28, 45-48. |
| 48. | Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N. et al. (2000). The Protein Data Bank. Nucl. Acids Res., 28, 235-242. |
| 49. | Rost, B. (1995). Fitting 1-D predictions into 3-D structures. In Protein folds: a distance based approach (Bohr, H. & Brunak, S., eds.), pp. 132-151, CRC Press, Boca Raton, Florida. |
| 50. | Rost, B., Schneider, R. & Sander, C. (1997). Protein fold recognition by prediction-based threading. J. Mol. Biol., 270, 471-480. |
| 51. | Corpet, F., Servant, F., Gouzy, J. & Kahn, D. (2000). ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucl. Acids Res., 28, 267-269. |
| 52. | Rost, B. & Sander, C. (1993). Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599. |
| 53. | Rost, B. & Sander, C. (1994). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19, 55-72. |
| 54. | Rost, B. & Sander, C. (1994). Conservation and prediction of solvent accessibility in protein families. Proteins, 20, 216-226. |
| 55. | Rost, B., Casadio, R., Fariselli, P. & Sander, C. (1995). Prediction of helical transmembrane segments at 95% accuracy. Prot. Sci., 4, 521-533. |
| 56. | Rost, B., Casadio, R. & Fariselli, P. (1996). Topology prediction for helical transmembrane proteins at 86% accuracy. Prot. Sci., 5, 1704-1718. |
| 57. | Lupas, A. (1996). Prediction and analysis of coiled-coil structures. Meth. Enzymol., 266, 513-525. |
| 58. | Brown, N., Leroy, C. & Sander, C. (1998). MView: A Web compatible database search or multiple alignment viewer. Bioinformatics, 14, 380-381. |
| 59. | Gouet, P., Courcelle, E., Stuart, D. I. & Metoz, F. (1999). ESPript: multiple sequence alignments in PostScript. Bioinformatics, 15, 305-308. |
| Contact: rost@columbia.edu | Version: Mar 15, 2004 |