


PP: details for methods used

Contents




List of all available prediction types

Prediction server: PredictProtein
Databases searched for homologues: SWISS-PROT, TrEMBL, PDB, BIG (SWISS+TrEMBL+PDB)
Alignment and database searching methods: MaxHom, BLASTP, PSIblast
Sequence motif searching methods: ProSite, ProDom, SEG, PredictNLS
Prediction of protein structure: PHD, PHDsec, PHDacc, PHDhtm, PROF, PROFsec, PROFacc, GLOBE, TOPITS, AGAPE, COILS, DISULFIND, ASP, PROFcon
Tools used for PP: MView
Tools available with PP output: ESPript
Tools available with PP output ESPript  



Categories of prediction methods used


Prediction server

 
Server PredictProtein
Site (URL) http://www.predictprotein.org
About PredictProtein is the umbrella name for all prediction programs run.
Quote
  1. B Rost: PHD: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology, 266, 525-539, 1996
Authors Burkhard Rost and Jinfeng Liu (CUBIC, Columbia Univ, New York)
Contact Jinfeng Liu (liu@cubic.bioc.columbia.edu)
Version 2000_06




Databases searched for homologues

 
Server SWISS-PROT
Site (URL) http://expasy.cbr.nrc.ca/sprot/
About SWISS-PROT is an annotated protein sequence database established in 1986 and maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library (now the EMBL Outstation, the European Bioinformatics Institute (EBI)). The SWISS-PROT protein sequence data bank consists of sequence entries.

Quote
  1. A Bairoch, and R Apweiler: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48, 2000
Authors Amos Bairoch (ExPASy, Geneva, Switzerland) and Rolf Apweiler (EBI, Hinxton, England)
Contact Amos Bairoch (Amos.Bairoch@isb-sib.ch)
Version 39 (05/2000), updated weekly
 
Server TrEMBL
Site (URL) http://www.ebi.ac.uk/swissprot/
About TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.
Quote
  1. A Bairoch, and R Apweiler: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48, 2000
Authors Rolf Apweiler (EBI, Hinxton, England)
Contact Rolf Apweiler (Rolf.Apweiler@ebi.ac.uk)
Version 05/2000, updated weekly
 
Server PDB
Site (URL) http://www.rcsb.org/pdb/
About PDB contains proteins of experimentally known three-dimensional structure.
Quote
  1. H M Berman, J Westbrook, Z Feng, G Gilliland, T N Bhat, H Weissig, I N Shindyalov, P E Bourne: The Protein Data Bank. Nucleic Acids Research, 28, 235-242, 2000
Authors RCSB consortium
Contact Phil Bourne (bourne@sdsc.edu)
Version updated weekly
 
Server BIG (SWISS+TrEMBL+PDB)
Site (URL) local at CUBIC
About BIG is our (CUBIC) in-house version merging SWISS-PROT, TrEMBL and PDB.
Quote
  1. see SWISS-PROT, TrEMBL, PDB
Authors CUBIC group, Columbia University, New York
Contact Dariusz Przybylski (dudek@cubic.bioc.columbia.edu)
Version updated weekly




Alignment and database searching methods

 
Server MaxHom
Site (URL) local at CUBIC
About MaxHom is a dynamic multiple sequence alignment program which finds similar sequences in a database.

MaxHom builds up a protein family (defined as all closely related proteins likely to have similar structures) in two steps:

  1. In sweep 1, sequences are aligned consecutively to the search sequence by a standard dynamic programming method. After each sequence has been added, a profile is compiled and used to align the next sequence.
  2. In sweep 2, after all sequences with significant homology have been picked from the BLASTP output, the profile is recompiled, and the dynamic programming algorithm starts once again to align consecutively the sequences, this time using the conservation profile as derived after completion of sweep 1.
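The two sweeps can be sketched in Python. This is a minimal illustration under stated assumptions, not MaxHom's code: the match, mismatch and gap values are hypothetical, each profile column is a simple count of the residues aligned to one position of the search sequence, and residues inserted relative to the search sequence are dropped, so the profile always keeps the length of the search sequence.

```python
from collections import Counter

MATCH, MISMATCH, GAP = 1.0, -1.0, -2.0  # hypothetical scoring parameters


def column_score(column: Counter, residue: str) -> float:
    """Average match/mismatch score of `residue` against all residues in a profile column."""
    total = sum(column.values())
    return sum(n * (MATCH if aa == residue else MISMATCH) for aa, n in column.items()) / total


def align_to_profile(profile: list[Counter], seq: str) -> list[tuple[int, int]]:
    """Global (Needleman-Wunsch) alignment of seq to a profile.

    Returns the aligned (profile column, sequence position) index pairs.
    """
    n, m = len(profile), len(seq)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(profile[i - 1], seq[j - 1]),
                score[i - 1][j] + GAP,  # profile column left unmatched
                score[i][j - 1] + GAP,  # residue inserted relative to the profile
            )
    # Trace back from the bottom-right cell, preferring the diagonal move.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + column_score(profile[i - 1], seq[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def sweep(query: str, homologues: list[str], fixed_profile=None) -> list[Counter]:
    """One sweep: align each homologue in turn and add it to a growing profile.

    Sweep 1 (fixed_profile=None) aligns against the growing profile itself;
    sweep 2 passes the profile from sweep 1 and aligns every sequence against it.
    """
    growing = [Counter({aa: 1}) for aa in query]
    for seq in homologues:
        reference = growing if fixed_profile is None else fixed_profile
        for col, pos in align_to_profile(reference, seq):
            growing[col][seq[pos]] += 1
    return growing
```

Usage: `p1 = sweep(query, hits)` followed by `p2 = sweep(query, hits, p1)` mirrors the order of the two sweeps described above, where `hits` stands for the sequences picked from the BLASTP output.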
Quote
  1. C Sander, and R Schneider: Database of Homology-Derived Structures and the Structural Meaning of Sequence Alignment. Proteins, 9, 56-68, 1991
Authors Reinhard Schneider (LION, Boston) and Chris Sander (Millennium, Boston)
Contact Burkhard Rost (rost@columbia.edu)
Version 1.2000.06
 
Server BLASTP
Site (URL) http://www.ncbi.nlm.nih.gov/BLAST/
About BLASTP is a fast database search program.

Quote
  1. S Karlin, and S F Altschul: Applications and statistics for multiple high-scoring segments in molecular sequences. PNAS, 90, 5873-5877, 1993
Authors S Karlin and SF Altschul (NCBI, Washington)
Contact BLASTP admin (blast-help@ncbi.nlm.nih.gov)
Version 1.4
 
Server PSIblast
Site (URL) http://www.ncbi.nlm.nih.gov/BLAST/
About PSIblast is a fast, yet sensitive database search program.

We are running the iterated PSIblast on a subset of the BIG database with SWISS-PROT + TrEMBL + PDB sequences. The number of iterations, the cut-off thresholds and the particular details of which sequences from BIG are used have been optimised in our group.

Quote
  1. S F Altschul, T L Madden, A A Schäffer, J Zhang, Z Zhang, W Miller, and D J Lipman: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25, 3389-3402, 1997
Authors S F Altschul, T L Madden, A A Schäffer, J Zhang, Z Zhang, W Miller, and D J Lipman
Contact BLASTP admin (blast-help@ncbi.nlm.nih.gov)
Version 2000_06




Sequence motif searching methods

 
Server ProSite
Site (URL) http://www.expasy.ch/prosite/
About PROSITE is a database of functional motifs. ScanProsite finds all functional motifs in your sequence that are annotated in the ProSite database.

The following description is from the original ProSite site:

ProSite is a method of determining what is the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites, patterns and profiles that help to reliably identify to which known family of protein (if any) a new sequence belongs.
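A ProSite pattern, such as the N-glycosylation site pattern N-{P}-[ST]-{P}, can be matched with ordinary regular expressions. The sketch below translates a common subset of the pattern syntax (x for any residue, [..] for allowed residues, {..} for forbidden residues, repeat counts such as x(2,4), and the < and > terminal anchors outside brackets) into Python regular expressions. It is an illustration under those assumptions, not ScanProsite itself, and it ignores ProSite profiles and rarer syntax.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern (common subset only) into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."
        elif core.startswith("{"):  # forbidden residues
            core = "[^" + core[1:-1] + "]"
        # Allowed-residue sets [..] and single residues are already valid regex syntax.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
```

For example, `scan("N-{P}-[ST]-{P}.", "AANGSAA")` reports a single site, NGSA, starting at residue 3. The lookahead wrapper is used so that overlapping sites are all reported.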

Quote
  1. K Hofmann, P Bucher, L Falquet, and A Bairoch: The PROSITE database, its status in 1999. Nucleic Acids Res, 27, 215-219, 1999
Authors Kay Hofmann, Philip Bucher, and Amos Bairoch (SIB, Geneva, Switzerland)
Contact Christian Sigrist (Christian.Sigrist@isb-sib.ch)
Version 1999_07
 
Server ProDom
Site (URL) http://protein.toulouse.inra.fr/prodom.html
About ProDom is a database of putative protein domains. The database is searched with BLAST for domains corresponding to your protein.

The following description is from the original ProDom site (which supplies a rather useful graphical interface to the ProDom database):

The ProDom protein domain database consists of an automatic compilation of homologous domains detected in the SWISS-PROT database by the DOMAINER algorithm (ELL Sonnhammer & D Kahn, Prot. Sci., 1994, 3, 482-492). It has been devised to assist with the analysis of the domain arrangement of proteins.

ProDom 'domains' are inferred on the basis of conserved subsequences as found in various proteins. Such a conservation corresponds frequently, though not always, to genuine structural domains: therefore domain boundaries should be treated with caution. For some domain families experts have been asked to correct domain boundaries on the basis of both sequence and structural information. This expertise will complement the automated process and improve the quality of ProDom domain families.

Quote
  1. F Corpet, F Servant, J Gouzy, and D Kahn: ProDom and ProDom-CG: tools for protein domain analysis and whole genome comparisons. Nucleic Acids Res, 28, 267-269, 2000
Authors Florence Corpet, Florence Servant, Jerome Gouzy, and Daniel Kahn
Contact Jerome Gouzy (Jerome.Gouzy@toulouse.inra.fr)
Version 2000.1
 
Server SEG
Site (URL) http://trex.musc.edu/manuals/unix/seg.html
About SEG divides sequences into regions of low and high complexity. Low-complexity regions typically correspond to 'simple sequences' or 'compositionally biased' regions.

The following description is from the original SEG documentation (JC Wootton & S Federhen, 1996, Meth Enzymology, 266, 554-571):

SEG divides sequences into contrasting segments of low-complexity and high-complexity. Low-complexity segments defined by the algorithm represent "simple sequences" or "compositionally-biased regions".

Locally-optimized low-complexity segments are produced at defined levels of stringency, based on formal definitions of local compositional complexity. The segment lengths and the number of segments per sequence are determined automatically by the algorithm.
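The notion of local compositional complexity can be illustrated with the Shannon entropy of the residue composition in a sliding window. This is a simplified sketch, not SEG: SEG's locally optimized segmentation is more elaborate, whereas here the window length (12) and entropy cutoff (2.2 bits) are illustrative assumptions, and every residue covered by a low-entropy window is simply masked.

```python
import math
from collections import Counter


def window_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of one window."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_mask(seq: str, window: int = 12, cutoff: float = 2.2) -> list[bool]:
    """Mark every residue covered by at least one window whose entropy is below cutoff.

    window and cutoff are illustrative values, not taken from SEG.
    Sequences shorter than the window are left unmasked.
    """
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < cutoff:
            for i in range(start, start + window):
                mask[i] = True
    return mask
```

A homopolymer run such as a poly-Q stretch has zero entropy and is masked. A window of twelve different residues has the maximal entropy, log2(12), about 3.58 bits, and is kept.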

Quote
  1. J C Wootton, and S Federhen: Analysis of compositionally biased regions in sequence databases. Methods in Enzymology, 266, 554-571, 1996
Authors John C Wootton and Scott Federhen (NCBI, Washington)
Contact Scott Federhen (federhen@ncbi.nlm.nih.gov)
Version 1994
 
Server PredictNLS
Site (URL) http://cubic.bioc.columbia.edu/predictNLS
About PredictNLS finds experimentally known nuclear localisation signals (NLSs) in your protein.

Quote
  1. M Cokol, R Nair, and B Rost: in preparation, 2000
Authors Raj Nair, Murad Cokol, and Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Raj Nair (nair@cubic.bioc.columbia.edu)
Version 2000_07




Prediction of protein structure

 
Server PHD
Site (URL) http://www.predictprotein.org
About PHD is a suite of programs predicting 1D structure (secondary structure, solvent accessibility) from multiple sequence alignments.

see PHDsec PHDacc PHDhtm

Quote
  1. B Rost: PHD: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology, 266, 525-539, 1996
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 1996.1
 
Server PHDsec
Site (URL) http://www.predictprotein.org
About PHDsec predicts secondary structure from multiple sequence alignments.

Secondary structure is predicted by a system of neural networks rating at an expected average accuracy > 72% for the three states helix, strand and loop (Rost & Sander, PNAS, 1993, 90, 7558-7562; Rost & Sander, JMB, 1993, 232, 584-599; and Rost & Sander, Proteins, 1994, 19, 55-72). Evaluated on the same data set, PHDsec reaches a three-state accuracy ten percentage points higher than methods using only single-sequence information, and more than six percentage points higher than, e.g., a method using alignment information based on statistics (Levin, Pascarella, Argos & Garnier, Prot. Engng., 6, 849-54, 1993).
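The three-state accuracy quoted above is the percentage of residues whose state (helix, strand or loop) is predicted correctly, often called Q3. A minimal sketch, assuming observed and predicted states are written as the letters H, E and L in two strings of equal length:

```python
def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose three-state class (H, E or L) is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)
```

For example, `q3("HHHEEELLL", "HHHEELLLL")` counts 8 of 9 residues as correct, about 88.9%.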

PHDsec predictions have three main features:

  1. improved accuracy through evolutionary information from multiple sequence alignments
  2. improved beta-strand prediction through a balanced training procedure
  3. more accurate prediction of secondary structure segments by using a multi-level system
Quote
  1. B Rost: PHD: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology, 266, 525-539, 1996.
  2. B Rost, and C Sander: Prediction of protein secondary structure at better than 70% accuracy. J Molecular Biol, 232, 584-599, 1993.
  3. B Rost, and C Sander: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19, 55-72, 1994
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 1996.1
 
Server PHDacc
Site (URL) http://www.predictprotein.org
About PHDacc predicts per residue solvent accessibility from multiple sequence alignments.

Solvent accessibility is predicted by a neural network method rating at a correlation coefficient (correlation between experimentally observed and predicted relative solvent accessibility) of 0.54 cross-validated on a set of 238 globular proteins (Rost & Sander, Proteins, 1994, 20, 216-226). The output of the neural network codes for 10 states of relative accessibility. Expressed in units of the difference between prediction by homology modelling (best method) and prediction at random (worst method), PHDacc is some 26 percentage points superior to a comparable neural network using three output states (buried, intermediate, exposed) and using no information from multiple alignments.
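The correlation coefficient quoted above compares observed and predicted relative solvent accessibility residue by residue. A minimal sketch of the Pearson correlation, assuming both are given as equally long lists of per-residue values (for example, percentages of relative accessibility):

```python
import math


def pearson(observed: list[float], predicted: list[float]) -> float:
    """Pearson correlation between observed and predicted per-residue values.

    Undefined (division by zero) if either list is constant.
    """
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (sd_o * sd_p)
```

A value of 1 means the predicted values rise and fall exactly with the observed ones, and 0 means no linear relation.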

Quote
  1. B Rost: PHD: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology, 266, 525-539, 1996.
  2. B Rost, and C Sander: Conservation and prediction of solvent accessibility in protein families. Proteins, 20, 216-226, 1994
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 1996.1
 
Server PHDhtm
Site (URL) http://www.predictprotein.org
About PHDhtm predicts the location and topology of transmembrane helices from multiple sequence alignments.

Transmembrane helices in integral membrane proteins are predicted by a system of neural networks. A shortcoming of the network system is that it often predicts helices that are too long; these are cut by an empirical filter. The final prediction (Rost et al., Protein Science, 1995, 4, 521-533) has an expected per-residue accuracy of about 95%. The number of false positives, i.e., transmembrane helices predicted in globular proteins, is about 2% (Rost et al. 1996).

The neural network prediction of transmembrane helices (PHDhtm) is refined by a dynamic programming-like algorithm. This method resulted in correct predictions of all transmembrane helices for 89% of the 131 proteins used in a cross-validation test; more than 98% of the transmembrane helices were correctly predicted. The output of this method is used to predict topology, i.e., the orientation of the N-terminus with respect to the membrane. The expected accuracy of the topology prediction is > 86%. Prediction accuracy is higher than average for eukaryotic proteins and lower than average for prokaryotes. PHDtopology was more accurate than all other methods tested on identical data sets in 1996 (Rost, Casadio & Fariselli, 1996a and 1996b).
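Once the helices are located, the orientation of the N-terminus fixes the side of every loop, because each membrane-spanning helix crosses the membrane once. The sketch below assumes a per-residue prediction string with 'H' for membrane helix and any other letter for non-membrane residues. It is not PHDhtm's refinement or topology algorithm, only the bookkeeping that follows from their output.

```python
import re


def tm_segments(prediction: str) -> list[tuple[int, int]]:
    """Half-open (start, end) index pairs of runs of 'H' (predicted membrane helices)."""
    return [(m.start(), m.end()) for m in re.finditer(r"H+", prediction)]


def loop_sides(n_helices: int, n_term: str) -> list[str]:
    """Side ('in' or 'out') of each non-membrane loop, N-terminal loop first.

    Each helix crosses the membrane once, so consecutive loops alternate sides.
    """
    other = {"in": "out", "out": "in"}
    sides = [n_term]
    for _ in range(n_helices):
        sides.append(other[sides[-1]])
    return sides
```

For example, `loop_sides(len(tm_segments("LLLHHHHLLLHHHHLL")), "in")` yields `["in", "out", "in"]`. With an even number of helices, both termini end up on the same side.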

Quote
  1. B Rost: PHD: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology, 266, 525-539, 1996.
  2. B Rost, P Fariselli, and R Casadio: Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Science, 5, 1704-1718, 1996
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 1996.1
 
Server PROF
Site (URL) http://www.predictprotein.org
About Improved version of PHD: Profile-based neural network prediction of protein structure.

Quote
  1. B Rost: PROF: predicting one-dimensional protein structure by profile based neural networks. unpublished, 2000
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 2000_06
 
Server PROFsec
Site (URL) http://www.predictprotein.org
About Improved version of PHDsec: Profile-based neural network prediction of protein secondary structure.

Quote
  1. B Rost: PROF: predicting one-dimensional protein structure by profile based neural networks. unpublished, 2000
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 2000.06
 
Server PROFacc
Site (URL) http://www.predictprotein.org
About Improved version of PHDacc: Profile-based neural network prediction of residue solvent accessibility.

Quote
  1. B Rost: PROF: predicting one-dimensional protein structure by profile based neural networks. unpublished, 2000
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 2000.06
 
Server GLOBE
Site (URL) http://www.predictprotein.org
About GLOBE predicts the globularity of a protein.

An additional result from the prediction of solvent accessibility is a prediction of protein globularity. This method has not been published yet. For more information, have a look at the preliminary preprint.

Quote
  1. B Rost: Short yeast ORFs: expressed protein or not? unpublished, 2000
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 1996.1
 
Server TOPITS
Site (URL) http://www.predictprotein.org
About TOPITS is a prediction-based threading program that finds remote structural homologues in the DSSP database.

Remote homologues (0-25% sequence identity) are detected by a novel prediction-based threading method (Rost 1995a and 1995b). The main idea is to detect similar motifs of secondary structure and accessibility between a sequence of unknown structure and a known fold. For the recognition of similarities between entire folds, the expected accuracy (first hit of alignment list correct) is about 60% (Rost, ISMB95 Proceedings, 1995, AAAI Press, 314-321). If the goal is to correctly detect even short homologous fragments, still about 30% of the first hits are correct (compared to an accuracy of 14% for simple sequence alignments; see the full paper). Hits with z-scores above 3.0 are more reliable (accuracy > 60%). (Note: a number of similar or better threading services based on similar principles are available through META: http://www.predictprotein.org/submit_meta.html.)
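A z-score expresses how far a hit's score lies above the mean of all hit scores, in units of their standard deviation. A sketch, assuming the scores of all library hits are available and that the population standard deviation is used (TOPITS's exact normalization is not described here). The 3.0 threshold is the one quoted above.

```python
from statistics import mean, pstdev


def z_scores(scores: list[float]) -> list[float]:
    """Standardize scores: (score - mean) / population standard deviation."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:  # all scores equal: no hit stands out
        return [0.0] * len(scores)
    return [(s - mu) / sd for s in scores]


def reliable_hits(hits: dict[str, float], threshold: float = 3.0) -> list[str]:
    """Names of hits whose z-score exceeds the threshold."""
    names = list(hits)
    zs = z_scores([hits[name] for name in names])
    return [name for name, z in zip(names, zs) if z > threshold]
```

A single hit that scores far above a library of otherwise similar scores receives a high z-score, while a hit with a high raw score in a uniformly high-scoring library does not.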

Quote
  1. B Rost: TOPITS: Threading One-dimensional Predictions Into Three-dimensional Structures. In: C Rawlings, D Clark, R Altman, L Hunter, T Lengauer, and S Wodak (eds.) The third international conference on Intelligent Systems for Molecular Biology (ISMB), Cambridge, England, Menlo Park, CA: AAAI Press, 314-321, 1995.
  2. B Rost, R Schneider, and C Sander: Protein fold recognition by prediction-based threading. J of Molecular Biology, 270, 471-480, 1997
Authors Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Burkhard Rost (rost@columbia.edu)
Version 1997.1
 
Server AGAPE
Site (URL) http://www.predictprotein.org
About AGAPE is a prediction-based fold recognition program that finds remote structural homologues in the PDB database.

Remote homologues (0-25% sequence identity) are detected by a novel prediction-based fold recognition method (Przybylski & Rost 2004). The main idea is to expand the 1-D information of protein sequences by incorporating the predicted secondary structure and solvent accessibility states of each amino acid. The resulting 'generalized sequences' are aligned with similarly expanded ('generalized') position-specific scoring matrices. The correlation between two sets of predicted secondary structure and solvent accessibility states is in most cases higher than that between predicted and observed states (Przybylski & Rost 2004). Consequently, AGAPE uses predicted states for PDB proteins in the template library. Alignments produced by AGAPE are on average more similar to structural alignments than the alignments from PSI-BLAST. Regarding pure fold recognition performance, AGAPE on average improves over PSI-BLAST by about as much as PSI-BLAST improves over BLAST. (Note: a number of fold recognition services are available through META: http://www.predictprotein.org/submit_meta.html.)
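To illustrate the idea of a 'generalized sequence', the sketch below attaches a predicted secondary structure state and a predicted accessibility class to each residue and scores two positions with a simple weighted sum. The two-letter accessibility alphabet (b = buried, e = exposed), the names and the weights are hypothetical. AGAPE itself aligns against generalized position-specific scoring matrices rather than comparing single positions.

```python
from typing import NamedTuple


class GenPos(NamedTuple):
    """One position of a generalized sequence (hypothetical representation)."""
    residue: str  # amino acid, one-letter code
    sec: str      # predicted secondary structure: H, E or L
    acc: str      # predicted accessibility class: b (buried) or e (exposed), assumed 2-state


def generalize(seq: str, sec: str, acc: str) -> list[GenPos]:
    """Zip a sequence with its per-residue predictions into a generalized sequence."""
    if not len(seq) == len(sec) == len(acc):
        raise ValueError("sequence and predictions must have equal length")
    return [GenPos(*triple) for triple in zip(seq, sec, acc)]


def position_score(a: GenPos, b: GenPos,
                   w_res: float = 1.0, w_sec: float = 0.5, w_acc: float = 0.5) -> float:
    """Hypothetical additive score rewarding agreement in residue, structure and accessibility."""
    return w_res * (a.residue == b.residue) + w_sec * (a.sec == b.sec) + w_acc * (a.acc == b.acc)
```

With these weights, two positions that share residue and secondary structure but differ in accessibility score 1.5. A full method would feed such position scores into a dynamic programming alignment.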

Quote
  1. D Przybylski & B Rost: Improving fold recognition without folds. Journal of Molecular Biology, 341, 255-269, 2004
Authors Dariusz Przybylski & Burkhard Rost (CUBIC, Columbia Univ, New York)
Contact Dariusz Przybylski (dsp23@columbia.edu) or Burkhard Rost (rost@columbia.edu)
Version 0.3
 
Server COILS
Site (URL) local at CUBIC
About COILS finds coiled-coil regions in your protein.

The following description is from the original COILS site:

COILS is a program that compares a sequence to a database of known parallel two-stranded coiled-coils and derives a similarity score. By comparing this score to the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation.
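Turning a score into a probability by comparing it with two score distributions amounts to Bayes' rule. A sketch, assuming both distributions are Gaussian. The means, standard deviations and prior below are hypothetical placeholders, not the parameters used by COILS.

```python
from statistics import NormalDist

# Hypothetical score distributions and prior, for illustration only.
COILED_COIL = NormalDist(mu=1.6, sigma=0.25)
GLOBULAR = NormalDist(mu=0.8, sigma=0.2)
PRIOR_CC = 0.5


def coiled_coil_probability(score: float,
                            cc: NormalDist = COILED_COIL,
                            globular: NormalDist = GLOBULAR,
                            prior_cc: float = PRIOR_CC) -> float:
    """Posterior probability that a window with this score is a coiled coil (Bayes' rule)."""
    p_cc = cc.pdf(score) * prior_cc
    p_globular = globular.pdf(score) * (1.0 - prior_cc)
    return p_cc / (p_cc + p_globular)
```

A score at the centre of the coiled-coil distribution and far from the globular one yields a probability close to 1. Lowering `prior_cc` makes the method more conservative.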

Quote
  1. A Lupas: Prediction and Analysis of Coiled-Coil Structures. Methods in Enzymology, 266, 513-525, 1996
Authors Andrei Lupas (Max Planck Institute, Tuebingen, Germany)
Contact Andrei Lupas (andrei.lupas@tuebingen.mpg.de)
Version 1999_2.2
 
Server DISULFIND
Site (URL) http://cassandra.dsi.unifi.it/cysteines/index.html
About DISULFIND is a disulphide bridge predictor based on a two-step process. First, the bonding state of each cysteine in the sequence is predicted as either reduced or oxidized. This prediction combines a local Support Vector Machine predictor, which uses the context of the cysteine in the form of multiple alignment profiles, with a global refinement using Bidirectional Recurrent Neural Networks. Second, the connectivity pattern for the subset of oxidized cysteines, if any, is predicted with Recursive Neural Networks, pairing each oxidized cysteine with its predicted partner. Experimental assessment on a non-redundant subset of the PDB reports state-of-the-art performance for both steps of prediction.
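The second step, pairing oxidized cysteines, can be viewed as choosing the set of pairs with the highest total score. The sketch below enumerates every pairing exhaustively, given a hypothetical pairwise score function. It does not reproduce DISULFIND's recursive neural networks and is only practical for a handful of cysteines, since 2n cysteines admit (2n-1)!! pairings.

```python
from typing import Callable


def best_pairing(cysteines: list[int],
                 score: Callable[[int, int], float]) -> tuple[float, list[tuple[int, int]]]:
    """Exhaustively find the pairing of all cysteines with the highest total pair score.

    `cysteines` holds sequence positions of oxidized cysteines (an even number);
    `score(a, b)` is a hypothetical bonding score for the pair (a, b) with a < b.
    """
    if len(cysteines) % 2:
        raise ValueError("an even number of oxidized cysteines is required")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_total, best_pairs = float("-inf"), []
    for k, partner in enumerate(rest):
        # Pair the first cysteine with this partner, then solve the rest recursively.
        sub_total, sub_pairs = best_pairing(rest[:k] + rest[k + 1:], score)
        total = score(first, partner) + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs
```

Usage: with `cysteines` sorted by position and `score` returning, for example, a predicted bonding probability, `best_pairing(cysteines, score)` returns the total score and the chosen (position, position) pairs.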
Quote
  1. A Vullo, and P Frasconi: Disulfide Connectivity Prediction using Recursive Neural Networks and Evolutionary Information. Bioinformatics, 20, 653-659, 2004
  2. P Frasconi, A Passerini, and A Vullo: A Two-Stage SVM Architecture for Predicting the Disulfide Bonding State of Cysteines. Proc. IEEE Workshop on Neural Networks for Signal Processing, 25-34, 2002
  3. A Ceroni, P Frasconi, A Passerini, and A Vullo: Predicting the Disulfide Bonding State of Cysteines with Combinations of Kernel Machines. Journal of VLSI Signal Processing, 35, 287-295, 2003
Authors A Ceroni, P Frasconi, A Passerini, and A Vullo
Contact cystein@dsi.unifi.it
Version 2.0
 
Server ASP
Site (URL) Sandia National Lab
About ASP predicts conformational switches, i.e., it looks for switches that involve transitions between secondary structure types. The program was developed by MM Young, K Kirshenbaum, KA Dill and S Highsmith. ASP was designed to identify the location of conformational switches in proteins with known switches. It is NOT designed to predict whether a given sequence does or does not contain a switch. For best results, ASP should be used on sequences longer than 150 amino acids with more than 10 sequence homologues in the SWISS-PROT data bank. ASP has been validated against a set of globular proteins and may not be generally applicable. Please see Young et al., Protein Science, 8, 1752-1764, 1999, for details and for how best to interpret this output. We consider ASP to be experimental at this time, and would appreciate any feedback from our users.
Quote
  1. MM Young, K Kirshenbaum, KA Dill, and S Highsmith: Predicting conformational switches in proteins. Protein Science, 8, 1752-1764, 1999
  2. K Kirshenbaum, MM Young, and S Highsmith: Predicting Allosteric Switches in Myosins. Protein Science, 8, 1806-1815, 1999
Authors Malin M Young (Sandia Labs, Livermore), Kent Kirshenbaum, Ken Dill & Stefan Highsmith
Contact Malin Young (mmyoung@sandia.gov)
Version 2001
 
Server PROFcon
Site (URL) http://www.predictprotein.org
About PROFcon predicts contacts between residue pairs in single chains. Our definition of contact is based on distances between C-beta atoms (C-alpha for glycines): two residues whose C-beta atoms are closer than 8 Å are considered to be in contact, and not in contact otherwise. The last column of the output is the predicted contact score (the contact probability is high if the score is close to 1).
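The contact definition translates directly into code. A sketch, assuming each residue is given as a dictionary with hypothetical keys 'name', 'CA' and 'CB' holding the three-letter residue name and (x, y, z) coordinates in Å:

```python
import math


def representative_atom(residue: dict) -> tuple[float, float, float]:
    """C-beta coordinates, or C-alpha for glycine (which has no C-beta)."""
    return residue["CA"] if residue["name"] == "GLY" else residue["CB"]


def in_contact(res1: dict, res2: dict, cutoff: float = 8.0) -> bool:
    """True if the representative atoms of the two residues are closer than cutoff (Å)."""
    return math.dist(representative_atom(res1), representative_atom(res2)) < cutoff
```

Applying `in_contact` to every residue pair of a known structure yields the observed contact map against which predicted contact scores can be compared.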
Quote
  1. Marco Punta & Burkhard Rost: Toward good 2D predictions in proteins. Submitted.
Authors Marco Punta & Burkhard Rost
Contact Marco Punta (punta@maple.bioc.columbia.edu)
Version 8.0




Tools used for PP

 
Server MView
Site (URL) http://mathbio.nimr.mrc.ac.uk/~nbrown/mview/
About MView is a program converting multiple sequence alignments into fancy HTML formatted output.

Quote
  1. N P Brown, C Leroy, and C Sander: MView: A Web compatible database search or multiple alignment viewer. Bioinformatics, 14, 380-381, 1998
Authors Nigel Brown
Contact Nigel Brown (nbrown@nimr.mrc.ac.uk)
Version 1.40.2




Tools available with PP output

 
Server ESPript
Site (URL) http://cubic.bioc.columbia.edu/cgi/pp/ESPript
About ESPript converts the PredictProtein results (and other alignments) into fancy images.

To use the tool, you have to do the following:

  1. Save the PredictProtein results into a file.
  2. Open ESPript (see the Site URL above)
  3. Provide the file in which you saved the results in the appropriate boxes of ESPript (click 'Browse ..' boxes in upper half)
  4. Run ESPript (click 'Run ESPript' button on lower left corner)
Quote
  1. P Gouet, E Courcelle, D I Stuart, and F Metoz: ESPript: multiple sequence alignments in PostScript. Bioinformatics, 15, 305-308, 1999
Authors Patrice Gouet and Emmanuel Courcelle
Contact Patrice Gouet (gouet@ipbs.fr)
Version 1.9









Copyright © 2008 Burkhard Rost, CUBIC all rights reserved. Terms of Use | Privacy Policy | Contact Information