
Welcome to the LOCtree server

A service for predicting the subcellular localization of proteins (go directly to prediction page).

Prediction algorithm.     LOCtree is a novel system of support vector machines (SVMs) that predicts the subcellular localization of proteins, and the DNA-binding propensity of nuclear proteins, by incorporating a hierarchical ontology of localization classes modeled on biological processing pathways. Biological similarities are taken from the description of cellular components provided by the Gene Ontology (GO) Consortium. The GO definitions have been simplified and tailored to the problem of protein sorting. Technically, the ontology is implemented as a decision tree with SVMs as the nodes. LOCtree was extremely successful at learning evolutionary similarities among subcellular localization classes and was significantly more accurate than other traditional networks at predicting subcellular localization. Whenever available, LOCtree also reports predictions based on the following: 1) nuclear localization signals found by PredictNLS, 2) localization inferred from Prosite motifs and Pfam domains found in the protein, and 3) SWISS-PROT keywords associated with the protein. In the last two cases, localization is inferred using the entropy-based LOCkey algorithm. Additional information can be found in the LOCtree manuscript (www, pdf) and the associated PredictNLS (www, pdf) and LOCkey (www, pdf) publications.


Comprehensive prediction of localization.     LOCtree can predict the subcellular localization and DNA-binding propensity of non-membrane proteins in non-plant and plant eukaryotes as well as in prokaryotes. LOCtree classifies eukaryotic animal proteins into one of five subcellular classes, plant proteins into one of six classes, and prokaryotic proteins into one of three classes. The novel feature of the hierarchical architecture is the ability to predict intermediate localization classes at much higher accuracy than the final classes. Another source of improvement is the use of 'noisy' training data: 'noisy' predictions from LOCkey (annotations based on SWISS-PROT keywords) and LOChom (annotations based on sequence homology) are used to train the hierarchical SVMs.
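The idea of a decision tree whose nodes are binary classifiers, with intermediate classes available along the way, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the tree topology, the class labels, and the feature names (`secretory_score`, `nuclear_score`, `mito_score`) are hypothetical, and each node uses a simple signed score in place of a trained SVM. It is not the LOCtree implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Features = Dict[str, float]


@dataclass
class Node:
    """Internal tree node; `decide` stands in for a trained binary SVM."""
    label: str                             # intermediate class this node represents
    decide: Callable[[Features], float]    # signed score, like an SVM decision value
    positive: Union["Node", str]           # branch taken when score >= 0
    negative: Union["Node", str]           # branch taken when score < 0


def predict_path(root: Node, features: Features) -> List[str]:
    """Walk the tree and return every intermediate class plus the final leaf."""
    path: List[str] = []
    current: Union[Node, str] = root
    while isinstance(current, Node):
        path.append(current.label)
        score = current.decide(features)
        current = current.positive if score >= 0 else current.negative
    path.append(current)  # the leaf is a plain string naming the final class
    return path


# Hypothetical three-level tree (illustrative topology and labels only).
cytoplasmic = Node("cytoplasmic", lambda f: f["mito_score"], "mitochondria", "cytosol")
intracellular = Node("intracellular", lambda f: f["nuclear_score"], "nucleus", cytoplasmic)
tree = Node("root", lambda f: f["secretory_score"], "secreted", intracellular)

example = {"secretory_score": -1.0, "nuclear_score": -0.5, "mito_score": 0.8}
print(predict_path(tree, example))
```

Because the returned path lists the intermediate classes, a caller can stop at a coarser level (for example "intracellular") when the decision at a deeper node is not trusted.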


Accuracy of localization prediction.     LOCtree achieved a sustained level of 74% accuracy for non-plant eukaryotes, 70% for plants, and 84% for prokaryotes in six-fold cross-validation on a non-redundant data set. We rigorously benchmarked LOCtree against the best alternative methods for localization prediction, and it outperformed all other methods in nearly all benchmarks. Localization assignments from LOCtree agreed quite well with data from recent large-scale experiments. Our preliminary analysis of a few entirely sequenced organisms, namely human (Homo sapiens), yeast (Saccharomyces cerevisiae), and weed (Arabidopsis thaliana), suggested that over 35% of all non-membrane proteins are nuclear, that about 20% are retained in the cytosol, and that every fifth protein in the weed resides in the chloroplast.
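For readers unfamiliar with the evaluation scheme, six-fold cross-validation can be sketched as follows. This is a generic illustration under stated assumptions, not the benchmark code behind the figures above: the toy one-dimensional data, the nearest-class-mean trainer, and all function names are hypothetical, and the sketch does not perform the redundancy reduction that a non-redundant data set requires.

```python
from typing import Callable, Dict, List, Sequence


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k disjoint folds by striding."""
    return [list(range(i, n, k)) for i in range(k)]


def train_nearest_mean(xs: Sequence[float], ys: Sequence[str]) -> Callable[[float], str]:
    """Toy stand-in for a real classifier: predict the class with the closest mean."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def cross_validated_accuracy(samples: Sequence[float], labels: Sequence[str],
                             train: Callable, k: int = 6) -> float:
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    n = len(samples)
    correct = 0
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_x = [samples[i] for i in range(n) if i not in held_out]
        train_y = [labels[i] for i in range(n) if i not in held_out]
        model = train(train_x, train_y)
        correct += sum(model(samples[i]) == labels[i] for i in test_idx)
    return correct / n


# Hypothetical, well-separated toy data: one feature value per protein.
toy_x = [0.1, 0.2, 0.3, 0.2, 0.1, 0.3, 0.9, 0.8, 0.7, 0.9, 0.8, 0.7]
toy_y = ["cytosol"] * 6 + ["nucleus"] * 6
accuracy = cross_validated_accuracy(toy_x, toy_y, train_nearest_mean, k=6)
print(f"six-fold accuracy: {accuracy:.2f}")
```

Each protein is tested exactly once, by a model that never saw it during training, so the pooled accuracy estimates performance on unseen sequences.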


Links to other subcellular localization prediction services.

rajesh nair
Last modified: Thu Dec 7 12:17:48 EST 2006
©2008 rostlab.org
1130 St. Nicholas Ave, 8th. floor - (212) 851-4669