
Figure 1. Hierarchical architecture of LOCtree. LOCtree uses specialized
architectures to predict the subcellular localization of proteins from
different organisms: (a) architecture for eukaryotic non-plant
proteins; (b) architecture for plant proteins; and (c) the architecture
for prokaryotic proteins. At each branch point a support vector machine
(SVM) performs a binary classification (a protein either belongs to
localization class L or does not). The
hierarchical architecture has been designed to mimic the biological
protein sorting mechanism as closely as possible. The branches of the
tree represent intermediate stages in the sorting machinery while the
nodes represent the decision points in the sorting machinery. The
different levels of SVMs in the hierarchical tree are labeled Level 0,
Level 1, etc. For example, Level 0 represents the top-node SVM, which
discriminates between secretory pathway proteins and other
intra-cellular proteins ((a) and (b)), or separates proteins that remain
in the cytoplasm from the rest (c). The intermediate node SVMs in the next
level are labeled Level 1 and are responsible for separating
extra-cellular proteins from proteins sorted to the organelles, and
nuclear proteins from cytoplasmic proteins ((a) and (b)). For the
prokaryotic architecture (c), Level 1 is the terminal level for
Gram-negative bacteria and separates extra-cellular proteins from
periplasmic proteins. In addition, Level 1 also contains the
cytoplasmic leaf which is propagated without branching from Level 0.
For Gram-positive bacteria, Level 0 is the terminal level and separates
cytoplasmic proteins from extra-cellular proteins (non-cytoplasmic
branch). The leaves of the tree, represented by rectangular boxes,
are the final localization classes for which predictions are made.
If a leaf has a depth smaller than the overall depth of the tree, it is
propagated without branching for the remainder of the tree. Level 2 is
the terminal level for the eukaryotic non-plant architecture (a) and is
responsible for sorting proteins into one of five subcellular classes
(mitochondria and cytosol plus the three leaves from Level 1), while
Level 3 is the terminal level for the plant architecture (b) and
separates proteins into one of six classes (mitochondria and
chloroplast plus the four leaves from Level 2). The prediction accuracy
of the parent nodes is higher than that of the child nodes, leading to a
significantly improved prediction accuracy for the intermediate
localization states. Abbreviations: EXT, extra-cellular; NUC, nucleus;
CYT, cytosol; MIT, mitochondria; CHLORO, chloroplast; RIP, periplasm;
and ORG, organelle. Organelles are the endoplasmic reticulum, Golgi
apparatus, peroxisomes, lysosomes, and vacuolar compartments.
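
The traversal described in this caption can be illustrated with a minimal
Python sketch of architecture (a). It is an illustration only, not the
LOCtree implementation: the names Leaf, Node, predict, and NON_PLANT are
hypothetical, and each SVM is replaced by a toy rule that reads a boolean
feature, since the trained classifiers and their inputs are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Protein = Dict[str, bool]  # hypothetical toy features, not real SVM inputs


@dataclass
class Leaf:
    label: str  # final localization class, e.g. "EXT"


@dataclass
class Node:
    name: str                          # label for the decision point
    decide: Callable[[Protein], bool]  # stand-in for a trained binary SVM
    yes: Union["Node", Leaf]           # branch taken when decide() is True
    no: Union["Node", Leaf]            # branch taken when decide() is False


def predict(tree: Union[Node, Leaf], protein: Protein) -> Tuple[str, List[str]]:
    """Walk from the top node to a leaf; return the class and the path taken."""
    path: List[str] = []
    node = tree
    while isinstance(node, Node):
        path.append(node.name)
        node = node.yes if node.decide(protein) else node.no
    return node.label, path


# Architecture (a), eukaryotic non-plant, as described in the caption.
# The lambda rules are assumptions standing in for the SVM decisions.
NON_PLANT = Node(
    "Level 0: secretory pathway?",
    lambda p: p["secretory"],
    yes=Node("Level 1: extra-cellular?", lambda p: p["secreted"],
             yes=Leaf("EXT"), no=Leaf("ORG")),
    no=Node("Level 1: nuclear?", lambda p: p["nuclear"],
            yes=Leaf("NUC"),
            no=Node("Level 2: mitochondrial?", lambda p: p["mitochondrial"],
                    yes=Leaf("MIT"), no=Leaf("CYT"))),
)

example = {"secretory": False, "secreted": False,
           "nuclear": False, "mitochondrial": True}
label, path = predict(NON_PLANT, example)
print(label, len(path))  # MIT 3
```

In this sketch the length of the returned path equals the depth of the
leaf, so EXT, ORG, and NUC are reached after two decisions, while MIT and
CYT require three.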