Bibliography

Abagyan, R. A. and S. Batalov (1997). "Do aligned sequences share the same fold?" Journal of Molecular Biology 273: 355-368.

Adams, M. D., S. E. Celniker, et al. (2000). "The genome sequence of Drosophila melanogaster." Science 287(5461): 2185-2195.

Airozo, D., R. Allard, et al. (1999). "MEDLINE." 1999, from http://www.ncbi.nlm.nih.gov/PubMed/.

Alberts, B., D. Bray, et al. (1994). Molecular Biology of the Cell. New York and London, Garland Publishing.

Alexandrov, N. N. and R. Luethy (1998). "Alignment algorithm for homology modeling and threading." Protein Science 7: 254-258.

Altschul, S., T. Madden, et al. (1997). "Gapped Blast and PSI-Blast: a new generation of protein database search programs." Nucleic Acids Research 25: 3389-3402.

Altschul, S. F. (1993). "A protein alignment scoring system sensitive at all evolutionary distances." Journal of Molecular Evolution 36: 290-300.

Altschul, S. F. and W. Gish (1996). "Local alignment statistics." Methods in Enzymology 266: 460-480.

Altschul, S. F., W. Gish, et al. (1990). "Basic local alignment search tool." J Mol Biol 215(3): 403-10.

Andrade, M. A., N. P. Brown, et al. (1999). "Automated genome sequence analysis and annotation." Bioinformatics 15(5): 391-412.

Andrade, M. A., S. I. O'Donoghue, et al. (1998). "Adaptation of protein surfaces to subcellular location." J Mol Biol 276(2): 517-25.

Andrade, M. A., C. Ouzounis, et al. (1999). "Functional classes in the three domains of life." Journal of Molecular Evolution 49(5): 551-557.

Andrade, M. A. and A. Valencia (1998). "Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families." Bioinformatics 14(7): 600-7.

Apte, C., F. Damerau, et al. (1994). "Towards language independent automated learning of text categorization models." Proceedings of the 17th Annual ACM/SIGIR conference.

Apweiler, R. (2001). "Functional information in SWISS-PROT: the basis for large-scale characterisation of protein sequences." Brief Bioinform 2(1): 9-18.

Apweiler, R., T. K. Attwood, et al. (2000). "InterPro--an integrated documentation resource for protein families, domains and functional sites." Bioinformatics 16(12): 1145-50.

Apweiler, R., T. K. Attwood, et al. (2001). "The InterPro database, an integrated documentation resource for protein families, domains and functional sites." Nucleic Acids Res 29(1): 37-40.

Apweiler, R., A. Bairoch, et al. (2004). "Protein sequence databases." Curr Opin Chem Biol 8(1): 76-80.

Apweiler, R., A. Bairoch, et al. (2004). "UniProt: the Universal Protein knowledgebase." Nucleic Acids Res 32 Database issue: D115-9.

Apweiler, R., A. Gateau, et al. (1997). "Protein sequence annotation in the genome era: the annotation concept of SWISS-PROT+TREMBL." Proc Int Conf Intell Syst Mol Biol 5: 33-43.

Arabidopsis Genome Initiative (2000). "Analysis of the genome sequence of the flowering plant Arabidopsis thaliana." Nature 408(6814): 796-815.

Ashburner, M., C. A. Ball, et al. (2000). "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium." Nat Genet 25(1): 25-9.

Ashburner, M. and R. Drysdale (1994). "FlyBase--the Drosophila genetic database." Development 120(7): 2077-9.

Bairoch, A. and R. Apweiler (1997). "The SWISS-PROT protein sequence data bank and its new supplement TrEMBL." Nucleic Acids Research 25: 31-36.

Bairoch, A. and R. Apweiler (1999). "The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999." Nucleic Acids Res 27(1): 49-54.

Bairoch, A. and R. Apweiler (2000). "The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000." Nucleic Acids Res 28(1): 45-8.

Bairoch, A., R. Apweiler, et al. (2005). "The Universal Protein Resource (UniProt)." Nucleic Acids Research 33 Database Issue: D154-9.

Baker, P. G. and A. Brass (1998). "Recent developments in biological sequence databases." Curr Opin Biotechnol 9(1): 54-8.

Baldi, P., S. Brunak, et al. (2000). "Assessing the accuracy of prediction algorithms for classification: an overview." Bioinformatics 16(5): 412-24.

Bannai, H., Y. Tamada, et al. (2002). "Extensive feature detection of N-terminal protein sorting signals." Bioinformatics 18(2): 298-305.

Bar-Peled, M., D. C. Bassham, et al. (1996). "Transport of proteins in eukaryotic cells: more questions ahead." Plant Mol Biol 32(1-2): 223-49.

Bauer, M. F., S. Hofmann, et al. (2000). "Protein translocation into mitochondria: the role of TIM complexes." Trends in Cell Biology 10(1): 25-31.

Bazzan, A. L., P. M. Engel, et al. (2002). "Automated annotation of keywords for proteins related to mycoplasmataceae using machine learning techniques." Bioinformatics 18 Suppl 2: S35-43.

Bendtsen, J. D., H. Nielsen, et al. (2004). "Improved prediction of signal peptides: SignalP 3.0." J Mol Biol 340(4): 783-95.

Berman, H. M., J. Westbrook, et al. (2000). "The Protein Data Bank." Nucleic Acids Research 28(1): 235-42.

Bernstein, F. C., T. F. Koetzle, et al. (1977). "The Protein Data Bank. A computer-based archival file for macromolecular structures." Eur J Biochem 80(2): 319-24.

Blake, J. D. and F. E. Cohen (2001). "Pairwise sequence alignment below the twilight zone." Journal of Molecular Biology 307(2): 721-735.

Boeckmann, B., A. Bairoch, et al. (2003). "The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003." Nucleic Acids Res 31(1): 365-70.

Bonifaci, N., J. Moroianu, et al. (1997). "Karyopherin beta2 mediates nuclear import of a mRNA binding protein." Proc Natl Acad Sci U S A 94(10): 5055-60.

Bork, P., T. Dandekar, et al. (1998). "Predicting function: from genes to genomes and back." J Mol Biol 283(4): 707-25.

Bork, P. and T. J. Gibson (1996). "Applying motif and profile searches." Methods in Enzymology 266: 162-184.

Bork, P. and E. V. Koonin (1998). "Predicting functions from protein sequences--where are the bottlenecks?" Nat Genet 18(4): 313-8.

Bork, P., C. Ouzounis, et al. (1994). "From genome sequences to protein function." Current Opinion in Structural Biology 4: 393-403.

Boulikas, T. (1993). "Nuclear localization signals (NLS)." Crit Rev Eukaryot Gene Expr 3(3): 193-227.

Boulikas, T. (1994). "Putative nuclear localization signals (NLS) in protein transcription factors." J Cell Biochem 55(1): 32-58.

Brenner, S. E., C. Chothia, et al. (1998). "Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships." Proceedings of the National Academy of Sciences 95: 6073-6078.

Bruce, B. D. (2000). "Chloroplast transit peptides: structure, function and evolution." Trends Cell Biol 10(10): 440-7.

Brutlag, D. L. (1998). "Genomics and computational molecular biology." Curr Opin Microbiol 1(3): 340-5.

Cai, Y. D. and K. C. Chou (2000). "Using neural networks for prediction of subcellular location of prokaryotic and eukaryotic proteins." Mol Cell Biol Res Commun 4(3): 172-3.

Cai, Y. D., X. J. Liu, et al. (2002). "Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect." J Cell Biochem 84(2): 343-8.

Carter, P., J. Liu, et al. (2003). "PEP: Predictions of Entire Proteomes." NAR (submitted).

Casari, G., C. Sander, et al. (1995). "A method to predict functional residues in proteins." Nature Structural Biology 2: 171-178.

Cedano, J., P. Aloy, et al. (1997). "Relation between amino acid composition and cellular location of proteins." J Mol Biol 266(3): 594-600.

Chen, L., J. N. Glover, et al. (1998). "Structure of the DNA-binding domains from NFAT, Fos and Jun bound specifically to DNA." Nature 392(6671): 42-8.

Chervitz, S. A., E. T. Hester, et al. (1999). "Using the Saccharomyces Genome Database (SGD) for analysis of protein similarities and structure." Nucleic Acids Res 27(1): 74-8.

Chothia, C. and A. M. Lesk (1986). "The relation between the divergence of sequence and structure in proteins." EMBO Journal 5: 823-826.

Chou, K. C. and Y. D. Cai (2003). "Prediction and classification of protein subcellular location-sequence-order effect and pseudo amino acid composition." J Cell Biochem 90(6): 1250-60.

Chou, K. C. and D. W. Elrod (1999). "Protein subcellular location prediction." Protein Eng 12(2): 107-18.

Christie, K. R., S. Weng, et al. (2004). "Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms." Nucleic Acids Res 32 Database issue: D311-4.

Claros, M. G. (1995). "MitoProt, a Macintosh application for studying mitochondrial proteins." Comput Appl Biosci 11(4): 441-7.

Claros, M. G., S. Brunak, et al. (1997). "Prediction of N-terminal protein sorting signals." Current Opinion in Structural Biology 7: 394-398.

Claros, M. G. and P. Vincens (1996). "Computational method to predict mitochondrially imported proteins and their targeting sequences." Eur J Biochem 241(3): 779-86.

Cokol, M., R. Nair, et al. (2000). "Finding nuclear localization signals." EMBO Rep 1(5): 411-5.

Connolly, M. L. (1983). "Solvent-accessible surfaces of proteins and nucleic acids." Science 221(4612): 709-13.

Conti, E., M. Uy, et al. (1998). "Crystallographic analysis of the recognition of a nuclear localization signal by the nuclear import factor karyopherin alpha." Cell 94(2): 193-204.

Dasarathy, B. V. (1991). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. Los Alamitos, California, IEEE Computer Society Press.

Davis, T. N. (2004). "Protein localization in proteomics." Curr Opin Chem Biol 8(1): 49-53.

Devos, D. and A. Valencia (2000). "Practical limits of function prediction." Proteins 41(1): 98-107.

Devos, D. and A. Valencia (2001). "Intrinsic errors in genome annotation." Trends in Genetics 17(8): 429-431.

Djabali, K., V. M. Aita, et al. (2001). "Hairless is translocated to the nucleus via a novel bipartite nuclear localization signal and is associated with the nuclear matrix." J Cell Sci 114(Pt 2): 367-76.

Doerks, T., A. Bairoch, et al. (1998). "Protein annotation: detective work for function prediction." Trends Genet 14(6): 248-50.

Donnes, P. and A. Hoglund (2004). "Predicting protein subcellular localization: past, present, and future." Genomics Proteomics Bioinformatics 2(4): 209-15.

Doolittle, R. F. (1986). Of URFs and ORFs: a primer on how to analyze derived amino acid sequences. Mill Valley, California, University Science Books.

Drawid, A. and M. Gerstein (2000). "A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome." J Mol Biol 301(4): 1059-75.

Durbin, R., S. R. Eddy, et al. (1998). Biological Sequence Analysis. Cambridge, Cambridge University Press.

Eisenberg, D., E. M. Marcotte, et al. (2000). "Protein function in the post-genomic era." Nature 405(6788): 823-6.

Eisenhaber, F. and P. Bork (1998). "Wanted: subcellular localization of proteins based on sequence." Trends in Cell Biology 8: 169-170.

Eisenhaber, F. and P. Bork (1999). "Evaluation of human-readable annotation in biomolecular sequence databases with biological rule libraries." Bioinformatics 15(7-8): 528-35.

Emanuelsson, O., H. Nielsen, et al. (2000). "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence." J Mol Biol 300(4): 1005-16.

Emanuelsson, O., H. Nielsen, et al. (1999). "ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites." Protein Science 8: 978-984.

Emanuelsson, O. and G. von Heijne (2001). "Prediction of organellar targeting signals." Biochim Biophys Acta 1541(1-2): 114-9.

Emanuelsson, O., G. von Heijne, et al. (2001). "Analysis and prediction of mitochondrial targeting peptides." Methods Cell Biol 65: 175-87.

Etzold, T., A. Ulyanov, et al. (1996). "SRS: Information retrieval system for molecular biology data banks." Methods in Enzymology 266: 114-128.

Farabee, M. (2003). On-line Biology Book, Beyond Books, Apex Learning, Inc.

Faust, M. and M. Montenarh (2000). "Subcellular localization of protein kinase CK2. A key to its function?" Cell Tissue Res 301(3): 329-40.

Ferrigno, P. and P. A. Silver (1999). "Regulated nuclear localization of stress-responsive factors: how the nuclear trafficking of protein kinases and transcription factors contributes to cell survival." Oncogene 18(45): 6129-34.

Fleischmann, R. D., M. D. Adams, et al. (1995). "Whole-genome random sequencing and assembly of Haemophilus influenzae Rd." Science 269: 496-512.

Fleischmann, W., S. Moller, et al. (1999). "A novel method for automatic functional annotation of proteins." Bioinformatics 15(3): 228-33.

Folsch, H., B. Guiard, et al. (1996). "Internal targeting signal of the BCS1 protein: a novel mechanism of import into mitochondria." EMBO J 15(3): 479-87.

Friedman, C., P. Kra, et al. (2001). "GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles." Bioinformatics 17 Suppl 1: S74-82.

Frishman, D. (2000). "PEDANT: protein extraction, description, and analysis tool." from http://pedant.mips.biochem.mpg.de/.

Fujiwara, Y., M. Asogawa, et al. (1997). "Prediction of Mitochondrial Targeting Signals Using Hidden Markov Model." Genome Inform Ser Workshop Genome Inform 8: 53-60.

Gaasterland, T. and M. Oprea (2001). "Whole-genome analysis: annotations and updates." Curr Opin Struct Biol 11(3): 377-81.

Gaasterland, T. and C. W. Sensen (1996). "MAGPIE: automated genome interpretation." Trends Genet 12(2): 76-8.

Galperin, M. Y. and E. V. Koonin (2000). "Who's your neighbor? New computational approaches for functional genomics." Nat Biotechnol 18(6): 609-13.

Gardy, J. L., C. Spencer, et al. (2003). "PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria." Nucleic Acids Res 31(13): 3613-7.

Guda, C. (2006). "pTARGET: a web server for predicting protein subcellular localization." Nucleic Acids Res 34(Web Server issue): W210-3.

Guda, C., E. Fahy, et al. (2004). "MITOPRED: a genome-scale method for prediction of nucleus-encoded mitochondrial proteins." Bioinformatics.

Harrison, P. M., P. Bamborough, et al. (1997). "The prion folding problem." Current Opinion in Structural Biology 7: 53-59.

Hatzivassiloglou, V., P. A. Duboue, et al. (2001). "Disambiguating proteins, genes, and RNA in text: a machine learning approach." Bioinformatics 17 Suppl 1: S97-106.

Hegyi, H. and M. Gerstein (1999). "The relationship between protein structure and function: a comprehensive survey with application to the yeast genome." J Mol Biol 288(1): 147-64.

Herrmann, J. M. and W. Neupert (2000). "Protein transport into mitochondria." Curr Opin Microbiol 3(2): 210-4.

Hertz, G. Z. and G. D. Stormo (1999). "Identifying DNA and protein patterns with statistically significant alignments of multiple sequences." Bioinformatics 15(7-8): 563-77.

Hobohm, U. and C. Sander (1994). "Enlarged representative set of protein structures." Protein Science 3: 522-524.

Hobohm, U., M. Scharf, et al. (1992). "Selection of representative protein data sets." Protein Science 1: 409-17.

Hofmann, K., P. Bucher, et al. (1999). "The PROSITE database, its status in 1999." Nucleic Acids Research 27(1): 215-219.

Hoglund, A., T. Blum, et al. (2006). "Significantly improved prediction of subcellular localization by integrating text and protein sequence data." Pac Symp Biocomput: 16-27.

Horton, P. and K. Nakai (1997). Better prediction of protein cellular localization sites with the k nearest neighbors classifier. Fifth International Conference on Intelligent Systems for Molecular Biology, Halkidiki, Greece, AAAI Press.

Horton, P. and K. Nakai (1997). "Better prediction of protein cellular localization sites with the k nearest neighbors classifier." Proc Int Conf Intell Syst Mol Biol 5: 147-52.

Horton, P., K. J. Park, et al. (2007). "WoLF PSORT: protein localization predictor." Nucleic Acids Res.

Hsieh, J. C., Y. Shimizu, et al. (1998). "Novel nuclear localization signal between the two DNA-binding zinc fingers in the human vitamin D receptor." J Cell Biochem 70(1): 94-109.

Hua, S. and Z. Sun (2001). "Support vector machine approach for protein subcellular localization prediction." Bioinformatics 17(8): 721-8.

Huh, W. K., J. V. Falvo, et al. (2003). "Global analysis of protein localization in budding yeast." Nature 425(6959): 686-91.

Iliopoulos, I., A. J. Enright, et al. (2001). "Textquest: document clustering of Medline abstracts for concept discovery in molecular biology." Pac Symp Biocomput: 384-95.

Irie, Y., K. Yamagata, et al. (2000). "Molecular cloning and characterization of Amida, a novel protein which interacts with a neuron-specific immediate early gene product arc, contains novel nuclear localization signals, and causes cell death in cultured cells." J Biol Chem 275(4): 2647-53.

Istrail, S., G. G. Sutton, et al. (2004). "Whole-genome shotgun assembly and comparison of human genome assemblies." Proc Natl Acad Sci U S A 101(7): 1916-21.

Jans, D. A., C. Y. Xiao, et al. (2000). "Nuclear targeting signal recognition: a key control point in nuclear transport?" Bioessays 22(6): 532-44.

Jaroszewski, L., L. Rychlewski, et al. (2000). "Improving the quality of twilight-zone alignments." Protein Science 9(8): 1487-1496.

Jensen, L. J., R. Gupta, et al. (2002). "Prediction of human protein function from post-translational modifications and localization features." J Mol Biol 319(5): 1257-65.

Joachims, T. (2000). Estimating the Generalization Performance of a SVM Efficiently. Proceedings of the Seventeenth International Conference on Machine Learning, Morgan Kaufmann.

Jonassen, I. (1997). "Efficient discovery of conserved patterns using a pattern graph." Computer Applications in Biological Science 13: 509-522.

Kabsch, W. and C. Sander (1983). "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features." Biopolymers 22(12): 2577-637.

Kall, L., A. Krogh, et al. (2004). "A combined transmembrane topology and signal peptide prediction method." J Mol Biol 338(5): 1027-36.

Karp, P. D. (1998). "What we do not know about sequence analysis and sequence databases." Bioinformatics 14(9): 753-4.

Karp, P. D., M. Riley, et al. (1999). "Eco Cyc: encyclopedia of Escherichia coli genes and metabolism." Nucleic Acids Research 27(1): 55-8.

Kleffmann, T., D. Russenberger, et al. (2004). "The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions." Curr Biol 14(5): 354-62.

Koonin, E. V. (2000). "Bridging the gap between sequence and function." Trends Genet 16(1): 16.

Koonin, E. V., R. L. Tatusov, et al. (1996). "Protein sequence comparison at genome scale." Methods in Enzymology 266: 295-322.

Kretschmann, E., W. Fleischmann, et al. (2001). "Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT." Bioinformatics 17(10): 920-6.

Kumar, A., S. Agarwal, et al. (2002). "Subcellular localization of the yeast proteome." Genes Dev 16(6): 707-19.

LaCasse, E. C. and Y. A. Lefebvre (1995). "Nuclear localization signals overlap DNA- or RNA-binding domains in nucleic acid-binding proteins." Nucleic Acids Res 23(10): 1647-56.

Lander, E. S., L. M. Linton, et al. (2001). "Initial sequencing and analysis of the human genome." Nature 409(6822): 860-921.

Lewis, D. D. and M. Ringuette (1994). "Comparison of two learning algorithms for text categorization." Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR'94).

Lewis, S., M. Ashburner, et al. (2000). "Annotating eukaryote genomes." Curr Opin Struct Biol 10(3): 349-54.

Liscovitch, M., M. Czarny, et al. (1999). "Localization and possible functions of phospholipase D isozymes." Biochim Biophys Acta 1439(2): 245-63.

Liu, J. and B. Rost (2000). "Analysing all proteins in entire genomes." January 2000, from http://cubic.bioc.columbia.edu/genomes/.

Liu, J. and B. Rost (2001). "Comparing function and structure between entire proteomes." Protein Science 10(10): 1970-1979.

Liu, J. and B. Rost (2002). "Target space for structural genomics revisited." Bioinformatics 18(7): 922-33.

Liu, J. and B. Rost (2000). Analysing all proteins in entire genomes, CUBIC, Columbia University, Dept. of Biochemistry & Molecular Biophysics.

Lu, Z., D. Szafron, et al. (2004). "Predicting subcellular localization of proteins using machine-learned classifiers." Bioinformatics 20(4): 547-56.

Luscombe, N. M., D. Greenbaum, et al. (2001). "What is bioinformatics? A proposed definition and overview of the field." Methods Inf Med 40(4): 346-58.

Marcotte, E. M., I. Xenarios, et al. (2000). "Localizing proteins in the cell from their phylogenetic profiles." Proc Natl Acad Sci U S A 97(22): 12115-20.

Mathews, F. S. (1985). "The structure, function and evolution of cytochromes." Prog. Biophys. Mol. Biol. 45: 1-56.

Mattaj, I. W. and L. Englmeier (1998). "Nucleocytoplasmic transport: the soluble phase." Annu Rev Biochem 67: 265-306.

Matthews, B. W. (1975). "Comparison of the predicted and observed secondary structure of T4 phage lysozyme." Biochim Biophys Acta 405(2): 442-51.

Minor, D. L. J. and P. S. Kim (1996). "Context-dependent secondary structure formation of a designed protein sequence." Nature 380: 730-734.

Moede, T., B. Leibiger, et al. (1999). "Identification of a nuclear localization signal, RRMKWKK, in the homeodomain transcription factor PDX-1." FEBS Lett 461(3): 229-34.

Moroianu, J. (1999). "Nuclear import and export: transport factors, mechanisms and regulation." Crit Rev Eukaryot Gene Expr 9(2): 89-106.

Mott, R., J. Schultz, et al. (2002). "Predicting protein cellular localization using a domain projection method." Genome Res 12(8): 1168-74.

Murzin, A. G. (1998). "How far divergent evolution goes in proteins." Curr Opin Struct Biol 8(3): 380-7.

Nair, R., P. Carter, et al. (2003). "NLSdb: database of nuclear localization signals." Nucleic Acids Res 31(1): 397-9.

Nair, R. and B. Rost (2002). "Inferring sub-cellular localization through automated lexical analysis." Bioinformatics 18 Suppl 1: S78-S86.

Nair, R. and B. Rost (2002). "Sequence conserved for subcellular localization." Protein Sci 11(12): 2836-47.

Nair, R. and B. Rost (2003). "Better prediction of sub-cellular localization by combining evolutionary and structural information." Proteins 53(4): 917-30.

Nair, R. and B. Rost (2003). "LOC3D: annotate sub-cellular localization for protein structures." Nucleic Acids Res 31(13): 3337-40.

Nair, R. and B. Rost (2004). "Annotating protein function through lexical analysis." AI Magazine 25: 45-56.

Nair, R. and B. Rost (2004). "LOCnet and LOCtarget: sub-cellular localization for structural genomics targets." Nucleic Acids Res 32(Web Server issue): W517-21.

Nair, R. and B. Rost (2004). "Predicting subcellular localization based on functional hierarchies." manuscript in preparation.

Nair, R. and B. Rost (2005). "Mimicking cellular sorting improves prediction of subcellular localization." Journal of Molecular Biology 348(1): 85-100.

Nair, R. and B. Rost (2005). Predicting protein subcellular localization using intelligent systems. In Silico Technology in Drug Target Identification and Validation. D. Leon and S. Markel. Boca Raton, Taylor and Francis.

Nakai, K. (2000). "Protein sorting signals and prediction of subcellular localization." Adv Protein Chem 54: 277-344.

Nakai, K. (2001). "Review: prediction of in vivo fates of proteins in the era of genomics and proteomics." J Struct Biol 134(2-3): 103-16.

Nakai, K. and P. Horton (1999). "PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization." Trends Biochem Sci 24(1): 34-6.

Nakai, K. and M. Kanehisa (1991). "Expert system for predicting protein localization sites in gram-negative bacteria." Proteins: Structure, Function, and Genetics 11: 95-110.

Nakai, K. and M. Kanehisa (1992). "A knowledge base for predicting protein localization sites in eukaryotic cells." Genomics 14(4): 897-911.

Nakai, K., A. Kidera, et al. (1988). "Cluster analysis of amino acid indices for prediction of protein structure and function." Protein Engineering 2: 93-100.

Nakashima, H. and K. Nishikawa (1994). "Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies." J Mol Biol 238(1): 54-61.

Ng, S. K. and M. Wong (1999). "Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts." Genome Inform Ser Workshop Genome Inform 10: 104-112.

Nielsen, H., S. Brunak, et al. (1999). "Machine learning approaches for the prediction of signal peptides and other protein sorting signals." Protein Engineering 12: 3-9.

Nielsen, H., J. Engelbrecht, et al. (1997). "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites." Protein Eng 10(1): 1-6.

Nielsen, H., J. Engelbrecht, et al. (1997). "A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites." Int J Neural Syst 8(5-6): 581-99.

Nielsen, H., J. Engelbrecht, et al. (1996). "Defining a similarity threshold for a functional protein sequence pattern: the signal peptide cleavage site." Proteins: Structure, Function, and Genetics 24: 165-177.

Nishikawa, K., Y. Kubota, et al. (1983). "Classification of proteins into groups based on amino acid composition and other characters: I. Angular distribution." Journal of Biochemistry 94: 981-995.

Nishikawa, K. and T. Ooi (1982). "Correlation of the amino acid composition of a protein to its structural and biological characteristics." Journal of Biochemistry 91: 1821-1824.

Ogul, H. and E. U. Mumcuoglu (2007). "Subcellular localization prediction with new protein encoding schemes." IEEE/ACM Trans Comput Biol Bioinform 4(2): 227-32.

Orengo, C. A., A. E. Todd, et al. (1999). "From protein structure to function." Curr Opin Struct Biol 9(3): 374-82.

Ouzounis, C., G. Casari, et al. (1996). "Computational comparisons of model genomes." Trends in Biotechnology 14: 280-285.

Overbeek, R., N. Larsen, et al. (1997). "Representation of function: the next step." Gene 191(1): GC1-GC9.

Pan, Y. X., Z. Z. Zhang, et al. (2003). "Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach." J Protein Chem 22(4): 395-402.

Parfrey, H., R. Mahadeva, et al. (2003). "Alpha(1)-antitrypsin deficiency, liver disease and emphysema." Int J Biochem Cell Biol 35(7): 1009-14.

Park, J., K. Karplus, et al. (1998). "Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods." Journal of Molecular Biology 284: 1201-1210.

Park, K. J. and M. Kanehisa (2003). "Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs." Bioinformatics 19(13): 1656-63.

Pawlowski, K. and A. Godzik (2001). "Surface map comparison: studying function diversity of homologous proteins." J Mol Biol 309(3): 793-806.

Pawlowski, K., L. Jaroszewski, et al. (2000). "Sensitive sequence comparison as protein function predictor." Pac Symp Biocomput: 42-53.

Payne, A. S., E. J. Kelly, et al. (1998). "Functional expression of the Wilson disease protein reveals mislocalization and impaired copper-dependent trafficking of the common H1069Q mutation." Proc Natl Acad Sci U S A 95(18): 10854-9.

Pearce, D. A. (2000). "Localization and processing of CLN3, the protein associated to Batten disease: where is it and what does it do?" J Neurosci Res 59(1): 19-23.

Pearson, W. R. (1995). "Comparison of methods for searching protein sequence databases." Protein Science 4: 1145-1160.

Pearson, W. R. and D. J. Lipman (1988). "Improved tools for biological sequence comparison." Proc Natl Acad Sci U S A 85(8): 2444-8.

Pierleoni, A., P. L. Martelli, et al. (2006). "BaCelLo: a balanced subcellular localization predictor." Bioinformatics 22(14): e408-16.

Ponting, C. P., J. Schultz, et al. (1999). "SMART: identification and annotation of domains from signalling and extracellular protein sequences." Nucleic Acids Research 27(1): 229-32.

Pruess, M., W. Fleischmann, et al. (2003). "The Proteome Analysis database: a tool for the in silico analysis of whole proteomes." Nucleic Acids Res 31(1): 414-7.

Pruitt, K. D., T. Tatusova, et al. (2007). "NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins." Nucleic Acids Res 35(Database issue): D61-5.

Przybylski, D. and B. Rost (2002). "Alignments grow, secondary structure prediction improves." Proteins: Structure, Function, and Genetics 46: 195-205.

Puntervoll, P., R. Linding, et al. (2003). "The ELM server: A new resource for revealing short functional sites in modular eukaryotic proteins." Nucleic Acids Res (in this issue).

Reich, J. G. and W. Meiske (1987). "A simple statistical significance test of window scores in large dot matrices obtained from protein or nucleic acid sequences." Comput. Appl. Biosci. 3: 25-30.

Reinhardt, A. and T. Hubbard (1998). "Using neural networks for prediction of the subcellular location of proteins." Nucleic Acids Res 26(9): 2230-6.

Riley, M. (1993). "Function of the gene products in Escherichia coli." Microbiol. Rev. 57: 862-952.

Riley, M. and B. Labedan (1997). "Protein evolution viewed through Escherichia coli protein sequences: introducing the notion of a structural segment of homology, the module." Journal of Molecular Biology 268: 857-868.

Rost, B. (1996). "PHD: predicting one-dimensional protein structure by profile-based neural networks." Methods Enzymol 266: 525-39.

Rost, B. (1997). Learning from evolution to predict protein structure. BCEC97: Bio-Computing and Emergent Computation, Skövde, Sweden, World Scientific.

Rost, B. (1999). "Twilight zone of protein sequence alignments." Protein Eng 12(2): 85-94.

Rost, B. (2002). "Enzyme function less conserved than anticipated." J Mol Biol 318(2): 595-608.

Rost, B., R. Casadio, et al. (1995). "Prediction of helical transmembrane segments at 95% accuracy." Protein Science 4: 521-533.

Rost, B. and J. Liu (2003). "The PredictProtein server." Nucleic Acids Res 31(13): 3300-4.

Rost, B., J. Liu, et al. (2003). "Automatic prediction of protein function." Cell Mol Life Sci 60(12): 2637-50.

Rost, B., S. O'Donoghue, et al. (1998). "Midnight zone of protein structure evolution." manuscript in preparation.

Rost, B. and C. Sander (1993). "Prediction of protein secondary structure at better than 70% accuracy." Journal of Molecular Biology 232: 584-599.

Rost, B. and C. Sander (1994). "Conservation and prediction of solvent accessibility in protein families." Proteins: Structure, Function, and Genetics 20(3): 216-226.

Rusch, S. L. and D. A. Kendall (1995). "Protein transport via amino-terminal targeting sequences: common themes in diverse systems." Mol Membr Biol 12(4): 295-307.

Rychlewski, L., B. Zhang, et al. (1999). "Functional insights from structural predictions: analysis of the Escherichia coli genome." Protein Science 8(3): 614-624.

Salton, G. (1989). Automatic Text Processing. Reading, MA, Addison-Wesley.

Sander, C. and R. Schneider (1991). "Database of homology-derived protein structures and the structural meaning of sequence alignment." Proteins 9(1): 56-68.

Sander, C. and R. Schneider (1994). "The HSSP database of protein structure-sequence alignments." Nucleic Acids Research 22: 3597-3599.

Sarda, D., G. H. Chua, et al. (2005). "pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties." BMC Bioinformatics 6: 152.

Sayle, R. A. and E. J. Milner-White (1995). Trends in Biochemical Sciences 20: 37.

Schatz, G. and B. Dobberstein (1996). "Common principles of protein translocation across membranes." Science 271(5255): 1519-26.

Schneider, G. and U. Fechner (2004). "Advances in the prediction of protein targeting signals." Proteomics 4(6): 1571-80.

Schutze, H., D. A. Hull, et al. (1995). "A comparison of classifiers and document representation for the routing problem." 18th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '95): 229-237.

Schwarz, E. and W. Neupert (1994). "Mitochondrial protein import: mechanisms, components and energetics." Biochim Biophys Acta 1187(2): 270-4.

Shah, I. and L. Hunter (1997). Predicting enzyme function from sequence: a systematic appraisal. Fifth International Conference on Intelligent Systems for Molecular Biology, Halkidiki, Greece, AAAI Press.

Shannon, C. E. (1951). "Prediction and entropy of printed English." Bell System Tech. J. 30: 50-64.

Simpson, J. C., R. Wellenreuther, et al. (2000). "Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing." EMBO Rep 1(3): 287-92.

Sirover, M. A. (1999). "New insights into an old protein: the functional diversity of mammalian glyceraldehyde-3-phosphate dehydrogenase." Biochim Biophys Acta 1432(2): 159-84.

Skach, W. R. (2000). "Defects in processing and trafficking of the cystic fibrosis transmembrane conductance regulator." Kidney Int 57(3): 825-31.

Small, I., N. Peeters, et al. (2004). "Predotar: A tool for rapidly screening proteomes for N-terminal targeting sequences." Proteomics 4(6): 1581-90.

Smith, T. F. (1998). "Functional genomics--bioinformatics is ready for the challenge." Trends Genet 14(7): 291-3.

Sonnhammer, E. L., S. R. Eddy, et al. (1997). "Pfam: a comprehensive database of protein domain families based on seed alignments." Proteins: Structure, Function, and Genetics 28(3): 405-420.

Sprenger, J., J. L. Fink, et al. (2006). "Evaluation and comparison of mammalian subcellular localization prediction methods." BMC Bioinformatics 7 Suppl 5: S3.

Stapley, B. J. and G. Benoit (2000). "Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts." Pac Symp Biocomput: 529-40.

Stapley, B. J., L. A. Kelley, et al. (2002). "Predicting the sub-cellular location of proteins from text using support vector machines." Pac Symp Biocomput: 374-85.

Stephens, M., M. Palakal, et al. (2001). "Detecting gene relations from Medline abstracts." Pac Symp Biocomput: 483-95.

Tamames, J., C. Ouzounis, et al. (1998). "EUCLID: automatic classification of proteins in functional classes by their database annotations." Bioinformatics 14(6): 542-3.

Tamames, J., C. Ouzounis, et al. (1996). "Genomes with distinct function composition." FEBS Lett 389(1): 96-101.

Thornton, J. M., C. A. Orengo, et al. (1999). "Protein folds, functions and evolution." J Mol Biol 293(2): 333-42.

Tinland, B., Z. Koukolikova-Nicola, et al. (1992). "The T-DNA-linked VirD2 protein contains two distinct functional nuclear localization signals." Proc Natl Acad Sci U S A 89(16): 7442-6.

Todd, A. E., C. A. Orengo, et al. (2001). "Evolution of function in protein superfamilies, from a structural perspective." J Mol Biol 307(4): 1113-43.

Truant, R. and B. R. Cullen (1999). "The arginine-rich domains present in human immunodeficiency virus type 1 Tat and Rev function as direct importin beta-dependent nuclear localization signals." Mol Cell Biol 19(2): 1210-7.

Tuparev, G., G. Vriend, et al. (1992). "GCI: A network server for interactive 3D graphics." J. Mol. Graph. 10: 12-16.

Valencia, A. and F. Pazos (2002). "Computational methods for the prediction of protein interactions." Curr Opin Struct Biol 12(3): 368-73.

Vapnik, V. N. (1995). The Nature of Statistical Learning Theory, Springer-Verlag.

Venter, J. C., M. D. Adams, et al. (2001). "The sequence of the human genome." Science 291(5507): 1304-51.

Vogt, G., T. Etzold, et al. (1995). "An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited." Journal of Molecular Biology 249: 816-831.

von Heijne, G. (1981). "On the hydrophobic nature of signal sequences." Eur. J. Biochem. 116: 419-422.

von Heijne, G. (1985). "Signal sequences. The limits of variation." J. Mol. Biol. 184: 99-105.

von Heijne, G. (1995). "Protein sorting signals: simple peptides with complex functions." Exs 73: 67-76.

Voos, W., H. Martin, et al. (1999). "Mechanisms of protein translocation into mitochondria." Biochim Biophys Acta 1422(3): 235-54.

Wall, L. and R. L. Schwartz (1990). Programming Perl. Sebastopol, CA, O'Reilly & Associates, Inc.

Webb, E. C. (1992). Enzyme Nomenclature 1992. Recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology. New York, Academic Press.

Weis, K. (1998). "Importins and exportins: how to get in and out of the nucleus." Trends Biochem Sci 23(5): 185-9.

Wilson, C. A., J. Kreychman, et al. (2000). "Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores." J Mol Biol 297(1): 233-49.

Wrzeszczynski, K. O. and B. Rost (2004). "Annotating proteins from endoplasmic reticulum and Golgi apparatus in eukaryotic proteomes." Cell Mol Life Sci 61(11): 1341-53.

Xie, D., A. Li, et al. (2005). "Using motifs in the prediction of eukaryotic protein subcellular localization." Conf Proc IEEE Eng Med Biol Soc 3: 2802-4.

Yang, A. S. and B. Honig (2000). "An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments." Journal of Molecular Biology 301(3): 691-711.

Yang, Y. and C. G. Chute (1992). "An application of least squares fit mapping to clinical classification." Proceedings - the Annual Symposium on Computer Applications in Medical Care: 460-4.

Yang, Y. and X. Liu (1999). "A re-examination of text categorisation methods." Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99): 42-49.

Yang, Y. and J. O. Pedersen (1997). "A comparative study on feature selection in text categorization." The Fourteenth International Conference on Machine Learning: 412-420.

Yu, C. S., Y. C. Chen, et al. (2006). "Prediction of protein subcellular localization." Proteins 64(3): 643-51.

Zhu, H., M. Bilgin, et al. (2003). "Proteomics." Annu Rev Biochem 72: 783-812.

 


Appendix

I Glossary

cTP: chloroplast targeting peptide

DSSP: Database of Secondary Structure in Proteins, featuring automatic assignment of secondary structure and solvent accessibility from 3D co-ordinates.

EC: Enzyme Commission classification of enzymes.

ER: endoplasmic reticulum

EVA: EVAluation of automatic protein structure prediction servers.

GO: Gene Ontology, i.e. a functional classification of proteins.

HMM: hidden Markov model.

HSSP: Homology-derived Secondary Structure of Proteins, a database of protein structure-sequence alignments.

InterPro: a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.

LALI: length of the alignment between a pair of aligned sequences.

LOC3d: database of predicted subcellular localization for proteins with known 3D structure.

LOChom: database of predicted subcellular localization based on sequence homology.

LOCkey: database of predicted subcellular localization based on SWISS-PROT keywords.

LOCnet: neural network based prediction of subcellular localization.

LOCtarget: database of predicted subcellular localization for structural genomics targets.

LOCToGo: prediction of subcellular localization based on hierarchical SVMs.

mTP: mitochondrial targeting peptide.

NLS: nuclear localization signal, i.e. a signal involved in targeting proteins to the nucleus.

NLSdb: database of nuclear localization signals (NLSs).

NN: neural network.

NNPSL: neural network based prediction of subcellular localization.

ORF: open reading frame (for simplicity we sometimes refer to ORFs from genome sequencing projects as 'proteins').

PDB: Protein Data Bank, the databank of 3-D biological macromolecular structure.

PEP: database of Predictions for Entire Proteomes

Pfam: a large collection of multiple sequence alignments and hidden Markov models covering many common protein families.

PHD: Profile based neural network prediction of secondary structure (PHDsec), solvent accessibility (PHDacc), and transmembrane helices (PHDhtm).

PIDE: percentage pairwise sequence identity of aligned sequences.

PredictNLS: prediction of nuclear localization signals (NLSs) using ‘in-silico’ mutagenesis.

PSI-BLAST: Position-Specific Iterated BLAST.

Psort: prediction of subcellular localization based on sequence signals and amino acid composition.

SCOP: Structural Classification of Proteins, an expert-based classification and domain-dissection of protein structures.

SignalP: a neural network and HMM based prediction method of signal peptides.

SVM: support vector machine

SWISS-PROT: a curated database with protein sequences and functional annotations.

TargetP: a neural network and HMM based prediction method for N-terminal targeting peptides, including signal peptides (SPs), mitochondrial targeting peptides (mTPs) and chloroplast targeting peptides (cTPs).

TrEMBL: a collection of protein sequences automatically translated from the EMBL nucleotide database.