Neural networks predict protein structure: hype or hit?
CUBIC, Dept. Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA, rost@columbia.edu
This article is published in P Frasconi & R Shamir (eds.) 'Artificial intelligence and heuristic methods in bioinformatics', 2003 © copyright IOS Press (2003). IOS Press is the only authorised source. All copying of this article including placing on another website requires the written permission of the copyright owner.
| Quote: | B Rost (2002) Neural networks predict protein structure: hype or hit? In 'Artificial intelligence and heuristic methods for bioinformatics' Paolo Frasconi and Ron Shamir (eds.) Amsterdam: IOS Press, 2003, 34-50 |
Neural networks have been applied to many pattern classification problems. Here, I review applications to the problem of predicting protein structure from protein sequence. Initially, many methods were apparently designed by researchers who just wanted a real-life application for their gadget. However, the competitiveness of the field separated the wheat from the chaff. Meanwhile, several neural network-based methods have contributed significantly to advancing the field of bio-informatics, and some are clearly influencing molecular biology. Today, a plethora of network methods is used in everyday sequence analysis, and an increasing number of applications explore very novel problems.
Proteins constitute life’s machinery. The first bacterial genome was sequenced in 1995 [1]; the first mono-cellular eukaryote (Saccharomyces cerevisiae, yeast) followed in 1996 [2]. Meanwhile, we know the entire proteomes (all proteins in a genome) of various multi-cellular eukaryotes: Drosophila melanogaster (fly) [3], Caenorhabditis elegans (worm) [4], the plant Arabidopsis thaliana [5, 6, 7, 8]. The first drafts of the human genome [9, 10] have also been completed; however, one year later, we still do not know all human proteins [11]. Overall, more than 60 entire organisms have been sequenced over the last eight years. This avalanche of entirely sequenced organisms is exciting for biology because the genomes contain the blueprint for all parts of life’s machinery. The machinery itself consists of proteins that perform most important tasks in organisms (catalysis of biochemical reactions, transport of nutrients, recognition, and transmission of signals). Proteins are formed by joining 20 different amino acids (dubbed residues, when joined in proteins) into a stretched chain. In water, many proteins fold into unique three-dimensional (3D) structures. The main driving force is the need to pack residues for which a contact with water is energetically unfavourable (hydrophobic residues) into the interior of the molecule. This appears possible through the formation of a macroscopic substructure called secondary structure (Fig. 1; for an introduction into protein structure, see: [12]; for principles of folding, see: [13]).
Sequence determines structure determines function. The world of proteins is governed by shape: interactions between proteins are mediated by the ‘key-hole’ principle, i.e., two proteins interact when they fit to one another like a key into a hole. Thus, protein structure determines protein function. What determines structure? All information about the native structure of a protein is coded in the amino acid sequence, plus its native solution environment [14]. Can we decipher the code, i.e., can we predict 3D structure from sequence, or in other words: Can we unboil the egg [15]? In principle, the code could be deciphered from physico-chemical principles using, e.g., molecular dynamics [16, 17, 18]. In practice, such approaches are frustrated by principal obstacles [20, 21]. Furthermore, the last decade has revealed that possibly most proteins do not adopt their native 3D structure in vitro; rather, they need the cellular machinery to fold correctly in vivo [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33].
Fig. 1. : Representation of HIV-1 protease (PDB code 1HHP) in 1D and in 3D. 1D: SEQ, sequence in one-letter code; SEC, secondary structure assignment (E for strand, blank for loop); note only the first 34 residues corresponding to the first strands (upper left arrows in 3D representation) are shown. 3D: The trace of the protein chain in 3D is plotted schematically as a ribbon Ca-trace (alpha carbons = backbone of protein). Strands are indicated by arrows, the short helix is on the right towards the end (C-term) of the protein. Graph made with MOLSCRIPT [19] .
State-of-the-art in protein structure prediction. For over 40 years, there has been an ardent search for methods predicting protein structure from sequence (reviews: [34, 35, 20, 21, 36, 37]; books: [38, 39, 40]). Many methods were found which looked initially very promising - but always the hope has been dashed [41]. How well do we do in practice? The following results stand out after four experiments initiated by John Moult (CARB, Washington) to explore the accuracy of structure prediction [42, 43, 44, 45]. The goal of predicting structure from sequence has not yet been reached. However, most recently, approaches that assemble fragments have scored considerable successes [46, 37, 47]. Comparative modelling enables rather accurate predictions of 3D structure for proteins that have significant sequence similarity to proteins of known structure. This technique effectively increases the number of known structures by a factor of 5-30 [35, 20, 48, 36, 49, 50]. Methods aimed at solving simplified structure prediction problems, such as predictions of secondary structure, solvent accessibility, and inter-residue distances [35, 20, 51], have become significantly more accurate and useful by exploiting the information contained in growing sequence databases.
How can neural networks predict protein structure? In practice, the only successful attempts at predicting aspects of protein structure are based on an analysis of common features extracted from proteins of known structures. Neural networks comprise a particular tool for pattern classification (Fig. 2) that – along with many others – has been applied to the problem of protein structure prediction [52, 53, 54, 55, 56, 57, 58]. We could write the history of neural network applications in four chapters. (1) Initially, researchers applied black boxes, and sought improvements through optimising the internal free parameters (training speed, network architecture). These early applications were often not evaluated on representative data sets, and thus were scarcely used by biologists. (2) Later, researchers opened the black box by extracting or implementing rules, by carving specific knowledge into the networks, and by using networks to detect errors or outliers in databases. (3) Then, the combination of neural networks with evolutionary information unleashed the full potential of the tool and thus established networks in the bioinformatics community. Incidentally, the successful prediction of protein secondary structure was one of the first examples for applications of neural networks in which these significantly exceeded the performance of all other systems and even of experts [63]. (4) The last five years of network applications combined many of the lessons learned in the first decade of the tools' history: non-network-experts applied these methods as a standard element in a tool box, and network-experts developed new architectures tailored to particular problems. Here, I focused on discussing but a few representative applications of neural networks (for an excellent review of network applications: [58]).
Fig. 2: Simple neural network: The simplest layered feed-forward neural network consists of a layer of input units (here two), and a layer of output unit(s) (here one). Signals are transmitted from input to output layer (feed-forward) via the connections (J’s). The network dynamic consists of a linear and a non-linear step. (1) The value of each input unit (example: 0 for unit 1; 1 for unit 2) is multiplied with the strength of the connection; the products sum to a local field (h) representing the signal that arrives at the output unit. The multiplication represents a projection of the input vector onto the vector of the connections. (2) The final output is determined by applying a sigmoid function (shown is the hyperbolic tangent) to the local field. The result is that the output is constrained to a bounded interval (between -1 and 1 for the hyperbolic tangent shown; between 0 and 1 for the logistic function). On the right hand side the potential of such a network is illustrated: the open and the dark circles are separated by a line.
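The two steps of the network dynamic can be written out in a few lines. The following is a minimal sketch only: the connection strengths in `J` are arbitrary illustrative values (an assumption, not taken from the figure), while the input pattern (0 for unit 1, 1 for unit 2) is the example from the caption. Note that the hyperbolic tangent itself ranges over (-1, 1); an output restricted to (0, 1) would require, e.g., a logistic function instead.

```python
import math

def local_field(inputs, connections):
    """Linear step: sum of input values multiplied by connection strengths."""
    return sum(x * j for x, j in zip(inputs, connections))

def network_output(inputs, connections):
    """Non-linear step: apply the hyperbolic tangent to the local field."""
    return math.tanh(local_field(inputs, connections))

# Hypothetical connection strengths (illustrative assumption).
J = [0.5, -1.2]
# Input pattern from the caption: 0 for unit 1, 1 for unit 2.
x = [0, 1]

h = local_field(x, J)        # only the second connection contributes
out = network_output(x, J)   # bounded, here negative because h < 0
```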
Two-layered neural network: Two open and two dark circles can obviously not be separated by a single straight line. Two lines would enable the separation, but how can a neural network introduce two lines? The simple trick is the introduction of a layer of hidden units (hidden as they are neither input nor output). The dynamics of such a network are identical to those of the simple network without hidden layer.
Training a neural network: How can particular pattern classification problems be implemented? The input is fixed by the pattern, as is the desired output. The output for a given set of connections is uniquely determined by the dynamics of the network described above. The actual network error can be written as:
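A small sketch may illustrate how two hidden units introduce the two lines. All weights below are hand-picked for illustration rather than trained, and the threshold (bias) terms are an assumption added here; the text above does not mention them. The four input patterns are the corners of a square, with (0,1) and (1,0) playing the role of the dark circles.

```python
import math

def layer(inputs, weights, biases):
    """One feed-forward step: local field per unit, followed by tanh."""
    return [math.tanh(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked (not trained) values; each hidden unit draws one line.
W_HIDDEN = [[5.0, 5.0],   # hidden unit 1: active above the line x1 + x2 = 0.5
            [5.0, 5.0]]   # hidden unit 2: active above the line x1 + x2 = 1.5
B_HIDDEN = [-2.5, -7.5]
W_OUT = [[5.0, -5.0]]     # output: above line 1 but not above line 2
B_OUT = [-5.0]

def classify(x1, x2):
    """Positive output for 'dark' patterns, negative for 'open' ones."""
    hidden = layer([x1, x2], W_HIDDEN, B_HIDDEN)
    return layer(hidden, W_OUT, B_OUT)[0]
```

With these values, `classify(0, 1)` and `classify(1, 0)` are positive, whereas `classify(0, 0)` and `classify(1, 1)` are negative, which no single line could achieve.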
$$E = (\text{output} - \text{desired})^2$$
The free variables that contain the potential of the network to learn a given problem are the connections between the layers of units. The simplest way to reduce the network error is to change the connections according to the derivative of the error with respect to the connections, i.e., by a gradient descent that moves downhill in the error landscape (for a sufficiently small step size $\epsilon > 0$):
$$\Delta J = -\epsilon \, \frac{\partial E}{\partial J}$$
This is often referred to as back-propagating the error through the neural network [59, 60]. To avoid being trapped in local minima, in practice, the actual training is typically performed by a variant of this algorithm that permits up-hill moves (conjugate gradient descent [61, 62, 57]).
Generalisation ability: With enough hidden units neural networks can learn to separate any set of patterns. Typical applications require extracting particular features (underlying rules) present in the patterns rather than learning the known examples ‘by heart’. A successful extraction of such features permits the network to generalise, i.e., to also correctly classify patterns that have not been learned explicitly. Generalisation requires a balance between the number of training examples (enough to enable feature extraction), and the number of connections (enough to separate patterns). As a rule-of-thumb the number of connections should be an order of magnitude lower than the number of patterns to avoid over-fitting the training data (this learning-by-heart of the training set is also referred to as ‘over-training’).
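Plain gradient descent for the simplest case, a single output unit without hidden layer, might be sketched as follows. The step size `eps`, the starting connections, the desired output of 0.9, and the number of iterations are all illustrative assumptions; the sketch shows only the basic downhill update, not the conjugate gradient variant mentioned above.

```python
import math

def train_step(J, x, desired, eps=0.1):
    """One gradient-descent update of the connections J for one pattern."""
    h = sum(xi * ji for xi, ji in zip(x, J))
    out = math.tanh(h)
    # dE/dJ_i for E = (out - desired)^2, using d tanh(h)/dh = 1 - out^2.
    grad = [2.0 * (out - desired) * (1.0 - out * out) * xi for xi in x]
    new_J = [ji - eps * gi for ji, gi in zip(J, grad)]
    return new_J, (out - desired) ** 2

J = [0.1, -0.3]          # hypothetical starting connections
errors = []
for _ in range(500):
    J, err = train_step(J, x=[0.0, 1.0], desired=0.9)
    errors.append(err)   # error before each update
```

Because the first input is 0, only the second connection changes; the recorded errors shrink as the output approaches the desired value.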
No improvement in secondary structure prediction by black boxes. Secondary structure prediction methods distinguish between helix (H), strand (E: for extended structure), and other (L for loop). Some stretches of sequence show a particular preference to be in one of these three states. The prediction task is to classify the central residue of a window of w adjacent residues as either H, E, or L (Fig. 3). Simple neural networks reached values around 60% accuracy (percentage of residues predicted correctly in any of the three states HEL) [64, 65, 66]. This was similar to the best methods that 30 years of research had produced by the end of the 1980s [67, 61, 68, 69, 35, 70, 51]. Attempts to improve performance by changing the network details failed [71, 53]. In contrast, combining neural networks with other methods succeeded to some extent [72].
Prediction of functional class. Methods predicting functional similarities between proteins have been based (i) on multiple feed-forward networks [73] using proteins of similar sequences as input, (ii) on simple feed-forward networks using different amino acid features as input [74, 75, 58], and (iii) on Kohonen maps [54] using the frequency with which any of the 20*20 possible residue pairs occurs in the sequence [76, 77, 78, 79], or using the information extracted from database annotations [80, 81]. While feed-forward networks are useful to learn a classification into known features (e.g. types of secondary structure), Kohonen maps have been applied to render a general classification scheme of proteins (e.g. A and B are similar, and A is more similar to C than to B). Such a classification is a priori not evident and by itself an area of controversy in active research, e.g., attempting to answer questions like: Are we more similar to an orang-utan than to a pig? One hope guiding such analyses is to end up with similarities between proteins that might help to learn about details in bio-chemical reaction pathways. The neural network-based automatic annotation system [80, 81] has already been applied successfully to genome analysis [82].
Prediction of surface exposure, and function-specific motifs. When attempting to arrange secondary structure segments in 3D, one needs to know to which extent a particular residue is exposed to solvent. Neural networks were used to classify amino acid residues as either buried or exposed [83]. Often protein function is associated with relatively short (5-10 residues) sequence motifs (unique pattern of adjacent amino acids). Examples for motifs found by neural networks include: (i) sequence motifs that reveal binding of energy storage molecules [52], (ii) sequence motifs specific for particular proteins, e.g. the immunoglobulins [84], and (iii) signal peptide motifs in sequences [85, 86]. The group of Søren Brunak (Copenhagen) has developed two methods of particular practical impact: (i) a system of neural networks predicting signal peptides and cleavage signals [87], and (ii) a combination of rules and networks predicting glycosylation sites [88].
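The accuracy measure used here, the percentage of residues predicted correctly in any of the three states, is straightforward to compute. The function name `q3` and the two example strings are illustrative assumptions.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (H, E, or L) is correct."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Made-up example: 12 residues, two of them predicted in the wrong state.
observed = "LLHHHHLLEEEL"
predicted = "LLHHHLLLEELL"
accuracy = q3(predicted, observed)
```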
Fig. 3: Simple neural network for secondary structure prediction. For simplification the protein sequence given consists of two amino acid types (S and P). The protein sequence is translated into patterns by shifting a window of w adjacent residues (shown w = 5; typical values in practice are w = 13-21) through the protein. The output of the network is uniquely determined ( Fig. 2 ). Suppose the output would be: 0.2, 0.4, 0.5 for the three output states (H, E, L). For known examples the desired output is also known (1, 0, 0 if the central residue is in a helix). Consequently, the network error is given by the difference between actual network output and desired output. The only free variables are the connections. Training or learning means changing the connections such that the error decreases for the given examples. A training set typically comprises some 30,000 examples. If training is successful, the patterns are correctly classified. But how can new patterns be classified correctly? The hope is that the network succeeds in extracting general rules by the classification of the training patterns. The generalisation ability is checked by another set of test samples for which the mapping of sequence window to secondary structure is also known. Sufficient testing is crucial and requires (1) to remove any significant sequence similarity between test and training set, and (2) to evaluate the expected prediction accuracy on a sufficient number of test proteins (rule of thumb: > 100).
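The translation of a sequence into window patterns might be sketched as follows for the simplified two-letter alphabet of the figure. Two details are assumptions made for this sketch: windows overhanging the sequence ends are padded with a hypothetical spacer symbol `-`, and each window position is encoded by one binary unit per amino acid type, with padding encoded as all zeros.

```python
def windows(sequence, w=5, pad="-"):
    """Yield (window, central residue) for each residue of the sequence."""
    if w % 2 == 0:
        raise ValueError("window size w must be odd")
    half = w // 2
    padded = pad * half + sequence + pad * half
    for i in range(len(sequence)):
        yield padded[i:i + w], sequence[i]

def encode(window, alphabet="SP"):
    """One binary unit per (window position, amino acid type)."""
    return [1 if residue == letter else 0
            for residue in window for letter in alphabet]

# Each pattern pairs a window with its binary input vector (w * 2 units).
patterns = [(win, encode(win)) for win, _ in windows("SPPSSP")]
```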
Extracting rules from, and implementing rules into neural networks. Genome sequences do contain some information about protein structure [89] . A prerequisite to uncover this result was to learn the genetic code by a neural network, i.e., the mapping between the four-letter alphabet of the nucleic acids (DNA), and the 20-letter alphabet of the amino acids (proteins). Analysing the rules learned by the network suggested evidence for a particular scenario for the evolution of the genetic code ( Fig. 4 ). In a similar attempt to extract rules by specific modulation of the training procedure Tchoumatchenko, Vissotsky and Ganascia extracted more complicated rules from networks that learned to predict secondary structure than were available by statistical analysis [90] . Unfortunately, that attempt did not improve performance. Maclin and Shavlik explored the opposite approach by incorporating expert rules into a neural network and thus improved performance over simple statistical devices [91, 92] . All these approaches proved that neural networks are not black boxes, but can become as ‘transparent’ as rule-based systems. The problem often has been to make use of the complex rules extracted.
Carving biology into neural networks. Two problems are common to most secondary structure prediction methods (including simple networks, Fig. 3): (1) strands are predicted at almost random levels of accuracy, and (2) predicted secondary structure segments are too short [61, 68]. The common explanation for the first problem was that strands are stabilised by long-range interactions not visible in a segment of 13-21 residues. The training dynamics of neural networks revealed that networks learned to classify helix and loop ten times faster than strand [56]. Consequently, the idea was to simply increase the frequency of presenting strand residues during training. This change of the training dynamics improved strand accuracy significantly, indicating that the inferior prediction of strand did NOT result primarily from long-range interactions, but from technical problems. The second problem of predicting too short segments originates from the fact that the sliding window (Fig. 3) erases the correlation between adjacent residues. This shortcoming was corrected by introducing a second level network [56] (Fig. 5). Such a network system learned correlations between adjacent residues. These examples illustrate that neural networks can easily be tailored to particular problems.
Fig. 4: Learning the genetic code. The four-letter nucleic acid code from the genomes is translated into a 20-letter amino acid code from proteins. Three nucleic acids (dubbed one codon) code for one amino acid. This implies that the four nucleic acids can form 4*4*4 = 64 different codons for only 20 amino acids, i.e., the code is redundant: some amino acids are coded for by more than one codon, and three codons are used as stop signals during translation. The minimal network that learned the genetic code had two hidden units [93]. The four graphs represent the connections between the 20 input and the two hidden units. (1) The untrained network with randomly assigned weights locates all 61 points near the centre of the square. (2) After seven training epochs the points have moved into a transient local minimum, where the activities of the intermediate units are close to one and the activities of all the output units are close to zero. (3) At 30 epochs the groups have started to segregate, but are still mixed. (4) Finally at 13,000 epochs the network groups the 61 codons at the edge of the circular region. After the four epochs shown the number of correctly classified codons was 2, 6, 26 and 61, respectively. The final grouping separates hydrophobic residues (top: IMVPF) from hydrophilic (centre right and left: YQHKNEDR), and others (lower right: TSAGPCW). The figure is taken from [93].
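One way to increase the frequency of presenting strand residues is to draw training examples so that all three states appear equally often. The following is a sketch of that idea only: strictly equal frequencies are an assumption here (the text states merely that strand was presented more often), and the function name `balanced_stream` as well as the toy example set are hypothetical.

```python
import random
from collections import Counter

def balanced_stream(examples, n, seed=0):
    """Draw n (pattern, state) examples, cycling through the states."""
    rng = random.Random(seed)
    by_state = {}
    for pattern, state in examples:
        by_state.setdefault(state, []).append((pattern, state))
    states = sorted(by_state)
    # Cycling through the states presents each one equally often,
    # regardless of how rare it is in the original set.
    return [rng.choice(by_state[states[i % len(states)]]) for i in range(n)]

# Toy set in which strand (E) is under-represented: 50 L, 30 H, 20 E.
examples = ([("w%d" % i, "L") for i in range(50)]
            + [("w%d" % i, "H") for i in range(50, 80)]
            + [("w%d" % i, "E") for i in range(80, 100)])
stream = balanced_stream(examples, 300)
counts = Counter(state for _, state in stream)
```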
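The codon arithmetic in the caption can be checked by enumeration. The sketch assumes the standard genetic code, in which UAA, UAG, and UGA are the three stop codons, so that 61 of the 64 codons code for amino acids.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code (assumption)

# All ordered triplets of the four nucleotides.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]
# Codons that code for an amino acid rather than a stop signal.
coding = [codon for codon in codons if codon not in STOP_CODONS]
```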
Fig. 5: Second level neural network [56]. (1) The window of w adjacent residues is shifted through the protein (here w = 5). For each window secondary structure is predicted for the central residue (shown three windows with central residues S, P, S). (2) The prediction of this first level network is fed into a second level network. This is again realised by shifting a window of w adjacent predictions through the protein (for the second level w = 3). The final prediction of secondary structure is valid for the central residue of the second window (here a P).
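How the first-level output becomes the second-level input can be sketched as follows. The padding of the sequence ends with all-zero triples is an assumption made for this sketch, and the three output triples are made-up values.

```python
def second_level_inputs(first_level_outputs, w=3, pad=(0.0, 0.0, 0.0)):
    """Concatenate the (H, E, L) outputs of w adjacent residues per window."""
    half = w // 2
    padded = [pad] * half + list(first_level_outputs) + [pad] * half
    return [[value for triple in padded[i:i + w] for value in triple]
            for i in range(len(first_level_outputs))]

# Made-up first-level outputs (H, E, L) for three adjacent residues.
level_one = [(0.2, 0.4, 0.5), (0.6, 0.1, 0.3), (0.1, 0.2, 0.7)]
level_two_input = second_level_inputs(level_one)
```

Each second-level pattern thus has w * 3 input values, and the pattern for the central residue contains the predictions for its neighbours on both sides.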
Detecting database errors during training. Neural networks generalise by extracting the underlying physico-chemical principles from the training data. Obviously, this requires a correct training set. Søren Brunak has pioneered the idea to unravel errors in the training set by monitoring samples that could not be learned even when the networks were trained until over-fitting the data [94, 95, 96, 97, 87, 98, 99] . This technique has not only been used successfully to identify errors, and inconsistencies in public databases, but also to improve the performance of the networks.
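The monitoring idea can be illustrated with a small sketch that flags training examples never classified correctly in any epoch. The data layout (`history[e][i]` holding the state predicted for example i after epoch e) and the function name are assumptions made for illustration; they do not describe the cited implementations.

```python
def never_learned(history, labels):
    """Indices of examples that were misclassified in every recorded epoch."""
    return [i for i, label in enumerate(labels)
            if all(epoch[i] != label for epoch in history)]

# Made-up predictions for four examples over three epochs.
history = [["H", "E", "L", "H"],
           ["H", "L", "L", "H"],
           ["H", "L", "L", "E"]]
labels = ["H", "E", "E", "H"]
suspects = never_learned(history, labels)  # candidates for database errors
```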
Long-range information in multiple sequence alignments. Some residue substitutions do not alter protein structure. However, not every amino acid can be replaced by any other. On the contrary, one evolutionary step (exchange of one residue) can destabilise a structure. Thus, residue substitution patterns observed in protein families are highly specific for particular details of protein structure and function, i.e. they contain more information about structure than do single sequences. Furthermore, multiple alignments of sequence families implicitly also carry information about interactions between residues separated by more than w residues in sequence. We can profit from this evolutionary information for structure prediction in the following way [56]. (1) A sequence of unknown structure U is aligned against a database of known sequences. (2) Proteins with significant sequence identity to U are retrieved. (3) For each sequence position the profile of residue exchanges in the final multiple alignment is compiled, and fed into a network (Fig. 6).
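Compiling the profile from an alignment might look as follows for a simplified two-letter alphabet (S and P). Ignoring gap characters when computing the frequencies is an assumption of this sketch, and the four aligned sequences are made up.

```python
def profile(alignment, alphabet="SP"):
    """Per-position frequency of each residue type in a multiple alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i in range(length):
        # Gap characters (anything outside the alphabet) are ignored.
        residues = [seq[i] for seq in alignment if seq[i] in alphabet]
        n = len(residues)
        columns.append([residues.count(a) / n if n else 0.0
                        for a in alphabet])
    return columns

# Made-up alignment: sequence of unknown structure plus three relatives.
alignment = ["SPPSP", "SPSSP", "PPPSP", "S-PSS"]
prof = profile(alignment)   # one [fraction S, fraction P] pair per position
```

These real-valued frequencies, rather than binary values, would then serve as input to the network.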
Significant improvement of secondary structure prediction. Using evolutionary information has improved secondary structure prediction accuracy from 65% to over 70% [100] , or even over 72% [56] . Such a profile based neural network system was the first method to surpass the magic line of 70% accuracy, and has proven to remain the most accurate method for almost a decade. Today's best methods still use the same idea of feeding evolutionary information into neural networks. The latest improvement to levels above 76% accuracy mainly resulted from larger databases and more sensitive search methods retrieving similar proteins from these larger databases [101, 102, 103, 104, 105, 51, 106] .
Evolutionary information improved accuracy in predicting solvent accessibility. Solvent accessibility at each position of the protein structure is evolutionarily conserved within sequence families. This fact has been used to develop another neural network method for predicting accessibility from multiple alignment information [56]. The final network system is clearly more accurate than methods not using alignment information, and has been established to be more accurate than other prediction methods. More recently, several groups have attempted to refine the concept by training networks to predict different implementations of the output state for solvent accessibility [88, 101], by focussing on particular protein families [Stahl, 2000; Lebeda, 1998; Lebeda, 1998; Lebeda, 1997], or by predicting the number of contact partners for each amino acid [Fariselli, 2000; Fariselli, 2001; Pollastri, 2001; Dosztanyi, 1997].
Predicting transmembrane helices by combining networks with dynamic programming. The neural network system designed to predict secondary structure for water-soluble proteins failed in predicting helices inserted into the lipid-bilayer of membranes. However, the networks have been re-trained to also predict transmembrane helices [56]. Again information from multiple alignments improved prediction accuracy significantly [56, 35, 20]. The problem of predicting transmembrane helices is ideal to incorporate additional global information into the prediction method. The principal idea is to regard the neural network prediction as an energy landscape and to search the best path through this landscape given that transmembrane helices are constrained to a minimal and a maximal length. The final system has achieved a significantly higher accuracy than the simple neural network-based system, and has been applied to analysing entire genomes [107, 108, 109, 49, 33].
Fig. 6: Feeding evolutionary information into a neural network system [56]. (1) A sequence family is aligned (shown are the sequence of unknown structure and three aligned relatives). (2) For each sequence position a profile is compiled that gives the percentage of S, or P in the alignment (shown in centre for window of five adjacent residues). (3) Instead of using binary input units (0 or 1), now the profile is fed into the first neural network. (4) Finally, the output is again fed into a second level network (Fig. 5).
Avalanche of applications with standard architectures. The first neural networks were applied to protein structure prediction in 1988 [64, 65]. A decade later, networks have become a standard method that is tested on problems in bioinformatics (Table 1). Particular recent examples are networks that discover motifs on 3D structures [110], distinguish good from bad drug targets [111], predict binding motifs [112, 113, 114, 115, 116], biological activity [117, 118], post-translational modifications [97, 119, 87, 88, 120, 99, 121, 122, 123], particular protein types [124], domains [125], folding rates [126], disordered proteins [127, 128, 129, 130], and even substitution matrices [131]. The major strength of networks for many of these applications appears to be that they can readily be adapted to particular problems (Table 1). Two extreme examples illustrate the range of neural network applications. One extreme is to replace first-order statistics when averaging over a variety of scores, i.e. networks with very few input units that use results from a variety of threading methods to identify remote similarities between proteins [132, 133]. The opposite extreme is to generate a thousand different networks, each specialised on some aspect of the secondary structure prediction problem, and to then statistically average over the outputs of these networks [103].
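The search for the best path under length constraints can be illustrated by a small dynamic-programming sketch. It is not the published algorithm: the scoring scheme (one number per residue, positive where the network favours helix, with non-helix residues contributing nothing), the requirement of at least one non-helix residue between two helices, and the toy length limits are all assumptions made for illustration. Real transmembrane helices are far longer than the limits used here.

```python
def best_helices(scores, min_len=3, max_len=5):
    """Return (total score, helix segments as half-open (start, end) pairs)."""
    n = len(scores)
    neg_inf = float("-inf")
    prefix = [0.0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    loop = [0.0] + [neg_inf] * n   # best score if residue i-1 is non-helix
    helix = [neg_inf] * (n + 1)    # best score if a helix ends at residue i-1
    back = [None] * (n + 1)        # helix length chosen for helix[i]
    for i in range(1, n + 1):
        loop[i] = max(loop[i - 1], helix[i - 1])
        for length in range(min_len, min(max_len, i) + 1):
            # A helix must start at the sequence start or after a loop residue.
            candidate = loop[i - length] + prefix[i] - prefix[i - length]
            if candidate > helix[i]:
                helix[i], back[i] = candidate, length
    # Trace back the optimal path from the end of the sequence.
    segments, i, in_helix = [], n, helix[n] > loop[n]
    while i > 0:
        if in_helix:
            segments.append((i - back[i], i))
            i -= back[i]
            in_helix = False
        else:
            in_helix = helix[i - 1] > loop[i - 1]
            i -= 1
    segments.reverse()
    return max(loop[n], helix[n]), segments

# Seven helix-favouring residues: too long for one helix of at most five,
# so the best path splits them into two helices separated by one residue.
total, segments = best_helices([-1, 2, 2, 2, 2, 2, 2, 2, -1])
```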
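Averaging over the outputs of many networks can be sketched in a few lines. A plain arithmetic mean followed by choosing the state with the largest averaged output is an assumption here; the cited work may weight or combine the networks differently. The three output triples are made-up values.

```python
def jury(outputs_per_network, states="HEL"):
    """Average the output vectors of several networks and pick the top state."""
    n = len(outputs_per_network)
    averaged = [sum(values) / n for values in zip(*outputs_per_network)]
    return averaged, states[averaged.index(max(averaged))]

# Made-up (H, E, L) outputs of three networks for the same residue.
outputs = [[0.2, 0.4, 0.5],
           [0.6, 0.1, 0.3],
           [0.5, 0.2, 0.3]]
averaged, state = jury(outputs)
```

In this example the first network alone would predict loop, but the average over all three favours helix.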
| 1. | Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F. et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496-512. |
| 2. | Goffeau, A., Barrell, B. G., Bussey, H., Davis, R. W., Dujon, B. et al. (1996). Life with 6000 genes. Science, 274, 546-567. |
| 3. | Adams, M. D., Celniker, S. E., Holt, R. A., Evans, C. A., Gocayne, J. D. et al. (2000). The genome sequence of Drosophila melanogaster. Science, 287, 2185-2195. |
| 4. | The C. elegans Sequencing Consortium (1998). Genome sequence of the nematode C. elegans: a platform for investigating biology. Science, 282, 2012-2018. |
| 5. | Arabidopsis Genome Initiative (2000). Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796-815. |
| 6. | Salanoubat, M., Lemcke, K., Rieger, M., Ansorge, W., Unseld, M. et al. (2000). Sequence and analysis of chromosome 3 of the plant Arabidopsis thaliana. Nature, 408, 820-822. |
| 7. | Tabata, S., Kaneko, T., Nakamura, Y., Kotani, H., Kato, T. et al. (2000). Sequence and analysis of chromosome 5 of the plant Arabidopsis thaliana. Nature, 408, 823-826. |
| 8. | Theologis, A., Ecker, J. R., Palm, C. J., Federspiel, N. A., Kaul, S. et al. (2000). Sequence and analysis of chromosome 1 of the plant Arabidopsis thaliana. Nature, 408, 816-820. |
| 9. | The genome international sequencing consortium (2001). Initial sequencing and analysis of the human genome. Nature, 409, 860-921. |
| 10. | Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J. et al. (2001). The Human genome. Science, 291, 1304-1351. |
| 11. | O'Donovan, C., Apweiler, R. & Bairoch, A. (2001). The human proteomics initiative (HPI). TIBTECH, 19, 178-181. |
| 12. | Brändén, C. & Tooze, J. (1991). Introduction to Protein Structure. Garland Publ., New York, London. |
| 13. | Lattman, E. E. & Rose, G. D. (1993). Protein folding-what's the question? Proc. Natl. Acad. Sci. U.S.A., 90, 439-441. |
| 14. | Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science, 181, 223-230. |
| 15. | Perutz, M. F. (1940). "Unboiling" an egg. Discovery, May, reprint in Jaenicke, Rainer: Protein Folding. Amsterdam, New York: Elsevier, 1980, p. 14. |
| 16. | Levitt, M. & Warshel, A. (1975). Computer simulation of protein folding. Nature, 253, 694-698. |
| 17. | Hagler, A. T. & Honig, B. (1978). On the formation of protein tertiary structure on a computer. Proc. Natl. Acad. Sci. U.S.A., 75, 554-558. |
| 18. | van Gunsteren, W. F. (1993). Molecular dynamics studies of proteins. Curr. Opin. Str. Biol., 3, 167-174. |
| 19. | Kraulis, P. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Cryst., 24, 946-950. |
| 20. | Rost, B. & O'Donoghue, S. I. (1997). Sisyphus and prediction of protein structure. CABIOS, 13, 345-356. |
| 21. | Rost, B. (1998). Protein structure prediction in 1D, 2D, and 3D. In The Encyclopaedia of Computational Chemistry (Schleyer, P. v. R., Allinger, N. L., Clark, T., Gasteiger, J., Kollman, P. A. et al., eds.), pp. 2242-2255, John Wiley & Sons, Chichester. |
| 22. | Hubbard, T. J. P. & Sander, C. (1991). The role of heat-shock and chaperone proteins in protein folding: possible molecular mechanisms. Prot. Engin., 4, 711-717. |
| 23. | Joachimiak, A. (1997). Capturing the misfolds: chaperone-peptide-binding motifs. Nat. Struct. Biol., 4, 430-434. |
| 24. | Martin, J. & Hartl, F. U. (1997). Chaperone-assisted protein folding. Curr. Opin. Str. Biol., 7, 41-52. |
| 25. | Netzer, W. J. & Hartl, F. U. (1997). Recombination of protein domains facilitated by co-translational folding in eukaryotes. Nature, 388, 343-349. |
| 26. | Ellis, R. J., Dobson, C. & Hartl, U. (1998). Sequence does specify protein conformation. TIBS, 23, 468. |
| 27. | Wright, P. E. & Dyson, H. J. (1999). Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol., 293, 321-331. |
| 28. | Gottesman, M. E. & Hendrickson, W. A. (2000). Protein folding and unfolding by Escherichia coli chaperones and chaperonins. Curr. Opin. Microbiol., 3, 197-202. |
| 29. | Sanders, C. R. & Nagy, J. K. (2000). Misfolding of membrane proteins in health and disease: the lady or the tiger? Curr. Opin. Str. Biol., 10, 438-442. |
| 30. | Dobson, C. M. (2001). The structural basis of protein folding and its links with human disease. Philos Trans R Soc Lond B Biol Sci, 356, 133-145. |
| 31. | Dunker, A. K., Lawson, J. D., Brown, C. J., Williams, R. M., Romero, P. et al. (2001). Intrinsically disordered protein. J Mol Graph Model, 19, 26-59. |
| 32. | Fandrich, M., Fletcher, M. A. & Dobson, C. M. (2001). Amyloid fibrils from muscle myoglobin. Nature, 410, 165-166. |
| 33. | Liu, J., Tan, H. & Rost, B. (2002). Eukaryotes full of loopy proteins? J. Mol. Biol., submitted. |
| 34. | Barton, G. J. (1995). Protein secondary structure prediction. Curr. Opin. Str. Biol., 5, 372-376. |
| 35. | Rost, B. & Sander, C. (1996). Bridging the protein sequence-structure gap by structure predictions. Annu. Rev. Biophys. Biomol. Struct., 25, 113-136. |
| 36. | Marti-Renom, M. A., Stuart, A., Fiser, A., Sanchez, R., Melo, F. et al. (2000). Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct., 29, 291-325. |
| 37. | Bonneau, R. & Baker, D. (2001). Ab initio protein structure prediction: progress and prospects. Annu. Rev. Biophys. Biomol. Struct., 30, 173-189. |
| 38. | Doolittle, R. F. (1996). Computermethods for macromolecular sequence analysis. Academic Press, San Diego. |
| 39. | Sternberg, M. J. E. (1996). Proteinstructure prediction. Oxford Univ. Press, Oxford. |
| 40. | Baldi, P. & Brunak, S. (2001).Bioinformatics: the machine learning approach. MIT Press, Cambridge. |
| 41. | Honig, B. & Cohen, F. E.(1996). Adding backbone to protein folding: why proteins are polypeptides. Folding& Design, 1,R17-R20. |
| 42. | CASP1 (1995). Special issue: First Meeting on Critical Assessment of Protein Structure prediction (CASP). Proteins, 23. |
| 43. | CASP2 (1997). Special issue: Second Meeting on Critical Assessment of Protein Structure prediction (CASP). Proteins, Suppl. 2. |
| 44. | CASP3 (1999). Special issue: Third Meeting on Critical Assessment of Protein Structure prediction (CASP). Proteins, Suppl. 2. |
| 45. | CASP4WWW (2000). Fourth meeting on the critical assessment of techniques for protein structure prediction. Prediction Center, Lawrence Livermore National Lab, WWW document: http://PredictionCenter.llnl.gov/casp4/Casp4.html. |
| 46. | Bystroff, C., Thorsson, V. & Baker, D. (2000). HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins. J. Mol. Biol., 301, 173-190. |
| 47. | Lesk, A. M., Lo Conte, L. & Hubbard, T. J. P. (2001). Assessment of novel folds targets in CASP4: Predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins, in press. |
| 48. | Teichmann, S. A., Chothia, C. & Gerstein, M. (1999). Advances in structural genomics. Curr. Opin. Str. Biol., 9, 390-399. |
| 49. | Liu, J. & Rost, B. (2001). Comparing function and structure between entire proteomes. Prot. Sci., 10, 1970-1979. |
| 50. | Liu, J. & Rost, B. (2002). Target space for structural genomics revisited. Bioinformatics, in press. |
| 51. | Rost, B. (2001). Protein secondary structure prediction continues to rise. J. Struct. Biol., 134, 204-218. |
| 52. | Hirst, J. D. & Sternberg, M. J. E. (1991). Prediction of ATP-binding motifs: a comparison of a perceptron-type neural network and a consensus sequence method. Prot. Engin., 4, 615-623. |
| 53. | Presnell, S. R. & Cohen, F. E. (1993). Artificial neural networks for pattern recognition in biochemical sequences. Annu. Rev. Biophys. Biomol. Struct., 22, 283-298. |
| 54. | Arbib, M. (1995). The handbook of brain theory and neural networks. Bradford Books/The MIT Press, Cambridge, MA. |
| 55. | Fiesler, E. & Beale, R. (1996). Handbook of Neural Computation. Oxford Univ. Press, New York. |
| 56. | Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile based neural networks. Meth. Enzymol., 266, 525-539. |
| 57. | Rost, B. (1996). NN which predicts protein secondary structure. In Handbook of Neural Computation (Fiesler, E. & Beale, R., eds.), pp. G4.1, Oxford Univ. Press, New York. |
| 58. | Wu, C. H. (1997). Artificial neural networks for molecular sequence analysis. Comput. Chem., 21, 237-256. |
| 59. | Müller, B. & Reinhardt, J. (1990). Neural Networks. Springer, Berlin, F.R.G. |
| 60. | Hertz, J. A., Krogh, A. & Palmer, R. G. (1991). Introduction to the theory of neural computation. Addison-Wesley, Redwood City, C.A., U.S.A. |
| 61. | Rost, B. & Sander, C. (1993). Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599. |
| 62. | Rost, B. & Sander, C. (1995). Protein structure prediction by neural networks. In The handbook of brain theory and neural networks (Arbib, M., eds.), pp. 772-775, Bradford Books/The MIT Press, Cambridge, MA. |
| 63. | Rost, B. & Sander, C. (1992). Jury returns on structure prediction. Nature, 360, 540. |
| 64. | Bohr, H., Bohr, J., Brunak, S., Cotterill, R. M. J., Lautrup, B. et al. (1988). Protein secondary structure and homology by neural networks. FEBS Lett., 241, 223-228. |
| 65. | Qian, N. & Sejnowski, T. J. (1988). Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol., 202, 865-884. |
| 66. | Holley, H. L. & Karplus, M. (1989). Protein secondary structure prediction with a neural network. Proc. Natl. Acad. Sci. U.S.A., 86, 152-156. |
| 67. | Rost, B. & Sander, C. (1992). Exercising multi-layered networks on protein secondary structure. In Neural Networks: From Biology to High Energy Physics (Benhar, O., Brunak, S., Del Giudice, P. & Grandolfo, M., eds.), pp. 209-220, International Journal of Neural Systems, Elba, Italy. |
| 68. | Rost, B., Sander, C. & Schneider, R. (1993). Progress in protein structure prediction? TIBS, 18, 120-123. |
| 69. | Rost, B. & Sander, C. (1994). Structure prediction of proteins - where are we now? Curr. Opin. Biotech., 5, 372-380. |
| 70. | Rost, B. & Sander, C. (2000). Third generation prediction of secondary structure. Meth. Mol. Biol., 143, 71-95. |
| 71. | Hirst, J. D. & Sternberg, M. J. E. (1992). Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. Biochem., 31, 615-623. |
| 72. | Zhang, X., Mesirov, J. P. & Waltz, D. L. (1992). Hybrid system for protein secondary structure prediction. J. Mol. Biol., 225, 1049-63. |
| 73. | Frishman, D. & Argos, P. (1992). Recognition of distantly related protein sequences using conserved motifs and neural networks. J. Mol. Biol., 228, 951-962. |
| 74. | Wu, C., Whitson, G., McLarty, J., Ermongkonchai, A. & Chang, T.-C. (1992). Protein classification artificial neural system. Prot. Sci., 1, 667-677. |
| 75. | Wu, C. H., Zhao, S., Chen, H.-L., Lo, C.-J. & McLarty, J. (1996). Motif identification neural design for rapid and sensitive protein family search. CABIOS, 12, 109-118. |
| 76. | Ferrán, E. A. & Ferrara, P. (1991). Topological maps of protein sequences. Biol. Cybern., 65, 451-458. |
| 77. | Ferrán, E. & Ferrara, P. (1992). Clustering proteins into families using artificial neural networks. CABIOS, 8, 39-44. |
| 78. | Ferrán, E. & Ferrara, P. (1992). A neural network dynamics that resembles protein evolution. Physica A, 185, 395-401. |
| 79. | Ferrán, E. A. & Pflugfelder, B. (1993). A hybrid method to cluster protein sequences based on statistics and artificial neural networks. CABIOS, 9, 671-680. |
| 80. | Andrade, M. A., Casari, G., Sander, C. & Valencia, A. (1997). Classification of protein families and detection of the determinant residues with an improved self-organizing map. Biol. Cybern., 76, 441-450. |
| 81. | Andrade, M. A. & Valencia, A. (1997). Automatic annotation for biological sequences by extraction of keywords from MEDLINE abstracts. Development of a prototype system. In Fifth International Conference on Intelligent Systems for Molecular Biology (Gaasterland, T., Karp, P., Karplus, K., Ouzounis, C., Sander, C. et al., eds.), pp. 25-32, AAAI Press, Halkidiki, Greece. |
| 82. | Ouzounis, C., Casari, G., Sander, C., Tamames, J. & Valencia, A. (1996). Computational comparisons of model genomes. TIBTECH, 14, 280-285. |
| 83. | Holbrook, S. R., Muskal, S. M. & Kim, S.-H. (1990). Predicting surface exposure of amino acids from protein sequence. Prot. Engin., 3, 659-665. |
| 84. | Bengio, Y. & Pouliot, Y. (1990). Efficient recognition of immunoglobulin domains from amino acid sequences using a neural network. CABIOS, 6, 319-324. |
| 85. | Ladunga, I., Czakó, F., Csabai, I. & Geszti, T. (1991). Improving signal peptide prediction accuracy by simulated neural network. CABIOS, 7, 485-487. |
| 86. | Schneider, G. & Wrede, P. (1993). Development of artificial neural filters for pattern recognition in protein sequences. J. Mol. Evol., 36, 586-595. |
| 87. | Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. (1997). A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. International Journal of Neural Systems, 8, 581-599. |
| 88. | Hansen, J., Lund, O., Tolstrup, N., Gooley, A. A., Williams, K. L. et al. (1998). NetOglyc: Prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate Journal, 15, 115-130. |
| 89. | Brunak, S. & Engelbrecht, J. (1996). Protein structure and the sequential structure of mRNA: α-helix and β-sheet signals at the nucleotide level. Proteins, 25, 237-252. |
| 90. | Tchoumatchenko, I., Vissotsky, F. & Ganascia, J.-G. (1993). How to make explicit a neural network trained to predict proteins secondary structure. ACASA, LAFORIA-CNRS, Université Paris VI, 4 Place Jussieu, 75 252 Paris, CEDEX 05, France. |
| 91. | Maclin, R. & Shavlik, J. W. (1992). Refining Algorithms with Knowledge-Based Neural Networks: Improving the Chou-Fasman Algorithm for Protein Folding. In Computational Learning Theory and Natural Learning Systems (Hanson, S., Drostal, G. & Rivest, R., eds.), pp., MIT Press, Cambridge, MA. |
| 92. | Maclin, R. & Shavlik, J. W. (1993). Using knowledge-based neural networks to improve algorithms: refining the Chou-Fasman algorithm for protein folding. Machine Learning, 11, 195-215. |
| 93. | Tolstrup, N., Toftgård, J., Engelbrecht, J. & Brunak, S. (1994). Neural network model of the genetic code is strongly correlated to the GES scale of amino acid transfer free energies. J. Mol. Biol., 243, 816-820. |
| 94. | Brunak, S., Engelbrecht, J. & Knudsen, S. (1990). Neural network detects errors in the assignment of mRNA splice sites. Nucl. Acids Res., 18, 4797-4801. |
| 95. | Brunak, S. (1991). Non-linearities in training sets identified by inspecting the order in which neural networks learn. In Neural Networks From Biology to High Energy Physics (Benhar, O., Bosio, C., Del Giudice, P. & Tabet, E., eds.), pp. 277-288, ETS Editrice Pisa, Elba, Italy. |
| 96. | Brunak, S., Engelbrecht, J. & Knudsen, S. (1991). Prediction of human mRNA donor and acceptor sites from the DNA sequence. J. Mol. Biol., 220. |
| 97. | Blom, N., Hansen, J., Blaas, D. & Brunak, S. (1996). Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks. Prot. Sci., 5, 2203-2216. |
| 98. | Blom, N., Gammeltoft, S. & Brunak, S. (1999). Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J. Mol. Biol., 294, 1351-1362. |
| 99. | Nielsen, H., Brunak, S. & von Heijne, G. (1999). Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Prot. Engin., 12, 3-9. |
| 100. | Riis, S. K. & Krogh, A. (1996). Improving prediction of protein secondary structure using structured neural networks and multiple sequence alignments. J. Comp. Biol., 3, 163-183. |
| 101. | Cuff, J. A. & Barton, G. J. (2000). Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins, 40, 502-511. |
| 102. | King, R. D., Ouali, M., Strong, A. T., Aly, A., Elmaghraby, A. et al. (2000). Is it better to combine predictions? Prot. Engin., 13, 15-19. |
| 103. | Petersen, T. N., Lundegaard, C., Nielsen, M., Bohr, H., Bohr, J. et al. (2000). Prediction of protein secondary structure at 80% accuracy. Proteins, 41, 17-20. |
| 104. | Andersen, C. A., Bohr, H. & Brunak, S. (2001). Protein secondary structure: category assignment and predictability. FEBS Lett., 507, 6-10. |
| 105. | Pollastri, G., Przybylski, D., Rost, B. & Baldi, P. (2001). Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins, in press. |
| 106. | Przybylski, D. & Rost, B. (2002). Alignments grow, secondary structure prediction improves. Proteins, 46, 195-205. |
| 107. | Rost, B., Casadio, R., Fariselli, P. & Sander, C. (1995). Prediction of helical transmembrane segments at 95% accuracy. Prot. Sci., 4, 521-533. |
| 108. | Rost, B., Casadio, R. & Fariselli, P. (1996). Refining neural network predictions for helical transmembrane proteins by dynamic programming. In Fourth International Conference on Intelligent Systems for Molecular Biology (States, D., Agarwal, P., Gaasterland, T., Hunter, L. & Smith, R. F., eds.), pp. 192-200, Menlo Park, CA: AAAI Press, St. Louis, M.O., U.S.A. |
| 109. | Rost, B., Casadio, R. & Fariselli, P. (1996). Topology prediction for helical transmembrane proteins at 86% accuracy. Prot. Sci., 5, 1704-1718. |
| 110. | Fetrow, J. S., Palumbo, M. J. & Berg, G. (1997). Patterns, structures, and amino acid frequencies in structural building blocks, a protein secondary structure classification scheme. Proteins, 27, 249-271. |
| 111. | Frimurer, T. M., Bywater, R., Naerum, L., Lauritsen, L. N. & Brunak, S. (2000). Improving the odds in discriminating "drug-like" from "non drug-like" compounds. J Chem Inf Comput Sci, 40, 1315-1324. |
| 112. | Gulukota, K., Sidney, J., Sette, A. & DeLisi, C. (1997). Two complementary methods for predicting peptides binding major histocompatibility complex molecules. J. Mol. Biol., 267, 1258-1267. |
| 113. | Lebeda, F. J. & Olson, M. A. (1997). Predicting differential antigen-antibody contact regions based on solvent accessibility. J. Prot. Chem., 16, 607-618. |
| 114. | Stahl, M., Taroni, C. & Schneider, G. (2000). Mapping of protein surface cavities and prediction of enzyme class by a self-organizing neural network. Prot. Engin., 13, 83-88. |
| 115. | Gulukota, K. & DeLisi, C. (2001). Neural network method for predicting peptides that bind major histocompatibility complex molecules. Meth. Mol. Biol., 156, 201-209. |
| 116. | Mlinsek, G., Novic, M., Hodoscek, M. & Solmajer, T. (2001). Prediction of enzyme binding: human thrombin inhibition study by quantum chemical and artificial intelligence methods based on X-ray structures. J Chem Inf Comput Sci, 41, 1286-1294. |
| 117. | Honeyman, M. C., Brusic, V., Stone, N. L. & Harrison, L. C. (1998). Neural network-based prediction of candidate T-cell epitopes. Nat. Biotechnol., 16, 966-969. |
| 118. | Wrede, P., Landt, O., Klages, S., Fatemi, A., Hahn, U. et al. (1998). Peptide design aided by neural networks: biological activity of artificial signal peptidase I cleavage sites. Biochem., 37, 3588-3593. |
| 119. | Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. (1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Prot. Engin., 10, 1-6. |
| 120. | Gupta, R., Jung, E., Gooley, A. A., Williams, K. L., Brunak, S. et al. (1999). Scanning the available Dictyostelium discoideum proteome for O-linked GlcNAc glycosylation sites using neural networks. Glycobiology, 9, 1009-1022. |
| 121. | Herrmann, J. L., Delahay, R., Gallagher, A., Robertson, B. & Young, D. (2000). Analysis of post-translational modification of mycobacterial proteins using a cassette expression system. FEBS Lett., 473, 358-362. |
| 122. | Nakai, K. (2000). Protein sorting signals and prediction of subcellular localization. Adv Protein Chem, 54, 277-344. |
| 123. | Nakai, K. (2001). Review: prediction of in vivo fates of proteins in the era of genomics and proteomics. J. Struct. Biol., 134, 103-116. |
| 124. | Gurvitz, A., Langer, S., Piskacek, M., Hamilton, B., Ruis, H. et al. (2000). Predicting the function and subcellular location of Caenorhabditis elegans proteins similar to Saccharomyces cerevisiae beta-oxidation enzymes. Yeast, 17, 188-200. |
| 125. | Murvai, J., Vlahovicek, K., Szepesvari, C. & Pongor, S. (2001). Prediction of protein functional domains from sequences using artificial neural networks. Genome Res., 11, 1410-1417. |
| 126. | Zhou, H. & Zhou, Y. (2002). Folding Rate Prediction Using Total Contact Distance. Biophys. J., 82, 458-463. |
| 127. | Garner, E., Cannon, P., Romero, P., Obradovic, Z. & Dunker, A. K. (1998). Predicting disordered regions from amino acid sequence: common themes despite differing structural characterization. Genome Inform., 9, 201-214. |
| 128. | Romero, P., Obradovic, Z., Kissinger, C., Villafranca, J. E., Garner, E. et al. (1998). Thousands of proteins likely to have long disordered regions. Pac. Symp. Biocomput., 3, 437-448. |
| 129. | Romero, P., Obradovic, Z. & Dunker, A. K. (1999). Folding minimal sequences: the lower bound for sequence complexity of globular proteins. FEBS Lett., 462, 363-367. |
| 130. | Iakoucheva, L. M., Kimzey, A. L., Masselon, C. D., Bruce, J. E., Garner, E. C. et al. (2001). Identification of intrinsic order and disorder in the DNA repair protein XPA. Prot. Sci., 10, 560-571. |
| 131. | Lin, K., May, A. C. & Taylor, W. R. (2001). Amino acid substitution matrices from an artificial neural network model. J. Comp. Biol., 8, 471-481. |
| 132. | Jones, D. T. (1999). GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol., 287, 797-815. |
| 133. | Lundstrom, J., Rychlewski, L., Bujnicki, J. & Elofsson, A. (2001). Pcons: a neural-network-based consensus predictor that improves fold recognition. Prot. Sci., 10, 2354-2362. |
| 134. | McGregor, M. J., Flores, T. P. & Sternberg, M. J. E. (1989). Prediction of beta-turns in proteins using neural networks. Prot. Engin., 2, 521-526. |
| 135. | Bohr, H., Bohr, J., Brunak, S., Fredholm, H., Lautrup, B. et al. (1990). A novel approach to prediction of the 3-dimensional structures of protein backbones by neural networks. FEBS Lett., 261, 43-46. |
| 136. | Bossa, F. & Pascarella, S. (1990). PRONET: a microcomputer program for predicting the secondary structure of proteins with a neural network. CABIOS, 5, 319-320. |
| 137. | Kneller, D. G., Cohen, F. E. & Langridge, R. (1990). Improvements in Protein Secondary Structure Prediction by an Enhanced Neural Network. J. Mol. Biol., 214, 171-182. |
| 138. | Muskal, S. M., Holbrook, S. R. & Kim, S.-H. (1990). Prediction of the disulfide-bonding state of cysteine in proteins. Prot. Engin., 3, 667-672. |
| 139. | Hayward, S. & Collins, J. F. (1992). Limits on α-helix prediction with neural network models. Proteins, 14, 372-381. |
| 140. | Shavlik, J. W., Towell, G. G. & Noordewier, M. O. (1992). Using neural networks to refine existing biological knowledge. Int. J. Genome Res., 1, 81-107. |
| 141. | Muskal, S. M. & Kim, S.-H. (1992). Predicting protein secondary structure content. A tandem neural network approach. J. Mol. Biol., 225, 713-727. |
| 142. | Andrade, M. A., Chacón, P., Merelo, J. J. & Morán, F. (1993). Evaluation of secondary structure of proteins from UV circular dichroism spectra using an unsupervised learning neural network. Prot. Engin., 6, 383-390. |
| 143. | Dubchak, I., Holbrook, S. R. & Kim, S.-H. (1993). Prediction of protein folding class from amino acid composition. Proteins, 16, 79-91. |
| 144. | Fariselli, P., Compiani, M. & Casadio, R. (1993). Predicting secondary structures of membrane proteins with neural networks. Eur. Biophys. J., 22, 41-51. |
| 145. | Metfessel, B. A., Saurugger, P. N., Connelly, D. P. & Rich, S. S. (1993). Cross-validation of protein structural class prediction using statistical clustering and neural networks. Prot. Sci., 2, 1171-1182. |
| 146. | Reczko, M. (1993). Protein secondary structure prediction with partially recurrent neural networks. In First International Workshop on Neural Networks Applied to Chemistry and Environmental Sciences (eds.), pp. 153-159, Gordon and Breach Science Publ., Lyon, France. |
| 147. | Rost, B. & Sander, C. (1993). Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc. Natl. Acad. Sci. U.S.A., 90, 7558-7562. |
| 148. | Rost, B. & Sander, C. (1993). Secondary structure prediction of all-helical proteins in two states. Prot. Engin., 6, 831-836. |
| 149. | Rost, B. & Sander, C. (1994). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins, 19, 55-72. |
| 150. | Rost, B., Sander, C. & Schneider, R. (1994). PHD - an automatic server for protein secondary structure prediction. CABIOS, 10, 53-60. |
| 151. | Sasagawa, F. & Tajima, K. (1993). Prediction of protein secondary structures by a neural network. CABIOS, 9, 147-152. |
| 152. | Casadio, R., Fariselli, P., Taroni, C. & Compiani, M. (1994). A predictor of transmembrane α-helix domains of proteins based on neural networks. European Journal of Biophysics, submitted, 8/94. |
| 153. | Dombi, G. W. & Lawrence, J. (1994). Analysis of protein transmembrane helical regions by a neural network. Prot. Sci., 3, 557-566. |
| 154. | Rost, B. & Sander, C. (1994). Conservation and prediction of solvent accessibility in protein families. Proteins, 20, 216-226. |
| 155. | Barlow, T. W. (1995). Feed-forward neural networks for secondary structure prediction. J. Mol. Graph., 13, 175-183. |
| 156. | Casadio, R., Compiani, M., Fariselli, P. & Vivareli, F. (1995). Predicting free energy contributions to the conformational stability of folded proteins from the residue sequence with radial basis function networks. In Third International conference on Intelligent Systems for Molecular Biology (ISMB) (Rawlings, C., Clark, D., Altman, R., Hunter, L., Lengauer, T. et al., eds.), pp. 81-88, AAAI Press, Cambridge, England. |
| 157. | Chandonia, J.-M. & Karplus, M. (1995). Neural networks for secondary structure and structural class predictions. Prot. Sci., 4, 275-285. |
| 158. | Grossman, T., Farber, R. & Lapedes, A. (1995). Neural net representations of empirical protein potentials. In Third International conference on Intelligent Systems for Molecular Biology (ISMB) (Rawlings, C., Clark, D., Altman, R., Hunter, L., Lengauer, T. et al., eds.), pp. 154-161, AAAI Press, Cambridge, England. |
| 159. | Hansen, J. E., Lund, O., Engelbrecht, J., Bohr, H., Nielsen, J. O. et al. (1995). Prediction of O-glycosylation of mammalian proteins: specificity patterns of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase. Biochem. J., 308, 801-813. |
| 160. | Milik, M., Kolinski, A. & Skolnick, J. (1995). Neural network system for the evaluation of side-chain packing in protein structures. Prot. Engin., 8, 225-236. |
| 161. | Casadio, R., Fariselli, P., Taroni, C. & Compiani, M. (1996). A predictor of transmembrane α-helix domains of proteins based on neural networks. European Journal of Biophysics, 24, 165-178. |
| 162. | Fariselli, P. & Casadio, R. (1996). HTP: a neural network-based method for predicting the topology of helical transmembrane domains in proteins. CABIOS, 12, 41-48. |
| 163. | Hanke, J., Beckmann, G., Bork, P. & Reich, J. G. (1996). Self-organizing hierarchic networks for pattern recognition in protein sequence. Prot. Sci., 5, 72-82. |
| 164. | Sun, Z. R., Zhang, C. T., Wu, F. H. & Peng, L. W. (1996). A vector projection method for predicting supersecondary motifs. J. Prot. Chem., 15, 721-729. |
| 165. | Sun, Z., Rao, X., Peng, L. & Xu, D. (1997). Prediction of protein supersecondary structures based on the artificial neural network method. Prot. Engin., 10, 763-769. |
| 166. | Aloy, P., Cedano, J., Oliva, B., Avisel, F. X. & Querol, E. (1997). 'TransMem': a neural network implemented in Excel spreadsheets for predicting transmembrane domains of proteins. CABIOS, 13, 231-234. |
| 167. | Asogawa, M. (1997). Beta-sheet prediction using inter-strand residue pairs and refinement with Hopfield neural network. Ismb, 5, 48-51. |
| 168. | Dopazo, J. & Carazo, J. M. (1997). Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J. Mol. Evol., 44, 226-233. |
| 169. | Dosztanyi, Z., Fiser, A. & Simon, I. (1997). Stabilization centers in proteins: identification, characterization and predictions. J. Mol. Biol., 272, 597-612. |
| 170. | Dubchak, I., Muchnik, I. & Kim, S.-H. (1997). Protein folding class predictor for SCOP: approach based on global descriptors. In Fifth International Conference on Intelligent Systems for Molecular Biology (Gaasterland, T., Karp, P., Karplus, K., Ouzounis, C., Sander, C. et al., eds.), pp. 104-107, AAAI Press, Halkidiki, Greece. |
| 171. | Kawabata, T. & Doi, J. (1997). Improvement of protein secondary structure prediction using binary word encoding. Proteins, 27, 36-46. |
| 172. | Lund, O., Frimand, K., Gorodkin, J., Bohr, H., Bohr, J. et al. (1997). Protein distance constraints predicted by neural networks and probability density functions. Prot. Engin., 10, 1241-1248. |
| 173. | Arrigo, P., Fariselli, P. & Casadio, R. (1998). Can functional regions of proteins be predicted from their coding sequences? The case study of G-protein coupled receptors. Gene, 221, GC65-110. |
| 174. | Chou, K. C. & Elrod, D. W. (1998). Using discriminant function for prediction of subcellular location of prokaryotic proteins. Biochem Biophys Res Commun, 252, 63-68. |
| 175. | Diederichs, K., Freigang, J., Umhau, S., Zeth, K. & Breed, J. (1998). Prediction by a neural network of outer membrane beta-strand protein topology. Prot. Sci., 7, 2413-2420. |
| 176. | Reinhardt, A. & Hubbard, T. (1998). Using neural networks for prediction of the subcellular location of proteins. Nucl. Acids Res., 26, 2230-2235. |
| 177. | Casadio, R., Compiani, M., Fariselli, P. & Martelli, P. L. (1999). A data base of minimally frustrated alpha helical segments extracted from proteins according to an entropy criterion. Ismb, 68-76. |
| 178. | Emanuelsson, O., Nielsen, H. & von Heijne, G. (1999). ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Prot. Sci., 8, 978-984. |
| 179. | Fariselli, P., Riccobelli, P. & Casadio, R. (1999). Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins. Proteins, 36, 340-346. |
| 180. | Fariselli, P. & Casadio, R. (1999). A neural network based predictor of residue contacts in proteins. Prot. Engin., 12, 15-21. |
| 181. | Gorodkin, J., Lund, O., Andersen, C. A. & Brunak, S. (1999). Using sequence motifs for enhanced neural network prediction of protein distance constraints. Ismb, 95-105. |
| 182. | Guermeur, Y., Geourjon, C., Gallinari, P. & Deleage, G. (1999). Improved performance in protein secondary structure prediction by inhomogeneous score combination. Bioinformatics, 15, 413-421. |
| 183. | Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202. |
| 184. | Krogh, A. & Riis, S. K. (1999). Hidden neural networks. Neural Comput, 11, 541-563. |
| 185. | Pasquier, C. & Hamodrakas, S. J. (1999). An hierarchical artificial neural network system for the classification of transmembrane proteins. Prot. Engin., 12, 631-634. |
| 186. | Shepherd, A. J., Gorse, D. & Thornton, J. M. (1999). Prediction of the location and type of beta-turns in proteins using neural networks. Prot. Sci., 8, 1045-55. |
| 187. | Baldi, P., Pollastri, G., Andersen, C. A. & Brunak, S. (2000). Matching protein beta-sheet partners by feedforward and recurrent neural networks. Ismb, 8, 25-36. |
| 188. | Emanuelsson, O., Nielsen, H., Brunak, S. & von Heijne, G. (2000). Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol., 300, 1005-1016. |
| 189. | Fariselli, P. & Casadio, R. (2000). Prediction of the number of residue contacts in proteins. Ismb, 8, 146-151. |
| 190. | Jacoboni, I., Martelli, P. L., Fariselli, P., Compiani, M. & Casadio, R. (2000). Predictions of protein segments with the same aminoacid sequence and different secondary structure: A benchmark for predictive methods. Proteins, 41, 535-544. |
| 191. | Ouali, M. & King, R. D. (2000). Cascaded multiple classifiers for secondary structure prediction. Prot. Sci., 9, 1162-1176. |
| 192. | Workman, C. T. & Stormo, G. D. (2000). ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. Pac. Symp. Biocomput., 467-478. |
| 193. | Babajide, A., Farber, R., Hofacker, I. L., Inman, J., Lapedes, A. S. et al. (2001). Exploring protein sequence space using knowledge-based potentials. J. Theor. Biol., 212, 35-46. |
| 194. | Bohr, H. G., Rogen, P. & Jalkanen, K. J. (2001). Applications of neural network prediction of conformational states for small peptides from spectra and of fold classes. Comput. Chem., 26, 65-77. |
| 195. | Ding, C. H. & Dubchak, I. (2001). Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics, 17, 349-358. |
| 196. | Fariselli, P. & Casadio, R. (2001). RCNPRED: prediction of the residue co-ordination numbers in proteins. Bioinformatics, 17, 202-204. |
| 197. | Fariselli, P., Olmea, O., Valencia, A. & Casadio, R. (2001). Prediction of contact maps with neural networks and correlated mutations. Prot. Engin., 14, 835-843. |
| 198. | Jacoboni, I., Martelli, P. L., Fariselli, P., De Pinto, V. & Casadio, R. (2001). Prediction of the transmembrane regions of beta-barrel membrane proteins with a neural network-based predictor. Prot. Sci., 10, 779-787. |
| 199. | Pasquier, C., Promponas, V. J. & Hamodrakas, S. J. (2001). PRED-CLASS: cascading neural networks for generalized protein classification and genome-wide applications. Proteins, 44, 361-9. |
| 200. | Pollastri, G., Baldi, P., Fariselli, P. & Casadio, R. (2001). Improved prediction of the number of residue contacts in proteins by recurrent neural networks. Bioinformatics, 17, S234-242. |
| 201. | Zhou, H. X. & Shan, Y. (2001). Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins, 44, 336-343. |
| 202. | Baldi, P., Brunak, S., Frasconi, P., Soda, G. & Pollastri, G. (1999). Exploiting the past and the future in protein secondary structure prediction. Bioinformatics, 15, 937-946. |
| 203. | Baldi, P. & Pollastri, G. (2001). Machine learning structural and functional proteomics. IEEE Intelligent Systems, in press. |
| Contact: rost@columbia.edu | Version: Feb 12, 2002 |