| Tool: | pairwise alignment: inferring structural homology |
| Rule: | pairwise sequence identity > 25% |
| Restrictions: | the 25% level holds over more than 80 aligned residues (for shorter regions identity must be higher (Sander and Schneider 1991)); short motifs (< 10 residues) are not sufficiently indicative of homology; the 25% level may not apply to engineered proteins; compositionally biased regions (e.g. GRA rich regions in DNA binding proteins) should be excluded when computing the level of sequence identity; many gaps: if an alignment between two proteins contains too many insertions (gaps), even a relatively high value of sequence identity may not suffice to ascertain homology (typical structure alignments contain up to 10% gaps) |
| Tool: | pairwise alignment: inferring structural homology |
| Rule: | pairwise sequence similarity > pairwise sequence identity (Table 1) |
| Restrictions: | depends on similarity metric chosen, hence comparison between different methods problematic |
| Tool: | pairwise alignment: inferring function |
| Rule: | level of similarity required for identifying functionally equivalent proteins in two species depends on the overall divergence of the species and on the particular protein family |
| Restrictions: | functional annotations for the homologue used to infer function may be incomplete or wrong; thus, annotations for the putative homologue ought to be verified in the original sources of functional assignments (a more reliable database is SWISS-PROT (Bairoch and Apweiler 1996)); errors in sequences, such as frame-shifts or sequencing errors (very frequent in ESTs), could lead to falsely inferred function |
| Tool: | pairwise alignment: inferring function |
| Rule: | alignment used to infer function should contain the functional residues; ideally the alignment should extend over the entire proteins |
| Restrictions: | false annotations (see above); errors in sequences (see above) |
| Tool: | multiple alignment: sufficient information |
| Rule: | cover entire range of evolutionary divergence, i.e., representatives with 90%, 80%, ... , 30% pairwise sequence identity |
| Restrictions: | if A1 and A2 have 99% identical residues and both have a pairwise sequence identity of 40% to U, including A2 in a multiple alignment does not increase the amount of information contained in the multiple alignment |
| Tool: | multiple alignment: aligning entire families |
| Rule: | align entire folds rather than short fragments; ensure conservation of local motifs |
| Restrictions: | if A1 is aligned to U in a region where A1 is known to have a functionally important sequence motif (Bairoch, et al. 1996) and this motif is not in U, this may indicate a false alignment |
| Tool: | multiple alignment: aligning based on family specific profiles |
| Rule: | alignments based on family specific profiles are more accurate and more sensitive (finding non-trivial homologues) than pairwise alignments |
| Restrictions: | if the family profile used for the search contains errors, the final alignment may be less accurate than pairwise alignments; good profile-based alignments require expertise and alignments containing many sequences |
| Tool: | 1D prediction |
| Rule: | 70% correct implies 30% incorrect |
| Restrictions: | 70% is an average over a distribution with a typical standard deviation of 10% (Rost 1996); thus, for a particular protein U, prediction accuracy can be less than 60% or more than 80% (Rost 1996); expected values for accuracy hold only for the classes of proteins used to set up the prediction method (e.g. no prediction of transmembrane helices with tools optimised for globular proteins (Rost, et al. 1995)); prediction accuracy for engineered proteins is hard to estimate |
| Tool: | 1D prediction based on alignments |
| Rule: | the more accurate and informative the alignment, the more accurate the prediction |
| Restrictions: | the quality of multiple alignments depends on the divergence of the sequences aligned and the completeness with which the family is covered |
| Tool: | 3D prediction: homology modelling |
| Rule: | accuracy depends on level of sequence identity |
| Restrictions: | simulating ligand binding requires > 70-90% pairwise sequence identity; no accurate predictions for inserted loop regions |
| Tool: | 3D prediction: remote homology modelling |
| Rule: | most models proposed by threading methods are wrong! |
| Restrictions: | do not use threading without a lot of caution, skill and intuition! |
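
The first rule of thumb in the table (pairwise sequence identity > 25% over more than 80 aligned residues, with not too many gaps) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen for illustration: the functions `alignment_stats` and `passes_rule_of_thumb`, the `-` gap symbol, the definition of identity as identical pairs divided by residue-residue pairs, and the use of the "up to 10% gaps" remark as a hard cut-off. The sketch does not implement the length-dependent threshold of Sander and Schneider (1991) for shorter alignments, nor the exclusion of compositionally biased regions.

```python
def alignment_stats(aligned_a: str, aligned_b: str, gap: str = "-") -> dict:
    """Summarise a gapped pairwise alignment given as two equal-length strings.

    Assumption: identity is counted over residue-residue columns only;
    columns with a gap in exactly one sequence count towards the gap fraction.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    aligned = identical = gapped = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column empty in both sequences carries no information
        if a == gap or b == gap:
            gapped += 1
        else:
            aligned += 1
            if a.upper() == b.upper():
                identical += 1

    columns = aligned + gapped
    return {
        "aligned_residues": aligned,
        "identity": identical / aligned if aligned else 0.0,
        "gap_fraction": gapped / columns if columns else 0.0,
    }


def passes_rule_of_thumb(
    stats: dict,
    min_identity: float = 0.25,      # "> 25%" from the table
    min_aligned: int = 80,           # "more than 80 aligned residues"
    max_gap_fraction: float = 0.10,  # assumed cut-off, from "up to 10% gaps"
) -> bool:
    """Return True if the alignment satisfies all three illustrative checks."""
    return (
        stats["identity"] > min_identity
        and stats["aligned_residues"] > min_aligned
        and stats["gap_fraction"] <= max_gap_fraction
    )


if __name__ == "__main__":
    # Toy alignment: far too short for the 25% rule to apply.
    stats = alignment_stats("ACDE-FG", "ACDQAFG")
    print(stats, passes_rule_of_thumb(stats))
```

A positive result from such a check is only a first filter. The remaining restrictions in the table (engineered proteins, short motifs, biased composition) still have to be judged case by case.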