Table 1: Rules of thumb for sequence analysis

Tool: pairwise alignment: inferring structural homology
Rule: pairwise sequence identity > 25%
Restrictions: the 25% threshold holds for alignments of more than 80 residues; for shorter regions, identity must be higher (Sander and Schneider 1991)

short motifs (< 10 residues) are not sufficiently indicative of homology

25% level may not apply to engineered proteins

composition-biased regions (e.g. GRA rich regions in DNA binding proteins) should be excluded when computing the level of sequence identity

many gaps: if an alignment between two proteins contains too many insertions (gaps), even a relatively high sequence identity may not suffice to establish homology (typical structure alignments contain up to 10% gaps)
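
The identity rule and its restrictions can be sketched in code. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `alignment_stats` and `passes_identity_rule` are hypothetical, the alignment is assumed to be two equal-length strings with `-` marking gaps, identity is counted over aligned residue pairs only, and the 10% gap cut-off is borrowed from the remark on typical structure alignments purely as an illustrative value. Composition-biased regions are assumed to have been removed beforehand.

```python
def alignment_stats(a: str, b: str, gap: str = "-"):
    """Return (identity, aligned_pairs, gap_fraction) for two aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = gaps = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # a column that is empty in both sequences carries no information
        if x == gap or y == gap:
            gaps += 1
            continue
        aligned += 1
        if x.upper() == y.upper():
            identical += 1
    columns = aligned + gaps
    identity = identical / aligned if aligned else 0.0
    gap_fraction = gaps / columns if columns else 0.0
    return identity, aligned, gap_fraction


def passes_identity_rule(a, b, min_identity=0.25, min_aligned=80,
                         max_gap_fraction=0.10):
    """Apply the rule of thumb: > 25% identity over > 80 aligned residues.

    max_gap_fraction is an assumed, illustrative cut-off, not a value
    prescribed by the table.
    """
    identity, aligned, gap_fraction = alignment_stats(a, b)
    return (identity > min_identity
            and aligned > min_aligned
            and gap_fraction <= max_gap_fraction)
```

A pair that fails this check may still be homologous; the rule only indicates when identity alone is sufficient evidence.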

Tool: pairwise alignment: inferring structural homology
Rule: pairwise sequence similarity > pairwise sequence identity (Table 1)
Restrictions: depends on the similarity metric chosen; hence, comparing values obtained with different methods is problematic

Tool: pairwise alignment: inferring function
Rule: level of similarity required for identifying functionally equivalent proteins in two species depends on the overall divergence of the species and on the particular protein family
Restrictions: functional annotations for the homologue used to infer function may be incomplete or wrong. Thus, annotations for the putative homologue ought to be verified in the original sources of the functional assignments (a more reliable database is SWISS-PROT (Bairoch and Apweiler 1996)).

errors in sequences, such as frame-shifts or sequencing errors (very frequent in ESTs), could lead to falsely inferred function

Tool: pairwise alignment: inferring function
Rule: the alignment used to infer function should contain the functional residues; ideally, the alignment should extend over the entire proteins
Restrictions: false annotations (see above)

errors in sequences (see above)
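
Whether an alignment contains the functional residues can be checked mechanically once their positions are known. The sketch below is illustrative only: the function names are hypothetical, positions are assumed to be 1-based and inclusive, and the aligned region is assumed to be a single contiguous stretch of the annotated protein.

```python
def covers_functional_residues(aligned_start: int, aligned_end: int,
                               functional_positions) -> bool:
    """True if every annotated functional residue lies inside the aligned region."""
    return all(aligned_start <= p <= aligned_end for p in functional_positions)


def coverage_fraction(aligned_start: int, aligned_end: int,
                      protein_length: int) -> float:
    """Fraction of the protein spanned by the aligned region (1-based, inclusive)."""
    return (aligned_end - aligned_start + 1) / protein_length
```

For example, `covers_functional_residues(10, 200, [57, 102, 195])` is true, whereas an alignment starting at residue 10 would miss a functional residue at position 5.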

Tool: multiple alignment: sufficient information
Rule: cover entire range of evolutionary divergence, i.e., representatives with 90%, 80%, ... , 30% pairwise sequence identity
Restrictions: if A1 and A2 have 99% identical residues and both have a pairwise sequence identity of 40% to U, including A2 in a multiple alignment does not increase the amount of information contained in the multiple alignment
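
The redundancy argument can be illustrated with a simple greedy filter that drops a sequence when it is nearly identical to one already kept. This is a sketch under assumptions, not a recommended procedure: the names `pairwise_identity` and `drop_near_duplicates` are hypothetical, sequences are assumed to be rows of an existing multiple alignment (equal length, `-` for gaps), and the 99% cut-off simply mirrors the example of A1 and A2.

```python
def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def drop_near_duplicates(aligned: dict[str, str],
                         max_identity: float = 0.99) -> list[str]:
    """Greedily keep sequences that are < max_identity to every kept sequence.

    The result depends on input order: the first member of a redundant
    group is the one retained.
    """
    kept: list[str] = []
    for name, seq in aligned.items():
        if all(pairwise_identity(seq, aligned[k]) < max_identity for k in kept):
            kept.append(name)
    return kept
```

With U, A1 and A2 as in the restriction above, the filter keeps U and A1 and discards A2, since A2 contributes essentially the same information as A1.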

Tool: multiple alignment: aligning entire families
Rule: align entire folds rather than short fragments; ensure conservation of local motifs
Restrictions: if A1 is aligned to U in a region where A1 is known to have a functionally important sequence motif (Bairoch, et al. 1996) and this motif is not in U, this may indicate a false alignment

Tool: multiple alignment: aligning based on family specific profiles
Rule: alignments based on family specific profiles are more accurate and more sensitive (finding non-trivial homologues) than pairwise alignments
Restrictions: if the family profile used for the search contains errors, the final alignment may be less accurate than pairwise alignments

good profile-based alignments require expertise and alignments containing many sequences

Tool: 1D prediction
Rule: 70% correct implies 30% incorrect
Restrictions: 70% is an average over a distribution with a typical standard deviation of 10% (Rost 1996). Thus, for a particular protein U, prediction accuracy can be less than 60% or more than 80% (Rost 1996).

expected values for accuracy hold only for the classes of proteins used to set up the prediction method (e.g. do not predict transmembrane helices with tools optimised for globular proteins (Rost, et al. 1995))

prediction accuracy for engineered proteins hard to estimate
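
To get a feeling for what an average of 70% with a standard deviation of 10% means for a single protein, the following sketch estimates how often accuracy would fall outside the 60-80% range. It assumes, purely for illustration, that per-protein accuracy is normally distributed; the table does not state the shape of the distribution, and `normal_cdf` is a hypothetical helper.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


mean_acc, sd_acc = 70.0, 10.0  # values quoted in the table (percent)

# Assumption: accuracy is normally distributed around the mean.
below_60 = normal_cdf(60.0, mean_acc, sd_acc)
above_80 = 1.0 - normal_cdf(80.0, mean_acc, sd_acc)

print(f"P(accuracy < 60%) ~ {below_60:.2f}")  # prints 0.16
print(f"P(accuracy > 80%) ~ {above_80:.2f}")  # prints 0.16
```

Under this assumption, roughly one protein in six would be predicted at below 60% accuracy, and about as many at above 80%.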

Tool: 1D prediction based on alignments
Rule: the more accurate and informative the alignment, the more accurate the prediction
Restrictions: the quality of multiple alignments depends on the divergence of the sequences aligned and the completeness with which the family is covered

Tool: 3D prediction: homology modelling
Rule: accuracy depends on level of sequence identity
Restrictions: simulating ligand binding requires > 70-90% pairwise sequence identity

no accurate predictions for inserted loop regions

Tool: 3D prediction: remote homology modelling
Rule: most models proposed by threading methods are wrong!
Restrictions: do not use threading without a lot of caution, skill and intuition!



