Three-state prediction accuracy: Q3
The three-state per-residue accuracy Q3, i.e. the percentage of residues predicted in the correct state (helix, strand, other), becomes:
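The formula itself is not reproduced here. In the usual definition, Q3 is 100 times the number of correctly predicted residues (the diagonal of the prediction matrix M) divided by the total number of residues Nres. A minimal sketch under that assumption follows; the function name `q3` and the one-letter state strings (H, E, C) are illustrative choices, not part of the EVA pages.

```python
def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    # Count positions where prediction and observation agree (diagonal of M).
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


obs = "HHHHCCEEEE"
prd = "HHHCCCEEEC"
print(q3(obs, prd))  # 8 of 10 residues agree -> 80.0
```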
Per-state percentages:
To define accuracy for a particular state (helix, strand, other), there are two possible variants. These answer the following questions.
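The two questions are commonly phrased as: what percentage of the residues observed in state i is correctly predicted, and what percentage of the residues predicted in state i is correctly predicted? The sketch below assumes exactly these two definitions. The names `q_obs` and `q_prd` are hypothetical, and returning `None` for an empty denominator is a choice made here, not something the text specifies.

```python
from typing import Optional


def _counts(observed: str, predicted: str, state: str):
    """Return (correct, n_observed, n_predicted) for one state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    correct = sum(o == p == state for o, p in zip(observed, predicted))
    return correct, observed.count(state), predicted.count(state)


def q_obs(observed: str, predicted: str, state: str) -> Optional[float]:
    """Percentage of residues observed in `state` that are predicted in `state`."""
    correct, n_obs, _ = _counts(observed, predicted, state)
    return 100.0 * correct / n_obs if n_obs else None


def q_prd(observed: str, predicted: str, state: str) -> Optional[float]:
    """Percentage of residues predicted in `state` that are observed in `state`."""
    correct, _, n_prd = _counts(observed, predicted, state)
    return 100.0 * correct / n_prd if n_prd else None


obs = "HHHHCCEEEE"
prd = "HHHCCCEEEC"
for s in "HEC":
    print(s, q_obs(obs, prd, s), q_prd(obs, prd, s))
```

The two numbers differ whenever a state is over- or under-predicted, which is why both variants are reported.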
Information index:
The information index is given by:
where Pobs describes the probability of finding one particular string of Nres residues with obs_i residues in structure i out of all combinatorially possible ones, and Pprd is the probability of a particular realisation of the prediction matrix {M}. The resulting information index is:
Matthews correlation coefficients are defined by the following formulas:
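The formulas are not reproduced here. The standard per-state Matthews coefficient is C_i = (p_i n_i - u_i o_i) / sqrt((p_i + u_i)(p_i + o_i)(n_i + u_i)(n_i + o_i)), where p_i counts residues correctly predicted in state i, n_i those correctly predicted not to be in i, u_i the underpredicted and o_i the overpredicted residues. The sketch below assumes this form; the function name `matthews` and the convention of returning 0.0 when the denominator vanishes are assumptions made for illustration.

```python
import math


def matthews(observed: str, predicted: str, state: str) -> float:
    """Matthews correlation coefficient for one secondary structure state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    p = n = u = o = 0
    for obs, prd in zip(observed, predicted):
        if obs == state and prd == state:
            p += 1  # correctly predicted in the state
        elif obs == state:
            u += 1  # underpredicted: observed but not predicted
        elif prd == state:
            o += 1  # overpredicted: predicted but not observed
        else:
            n += 1  # correctly rejected
    denom = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    # Assumed convention: an undefined coefficient is reported as 0.0.
    return (p * n - u * o) / denom if denom else 0.0


print(matthews("HHHHCCEEEE", "HHHCCCEEEC", "H"))
```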
The Segment OVerlap (SOV) measure for prediction accuracy is defined as follows (for more details, see Zemla et al., 1999):
- Per-state segment overlap:
with the following definitions:
S1 and S2 are the observed and predicted secondary structure segments (in state i, which can be either H, E or C)
LEN(S1) is the number of residues in the segment S1
MINOV(S1;S2) is the length of actual overlap of S1 and S2, i.e. the extent for which both segments have residues in state i, for example H
MAXOV(S1;S2) is the length of the total extent for which either of the segments S1 or S2 has a residue in state i
DELTA(S1;S2) is the integer value defined as being equal to the following
The sum is taken over S(i), i.e. all the pairs of segments {S1;S2}, where S1 and S2 have at least one residue in state i in common
N(i) is the number of residues in state i defined as follows:
The two sums are taken over S and S':
S(i) is the set of all the pairs of segments {S1;S2}, where S1 and S2 have at least one residue in state i in common
S'(i) is the set of all segments S1 that do not produce any segment pair
- Segment OVerlap quantity measure for all three states:
where the normalization value N is a sum of N(i) over all three conformational states (i = HELIX, STRAND, COIL)
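The definitions above can be sketched in code. The formulas are not reproduced here, so the sketch assumes the form given by Zemla et al. (1999): DELTA(S1;S2) = min(MAXOV - MINOV, MINOV, INT(LEN(S1)/2), INT(LEN(S2)/2)), each pair contributes (MINOV + DELTA) / MAXOV x LEN(S1), and N(i) adds LEN(S1) once per pair in S(i) and once per unpaired segment in S'(i). All function names are hypothetical and the result should be checked against the paper before use.

```python
def segments(ss: str, state: str):
    """Return (start, end) pairs, end exclusive, of maximal runs of `state`."""
    segs, start = [], None
    for idx, ch in enumerate(ss):
        if ch == state and start is None:
            start = idx
        elif ch != state and start is not None:
            segs.append((start, idx))
            start = None
    if start is not None:
        segs.append((start, len(ss)))
    return segs


def sov_terms(observed: str, predicted: str, state: str):
    """Return (weighted overlap sum, N(i)) for one state."""
    total, norm = 0.0, 0
    pred_segs = segments(predicted, state)
    for s1 in segments(observed, state):
        len1 = s1[1] - s1[0]
        paired = False
        for s2 in pred_segs:
            minov = min(s1[1], s2[1]) - max(s1[0], s2[0])
            if minov <= 0:
                continue  # no residue in common: not a pair in S(i)
            paired = True
            len2 = s2[1] - s2[0]
            maxov = max(s1[1], s2[1]) - min(s1[0], s2[0])
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            norm += len1  # sum over S(i)
        if not paired:
            norm += len1  # sum over S'(i)
    return total, norm


def sov_state(observed: str, predicted: str, state: str):
    """Per-state SOV in percent, or None if the state is never observed."""
    total, norm = sov_terms(observed, predicted, state)
    return 100.0 * total / norm if norm else None


def sov(observed: str, predicted: str, states: str = "HEC"):
    """SOV over all states, normalised by N = sum of N(i)."""
    terms = [sov_terms(observed, predicted, s) for s in states]
    total = sum(t for t, _ in terms)
    norm = sum(n for _, n in terms)
    return 100.0 * total / norm if norm else None


print(sov("HHHHHHHHHH", "HHHHHCCCCC"))
```

In this example a ten-residue observed helix overlaps a five-residue predicted helix: MINOV = 5, MAXOV = 10 and DELTA = 2, so the single pair contributes 7 out of N = 10.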
- Structural classes derived from secondary structure content:
Proteins can be classified into 'structural' classes according to their secondary structure content (reference). To evaluate how well one of the four major classes:
all-alpha, all-beta, alpha-beta, other
is predicted, the following classification scheme is used (based on modifications of papers by Zhang & Chou and Kneller et al.):
| class | protein length | percentage H | percentage E |
| all-alpha | > 60 | > 45% | < 5% |
| all-beta | > 60 | < 5% | > 45% |
| alpha-beta | > 60 | > 30% | > 20% |
| other | other | other | other |
NOTE: the table is to be read as a cascade: if all-alpha, else if all-beta, else if alpha-beta, else other.
For these classes the overall four-class accuracy is reported (number of proteins correctly predicted in one of the four classes / total number of proteins).
- Four-state structural class accuracy:
The percentage of proteins correctly predicted in one of the above four classes is simply given by:
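The classification table and the four-class accuracy can be sketched as follows. The thresholds are taken from the table above and applied in the stated order. The function names, and the choice to derive the class directly from a three-state string of H, E and C, are assumptions made for illustration.

```python
def content(ss: str, state: str) -> float:
    """Percentage of residues in `state`."""
    return 100.0 * ss.count(state) / len(ss)


def structural_class(ss: str) -> str:
    """Assign one of the four classes, testing the table rows in order."""
    length = len(ss)
    pct_h = content(ss, "H") if length else 0.0
    pct_e = content(ss, "E") if length else 0.0
    if length > 60 and pct_h > 45 and pct_e < 5:
        return "all-alpha"
    if length > 60 and pct_h < 5 and pct_e > 45:
        return "all-beta"
    if length > 60 and pct_h > 30 and pct_e > 20:
        return "alpha-beta"
    return "other"


def four_class_accuracy(pairs) -> float:
    """Percentage of (observed, predicted) pairs assigned to the same class."""
    if not pairs:
        raise ValueError("no proteins given")
    correct = sum(structural_class(o) == structural_class(p) for o, p in pairs)
    return 100.0 * correct / len(pairs)


alpha = "H" * 50 + "C" * 50
beta = "E" * 50 + "C" * 50
mixed = "H" * 35 + "E" * 25 + "C" * 40
print(structural_class(alpha), structural_class(beta), structural_class(mixed))
print(four_class_accuracy([(alpha, alpha), (beta, mixed)]))
```

Note that a protein of 60 residues or fewer always falls into the class other, whatever its content.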
- Difference between observed and predicted secondary structure content:
Because of technical constraints (averages are built up on the fly), the accuracy of predicting the secondary structure content is measured by the simple difference between observed and predicted content (reference):
Note: the abbreviations
are used on the EVA pages (for omegai; w)
- Pearson correlation coefficient for secondary structure content:
A more reasonable score to measure the accuracy of predicting content is the Pearson coefficient (note: this score may be added at some later stage):
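The formula is not reproduced here. A minimal sketch, assuming the standard Pearson product-moment coefficient computed over a set of proteins between the observed and the predicted content of one state, is given below. The function names and the example strings are hypothetical.

```python
import math


def content(ss: str, state: str) -> float:
    """Percentage of residues in `state`."""
    return 100.0 * ss.count(state) / len(ss)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Observed and predicted strings for three hypothetical proteins.
observed = ["HHHHCCCCCC", "HHHHHHCCCC", "HHHHHHHHCC"]
predicted = ["HHHCCCCCCC", "HHHHHHHCCC", "HHHHHHHHHC"]
obs_h = [content(s, "H") for s in observed]
prd_h = [content(s, "H") for s in predicted]
print(pearson(obs_h, prd_h))
```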
- D Frishman & P Argos (1995) Proteins, 23, 566-579
- W Kabsch & C Sander (1983) Biopolymers, 22, 2577-2637
- M Levitt (1976) J Mol Biol, 104, 59-107
- DG Kneller, FE Cohen & R Langridge (1990) J Mol Biol, 214, 171-182
- B Rost & C Sander (1993) J Mol Biol, 232, 584-599
- A Zemla, C Venclovas, K Fidelis & B Rost (1999) Proteins, 34, 220-223
- C-T Zhang & K-C Chou (1992) Prot Sci, 1, 401-408