
EVA measures for secondary structure prediction accuracy

Author: Burkhard Rost rost@columbia.edu




    Standard-of-truth

  1. PDB -> secondary structure
    Observed secondary structure is taken primarily from the program DSSP (reference). STRIDE (reference) will additionally be run soon.
  2. Conversion of DSSP secondary structure
    The following chart illustrates how the 8 states from DSSP are converted to three secondary structure states:
    DSSP   H  G  I  E  B  T  S  ' '
    used   H  H  H  E  E  L  L   L
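
The conversion chart can be written as a small lookup table. The following is a minimal sketch; the names `DSSP_TO_THREE` and `reduce_dssp` are hypothetical and not part of EVA, and `L` denotes the "other" state as in the chart.

```python
# Mapping copied from the conversion chart: 8 DSSP states -> 3 states.
# 'L' is the "other" state; the blank character is DSSP's "no assignment".
DSSP_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helix types
    "E": "E", "B": "E",             # strand types
    "T": "L", "S": "L", " ": "L",   # everything else
}


def reduce_dssp(dssp_states):
    """Convert a string of DSSP states into a string over H, E, L.

    Raises KeyError for any character that is not one of the 8 DSSP states.
    """
    return "".join(DSSP_TO_THREE[state] for state in dssp_states)


if __name__ == "__main__":
    print(reduce_dssp("HGIEBTS "))
```

Applied to the eight states in the order of the chart, `reduce_dssp("HGIEBTS ")` gives `HHHEELLL`.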



    Per-residue accuracy


The following scores are used to measure per-residue accuracy (reference, note: most tables contain only a subset of these scores):
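  1. Prediction accuracy matrix:

    $M_{ij}$ = number of residues observed in state $i$ and predicted in state $j$, with $i, j$ running over the three states (helix, strand, other)

    note: the total number of residues observed in state i is:
    $obs_i = \sum_{j} M_{ij}$
    note: the total number of residues predicted in state i is:
    $prd_i = \sum_{j} M_{ji}$
    and the total number of residues is simply:
    $N_{res} = \sum_{i} obs_i = \sum_{i} prd_i = \sum_{i,j} M_{ij}$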
  2. Three-state prediction accuracy: Q3

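    Thus, the three-state per-residue accuracy Q3 becomes:
    $Q_3 = 100 \cdot \frac{1}{N_{res}} \sum_{i} M_{ii}$
    where $M_{ii}$ is the number of residues both observed and predicted in state $i$, and $N_{res}$ is the total number of residues.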
  3. Per-state percentages:

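    To define accuracy for a particular state (helix, strand, other), there are two possible variants. These answer the following two questions:
    What percentage of the residues observed in state i is predicted correctly? $Q_i^{\%obs} = 100 \cdot M_{ii} / obs_i$
    What percentage of the residues predicted in state i is predicted correctly? $Q_i^{\%prd} = 100 \cdot M_{ii} / prd_i$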
  4. Information index:

    The information index is given by:
     
    where P_obs is the probability of finding one particular string of N_res residues, with obs_i residues in structure i, out of all combinatorially possible ones, and P_prd is the probability of a particular realisation of the prediction matrix {M}. The resulting information index is:
     
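  5. Matthews correlation coefficients are defined for each state i by:

    $C_i = \frac{p_i n_i - u_i o_i}{\sqrt{(p_i + u_i)(p_i + o_i)(n_i + u_i)(n_i + o_i)}}$
    where p_i is the number of residues correctly predicted in state i, n_i the number correctly predicted not to be in state i, u_i the number observed but not predicted in state i (under-predictions), and o_i the number predicted but not observed in state i (over-predictions).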
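
The accuracy matrix, Q3, the per-state percentages and the Matthews correlation from the list above can be sketched in a few lines of Python. This is a minimal illustration rather than EVA's own code: all function names are hypothetical, the matrix convention (rows = observed, columns = predicted) is an assumption, and returning `None` or `0.0` for a zero denominator is a convention chosen here.

```python
import math

STATES = ("H", "E", "L")  # helix, strand, other


def accuracy_matrix(observed, predicted):
    """Return M where M[i][j] counts residues observed in state i and predicted in state j."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings differ in length")
    index = {state: k for k, state in enumerate(STATES)}
    m = [[0] * len(STATES) for _ in STATES]
    for obs_state, prd_state in zip(observed, predicted):
        m[index[obs_state]][index[prd_state]] += 1
    return m


def totals(m):
    """Return (obs, prd, n_res): row sums, column sums and the grand total."""
    obs = [sum(row) for row in m]
    prd = [sum(col) for col in zip(*m)]
    return obs, prd, sum(obs)


def q3(m):
    """Three-state per-residue accuracy in percent."""
    _, _, n_res = totals(m)
    return 100.0 * sum(m[i][i] for i in range(len(m))) / n_res


def per_state(m):
    """Return (q_obs, q_prd) in percent; an entry is None if its denominator is zero."""
    obs, prd, _ = totals(m)
    q_obs = [100.0 * m[i][i] / obs[i] if obs[i] else None for i in range(len(m))]
    q_prd = [100.0 * m[i][i] / prd[i] if prd[i] else None for i in range(len(m))]
    return q_obs, q_prd


def matthews(m, i):
    """Matthews correlation for state i; 0.0 if the denominator vanishes (assumed convention)."""
    obs, prd, n_res = totals(m)
    p = m[i][i]            # correctly predicted in state i
    u = obs[i] - p         # observed in i, predicted otherwise
    o = prd[i] - p         # predicted in i, observed otherwise
    n = n_res - p - u - o  # correctly predicted not to be in i
    denominator = math.sqrt((p + u) * (p + o) * (n + u) * (n + o))
    return (p * n - u * o) / denominator if denominator else 0.0


if __name__ == "__main__":
    m = accuracy_matrix("HHHHEEEELL", "HHHEEEELLL")
    print(m, q3(m), per_state(m), matthews(m, 0))
```

For the ten-residue example in the `__main__` block, eight residues lie on the diagonal of the matrix, so Q3 is 80.0.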

     
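
For the information index, the sketch below assumes the normalised form I = 1 - (sum_j prd_j ln prd_j - sum_ij M_ij ln M_ij) / (N_res ln N_res - sum_i obs_i ln obs_i), that is, the mutual information between observation and prediction divided by the entropy of the observed states. This form is an assumption and is not confirmed by the text above; the function name `information_index` and the row = observed, column = predicted convention are likewise hypothetical.

```python
import math


def _x_ln_x(x):
    """x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def information_index(m):
    """Normalised information index of a square accuracy matrix (ASSUMED form).

    m[i][j] is taken to be the number of residues observed in state i and
    predicted in state j. The result is 1.0 for a perfect prediction and
    0.0 when the prediction is independent of the observation.
    """
    size = len(m)
    obs = [sum(m[i]) for i in range(size)]
    prd = [sum(m[i][j] for i in range(size)) for j in range(size)]
    n_res = sum(obs)
    numerator = sum(_x_ln_x(count) for count in prd) - sum(
        _x_ln_x(m[i][j]) for i in range(size) for j in range(size)
    )
    denominator = _x_ln_x(n_res) - sum(_x_ln_x(count) for count in obs)
    if denominator == 0:
        raise ValueError("all residues observed in one state; index undefined")
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    print(information_index([[3, 1, 0], [0, 3, 1], [0, 0, 2]]))
```

Under this assumed form, the index is a property of the whole matrix rather than of a single state, which is why it complements the per-state percentages.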



    Per-segment accuracy (SOV)


The Segment OVerlap measure for prediction accuracy is defined by (for more details):


    Accuracy of predicting structural class and secondary structure content


  1. Structural classes derived from secondary structure content:

    Proteins can be classified into 'structural' classes according to their secondary structure content (reference). To evaluate how well one of the four major classes (all-alpha, all-beta, alpha-beta, other) is predicted, the following classification scheme is used (based on modifications of papers by Zhang & Chou and Kneller et al.):
    class        protein length   percentage H   percentage E
    all-alpha    > 60             > 45%          < 5%
    all-beta     > 60             < 5%           > 45%
    alpha-beta   > 60             > 30%          > 20%
    other        other            other          other

    NOTE: the table is to be read as a cascade: if all-alpha, else if all-beta, else if alpha-beta, else other.
    For these classes the overall four-class accuracy is reported (number of proteins correctly predicted in one of the four classes / total number of proteins).
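
The cascade in the table and the four-class accuracy can be sketched as follows. The thresholds are those of the table; the function names `structural_class`, `class_from_states` and `four_class_accuracy` are hypothetical, and secondary structure content is assumed to be given as a percentage of residues.

```python
def structural_class(length, pct_h, pct_e):
    """Assign one of the four classes from protein length and H/E content in percent."""
    if length > 60:
        if pct_h > 45 and pct_e < 5:
            return "all-alpha"
        if pct_h < 5 and pct_e > 45:
            return "all-beta"
        if pct_h > 30 and pct_e > 20:
            return "alpha-beta"
    # Short proteins and everything not matched above fall into "other".
    return "other"


def class_from_states(states):
    """Assign a class from a three-state string over H, E, L."""
    length = len(states)
    if length == 0:
        raise ValueError("empty secondary structure string")
    pct_h = 100.0 * states.count("H") / length
    pct_e = 100.0 * states.count("E") / length
    return structural_class(length, pct_h, pct_e)


def four_class_accuracy(observed_classes, predicted_classes):
    """Percentage of proteins whose predicted class equals the observed class."""
    if len(observed_classes) != len(predicted_classes):
        raise ValueError("class lists differ in length")
    correct = sum(o == p for o, p in zip(observed_classes, predicted_classes))
    return 100.0 * correct / len(observed_classes)


if __name__ == "__main__":
    observed = [class_from_states("H" * 50 + "L" * 50), class_from_states("L" * 100)]
    predicted = [class_from_states("H" * 48 + "L" * 52), class_from_states("E" * 50 + "L" * 50)]
    print(observed, predicted, four_class_accuracy(observed, predicted))
```

Both the observed and the predicted three-state strings are run through the same cascade, so the four-class accuracy compares like with like.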

  2. Four-state structural class accuracy:
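    The percentage of proteins correctly predicted in one of the above four classes is simply given by:

    $Q_{class} = 100 \cdot \frac{\text{number of proteins predicted in the correct class}}{\text{total number of proteins}}$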

  3. Difference between observed and predicted secondary structure content:
    Because of technical constraints (averages are accumulated on the fly), the accuracy of predicting the secondary structure content is measured by the simple difference between observed and predicted content (reference):

     

    Note: the abbreviations
    are used on the EVA pages (for omegai;   w)

  4. Pearson correlation coefficient for secondary structure content:
    A more reasonable score to measure the accuracy of predicting content is the Pearson coefficient (note: this score may be added at some later stage):
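
The two content scores can be sketched as follows. The sketch assumes that content is the percentage of residues in a given state and that the difference is taken as observed minus predicted; the sign convention and all function names are assumptions. The Pearson coefficient is the standard one, computed over a set of proteins.

```python
import math


def content(states, state):
    """Percentage of residues of a three-state string that are in the given state."""
    if not states:
        raise ValueError("empty secondary structure string")
    return 100.0 * states.count(state) / len(states)


def content_difference(observed, predicted, state):
    """Observed minus predicted content for one protein (sign convention ASSUMED)."""
    return content(observed, state) - content(predicted, state)


def pearson(xs, ys):
    """Standard Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(content_difference("HHHHEEEELL", "HHHEEEELLL", "H"))
    # Helix content of three hypothetical proteins, observed vs. predicted.
    print(pearson([10.0, 40.0, 70.0], [15.0, 35.0, 60.0]))
```

The difference is a per-protein quantity that can be averaged incrementally, whereas the Pearson coefficient needs the content values of all proteins at once.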

     






    References






