
Title: CHOP proteins into structural domain-like fragments
Author: Burkhard Rost
Quote: Proteins, 2004, 55(3):678-688

CUBIC papers: abstract for
CHOP proteins into structural domain-like fragments

We developed CHOP, a method that dissects proteins into domain-like fragments. The basic idea was to cut proteins from entirely sequenced organisms in three steps: beginning with very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and finishing with cuts based on the termini of known proteins. In this way, CHOP dissected over two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, over 70% of all dissected proteins contained more than one fragment. Second, domains spanned about 100 residues on average. This average was similar for eukaryotic and prokaryotic proteins, and it also holds, although this had not been described previously, for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multi-domain proteins. Fourth, three-fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constitute an important resource for functional and structural genomics. Nevertheless, our main motivation for developing CHOP was that single-linkage clustering methods failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found over 63,000 multi-member and over 118,000 single-member clusters. Although most fragments were restricted to a particular cluster, about 24% of the fragments appeared in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target over 30,000 fragments to cover at least the multi-member clusters in 62 proteomes.
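The priority ordering described in the abstract (PDB first, then Pfam-A, then termini of known proteins) can be illustrated with a minimal sketch. This is not the CHOP implementation: the function name `chop_sketch`, the `min_len` cutoff, and the rule that a lower-priority region is dropped whenever it overlaps an already accepted fragment are all assumptions made for illustration. How the candidate regions are obtained (alignments to PDB, Pfam-A, or protein termini) is not modeled; they are simply passed in.

```python
from typing import List, Tuple

# (start, end, source) with 1-based, inclusive residue positions
Fragment = Tuple[int, int, str]


def chop_sketch(
    length: int,
    tiers: List[Tuple[str, List[Tuple[int, int]]]],
    min_len: int = 30,  # hypothetical cutoff, not taken from the paper
) -> List[Fragment]:
    """Accept candidate regions tier by tier, most reliable tier first.

    A region from a later tier is kept only if it does not overlap a
    fragment that was already accepted (an assumed simplification).
    """
    accepted: List[Fragment] = []
    for source, regions in tiers:
        for start, end in sorted(regions):
            # Clip the region to the protein boundaries.
            start, end = max(1, start), min(length, end)
            if end - start + 1 < min_len:
                continue
            # Two inclusive intervals overlap if each starts before the other ends.
            if any(start <= e and s <= end for s, e, _ in accepted):
                continue
            accepted.append((start, end, source))
    return sorted(accepted)


# Hypothetical 300-residue protein with made-up candidate regions.
fragments = chop_sketch(
    300,
    [
        ("PDB", [(1, 120)]),
        ("Pfam-A", [(100, 200), (130, 220)]),
        ("termini", [(221, 300)]),
    ],
)
for fragment in fragments:
    print(fragment)
```

In this toy input, the Pfam-A region 100-200 is discarded because it overlaps the PDB-derived fragment, while 130-220 is kept; the higher-priority evidence always decides where the first cuts fall.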

 


