|
The analysis of these proteomes used PSI-BLAST, COILS v.2, MaxHom, NORS, PROFsec, PHDhtm, SEG, PROSITE and SignalP v.2. PEP is built from all proteins available for each genome: every sequence is aligned against SWISS-PROT, TrEMBL and PDB, and is analysed individually using COILS, MaxHom, NORS, PHD, SEG and SignalP. The PEP database is a summary of these results. The full results are also available from the Rost group, although they are quite large (> GB in size).
The PEP database is available as a collection of flat files containing the analyses of different proteomes; the collection will grow over time (you can find some of the papers for the sequencing projects here). At present we have analysed proteins from the following sources:
| Total number of organisms/proteomes analysed in PEP: | 105 |
| Total number of proteins analysed in PEP: | 406015/413630 |
| Eukaryotes | |
| Arabidopsis thaliana | 25541 sequences; |
| Caenorhabditis elegans | 20251 sequences; |
| Drosophila melanogaster | 14304 sequences; |
| Homo sapiens | 37271 sequences; |
| Mus musculus | 28090 sequences; |
| Saccharomyces cerevisiae | 6356 sequences; |
| Schizosaccharomyces pombe | 4987 sequences; |
| Prokaryotes | |
| Archaeoglobus fulgidus | 2407 sequences; |
| Agrobacterium tumefaciens | 5274 sequences; |
| Agrobacterium tumefaciens (strain C58 / ATCC 33970) | 5402 sequences; |
| Aquifex aeolicus | 1522 sequences; |
| Bacillus anthracis (strain Ames) | 5311 sequences; |
| Bacillus cereus (ATCC 14579) | 5275 sequences; |
| Bacillus subtilis | 4099 sequences; |
| Acholeplasma florum (Mesoplasma florum) | 683 sequences; |
| Acinetobacter sp (strain ADP1) | 3322 sequences; |
| Bacteroides thetaiotaomicron VPI-5482 | 4776 sequences; |
| Bartonella henselae (Houston-1) | 1482 sequences; |
| Bartonella quintana (Toulouse) | 1141 sequences; |
| Bdellovibrio bacteriovorus | 3584 sequences; |
| Bordetella bronchiseptica RB50 | 4986 sequences; |
| Bordetella parapertussis | 4184 sequences; |
| Bordetella pertussis | 3446 sequences; |
| Borrelia burgdorferi | 850 sequences; |
| Bradyrhizobium japonicum | 8307 sequences; |
| Brucella melitensis | 2059 sequences; |
| Buchnera aphidicola (subsp. Acyrthosiphon pisum) | 574 sequences; |
| Buchnera aphidicola (subsp. Baizongia pistaciae) | 504 sequences; |
| Buchnera aphidicola (subsp. Schizaphis graminum) | 546 sequences; |
| Campylobacter jejuni | 1633 sequences; |
| Candidatus Blochmannia floridanus | 583 sequences; |
| Caulobacter crescentus | 3737 sequences; |
| Chlamydophila caviae | 998 sequences; |
| Chlamydia muridarum | 907 sequences; |
| Chromobacterium violaceum ATCC 12472 | 4396 sequences; |
| Corynebacterium diphtheriae NCTC 13129 | 2269 sequences; |
| Corynebacterium efficiens | 2947 sequences; |
| Corynebacterium glutamicum | 2989 sequences; |
| Chlorobium tepidum | 2252 sequences; |
| Chlamydia trachomatis | 894 sequences; |
| Clostridium acetobutylicum | 3848 sequences; |
| Clostridium perfringens | 2723 sequences; |
| Clostridium tetani | 2373 sequences; |
| Coxiella burnetii | 2009 sequences; |
| Deinococcus radiodurans | 3100 sequences; |
| Desulfovibrio vulgaris subsp. vulgaris str. Hildenborough | 3524 sequences; |
| Escherichia coli | 4281 sequences; |
| Enterococcus faecalis | 3113 sequences; |
| Erwinia carotovora | 4463 sequences; |
| Fusobacterium nucleatum | 2067 sequences; |
| Gloeobacter violaceus | 4425 sequences; |
| Haemophilus ducreyi | 1715 sequences; |
| Haemophilus influenzae | 1709 sequences; |
| Helicobacter heilmannii | 1874 sequences; |
| Helicobacter pylori | 1564 sequences; |
| Lactobacillus johnsonii | 1813 sequences; |
| Lactobacillus plantarum WCFS1 | 3002 sequences; |
| Lactococcus lactis (subsp. lactis) | 2266 sequences; |
| Leifsonia xyli (subsp. xyli) | 2023 sequences; |
| Leptospira interrogans (serogroup Icterohaemorrhagiae / serovar Copenhageni) | 3652 sequences; |
| Listeria innocua | 2968 sequences; |
| Listeria monocytogenes | 2833 sequences; |
| Mycobacterium avium | 4340 sequences; |
| Mycobacterium bovis AF2122/97 | 3906 sequences; |
| Mycoplasma gallisepticum | 726 sequences; |
| Mycoplasma genitalium | 470 sequences; |
| Mycoplasma mycoides (subsp. mycoides SC) | 1016 sequences; |
| Mycoplasma pneumoniae | 688 sequences; |
| Mycoplasma pulmonis | 778 sequences; |
| Neisseria meningitidis | 2065 sequences; |
| Nitrosomonas europaea | 2461 sequences; |
| Oceanobacillus iheyensis | 3496 sequences; |
| Porphyromonas gingivalis | 1909 sequences; |
| Pseudomonas aeruginosa | 5563 sequences; |
| Pseudomonas putida | 5316 sequences; |
| Ralstonia solanacearum | 5092 sequences; |
| Rhizobium loti | 7264 sequences; |
| Rickettsia conorii | 1374 sequences; |
| Shigella flexneri | 4176 sequences; |
| Staphylococcus aureus | 2516 sequences; |
| Streptococcus agalactiae | 2121 sequences; |
| Streptococcus pneumoniae | 2094 sequences; |
| Streptococcus pyogenes | 1845 sequences; |
| Streptomyces coelicolor | 7894 sequences; |
| Thermotoga maritima | 1846 sequences; |
| Treponema pallidum | 1031 sequences; |
| Ureaplasma urealyticum | 611 sequences; |
| Vibrio cholerae | 2736 sequences; |
| Vibrio parahaemolyticus RIMD 2210633 | 4800 sequences; |
| Wolinella succinogenes | 2044 sequences; |
| Xanthomonas axonopodis (pv. citri) | 4029 sequences; |
| Xylella fastidiosa | 2766 sequences; |
| Yersinia pestis | 4087 sequences; |
| Archaea | |
| Archaeoglobus fulgidus | 2407 sequences; |
| Aeropyrum pernix K1 | 2687 sequences; |
| Halobacterium sp. (strain NRC-1) | 2058 sequences; |
| Methanosarcina acetivorans | 4540 sequences; |
| Methanopyrus kandleri | 1687 sequences; |
| Methanobacterium thermoautotrophicum | 1869 sequences; |
| Pyrococcus abyssi | 1764 sequences; |
| Pyrococcus furiosus | 2065 sequences; |
| Pyrococcus horikoshii | 2064 sequences; |
| Sulfolobus solfataricus | 2977 sequences; |
| Thermoplasma acidophilum | 1478 sequences; |
| Viruses | |
| Human cytomegalovirus (strain AD169) | 202 sequences; |
| Murine herpesvirus 68 strain WUMS | 80 sequences; |
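The listing above can be tallied per group with a few lines of Python. This is a minimal sketch that assumes the rows are available as plain text in the layout shown here (`| name | N sequences; |`, with a group row such as `Eukaryotes` having an empty second cell). The function name `tally_by_group` is hypothetical and is not part of any PEP tooling.

```python
import re
from collections import defaultdict

# One table row: "| first cell | second cell |"
ROW = re.compile(r"^\|\s*(?P<name>[^|]*?)\s*\|\s*(?P<value>[^|]*?)\s*\|\s*$")
# Second cell of an organism row: "12345 sequences;"
COUNT = re.compile(r"^(\d+)\s+sequences;?$")


def tally_by_group(lines):
    """Return {group: (number_of_proteomes, number_of_sequences)}.

    A row with an empty second cell starts a new group; rows whose
    second cell is not "N sequences;" (e.g. the summary rows) are skipped.
    """
    totals = defaultdict(lambda: [0, 0])
    group = None
    for line in lines:
        match = ROW.match(line.strip())
        if not match:
            continue
        name, value = match.group("name"), match.group("value")
        if value == "":
            group = name  # group header row
            continue
        count = COUNT.match(value)
        if count and group is not None:
            totals[group][0] += 1
            totals[group][1] += int(count.group(1))
    return {g: tuple(v) for g, v in totals.items()}


if __name__ == "__main__":
    sample = [
        "| Eukaryotes | |",
        "| Homo sapiens | 37271 sequences; |",
        "| Mus musculus | 28090 sequences; |",
    ]
    print(tally_by_group(sample))
```

Because the parser keys on the empty second cell rather than on specific group names, it does not depend on how the group rows are spelled.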
PEP sequences are ORFs (open reading frames). We have also predicted putative
structural domains (self-sufficient folding units), or fragments, within the
ORFs. These fragments have been analysed for the same features as the
full-length sequences, i.e. protein homologues and secondary structure features.
The results for the fragments are available as a database called CHOP.
The fragments have also been clustered using an all-versus-all PSI-BLAST
comparison. The clusters are available as a database called CLUP.
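To illustrate how an all-versus-all comparison can be turned into clusters, the sketch below groups fragments by single linkage: two fragments end up in the same cluster whenever a chain of pairwise hits connects them. This is an assumption made for illustration only; the text does not state which clustering rule CLUP uses. The function `cluster_fragments` and its input format (pairs of fragment identifiers that passed some significance threshold) are likewise hypothetical and do not correspond to PSI-BLAST output.

```python
def cluster_fragments(fragment_ids, hits):
    """Single-linkage clustering over pairwise hits using union-find.

    fragment_ids: iterable of fragment identifiers.
    hits: iterable of (a, b) pairs judged to be significantly similar.
    Returns a sorted list of clusters, each a sorted list of identifiers.
    """
    fragment_ids = list(fragment_ids)
    parent = {f: f for f in fragment_ids}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for f in fragment_ids:
        clusters.setdefault(find(f), []).append(f)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = ["f1", "f2", "f3", "f4", "f5"]
    pairs = [("f1", "f2"), ("f2", "f3")]
    print(cluster_fragments(ids, pairs))
```

Single linkage is the simplest choice but it can chain unrelated fragments together through intermediate hits, so a real pipeline may well apply stricter criteria.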
| No. | Code | Organism | Sequencing | CUBIC analysis | Date | ORFs | Results | Download from CUBIC |
| 1 | aerpe | Aeropyrum pernix K1 | done | done | 99-07 | 2694 | htm | seq, htm |
| 2 | aquae | Aquifex aeolicus | done | done | 99-07 | 1522 | htm | seq, htm |
| 3 | arcfu | Archaeoglobus fulgidus | done | done | 98-04 | 2383 | htm | seq, htm |
| 4 | bacsu | Bacillus subtilis | done | done | 98-04 | 4099 | htm | seq, htm |
| 5 | borbu | Borrelia burgdorferi | done | done | 98-04 | 850 | htm | seq, htm |
| 6 | caeel | Caenorhabditis elegans | done | done | 99-07 | 18944 | htm | seq, htm |
| 7 | camje | Campylobacter jejuni | done | done | 00-02 | 1731 | htm | seq, htm |
| 8 | chlpn | Chlamydia pneumoniae | done | done | 99-07 | 1052 | htm | seq, htm |
| 9 | chltr | Chlamydia trachomatis | done | done | 99-07 | 894 | htm | seq, htm |
| 10 | deira | Deinococcus radiodurans | done | done | 99-12 | 3103 | htm | seq, htm |
| 11 | drome | Drosophila melanogaster | done | done | 00-04 | 14218 | htm | seq, htm |
| 12 | ecoli | Escherichia coli | done | done | 98-04 | 4285 | htm | seq, htm |
| 13 | haein | Haemophilus influenzae | done | done | 98-04 | 1716 | htm | seq, htm |
| 14 | helpy | Helicobacter pylori | done | done | 98-04 | 1788 | htm | seq, htm |
| 15 | hs22 | Homo sapiens (Chromosome 22) | done | done | 00-03 | 887 | htm | seq, htm |
| 16 | human | Homo sapiens | partial | partial | 98-04 | 24235 | htm | htm |
| 17 | metja | Methanococcus jannaschii | done | done | 98-04 | 1735 | htm | seq, htm |
| 18 | mettm | Methanobacterium thermoautotrophicum | done | done | 98-04 | 1871 | htm | seq, htm |
| 19 | mycge | Mycoplasma genitalium | done | done | 98-04 | 470 | htm | seq, htm |
| 20 | mycpn | Mycoplasma pneumoniae | done | done | 98-04 | 677 | htm | seq, htm |
| 21 | myctu | Mycobacterium tuberculosis | done | done | 99-07 | 3918 | htm | seq, htm |
| 22 | neime | Neisseria meningitidis | done | done | 00-03 | 2081 | htm | seq, htm |
| 23 | pyrab | Pyrococcus abyssi | done | done | 99-07 | 1765 | htm | seq, htm |
| 24 | pyrho | Pyrococcus horikoshii | done | done | 99-07 | 2064 | htm | seq, htm |
| 25 | ricpr | Rickettsia prowazekii | done | done | 99-07 | 834 | htm | seq, htm |
| 26 | syny3 | Synechocystis PCC6803 | done | done | 99-07 | 3169 | htm | seq, htm |
| 27 | thema | Thermotoga maritima | done | done | 99-07 | 1846 | htm | seq, htm |
| 28 | trepa | Treponema pallidum | done | done | 99-07 | 1031 | htm | seq, htm |
| 29 | ureur | Ureaplasma urealyticum | done | done | 00-03 | 613 | htm | seq, htm |
| 30 | yeast | Saccharomyces cerevisiae | done | done | 98-04 | 6307 | htm | seq, htm |
|