Explanations for terms (CUBIC's genome analysis)
This page tries to explain some of the jargon used on the CUBIC WWW pages
(http://cubic.bioc.columbia.edu).
Please feel free to send questions and comments to the CUBIC administrator:
cubic@cubic.bioc.columbia.edu
- ORF:
open reading frame; here it typically refers to an expressed gene or the protein it encodes.
- PHDhtm:
PHDhtm = prediction of transmembrane helix location and topology
PHDhtm is a method for the refined prediction of transmembrane helix location and
topology. Here, the method was applied to search all human proteins in SWISS-PROT
release 34.0 for transmembrane helices. The results are described in
Rost et al., 1996 (Abstract; Appendix to paper).
Expected accuracy of predictions by PHDhtm
PHDtopology was estimated to have an expected accuracy in predicting all
transmembrane helices and the topology correctly of 86% (+/- 3%, one
standard deviation). However, for prokaryotes expected accuracy drops to
below 73% (+/- 9%, one standard deviation). The expected rate of false
positives (proteins without transmembrane helices for which HTMs are
falsely predicted) was below 2%, and the expected percentage of proteins
with HTMs that were missed was below 3% (more details).
- htm: transmembrane helix
- location: where in the sequence the transmembrane helix starts and ends
- topology: orientation of the N-terminus with respect to the membrane
(in if the first loop is inside, out otherwise)
- :
- FTP : find all data at the CUBIC ftp site :
dodo.bioc.columbia.edu in the directory pub/cubic/data/genomes/
- HTML : find the respective HTML pages
- ASCII : find the respective ASCII pages (load faster than HTML)
- ali : alignment of the query protein against proteins in the SWISS-PROT database
- phd: secondary structure and solvent accessibility prediction for the query protein made by PHD
- phdHtm: transmembrane helix prediction for the query protein made by PHDhtm
- id : sequence identifier
- nhtm : number of transmembrane helices
- top : topology (in: the N-term, i.e. the first non-transmembrane region, is
intra-cytoplasmic; out: the N-term is extra-cytoplasmic)
- nali : number of sequences in the protein family used for prediction
- htmCN: positions of predicted transmembrane helices (with respect to first residue)
- len1 : length of protein
- riTop: reliability of topology prediction (9=high, 0=low)
- riMod: reliability of best model (9=high, 0=low)
- topD : difference in the number of charged loop residues (K+R) = all even loops - all odd loops
- htmCN: position (C-N) of predicted TM helices
- seq : sequence of protein
Note: see the ftp version of the entire sequences for details!
- seqN : N-term sequence of protein (first 20 residues)
Note: see the ftp version of the entire sequences for details!
- seqC : C-term sequence of protein (last 20 residues)
Note: see the ftp version of the entire sequences for details!
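The definitions of topD, seqN and seqC above can be illustrated with a minimal Python sketch. It is not the PHDhtm implementation. It assumes that helix positions are available as 1-based, inclusive (start, end) pairs, and that loops are numbered from 1 starting with the N-terminal loop; neither the storage format of htmCN nor the loop numbering is stated on this page. The function names are hypothetical.

```python
from typing import List, Tuple


def loop_segments(seq: str, helices: List[Tuple[int, int]]) -> List[str]:
    """Return the non-membrane segments (loops) in N- to C-terminal order.

    helices: 1-based, inclusive (start, end) positions, sorted by start
    (an assumed representation, not the documented htmCN format).
    """
    loops = []
    prev_end = 0  # 1-based end of the previous helix; 0 = before residue 1
    for start, end in helices:
        loops.append(seq[prev_end:start - 1])
        prev_end = end
    loops.append(seq[prev_end:])  # C-terminal loop after the last helix
    return loops


def charge_difference(seq: str, helices: List[Tuple[int, int]]) -> int:
    """K+R count in even loops minus K+R count in odd loops.

    Assumption: loops are numbered 1, 2, 3, ... from the N-terminus.
    """
    even = odd = 0
    for number, loop in enumerate(loop_segments(seq, helices), start=1):
        charged = loop.count("K") + loop.count("R")
        if number % 2 == 0:
            even += charged
        else:
            odd += charged
    return even - odd


def termini(seq: str, n: int = 20) -> Tuple[str, str]:
    """Return the first n and last n residues (seqN and seqC for n = 20)."""
    return seq[:n], seq[-n:]
```

For a made-up 50-residue sequence with two helices at positions 4-23 and 29-48, `charge_difference` counts K and R in the three loops and subtracts the odd-loop total from the even-loop total, and `termini` returns the two 20-residue ends.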