Help Page
Who should use UniqueProt

    UniqueProt is intended for researchers who want to analyze a set of protein sequences belonging to a particular functional class or cellular location. It removes the bias introduced by sequence-redundant proteins, in the hope that the resulting unique subset is a more accurate approximation of a non-redundant protein universe.
Example Files
Input
     
  • Create a FASTA file on your local hard drive in which the sequences of your chosen proteins are listed one after another. Click here to view an example of such a file. The maximum file size that our server accepts is 500 kB.
  • The program is case-insensitive, so you don't have to worry about upper- or lower-case characters in your FASTA file.

    Optional: Run a pairwise BLAST or PSI-BLAST alignment on your local machine, aligning each sequence in your set against every other. Click here to view an example of such a file. You may submit up to 10 MB to our server when you use this option.
     
  • Specify the HSSP-values for which UniqueProt should run. You can submit a whole range of values (fields From and To). If you only need a single HSSP-value, enter it in both fields. The higher the HSSP-value, the more sequences you will retrieve in your final set, but the less unique these sequences will be. The HSSP-value is a function of the percentage sequence identity and the length of an alignment. As a rule of thumb, an HSSP-value of 0 will give you a list of proteins that do not have a similar structure. More detailed information about HSSP-values is given in the paper describing this service.
  • Set the desired mode for the algorithm. If you pick "largest first", the program will start the greedy clustering with the largest families. If you want certain proteins to be more likely to appear in your final unique list than others, order your FASTA file with the most important sequences first, then pick "custom" as the algorithm mode. The program will then try to include the proteins that have a high priority in your FASTA file.
  • If you want to get information about the HSSP-values of each alignment, tick the "HSSP-value-file"-box.

  • You can set a minimum length for an alignment to be considered by the algorithm. Since the HSSP-formula already takes the alignment length into account, this option is only for special purposes and should normally be set to 0.

  • Choose the desired type of compression (tar.gz or zip).
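As a rough illustration of how an HSSP-value threshold and the "custom" ordering interact, here is a minimal Python sketch. It is not the server's implementation. It assumes the length-dependent identity curve published by Rost (1999) for the HSSP-value, and it uses a simplified greedy selection in which a protein is dropped as soon as it aligns above the threshold to a protein already kept. The function names, the alignment dictionary, and the strict "greater than" comparison are all assumptions made for illustration.

```python
import math


def hssp_curve(length):
    """Identity threshold in percent for a given alignment length.

    Assumption: the curve from Rost (1999); check the UniqueProt paper
    for the exact definition used by the server.
    """
    if length <= 11:
        return 100.0
    if length <= 450:
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return 19.5


def hssp_value(percent_identity, length):
    """Distance of an alignment from the curve (positive = above the curve)."""
    return percent_identity - hssp_curve(length)


def greedy_unique(ordered_ids, alignments, threshold, min_length=0):
    """Simplified greedy selection in the order given by ordered_ids.

    ordered_ids: protein ids, highest priority first (as in "custom" mode).
    alignments:  hypothetical dict mapping frozenset({id_a, id_b}) to
                 (percent_identity, alignment_length).
    A candidate is kept unless it aligns to an already kept protein with
    an HSSP-value above the threshold.
    """
    kept = []
    for candidate in ordered_ids:
        redundant = False
        for member in kept:
            hit = alignments.get(frozenset((candidate, member)))
            if hit is None:
                continue  # no alignment reported for this pair
            percent_identity, length = hit
            if length >= min_length and hssp_value(percent_identity, length) > threshold:
                redundant = True
                break
        if not redundant:
            kept.append(candidate)
    return kept


# Toy data: A-B is a strong match, B-C a weak one, A-C has no alignment.
toy_alignments = {
    frozenset(("A", "B")): (60.0, 200),
    frozenset(("B", "C")): (25.0, 200),
}
low = greedy_unique(["A", "B", "C"], toy_alignments, threshold=0)
high = greedy_unique(["A", "B", "C"], toy_alignments, threshold=40)
reordered = greedy_unique(["B", "A", "C"], toy_alignments, threshold=0)
```

In this toy data, raising the threshold from 0 to 40 keeps more proteins, and putting B first changes which proteins survive at threshold 0. This is why the order of the FASTA file matters in "custom" mode.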
Output
You will immediately get a message estimating how long the algorithm will need to finish. After approximately this time you will receive an email with the results. This email will contain a link to a compressed file (either zip or tar.gz) on our server. Download this file and decompress it into a folder on your hard drive.
The compressed result file contains the FASTA files with the sequence-unique sets. There is one FASTA file for each HSSP-value within the submitted range.
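If you prefer to unpack the downloaded result file with a script, the following Python sketch handles both compression types using only the standard library. The function name and the example paths are hypothetical. The names of the files inside the archive are not specified here, so the sketch simply lists whatever it finds.

```python
import tarfile
import zipfile
from pathlib import Path


def unpack_results(archive_path, target_dir):
    """Extract a zip or tar.gz result file and return the extracted files."""
    archive = Path(archive_path)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    if zipfile.is_zipfile(archive):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    elif tarfile.is_tarfile(archive):
        # "r:*" lets tarfile detect the compression (gzip for tar.gz).
        with tarfile.open(archive, "r:*") as tf:
            tf.extractall(target)
    else:
        raise ValueError(f"{archive} is neither a zip nor a tar archive")

    # One FASTA file per submitted HSSP-value is expected among these.
    return sorted(p for p in target.rglob("*") if p.is_file())
```

A call such as `unpack_results("results.tar.gz", "uniqueprot_results")` would then return the paths of the extracted files, with both arguments being placeholder names.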
From here
  • Back to the UniqueProt homepage
  • ©2003, Sven Mika & Burkhard Rost