#-------------------------------------------
# uniqueProt command-line version
#-------------------------------------------


#------------------
Installation
#------------------

1) Change to the uniqueProt directory
2) Decompress the archive:
      uniqueProt.zip     -> unzip uniqueProt.zip
   or
      uniqueProt.tar.gz  -> gunzip uniqueProt.tar.gz
                         -> tar -xf uniqueProt.tar
3) Type: "perl install.pl"
   The script will ask you for the path of your installation
   (the path where the uniqueProt.pl script is) and for the
   perl-path of your local system.


#------------------
Caution!
#------------------

UniqueProt generates a working directory ('work') in which it stores
temporary files. This directory can take up a lot of disk space!
A rough estimate is 1 Gbyte per 100,000 sequences.
This space is only required while the job is running. The temporary
files are deleted after uniqueProt finishes.


#------------------
Files
#------------------

Files contained in the uniqueProt-package:
 - uniqueProt.pl
 - blastall
 - formatdb
 - data/BLOSUM62
 - README.txt


#------------------
Running the Program
#------------------

To use UniqueProt, run uniqueProt.pl from its directory:
type "perl uniqueProt.pl" on the command line, followed by the
necessary options. Options are passed to the program as shown in
the following example:

perl uniqueProt.pl -i /home/test/testinput.fasta -d 10 -o /home/test/testoutput.zip

Options:
--------

-i    the input file (has to be fasta-formatted or an alignment-file;
      no database-ID-list allowed!). This is the only required option.
      All other options fall back to their default values if they are
      not set by the user.
      If possible, give the complete filename (with path) to the program.

-a    an optional BLAST/PSIBLAST alignment file. Only use this option if
      you know what you are doing. If you set this option, UniqueProt
      will not run BLAST itself to obtain the HSSP similarity values
      for your sequences.
      Please make sure that your BLAST-file really contains all
      alignments; otherwise many homologous sequences will end up in
      the final unique-list. Also make sure that the fasta headers in
      your fasta-file (-i option) are not unreasonably long and that
      they are identical to the descriptions in your
      BLAST/PSIBLAST-file.

-o    the output file. Additionally, fasta-files will be created and,
      if the -x option is set, an hssp-matrix file.
      Default: "output.txt" in your uniqueProt directory.
      If possible, give the complete filename (with path) to the program.

-d    HSSP-Distance Value (has to be a number between -60 and 80)
      Default: 0

-f,-t 'from'/'to' = to submit a whole range of distance values, use
      the -f and the -t options, e.g.

      uniqueProt -f -10 -t 10 -i /home/tom/test.fasta

      This will let the program run the algorithm for distance values
      between -10 and 10 (usually 10 different values), so that one
      can choose a set of representatives from these values.
      Default: -f 0 -t 0

-x    Activate this option by just typing '-x' on the command line
      without any following value. This will make the program create
      an hssp-value file, in which the hssp-values for the
      sequence-pairs are listed. Only those pairs appear in the file
      that produced a BLAST alignment or that showed up in your
      submitted alignment file (-a option).

-l    Minimum length of an alignment to be considered valid by
      uniqueProt. Alignments below this length will be ignored.
      Default: 0

-m    Mode of the greedy algorithm used to create the unique list
      (has to be 'small', 'large' or 'custom').
      Mode 'large'/'small' starts the greedy algorithm on the
      largest/smallest family.
      Mode 'custom' starts with the first sequence in the submitted
      fasta-file (-i option), then with the second, etc. In custom
      mode, sequences near the top of the fasta-file are therefore
      more likely to end up in the final set than later sequences.
      Default: large

(-h   Activate this option by just typing '-h' on the command line
      without any following value. This will make the program create
      browsable html-files for a better overview of the results.
      The main file from which one should start is called index.html
      Default: no html)
      The -h option does not work yet!


#------------------
Deinstallation
#------------------

Delete all contents of the uniqueProt directory. Since the install.pl
script does not copy any files to other directories, this is
sufficient to remove the program completely from your machine.
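

#------------------
Example: estimating disk space
#------------------

The rule of thumb from the Caution section (about 1 Gbyte per 100,000
sequences) can be turned into a quick check before starting a job.
The following is only a sketch: the function name estimate_work_mb is
hypothetical and not part of the uniqueProt package, it assumes that
every fasta record starts with a '>' header line, and it treats
1 Gbyte as 1000 Mbyte. The real space requirement may differ.

```shell
# Hypothetical helper, NOT part of the uniqueProt package.
# Prints a rough estimate (in Mbyte) of the space the 'work'
# directory may need for the given fasta file.
estimate_work_mb() {
    # Count fasta records: each record is assumed to start with '>'.
    # '|| true' keeps the function usable under 'set -e' when the
    # file contains no header lines (grep then exits with status 1).
    n=$(grep -c '^>' "$1" || true)

    # README rule of thumb: 1 Gbyte per 100,000 sequences,
    # i.e. about 1 Mbyte per 100 sequences (rounded up).
    echo $(( (n + 99) / 100 ))
}
```

Called as "estimate_work_mb /home/test/testinput.fasta", it prints a
single number that can be compared with the free space reported by
"df" for the uniqueProt directory.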