|
|
||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
|
PoSSuM Software Distribution
Description: ![]() The PoSSuMsearch program, included in the PoSSuM software distribution implements the algorithms Simplesearch, Lookaheadsearch, ESAsearch and Lazydistrib. A user can search for PSSMs in enhanced suffix arrays built by mkvtree from the VMATCH package (also included in the PoSSuM software distribution), as well as on plain sequence data in Fasta, GENBANK, EMBL, or SPROT format. The search algorithm can be chosen from the command line. PSSMs are specified in a simple plain text format, where one file may contain multiple PSSMs. The alphabet a PSSM refers to, and alphabet character to PSSM column assignments can be specified on a per-PSSM basis for most flexible alphabet support. All implemented algorithms support alphabet transformations. PSSMs can contain integer as well as floating point scores. To prevent rounding errors for integer based PSSMs, PoSSuMsearch uses integer arithmetics for these, resulting in an additional speedup on most CPU architectures. Searching on the reverse strand of nucleotide sequences is implemented by PSSM transformation according to Watson-Crick base pairing. Hence it is sufficient to build the enhanced suffix array for one strand only. This can then be used to search both strands. The cutoff can be specified as p-value, E-value, MSS (matrix similarity score), or raw score threshold. If only the best matches with the highest scores need to be known, then PoSSuMsearch can be asked to report only the k highest scoring matches without even specifying an explicit cutoff. To do so, the search algorithms dynamically adapt the threshold during the search. When using p- or E-values, the score threshold is determined by either the lazy dynamic programming algorithm introduced in this contribution, or read from file that stores the complete precalculated probability distribution. Background distributions can be specified arbitrarily by the user, or determined from a given sequence database. We provide a tool, PoSSuMdist, to generate a compressed file containing the complete precalculated probability distributions for a set of PSSMs. PSSM matches can be sorted by specifying a list of sort keys,
like p-value, match score, sequence number, and so on. The output
formats of PoSSuMsearch print out all
available information about a match, either in a human readable
format, tab delimited, or in machine readable, XML-based CisML.
PoSSuMsearch as well as PoSSuMdist support multi-threading for a further
reduction of running time on multi CPU machines. Availability: The PoSSuM software package, including the program PoSSuMsearch implementing the ESAsearch algorithm, is available free of charge for non-commercial research institutions.
If you use PoSSuM in your own research, please cite:
Michael Beckstette, Robert Homann, Robert Giegerich, Stefan Kurtz Fast index based algorithms for matching position specific scoring matrices in BMC Bioinformatics 7:389, 2006 Download Publication Michael Beckstette, Dirk Strothmann, Robert Homann, Robert Giegerich, Stefan Kurtz PoSSuMsearch: Fast and Sensitive Matching of Position Specific Scoring Matrices using Enhanced Suffix Arrays in Proceedings of the German Conference on Bioinformatics 2004 GI Lecture Notes in Informatics,vol.: P-53, pages 53-64, ISBN 3-88579-382-2 Download Publication |
|
|||||||||||||||||||||||||||||||||
|
|
|
|||||||||||||||||||||||||||||||||