PoSSuMsearch2 - Welcome
Significant speedup of database searches with HMMs by search
space reduction with PSSM family models
Motivation:
Profile Hidden Markov models (pHMMs) are currently the most
popular modeling concept for protein families. They provide
sensitive family descriptors, and sequence database searching
with pHMMs has become a standard task in today's genome
annotation pipelines. On the downside, searching with pHMMs is
computationally expensive.
Results:
We propose a new method for efficient protein family
classification and for speeding up database searches with pHMMs,
as required in large-scale analysis scenarios. We employ
simpler models of protein families called PSSM family models. For
fast database search, we combine full text indexing, efficient
exact p-value computation of PSSM match scores, and fast fragment
chaining. The resulting method is well suited to pre-filter the
set of sequences to be searched for subsequent database searches
with pHMMs.
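One of the ingredients named above, exact p-value computation for PSSM
match scores, can be illustrated with a small dynamic program. The
sketch below is not the PoSSuM2 implementation. It assumes integer
scores and an independent, identically distributed background model,
and it uses a hypothetical two-letter toy alphabet; all function and
variable names are invented for illustration.

```python
from collections import defaultdict


def score_distribution(pssm, background):
    """Return {score: probability} over all random sequences of length len(pssm).

    pssm       -- list of columns, each a dict mapping symbol -> integer score
    background -- dict mapping symbol -> background probability (sums to 1)
    """
    dist = {0: 1.0}  # before any column is read, the score is 0 with certainty
    for column in pssm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for symbol, s in column.items():
                # Extend every partial score by one more column.
                nxt[score + s] += prob * background[symbol]
        dist = dict(nxt)
    return dist


def p_value(pssm, background, threshold):
    """Probability that a random sequence scores at least `threshold`."""
    dist = score_distribution(pssm, background)
    return sum(p for s, p in dist.items() if s >= threshold)


# Hypothetical toy example: alphabet {A, B}, uniform background, two columns.
toy_background = {"A": 0.5, "B": 0.5}
toy_pssm = [{"A": 2, "B": -1}, {"A": 0, "B": 3}]

if __name__ == "__main__":
    for t in (2, 5, 6):
        print(t, p_value(toy_pssm, toy_background, t))
```

Because the distribution is indexed by score rather than by sequence,
the work grows with the number of distinct reachable scores instead of
with the number of possible sequences. A production implementation
would additionally need to handle rounding of real-valued scores and
much larger score ranges.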
We achieved a classification performance only marginally inferior
to that of hmmsearch, yet obtained results in a fraction of the
runtime, with a speedup of more than 64-fold. In experiments
addressing the method's ability to pre-filter the sequence space
for subsequent database searches with pHMMs, our method reduced
the number of sequences to be searched with hmmsearch to only
0.80% of all sequences. The filter is very fast and leads to a
total speedup by a factor of 43 over the unfiltered search while
retaining more than 99.5% of the original results. In a lossless
filter setup for hmmsearch on UniProtKB-SwissProt, we observed a
speedup by a factor of 92.
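To see how the reported filter fraction and total speedup relate, a
rough back-of-the-envelope model may help. The sketch below assumes,
as a simplification not stated in the text, that hmmsearch runtime is
proportional to the number of sequences searched, so that the filtered
pipeline costs the filter time plus the pass fraction times the
unfiltered search time. The function names are hypothetical.

```python
def filtered_speedup(pass_fraction, filter_cost):
    """Speedup of (filter + pHMM search on survivors) over the unfiltered search.

    Both arguments are expressed relative to the unfiltered search time,
    which is taken as 1. Assumes search time scales linearly with the
    number of sequences searched (a simplifying assumption).
    """
    return 1.0 / (filter_cost + pass_fraction)


def implied_filter_cost(pass_fraction, speedup):
    """Relative filter cost implied by an observed total speedup."""
    return 1.0 / speedup - pass_fraction


# Figures taken from the text: 0.80% of sequences pass, total speedup 43.
PASS_FRACTION = 0.008
OBSERVED_SPEEDUP = 43.0

# Upper bound on the speedup if the filter itself cost nothing.
upper_bound = filtered_speedup(PASS_FRACTION, 0.0)

# Share of the unfiltered runtime that the filter would take under this model.
filter_cost = implied_filter_cost(PASS_FRACTION, OBSERVED_SPEEDUP)

if __name__ == "__main__":
    print(f"upper bound on speedup: {upper_bound:.1f}")
    print(f"implied relative filter cost: {filter_cost:.4f}")
```

Under this model, a pass fraction of 0.80% caps the achievable speedup
at about 125, and a total speedup of 43 would correspond to the filter
taking roughly 1.5% of the unfiltered search time. These are model
estimates under the stated assumption, not measurements from the
experiments.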
Availability:
The PoSSuM2 software package, including the program
PoSSuMsearch2, is available free of charge for non-commercial
research institutions.