Universität Bielefeld - Technische Fakultät - AG Praktische Informatik - FSPM - Strukturbildungsprozesse
Divide-and-Conquer Multiple Sequence AlignmentFast DCA |
|
|---|
For more than about six distantly related sequences one cannot expect to find combinations of cut positions with a multiple additional cost of zero since, for any diverged sequence-family, the ``total incompatibility'' - estimated by the minimal multiple additional cost, here denoted by Copt - increases considerably with the number of sequences considered. Moreover, we have developed a heuristic procedure called MinC (see Perrey & Stoye, 1996) which computes approximations to Copt , and it has been shown that the cut positions resulted from this heuristic, called MinC -optimal cut positions, are often very close to (or even equal to) the C -optimal cut positions (see Stoye, 1997). Therefore, the standard version of DCA uses by far the most time for either finding only slightly better combinations of cut positions than the approximate ones computed by MinC , or for verifying that the MinC -optimal combinations are already C -optimal.
Moreover, C -optimal combinations of cut positions themselves represent only approximations to those of the sum-of-pairs score optimal, so that it is rather natural to avoid the full exponential-time exhaustive search for C -optimal cut positions and use those of MinC directly to cut the sequences. Such a version of the DCA alignment algorithm is significantly faster and therefore is able to align far more sequences than the original algorithm. Thus, we define the Fast Divide and Conquer Alignment algorithm, short FDCA, to be the modified DCA procedure which directly uses MinC -optimal cut positions and hence avoids the time-consuming search for C -optimal cut positions.