Universität Bielefeld - Technische Fakultät - AG Praktische Informatik - FSPM - Strukturbildungsprozesse
Divide-and-Conquer Multiple Sequence Alignment
Parameters of DCA
In general, DCA takes as input a family of sequences
and a matrix of pairwise letter distances as well as
gap initiation and gap extension parameters.
-
Sequences
DCA accepts protein, DNA, RNA, or any ASCII sequences.
Sequences can either be entered manually in the provided
lines (typed, or dragged and dropped with the mouse),
or loaded from a local sequence file.
Accepted input formats are FASTA and GDE.
For an example see here.
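
To illustrate the FASTA input mentioned above, here is a minimal sketch of a reader that turns FASTA text into a family of sequences. The function name `read_fasta` and the example records are assumptions made for illustration; this is not the parser used by DCA.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name] += line
    return records


# Hypothetical two-sequence family, for illustration only.
example = """>seq1
ACGTAC
GTA
>seq2
ACGAC
"""
family = read_fasta(example)
```

Each header line (starting with `>`) opens a record, and all following lines up to the next header are concatenated into one sequence.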
-
Substitution matrix
You can either use any of the provided substitution matrices
(PAM250, PAM160, Blosum30, Blosum45, Blosum62, Gonnet250,
Gonnet120, DNA/RNA, or unit cost), which are displayed upon
clicking the "show matrix" button in the submission form,
or you can provide your own (distance) substitution matrix in
MSA format.
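
As a rough illustration of what a distance matrix such as "unit cost" expresses, the sketch below builds one over a small alphabet: identical letters have distance 0, different letters have distance 1. The names `unit_cost_matrix` and `mismatch_cost` are hypothetical, and the sketch does not reproduce the MSA file format.

```python
from itertools import product
from typing import Dict, Tuple


def unit_cost_matrix(alphabet: str) -> Dict[Tuple[str, str], int]:
    """Distance 0 for identical letters, 1 for any two different letters."""
    return {(a, b): 0 if a == b else 1 for a, b in product(alphabet, repeat=2)}


def mismatch_cost(s: str, t: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum of letter distances between two gap-free strings of equal length."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(matrix[(a, b)] for a, b in zip(s, t))


dna = unit_cost_matrix("ACGT")
cost = mismatch_cost("ACGT", "ACCT", dna)  # the strings differ in one position
```

Because the matrices are distances, lower values mean more similar letters.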
-
Gap parameters
The cost of a gap is affine in its length. It is computed from
two parameters: the "gap initiation cost", charged once per gap,
and the "gap extension cost", which scales with the length of
the gap.
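
A minimal sketch of an affine gap cost follows. It assumes the common convention "initiation cost plus extension cost times gap length"; whether DCA multiplies the extension cost by the length or by the length minus one is not stated here, so treat that choice, the function name, and the parameter values as assumptions.

```python
def affine_gap_cost(length: int, gap_init: float, gap_ext: float) -> float:
    """Cost of one gap of `length` consecutive blanks.

    Assumed convention: gap_init + gap_ext * length.
    """
    if length <= 0:
        return 0.0  # no gap, no cost
    return gap_init + gap_ext * length


# Hypothetical parameter values, for illustration only.
cost_of_three = affine_gap_cost(3, gap_init=8, gap_ext=2)
```

Under this convention one long gap is cheaper than several short gaps of the same total length, because the initiation cost is paid only once.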
-
Free shift
By default, DCA does not penalize gaps at either end of the
sequences, so that compensating for differences in sequence
length is free of charge. This free shift option can be deactivated.
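
The effect of the free shift option can be sketched on a single aligned row: gaps touching either end of the row are skipped when the option is on. This is a per-row simplification under an assumed "initiation plus extension times length" gap cost; the function name `row_gap_cost` and the parameter values are hypothetical.

```python
import re


def row_gap_cost(row: str, gap_init: float, gap_ext: float,
                 free_shift: bool = True) -> float:
    """Total gap cost of one aligned row, where '-' marks a blank."""
    total = 0.0
    for match in re.finditer(r"-+", row):
        touches_end = match.start() == 0 or match.end() == len(row)
        if free_shift and touches_end:
            continue  # end gaps are free of charge under free shift
        total += gap_init + gap_ext * (match.end() - match.start())
    return total


row = "--AC-GT---"
with_free_shift = row_gap_cost(row, 8, 2, free_shift=True)
without_free_shift = row_gap_cost(row, 8, 2, free_shift=False)
```

With free shift only the internal gap is charged; without it the leading and trailing gaps are charged as well.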
-
Approximate cut positions/FDCA
The last, most time-consuming phase of the search for cut positions
can be deactivated so that the sequences are cut at approximate
slicing positions. This generally yields slightly less accurate
alignments, but often computes them much faster.
A more detailed explanation of this heuristic is found here.
-
Recursion stop size
The recursion stop size L can be set to any number.
Too large an L can result in very long running times and
very high memory usage due to the resulting MSA runs.
On the other hand, too small an L can result in empty
subsequences at the end of the iteration, which may lead
to bad alignments.
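
The role of the recursion stop size can be sketched with a simplified divide-and-conquer skeleton. It assumes that recursion stops once no sequence is longer than L, and it uses two placeholders that are not DCA's methods: `midpoint_cuts` cuts every sequence in the middle, and `pad_align` stands in for the exact MSA run by padding with blanks.

```python
from typing import Callable, List

Aligner = Callable[[List[str]], List[str]]
Cutter = Callable[[List[str]], List[int]]


def pad_align(seqs: List[str]) -> List[str]:
    """Placeholder base aligner: right-pad with '-' (not an optimal alignment)."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]


def midpoint_cuts(seqs: List[str]) -> List[int]:
    """Placeholder cut rule: cut every sequence at its midpoint."""
    return [len(s) // 2 for s in seqs]


def dca_sketch(seqs: List[str], stop_size: int,
               cut: Cutter = midpoint_cuts,
               base: Aligner = pad_align) -> List[str]:
    """Cut recursively until all pieces fit the stop size, then align and join."""
    if stop_size < 1:
        raise ValueError("stop_size must be at least 1")
    if max(len(s) for s in seqs) <= stop_size:
        return base(seqs)  # small enough: hand over to the base aligner
    cuts = cut(seqs)
    left = dca_sketch([s[:c] for s, c in zip(seqs, cuts)], stop_size, cut, base)
    right = dca_sketch([s[c:] for s, c in zip(seqs, cuts)], stop_size, cut, base)
    # Concatenate the partial alignments row by row.
    return [l + r for l, r in zip(left, right)]


seqs = ["ACGTACGT", "ACGT", "ACGTAC"]
alignment = dca_sketch(seqs, stop_size=2)
```

A larger `stop_size` hands longer pieces to the base aligner, which is where the running time and memory of the real MSA runs would grow; a smaller one cuts more often and can leave short sequences with very little or nothing in a piece.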
-
Window size
To correct the alignment in the proximity of division sites,
the sequences can be re-aligned inside a window of
user-specified size placed across each slicing site.
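
One way to picture the window re-alignment is sketched below. It assumes the window is measured in alignment columns and centered on the cut column, which is not stated here, and it uses a placeholder `pad_align` instead of a real aligner; `realign_window` is a hypothetical name.

```python
from typing import Callable, List


def pad_align(seqs: List[str]) -> List[str]:
    """Placeholder aligner: right-pad with '-' (not an optimal alignment)."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width, "-") for s in seqs]


def realign_window(rows: List[str], cut_col: int, window: int,
                   realign: Callable[[List[str]], List[str]] = pad_align
                   ) -> List[str]:
    """Re-align the columns of a window placed across `cut_col`."""
    lo = max(0, cut_col - window // 2)
    hi = min(len(rows[0]), lo + window)
    # Remove blanks inside the window and align the bare segments again.
    segments = [r[lo:hi].replace("-", "") for r in rows]
    fixed = realign(segments)
    # Splice the re-aligned window back between the untouched flanks.
    return [r[:lo] + f + r[hi:] for r, f in zip(rows, fixed)]


rows = ["AC-GT-A", "A-CGTTA"]
repaired = realign_window(rows, cut_col=3, window=4)
```

Only the columns inside the window change; the flanking parts of the alignment are kept as they are.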
-
Weight intensity
The weight intensity can be set to any value between
0 (no weighting) and 1.0 (maximum weighting).
Here is the formula.
J. Stoye, V. Moulton