Sequence Format:
DCA accepts sequences in FASTA format, either via file upload
or by copy and paste into a text field. A sequence in FASTA
format begins with a single-line description, followed by
lines of sequence data.
- The description line starts with a greater-than symbol
(">").
- The word immediately following the greater-than symbol
(">") is the "ID" (name) of the sequence; the rest of the
line is the description.
- The "ID" and the description are optional.
- All lines of text should be shorter than 80 characters
(normally 60).
- A sequence ends when another greater-than symbol (">")
appears at the beginning of a line, which starts the next
sequence.
The following example contains one sequence (sequence_1):
>sequence_1
GLAKDAWEIPRESLRLEAKLGQGCFGEVWMGTWNDTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAV
VSEEPIYIVIEYMSKGSLLDFLKGEMGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVA
DFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERG
YRMPCPPECPESLHDLMCQCWRKDPEERPTFKYLQAQLLPACVLEVAE
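The format rules above can be sketched as a small parser. This is only an illustration of the rules as stated here, not DCA's own input routine; the function names `parse_fasta` and `_split_header` are hypothetical, and the handling of blank lines and of data before the first ">" line is an assumption.

```python
def _split_header(header):
    """Split a description line (without '>') into (id, description)."""
    # The ID is the word that follows '>' immediately; if '>' is followed
    # by whitespace, the ID is treated as empty (assumption).
    if header and not header[0].isspace():
        parts = header.split(None, 1)
        seq_id = parts[0]
        description = parts[1].strip() if len(parts) > 1 else ""
    else:
        seq_id, description = "", header.strip()
    return seq_id, description


def parse_fasta(text):
    """Return a list of (id, description, sequence) tuples."""
    records = []
    header = None   # current description line, without the leading '>'
    chunks = []     # sequence lines collected for the current record
    for line in text.splitlines():
        line = line.rstrip()
        if not line.strip():
            continue                      # skip blank lines (assumption)
        if line.startswith(">"):
            if header is not None:        # a new '>' closes the previous record
                records.append(_split_header(header) + ("".join(chunks),))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())   # data before the first '>' is ignored
    if header is not None:
        records.append(_split_header(header) + ("".join(chunks),))
    return records


example = """>sequence_1 hypothetical description
GLAKDAWEIP
RESLRLEAKL
>sequence_2
MKV
"""
records = parse_fasta(example)
```

Each record keeps the ID, the optional description, and the sequence with its line breaks removed.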
Substitution Matrix:
DCA provides the following substitution matrices: PAM250, PAM160, Blosum30, Blosum45, Blosum62, DNA/RNA, and unit cost.
Free shift:
By default, DCA does not penalize gaps at either end of the
sequences, so that differences in sequence length can be
compensated free of charge. This free shift option can be
deactivated.
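The effect of free shift can be illustrated on a single pair of aligned rows. The sketch below is not DCA's scoring code: it assumes a unit-cost scheme (mismatch cost 1) and a gap cost of 1 per gap character, and the names `alignment_cost` and `_residue_span` are hypothetical.

```python
def _residue_span(row):
    """Return (first, last) index of non-gap characters in an aligned row."""
    idx = [i for i, c in enumerate(row) if c != "-"]
    return (idx[0], idx[-1]) if idx else (len(row), -1)


def alignment_cost(row_a, row_b, mismatch=1, gap=1, free_shift=True):
    """Cost of a pairwise alignment; end gaps are free if free_shift is set."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    spans = (_residue_span(row_a), _residue_span(row_b))
    cost = 0
    for i, (a, b) in enumerate(zip(row_a, row_b)):
        if a == "-" and b == "-":
            continue                      # column of gaps only: no cost
        if a == "-" or b == "-":
            first, last = spans[0] if a == "-" else spans[1]
            terminal = i < first or i > last   # gap lies outside the residues
            if not (free_shift and terminal):
                cost += gap
        elif a != b:
            cost += mismatch
    return cost


with_free_shift = alignment_cost("--ACGT", "TTACGA", free_shift=True)
without_free_shift = alignment_cost("--ACGT", "TTACGA", free_shift=False)
```

In this example the two leading gaps cost nothing with free shift, so only the final mismatch is counted; with free shift deactivated, the same gaps are charged like any other gap.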
Approximate Cut Positions:
The last and most time-consuming phase of the search for cut
positions can be deactivated, so that the sequences are cut at
approximate slicing positions. This generally yields slightly
less accurate alignments, but is often much faster. A more
detailed explanation of this heuristic is found here.
Recursion stop size:
The recursion stop size can be set to any number L ≥ 1.
Too large an L (e.g. L > 100) can result in very long running
times and very high memory usage due to the resulting MSA
runs. On the other hand, too small an L (e.g. L < 5) can
result in empty subsequences at the end of the recursion,
which may lead to poor alignments.
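How the stop size shapes the recursion can be sketched as follows. This is a simplified illustration, not DCA's algorithm: it cuts every sequence at its midpoint instead of searching for good cut positions, and it assumes that the recursion stops once the longest subsequence has at most L characters. The function name `divide` is hypothetical.

```python
def divide(seqs, stop_size):
    """Return the blocks (tuples of subsequences) left when recursion stops."""
    if stop_size < 1:
        raise ValueError("stop size L must be >= 1")
    # Assumed stop criterion: the longest subsequence has at most L characters.
    if max(len(s) for s in seqs) <= stop_size:
        return [tuple(seqs)]
    # Placeholder for the cut position search: cut each sequence in the middle.
    cuts = [len(s) // 2 for s in seqs]
    left = [s[:c] for s, c in zip(seqs, cuts)]
    right = [s[c:] for s, c in zip(seqs, cuts)]
    return divide(left, stop_size) + divide(right, stop_size)


blocks_large_l = divide(["ACGTACGT", "ACGTAC"], 4)
blocks_small_l = divide(["ACGTACGT", "AC"], 1)
```

With L = 4 the two sequences end up in two blocks of comparable size. With L = 1 and sequences of very different lengths, several blocks contain an empty subsequence, which is the situation the text warns about for small L.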
Window size:
To correct the alignment in the proximity of division sites,
the sequences can be re-aligned inside a window of size W
≥ 0.
Weight intensity:
The weight intensity λ can be set to any value
between 0 (no weighting) and 1.0 (max. weighting). Here is the formula.
Output:
The output is a generated file that contains the aligned
sequences in FASTA format.