BiBiServ - Bielefeld University Bioinformatic Service

DIALIGN - Manual


Sequence file:

DIALIGN requires a single ASCII file containing the sequences to be aligned. Four different file formats are supported: IG, FASTA, EMBL and GCG-RSF format. The following is an example of the FASTA sequence file format:

>HTL2  
LDTAPCLFSDGSPQKAAYVLWDQTILQQDITPLPSHETHSAQKGELLALICGLRAAKPWP
SLNIFLDSKYLIKYLHSLAIGAFLGTSAHQTLQAALPPLLQGKTIYLHHVRSHTNLPDPI
STFNEYTDSLILAPL
>MMLV   
PDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALDAGTSAQRAELIALTQALKMAE
GKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKRLSIIH
CPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLL
>HEPB 
RPGLCQVFADATPTGWGLVMGHQRMRGTFSAPLPIHTAELLAACFARSRSGANIIGTDNS
VVLSRKYTSFPWLLGCAANWILRGTSFVYVPSALNPADDPSRGRLGLSRPLLRLPFRPTT
GRTSLYADSPSVPSHLPDRVH
>ECOL   
MLKQVEIFTDGSCLGNPGPGGYGAILRYRGREKTFSAGYTRTTNNRMELMAAIVALEALK
EHCEVILSTDSQYVRQGITQWIHNWKKRGWKTADKKPVKNVDLWQRLDAALGQHQIKWEW
VKGHAGHPENERCDELARAAAMNPTLEDTGYQVEV

For each sequence, the first line starts with ">" and contains the name of the sequence.
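
To illustrate the format, the following is a minimal Python sketch of a reader for such a file. It is not part of DIALIGN; the function name `parse_fasta` is hypothetical, and the sketch assumes that the whole header line after ">" is the sequence name and that names are unique.

```python
def parse_fasta(text):
    """Return {sequence_name: sequence} for FASTA-formatted text.

    Assumption: the whole header line after ">" (with surrounding
    whitespace removed) is used as the sequence name.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence data may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


example = ">HTL2  \nLDTAPCLFSD\nGSPQ\n>MMLV   \nPDADHTWYTD\n"
print(parse_fasta(example))
```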

Options:

  • Sequence Type:

    The user can decide if nucleic acid or protein sequences are to be aligned.

  • Threshold T:

    As described in our papers, DIALIGN constructs alignments from gapfree pairs of segments of the sequences. Such segment pairs are referred to as `diagonals'.

    Every possible diagonal is given a so-called weight reflecting the degree of similarity between the two segments involved. The overall score of an alignment is then defined as the sum of the weights of the diagonals it consists of, and the program tries to find an alignment with maximum score. In other words, the program tries to find a consistent collection of diagonals with maximum sum of weights. This novel scoring scheme for alignments is the basic difference between DIALIGN and other global or local alignment methods. Note that DIALIGN does not employ any kind of gap penalty.

    It is possible to use a threshold T for the quality of the diagonals. In this case, diagonals are considered only if their `weights' exceed this threshold, and regions of lower similarity are ignored.

    In the first version of the program (DIALIGN 1), this threshold was in many situations absolutely necessary to obtain meaningful alignments. By contrast, DIALIGN 2 should produce reasonable alignments without a threshold, i.e. with T = 0. This is the most important difference between DIALIGN 2 and the first version of the program.

    Nevertheless, it is still possible to use a threshold T, so it is up to the user to experiment with this option.

  • Translation of `nucleotide diagonals' into `peptide diagonals':

    If (possibly) coding nucleic acid sequences are to be aligned, DIALIGN optionally translates the compared `nucleic acid segments' to `peptide segments' according to the genetic code -- without (necessarily) presupposing any of the three possible reading frames, so all three of them get checked for significant similarity. In this case, the similarity among segments will be assessed on the `peptide level' rather than on the `nucleic acid level'. We strongly recommend this option if nucleic acid sequences are expected to contain protein coding regions, as it will significantly increase the sensitivity of the alignment procedure in such cases.

  • `*' characters:

    The user can specify the maximum number of `*' characters indicating the degree of local similarity among sequences.
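
The translation option can be pictured with a short Python sketch that translates a nucleic acid segment in all three reading frames using the standard genetic code. This only illustrates the idea; it is not DIALIGN's implementation, and the function name `translate_frames` is hypothetical. Unknown codons are rendered as "X", which is an assumption of this sketch. Note that "*" here marks a stop codon and is unrelated to the `*' characters in the program output.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frames(dna):
    """Translate a nucleotide segment in reading frames 0, 1 and 2.

    Returns a list of three peptide strings. Incomplete codons at the
    end are dropped; codons with non-ACGT characters become "X".
    """
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


print(translate_frames("ATGGCC"))  # ['MA', 'W', 'G']
```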

Similarity Matrix:

DIALIGN 2 employs the BLOSUM62 amino acid substitution matrix.


Program Output:


DIALIGN creates a file containing

  • An alignment of the input sequences in DIALIGN format.
  • The same alignment in FASTA format.
  • A sequence tree in PHYLIP format. This tree is constructed by applying the UPGMA clustering method to the DIALIGN similarity scores. It roughly reflects the different degrees of similarity among sequences. For detailed phylogenetic analysis, we recommend the usual methods for phylogenetic reconstruction.
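
For readers unfamiliar with UPGMA, the clustering step can be sketched in Python as follows. The sketch works on a distance matrix; how DIALIGN derives distances from its similarity scores is not described here, so the input matrix, the function name `upgma` and the three-decimal branch lengths are all assumptions made for illustration.

```python
def upgma(names, dist):
    """Return a Newick string for the UPGMA tree of a symmetric distance matrix."""
    n = len(names)
    # cluster id -> (newick text, number of leaves, height above the leaves)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    d = {frozenset((i, j)): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        a, b = sorted(pair)
        height = d[pair] / 2.0              # new node sits halfway between them
        text_a, size_a, h_a = clusters.pop(a)
        text_b, size_b, h_b = clusters.pop(b)
        merged = "({}:{:.3f},{}:{:.3f})".format(
            text_a, height - h_a, text_b, height - h_b)
        # Distance to every remaining cluster: size-weighted average.
        for c in clusters:
            d[frozenset((next_id, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        # Forget all distances that involve the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1
    (text, _, _), = clusters.values()
    return text + ";"


print(upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
# (C:3.000,(A:1.000,B:1.000):2.000);
```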

This is the DIALIGN alignment format:

  
HTL2          1   ldtapcLFSD GS------PQ KAAYVLWDQT IL---QQDIT PLPSHethSA
MMLV          1   pdadhtwYTD GSSLLQEGQR KAGAAVTTET eviwaKALDA G---T---SA
HEPB          1   rpglcQVFAD AT------PT GWGLVMGHQR MR---GTFSA PLPIHt----
ECOL          1   mlkqvEIFTD GSCLGNPGPG GYGAILRYRG RE---KTFSA GytrT---TN
                                                                
                       ***** ********** ********** **   ***** *****   **
                        **** **      ** ********** **   ***** *****   **
                         *** **      ** ********** **   *****           
                                     ** ******                          
                                                                        


HTL2         42   QKGELLALIC GLRAAKPWPS LNIFLDSKYL IKYLHslaig aflgtsah--
MMLV         45   QRAELIALTQ ALKMAEgkk- LNVYTDSRYA FATAHIHGEI YRRRGLLTSE
HEPB         38   --AELLAACF Arsrsgan-- -IIGTDN--- ---------- ----------
ECOL         45   NRMELMAAIV ALEALKEHCE VILSTDSQYV RQGITQWIHN WKKRGWKTAD
                                                                
                  ********** ********** ********** ********** **********
                  ********** ********** ********** ********** **********
                     ******* ******     ********** *****                
                     ******* ******     ********** *****                
                                          ********                      


HTL2         90   -------QT- --LQAALPPL LQGKTIYLHH VRSHT----- -NLPDPISTF
MMLV         94   GKEIKNKDE- --ILALLKAL FLPKRLSIIH CPGHQ----- -KGHSAEARG
HEPB         60   ---------- ---SVVLSR- ---------- ---KYTSFPW LLGCAANWI-
ECOL         95   KKPVKNVDlw qrLDAALGQ- ---------- ---HQIKWEW VKGHAGHPE-
                                                                
                  *********    ******** ********** ********** **********
                  ********                                              
                         *                                              
                                                                        
                                                        


HTL2        124   NEYTDSLILA pl-------- ---------- ---------- ----------
MMLV        135   NRMADQAARK AAITETPDTS tll------- ---------- ----------
HEPB         82   LRGTSFVYVP SALNPADDPS rgrlglsrpl lrlpfrpttg rtslyadsps
ECOL        130   NERCDELARA AAMNPTledt gyqvev---- ---------- ----------
                                                                
                  ********** **********                                 
                  ********** ******                                     
                                                                        
                                                                        
                                                                        


HTL2        136   ----------
MMLV              ----------
HEPB        132   vpshlpdrvh
ECOL        156   ----------
                    


  • Names of the aligned sequences are shown on the left hand side of the alignment.
  • Numbers on the left hand side of the alignment denote the position of the first residue in a line within the respective sequence.
  • Capital letters denote aligned residues, i.e. residues involved in at least one of the `diagonals' the alignment consists of. Lower-case letters denote residues not belonging to any of these selected `diagonals'. They are not considered to be aligned by DIALIGN. Thus, if a lower-case letter is standing in the same column with other letters, this is pure chance; these residues are not considered to be homologous.
  • The number of `*' characters below the alignment reflects the degree of local similarity among sequences. More precisely, it represents the sum of `weights' of diagonals connecting residues at the respective position. The number of `*' characters is normalized such that regions of maximum similarity have N `*' characters per column. N can be specified by the user; by default, N = 5. Note that the number of `*' characters therefore depicts only the relative degree of similarity within an alignment, since in every alignment the region of maximum similarity gets N `*' characters.
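
The normalization of the `*' characters can be sketched in Python as below. The per-column weight sums are made-up input values, the function names are hypothetical, and rounding to the nearest integer is an assumption; the manual only states that the column of maximum similarity receives N characters.

```python
def star_counts(column_weights, n_max=5):
    """Scale per-column weight sums so that the best column gets n_max stars.

    Assumption: values are rounded to the nearest integer.
    """
    peak = max(column_weights, default=0.0)
    if peak <= 0:
        return [0] * len(column_weights)
    return [int(round(n_max * w / peak)) for w in column_weights]


def star_rows(counts, n_max=5):
    """Render the counts as n_max text rows, fullest row first."""
    return ["".join("*" if c > row else " " for c in counts)
            for row in range(n_max)]


counts = star_counts([0.0, 2.0, 10.0, 4.0])
print(counts)  # [0, 1, 5, 2]
for line in star_rows(counts):
    print(repr(line))
```

Because the scaling is relative to the best column of each alignment, star counts from two different alignments cannot be compared directly.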

This is the FASTA alignment format:

>HTL2
ldtapcLFSDGS------PQKAAYVLWDQTIL---QQDITPLPSHethSA
QKGELLALICGLRAAKPWPSLNIFLDSKYLIKYLHslaigaflgtsah--
-------QT---LQAALPPLLQGKTIYLHHVRSHT------NLPDPISTF
NEYTDSLILApl--------------------------------------
----------
>MMLV
pdadhtwYTDGSSLLQEGQRKAGAAVTTETeviwaKALDAG---T---SA
QRAELIALTQALKMAEgkk-LNVYTDSRYAFATAHIHGEIYRRRGLLTSE
GKEIKNKDE---ILALLKALFLPKRLSIIHCPGHQ------KGHSAEARG
NRMADQAARKAAITETPDTStll---------------------------
----------
>HEPB
rpglcQVFADAT------PTGWGLVMGHQRMR---GTFSAPLPIHt----
--AELLAACFArsrsgan---IIGTDN-----------------------
-------------SVVLSR--------------KYTSFPWLLGCAANWI-
LRGTSFVYVPSALNPADDPSrgrlglsrpllrlpfrpttgrtslyadsps
vpshlpdrvh
>ECOL
mlkqvEIFTDGSCLGNPGPGGYGAILRYRGRE---KTFSAGytrT---TN
NRMELMAAIVALEALKEHCEVILSTDSQYVRQGITQWIHNWKKRGWKTAD
KKPVKNVDlwqrLDAALGQ--------------HQIKWEWVKGHAGHPE-
NERCDELARAAAMNPTledtgyqvev------------------------
----------
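
The conventions described above (capital letters for aligned residues, "-" for gaps, and position numbers at the start of each line) can be recovered from a FASTA-format row with a few lines of Python. This is an illustrative sketch with hypothetical function names; the block width of 50 columns is taken from the example output.

```python
def block_starts(row, width=50):
    """Position of the first residue of each output line for one aligned row."""
    starts = []
    position = 1
    for i in range(0, len(row), width):
        starts.append(position)
        block = row[i:i + width]
        position += sum(1 for ch in block if ch != "-")  # gaps do not count
    return starts


def aligned_residues(row):
    """Number of residues that belong to at least one selected diagonal."""
    return sum(1 for ch in row if ch.isupper())


# HTL2 row of the example alignment, one string per output line.
htl2 = (
    "ldtapcLFSDGS------PQKAAYVLWDQTIL---QQDITPLPSHethSA"
    "QKGELLALICGLRAAKPWPSLNIFLDSKYLIKYLHslaigaflgtsah--"
    "-------QT---LQAALPPLLQGKTIYLHHVRSHT------NLPDPISTF"
    + "NEYTDSLILApl" + "-" * 38
    + "-" * 10
)
print(block_starts(htl2))      # [1, 42, 90, 124, 136]
print(aligned_residues(htl2))  # residues shown in capital letters
```

The start positions computed for HTL2 agree with the numbers printed in the DIALIGN-format listing.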

This is the PHYLIP tree format:

 
((HTL2:0.111024,
(MMLV:0.078471,
ECOL:0.078471):0.032554):0.121218,
HEPB:0.232242);


Trees can be visualized using the drawtree program contained in the PHYLIP software package.
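
The tree can also be inspected programmatically. The following Python sketch parses the parenthesized tree shown above and sums the branch lengths from the root to each leaf. Since UPGMA trees are ultrametric, all leaves should lie at the same distance from the root, up to rounding of the printed values. The parser is a minimal illustration with hypothetical function names; it handles only the subset of the format used in this example.

```python
def parse_newick(newick):
    """Parse a Newick string into nested (name, branch_length, children) tuples."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]            # empty for inner nodes
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()


def leaf_depths(tree, depth=0.0):
    """Return {leaf_name: summed branch length from the root}."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


example_tree = """((HTL2:0.111024,
(MMLV:0.078471,
ECOL:0.078471):0.032554):0.121218,
HEPB:0.232242);"""
for leaf, depth in leaf_depths(parse_newick(example_tree)).items():
    print(leaf, round(depth, 6))
```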

Fri Dec 14 12:50:24 2012