Universität Bielefeld - Technische Fakultät - AG Praktische Informatik - FSPM - Strukturbildunge

Divide-and-Conquer Multiple Sequence Alignment

Example DCA Alignment

As an example of DCA output, we present an alignment of six tyrosine kinase protein sequences, each 273 to 280 amino acids long, which Kececioglu also aligned using the maximum weight trace approach (Kececioglu, 1993).

In FASTA format, the input sequence file is as follows:

>SRC_RSVP                                                   
GLAKDAWEIPRESLRLEAKLGQGCFGEVWMGTWNDTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAV
VSEEPIYIVIEYMSKGSLLDFLKGEMGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVA
DFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERG
YRMPCPPECPESLHDLMCQCWRKDPEERPTFKYLQAQLLPACVLEVAE                           

>YES_AVISY                                                  
GLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTKVAIKTLKLGTMMPEAFLQEAQIMKKLRHDKLVPLYAV
VSEEPIYIVTEFMTKGSLLDFLKEGEGKFLKLPQLVDMAAQIADGMAYIERMNYIHRDLRAANILVGDNLVCKIA
DFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELVTKGRVPYPGMVNREVLEQVERG
YRMPCPQGCPESLHELMKLCWKKDPDERPTFEYIQSFLEDYFTAAEPSG                          

>ABL_MLVAB                                                  
TIYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNL
VQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGE
NHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQV
YELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSIS                    

>FES_FSVGA                                                  
VLNRAVPKDKWVLNHEDLVLGEQIGRGNFGEVFSGRLRADNTLVAVKSCRETLPPDIKAKFLQEAKILKQYSHPN
IVRLIGVCTQKQPIYIVMELVQGGDFLTFLRTEGARLRMKTLLQMVGDAAAGMEYLESKCCIHRDLAARNCLVTE
KNVLKISDFGMSREAADGIYAASGGLRQVPVKWTAPEALNYGRYSSESDVWSFGILLWETFSLGASPYPNLSNQQ
TREFVEKGGRLPCPELCPDAVFRLMEQCWAYEPGQRPSFSAIYQELQSIRKRHR                     

>FPS_FUJSV                                                  
VLTRAVLKDKWVLNHEDVLLGERIGRGNFGEVFSGRLRADNTPVAVKSCRETLPPELKAKFLQEARILKQCNHPN
IVRLIGVCTQKQPIYIVMELVQGGDFLSFLRSKGPRLKMKKLIKMMENAAAGMEYLESKHCIHRDLAARNCLVTE
KNTLKISDFGMSRQEEDGVYASTGGMKQIPVKWTAPEALNYGWYSSESDVWSFGILLWEAFSLGAVPYANLSNQQ
TREAIEQGVRLEPPEQCPEDVYRLMQRCWEYDPHRRPSFGAVHQDLIAIRKRHR                     

>KRAF_MSV36                                                 
SSYYWKMEASEVMLSTRIGSGSFGTVYKGKWHGDVAVKILKVVDPTPEQLQAFRNEVAVLRKTRHVNILLFMGYM
TKDNLAIVTQWCEGSSLYKHLHVQETKFQMFQLIDIARQTAQGMDYLHAKNIIHRDMKSNNIFLHEGLTVKIGDF
GLATVKSRWSGSQQVEQPTGSVLWMAPEVIRMQDDNPFSFQSDVYSYGIVLYELMAGELPYAHINNRDQIIFMVG
RGYASPDLSRLYKNCPKAIKRLVADCVKKVKEERPLFPQILSSIELLQHSLPKIN
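Before running an aligner, it can help to check the input. The following is a minimal Python sketch of a FASTA reader; the function name `read_fasta` is hypothetical and not part of DCA. Applied to the file above, it should yield six records whose lengths fall in the stated range of 273 to 280 residues.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()  # also drops trailing blanks after headers and sequences
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # finish the previous record
            name = line[1:].strip()
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)  # finish the last record
    return records


if __name__ == "__main__":
    example = ">SRC_RSVP\nGLAKDAWEIP\nRESLRLEAKL\n\n>KRAF_MSV36\nSSYYWKMEAS\n"
    for header, seq in read_fasta(example).items():
        print(header, len(seq))
```

Reading the whole file into one string keeps the sketch short; for very large inputs a line-by-line generator would be the more economical design.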

With the recursion stop size set to L = 40, DCA computes, within 1.7 seconds, an alignment that differs from the score-optimal one (PAM 250) by only a single gap:

Level                                          2  *********            
SRC_RSVP   -----GLAKDAWEIPRESLRLEAKLGQGCFGEVWMGTWND-TTRVAI-KTLKPG--TMSP
YES_AVISY  -----GLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNG-TTKVAI-KTLKLG--TMMP
ABL_MLVAB  TIYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAV-KTLKED--TMEV
FES_FSVGA  -VLNRAVPKDKWVLNHEDLVLGEQIGRGNFGEVFSGRLRADNTLVAV-KSCRETLPPDIK
FPS_FUJSV  -VLTRAVLKDKWVLNHEDVLLGERIGRGNFGEVFSGRLRADNTPVAV-KSCRETLPPELK
KRAF_MSV36 -------SSYYWKMEASEVMLSTRIGSGSFGTVYKGKWHG-DVAVKILKVVDPT--PEQL

Level                   1                                  2           
SRC_RSVP   EAFLQEAQVMKKLRHEKLVQLYAV-VSEEPIYIVIEYMSKGSLLDFLKGEMGKYLRLPQL
YES_AVISY  EAFLQEAQIMKKLRHDKLVPLYAV-VSEEPIYIVTEFMTKGSLLDFLKEGEGKFLKLPQL
ABL_MLVAB  EEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVL
FES_FSVGA  AKFLQEAKILKQYSHPNIVRLIGVCTQKQPIYIVMELVQGGDFLTFLRTEGAR-LRMKTL
FPS_FUJSV  AKFLQEARILKQCNHPNIVRLIGVCTQKQPIYIVMELVQGGDFLSFLRSKGPR-LKMKKL
KRAF_MSV36 QAFRNEVAVLRKTRHVNILLFMGY-MTKDNLAIVTQWCEGSSLYKHLHVQETK-FQMFQL

Level                             0                                    
SRC_RSVP   VDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAK
YES_AVISY  VDMAAQIADGMAYIERMNYIHRDLRAANILVGDNLVCKIADFGLARLIEDNEYTARQGAK
ABL_MLVAB  LYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAK
FES_FSVGA  LQMVGDAAAGMEYLESKCCIHRDLAARNCLVTEKNVLKISDFGMSREAADGIYAASGGLR
FPS_FUJSV  IKMMENAAAGMEYLESKHCIHRDLAARNCLVTEKNTLKISDFGMSRQEEDGVYASTGGMK
KRAF_MSV36 IDIARQTAQGMDYLHAKNIIHRDMKSNNIFLHEGLTVKIGDFGLATVKSRWSGSQQVEQP

Level      2                                    1                      
SRC_RSVP   -FPIKWTAPEAALY---GRFTIKSDVWSFGILLTELTTKGRVPYPGMVNR-EVLDQVERG
YES_AVISY  -FPIKWTAPEAALY---GRFTIKSDVWSFGILLTELVTKGRVPYPGMVNR-EVLEQVERG
ABL_MLVAB  -FPIKWTAPESLAY---NKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS-QVYELLEKD
FES_FSVGA  QVPVKWTAPEALNY---GRYSSESDVWSFGILLWETFSLGASPYPNLSNQ-QTREFVEKG
FPS_FUJSV  QIPVKWTAPEALNY---GWYSSESDVWSFGILLWEAFSLGAVPYANLSNQ-QTREAIEQG
KRAF_MSV36 TGSVLWMAPEVIRMQDDNPFSFQSDVYSYGIVLYELMA-GELPYAHINNRDQIIFMVGRG

Level                     2                                     
SRC_RSVP   YRMPCPP----ECPESLHDLMCQCWRKDPEERPTFKYLQAQLLPACVLEVAE-
YES_AVISY  YRMPCPQ----GCPESLHELMKLCWKKDPDERPTFEYIQSFLEDYFTAAEPSG
ABL_MLVAB  YRMERPE----GCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSIS-
FES_FSVGA  GRLPCPE----LCPDAVFRLMEQCWAYEPGQRPSFSAIYQELQSIRKRHR---
FPS_FUJSV  VRLEPPE----QCPEDVYRLMQRCWEYDPHRRPSFGAVHQDLIAIRKRHR---
KRAF_MSV36 YASPDLSRLYKNCPKAIKRLVADCVKKVKEERPLFPQILSSIELLQHSLPKIN

The numbers in the Level rows above the alignment mark the cut positions; each number gives the recursion level at which that cut was made. The asterisks mark the region where the alignment differs from a score-optimal one computed with MSA.
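The divide step behind these cut levels can be illustrated with a simplified sketch. The hypothetical Python function `divide` below recursively cuts all sequences into a left and a right part until no segment is longer than the stop size L; at that point DCA would align each small block and concatenate the results. The cut rule used here (halve the longest segment, cut every other segment at the proportional position) is a naive placeholder assumed only for illustration: DCA chooses its cut positions so that the combined alignment stays close to score-optimal, and this sketch computes no alignment at all.

```python
from typing import List, Optional, Tuple


def divide(seqs: List[str], L: int, level: int = 0,
           cuts: Optional[List[Tuple[int, List[int]]]] = None) -> List[List[str]]:
    """Split a family of sequences until every segment has length <= L.

    Returns the leaf blocks from left to right; each block holds one segment
    per input sequence. If `cuts` is given, (level, cut positions) pairs are
    appended to it; positions are relative to the current segments.
    NOTE: hypothetical placeholder cut rule, not DCA's cut-position choice.
    """
    if L < 1:
        raise ValueError("stop size L must be positive")
    lengths = [len(s) for s in seqs]
    n_ref = max(lengths, default=0)
    if n_ref <= L:
        return [seqs]  # recursion stops; DCA would align this block here
    mid = n_ref // 2  # halve the longest segment
    # Cut the others proportionally, rounding half up with integer arithmetic.
    positions = [(mid * n + n_ref // 2) // n_ref for n in lengths]
    if cuts is not None:
        cuts.append((level, positions))
    left = [s[:c] for s, c in zip(seqs, positions)]
    right = [s[c:] for s, c in zip(seqs, positions)]
    return divide(left, L, level + 1, cuts) + divide(right, L, level + 1, cuts)
```

With a longest sequence of 280 residues and L = 40, this placeholder rule cuts at levels 0, 1 and 2, seven cuts in all, which matches the number and levels of the cut marks above; the positions themselves are not comparable, since the marks above refer to alignment columns.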

The following table lists the scores of alignments computed by DCA for different stop lengths L and window sizes W. The number in parentheses gives the number of alignment positions at which the alignment differs from the score-optimal one.


\begin{tabular}{\vert l\vert rr\vert rr\vert rr\vert rr\vert}
\hline
~ &
\mult...
...\\
$L=300$\space & 61883 & (0) & ~ & ~ & ~ & ~ & ~ & ~ \\
\hline
\end{tabular}
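Assuming the table reports sum-of-pairs scores (the objective that MSA optimizes), the following Python sketch shows how such a score is accumulated over all pairs of rows of a gapped alignment. The function `sp_score`, the linear gap penalty `gap` per gap-residue pair, and the caller-supplied substitution function `sub` are all assumptions for illustration; the sketch does not reproduce the PAM 250 scoring and gap model behind the numbers in the table.

```python
from itertools import combinations
from typing import Callable, Sequence


def sp_score(rows: Sequence[str], sub: Callable[[str, str], int], gap: int) -> int:
    """Sum-of-pairs score of a gapped alignment ('-' marks a gap).

    Hypothetical illustration: linear gap penalty, gap-gap pairs ignored.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("alignment rows must have equal length")
    total = 0
    for i, j in combinations(range(len(rows)), 2):  # every pair of sequences
        for a, b in zip(rows[i], rows[j]):          # every column of the pair
            if a == "-" and b == "-":
                continue  # a gap aligned with a gap contributes nothing
            total += gap if "-" in (a, b) else sub(a, b)
    return total


if __name__ == "__main__":
    toy = lambda a, b: 1 if a == b else -1  # hypothetical match/mismatch scores
    print(sp_score(["AC-", "A-C"], toy, gap=-2))
```

Iterating over pairs first and columns second makes each pairwise projection explicit, which is the view a sum-of-pairs objective is built from.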

Alignments computed with DCA can, for example, serve as input to the SplitsTree program, which produces results such as the following:


 
Figure 1: Splits graph of the above alignment, computed by SplitsTree with default parameters.
\begin{figure}\centering\includegraphics[width=5in]{fig/splits_graph.ps}
\end{figure}
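To pass an alignment on to a phylogenetics tool, it must first be written in a format the tool reads. The following Python sketch writes aligned rows as a NEXUS DATA block; it assumes that SplitsTree is given the alignment as a NEXUS file, and the function name `to_nexus` is hypothetical. The page does not describe which import formats SplitsTree accepts.

```python
def to_nexus(rows: dict[str, str]) -> str:
    """Write aligned protein rows {name: gapped sequence} as a NEXUS DATA block."""
    nchar = {len(s) for s in rows.values()}
    if len(nchar) != 1:
        raise ValueError("need at least one row, and all rows must have equal length")
    width = max(len(name) for name in rows)  # pad names so sequences line up
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(rows)} NCHAR={nchar.pop()};",
        "  FORMAT DATATYPE=PROTEIN GAP=- MISSING=?;",  # '-' is DCA's gap symbol
        "  MATRIX",
    ]
    lines += [f"    {name.ljust(width)}  {seq}" for name, seq in rows.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_nexus({"SRC_RSVP": "-----GLAK", "ABL_MLVAB": "TIYGVSPNY"}))
```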



J. Stoye, V. Moulton