Divide-and-Conquer Multiple Sequence Alignment

Example DCA Alignment
As an example of DCA output, we present an alignment of six tyrosine kinase protein sequences, each between 273 and 280 amino acids long, which were also aligned by Kececioglu using the maximum weight trace approach (Kececioglu, 1993).
In FASTA format, the sequence file is the following:
>SRC_RSVP
GLAKDAWEIPRESLRLEAKLGQGCFGEVWMGTWNDTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAV
VSEEPIYIVIEYMSKGSLLDFLKGEMGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVA
DFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERG
YRMPCPPECPESLHDLMCQCWRKDPEERPTFKYLQAQLLPACVLEVAE
>YES_AVISY
GLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTKVAIKTLKLGTMMPEAFLQEAQIMKKLRHDKLVPLYAV
VSEEPIYIVTEFMTKGSLLDFLKEGEGKFLKLPQLVDMAAQIADGMAYIERMNYIHRDLRAANILVGDNLVCKIA
DFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELVTKGRVPYPGMVNREVLEQVERG
YRMPCPQGCPESLHELMKLCWKKDPDERPTFEYIQSFLEDYFTAAEPSG
>ABL_MLVAB
TIYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNL
VQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGE
NHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQV
YELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSIS
>FES_FSVGA
VLNRAVPKDKWVLNHEDLVLGEQIGRGNFGEVFSGRLRADNTLVAVKSCRETLPPDIKAKFLQEAKILKQYSHPN
IVRLIGVCTQKQPIYIVMELVQGGDFLTFLRTEGARLRMKTLLQMVGDAAAGMEYLESKCCIHRDLAARNCLVTE
KNVLKISDFGMSREAADGIYAASGGLRQVPVKWTAPEALNYGRYSSESDVWSFGILLWETFSLGASPYPNLSNQQ
TREFVEKGGRLPCPELCPDAVFRLMEQCWAYEPGQRPSFSAIYQELQSIRKRHR
>FPS_FUJSV
VLTRAVLKDKWVLNHEDVLLGERIGRGNFGEVFSGRLRADNTPVAVKSCRETLPPELKAKFLQEARILKQCNHPN
IVRLIGVCTQKQPIYIVMELVQGGDFLSFLRSKGPRLKMKKLIKMMENAAAGMEYLESKHCIHRDLAARNCLVTE
KNTLKISDFGMSRQEEDGVYASTGGMKQIPVKWTAPEALNYGWYSSESDVWSFGILLWEAFSLGAVPYANLSNQQ
TREAIEQGVRLEPPEQCPEDVYRLMQRCWEYDPHRRPSFGAVHQDLIAIRKRHR
>KRAF_MSV36
SSYYWKMEASEVMLSTRIGSGSFGTVYKGKWHGDVAVKILKVVDPTPEQLQAFRNEVAVLRKTRHVNILLFMGYM
TKDNLAIVTQWCEGSSLYKHLHVQETKFQMFQLIDIARQTAQGMDYLHAKNIIHRDMKSNNIFLHEGLTVKIGDF
GLATVKSRWSGSQQVEQPTGSVLWMAPEVIRMQDDNPFSFQSDVYSYGIVLYELMAGELPYAHINNRDQIIFMVG
RGYASPDLSRLYKNCPKAIKRLVADCVKKVKEERPLFPQILSSIELLQHSLPKIN
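Before running an aligner on such a file, it can be useful to read the records back and check their lengths. The following is a minimal Python sketch, not part of DCA: the function name `parse_fasta` is hypothetical, and the example input consists only of truncated prefixes of two of the records above.

```python
def parse_fasta(text):
    """Return a dict mapping each record name to its concatenated sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: take the first token after '>' as the record name.
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            records[name] = []
        elif name is not None:
            # Sequence line: may be one of several lines for the same record.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Truncated prefixes of two records, for illustration only (not the full input).
example = """>SRC_RSVP
GLAKDAWEIPRESL
RLEAKLGQ
>YES_AVISY
GLAKDAWEIPRESLRLEVKLGQ
"""

sequences = parse_fasta(example)
lengths = {name: len(seq) for name, seq in sequences.items()}
```

Applied to the complete file, the same `lengths` dictionary could be used to confirm that all six sequences fall in the stated range of 273 to 280 residues.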
With the recursion stop size set to L = 40, DCA computes, in 1.7 seconds, an alignment that differs from the score-optimal (PAM 250) one by only a single gap:
Level 2 *********
SRC_RSVP   -----GLAKDAWEIPRESLRLEAKLGQGCFGEVWMGTWND-TTRVAI-KTLKPG--TMSP
YES_AVISY  -----GLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNG-TTKVAI-KTLKLG--TMMP
ABL_MLVAB  TIYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAV-KTLKED--TMEV
FES_FSVGA  -VLNRAVPKDKWVLNHEDLVLGEQIGRGNFGEVFSGRLRADNTLVAV-KSCRETLPPDIK
FPS_FUJSV  -VLTRAVLKDKWVLNHEDVLLGERIGRGNFGEVFSGRLRADNTPVAV-KSCRETLPPELK
KRAF_MSV36 -------SSYYWKMEASEVMLSTRIGSGSFGTVYKGKWHG-DVAVKILKVVDPT--PEQL

Level 1 2
SRC_RSVP   EAFLQEAQVMKKLRHEKLVQLYAV-VSEEPIYIVIEYMSKGSLLDFLKGEMGKYLRLPQL
YES_AVISY  EAFLQEAQIMKKLRHDKLVPLYAV-VSEEPIYIVTEFMTKGSLLDFLKEGEGKFLKLPQL
ABL_MLVAB  EEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVL
FES_FSVGA  AKFLQEAKILKQYSHPNIVRLIGVCTQKQPIYIVMELVQGGDFLTFLRTEGAR-LRMKTL
FPS_FUJSV  AKFLQEARILKQCNHPNIVRLIGVCTQKQPIYIVMELVQGGDFLSFLRSKGPR-LKMKKL
KRAF_MSV36 QAFRNEVAVLRKTRHVNILLFMGY-MTKDNLAIVTQWCEGSSLYKHLHVQETK-FQMFQL

Level 0
SRC_RSVP   VDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAK
YES_AVISY  VDMAAQIADGMAYIERMNYIHRDLRAANILVGDNLVCKIADFGLARLIEDNEYTARQGAK
ABL_MLVAB  LYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAK
FES_FSVGA  LQMVGDAAAGMEYLESKCCIHRDLAARNCLVTEKNVLKISDFGMSREAADGIYAASGGLR
FPS_FUJSV  IKMMENAAAGMEYLESKHCIHRDLAARNCLVTEKNTLKISDFGMSRQEEDGVYASTGGMK
KRAF_MSV36 IDIARQTAQGMDYLHAKNIIHRDMKSNNIFLHEGLTVKIGDFGLATVKSRWSGSQQVEQP

Level 2 1
SRC_RSVP   -FPIKWTAPEAALY---GRFTIKSDVWSFGILLTELTTKGRVPYPGMVNR-EVLDQVERG
YES_AVISY  -FPIKWTAPEAALY---GRFTIKSDVWSFGILLTELVTKGRVPYPGMVNR-EVLEQVERG
ABL_MLVAB  -FPIKWTAPESLAY---NKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLS-QVYELLEKD
FES_FSVGA  QVPVKWTAPEALNY---GRYSSESDVWSFGILLWETFSLGASPYPNLSNQ-QTREFVEKG
FPS_FUJSV  QIPVKWTAPEALNY---GWYSSESDVWSFGILLWEAFSLGAVPYANLSNQ-QTREAIEQG
KRAF_MSV36 TGSVLWMAPEVIRMQDDNPFSFQSDVYSYGIVLYELMA-GELPYAHINNRDQIIFMVGRG

Level 2
SRC_RSVP   YRMPCPP----ECPESLHDLMCQCWRKDPEERPTFKYLQAQLLPACVLEVAE-
YES_AVISY  YRMPCPQ----GCPESLHELMKLCWKKDPDERPTFEYIQSFLEDYFTAAEPSG
ABL_MLVAB  YRMERPE----GCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSIS-
FES_FSVGA  GRLPCPE----LCPDAVFRLMEQCWAYEPGQRPSFSAIYQELQSIRKRHR---
FPS_FUJSV  VRLEPPE----QCPEDVYRLMQRCWEYDPHRRPSFGAVHQDLIAIRKRHR---
KRAF_MSV36 YASPDLSRLYKNCPKAIKRLVADCVKKVKEERPLFPQILSSIELLQHSLPKIN
The numbers above the alignment denote the cut positions. The asterisks denote the region where the alignment differs from a score-optimal one computed with MSA.
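To see why seven cuts at three recursion levels are plausible for sequences of this length and L = 40, the recursion can be sketched in a few lines of Python. This is only an illustration under two stated assumptions: the cut is placed exactly at the midpoint (DCA itself searches for good cut positions near the midpoint), and the recursion stops as soon as a segment is no longer than L. The function name `cut_levels` is hypothetical.

```python
def cut_levels(length, stop, level=0):
    """Return the recursion levels of all cuts, in left-to-right order."""
    if length <= stop:
        # Assumed stop rule: the segment is short enough to be aligned
        # directly by an exact method, so no further cut is made.
        return []
    left = length // 2      # assumption: cut exactly at the midpoint
    right = length - left
    return (cut_levels(left, stop, level + 1)
            + [level]
            + cut_levels(right, stop, level + 1))


# Longest and shortest sequence lengths from the example, with L = 40.
levels_longest = cut_levels(280, 40)   # [2, 1, 2, 0, 2, 1, 2]
levels_shortest = cut_levels(273, 40)  # [2, 1, 2, 0, 2, 1, 2]
```

Read from left to right across the five alignment blocks, the digits in the Level rows follow the same order: 2, 1, 2, 0, 2, 1, 2.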
The following table shows the scores of several alignments computed with DCA for different stop lengths L and window sizes W. The number in parentheses gives the number of alignment positions at which each alignment differs from the score-optimal one.
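The two quantities reported per alignment, a score and a count of differing positions, can be illustrated with a short Python sketch. It rests on several assumptions: the score is taken to be a sum-of-pairs score, `toy_score` is a hypothetical stand-in for a PAM 250 lookup with gap costs (including the sign convention), and "differing positions" is interpreted as columns of one alignment that do not occur in the other. The toy alignments are made up.

```python
from itertools import combinations

GAP = "-"


def toy_score(a, b):
    """Hypothetical pair score; a real run would use PAM 250 and gap costs."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def sum_of_pairs(rows, score=toy_score):
    """Add up the pair scores of all row pairs in every alignment column."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            total += score(a, b)
    return total


def columns(rows):
    """Describe each column by the residue index per row (None for a gap)."""
    counters = [0] * len(rows)
    result = []
    for column in zip(*rows):
        key = []
        for i, ch in enumerate(column):
            if ch == GAP:
                key.append(None)
            else:
                key.append(counters[i])
                counters[i] += 1
        result.append(tuple(key))
    return result


def differing_columns(rows_a, rows_b):
    """Count the columns of alignment A that do not occur in alignment B."""
    in_b = set(columns(rows_b))
    return sum(1 for col in columns(rows_a) if col not in in_b)


# Two made-up alignments of the same pair of sequences (ACGT and ACAGT).
a = ["AC-GT", "ACAGT"]
b = ["ACG-T", "ACAGT"]

score_a = sum_of_pairs(a)             # 2
score_b = sum_of_pairs(b)             # 0
n_diff = differing_columns(a, b)      # 2
```

Comparing columns by residue indices rather than by characters means that two columns count as equal only if they align the same sequence positions, not merely the same letters.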
Alignments computed with DCA can, for example, be used as input to the SplitsTree program, producing results similar to the following: