-- Torsten Will, mailto:towi@geocities.com, 11.10.99

COSTMATRICES

3.cost
    Symbols = { - A B M X Y }
    Used for test alignments with somehow "split" lattices.

dayhoff.cost
    The standard Dayhoff matrix used for my test results.

unitAC.cost
    Symbols = { - A C }
    g=1, d(-,*)=5, d(A,C)=10
    For alignments that are easy to compute by hand...

unitG1.cost
    Symbols = {Alphabet}
    g=1, d(-,*)=5, d(x,y)=10
    Same as unitAC.cost but with all letters.

SEQUENCE SETS

a*.seq
    Very simple sets of sequences. Mostly tested with 3.cost or
    unit*.cost.

cccc.seq
    Sample set that does not really align. Used with unitG1.cost.

usym.seq
    Gives a simple unsymmetrical 2-dim alignment with unitG1.cost.

diplom.{ab}.seq
    Test cases of a sort, with artificial (?) sequences.
    Use dayhoff.cost.

diplom.??.seq
    Real sequences from [gsa98] where k=??. The goal is that the
    program can align all these sets (k=14), because GSA1 did it.
    Use dayhoff.cost.

prot?.seq
    Real sequences -- I don't know the source -- with k=?.
    They seem hard to align (k=7). I used dayhoff.cost.

GETTING HELP FROM OMA

If you enter

    oma

you get the full help from oma before any part of gsalib is entered.
If you do something wrong on the command line, like

    oma --?

you get the command line help from an inner part of oma ('dca'),
which is not complete. Some parameters listed there are not yet
implemented or are completely useless in the context of oma.

SAMPLE CALLS OF OMA

I ran my big test sets on BAliBASE with a Makefile. The sequence
files (*.fasta) are used as input and are aligned to the output
files (*.oma-?-out).
Perhaps it is helpful for you to take a look at the lines which
concern oma:

    %.oma-a-out: %.fasta
            date > $@
            nice -5 oma -c dayhoff.cost -- '-v -W 10 -I 128 -T 18000' $< > $@ 2>&1

    %.oma-b-out: %.fasta
            date > $@
            oma -c dayhoff.cost -- '-v -I 1024 -T 18000 -M 19000' $< > $@ 2>&1

    %.oma-c-out: %.fasta
            date > $@
            oma -c dayhoff.cost -- '-v -I 1024 -W 1 -T 1000' $< > $@ 2>&1
            grep ime $@ | elm -s "Q: $@" torsten.will@mediaways.net

If you are really interested in oma and want to do such nice things
with it as "show-me-the-progress-of-the-algorithm-in-space", as I did
(with the "watch parameter" -W, gnuplot, perl and other dirty tricks),
feel free to mail me. I can send you my scripts and explain them.

ABOUT THE THINGS THAT ARE MISSING

* The -b option (of dca) for weighting sequences SHOULD work but is
  completely untested; testing it is the first thing to be done in
  the future.

* Triple alignments for a better lower bound. Some thought is needed
  on how to select a "good" set of triples to align before the
  multiple alignment is done.

* Read the epsilon (used for facebounding) from a file. Should be a
  relatively simple task. Who needs this feature?

* dca is able to benefit from precomputed blocks. Currently the
  optimal alignment part of oma cannot use this information. This
  should be done.

* Terminal gaps ALWAYS count at the moment. To be fixed.

* I already programmed a multithreaded oma which was able to
  precompute faces and distance matrices in parallel. It should be
  possible to parallelize the alignment of the segments. But one can
  say "when using p processors the memory is full p times faster".
  Is anyone interested in this "enhancement"?