Rose - Manual
$Id: manual.html,v 1.8 2003/11/24 14:44:38 hmersch Exp $
NAME
ROSE - Random-model Of Sequence Evolution
Rose implements a new probabilistic model of RNA-, DNA-, or
protein-sequence evolution.
SYNOPSIS
rose [-I <dir>[:<dir>]] <input file> | -
AVAILABILITY
http://bibiserv.TechFak.Uni-Bielefeld.DE/rose/
DESCRIPTION
Rose: generating sequence families
Jens Stoye (1), Dirk Evers (2), and Folker Meyer (2)
1 Research Center for Interdisciplinary Studies on Structure Formation (FSPM)
2 Technische Fakultaet, University of Bielefeld, Postfach 100 131,
33501 Bielefeld, Germany
Motivation: We present a new probabilistic model of the evolution of
RNA-, DNA-, or protein-like sequences and a software tool, Rose, that
implements this model. Guided by an evolutionary tree, a family of
related sequences is created from a common ancestor sequence by
insertion, deletion and substitution of characters. During this
artificial evolutionary process, the `true' history is logged and the
`correct' multiple sequence alignment is created simultaneously. The
model also allows for varying rates of mutation within the sequences,
making it possible to establish so-called sequence motifs.
Results: The data created by Rose are suitable for the evaluation of
methods in multiple sequence alignment computation and the prediction of
phylogenetic relationships. It can also be useful when teaching courses
in or developing models of sequence evolution and in the study of
evolutionary processes.
OPTIONS
-I dir[:dir]
A colon-separated list of directories that the input parser
searches for include files.
USAGE
rose <input file>| -
Input can be read from stdin (specify a '-' (minus) on the command line)
or from an input file.
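The two invocation forms can be driven from a script. The following is a minimal sketch, not part of Rose itself: the helper names rose_command and run_rose are hypothetical, and it assumes a rose binary is available on the PATH. It only uses the command-line forms given in the SYNOPSIS.

```python
import subprocess


def rose_command(input_file=None, include_dirs=()):
    """Build the argument list for a rose call, following the SYNOPSIS.

    input_file=None selects stdin, which rose expects to be written as '-'.
    """
    cmd = ["rose"]
    if include_dirs:
        # -I takes a single colon-separated list of directories.
        cmd += ["-I", ":".join(include_dirs)]
    cmd.append(input_file if input_file is not None else "-")
    return cmd


def run_rose(config_text, include_dirs=()):
    """Feed config_text to rose on stdin and return what it prints.

    Assumption: a `rose` executable is installed and on the PATH.
    """
    result = subprocess.run(
        rose_command(None, include_dirs),
        input=config_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, rose_command("sample2", ["/usr/local/share/rose"]) builds the call for a named input file with one include directory; the directory shown is only a placeholder.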
The input stream may contain the following parameters:
Name                    Type       Default            Optional  Comment
StdOut                  Boolean    True               Yes       output to stdout
OutputFilename          String     None               Yes       output to single filename
OutputFilebase          String     None               Yes       output to separate files named...
SequenceSuffix          String     ".fas"             Yes       sequence file suffix
AlignmentFormat         String     "PHYLIP"           Yes       "FASTA" or "PHYLIP"
AlignmentWithAncestors  Boolean    False              Yes       alignment will contain ancestors
AlignmentSuffix         String     (".fa" or ".phy")  Yes       alignment file suffix
TreeSuffix              String     ".tree"            Yes       tree file suffix
SequenceOutputLen       Integer    60                 Yes       length of sequence on a line
SeedVal                 Integer    None               Yes       seed of random number generator
SequenceLen             Integer    100                Yes       average sequence length
SequenceNum             Integer    10                 Yes       number of sequences
InputType               Integer    1                  Yes       1=Protein, 4=DNA
Relatedness             Integer    1                  Yes       nonsense default value!
ChooseFromLeaves        Boolean    True               Yes       output only leaf sequences
TreeWithSequences       Boolean    False              Yes       tree with sequences attached
TreeSequencesWithGaps   Boolean    False              Yes       sequences in tree will contain alignment gaps
TreeWithAncestors       Boolean    False              Yes       give all ancestors in the tree
TheTree                 Tree       None               Yes       tree in Phylip format
TheSequence             String     None               Yes       start sequence
ThePAMMatrix            FP Matrix  None               No!*      the mutation matrix
TheAlphabet             String     None               No!       the alphabet used
TheFreq                 FP Vector  None               No!       the average frequency of elements
TheInsertThreshold      FP         0.03               Yes       Insertion only % time
TheDeleteThreshold      FP         0.03               Yes       Deletion only % time
TheMutationProbability  FP Vector  [1.0+]             Yes       at a given site
TheDNAmodel             String     None               No!*      "JC","HKY","F81","F84","K2P"
MeanSubstitution        Double     0.01342302         Yes       mean substitution rate (all)
TransitionBias          Double     1.0                Yes       needed for HKY, K2P
TTratio                 Double     0.0                Yes       transition/transversion (F84)
NumberOfRuns            Integer    1                  Yes       number of rose-runs
TheInsFunc              FP Vector  None               No!       probability of a certain length
TheDelFunc              FP Vector  None               No!       probability of a certain length
* either ThePAMMatrix or TheDNAModel has to be specified !!
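TheMutationProbability takes one value per site of the start sequence. The sketch below builds such a vector so that a chosen motif is left unchanged. It assumes that a value of 0.0 suppresses mutation at that site, which is what the sample input in EXAMPLES suggests. The helper names motif_mutation_vector and format_vector are hypothetical and not part of Rose.

```python
def motif_mutation_vector(sequence, motif, protected=0.0, default=1.0):
    """Return one mutation value per site, with `protected` over the motif.

    Only the first occurrence of the motif is protected.
    """
    start = sequence.find(motif)
    if start < 0:
        raise ValueError(f"motif {motif!r} not found in sequence")
    vector = [default] * len(sequence)
    vector[start:start + len(motif)] = [protected] * len(motif)
    return vector


def format_vector(values):
    """Render the vector in the bracketed FP Vector notation."""
    return "[" + ",".join(f"{v:.1f}" for v in values) + "]"


if __name__ == "__main__":
    seq = "AGTCTGTACTATAATGGGAGGAAAGCC"
    print("TheMutationProbability =", format_vector(motif_mutation_vector(seq, "TATAAT")))
```

With the start sequence and the motif TATAAT shown here, the result is nine values of 1.0, six of 0.0 and twelve of 1.0, the same vector as in the sample input file.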
Assignment
==========
{Tag} = {Value} [;]
Example:
OutputFilename = "myoutput";
Includes
========
Includes may be placed anywhere between complete assignments in the
input file and may be nested to a given depth.
%include {include filename}
Example:
%include protein-defaults
Comments
========
Can be any of:
C type comments: /* A comment
stretching several lines */
C++ type comments: //Another comment ending with the line
Bourne Shell comments: # The hash has to be the first character on the line
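To illustrate how the three comment styles interact, here is a minimal sketch of a comment stripper. It is an illustration under stated assumptions, not the Rose parser: it assumes comment markers inside double-quoted strings are left alone and that a '#' starts a comment only in the first column. The function name strip_comments is hypothetical.

```python
def strip_comments(text):
    """Remove /* */, // and first-column # comments, keeping newlines."""
    out = []
    i, n = 0, len(text)
    line_start = True
    while i < n:
        ch = text[i]
        if (line_start and ch == "#") or text.startswith("//", i):
            # Line comment: skip up to, but not including, the newline.
            while i < n and text[i] != "\n":
                i += 1
            continue
        if text.startswith("/*", i):
            # Block comment: may stretch over several lines.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
            line_start = False
            continue
        if ch == '"':
            # Copy a double-quoted string verbatim.
            end = text.find('"', i + 1)
            end = n - 1 if end == -1 else end
            out.append(text[i:end + 1])
            i = end + 1
            line_start = False
            continue
        out.append(ch)
        line_start = ch == "\n"
        i += 1
    return "".join(out)
```

Newlines are preserved so that line numbers in the stripped text still match the original file.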
Type Description
================
Name     Regexp-like              Description                      Example
Integer  {DIGIT}+                 1 or more digits                 123
FP       {DIGIT}+"."{DIGIT}*      FP has to have "."               3.4 or .5
         "."{DIGIT}*
Boolean  [Tt]"rue"                True or false
         [Ff]"alse"
String   \"[^\"\n]*\"             double-quoted text, no newlines  "An Example"
Vector   [\[\{]{Objects}[\]\}]                                     [4], {.4,5.5}
Matrix                                                             [[3,2],[5,5]]
Tree                              Phylip tree                      (a,b,(c,d:5,e));
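The assignment syntax and the literal forms above are enough to generate input files from a script. The following is a minimal sketch under stated assumptions: the helper names format_value and format_assignment are hypothetical, the optional trailing semicolon is always written, and Tree values are not covered.

```python
def format_value(value):
    """Render a Python value in the literal forms listed above."""
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return "True" if value else "False"
    if isinstance(value, int):
        if value < 0:
            raise ValueError("Integer is described as digits only")
        return str(value)
    if isinstance(value, float):
        text = repr(value)
        # Reject forms such as 1e-05 or -0.5 that the FP pattern does not cover.
        if not all(c.isdigit() or c == "." for c in text):
            raise ValueError(f"cannot write {value!r} as digits and '.'")
        return text
    if isinstance(value, str):
        if '"' in value or "\n" in value:
            raise ValueError("String may not contain quotes or newlines")
        return f'"{value}"'
    if isinstance(value, (list, tuple)):  # Vector, or Matrix when nested
        return "[" + ",".join(format_value(v) for v in value) + "]"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def format_assignment(tag, value):
    """Return one '{Tag} = {Value};' line."""
    return f"{tag} = {format_value(value)};"
```

For example, format_assignment("TheFreq", [0.25, 0.25, 0.25, 0.25]) gives the line TheFreq = [0.25,0.25,0.25,0.25];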
Parse Errors
============
Parse errors are reported in compiler style, giving file names and line
numbers in a nested fashion together with the expected symbol.
Example:
In file included from sample1:2:
protein-defaults:13: parse error, expecting `EQ' or `OBRACE'
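Because the messages follow the compiler convention of file:line: message, they can be picked apart mechanically. The following is a minimal sketch that assumes every report has the shape of the example above; the names parse_error_report, INCLUDE_RE and ERROR_RE are hypothetical.

```python
import re

# "In file included from <file>:<line>:"
INCLUDE_RE = re.compile(r"^\s*In file included from (?P<file>[^:\s]+):(?P<line>\d+):\s*$")
# "<file>:<line>: <message>"
ERROR_RE = re.compile(r"^\s*(?P<file>[^:\s]+):(?P<line>\d+): (?P<message>.+)$")


def parse_error_report(text):
    """Return a list of errors, each with the include chain that led to it."""
    errors = []
    chain = []
    for raw in text.splitlines():
        included = INCLUDE_RE.match(raw)
        if included:
            chain.append((included["file"], int(included["line"])))
            continue
        error = ERROR_RE.match(raw)
        if error:
            errors.append({
                "file": error["file"],
                "line": int(error["line"]),
                "message": error["message"],
                "included_from": chain,
            })
            chain = []  # the next error starts a fresh include chain
    return errors
```

Applied to the two lines of the example, this yields one error in protein-defaults at line 13, reached through the include on line 2 of sample1.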
EXAMPLES
rose sample2
Takes this input file
----------------------------------------------
# Sample2 for ISMB97 Poster
%include dna-defaults
SequenceNum = 5
ChooseFromLeaves = True
TheSequence = "AGTCTGTACTATAATGGGAGGAAAGCC"
TheTree = ((a:3,b:5):5,(c:4,d:2,e:4):5,(f:3,g:4):6,(h:3,i:3):4);
TheMutationProbability =
[1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,
0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
----------------------------------------------
includes this default file
----------------------------------------------
#
# default rose include file for DNA
#
InputType = 4 // DNA
TheAlphabet = "ACGT"
TheFreq = [.25,.25,.25,.25]
TheInsertThreshold = 0.09
TheDeleteThreshold = 0.09
TheInsFunc = [.2,.2,.2,1,1,1,1]
TheDelFunc = [.2,.2,.2,1,1,1,1]
ThePAMMatrix = [[.97,.01,.01,.01],
[.01,.97,.01,.01],
[.01,.01,.97,.01],
[.01,.01,.01,.97]]
----------------------------------------------
results in something like this
----------------------------------------------
#i
ACGCTGTAGTATAATGGGAGGAACGCT
#h
ACTATGTCCAATCAACTATAATGGGAGGAACCCT
#e
AGTCCGTACTATAATGGGTTCCAGGAATGC
#d
AGTCAGTACTATAATGGGTTCCAGGAAAGC
#c
AGTCCGTAATATAATGTGTTCCAGGAATCC
Alignment:
i ACGCTGT-------AGTATAATGGG----AGGAACGCT
h ACTATGTCCAATCAACTATAATGGG----AGGAACCCT
e AGTCCGT-------ACTATAATGGGTTCCAGGAATGC-
d AGTCAGT-------ACTATAATGGGTTCCAGGAAAGC-
c AGTCCGT-------AATATAATGTGTTCCAGGAATCC-
(
(
(
i:3,
h:3):4,
(
e:4,
d:2,
c:4):5));
----------------------------------------------
Giving you:
1. The chosen sequences (leaves only, since ChooseFromLeaves = True)
2. Their alignment
3. The corresponding tree with distances
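A simple consistency check on this kind of output is that every alignment row, with its gap characters removed, equals the sequence of the same name. The following is a minimal sketch using two rows from the sample output above; the function names ungap and check_alignment are hypothetical, and '-' is assumed to be the only gap character.

```python
def ungap(row, gap="-"):
    """Remove gap characters from one alignment row."""
    return row.replace(gap, "")


def check_alignment(sequences, alignment):
    """Return the names whose ungapped row differs from their sequence."""
    if len({len(row) for row in alignment.values()}) > 1:
        raise ValueError("alignment rows differ in length")
    return [name for name, row in alignment.items()
            if ungap(row) != sequences.get(name)]


if __name__ == "__main__":
    sequences = {
        "i": "ACGCTGTAGTATAATGGGAGGAACGCT",
        "e": "AGTCCGTACTATAATGGGTTCCAGGAATGC",
    }
    alignment = {
        "i": "ACGCTGT-------AGTATAATGGG----AGGAACGCT",
        "e": "AGTCCGT-------ACTATAATGGGTTCCAGGAATGC-",
    }
    print(check_alignment(sequences, alignment))
```

An empty list means that all rows agree with their sequences.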
ENVIRONMENT
No environment variables are used.
FILES
protein-defaults   default config file to include for protein sequences
dna-defaults       default config file to include for DNA sequences
SEE ALSO
For a complete description of the functionality of ROSE see:
Stoye, J., Evers, D., & Meyer, F. (1998)
Rose: generating sequence families.
In Bioinformatics, Vol. 14, Issue 2, pp. 157-163.
http://www.oup.co.uk/bioinformatics/hdb/Volume_14/Issue_02/ps/btb005_gml.ps.gz
preprint version:
ftp://ftp.uni-bielefeld.de/pub/papers/techfak/pi/Report97-04.ps.gz
BUGS
If you encounter strange behaviour please contact:
mailto:folker@TechFak.Uni-Bielefeld.DE
mailto:dirk@TechFak.Uni-Bielefeld.DE
mailto:Jens.Stoye@CeBiTec.Uni-Bielefeld.DE