RNAforester - Manual
RNAforester calculates the similarity between two or multiple
RNA secondary structures. Note that the scoring scheme for
pairwise and multiple alignments differs slightly (see
below).
Input
The input sequences/structures are required in
Vienna (DotBracket) format. The first line of a
sequence/structure block starts with an '>' character followed
by an id (first word) and an optional description. The next line
contains the sequence information and the last line of a block
contains the structure information, where matching brackets
symbolize base-pairs and unpaired bases are represented by dots.
An example is given below:
>id1 description
accaguuacccauucgggaaccggu
.((..(((...)))..((..)))).
>id2 description
...
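The block layout described above can be checked before running an alignment. The following is a minimal sketch, not part of RNAforester: the names Record, check_structure and parse_vienna are hypothetical, and it assumes every block has exactly three non-empty lines and that structures use only '(', ')' and '.'.

```python
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    id: str
    description: str
    sequence: str
    structure: str


def check_structure(sequence: str, structure: str) -> None:
    """Raise ValueError unless the structure is a balanced dot-bracket string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure differ in length")
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without a partner")
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if depth != 0:
        raise ValueError("opening bracket without a partner")


def parse_vienna(text: str) -> Iterator[Record]:
    """Yield one Record per three-line block (header, sequence, structure)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("input is not a whole number of three-line blocks")
    for i in range(0, len(lines), 3):
        header, sequence, structure = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"expected '>' header, got {header!r}")
        # The first word after '>' is the id, the rest is the description.
        ident, _, description = header[1:].strip().partition(" ")
        check_structure(sequence, structure)
        yield Record(ident, description.strip(), sequence, structure)
```

Calling list(parse_vienna(text)) on the first block of the example yields a single record with id "id1"; an unbalanced structure line raises ValueError instead of being passed on silently.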
Global, local and small-in-large alignment
Local similarity means finding the maximal similarity between
substructures of RNA secondary structures. If these substructures
are extended, the score decreases. This requires a scoring scheme
that balances positive and negative scoring contributions.
Otherwise, the similarity of the complete structures would always
achieve the maximum score. It is generally assumed that an
alignment of two empty structures scores zero. A localized
variant of distance makes no sense, because empty forests always
have the lowest possible distance of zero.
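The need for balanced scores can be illustrated with a one-dimensional analogy. The sketch below is not RNAforester's algorithm: it works on a plain list of per-position scores rather than on forests, and the function name best_window and the score values are hypothetical. It finds the contiguous window with the highest total, where the empty window scores zero.

```python
from typing import List, Tuple


def best_window(scores: List[float]) -> Tuple[float, int, int]:
    """Return (best score, start, end) of the best contiguous window.

    The empty window scores zero, mirroring the convention that an
    alignment of two empty structures scores zero.
    """
    best, best_start, best_end = 0.0, 0, 0
    running, start = 0.0, 0
    for i, s in enumerate(scores):
        if running <= 0:
            # A non-positive prefix cannot help: restart the window here.
            running, start = 0.0, i
        running += s
        if running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end


# Mixed signs: the best window is a proper sub-region.
print(best_window([-2, 3, 4, -5, 1]))   # (7.0, 1, 3)
# Only positive contributions: the whole input always wins.
print(best_window([2, 3, 4, 5, 1]))     # (15.0, 0, 5)
```

With mixed signs, extending the window beyond positions 1 to 2 lowers the score. With positive scores only, the result is always the complete input, which is the degenerate case described above.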
Substructures of RNA secondary structures can be defined in
different ways. Here, it means that the substructures are
contiguous and "closed" by hairpin loops (no stem without its
closing loop(s)). The blue region shows a valid substructure.
The green part of the structure is not closed because the closing
hairpin is missing. The red part shows a substructure that is not
considered as a local structure for the same reason. However,
this is less obvious since only the $U$, which is a child of the
root of this subtree, is not included. If the top-level $P$ node
were not included in the red substructure, this part would
correspond to a closed subforest. The yellow part does not
correspond to a closed subforest since the subtrees are not
consecutive siblings.
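The definition of a closed substructure can be tested directly on a dot-bracket string. The sketch below rests on a simplifying assumption: each base pair is treated as a single node whose children are the elements it encloses, so a closed subforest is a contiguous region whose brackets are balanced. RNAforester's own forest representation may differ in detail (for example in how pairing bases are attached to a $P$ node), and the function name is hypothetical.

```python
from typing import Iterable


def is_closed_substructure(structure: str, positions: Iterable[int]) -> bool:
    """True if `positions` (0-based) select a contiguous, balanced region."""
    chosen = sorted(set(positions))
    if not chosen:
        return False  # the empty selection is not reported here
    start, end = chosen[0], chosen[-1] + 1
    if start < 0 or end > len(structure):
        return False
    if chosen != list(range(start, end)):
        return False  # gap, e.g. a stem without its closing loop
    depth = 0
    for ch in structure[start:end]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # pairing partner lies left of the region
    return depth == 0  # non-zero: a pairing partner lies right of the region


s = ".((..(((...)))..((..)))."
hairpin = range(5, 14)                       # "(((...)))"
stem_only = [5, 6, 7, 11, 12, 13]            # same stem, loop left out
two_apart = list(range(5, 14)) + list(range(16, 22))  # skips the ".." between
print(is_closed_substructure(s, hairpin))    # True
print(is_closed_substructure(s, stem_only))  # False
print(is_closed_substructure(s, two_apart))  # False
```

The second call corresponds to a stem whose closing hairpin is missing, and the third to subtrees that are not consecutive siblings.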
Scoring models

Structural edit operations of Jiang et al.'s general edit model
for RNA structures.
The sequence edit operations base match, base
mismatch and base deletion are the same for pairwise
and multiple alignment. A base pair breaking means the
deletion of a base-pair bond. A base-pair deletion is the
composition of a base-pair breaking and two base
deletions. A base-pair altering is treated likewise,
but only one base deletion is involved. The
structural edit operation base pair replacement has
a different effect in pairwise and multiple alignment. In
pairwise alignment mode, the pairing bases are treated as a unit.
In multiple alignment mode, the base pair replacement score
is the score for matching any base pair plus the score for
matching or mismatching the bases that pair. Thus, it is not
possible to construct a base-pair dependent scoring for this
model. The RIBOSUM scoring scheme consists of empirically derived
base-pair and single-base substitution scores; it is available
in pairwise alignment mode.
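The difference between the two treatments of base pair replacement can be sketched as follows. All numeric values, the function names and the small lookup table are hypothetical and are not RNAforester defaults or RIBOSUM entries; the sketch also assumes that both paired bases contribute a match or mismatch score in multiple alignment mode.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]

# Hypothetical values for illustration only.
PAIR_MATCH = 10     # matching any base pair with any other base pair
BASE_MATCH = 1
BASE_MISMATCH = 0


def base_score(a: str, b: str) -> int:
    return BASE_MATCH if a == b else BASE_MISMATCH


def multiple_mode_replacement(p: Pair, q: Pair) -> int:
    """Pair-match score plus independent scores for the two aligned bases."""
    return PAIR_MATCH + base_score(p[0], q[0]) + base_score(p[1], q[1])


def pairwise_mode_replacement(
    p: Pair, q: Pair, table: Dict[Tuple[Pair, Pair], int]
) -> int:
    """Score the two pairs as units via a pair-to-pair lookup table."""
    return table[(p, q)]


gc, cg, au = ("g", "c"), ("c", "g"), ("a", "u")

# Decomposed scoring sees only per-base matches, so these two
# replacements necessarily receive the same score.
print(multiple_mode_replacement(gc, cg))  # 10
print(multiple_mode_replacement(gc, au))  # 10

# A unit-based table (made-up numbers) can tell them apart.
table = {(gc, cg): 4, (gc, au): 7}
print(pairwise_mode_replacement(gc, cg, table))  # 4
print(pairwise_mode_replacement(gc, au, table))  # 7
```

Because the decomposed score depends only on whether each base matches, any two replacements with the same per-base match pattern score identically, which is why a base-pair dependent scheme cannot be expressed in that model.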
Multiple alignment mode
Output