Sequence Format:
An input sequence for RNAshapes may have up to 2k nucleotides.
RNAshapes supports sequences in FASTA format via file upload or
copy and paste into a text field. A sequence in FASTA format
begins with a single-line description, followed by lines of
sequence data.
- The description line starts with a greater than symbol
(">").
- The word immediately following the greater than symbol (">")
is the "ID" (name) of the sequence; the rest of
the line is the description.
- The "ID" and the description are optional.
- All lines of text should be shorter than 80 (normally 60)
characters.
- The sequence ends when another greater than symbol (">")
appears at the beginning of a line, where the next
sequence begins.
The following example contains one sequence (sequence_1):
>sequence_1
caagcacagaaacctatggcataaatccctctgagacgcgttgtactatggttatctaat
tctccggcgacacaagttgtctaaccgtgatcaccttaaagggcaagccgcccaatagat
gttagttaatactacgtaccaagtatgcctgcgcttggtaaagccgcctgtccatagttc
tactagggtagagcttcaggatgctctatagttcgagcggttctttgatcaactcgacta
gctaccaccatgtctgtgttttattgcacgcaaagtcgtaagtttaaacggaccaagaag
ccttcttcggtcagtagcaggttaagggccaagtacaagcctctccaggaatgcttaacg
gcatcgatgcaacttggacaagtaaacatcctgaagctta
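The FASTA rules above can be sketched as a small parser. This is
only an illustration of the format description, not the code
RNAshapes uses; the function names parse_fasta and _finish are
made up for this example.

```python
def _finish(header, chunks):
    """Split a header into ID and description and join the sequence lines."""
    # The first word of the header is taken as the ID, the rest as the
    # description. Both may be empty, since both are optional.
    # Assumption: whitespace directly after ">" is skipped leniently.
    parts = (header or "").split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description, "".join(chunks)


def parse_fasta(text):
    """Return a list of (seq_id, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A ">" at the beginning of a line ends the previous sequence.
            if header is not None or chunks:
                records.append(_finish(header, chunks))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None or chunks:
        records.append(_finish(header, chunks))
    return records


if __name__ == "__main__":
    demo = ">sequence_1 toy example\nACGU\nGGCC\n"
    for seq_id, description, sequence in parse_fasta(demo):
        print(seq_id, description, len(sequence))
```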
Shape type:
The shape type is the level of abstraction, that is, the degree
of dissimilarity at which two structures count as different
shapes. In general, helical regions
are depicted by a pair of opening and closing square brackets
and unpaired regions are represented as a single underscore.
The shape types differ in whether a
structural element (bulge loop, internal loop, multiloop,
hairpin loop, stacking region and external loop) contributes to
the shape representation. Five types are implemented. Their
differences are shown in the following example:
CGUCUUAAACUCAUCACCGUGUGGAGCUGCGACCCUUCCCUAGAUUCGAAGACGAG
((((((...(((..(((...))))))...(((..((.....))..)))))))))..
| Type | Description | Result |
| 1 | Most accurate - all loops and all unpaired | [_[_[]]_[_[]_]]_ |
| 2 | Nesting pattern for all loop types and unpaired regions in external loop and multiloop | [[_[]][_[]_]] |
| 3 | Nesting pattern for all loop types but no unpaired regions | [[[]][[]]] |
| 4 | Helix nesting pattern in external loop and multiloop | [[][[]]] |
| 5 | Most abstract - helix nesting pattern and no unpaired regions | [[][]] |
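The five abstraction levels can be illustrated with a short
sketch that turns a dot-bracket structure into a shape string.
The rules in this code are inferred from the five results in the
table above (for example, at type 2 the unpaired bases of bulge
and internal loops are shown, at type 4 only internal loops open
a new bracket pair). It is not taken from the RNAshapes source,
so edge cases may be handled differently there. All function
names are hypothetical.

```python
def pair_table(structure):
    """Map each opening position to its closing partner (dot-bracket input)."""
    stack, partner = [], {}
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % (pos + 1))
            partner[stack.pop()] = pos
        elif char != ".":
            raise ValueError("unexpected character %r" % char)
    if stack:
        raise ValueError("unbalanced '('")
    return partner


def children(structure, partner, start, end):
    """Opening positions of the helices directly inside structure[start:end]."""
    found, pos = [], start
    while pos < end:
        if structure[pos] == "(":
            found.append(pos)
            pos = partner[pos] + 1
        else:
            pos += 1
    return found


def loop_region(structure, partner, start, end, level):
    """Shape of an external loop or a multiloop interior."""
    parts, pos = [], start
    while pos < end:
        if structure[pos] == "(":
            parts.append(helix(structure, partner, pos, level))
            pos = partner[pos] + 1
        else:
            # Assumption from the table: unpaired runs in these loops are
            # only shown at type 1, as one "_" per run.
            if level == 1 and (not parts or parts[-1] != "_"):
                parts.append("_")
            pos += 1
    return "".join(parts)


def helix(structure, partner, i, level):
    """Shape of the helix whose outermost pair opens at position i."""
    j = partner[i]
    while True:
        inner = children(structure, partner, i + 1, j)
        if not inner:
            return "[]"  # hairpin loop: its unpaired bases are never shown
        if len(inner) > 1:
            return "[" + loop_region(structure, partner, i + 1, j, level) + "]"
        k, l = inner[0], partner[inner[0]]
        left, right = k - i - 1, j - l - 1
        stacked = left == 0 and right == 0
        bulge = not stacked and (left == 0 or right == 0)
        if stacked or level == 5 or (level == 4 and bulge):
            i, j = k, l  # the loop does not interrupt the helix at this level
            continue
        lead = "_" if level <= 2 and left else ""
        trail = "_" if level <= 2 and right else ""
        return "[" + lead + helix(structure, partner, k, level) + trail + "]"


def shape(structure, level):
    """Shape string of a dot-bracket structure for shape type 1 to 5."""
    if level not in (1, 2, 3, 4, 5):
        raise ValueError("shape type must be 1..5")
    partner = pair_table(structure)
    return loop_region(structure, partner, 0, len(structure), level)


if __name__ == "__main__":
    example = "((((((...(((..(((...))))))...(((..((.....))..))))))))).."
    for level in range(1, 6):
        print(level, shape(example, level))
```

For the example structure above, this sketch reproduces the
Result column of the table for all five types.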
The following image also describes the differences
between shape types:
Match shape:
Specify a shape for the corresponding mode of operation.
Calculate structure probabilities:
This calculates the probability of every computed structure. It
can be combined with any sequence analysis mode.
Generate structure graphs:
This generates PostScript structure graphs for each given
sequence.
Allow lonely base pairs:
In default mode, RNAshapes only considers helices of length 2
or longer. With this option, lonely base pairs are also
included.
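To make the term concrete: a base pair (i, j) is commonly called
lonely when neither (i-1, j+1) nor (i+1, j-1) is also a base
pair, so that it forms a helix of length 1. The following sketch
lists such pairs in a dot-bracket structure. The function name
lonely_pairs is made up for this example, and the code only
illustrates the definition, not how RNAshapes applies it.

```python
def lonely_pairs(structure):
    """Return the lonely base pairs of a dot-bracket string as 0-based (i, j)."""
    stack, pairs = [], set()
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '('")
    # A pair is lonely if it has no directly stacked neighbour pair.
    return sorted(
        (i, j)
        for i, j in pairs
        if (i - 1, j + 1) not in pairs and (i + 1, j - 1) not in pairs
    )


if __name__ == "__main__":
    print(lonely_pairs("((..(...)..))"))
```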
Ignore unstable structures:
This option filters out closed structures with positive free
energy.
Window size & Window increment:
Beginning with position 1 of the input sequence, the analysis
is repeatedly processed on subsequences of the specified size.
After each calculation, the results are printed out and the
window is moved by the window position increment, until the end
of the input sequence is reached.
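The window mechanism can be sketched as follows. The text does
not say how a final, shorter window is treated, so this sketch
assumes that only full-size windows are analysed (and a single
window if the sequence is shorter than the window size). The
function name windows is hypothetical.

```python
def windows(sequence, size, increment):
    """Yield (start_position, subsequence) pairs, positions counted from 1."""
    if size < 1 or increment < 1:
        raise ValueError("size and increment must be positive")
    # Assumption: stop once a full-size window no longer fits.
    last_start = max(len(sequence) - size, 0)
    for start in range(0, last_start + 1, increment):
        yield start + 1, sequence[start:start + size]


if __name__ == "__main__":
    for position, subsequence in windows("ACGUACGUAC", size=4, increment=3):
        print(position, subsequence)
```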
Set maximum loop length:
This option sets the maximum lengths of the considered internal
and bulge loops. The default value here is 30. Note that this
restriction can have a very slight influence on the calculated
structure and shape probabilities.
Normal probability mode:
This is the default shape probabilities mode.
Also calculate shreps:
Calculates the shape probabilities based on the partition function.
In addition to the standard probability mode, the corresponding
shreps (shape representative structures) with their minimum free
energies are calculated. Note that this mode is slightly slower
and can be used with sequences up to a length of 250 bases.
Shape probabilities for mfe-best shapes:
This mode first calculates the best shapes based on free energy
minimization. In a second step, it calculates the probability
for each of these best shapes. This mode can be used for longer
sequences (up to 500 bases).
Energy range:
This sets the energy range either as a percentage of the
minimum free energy (% of mfe) or as the difference to the
minimum free energy for the sequence (kcal/mol).
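As a worked example of the two units, the sketch below computes
the highest free energy that still falls inside the range. It
assumes that the percentage refers to the absolute value of the
minimum free energy and that structures up to mfe + range are
kept. The function name energy_threshold is made up for this
illustration.

```python
def energy_threshold(mfe, value, unit="kcal/mol"):
    """Upper energy bound (kcal/mol) for structures inside the energy range."""
    if value < 0:
        raise ValueError("energy range must not be negative")
    if unit == "%":
        # Assumption: percentage of the absolute value of the mfe.
        width = abs(mfe) * value / 100.0
    elif unit == "kcal/mol":
        width = value
    else:
        raise ValueError("unit must be '%' or 'kcal/mol'")
    return mfe + width


if __name__ == "__main__":
    # With an mfe of -20 kcal/mol, a range of 10 % spans 2 kcal/mol.
    print(energy_threshold(-20.0, 10, unit="%"))
    print(energy_threshold(-20.0, 3.5, unit="kcal/mol"))
```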
Probability cutoff filter:
This option sets a barrier for filtering out results with very
low probabilities during calculation. The default value here is
0.000001, which gives a significant speedup compared to a
disabled filter. Note that this filter can have a slight
influence on the overall results.
Probability output filter:
This option sets a filter for omitting low probability results
during output. Unlike the probability cutoff filter, this option
does not have any influence on probabilities beyond this
value.
Number of sampling iterations:
Number of iterations for Sampling mode.
Omit sampling output:
Omit sampling output for Sampling mode.
RapidShapes calculates exact probabilities for RNA
abstract shapes. Since it is a runtime heuristic, it calculates
these exact values much faster than the exhaustive version of
RNAshapes for most RNA input sequences. This speed-up is
gained by first guessing a handful of promising shapes. In a
second phase, the exact shape probability is calculated in
O(n^3) time for each promising shape.
The difference from the exhaustive version is that
RapidShapes analyses only the promising shapes instead
of all the exponentially many shapes that exist for an input
sequence. Thus the speed-up is the ratio between the number of
promising shapes and the number of all shapes for the input
sequence. Fewer promising shapes means faster runtime.
RapidShapes uses the sampling method to guess promising
shapes.
Number of sampling iterations:
RapidShapes uses a sampling method to obtain promising
shapes for which it calculates exact shape probabilities in a
second phase.
Sampling is a stochastic process where one RNA structure is
drawn out of the complete folding space of the input sequence.
The chance to draw a particular RNA structure depends on its
free energy.
Repeating this process <Number of sampling
iterations> times and translating the RNA structures
into shapes, the shape probabilities can simply be estimated by
counting how often each shape appears.
More iterations raise the chance of observing more diverse
shapes and thus increase the number of promising shapes for
RapidShapes.
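The counting idea can be sketched on a toy folding space. The
sketch assumes that structures are drawn with Boltzmann weights
exp(-E/RT), which is the usual way a free energy translates into
a sampling chance. The list of (shape, free energy) pairs is
invented data, and the function names are hypothetical. A real
run samples structures from the partition function instead of
enumerating them.

```python
import math
import random
from collections import Counter

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)


def boltzmann_weight(energy, temperature=310.15):
    """Unnormalised sampling weight of a structure with the given free energy."""
    return math.exp(-energy / (GAS_CONSTANT * temperature))


def estimate_shape_probabilities(folding_space, iterations, rng):
    """Estimate shape probabilities by counting shapes of sampled structures.

    folding_space: list of (shape, free_energy) pairs, one per structure.
    """
    weights = [boltzmann_weight(energy) for _, energy in folding_space]
    drawn = rng.choices(folding_space, weights=weights, k=iterations)
    counts = Counter(shape for shape, _ in drawn)
    return {shape: count / iterations for shape, count in counts.items()}


if __name__ == "__main__":
    # Toy data: three structures, two of which share the shape "[]".
    toy_space = [("[]", -5.0), ("[]", -4.5), ("[][]", -3.0)]
    estimate = estimate_shape_probabilities(toy_space, 1000, random.Random(1))
    for shape, probability in sorted(estimate.items()):
        print(shape, probability)
```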
Minimal shape probability threshold:
The lower the shape probability, the less likely it is to find
an RNA sequence forming a corresponding structure in a cell. Since
form follows function, one would expect a functional RNA to
have a relatively high shape probability or at least a
probability of <Minimal shape probability
threshold> percent.
The problem definition of RapidShapes is as follows:
Given an RNA sequence s of length n and a threshold 0<T≤1,
compute all shapes p of s with Prob(p)≥T. This definition
permits that some shapes with sub-threshold probability will
also be computed, but the goal is, of course, to minimize the
effort spent on those. (T is the variable <Minimal
shape probability threshold>.)
When the accumulated probability of all analyzed shapes exceeds
1-T, no additional shape with Prob(p)≥T can hide in the remaining unexplored
folding space. Thus RapidShapes can stop the
calculation.
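The stopping rule can be sketched as a simple loop. Once the
analysed shapes cover more than 1-T of the probability mass, the
rest sums to less than T, so no unexplored shape can reach T. The
candidate list and its probabilities are invented for this
example, and the function name explore_until_covered is
hypothetical. In RapidShapes each probability comes from the
exact second-phase calculation.

```python
def explore_until_covered(candidates, threshold):
    """Analyse candidates in order until more than 1 - threshold is covered.

    candidates: iterable of (shape, exact_probability) pairs.
    Returns (shapes with probability >= threshold, accumulated probability,
    number of shapes analysed).
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must satisfy 0 < T <= 1")
    analysed, accumulated = [], 0.0
    for shape, probability in candidates:
        analysed.append((shape, probability))
        accumulated += probability
        if accumulated > 1.0 - threshold:
            break  # the remaining folding space holds less than T in total
    hits = [(s, p) for s, p in analysed if p >= threshold]
    return hits, accumulated, len(analysed)


if __name__ == "__main__":
    toy = [("[]", 0.6), ("[][]", 0.25), ("[[][]]", 0.08), ("[][][]", 0.05)]
    hits, covered, count = explore_until_covered(toy, threshold=0.1)
    print(hits, round(covered, 2), count)
```

In this toy run the third shape is analysed although its
probability is below T, which is the extra effort the problem
definition allows for, and the fourth shape is never analysed.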