BiBiServ - Bielefeld University Bioinformatic Service

RNAshapes - Manual


Program mode

RNAshapes offers six major program modes:

  • Shape folding:
    RNA folding based on abstract shapes. This is the standard mode of operation when no other options are given. It calculates the shapes and the corresponding shreps (shape representative structures) based on free energy minimization.
  • Suboptimal shape folding:
    Complete suboptimal folding of RNA. This mode uses a non-ambiguous grammar that also handles dangling bases of multiloop components in a non-ambiguous way.
  • Shape probabilities:
    This mode calculates the shape probabilities based on the partition function. The probability of a shape is the sum of the probabilities of all structures that fall into this shape.
  • RapidShapes:
    Computation of shape probabilities requires exponential runtime. RapidShapes is a runtime heuristic that still computes exact probability values. It computes the shapes above a specified probability threshold T by generating a list of promising shapes and constructing a specialized folding program for each shape to compute its probability.
  • Sampling:
    Probabilistic sampling based on the partition function. This mode combines stochastic sampling with a-posteriori shape abstraction. A sample from the structure space holds M structures together with their shapes, on which classification is performed. The probability of a shape can then be approximated by its frequency in the sample.
    Sequences up to a length of around 1500 can be handled with this mode. In our experience, 1000 iterations are sufficient to achieve reasonable results for shapes with high probability.
  • Consensus shapes:
    For a family of RNA sequences, this method enumerates the near-optimal abstract shape space of each sequence independently and predicts as the consensus an abstract shape common to all sequences. For each sequence, it delivers the thermodynamically best structure which has this common shape. Since the shape space is much smaller than the structure space, and identification of common shapes can be done in linear time (in the number of shapes considered), the method is essentially linear in the number of sequences.
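
The definition of a shape probability given for the Shape probabilities mode can be illustrated with a few lines of Python. This is a minimal sketch and not the RNAshapes implementation: it assumes that each structure is weighted by its Boltzmann factor exp(-E/RT), and the three-structure folding space, the energies and the function name shape_probabilities are hypothetical values chosen for illustration only.

```python
import math
from collections import defaultdict

# Gas constant in kcal/(mol*K) times 37 degrees C in Kelvin.
RT = 0.0019872 * 310.15


def shape_probabilities(structures, rt=RT):
    """Sum Boltzmann weights per shape and normalise by the partition function.

    structures: iterable of (shape, free_energy_in_kcal_per_mol) pairs that
    is assumed to cover the complete folding space.
    """
    weight_per_shape = defaultdict(float)
    for shape, energy in structures:
        weight_per_shape[shape] += math.exp(-energy / rt)
    partition_function = sum(weight_per_shape.values())
    return {s: w / partition_function for s, w in weight_per_shape.items()}


# Hypothetical toy folding space: two hairpin structures and one
# structure with two adjacent hairpins.
toy_space = [("[]", -3.0), ("[]", -2.5), ("[][]", -2.8)]
probabilities = shape_probabilities(toy_space)
for shape, p in sorted(probabilities.items()):
    print(shape, round(p, 3))
```

Both hairpin structures contribute to the probability of the shape [], which is why a shape can be more probable than any single structure it contains.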

Parameter Description

Sequence Format:
An input sequence for RNAshapes may have up to 2k nucleotides. RNAshapes accepts sequences in FASTA format, either via file upload or by copy and paste into a text field. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.

  • The description line starts with a greater than symbol (">").
  • The word immediately following the greater than symbol (">") is the "ID" (name) of the sequence; the rest of the line is the description.
  • The "ID" and the description are optional.
  • All lines of text should be shorter than 80 (normally 60) characters.
  • A sequence ends when another greater than symbol (">") appears at the beginning of a line, which starts the next sequence.
The following example contains one sequence (sequence_1):
>sequence_1
caagcacagaaacctatggcataaatccctctgagacgcgttgtactatggttatctaat
tctccggcgacacaagttgtctaaccgtgatcaccttaaagggcaagccgcccaatagat
gttagttaatactacgtaccaagtatgcctgcgcttggtaaagccgcctgtccatagttc
tactagggtagagcttcaggatgctctatagttcgagcggttctttgatcaactcgacta
gctaccaccatgtctgtgttttattgcacgcaaagtcgtaagtttaaacggaccaagaag
ccttcttcggtcagtagcaggttaagggccaagtacaagcctctccaggaatgcttaacg
gcatcgatgcaacttggacaagtaaacatcctgaagctta
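
The FASTA rules listed above can be sketched as a small parser. This is only an illustration of the format, not the input handling used by the server; the function name parse_fasta and the dictionary layout are assumptions, and sequence lines that appear before any description line are assigned to a record with an empty ID.

```python
def parse_fasta(text):
    """Split FASTA text into records with 'id', 'description' and 'sequence'."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # First word after ">" is the ID, the rest is the description.
            parts = line[1:].split(None, 1)
            current = {
                "id": parts[0] if parts else "",
                "description": parts[1] if len(parts) > 1 else "",
                "sequence": "",
            }
            records.append(current)
        else:
            if current is None:
                # Sequence data without a description line.
                current = {"id": "", "description": "", "sequence": ""}
                records.append(current)
            current["sequence"] += line
    return records


example = ">sequence_1 test RNA\ncaagcacag\naaacc\n>seq2\nGGGAAACCC\n"
for record in parse_fasta(example):
    print(record["id"], len(record["sequence"]))
```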

Shape type:
The shape type is the level of abstraction, that is, the degree of structural dissimilarity that is required before two structures are assigned to different shapes. In general, a helical region is depicted by a pair of opening and closing square brackets, and an unpaired region is represented by a single underscore. The shape types differ in which structural elements (bulge loop, internal loop, multiloop, hairpin loop, stacking region and external loop) contribute to the shape representation. Five types are implemented. Their differences are shown in the following example:

Sequence:  CGUCUUAAACUCAUCACCGUGUGGAGCUGCGACCCUUCCCUAGAUUCGAAGACGAG
Structure: ((((((...(((..(((...))))))...(((..((.....))..)))))))))..

Type  Result            Description
1     [_[_[]]_[_[]_]]_  Most accurate - all loops and all unpaired regions
2     [[_[]][_[]_]]     Nesting pattern for all loop types and unpaired regions in external loop and multiloop
3     [[[]][[]]]        Nesting pattern for all loop types but no unpaired regions
4     [[][[]]]          Helix nesting pattern in external loop and multiloop
5     [[][]]            Most abstract - helix nesting pattern and no unpaired regions
[Image: illustration of the differences between the shape types]
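
To make the abstraction concrete, the following Python sketch derives the type 3 and type 5 shapes of the example structure from its dot-bracket string. It is a simplified reimplementation written for this manual, not the RNAshapes code, and it covers only the two types that ignore unpaired regions: in type 5 a helix continues across bulges and internal loops, while in type 3 every bulge or internal loop opens a new bracket pair. The function name shape is an assumption.

```python
def shape(dotbracket, level=5):
    """Return the type 3 or type 5 shape string of a dot-bracket structure."""
    if level not in (3, 5):
        raise ValueError("only shape types 3 and 5 are sketched here")

    # Map each opening position to its closing partner.
    # Unbalanced input raises IndexError on the pop below.
    partner, stack = {}, []
    for pos, ch in enumerate(dotbracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            partner[stack.pop()] = pos

    def components(start, end):
        """Opening positions of the helices lying directly in [start, end)."""
        found, pos = [], start
        while pos < end:
            if pos in partner:
                found.append(pos)
                pos = partner[pos] + 1
            else:
                pos += 1
        return found

    def helix(i):
        j = partner[i]
        inner = components(i + 1, j)
        if len(inner) == 1:
            k = inner[0]
            stacked = k == i + 1 and partner[k] == j - 1
            if stacked or level == 5:
                return helix(k)          # the same helix continues
            return "[" + helix(k) + "]"  # type 3: bulge or internal loop
        # Hairpin loop (no inner helix) or multiloop (several inner helices).
        return "[" + "".join(helix(k) for k in inner) + "]"

    return "".join(helix(i) for i in components(0, len(dotbracket)))


# The example structure from the table, assembled from its parts.
structure = (
    "((((((..."            # outer helix and start of the multiloop
    "(((..(((...))))))"    # first branch, contains a bulge loop
    "..."
    "(((..((.....))..)))"  # second branch, contains an internal loop
    "))))))"               # closing strand of the outer helix
    ".."
)
print(shape(structure, 5))
print(shape(structure, 3))
```

Types 1, 2 and 4 additionally need rules for where underscores are emitted and which loop types count, which are left out here to keep the sketch short.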

Match shape:
Specify a shape for the corresponding mode of operation.

Calculate structure probabilities:
This calculates the probability of every computed structure. It can be combined with any sequence analysis mode.

Generate structure graphs:
This generates PostScript structure graphs for each given sequence.

Allow lonely base pairs:
By default, RNAshapes only considers helices of length 2 or longer. With this option, lonely base pairs (helices of length 1) are also included.

Ignore unstable structures:
This option filters out closed structures with positive free energy.

Window size & Window increment:
Beginning with position 1 of the input sequence, the analysis is repeated on subsequences (windows) of the specified size. After each calculation, the results are printed and the window is moved forward by the window increment, until the end of the input sequence is reached.
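
The window procedure can be sketched as a generator. This is an illustration only: the function name windows is hypothetical, and the handling of the sequence end (only full-size windows are produced, and a sequence shorter than the window is analysed as a whole) is an assumption that may differ from what RNAshapes does.

```python
def windows(sequence, size, increment):
    """Yield (1-based start position, subsequence) for each window."""
    if size <= 0 or increment <= 0:
        raise ValueError("size and increment must be positive")
    # Assumption: the last window is the last one that still fits completely.
    last_start = max(len(sequence) - size, 0)
    for start in range(0, last_start + 1, increment):
        yield start + 1, sequence[start:start + size]


for position, subsequence in windows("ABCDEFGHIJ", size=4, increment=3):
    print(position, subsequence)
```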

Set maximum loop length:
This option sets the maximum lengths of the considered internal and bulge loops. The default value here is 30. Note that this restriction can have a very slight influence on the calculated structure and shape probabilities.

Normal probability mode:
This is the default shape probabilities mode.

Also calculate shreps:
Calculates the shape probabilities based on the partition function. In addition to the output of the normal probability mode, the corresponding shreps with their minimum free energies are calculated. Note that this mode is slightly slower and can only be used with sequences up to a length of 250 bases.

Shape probabilities for mfe-best shapes:
This mode first calculates the best shapes based on free energy minimization. In a second step, it calculates the probability for each of these best shapes. This mode can be used for longer sequences (up to 500 bases).

Energy range:
This sets the energy range either as a percentage of the minimum free energy (% of mfe) or as an absolute difference to the minimum free energy of the sequence (kcal/mol).
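
The two ways of giving the energy range can be converted into one absolute energy limit. The sketch below is an assumption about how such a conversion can be done, not the code used by RNAshapes; the function name energy_limit and the unit labels "percent" and "kcal" are hypothetical.

```python
def energy_limit(mfe, value, unit):
    """Return the highest free energy (kcal/mol) that is still inside the range.

    mfe:   minimum free energy of the sequence, normally negative
    value: size of the range
    unit:  "percent" (of the mfe) or "kcal" (absolute difference in kcal/mol)
    """
    if value < 0:
        raise ValueError("the energy range must not be negative")
    if unit == "percent":
        return mfe + abs(mfe) * value / 100
    if unit == "kcal":
        return mfe + value
    raise ValueError("unit must be 'percent' or 'kcal'")


# With an mfe of -20 kcal/mol, a range of 10 % and a range of 2 kcal/mol
# describe the same limit.
print(energy_limit(-20.0, 10, "percent"))
print(energy_limit(-20.0, 2.0, "kcal"))
```

A structure with free energy E would then be reported if E is at most the returned limit.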

Probability cutoff filter:
This option sets a barrier for filtering out results with very low probabilities during calculation. The default value here is 0.000001, which gives a significant speedup compared to a disabled filter. Note that this filter can have a slight influence on the overall results.

Probability output filter:
This option sets a filter for omitting low probability results from the output. Unlike the probability cutoff filter, this option only affects what is printed and has no influence on the calculated probabilities.

Number of sampling iterations:
Number of iterations for Sampling mode.

Omit sampling output:
Omit sampling output for Sampling mode.



Parameter Description for RapidShapes


RapidShapes calculates exact probabilities for RNA abstract shapes. Because it is a runtime heuristic, it calculates these exact values much faster than the exhaustive version of RNAshapes for most RNA input sequences. The speed-up is gained by first guessing a handful of promising shapes. In a second phase, the exact shape probability is calculated in O(n^3) time for each promising shape.
The difference to the exhaustive version is that RapidShapes analyses only the promising shapes instead of all exponentially many shapes of an input sequence. The speed-up therefore depends on the ratio between the number of promising shapes and the number of all shapes of the input sequence. Fewer promising shapes mean a shorter runtime.
RapidShapes uses the sampling method to guess promising shapes.

Number of sampling iterations:
RapidShapes uses a sampling method to obtain promising shapes, for which it calculates exact shape probabilities in a second phase.
Sampling is a stochastic process in which one RNA structure is drawn from the complete folding space of the input sequence. The chance to draw a particular RNA structure depends on its free energy: structures with lower free energy are drawn more often.
By repeating this process <Number of sampling iterations> times and translating the RNA structures into shapes, the shape probabilities can simply be estimated by counting how often each shape appears.
More iterations raise the chance of observing more diverse shapes and thus increase the number of promising shapes for RapidShapes.
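
The counting estimate can be sketched in Python as follows. This is not the RNAshapes sampler, which draws structures from the partition function of the real folding space; here a hypothetical three-structure folding space stands in for it, each structure is drawn with a chance proportional to its Boltzmann factor exp(-E/RT), and the function name and energies are assumptions made for illustration.

```python
import math
import random
from collections import Counter

# Gas constant in kcal/(mol*K) times 37 degrees C in Kelvin.
RT = 0.0019872 * 310.15

# Hypothetical toy folding space: (shape, free energy in kcal/mol).
TOY_ENSEMBLE = [("[]", -3.0), ("[]", -2.5), ("[][]", -2.8)]


def estimate_shape_probabilities(ensemble, iterations, seed=0):
    """Draw structures by Boltzmann weight and count the observed shapes."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    weights = [math.exp(-energy / RT) for _, energy in ensemble]
    drawn = rng.choices(ensemble, weights=weights, k=iterations)
    counts = Counter(shape for shape, _ in drawn)
    return {shape: n / iterations for shape, n in counts.items()}


estimates = estimate_shape_probabilities(TOY_ENSEMBLE, iterations=1000)
print(estimates)
```

Every shape that appears in the sample would be a candidate for the exact calculation in the second phase.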

Minimal shape probability threshold:
The lower the shape probability, the less likely it is to find an RNA molecule with a corresponding structure in a cell. Since form follows function, one would expect a functional RNA to have a relatively high shape probability, or at least a probability of <Minimal shape probability threshold> percent.

The problem definition of RapidShapes is as follows: Given an RNA sequence s of length n and a threshold 0<T≤1, compute all shapes p of s with Prob(p)≥T. This definition permits that some shapes with sub-threshold probability are also computed, but the goal is, of course, to minimize the effort spent on those. (T is the variable <Minimal shape probability threshold>.)

When the accumulated probability of all analyzed shapes exceeds 1-T, no additional shape with Prob(p)≥T can hide in the remaining unexplored folding space. Thus RapidShapes can stop the calculation.
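
The stopping rule follows from the fact that all shape probabilities sum to 1: once more than 1-T has been accounted for, less than T remains for all unexplored shapes together. A minimal sketch, in which the function name, the candidate list and the probabilities are hypothetical and exact_probability stands in for the specialized folding program of each shape:

```python
def shapes_above_threshold(candidates, exact_probability, threshold):
    """Analyse promising shapes in order until the stopping rule applies.

    candidates:        promising shapes, most promising first
    exact_probability: function returning the exact probability of a shape
    threshold:         T with 0 < T <= 1
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must satisfy 0 < T <= 1")
    analysed = {}
    accumulated = 0.0
    for candidate in candidates:
        probability = exact_probability(candidate)
        analysed[candidate] = probability
        accumulated += probability
        if accumulated > 1 - threshold:
            break  # the unexplored rest holds less than T in total
    result = {s: p for s, p in analysed.items() if p >= threshold}
    return result, accumulated


# Hypothetical exact probabilities, for illustration only.
toy = {"[]": 0.6, "[][]": 0.25, "[[][]]": 0.1, "[][][]": 0.05}
found, covered = shapes_above_threshold(list(toy), toy.get, threshold=0.2)
print(found, covered)
```

In this toy run only the first two shapes are analysed, because their accumulated probability already exceeds 1-T.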

Fri May 3 15:04:54 2013