BiBiServ2 - RapidShapes

RapidShapes computes the shapes above a specified alpha-threshold by generating a list of promising shapes and constructing specialized folding programs for each shape to compute its share of Boltzmann probability. This aims at a heuristic improvement of runtime, while still computing exact probability values.

Because RapidShapes consists of many independent components, the runtime can be decreased further through parallel computation. Thus, probabilistic shape analysis becomes feasible in medium-scale applications, such as the screening of RNA transcripts in a bacterial genome.

The RapidShapes program uses a pipeline to predict exact probabilities of RNA shapes.

This pipeline contains several steps:

Figure 1: Illustration of an example run, giving an overview of the probability calculation for a shape.

Q_p(s) is the sum of partition function values for a specific shape class, with the parameters:

s - sequence
E_x - energy of structure x in kcal/mol
R - universal gas constant (0.00198717 kcal/(mol K))
T - temperature in Kelvin
F_p(s) - the set of all structures that the sequence s can fold into under the specific TDM

Prob(p,s) is the probability of a specific shape class p, with the parameters:

Q_p(s) - partition function value of a specific shape class
Q(s) - partition function value of the complete folding space

Note: this figure is simplified for better understanding.
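The definitions above can be turned into a small calculation. The following is a minimal sketch, not the RapidShapes implementation: it assumes that the energies of all structures are already known and grouped by shape class, that each structure contributes the Boltzmann weight exp(-E_x / (R T)), and that the listed shape classes together cover the complete folding space. The function names and the default temperature of 310.15 K (37 °C) are assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K), value taken from the parameter list above.
R = 0.00198717


def partition_value(energies, temperature):
    """Sum the Boltzmann weights exp(-E_x / (R*T)) of the given structure energies."""
    return sum(math.exp(-energy / (R * temperature)) for energy in energies)


def shape_probabilities(energies_by_shape, temperature=310.15):
    """Return Prob(p, s) = Q_p(s) / Q(s) for every shape class p.

    energies_by_shape maps a shape string to the energies (kcal/mol) of all
    structures in that shape class. Assumption: the dictionary covers the
    complete folding space, so Q(s) is the sum of all Q_p(s).
    """
    q_p = {
        shape: partition_value(energies, temperature)
        for shape, energies in energies_by_shape.items()
    }
    q_total = sum(q_p.values())
    return {shape: value / q_total for shape, value in q_p.items()}


if __name__ == "__main__":
    # Hypothetical energies, chosen only to show the call.
    example = {"[]": [-3.0, -2.5], "[][]": [-1.0]}
    for shape, prob in shape_probabilities(example).items():
        print(shape, prob)
```

Because lower energies receive exponentially larger weights, a shape class containing a few very stable structures can dominate the probability mass even if other classes contain more structures.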

First, RapidShapes creates a list of sampled abstract RNA shapes. Abstract shape classes are used because many different concrete RNA structures can be described by one abstract shape. The next step takes advantage of this.
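To illustrate how several concrete structures collapse into one abstract shape, here is a minimal sketch that maps a structure to the most abstract shape type (helix nesting pattern, no unpaired regions). It is not taken from RapidShapes. It assumes the structure is given in dot-bracket notation, the function name shape_level5 is hypothetical, and returning "_" for a completely unpaired structure is an assumption about the notation.

```python
def shape_level5(dotbracket: str) -> str:
    """Map a dot-bracket structure to its most abstract shape string."""
    # stack[-1] collects the shape strings of the helices directly enclosed
    # by the currently open base pair; stack[0] belongs to the external loop.
    stack = [[]]
    for ch in dotbracket:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in structure")
            inner = stack.pop()
            if len(inner) == 1:
                # Stacked pair, bulge or internal loop: the pair continues
                # the helix of its single child, so no new bracket is emitted.
                shape = inner[0]
            else:
                # Hairpin loop (no child) or multiloop (several children).
                shape = "[" + "".join(inner) + "]"
            stack[-1].append(shape)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in structure")
    # Assumption: a structure without any base pair is written as "_".
    return "".join(stack[0]) or "_"


if __name__ == "__main__":
    for structure in ["(((...)))", "((.((...))..))", "((..((...))..((...))..))"]:
        print(structure, shape_level5(structure))
```

In this sketch the first two structures differ in their loops but share one shape, which is why a single shape-specific program can cover both of them.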

For each abstract shape class, a specific thermodynamic matcher (TDM) is generated. For that purpose, a tree grammar is used to describe the structures. Tree grammars make the semantics of each grammar rule explicit and can be compiled directly into executable code using the algebraic dynamic programming (ADP) technology.

This specific ADP grammar is translated into C++ code by the GAPc compiler. In this way, shape-specific TDMs are generated.

In the next step, the C++ code is compiled into an executable binary by the gcc compiler.

These shape-specific TDMs fold the given sequence s into all possible structures that fit the abstract shape class for which the TDM was generated.

Every structure is weighted by its partition function value, and the values of all structures of a specific shape are accumulated to calculate the probability of the whole shape class. Thus, RapidShapes computes the probability of a sub-folding space by dividing the partition function value of that sub-folding space by the partition function value of the complete folding space.

As long as the specified probability alpha has not been reached, the program continues to calculate the probabilities of further sub-folding spaces.
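The control flow of this loop can be sketched as follows. This is an illustration under stated assumptions, not the actual RapidShapes code: the stopping rule used here (stop once the probability mass not yet covered is smaller than alpha, because no further shape could then reach alpha) is one possible reading of the description above, and prob_of is a hypothetical stand-in for running a compiled shape-specific TDM.

```python
def accumulate_shapes(candidates, prob_of, alpha):
    """Compute exact probabilities for candidate shapes until alpha is settled.

    candidates: shape strings, most promising first.
    prob_of:    callable returning the exact probability of a shape
                (hypothetical stand-in for running a shape-specific TDM).
    alpha:      probability threshold between 0 and 1.
    Returns the shapes with probability >= alpha and the covered probability mass.
    """
    computed = {}
    covered = 0.0
    for shape in candidates:
        # Assumed stopping rule: if less than alpha is left uncovered,
        # no shape that has not been examined yet can reach alpha.
        if 1.0 - covered < alpha:
            break
        if shape in computed:
            continue  # skip duplicates in the candidate list
        computed[shape] = prob_of(shape)
        covered += computed[shape]
    above = {shape: p for shape, p in computed.items() if p >= alpha}
    return above, covered


if __name__ == "__main__":
    # Hypothetical probabilities, chosen only to show the call.
    table = {"[]": 0.6, "[][]": 0.25, "[[][]]": 0.1, "[][][]": 0.05}
    shapes, covered = accumulate_shapes(list(table), table.get, alpha=0.2)
    print(shapes, covered)
```

With this rule, the number of expensive per-shape computations depends on how quickly the promising shapes cover the folding space, which is why a good initial list of candidates matters.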

Figure 2: Illustration of the exact steps from shape sampling to TDM generation.

The first step, the sampling, can be replaced by other methods to obtain an initial set of promising shape classes, whose probabilities are afterwards computed exactly by the rest of the program. Four different functions allow for alternative shape guessing methods: sample, kbest, subopt and list.

Type  Description                                                                             Result
1     Most accurate - all loops and all unpaired                                              [_[_[]]_[_[]_]]_
2     Nesting pattern for all loop types and unpaired regions in external loop and multiloop  [[_[]][_[]_]]
3     Nesting pattern for all loop types but no unpaired regions                              [[[]][[]]]
4     Helix nesting pattern in external loop and multiloop                                    [[][[]]]
5     Most abstract - helix nesting pattern and no unpaired regions                           [[][]]

sample

kbest

subopt

list

In-/Output values

INPUT :: RNA sequence

OUTPUT :: output

Parameter