BiBiServ - Bielefeld University Bioinformatic Service

REPuter - Manual

Changes to previous online versions of REPuter

We no longer offer precalculated genomes. The online version of REPuter now has only few restrictions (our server capacity has grown), so there is no reason to offer precomputed genomes, which are rarely needed. The textual output of REPuter can optionally be filtered before downloading. The graphical visualisation is available as a static image (as before) and as a partly interactive version (JavaScript must be enabled). A fully dynamic/interactive visualisation will be part of a future release.

Parameter Description

REPuter offers various parameters. All of them are explained in this chapter.

Sequence Format: REPuter supports sequences in FASTA format via file upload or copy & paste into a text field. A sequence in FASTA format begins with a single-line description, followed by lines of sequence data.

  • The description line starts with a greater than symbol (">").
  • The word immediately following the greater than symbol (">") is the "ID" (name) of the sequence; the rest of the line is the description.
  • The "ID" and the description are optional.
  • All lines of text should be shorter than 80 (normally 60) characters.
  • The sequence ends when another greater than symbol (">") appears at the beginning of a line, where the next sequence begins.
The following example contains one sequence (sequence_1):
>sequence_1
caagcacagaaacctatggcataaatccctctgagacgcgttgtactatggttatctaat
tctccggcgacacaagttgtctaaccgtgatcaccttaaagggcaagccgcccaatagat
gttagttaatactacgtaccaagtatgcctgcgcttggtaaagccgcctgtccatagttc
tactagggtagagcttcaggatgctctatagttcgagcggttctttgatcaactcgacta
gctaccaccatgtctgtgttttattgcacgcaaagtcgtaagtttaaacggaccaagaag
ccttcttcggtcagtagcaggttaagggccaagtacaagcctctccaggaatgcttaacg
gcatcgatgcaacttggacaagtaaacatcctgaagctta
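
The FASTA rules above can be sketched in a few lines of Python. This is only an illustration of the format as described in this manual, not the parser REPuter itself uses; the function names `parse_fasta` and `split_header` are hypothetical, and sequence lines that appear before any description line are simply ignored here.

```python
def split_header(header):
    """Split the text after '>' into (ID, description); both may be empty."""
    if header and not header[0].isspace():
        parts = header.split(None, 1)
        return parts[0], (parts[1] if len(parts) > 1 else "")
    # No word directly after '>': there is no ID, only a description.
    return "", header.strip()


def parse_fasta(text):
    """Return a list of (seq_id, description, sequence) tuples."""
    records = []
    header = None   # header text of the record currently being read
    chunks = []     # sequence lines collected for that record
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new '>' line closes the previous record, if any.
            if header is not None:
                records.append(split_header(header) + ("".join(chunks),))
            header, chunks = raw.rstrip()[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(split_header(header) + ("".join(chunks),))
    return records


example = """>sequence_1 a short test sequence
caagcacaga
aacctatggc
>sequence_2
ttgaca
"""
for seq_id, description, sequence in parse_fasta(example):
    print(seq_id, len(sequence))
```

Joining the sequence lines into one string keeps positions consistent with the 0-based indexing used in the Theoretical Background section.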

Match Direction: REPuter offers four possibilities of searching for repeats:

  1. forward (direct) match
  2. reverse match
  3. complement match
  4. palindromic match
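
To make the four directions concrete, the following Python sketch shows which string the second part of a repeat would have to equal for a given first part. It assumes plain DNA letters (a, c, g, t), reads "palindromic" as reverse complemented (as in the Theoretical Background section), and uses the one-letter codes F, R, C and P from the raw output. The helper name `match_targets` is hypothetical, and how REPuter reports positions for non-forward matches is not covered here.

```python
# Translation table for the DNA complement (a<->t, c<->g), both cases.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def match_targets(fragment):
    """Return the string a second repeat part must equal, per direction."""
    return {
        "F": fragment,                               # forward (direct)
        "R": fragment[::-1],                         # reverse
        "C": fragment.translate(COMPLEMENT),         # complement
        "P": fragment[::-1].translate(COMPLEMENT),   # reverse complement
    }


for code, target in match_targets("gacagt").items():
    print(code, target)
# F gacagt
# R tgacag
# C ctgtca
# P actgtc
```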

Maximum Computed Repeats: show only the given number of repeats with the smallest E-values (default: 50).

Minimal Repeat Size: repeats must have at least the given length. Attention: a long sequence combined with a small minimum repeat size results in a long computation time.

Error Distance: search for repeats up to the given Hamming/edit distance.
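
As a reminder of what the two distance measures mean, here is a small Python sketch. The Hamming distance counts mismatching positions between two strings of equal length; the edit distance additionally allows insertions and deletions. These are textbook definitions and are not taken from the REPuter implementation; the function names are chosen for illustration only.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions (unit cost)."""
    prev = list(range(len(b) + 1))      # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


print(hamming_distance("gacagt", "gacggt"))  # 1
print(edit_distance("gacagt", "gacgt"))      # 1
```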

Output Description

After the REPuter run has finished, a REPuter result page is shown, which offers various options to view the result.

Textual Output: The result of a run can be viewed/downloaded as a space-separated table. Optionally, the output can be filtered. The head of a sample output looks like this:

# 235 -3 8 reputer_bibitest_1091788224_479525172.xmlrpc
 9 150 F  9 151  0 5.92e-02
 8 150 F  8 152  0 2.37e-01
10 150 F 10 153 -1 4.44e-01
 9 150 F  9 154 -1 1.60e+00
[1][2][3][4][5] [6]   [7]
...
The first line, starting with '#', is a comment. It gives the sequence length (235), the maximum allowed distance ([-]3), the minimum repeat size (8) and the processed file. Each following line describes one repeat found.
  • [1] - repeat length of the first part
  • [2] - starting position of the first part
  • [3] - match direction
  • [4] - repeat length of the second part
  • [5] - starting position of the second part
  • [6] - distance of this repeat
  • [7] - calculated E-value of this repeat
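
Because the table is whitespace separated, a downloaded result can be read with a few lines of Python. The sketch below follows the seven columns listed above; the `Repeat` class and its field names are hypothetical, and lines that do not have exactly seven fields (such as the '#' comment line) are skipped.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    length1: int     # [1] repeat length of the first part
    start1: int      # [2] starting position of the first part
    direction: str   # [3] match direction
    length2: int     # [4] repeat length of the second part
    start2: int      # [5] starting position of the second part
    distance: int    # [6] distance of this repeat
    evalue: float    # [7] calculated E-value


def parse_reputer_table(text):
    """Parse the textual output into a list of Repeat records."""
    repeats = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#") or len(fields) != 7:
            continue
        l1, p1, d, l2, p2, dist, ev = fields
        repeats.append(Repeat(int(l1), int(p1), d,
                              int(l2), int(p2), int(dist), float(ev)))
    return repeats


sample = """# 235 -3 8 reputer_bibitest_1091788224_479525172.xmlrpc
 9 150 F  9 151  0 5.92e-02
 8 150 F  8 152  0 2.37e-01
10 150 F 10 153 -1 4.44e-01
 9 150 F  9 154 -1 1.60e+00
"""
for r in parse_reputer_table(sample):
    print(r.length1, r.start1, r.direction, r.start2, r.distance, r.evalue)
```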

Graphical Output: The output of REPuter is processed into an overview of the number, the length and the location of repeats in the uploaded sequence. In this version of online REPuter we offer two kinds of visualisation: a static image and a partly interactive version (a modern browser such as IE 5.5 and above, Mozilla, Netscape 6 and above, Opera, etc. with JavaScript enabled is required). For the next release a fully interactive visualisation (as a Java applet) is planned.

Theoretical Background

This tool reports maximal forward, reverse, complemented, and reverse complemented repeats for a given input sequence. The definition of 'maximality' as in [1] basically limits the output to only the longest repeats in the sequence. These may contain shorter repeats which are not explicitly reported.

Let your input sequence be a text string s of length n.
The characters in s are indexed from 0 to n-1, therefore s can be written as s = s_0 s_1 ... s_{n-1}.
Each reported repeat is denoted by a triple (l, i, j), i.e. the size, the starting position of a piece of sequence and the starting position of its repeat counterpart. We postulate the size l > 0 and the starting positions i, j ∈ [0, n-1].

REPuter distinguishes four different kinds of repeats:

  • Maximal forward repeat, MFR
  • Maximal reverse repeat, MRR
  • Maximal complemented repeat, MCR
  • Maximal palindromic (reverse complemented) repeat, MPR

The triple (l, i, j) is an MFR if:
  1. i ≠ j
    (The two parts do not start at the same position.)
  2. s_i s_{i+1} ... s_{i+l-1} = s_j s_{j+1} ... s_{j+l-1}
    (Both parts of the repeat have the same size and the same content.)
  3. If 0 ≤ i-1, then s_{i-1} ≠ s_{j-1}
    (If the first part of the repeat starts at a position greater than 0, then the characters immediately to the left of each part are different.)
  4. If j+l ≤ n-1, then s_{i+l} ≠ s_{j+l}
    (If the second part of the repeat ends before the last position of the input sequence, then the characters immediately to the right of each part are different.)
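
The four conditions can be checked directly with a brute-force search. The Python sketch below is only meant to illustrate the definition: it runs in quadratic time or worse and is not the algorithm REPuter uses. It assumes exact matches only (no error distance) and that the first part starts before the second (i < j); the function name and the `min_len` parameter are hypothetical.

```python
def maximal_forward_repeats(s, min_len=1):
    """Return all triples (l, i, j) with i < j that satisfy the MFR conditions."""
    n = len(s)
    repeats = []
    for i in range(n):
        for j in range(i + 1, n):               # condition 1, with i < j
            # Condition 3: the characters to the left must differ.
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            # Condition 2: extend the match to the right as far as it goes,
            # which makes condition 4 hold once the loop stops.
            l = 0
            while j + l < n and s[i + l] == s[j + l]:
                l += 1
            if l >= min_len:                     # l > 0 as long as min_len >= 1
                repeats.append((l, i, j))
    return repeats


# A three-fold repetition of 'gacagtcagt', searched with a minimum size of 8.
print(maximal_forward_repeats("gacagtcagt" * 3, min_len=8))
# [(20, 0, 10), (10, 0, 20)]
```

The candidate (10, 10, 20) is rejected by the left-character check, and (10, 0, 10) never appears on its own because the match starting at positions 0 and 10 is extended to its full length of 20.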

[1] Gusfield, D., Algorithms on Strings, Trees, and Sequences, Cambridge University Press, 1997

REPuter Sample Run

THIS EXAMPLE HAS NOT YET BEEN UPDATED, SO BE CAREFUL WHEN READING THIS CHAPTER

Consider the following 30 bases input sequence, which is a three-fold repetition of 'gacagtcagt':

   >5.seq
   gacagtcagtgacagtcagtgacagtcagt
The REPuter engine produces the following raw data output, starting with a line that names the input sequence. Each following line describes one repeat: its size, the starting position of the first part, one of the four possible modes (F, P, R, C), and the starting position of the second part.

The output below therefore reports two repeats, both starting at position 0. The first part of the first repeat starts at position 0, its second part at position 20.

   # /tmp/5.seq.flat 30
   10 0 F 20
   20 0 F 10
Drawing the sequence in dark blue and the repeats in light blue, this might look like this:

Note that according to the 'right character' rule 4. for MFRs in the Theoretical Background section, we do not report a repeat like "10 0 F 10": the characters following both parts (positions 10 and 20) are identical, so this short repeat becomes part of "20 0 F 10".

Additionally, to keep the starting position information visible, each part of a repeat is displayed on a separate strand:

Wed Apr 3 16:19:54 2013