Introduction

A tool for the systematic study of the repetitive structure of complete genomes must satisfy the following criteria:

The REPuter program family described herein satisfies these requirements in the following way:

repfind uses an efficient and compact implementation of suffix trees to locate exact repeats in linear time and space. This time-critical step is therefore feasible for sequences up to the size of the human genome. These exact repeats are used as seeds from which significant degenerate repeats are constructed, allowing for mismatches, insertions, and deletions. Note that our program is not heuristic: it is guaranteed to find all degenerate repeats specified by the parameters. Output size can be controlled via parameters for minimum length and maximum error. Output is sorted by significance scores (E-values) calculated according to the distance model used.
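To make the notion of an exact repeat seed concrete, here is a minimal Python sketch. It is not the repfind implementation: instead of a suffix tree it uses a simple hash index of fixed-length substrings, which is easy to read but is not linear in time or space. The function name exact_repeats and the (length, pos1, pos2) result layout are assumptions made for this illustration only. The extension of seeds to degenerate repeats is not shown.

```python
from collections import defaultdict


def exact_repeats(seq, min_len):
    """Return maximal exact forward repeats as (length, pos1, pos2), pos1 < pos2.

    Illustrative only: uses a hash index of substrings, not a suffix tree.
    """
    # Index every substring of length min_len by its start positions.
    index = defaultdict(list)
    for i in range(len(seq) - min_len + 1):
        index[seq[i:i + min_len]].append(i)

    repeats = set()
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                i, j = positions[a], positions[b]
                # Skip pairs that can be extended to the left; the same
                # repeat is reported once, from its leftmost seed.
                if i > 0 and seq[i - 1] == seq[j - 1]:
                    continue
                # Extend to the right while the characters match.
                length = min_len
                while j + length < len(seq) and seq[i + length] == seq[j + length]:
                    length += 1
                repeats.add((length, i, j))
    # Longest repeats first.
    return sorted(repeats, reverse=True)
```

For example, exact_repeats("ACGTTTACGTAA", 3) reports the single maximal repeat ACGT, found at positions 0 and 6 (counting from 0).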

repselect selects interesting repeats from the output of repfind according to user-defined criteria. It passes a list of repeats of chosen length, degeneracy, or significance on to further analysis routines.
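The kind of selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Repeat record and the select_repeats function are hypothetical and do not reflect the actual output format of repfind or the options of repselect.

```python
from typing import NamedTuple


class Repeat(NamedTuple):
    # Hypothetical record layout, chosen for this example only.
    length: int
    pos1: int
    pos2: int
    errors: int
    evalue: float


def select_repeats(repeats, min_length=0, max_errors=None, max_evalue=None):
    """Keep repeats that satisfy all given criteria, most significant first."""
    selected = []
    for r in repeats:
        if r.length < min_length:
            continue
        if max_errors is not None and r.errors > max_errors:
            continue
        if max_evalue is not None and r.evalue > max_evalue:
            continue
        selected.append(r)
    # A smaller E-value means a more significant repeat.
    return sorted(selected, key=lambda r: r.evalue)
```

Leaving a criterion at its default disables that filter, so the same function covers selection by length, by degeneracy, by significance, or by any combination of the three.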

repvis visualizes the output from repfind. A color code indicates significance scores, and a scroll bar controls the amount of data displayed. A zoom function provides whole-genome views as well as detailed presentations of selected regions.

This manual describes the above programs. We postpone the definition of the basic notions to the appendix.




About this Manual

To differentiate the functions of information in this manual, we use the following typographic conventions. Note that the different text styles require a browser with style sheets enabled.

Also note that, due to the limitations of the HTML language, the length parameter l is sometimes written as . The second notation is produced by the LaTeX-to-HTML converter.