A Sample Application


The prerequisite is the Pyrococcus abyssi whole genome sequence in Fasta format (1,765,118 bases; the file itself is 1,790,383 bytes including the header line and line breaks):

> more pabyssi.fna

>emb|AL096836| Pyrococcus abyssi complete genome
GGGCTTTAGCCTCCTTCACCGCTTCCACGATTTTCTGCCTGTCAAAGGGCATTCTAGACATCCCTCCTTA
GGTTTTTAATTAAAAATTCAAGGTGGAGTAAAAAGGGATGTTTTTAAATTTTTCTCACTCTTTCTCGGCC
TTCTCAAATAGCTCGTCGTAAACCCCTTCATCTATTTCTCTCTGAACTTCCCTTGGATCCTTGCCTTCGA
CGGTAACTCCCATGCTTAAAGCCGTTCCAATGACTTCCTTGGCGGCAGCCTTAAGAGTCAATGCTAGCAT
CTGGTTTCTCTTCATCTTAGCTATCTTGATAACTTGCTCCATCGTTAAGTTCCCAACGATATTGTGCTTC
GGCTCACCGCTGCCCTTCTCGAGCCCTAGTTCCTTCTTTATCAACTGGCTAGTTGGAGGGACTCCAACTT
CTATCTCGAACTGCTTGGTTACTGGATCTACGATGATCTTCACTGGGACCTGCATCCCAGCGAACTCTTT
[...]
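
As a quick sanity check on the input, the number of bases in a Fasta file can be counted independently of repfind and repvis. The following is a minimal sketch in Python, not part of the REPuter tools; the function name fasta_stats is hypothetical, and it assumes a file with a single sequence record such as pabyssi.fna.

```python
from typing import Iterable, Tuple


def fasta_stats(lines: Iterable[str]) -> Tuple[str, int]:
    """Return (header, base_count) for a single-record Fasta file.

    Assumption: the input holds exactly one sequence; with several
    records the last header and the summed length would be returned.
    """
    header = ""
    bases = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]      # description line without the '>'
        else:
            bases += len(line)     # sequence line: count its characters
    return header, bases


if __name__ == "__main__":
    # Tiny stand-in for the real file, using the first 20 bases shown above.
    sample = [
        ">emb|AL096836| Pyrococcus abyssi complete genome\n",
        "GGGCTTTAGC\n",
        "CTCCTTCACC\n",
    ]
    print(fasta_stats(sample))
```

For the real file, pass an open file handle, for example `with open("pabyssi.fna") as fh: print(fasta_stats(fh))`. The count should agree with the "Sequence size" line that repvis prints in batch mode.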

For simplicity, this sample application is restricted to exact repeats. Note that the file format repvis uses for displaying repeats is compatible with the output format of repfind and repselect.

First, we create the binary output file:

> repfind -f -l 15 -allmax -mem -b pabyssi.fna > pabyssi.fna.bin

# space peak in megabytes: 31.37

The -mem switch reports the peak memory used by the computation (here 31.37 megabytes). Next, we display all repeat data files in the current directory with repvis:

> repvis ./

The longest forward repeat found in the data file is displayed. We now set the minimum length of the repeats to be displayed to 30 bases and switch the color scheme to emphasize the longest repeats.

A left mouse click on the repeats graph brings up the Inspector window. There, further left clicks zoom in and right clicks zoom out. Now zoom in with 10 left clicks around the position marked by the arrow in the picture below.

In the next step we investigate the yellow-green repeat just to the right of the red repeat by clicking on it.

The repeat characteristics appear in the Data Browser at the bottom of the window:

>ID=17958, Kind=F, Length=71, Pos1=384257, Pos2=1316890, Spacer=932562, Exact, E-Value=1.57e-31
gctatgaagtttgcctggctttcagtggcaactgccttagctagcaaggtcttaccagttcccggtgggcc

Each repeat in the repvis binary file is assigned a unique ID, which is the first entry in the output above. Next come the repeat kind (here F, a forward repeat), the repeat length, and the starting positions of the two copies. The Spacer, calculated as Pos2-(Pos1+Length), is the length of the stretch of input sequence lying between the two copies of the repeat; here 1316890-(384257+71) = 932562. A negative value denotes overlapping copies. Select the data line and click the 'View Sequence' button to display the repeat's DNA sequence. Alternatively, the repeat sequence can be submitted to a BLAST or Fasta database query.
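
The Spacer formula can be checked against the Data Browser line shown above. The following Python sketch is an illustration only: the helper names parse_repeat_header and spacer are hypothetical, and the parser assumes the comma-separated "Key=Value" layout of that one line rather than any documented repvis format.

```python
from typing import Dict


def parse_repeat_header(line: str) -> Dict[str, str]:
    """Split a Data Browser header line into a {key: value} mapping.

    Entries without '=' (such as 'Exact') are stored with an empty value.
    """
    fields: Dict[str, str] = {}
    for item in line.lstrip(">").split(","):
        item = item.strip()
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip()] = value.strip()
        elif item:
            fields[item] = ""
    return fields


def spacer(pos1: int, pos2: int, length: int) -> int:
    """Spacer = Pos2 - (Pos1 + Length); negative means the copies overlap."""
    return pos2 - (pos1 + length)


if __name__ == "__main__":
    line = (">ID=17958, Kind=F, Length=71, Pos1=384257, Pos2=1316890, "
            "Spacer=932562, Exact, E-Value=1.57e-31")
    rec = parse_repeat_header(line)
    computed = spacer(int(rec["Pos1"]), int(rec["Pos2"]), int(rec["Length"]))
    print(computed, computed == int(rec["Spacer"]))  # prints: 932562 True
```

With hypothetical values Pos1=100, Pos2=110 and Length=20 the same function returns -10, which is the overlapping case.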




Batch Mode Run

Finally, we want to create a repeats graph image that can be imported into any word processor. repvis offers a batch mode that generates images in portable pixmap (ppm) format without launching the interactive graphical user interface:

> repvis -batch -f -l 20 pabyssi.fna.bin

*** BATCH MODE ***

Repeats Statistics
        Min     Max
F       15      132
R       -       -
C       -       -
P       -       -
Running in command line mode.
Processing file: pabyssi.fna.bin
Sequence size: 1765118
Creating 790x350 image
Writing file: pabyssi.fna.bin.ppm
Writing file: pabyssi.fna.bin_key.ppm
Done.

[Image: pabyssi.fna.bin.ppm]

[Image: pabyssi.fna.bin_key.ppm]

Note that the repeats plot and the color key are stored in two separate image files.
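
Before importing the images elsewhere, it can be useful to confirm that a generated file is a valid ppm image of the expected size. The sketch below reads only the header, following the general Netpbm PPM layout (magic number, width, height, maximum color value, with optional '#' comments). It is a hedged illustration: the function name read_ppm_header is hypothetical, and whether repvis writes the plain (P3) or raw (P6) variant is not stated here, so both are accepted.

```python
from typing import List, Tuple


def read_ppm_header(data: bytes) -> Tuple[str, int, int, int]:
    """Return (magic, width, height, maxval) from the start of a PPM file."""
    tokens: List[str] = []
    pos = 0
    while len(tokens) < 4 and pos < len(data):
        ch = data[pos:pos + 1]
        if ch.isspace():
            pos += 1                                   # skip whitespace
        elif ch == b"#":
            # skip a comment up to the end of the line
            while pos < len(data) and data[pos:pos + 1] not in (b"\n", b"\r"):
                pos += 1
        else:
            start = pos
            while (pos < len(data)
                   and not data[pos:pos + 1].isspace()
                   and data[pos:pos + 1] != b"#"):
                pos += 1
            tokens.append(data[start:pos].decode("ascii"))
    if len(tokens) < 4 or tokens[0] not in ("P3", "P6"):
        raise ValueError("not a PPM header")
    magic, width, height, maxval = tokens
    return magic, int(width), int(height), int(maxval)


if __name__ == "__main__":
    # Made-up header bytes for illustration; real pixel data would follow.
    fake = b"P6\n# example comment\n790 350\n255\n" + bytes(3)
    print(read_ppm_header(fake))  # prints: ('P6', 790, 350, 255)
```

For a real file, reading the first few hundred bytes is normally enough, for example `read_ppm_header(open("pabyssi.fna.bin.ppm", "rb").read(512))`; the 512-byte limit is an assumption that fails only if the header carries unusually long comments.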