

repselect selects interesting repeats from the output of repfind according to user-defined criteria. It passes repeats of a chosen length, degeneracy, or significance on to further analysis routines, and it can also sort repeats according to different criteria. The input for repselect is a file produced by repfind or repselect using option b for binary output. The output of repselect goes to standard output.
The options for repselect are as follows:
Note the following when combining the options:
A selection function must be declared with the following function header:
The first argument seq of selectrepeat points
to the input sequence, possibly after replacing or deleting wildcards
according to the options used for repfind. Moreover, instead
of the bases A, C, G, T, seq contains integers
0, 1, 2, 3 encoding the bases. That is, 0 stands for A,
1 for C, 2 for G, and 3 for T. To show the base at
position i, one can simply write ALPHABET[seq[i]] in a
C-statement. To refer to these integer codes, one can use the
symbolic constants ACODE, CCODE, GCODE, and
TCODE.
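The decoding described above can be sketched as a small helper. This is only an illustration, not part of REPuter: the function name decodebases, its parameters, and the '?' fallback for codes outside 0..3 are assumptions, and the macros are repeated from select.h only so that the fragment compiles on its own.

```c
/* Repeated from select.h so that the sketch is self-contained;
   real code would use #include "select.h" instead. */
#define ACODE 0
#define CCODE 1
#define GCODE 2
#define TCODE 3
#define ALPHABET "acgt"

/* Hypothetical helper: copy up to len bases, starting at index start,
   from the encoded sequence into buf as characters. The range is
   clipped at seqlen. buf must provide room for len+1 characters.
   Returns the number of bases written. */
static unsigned int decodebases(const unsigned char *seq,
                                unsigned int seqlen,
                                unsigned int start,
                                unsigned int len,
                                char *buf)
{
  unsigned int i, n = 0;

  for(i = start; i < seqlen && n < len; i++)
  {
    if(seq[i] <= TCODE)
    {
      buf[n++] = ALPHABET[seq[i]];   /* code -> base character */
    } else
    {
      buf[n++] = '?';                /* assumption: unknown code */
    }
  }
  buf[n] = '\0';
  return n;
}
```

Called with start=rep->start1 and len=rep->length1, such a helper would yield the first instance of a repeat as a printable string, provided the start positions are 0-based indices into seq (an assumption here).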
The second argument seqlen of selectrepeat
is the length of the input sequence. That is, the bases of the input
sequence are addressed by an index in the range [0,...,seqlen-1].
The third argument rep of selectrepeat refers to a
repeat record, containing all necessary information about a repeat. The
C-declaration of the type Repeat can be found in the file
select.h, which is part of the binary distribution of REPuter,
see below.
The function selectrepeat is applied to each repeat. If
it returns a value different from 0, the repeat is shown. If
the returned value is 0, then it is rejected and not shown.
For example, the following function accepts repeats of length at most 200.
Other examples for selection functions can be found in the subdirectory
SELECT. This also contains the file select.h and a
makefile, showing how to compile a shared object for the supported platforms.
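As a further illustration of how all three arguments can be used together, the following sketch accepts only palindromic repeats whose first instance consists of at least 50% C or G. It is not one of the examples shipped in SELECT. The 50% threshold is arbitrary, and the sketch assumes that start1 is a 0-based index into seq. The declarations are repeated from select.h only to keep the fragment self-contained.

```c
/* Repeated from select.h; real code would #include "select.h". */
#define ACODE 0
#define CCODE 1
#define GCODE 2
#define TCODE 3

typedef enum { FKIND = 0, PKIND, RKIND, CKIND } Kind;
typedef enum { Eqlenreptype = 0, Difflenreptype } Reptype;
typedef struct
{
  Reptype reptype;
  int distance;
  double evalue;
  Kind kind;
  unsigned int length1, start1, length2, start2;
} Repeat;

int selectrepeat(unsigned char *seq,unsigned int seqlen,Repeat *rep)
{
  unsigned int i, len, gc = 0;

  if(rep->kind != PKIND || rep->length1 == 0 || rep->start1 >= seqlen)
  {
    return 0; /* reject: not palindromic, or nothing to inspect */
  }
  len = rep->length1;
  if(len > seqlen - rep->start1)
  {
    len = seqlen - rep->start1; /* never read past the sequence end */
  }
  for(i = 0; i < len; i++)
  {
    if(seq[rep->start1 + i] == CCODE || seq[rep->start1 + i] == GCODE)
    {
      gc++;
    }
  }
  return (2 * gc >= len) ? 1 : 0; /* accept if at least half is C or G */
}
```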
The codes for the different bases are defined in the following lines:
To distinguish forward, palindromic, reverse, and complemented repeats,
we use the following type:
There are basically two types of repeats. Exact and mismatch repeats
refer to sequences of equal length. Differences repeats may refer to
sequences of different length. We use the following enumeration
to distinguish these types of repeats.
A repeat is represented by a record of the following type. If the type of
the repeat is Eqlenreptype, then the component
length2 is not defined, and both instances of the repeat have
length length1. For the distance value, the rules
specified in section Basic Notions hold.
If the E-value of a repeat is smaller than 10^-300, then
evalue=0.00.

Specifying a Selection Function
int selectrepeat(unsigned char *seq,unsigned int seqlen,Repeat *rep)
int selectrepeat(unsigned char *seq,unsigned int seqlen,Repeat *rep)
{
  if(rep->length1 <= 200)
  {
    return 1; /* accept */
  } else
  {
    return 0; /* reject */
  }
}

The File select.h
#define ACODE 0 /* the integer code for base A */
#define CCODE 1 /* the integer code for base C */
#define GCODE 2 /* the integer code for base G */
#define TCODE 3 /* the integer code for base T */
#define ALPHABET "acgt" /* transform codes into bases */
typedef enum
{
  FKIND = 0,                /* forward repeat */
  PKIND,                    /* palindromic repeat */
  RKIND,                    /* reversed repeat */
  CKIND                     /* complemented repeat */
} Kind;

typedef enum
{
  Eqlenreptype = 0,         /* exact and mismatch repeat */
  Difflenreptype            /* differences repeat */
} Reptype;

typedef struct
{
  Reptype reptype;          /* type of the repeat */
  int distance;             /* distance between the two repeat instances */
  double evalue;            /* E-value */
  Kind kind;                /* one of the values FKIND, PKIND, CKIND, RKIND */
  unsigned int length1,     /* the length of the first instance */
               start1,      /* the starting position of the first instance */
               length2,     /* the length of the second instance */
               start2;      /* the starting position of the second instance, */
                            /* if defined */
} Repeat;
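
Because length2 is not defined for repeats of type Eqlenreptype, code that needs the length of the second instance has to check reptype first. The following sketch shows one way to do this. The helper secondlength and the thresholds (50 bases, E-value 1e-10) are hypothetical, and the types are repeated from the listing above only so that the fragment compiles on its own. Note that an E-value reported as 0.00 passes the comparison without special treatment.

```c
/* Repeated from select.h; real code would #include "select.h". */
typedef enum { FKIND = 0, PKIND, RKIND, CKIND } Kind;
typedef enum { Eqlenreptype = 0, Difflenreptype } Reptype;
typedef struct
{
  Reptype reptype;
  int distance;
  double evalue;
  Kind kind;
  unsigned int length1, start1, length2, start2;
} Repeat;

/* Hypothetical helper: length of the second instance, falling back
   to length1 when length2 is not defined. */
static unsigned int secondlength(const Repeat *rep)
{
  return (rep->reptype == Eqlenreptype) ? rep->length1 : rep->length2;
}

/* Accept repeats where both instances have at least 50 bases and
   the E-value is at most 1e-10 (both thresholds are arbitrary). */
int selectrepeat(unsigned char *seq,unsigned int seqlen,Repeat *rep)
{
  (void) seq;     /* the sequence itself is not needed here */
  (void) seqlen;
  if(rep->length1 >= 50 && secondlength(rep) >= 50 &&
     rep->evalue <= 1e-10)
  {
    return 1; /* accept */
  }
  return 0;   /* reject */
}
```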