AGenDA - Online Manual
Summary: AGenDA is a
novel tool for gene prediction that relies on cross-species
sequence comparison. Homologous genomic sequences (e.g. from
human and mouse) are repeat-masked
and aligned using the CHAOS and DIALIGN
software programs. Similarities identified by DIALIGN are
clustered and searched for conserved splice signals in order to
identify potential exons. Finally, complete gene models
are constructed from these potential exons using a combinatorial
optimization approach. A more detailed description of the method
is available online:
To speed up the DIALIGN program, CHAOS is used. CHAOS
is a rapid search tool that identifies chains of high-scoring
sequence similarities; these similarities are used as anchor
points to speed up the DIALIGN alignment procedure. A description
is given in
The new version of DIALIGN that underlies AGenDA is
described in
- B. Morgenstern, O. Rinner, S. Abdeddaïm, D. Haase, K.
Mayer, A. Dress, H.-W. Mewes (2002)
Exon Discovery by Genomic Sequence Alignment.
Bioinformatics 18, 777-787.
The
AGenDA WWW server requires two homologous genomic sequences
as input. These sequences must be in FASTA format, with one
sequence per file. The following is an example of the FASTA
sequence file format:
> seq_name
GACCTCAATCAACTGGAATTTAACAAAACTTTATATGCATAATGTTATCTATATGAATG
AAAGGTATTTATAATATTGAATAGTCTGTATCACACAAATTATTATTAGAAATTATTAAT
ACTAGGTTTTAAAAAAAAAAAAAAAAAAAAAAAAACTAGCATTATAGCAAAACGAAAATG
TAACACTGGTTTTCACTCTTTCTAAAACCTAAAAACCAAACCAAGTAAACCGGATTGTCC
ACCTCTATATATAACATATTTAAACCATATATTCAATATGACATTGTTTGTTTAAGAAAG
The first line starts with ">" and contains the sequence name
and any comments. The current length limit for the AGenDA WWW
server is 200 kb per sequence.
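The input rules above can be checked locally before submitting a file. The following is a minimal sketch, not part of AGenDA: the function name read_single_fasta and the constant MAX_LENGTH are hypothetical, and "200 kb" is assumed here to mean 200,000 bases.

```python
MAX_LENGTH = 200_000  # assumption: "200 kb" is read as 200,000 bases


def read_single_fasta(text, max_length=MAX_LENGTH):
    """Parse FASTA text that must hold exactly one sequence.

    Returns (header, sequence); raises ValueError on malformed input.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line must start with '>'")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("only one sequence per file is allowed")
    header = lines[0][1:].strip()
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence data found")
    if len(sequence) > max_length:
        raise ValueError(
            f"sequence has {len(sequence)} bases; limit is {max_length}"
        )
    return header, sequence


example = "> seq_name\nGACCTCAATCAACTGG\nAAAGGTATTTATAATA\n"
name, seq = read_single_fasta(example)
print(name, len(seq))
```

The check rejects files with more than one ">" line, since the server expects one sequence per file.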
Options:
As a
starting point for gene prediction, AGenDA uses segment pairs
("fragments") identified by the DIALIGN program. All fragments
that are contained in the optimal alignment of the input
sequences are considered - provided their score exceeds a
threshold value S. By default, S = 1. Note that
this parameter does not affect the alignment procedure; it
determines which fragments from the calculated alignment are used
for gene prediction.
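The effect of the threshold can be illustrated with a small sketch. The Fragment record and its field names are hypothetical and do not reflect the DIALIGN output format; the example only shows that fragments are selected after the alignment has been computed, keeping those whose score exceeds S.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    # Hypothetical record; field names are not taken from DIALIGN output.
    start1: int   # start position in the first sequence
    start2: int   # start position in the second sequence
    length: int   # fragment length
    score: float  # fragment score


def filter_by_threshold(fragments, s=1.0):
    """Keep fragments whose score exceeds the threshold S (default 1)."""
    return [f for f in fragments if f.score > s]


fragments = [
    Fragment(10, 12, 30, 0.4),
    Fragment(80, 95, 45, 1.0),
    Fragment(200, 210, 60, 3.2),
]
selected = filter_by_threshold(fragments)
print([f.score for f in selected])
```

Because the text says the score must exceed S, the sketch uses a strict comparison, so a fragment scoring exactly 1.0 is dropped under the default.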
The current version of DIALIGN can use an
iterative procedure. In a first step, strong similarities between
the sequences are identified; in subsequent steps, weaker
similarities between these strong similarities are searched. This
increases the sensitivity of the program but can also increase
the noise from false-positive hits. This parameter lets the user
select how strong the similarity of the fragments passed to
AGenDA must be. By default, only fragments
identified in the first iteration step are used for gene
prediction, but it is possible to use fragments from subsequent
steps as well.
DIALIGN considers two distinct
levels of sequence similarity, namely similarity at the
nucleotide level and similarity at the peptide level. Fragments
with higher similarity at the nucleotide level are called
N-fragments while those with stronger similarity at the peptide
level are called P-fragments. In the new version of DIALIGN, an
alignment of genomic sequences can consist of both types of
fragments ("mixed alignments"). By default, AGenDA uses both N-
and P-fragments for gene prediction, but it is possible to exclude
N-fragments and to consider P-fragments only. This generally
increases the specificity of the program but reduces its
sensitivity.
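The iteration-step and fragment-type options can be pictured as two further filters on the fragment list. This is a sketch under stated assumptions, not AGenDA's implementation: the Fragment record, the function select_fragments, and the parameter names max_iteration and p_only are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    # Hypothetical record; not the DIALIGN output format.
    kind: str       # "N" (nucleotide level) or "P" (peptide level)
    iteration: int  # iteration step that found the fragment (1 = first)
    score: float


def select_fragments(fragments, max_iteration=1, p_only=False):
    """Keep fragments found up to max_iteration, optionally P-fragments only."""
    kept = []
    for f in fragments:
        if f.iteration > max_iteration:
            continue  # found in a later, weaker iteration step
        if p_only and f.kind != "P":
            continue  # N-fragments excluded on request
        kept.append(f)
    return kept


fragments = [
    Fragment("N", 1, 2.5),
    Fragment("P", 1, 4.1),
    Fragment("P", 2, 1.3),
]
default_set = select_fragments(fragments)
strict_set = select_fragments(fragments, p_only=True)
wide_set = select_fragments(fragments, max_iteration=2)
print(len(default_set), len(strict_set), len(wide_set))
```

The defaults mirror the text: first-iteration fragments of both types. Raising max_iteration admits weaker similarities, while p_only trades sensitivity for specificity.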
Program Output: An email is sent back to the user
containing:
- The computed gene models for both input sequences
- A complete list of the potential exons that have
been considered for gene prediction
- A hyperlink to a WWW site with a graphical representation
of the predicted exons together with the underlying sequence
alignment.