BiBiServ - Bielefeld University Bioinformatic Service

AGenDA - Online Manual



Summary:

AGenDA is a novel tool for gene prediction based on cross-species sequence comparison. Homologous genomic sequences (e.g. from human and mouse) are repeat-masked and aligned using the CHAOS and DIALIGN programs. Similarities identified by DIALIGN are clustered and searched for conserved splice signals in order to identify potential exons. Finally, complete gene models are constructed from these potential exons using a combinatorial optimization approach. A more detailed description of the method is available online.

CHAOS is used to speed up DIALIGN. CHAOS is a rapid search tool that identifies chains of high-scoring sequence similarities; these similarities serve as anchor points that speed up the DIALIGN alignment procedure. A separate description of CHAOS is available.

The new version of DIALIGN that underlies AGenDA is described in:
  • B. Morgenstern, O. Rinner, S. Abdeddaïm, D. Haase, K. Mayer, A. Dress, H.-W. Mewes (2002)
    Exon Discovery by Genomic Sequence Alignment.
    Bioinformatics 18, 777-787.
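The last step of the summary, building a gene model from potential exons, is a combinatorial optimization problem. As a rough illustration only, the sketch below swaps in a textbook technique, weighted interval scheduling: it picks non-overlapping candidate exons with the largest total score. This is not AGenDA's algorithm; it ignores splice signals, reading frames, and the cross-species consistency that AGenDA uses, and the (start, end, score) exon tuples and the function name best_gene_model are hypothetical.

```python
from bisect import bisect_right

# Hypothetical exon representation: (start, end, score), end exclusive.
Exon = tuple[int, int, float]


def best_gene_model(exons: list[Exon]) -> tuple[float, list[Exon]]:
    """Pick non-overlapping candidate exons with maximal total score.

    Weighted interval scheduling: a simplified stand-in for AGenDA's
    combinatorial optimization (no splice-site or reading-frame checks).
    """
    exons = sorted(exons, key=lambda e: e[1])  # sort by end position
    ends = [e[1] for e in exons]
    best = [0.0] * (len(exons) + 1)  # best[k]: optimum over the first k exons
    take = [False] * (len(exons) + 1)
    for k in range(1, len(exons) + 1):
        start, _, score = exons[k - 1]
        # Number of earlier exons that end at or before this exon starts.
        p = bisect_right(ends, start, 0, k - 1)
        if best[p] + score > best[k - 1]:
            best[k], take[k] = best[p] + score, True
        else:
            best[k] = best[k - 1]
    # Trace back the chosen exons from the last position.
    model, k = [], len(exons)
    while k > 0:
        if take[k]:
            model.append(exons[k - 1])
            k = bisect_right(ends, exons[k - 1][0], 0, k - 1)
        else:
            k -= 1
    return best[-1], model[::-1]


candidates = [(0, 100, 5.0), (50, 150, 4.0), (120, 200, 3.0), (210, 300, 6.0)]
print(best_gene_model(candidates))
# (14.0, [(0, 100, 5.0), (120, 200, 3.0), (210, 300, 6.0)])
```

The overlapping candidate (50, 150) is dropped because keeping the first exon yields a higher total score.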


Program Input:

The AGenDA WWW server requires two homologous genomic sequences as input. These sequences must be in FASTA format, with one sequence per file. The following is an example of the FASTA sequence file format:
> seq_name 
GACCTCAATCAACTGGAATTTAACAAAACTTTATATGCATAATGTTATCTATATGAATG
AAAGGTATTTATAATATTGAATAGTCTGTATCACACAAATTATTATTAGAAATTATTAAT
ACTAGGTTTTAAAAAAAAAAAAAAAAAAAAAAAAACTAGCATTATAGCAAAACGAAAATG
TAACACTGGTTTTCACTCTTTCTAAAACCTAAAAACCAAACCAAGTAAACCGGATTGTCC
ACCTCTATATATAACATATTTAAACCATATATTCAATATGACATTGTTTGTTTAAGAAAG
The first line starts with ">" and contains the sequence name and optional comments; the following lines contain the sequence itself. The current length limit for the AGenDA WWW server is 200 kb per sequence.
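Before uploading, it can help to check locally that each file holds exactly one FASTA record within the size limit. The sketch below is a hypothetical helper, not part of AGenDA; it assumes that "200 kb" means 200,000 nucleotides.

```python
MAX_LENGTH = 200_000  # assumption: "200 kb" taken as 200,000 nucleotides


def read_single_fasta(text: str) -> tuple[str, str]:
    """Parse a FASTA file that must contain exactly one sequence.

    Returns (name_and_comments, sequence). Raises ValueError if the input
    does not start with '>', holds more than one record, or is too long.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    header = lines[0][1:].strip()
    body = lines[1:]
    if any(line.startswith(">") for line in body):
        raise ValueError("only one sequence per file is allowed")
    sequence = "".join(body).upper()
    if len(sequence) > MAX_LENGTH:
        raise ValueError(f"sequence has {len(sequence)} nt; the limit is {MAX_LENGTH}")
    return header, sequence


print(read_single_fasta(">seq_name\nGACCTCAATC\nAACTGGAATT\n"))
# ('seq_name', 'GACCTCAATCAACTGGAATT')
```

Since the server expects one sequence per file, the helper would be called once for each of the two homologous input files.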

Options:

Threshold:

As a starting point for gene prediction, AGenDA uses segment pairs ("fragments") identified by the DIALIGN program. All fragments contained in the optimal alignment of the input sequences are considered, provided their score exceeds a threshold value S. By default, S = 1. Note that this parameter does not affect the alignment procedure itself; it only determines which fragments of the calculated alignment are used for gene prediction.
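The threshold acts as a post-alignment filter. A minimal sketch, assuming a hypothetical Fragment record with a plain numeric score; since the text says the score must exceed S, a fragment scoring exactly 1 is dropped under the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A gap-free segment pair from the alignment (hypothetical representation)."""
    start1: int    # start position in sequence 1 (0-based)
    start2: int    # start position in sequence 2 (0-based)
    length: int    # fragment length in nucleotides
    score: float   # fragment score, assumed to be a plain number


def fragments_for_gene_prediction(alignment_fragments, threshold=1.0):
    """Keep fragments whose score strictly exceeds the threshold S.

    The alignment is not recomputed; only fragments already in the
    calculated alignment are filtered.
    """
    return [f for f in alignment_fragments if f.score > threshold]


alignment = [
    Fragment(0, 0, 30, 4.2),
    Fragment(40, 45, 12, 0.8),
    Fragment(60, 66, 25, 1.0),
]
selected = fragments_for_gene_prediction(alignment)
print([f.start1 for f in selected])  # [0]
```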

Number of Iterations:

The current version of DIALIGN can use an iterative procedure. In the first step, strong similarities between the sequences are identified; in subsequent steps, weaker similarities are searched for in the regions between these strong similarities. This increases the sensitivity of the program but can also increase the number of false-positive hits. This parameter lets the user choose how strong the similarities must be for fragments to be passed to AGenDA. By default, only fragments identified in the first iteration step are used for gene prediction, but fragments from subsequent steps can be included as well.
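To make the iterative idea concrete, here is a toy sketch, not DIALIGN's procedure: it uses exact matches instead of DIALIGN's fragment weights, and the names iterative_anchors and select_by_iteration, as well as the per-iteration minimum lengths, are hypothetical. Iteration 1 finds the strongest match; later iterations search only the gaps between matches already found, with a weaker length requirement. The final filter mirrors the AGenDA default of using first-iteration fragments only.

```python
def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest exact match."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def iterative_anchors(a: str, b: str, min_lengths: list[int]) -> list[tuple[int, int, int, int]]:
    """Toy iterative search returning (iteration, start_a, start_b, length).

    Iteration k looks for the longest exact match of at least
    min_lengths[k-1] nucleotides in each region not yet aligned.
    """
    regions = [(0, len(a), 0, len(b))]  # (a_lo, a_hi, b_lo, b_hi), half-open
    anchors = []
    for iteration, min_len in enumerate(min_lengths, start=1):
        next_regions = []
        for a_lo, a_hi, b_lo, b_hi in regions:
            i, j, length = longest_common_substring(a[a_lo:a_hi], b[b_lo:b_hi])
            if length == 0 or length < min_len:
                next_regions.append((a_lo, a_hi, b_lo, b_hi))  # retry next iteration
                continue
            i, j = i + a_lo, j + b_lo
            anchors.append((iteration, i, j, length))
            # Later iterations only search between the anchors found so far.
            next_regions.append((a_lo, i, b_lo, j))
            next_regions.append((i + length, a_hi, j + length, b_hi))
        regions = next_regions
    return anchors


def select_by_iteration(anchors, max_iteration=1):
    """Keep fragments found in iterations up to max_iteration."""
    return [f for f in anchors if f[0] <= max_iteration]


found = iterative_anchors("CCCCCCCCAAAGGG", "CCCCCCCCTAAAGGG", [8, 4])
print(found)                       # [(1, 0, 0, 8), (2, 8, 9, 6)]
print(select_by_iteration(found))  # [(1, 0, 0, 8)]
```

Raising max_iteration to 2 would also pass the weaker second-iteration match on to gene prediction, trading specificity for sensitivity as described above.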

Nucleotide / peptide-level similarity:

DIALIGN considers two distinct levels of sequence similarity: similarity at the nucleotide level and similarity at the peptide level. Fragments with stronger similarity at the nucleotide level are called N-fragments, while those with stronger similarity at the peptide level are called P-fragments. In the new version of DIALIGN, an alignment of genomic sequences can consist of both types of fragments ("mixed alignments"). By default, AGenDA uses both N- and P-fragments for gene prediction, but it is possible to exclude N-fragments and consider P-fragments only. This generally increases the specificity of the program but reduces its sensitivity.
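The difference between the two fragment types shows up clearly in a segment pair whose codons differ mostly at synonymous positions. The sketch below is a simplified stand-in: DIALIGN weights fragments by how unlikely their similarity is to arise by chance, whereas here a fragment is labelled "P" simply when its amino-acid identity (frame 0, standard genetic code) exceeds its nucleotide identity. The function classify_fragment and its tie rule (ties count as "N") are assumptions.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate in frame 0; an incomplete trailing codon is ignored."""
    return "".join(CODON_TABLE.get(seq[p:p + 3].upper(), "X")
                   for p in range(0, len(seq) - 2, 3))


def identity(x: str, y: str) -> float:
    """Fraction of identical positions in two equal-length strings."""
    return sum(p == q for p, q in zip(x, y)) / len(x)


def classify_fragment(s1: str, s2: str) -> str:
    """Label a gap-free segment pair 'P' if more similar at peptide level, else 'N'."""
    if len(s1) != len(s2) or len(s1) < 3:
        raise ValueError("expects equal-length segments of at least one codon")
    nt = identity(s1, s2)
    aa = identity(translate(s1), translate(s2))
    return "P" if aa > nt else "N"


fragments = [("CTGGCTAGA", "TTAGCCCGT"), ("ATGAAA", "ATGAAT")]
print([classify_fragment(*f) for f in fragments])  # ['P', 'N']
p_only = [f for f in fragments if classify_fragment(*f) == "P"]
```

The first pair shares only 4 of 9 nucleotides but encodes the same peptide (Leu-Ala-Arg), so it counts as a P-fragment; restricting the input to P-fragments keeps it and drops the second pair.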

Program Output:

The results are sent back to the user by email. The message contains:
  • The computed gene models for both input sequences
  • A complete list of the potential exons that have been considered for gene prediction
  • A hyperlink to a WWW site with a graphical representation of the predicted exons together with the underlying sequence alignment.

Mon Dec 15 11:37:48 2008