Bielefeld University, Center of Biotechnology, Institute of Bioinformatics, BiBiServ
 
Database Search
In this section you will learn about one of the most important tasks in molecular biology: comparing a sequence obtained in the lab (nucleotide or protein) with all known sequences collected in a database. This procedure is often referred to as homology search. The search results, i.e. the sequences that are similar to the query, may give an indication of the function of a newly sequenced gene.
NCBI's non-redundant (NR) protein database contains 2.5 million sequences with almost 850 million amino acids (June 2005). At this scale, the direct approach of aligning the query sequence with each sequence in the database is impractically slow. Instead, efficient filtering or indexing methods are used to cut down the running time. These methods are not guaranteed to find the best match, but they are nevertheless invaluable tools in a molecular biologist's daily life.
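The idea of filtering with an index can be illustrated with a small sketch. It is not the algorithm of any tool described on this page. It only shows the general principle: a word (k-mer) index picks out the database sequences that share short exact words with the query, so that only those candidates need a full alignment. The function names, the toy sequences, the word length `k=3`, and the threshold `min_shared=2` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_kmer_index(database, k=3):
    """Map every k-mer to the set of ids of database sequences containing it."""
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def candidate_hits(query, index, k=3, min_shared=2):
    """Return ids of sequences sharing at least `min_shared` distinct query k-mers."""
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    counts = defaultdict(int)
    for kmer in query_kmers:
        for seq_id in index.get(kmer, ()):
            counts[seq_id] += 1
    return sorted(seq_id for seq_id, n in counts.items() if n >= min_shared)


# Toy protein "database" (hypothetical sequences, for illustration only).
database = {
    "seqA": "MKTAYIAKQR",
    "seqB": "GGGSSGGGSS",
    "seqC": "TAYIAKQLLL",
}
index = build_kmer_index(database)
print(candidate_hits("KTAYIAK", index))
```

Here only `seqA` and `seqC` survive the filter, and `seqB` is never aligned. A sequence that is related to the query but shares no exact word with it would be missed, which is why such heuristics cannot guarantee the best match.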
   
BLAST
Probably the best-known database search tool is BLAST (Basic Local Alignment Search Tool), published by S. Altschul et al. in 1990 [Altschul et al. 1990].
 
FASTA
FASTA is another commonly used sequence database search tool, written by W.R. Pearson and D.J. Lipman in 1988 [Pearson et al. 1988].
 
SSEARCH
SSEARCH performs a rigorous Smith-Waterman alignment between a protein sequence and another protein sequence or a protein database, or between a DNA sequence and another DNA sequence or a DNA library. Because SSEARCH computes a full alignment between the query and every database sequence, it is the most sensitive tool to use, but it is also correspondingly slow to compute.
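The cost of a rigorous search becomes clearer from a minimal sketch of the Smith-Waterman recurrence. This is not SSEARCH's implementation. The scoring scheme used here (match +2, mismatch -1, linear gap penalty -2) is an assumption made to keep the example short, whereas real searches typically use a substitution matrix and affine gap penalties. The function name is likewise hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of strings a and b.

    Uses a simple match/mismatch score and a linear gap penalty
    (illustrative assumptions), keeping only two rows of the
    dynamic programming matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may start anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GGACGTGG", "CCACGTCC"))
```

Every cell of the matrix is filled for every database sequence, so the work grows with the query length times the total database length. For a hypothetical 300-residue query against the roughly 850 million residues mentioned above, that is about 2.55 × 10^11 cells.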
 
e2g
e2g [Krüger et al. 2004] is a specialized tool for comparing a genomic sequence against all ESTs of the same organism. It uses an index structure that allows the matches to be computed very efficiently.