Bielefeld University, Center of Biotechnology, Institute of Bioinformatics, BiBiServ
BLAST
BLAST (Basic Local Alignment Search Tool) is the best-known sequence database search tool. It was introduced by S. Altschul et al. in 1990 [Altschul et al. 1990].
BLAST is a heuristic that works by finding short word matches ("seeds") between the query and the database sequences. These seeds are then used to initiate extensions that may lead to full alignments. For nucleotide vs. nucleotide searches, an exact match of the entire word is required before an extension is initiated, so the sensitivity and speed of the search are normally regulated through the word size: a smaller word size increases sensitivity, a larger one increases speed. For other BLAST searches, non-exact word matches are also taken into account, based on the similarity between words.
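The seeding step for a nucleotide vs. nucleotide search can be illustrated with a minimal Python sketch. It only shows exact word matching and is not BLAST's actual implementation; the function name `find_seeds` and the example sequences are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word of length
    word_size occurs identically in both sequences (0-based positions)."""
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Scan the subject and look up each of its words in the query index.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], ()):
            seeds.append((i, j))
    return seeds


# Hypothetical toy sequences.
query = "ACGTACGTGA"
subject = "TTACGTGACC"

# A short word yields more seeds (more sensitive, slower);
# a long word yields fewer seeds (less sensitive, faster).
print(len(find_seeds(query, subject, 4)))
print(len(find_seeds(query, subject, 7)))
```

In this toy case the shorter word size produces several seeds, while the longer one leaves a single seed. This is the trade-off between sensitivity and speed described above.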
BLAST programs
There are several BLAST programs and databases to choose from, and some combinations are better suited to a given question than others. The NCBI provides a selection chart to help you decide which BLAST program fits the question you are asking. The short table below covers the most common cases. For more details, see the NCBI selection chart.
   
Query        Database     Program to Use
Nucleotide   Nucleotide   blastn, megablast, or tblastx
Nucleotide   Protein      blastx
Protein      Nucleotide   tblastn
Protein      Protein      blastp
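
The table can also be written as a small lookup, which may be handy in scripts that pick a program automatically. This is only a sketch: the names `BLAST_PROGRAMS` and `choose_programs` are hypothetical, and the mapping reproduces nothing more than the four rows above.

```python
# Mapping (query type, database type) -> suitable BLAST programs,
# copied from the table above.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "megablast", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
    ("protein", "protein"): ["blastp"],
}


def choose_programs(query_type, database_type):
    """Return the list of BLAST programs for a query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unknown combination: {query_type} vs. {database_type}")
    return BLAST_PROGRAMS[key]


print(choose_programs("Protein", "Nucleotide"))
```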
   
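The "x" versions (blastx, tblastx) and "t" versions (tblastn, tblastx) of BLAST translate nucleotide sequences in all six reading frames on the fly, so that the comparison is done at the protein level: blastx translates the query, tblastn translates the database sequences, and tblastx translates both.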
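
What "all six reading frames" means can be sketched in a few lines of Python: three frames on the given strand and three on its reverse complement. This is an illustrative sketch assuming the standard genetic code and plain A/C/G/T input; the function names are hypothetical and this is not how BLAST itself is implemented.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return a dict frame name -> protein for frames +1..+3 and -1..-3."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six translations is then treated like a protein sequence in the comparison.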
   
BLAST databases
NCBI offers a number of databases to search against. According to their content, they are grouped into protein and nucleotide databases. The two tables below list the most important databases of each group together with their detailed composition.
   
Protein Databases for BLAST
Database    Content Description
nr          Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr.
refseq      Protein sequences from NCBI Reference Sequence Project.
swissprot   Last major release of the SWISS-PROT protein sequence database (no incremental updates).
month       All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days.
pdb         Sequences derived from the 3-dimensional structure records from the Protein Data Bank.

   
Nucleotide Databases for BLAST
Database         Content Description
nr               All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" due to computational cost.
refseq_mrna      mRNA sequences from NCBI Reference Sequence Project.
refseq_genomic   Genomic sequences from NCBI Reference Sequence Project.
est              Database of GenBank + EMBL + DDBJ sequences from EST division.
est_human        Human subset of est.
est_mouse        Mouse subset of est.
est_others       Subset of est other than human or mouse.
gss              Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences.
htgs             Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr.
pat              Nucleotides from the Patent division of GenBank.
month            All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days.

There are more databases available at NCBI. For a complete list, see the selection chart.