BLAST (Basic Local Alignment Search Tool) is
the best-known sequence database search tool. It was
developed by S. Altschul et al. and first published in
1990 [Altschul et al. 1990].
|
|
|
BLAST is a heuristic that works by
finding word matches ("seeds") between the query and database
sequences. These seeds are then used to initiate extensions
that may lead to full alignments. For nucleotide
vs. nucleotide searches, an exact match of the entire word is
required before an extension is initiated, so one normally
regulates the sensitivity and speed of the search by changing
the word size: a smaller word size makes the search more
sensitive but slower, and a larger one makes it faster but less
sensitive. For other BLAST searches, non-exact
word matches are also taken into account, based on the similarity
between words.
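The seeding step for nucleotide searches can be illustrated with a minimal sketch. This is not BLAST's actual implementation: the function name `find_seeds` and the default word size are chosen here for illustration only, and the extension step is left out. The sketch indexes every word of the query and then scans a database sequence for exact matches of those words.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size=11):
    """Return (query_pos, subject_pos) pairs where a word matches exactly.

    Hypothetical helper for illustration; word_size=11 is an assumed default.
    """
    # Index every word of length word_size in the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Scan the subject (database) sequence and look each word up in the index.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], ()):
            seeds.append((i, j))
    return seeds


# A smaller word size yields more seeds (more sensitive, more work to extend).
print(len(find_seeds("ACGTACGT", "TTACGTAA", word_size=4)))
print(len(find_seeds("ACGTACGT", "TTACGTAA", word_size=5)))
```

In a real search each seed would then be extended in both directions and scored. The sketch only shows why the word size controls how many candidate positions have to be examined.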
|
|
|
There are many choices to make between
different BLAST programs and databases. Some of these choices
are better suited to certain questions than others. The NCBI
has created a selection chart to help you choose the
BLAST program for the question you are asking. Below
is a very short table for choosing the right BLAST program. For
more details, see the selection chart. |
|
|
Query | Database | Program to Use |
Nucleotide | Nucleotide | blastn, megablast, or tblastx |
Nucleotide | Protein | blastx |
Protein | Nucleotide | tblastn |
Protein | Protein | blastp |
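The table above can also be written as a small lookup, which may be convenient when scripting searches. This is only a sketch of the table's content: the names `BLAST_PROGRAMS` and `choose_blast_program` are hypothetical and not part of any BLAST distribution.

```python
# Mapping of (query type, database type) to the programs listed in the table.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "megablast", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
    ("protein", "protein"): ["blastp"],
}


def choose_blast_program(query_type, database_type):
    """Return the candidate programs for a query/database type combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"Unknown combination: {query_type} vs. {database_type}")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("Protein", "Nucleotide"))
```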
|
|
|
The "x" (blastx, tblastx) and "t"
versions (tblastn, tblastx) of BLAST translate nucleotide
sequences in all six reading frames on the fly, so that the
comparison is done at the protein level: blastx translates the
query, tblastn translates the database, and tblastx translates both.
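To make "all six reading frames" concrete, the following minimal sketch translates a nucleotide sequence in three frames on the forward strand and three on the reverse complement. It assumes the standard genetic code and an unambiguous A/C/G/T alphabet. The function names are hypothetical, and this is not how BLAST itself is implemented.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```

Each of the six protein strings would then be compared at the protein level, which is why translated searches take more computation than a direct nucleotide or protein comparison.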
|
|
|
NCBI offers a number of databases
to search. The most important ones are listed in the two tables
below, grouped by content into protein and nucleotide
databases, together with their detailed compositions.
|
|
|
Protein Databases for BLAST |
Database | Content Description |
nr | Non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr. |
refseq | Protein sequences from NCBI Reference Sequence Project. |
swissprot | Last major release of the SWISS-PROT protein sequence database (no incremental updates). |
month | All new or revised GenBank CDS translations + PDB + SwissProt + PIR + PRF released in the last 30 days. |
pdb | Sequences derived from the 3-dimensional structure records from the Protein Data Bank. |
|
|
|
Nucleotide Databases for BLAST |
Database | Content Description |
nr | All GenBank + EMBL + DDBJ + PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant" due to computational cost. |
refseq_mrna | mRNA sequences from NCBI Reference Sequence Project. |
refseq_genomic | Genomic sequences from NCBI Reference Sequence Project. |
est | Database of GenBank + EMBL + DDBJ sequences from EST division. |
est_human | Human subset of est. |
est_mouse | Mouse subset of est. |
est_others | Subset of est other than human or mouse. |
gss | Genome Survey Sequence, includes single-pass genomic data, exon-trapped sequences, and Alu PCR sequences. |
htgs | Unfinished High Throughput Genomic Sequences: phases 0, 1 and 2. Finished, phase 3 HTG sequences are in nr. |
pat | Nucleotides from the Patent division of GenBank. |
month | All new or revised GenBank + EMBL + DDBJ + PDB sequences released in the last 30 days. |
There are more databases available at
NCBI. For a complete list, see the selection chart.
|
|
|