bibi-help(at)techfak.uni-bielefeld.de    


GeneFisher Help Contents



GeneFisher Input Sequence Format

To submit a set of amino acid or nucleotide sequences, the FASTA format is recommended. A FASTA record consists of a description line starting with '>', followed by the sequence data:
>sequence_description
GAATTC...
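The FASTA layout is simple enough to read with a few lines of code. The Python sketch below only illustrates the format and is not GeneFisher's own parser. The function name `read_fasta` is hypothetical. It keeps the text after '>' as the description and joins sequence data that spans several lines:

```python
def read_fasta(text):
    """Split FASTA text into (description, sequence) pairs."""
    records = []
    description, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new description line closes the previous record.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is not None:
            chunks.append(line)  # sequence data may span several lines
        else:
            raise ValueError("sequence data before the first '>' line")
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


example = """>seq1 first example
GAATTC
GGATCC
>seq2
ACGT"""
print(read_fasta(example))
# [('seq1 first example', 'GAATTCGGATCC'), ('seq2', 'ACGT')]
```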
GeneFisher also understands the following formats:

  • IG/Stanford, used by Intelligenetics and others
  • GenBank/GB, GenBank flatfile format
  • NBRF format
  • EMBL, EMBL flatfile format
  • DNAStrider, for the common Mac program
  • Fitch format, limited use
  • Pearson/Fasta, a common format used by Fasta programs and others
  • Zuker format, limited use
  • Olsen, format printed by the Olsen VMS sequence editor
  • Phylip3.2, sequential format for Phylip programs
  • Phylip, interleaved format for Phylip programs (v3.3, v3.4)

Your sequences may contain a certain number of characters unknown to GeneFisher; these are automatically converted into gap characters '-'. If illegal characters make up more than 10% of the input, your query is not processed.
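The conversion and the 10% limit can be sketched as follows. This is only an illustration. The set of legal characters is an assumption (IUB nucleotide codes plus U and '-'), and the name `clean_sequence` is hypothetical. Amino acid input would need a different alphabet. The sketch treats "more than 10%" as a strict comparison:

```python
# Hypothetical alphabet: IUB nucleotide codes (plus U) and the gap '-'.
LEGAL = set("ACGTUMRWSYKVHDBN-")
MAX_ILLEGAL_FRACTION = 0.10  # the 10% limit stated above


def clean_sequence(seq):
    """Upper-case seq, turn unknown characters into '-', and reject the
    sequence if more than 10% of its characters were unknown."""
    cleaned, illegal = [], 0
    for ch in seq.upper():
        if ch in LEGAL:
            cleaned.append(ch)
        else:
            cleaned.append("-")
            illegal += 1
    if seq and illegal / len(seq) > MAX_ILLEGAL_FRACTION:
        raise ValueError(f"{illegal} of {len(seq)} characters are illegal")
    return "".join(cleaned)


print(clean_sequence("ACGTACGTAX"))  # ACGTACGTA-  (exactly 10%, accepted)
```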



Primer parameters

The following options specify valid ranges of values for PCR primers: length, GC content, and melting temperature.
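A range check of this kind could look like the Python sketch below. It assumes unambiguous primers (A, C, G, T only). The function names and the example bounds are hypothetical and are not GeneFisher defaults. A melting temperature range would be checked the same way with `in_range`:

```python
def gc_percent(primer):
    """GC content in percent; assumes only the bases A, C, G and T."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def primer_ok(primer, length_bounds, gc_bounds):
    """Check a primer's length and GC content against (min, max) bounds."""
    return in_range(len(primer), length_bounds) and in_range(gc_percent(primer), gc_bounds)


# The bounds below are illustrative values, not GeneFisher defaults.
print(gc_percent("ACGTACGTACGTACGTACGT"))                     # 50.0
print(primer_ok("ACGTACGTACGTACGTACGT", (18, 24), (40, 60)))  # True
print(primer_ok("GGGGCCCCGGGGCCCCGGGG", (18, 24), (40, 60)))  # False
```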

Melting Temperature Tm

For oligonucleotides of 12 or more bases, GeneFisher calculates the melting temperature of primer oligos, including degenerate ones, using an enhanced nearest-neighbor (NN) approach based on "A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics", John SantaLucia Jr., 1998.


                      DeltaH * 1000
     Tm [°C] = --------------------------- - 273.15 + 16.6 log10[Salt]
                (A + DeltaS) + R ln(Ct/4)

  • DeltaH is the sum of the nearest-neighbor (NN) enthalpy changes [kcal/mol]; the factor 1000 in the formula converts it to cal/mol.
  • A is the constant for helix initiation: -10.8 for non-self-complementary sequences and -12.4 for self-complementary sequences [cal/(K mol)].
  • DeltaS is the sum of the NN entropy changes [cal/(K mol)].
  • R is the molar gas constant, 1.987 [cal/(K mol)].
  • Ct is the total molar concentration of strands. Ct/4 is used for non-self-complementary sequences and Ct for self-complementary ones. GeneFisher uses a fixed Ct of 250 pM oligo concentration.
  • Salt is the salt concentration. A fixed value of 1 M is used because the NN parameters were determined in 1 M NaCl, so the term 16.6 log[Salt] evaluates to zero.
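As an illustration, the formula can be evaluated directly once the NN sums are known. The Python sketch below does not reproduce any NN parameter table. It takes DeltaH and DeltaS as inputs and plugs them into the formula above, with the constants from the bullet list. It assumes DeltaH is in kcal/mol (as the factor 1000 implies) and that the salt term uses a base-10 logarithm. The function name and the example sums are hypothetical:

```python
import math

R = 1.987  # molar gas constant in cal/(K mol)


def tm_nearest_neighbor(delta_h, delta_s, self_complementary=False,
                        ct=250e-12, salt=1.0):
    """Melting temperature in degrees C.

    delta_h -- sum of NN enthalpy changes in kcal/mol (scaled by 1000 here)
    delta_s -- sum of NN entropy changes in cal/(K mol)
    ct      -- total strand concentration in mol/l (250 pM by default)
    salt    -- salt concentration in mol/l (1 M makes the salt term zero)
    """
    if self_complementary:
        a, conc = -12.4, ct        # helix initiation constant, use Ct
    else:
        a, conc = -10.8, ct / 4    # helix initiation constant, use Ct/4
    return (delta_h * 1000 / (a + delta_s + R * math.log(conc))
            - 273.15 + 16.6 * math.log10(salt))


# Made-up NN sums for illustration only.
print(round(tm_nearest_neighbor(-100.0, -280.0), 1))  # 23.2
```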

For primers shorter than 12 bases we use a standard approximation (the Wallace rule):

    Tm [°C] = 2 * (a + t) + 4 * (g + c)
where a, t, g, c are the numbers of A, T, G and C bases in the sequence.
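The approximation adds 2 °C for each A or T base and 4 °C for each G or C base. A minimal Python sketch follows. It assumes an unambiguous primer. How GeneFisher applies the rule to degenerate primers is not described here, and the function name is hypothetical:

```python
def tm_wallace(primer):
    """2 degrees per A or T plus 4 degrees per G or C (unambiguous bases)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


print(tm_wallace("ACGTACGTAC"))  # 2 * 5 + 4 * 5 = 30
```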

Multiple occurrences

For each primer we check if it occurs more than once within the sequence submission. You can set the number of occurrences allowed for priming sites.
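The occurrence check can be pictured as counting matches of the primer in every submitted sequence and comparing the total with the allowed number. The sketch below is a simplification made for illustration. It counts only exact, possibly overlapping matches on the given strand and ignores the reverse complement and IUB ambiguity codes. The function names are hypothetical:

```python
def count_occurrences(primer, sequences):
    """Count exact, possibly overlapping matches of primer in all sequences."""
    primer = primer.upper()
    total = 0
    for seq in sequences:
        seq = seq.upper()
        pos = seq.find(primer)
        while pos != -1:
            total += 1
            pos = seq.find(primer, pos + 1)  # step by one to allow overlaps
    return total


def occurrence_ok(primer, sequences, max_occurrences):
    """True if the primer does not occur more often than allowed."""
    return count_occurrences(primer, sequences) <= max_occurrences


seqs = ["GAATTCAAAAGAATTC", "TTGAATTCTT"]
print(count_occurrences("GAATTC", seqs))  # 3
```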

Max. Primer degeneracy

If the degeneracy of a priming site reaches this threshold, the site is rejected.
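The degeneracy of a degenerate oligo is the number of unambiguous sequences it stands for. It is the product of the number of nucleotides represented by each IUB code. The Python sketch below computes it. Rejecting when the degeneracy is greater than or equal to the threshold follows the wording "reaches" and is an assumption. The function names are hypothetical:

```python
# Number of nucleotides represented by each IUB code.
IUB_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1,
            "M": 2, "R": 2, "W": 2, "S": 2, "Y": 2, "K": 2,
            "V": 3, "H": 3, "D": 3, "B": 3, "N": 4}


def degeneracy(site):
    """Number of unambiguous sequences a degenerate priming site stands for."""
    result = 1
    for code in site.upper():
        result *= IUB_SIZE[code]
    return result


def rejected(site, max_degeneracy):
    """Reject a site once its degeneracy reaches the threshold."""
    return degeneracy(site) >= max_degeneracy


print(degeneracy("GGNTTY"))  # 4 * 2 = 8
```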

PCR distance (Product size)

Specifies the minimal/maximal length of the desired PCR product. This option takes effect when we compute possible primer pairs.

Temperature difference in pair

Specifies the maximal temperature difference allowed between two primers when computing primer pairs.
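Both pair criteria, the product size range and the maximal Tm difference, can be applied while combining forward and reverse primers. In the Python sketch below, the `Primer` record, its forward-strand coordinates, and the product size definition (from the forward primer's start to the reverse primer's end) are assumptions made for illustration:

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Primer:
    start: int     # hypothetical: leftmost position on the forward strand
    length: int
    tm: float      # melting temperature in degrees C


def product_size(fwd, rev):
    """Product length from the forward primer's start to the reverse primer's end."""
    return rev.start + rev.length - fwd.start


def primer_pairs(forward, reverse, min_size, max_size, max_tm_diff):
    """Keep pairs whose product size and Tm difference are within limits."""
    pairs = []
    for f, r in product(forward, reverse):
        size = product_size(f, r)
        if min_size <= size <= max_size and abs(f.tm - r.tm) <= max_tm_diff:
            pairs.append((f, r, size))
    return pairs


forward = [Primer(10, 20, 58.0), Primer(40, 20, 64.0)]
reverse = [Primer(300, 20, 60.0)]
for f, r, size in primer_pairs(forward, reverse, 200, 400, 3.0):
    print(f.start, r.start, size)  # 10 300 310
```

The second forward primer yields a product of 280 bases, within the size range. It is still dropped because its Tm differs from the reverse primer's by 4 °C.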



Consensus Parameters

To create the consensus string for a given alignment of IUB-coded sequences, we take each column of the alignment and count the occurrences of the four nucleotides of the DNA alphabet.

Thus finding an 'A' increases the counter for adenine, and finding a 'y' increases the counters for both cytosine and thymine. The accumulated percentages are then filtered according to a number of criteria.
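The per-column counting can be sketched in Python as shown below. Two points are assumptions and may differ from GeneFisher: expressing percentages relative to the number of sequences, and letting gaps contribute nothing. As the text describes, an ambiguity code increments every base it stands for. The filtering step is not modeled, and the function name is hypothetical:

```python
# IUB code -> nucleotides it stands for; a gap '-' stands for none.
IUB_BASES = {"A": "A", "C": "C", "G": "G", "T": "T",
             "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
             "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
             "-": ""}


def column_percentages(column):
    """Percentage of sequences whose symbol includes each nucleotide.

    An ambiguity code increments every base it stands for, so the
    percentages of one column can add up to more than 100."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for symbol in column.upper():
        for base in IUB_BASES[symbol]:
            counts[base] += 1
    return {base: 100.0 * n / len(column) for base, n in counts.items()}


alignment = ["ACGT", "ACTT", "AYGT"]
for column in zip(*alignment):          # one tuple of symbols per column
    print(column_percentages("".join(column)))
```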



Backtranslation Parameters

This section is relevant for Amino Acid input only.

  • Maximum Redundancy: all synonymous triplets that code for a given amino acid are included in the backtranslation result. This is also known as "most ambiguous backtranslation".

  • Codon Usage: all organisms have preferential biases in codon usage. This information can be used to decide which codons to choose out of all possible ones.

    For amino acid input, GeneFisher offers a Java applet that lets you select a codon usage table interactively.

    This is from the Codon Usage Java Applet Page:

    Table Description

    Amino: Three-letter code for an amino acid.
    Codon: Unambiguous codons for that amino acid.
    Count: Number of occurrences of a codon in the genetic data from which the table was compiled.
    /1000: Expected number of occurrences of that codon per 1,000 codons in genes whose codon usage is identical to that compiled in the codon frequency table.
    Fraction: Expected fraction of that codon within its synonymous codon family.

    Example for the Gly synonymous family:
    Amino  Codon  Count  /1000  Fraction
    Gly    GGG       13   1.89  0.02
    Gly    GGA        3   0.44  0.00
    Gly    GGT      365  52.99  0.59
    Gly    GGC      238  34.55  0.38

    o Total Number of codons in the genomic data used to compile this table:
    N=6888

    o Absolute frequency for codon GGG in N:
    n(GGG)=13
    (13 GGG codons were counted in the genomic data)


    o Relative frequency for GGG in 1000 codons:
    p1000(GGG) = 13*1000/6888
    = 1.89
    (looking at 1000 codons, we expect to find 1.89 GGG codons)


    o Relative frequency for GGG in the Gly family:
    pF(GGG) = n(GGG) / Sum(n(Gly codons)) = 13/619
    = 0.021
    (within the Gly family we expect to find 2.1 percent GGG codons)



    Max. Allowed Cutoff slider

    The initial value is the minimum of the maximum fraction values from each group of synonymous codons.
    Choose smaller values to include more codons in the triplet lookup table, which is used for converting amino acids into nucleotide triplets. Note that this increases the degeneracy of the backtranslation.
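The Python sketch below reproduces the worked Gly numbers and illustrates the effect of the cutoff. It assumes that a codon enters the lookup table when its family fraction is at or above the cutoff. It also assumes that the selected codons are merged position by position into one IUB-coded triplet. With cutoff 0, every synonymous codon is kept, which corresponds to maximum redundancy. The function names are hypothetical:

```python
# Set of nucleotides -> IUB symbol (upper case here).
IUB_CODE = {frozenset(bases): code for bases, code in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}.items()}

# Gly counts and codon total N from the example table above.
GLY_COUNTS = {"GGG": 13, "GGA": 3, "GGT": 365, "GGC": 238}
N = 6888


def per_thousand(count, total):
    """Expected occurrences of a codon per 1000 codons."""
    return count * 1000 / total


def family_fractions(counts):
    """Fraction of each codon within its synonymous family."""
    family_total = sum(counts.values())
    return {codon: n / family_total for codon, n in counts.items()}


def initial_cutoff(families):
    """Minimum over all families of the largest codon fraction."""
    return min(max(fractions.values()) for fractions in families)


def lookup_codons(fractions, cutoff):
    """Codons kept in the lookup table: fraction at or above the cutoff."""
    return [codon for codon, f in fractions.items() if f >= cutoff]


def to_iub(codons):
    """Collapse codons position by position into one IUB-coded triplet."""
    return "".join(IUB_CODE[frozenset(column)] for column in zip(*codons))


gly = family_fractions(GLY_COUNTS)
print(round(per_thousand(13, N), 2))       # 1.89
print(round(gly["GGG"], 3))                # 0.021
print(to_iub(lookup_codons(gly, 0.3)))     # GGY (GGT and GGC)
print(to_iub(lookup_codons(gly, 0.0)))     # GGN (all four: maximum redundancy)
```

A position-wise merge is only exact for families like Gly. For Leu (CTN, TTA, TTG) it gives YTN, which also covers the Phe codons TTT and TTC.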



The IUB (IUPAC) Code

    IUB  T G C A  Meaning     Origin of designation
    -    0 0 0 0  -           -
    a    0 0 0 1  A           Adenine
    c    0 0 1 0  C           Cytosine
    m    0 0 1 1  C, A        aMino
    g    0 1 0 0  G           Guanine
    r    0 1 0 1  G, A        puRine
    s    0 1 1 0  G, C        Strong interaction (3 H bonds)
    v    0 1 1 1  G, C, A     not-T (not-U), V follows U
    t    1 0 0 0  T           Thymine
    w    1 0 0 1  T, A        Weak interaction (2 H bonds)
    y    1 0 1 0  T, C        pYrimidine
    h    1 0 1 1  T, C, A     not-G, H follows G in the alphabet
    k    1 1 0 0  T, G        Keto
    d    1 1 0 1  T, G, A     not-C, D follows C
    b    1 1 1 0  T, G, C     not-A, B follows A
    n    1 1 1 1  T, G, C, A  aNy
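The T G C A columns of the table form a 4-bit value, with T = 8, G = 4, C = 2 and A = 1. Listing the symbols in table order therefore makes each symbol's index equal to its bit value. The Python sketch below uses this to encode and decode IUB symbols and to merge two symbols with a bitwise OR. It illustrates the table and does not describe how GeneFisher stores IUB codes internally. The function names are hypothetical:

```python
# Bit weights follow the T G C A columns of the table: T=8, G=4, C=2, A=1.
BIT = {"T": 8, "G": 4, "C": 2, "A": 1}
# IUB symbols in table order, so each symbol's index is its 4-bit value.
CODES = "-acmgrsvtwyhkdbn"


def encode(bases):
    """IUB symbol for a set of nucleotides, e.g. {'C', 'T'} -> 'y'."""
    value = 0
    for base in bases:
        value |= BIT[base.upper()]
    return CODES[value]


def decode(symbol):
    """Set of nucleotides represented by an IUB symbol."""
    value = CODES.index(symbol.lower())
    return {base for base, bit in BIT.items() if value & bit}


def combine(symbol1, symbol2):
    """Bitwise OR of two symbols: the code covering both, e.g. r, y -> n."""
    return CODES[CODES.index(symbol1.lower()) | CODES.index(symbol2.lower())]


print(encode({"C", "T"}))   # y
print(sorted(decode("d")))  # ['A', 'G', 'T']
print(combine("r", "y"))    # n
```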



3' clamp parameters

A special evaluation is performed for a 3' terminal region (clamp) of a specified length.

  • Max. 3' length: Length of the 3' clamp, which is evaluated separately.
  • Max. 3' degeneracy: Max. degeneracy of 3' clamp.
  • Min. GC content: Minimal GC content of 3' clamp.
  • Max. GC content: Maximal GC content of 3' clamp.
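The Python sketch below shows how the four clamp parameters could be applied to the 3' terminal bases of a primer. It makes two assumptions. First, only G, C and S (G or C) count toward the GC content, so partially GC codes such as N are ignored. Second, the maximal degeneracy is allowed (less than or equal). The function names are hypothetical:

```python
# Number of nucleotides per IUB code, used for the degeneracy.
IUB_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1,
            "M": 2, "R": 2, "W": 2, "S": 2, "Y": 2, "K": 2,
            "V": 3, "H": 3, "D": 3, "B": 3, "N": 4}
GC_CODES = set("GCS")  # assumption: only G, C and S (G or C) count as GC


def clamp_stats(primer, clamp_length):
    """Degeneracy and GC percentage of the 3' terminal clamp_length bases."""
    clamp = primer.upper()[-clamp_length:]
    degeneracy = 1
    for code in clamp:
        degeneracy *= IUB_SIZE[code]
    gc = 100.0 * sum(code in GC_CODES for code in clamp) / len(clamp)
    return degeneracy, gc


def clamp_ok(primer, clamp_length, max_degeneracy, min_gc, max_gc):
    degeneracy, gc = clamp_stats(primer, clamp_length)
    return degeneracy <= max_degeneracy and min_gc <= gc <= max_gc


print(clamp_stats("ACGTAGGNCG", 5))         # (4, 80.0)
print(clamp_ok("ACGTAGGNCG", 5, 4, 40, 80))  # True
print(clamp_ok("ACGTAGGNCG", 5, 2, 40, 80))  # False
```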



Info on Logging

To help us optimize GeneFisher, certain user input data are logged for later evaluation. No sequence data will be published or used for any purpose other than the development of our software.



Rejection statistics

The rejection statistics table shows why possible primer positions were rejected.