GenBank Data Format
Each GenBank entry includes a concise description of the sequence, the scientific name and taxonomy of the source organism, and a table of features that identifies coding regions and other sites of biological significance, such as transcription units, sites of mutations or modifications, and repeats. Protein translations for coding regions are included in the feature table. Bibliographic references are included for all published sequences, along with a link to the Medline unique identifier. Each sequence entry is composed of lines of several types, each type with its own format, which together record the data that make up the entry.
The overall goal of the feature table design is to provide an extensive vocabulary for describing features, within a flexible framework for manipulating them. The Feature Table documentation represents the shared rules that allow the three collaborating databases (DDBJ, EMBL, and GenBank) to exchange data on a daily basis. The range of features to be represented is diverse, including regions that:
  • perform a biological function,
  • affect or are the result of the expression of a biological function,
  • interact with other molecules,
  • affect replication of a sequence,
  • affect or are the result of recombination of different sequences,
  • are a recognizable repeated unit,
  • have secondary or tertiary structure,
  • exhibit variation, or
  • have been revised or corrected.
General structure of a GenBank entry:
  • LOCUS: Short name for this sequence (Maximum of 32 characters).
  • DEFINITION: Definition of sequence (Maximum of 80 characters).
  • ACCESSION: Accession number of the entry.
  • VERSION: Version of the entry.
  • DBSOURCE: Shows the source, the date of creation and last modification of the database entry.
  • KEYWORDS: Keywords for the entry.
  • AUTHORS: Authors for the work.
  • TITLE: Title of the publication.
  • JOURNAL: Journal reference for the entry.
  • MEDLINE: Medline ID.
  • COMMENT: Lines of comments.
  • SOURCE: The organism from which the sequence was derived.
  • ORGANISM: Full name of organism (Maximum of 80 characters).
  • AUTHORS: Authors of this sequence (Maximum of 80 characters).
  • ACCESSION: ID Number for this sequence (Maximum of 80 characters).
  • FEATURES: Features of the sequence.
  • ORIGIN: Beginning of sequence data.
  • //: End of sequence data and of the entry.
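The keyword layout listed above can be sketched in code. The following is a minimal illustration, not an official parser: it assumes that top-level keywords such as LOCUS or FEATURES start in the first column, that sub-keywords and continuation lines are indented, and that a line starting with // ends the entry. The function name split_sections and the shortened sample text are hypothetical, introduced only for this example.

```python
def split_sections(record_text):
    """Group the lines of one flat-file entry under their top-level keyword.

    Assumption: a line whose first character is not a space starts a new
    top-level section; indented lines belong to the section before them.
    """
    sections = []  # list of (keyword, [lines]) in file order
    for line in record_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("//"):
            break  # terminator line: end of the entry
        if not line[0].isspace():
            keyword, _, rest = line.partition(" ")
            body = [rest.strip()] if rest.strip() else []
            sections.append((keyword, body))
        elif sections:
            # sub-keyword, continuation, feature, or sequence line
            sections[-1][1].append(line.strip())
    return sections


# Shortened, hypothetical excerpt in the same layout as a real entry.
sample = """\
LOCUS       MMU22421                2235 bp    DNA     linear   ROD 23-MAR-1995
DEFINITION  Mus musculus obesity protein (ob) gene, complete cds.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
ORIGIN
        1 atgtgctgga
//
"""

for keyword, lines in split_sections(sample):
    print(keyword, lines)
```

A list of pairs is used rather than a dictionary because a keyword such as REFERENCE can occur more than once in a single entry.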
For more detailed information, see the following sample GenBank record.
LOCUS       MMU22421                2235 bp    DNA     linear   ROD 23-MAR-1995
DEFINITION  Mus musculus obesity protein (ob) gene, complete cds.
ACCESSION   U22421
VERSION     U22421.1  GI:726296
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 2235)
  AUTHORS   Chehab,F.F. and Lim,M.E.
  TITLE     Genomic organization and sequence of the mouse obesity gene
  JOURNAL   Unpublished (1995)
REFERENCE   2  (bases 1 to 2235)
  AUTHORS   Chehab,F.F. and Lim,M.E.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-MAR-1995) Farid F. Chehab, Laboratory Medicine,
            University of California, San Francisco, 505 Parnassus Avenue, San
            Francisco, CA 94143-0134, USA
FEATURES             Location/Qualifiers
     source          1..2235
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
     gene            join(1..144,1876..2235)
     CDS             join(1..144,1876..2235)
                     /product="obesity protein"
     intron          145..1875
     repeat_region   449..585
     misc_feature    1876..1879
                     /note="slippage of acceptor site results in inclusion or
                     exclusion of glutamine at amino acid position 49"
ORIGIN
        1 atgtgctgga gacccctgtg tcggttcctg tggctttggt cctatctgtc ttatgttcaa
       61 gcagtgccta tccagaaagt ccaggatgac accaaaaccc tcatcaagac cattgtcacc
      121 aggatcaatg acatttcaca cacggtagga gtcttatggg gggacaaaga tgtaggacta
      181 gaaccagagt ctgagaaaca tgtcatgcac ctcctagaag ctgagagttt ataagcctcg
      241 agtgtacatt atctctggtc atggctcttg tcactgctgc ctgctgaaat acagggctga
      301 gtggttccat ttctaaaccc agcatctaga ctgctcagct gtactgccag tatcgcatga
      361 ttctaatcct aagccacctt agggaattta acttctctct tatactccca ttaagaaacc
      421 ataaggtgtc gggcgtggtg gcacatgccc tctaatccca gaactcggga ggcagaggca
      481 ggtggatttc tgagttcaag gccagcctgg tctacaaaat gagttccagg acagccaggg
      541 ctatacagag aaaccctgtc tcgaaaaacc aaaaaagaag ccataaggtt ctttgatatc
      601 ataaggccat gctcattttc cctctgccac aggaaaccca gcccttggtg gctagctgag
      661 catgtaaggt acacatcaga cctgggagaa cctgggttcc tccctgcttc cacagaccac
      721 cctctcccct tccttagccc cctgtttctg cctctctcat tctctttcat ccatgaaact
      781 acttccttga atttagtacc cagggcataa gaatccctaa aggtcatggt gtcccattga
      841 cacgtggaca gcttcccagc agtgtctcta ctgggcagga ggagcagtag gtttctaatg
      901 gtttttagct acagcttctg cccaccgctc acccactttt caaagtcaca cagaaaacaa
      961 cctttccctc ctttacaacc agtccttgtg tagctgctga tagtggtcgg tgcccaccat
     1021 gttcttcctc cgaggcccag cagcctacat cttcagccat ttcctcagat gtatctaagc
     1081 tatgtgcata tcaccatatc tgcttcccat ctgcaagatc ttaggccagt tctccggtgg
     1141 gttttaaacc ttgattttac catcttgatg agggagacat catatcatat caccaagttg
     1201 ttctaaggct taaatggggt gtagtgaaag actttcttgt ggagccatct ggagactact
     1261 atgtctcctg accagtgtgc gtgtctcaca gtgtggcctt ggcagctagg agaagtcaga
     1321 tattcagaat caagggacag cttaatataa gagacttatg cggagaaagt tctcatcatc
     1381 tctcgacaag agtcatcagg gctgcacatg gagaggccca actacccaaa tgtgggtgga
     1441 aatgagagga agccagtggg gaaagccctt cctggtaacc agactcagca gagtgggggg
     1501 ggggggcacg gctttgaccc taatgaggga gaaccacaga agagtatgac taggagggag
     1561 agatctgata agggcaggag gctagagaga atataaggaa taaagagcta tggctggttc
     1621 ttcacggata tcattggaga aaggaattac tcaagactaa tcagaagtga agggtggagt
     1681 gactcggaat gatcagaaag tccgggagac cagctccgtg gcttccagtc agctgatgac
     1741 aggaagtaag gacctggacc aggaaggtga gaaggaagga ggtagcccag gttcacagat
     1801 gtaatgtaga gctctggagc ccgatgctcc ctgccacttg ctaaaacacc tcttgttctt
     1861 cttcctcctc catatcagtc ggtatccgcc aagcagaggg tcactggctt ggacttcatt
     1921 cctgggcttc accccattct gagtttgtcc aagatggacc agactctggc agtctatcaa
     1981 caggtcctca ccagcctgcc ttcccaaaat gtgctgcaga tagccaatga cctggagaat
     2041 ctccgagacc tcctccatct gctggccttc tccaagagct gctccctgcc tcagaccagt
     2101 ggcctgcaga agccagagag cctggatggc gtcctggaag cctcactcta ctccacagag
     2161 gtggtggctt tgagcaggct gcagggctct ctgcaggaca ttcttcaaca gttggatgtt
     2221 agccctgaat gctga
//
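The location strings in the FEATURES table of the record above, such as join(1..144,1876..2235), use 1-based inclusive coordinates into the sequence data. The sketch below shows one way to read them; it is an illustration under stated assumptions, not a complete location parser. It handles only plain a..b ranges and join(...) of such ranges, and ignores other location forms such as complement(...) or partial-end markers. The helper names parse_location, read_sequence, and splice are hypothetical.

```python
import re


def parse_location(location):
    """Return 1-based inclusive (start, end) pairs from 'a..b' or 'join(a..b,...)'."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def read_sequence(origin_lines):
    """Concatenate sequence lines, dropping position numbers and spaces."""
    return "".join(re.sub(r"[\d\s]", "", line) for line in origin_lines)


def splice(sequence, spans):
    """Join the sub-sequences named by 1-based inclusive spans, in order."""
    return "".join(sequence[start - 1:end] for start, end in spans)


# CDS location taken from the sample record.
spans = parse_location("join(1..144,1876..2235)")
cds_length = sum(end - start + 1 for start, end in spans)
print(spans)       # [(1, 144), (1876, 2235)]
print(cds_length)  # 504

# Small demonstration on the first 20 bases of the sample sequence.
bases = read_sequence(["        1 atgtgctgga gacccctgtg"])
print(splice(bases, [(1, 3), (18, 20)]))  # atggtg
```

The two exons of the CDS add up to 504 bases, a multiple of three, which is consistent with a complete coding sequence that includes its stop codon.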
