TCRProfiler
TCRProfiler has four main objectives:
  1. Process high-throughput-sequenced (454/Roche) T-cell receptor (TCR) alpha and/or beta chain nucleotide sequences by identifying the major features of each TCR chain, namely the rearranged V, (D), and J genes, junctional characteristics such as P- and N-nucleotides, and the complementarity-determining region 3 (CDR3).
  2. Handle errors introduced into the analyzed sequences by the sequencing process, using sequencer-specific quality values to improve the reliability of the results.
  3. Produce sample-specific "profiles" by generating statistics on germline gene usage (combinatorial diversity), CDR3 length polymorphism, and P- and N-nucleotide usage (junctional diversity).
  4. Generate visualization files that show CDR3 length polymorphism with respect to the recombined V-J gene combinations.
Note: TCRProfiler can analyze TCR alpha chains, beta chains, or both chain types in a single analysis. Processing large amounts of sequence data can take considerable time, especially on a single-CPU desktop computer.

Prerequisites for analyzing TCR chains:

Running the program requires the following basic input:
  • TCR chain sequences in a multi-FASTA file, for example probename454Reads.fna.
  • The corresponding sequencer-specific quality values in a quality file, for example probename454.qual.
  • The basic input parameters, which are described below and in TCRProfiler's built-in help. To display the help, run the command-line call: java -jar TCRProfiler.jar -h
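Before starting a long run, it can help to confirm that the sequence and quality files actually correspond. The following Python sketch is not part of TCRProfiler; it assumes that records in both files share the same ID (the first word of the '>' header line) and that the quality file stores whitespace-separated integer values, one per base, as in the usual 454 .qual layout. The function names read_records and check_pairing are illustrative only.

```python
from typing import Iterable, Iterator, List, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (record_id, body_lines) for each '>'-headed record."""
    record_id, body = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, body
            # Assumption: the record ID is the first whitespace-delimited token.
            record_id, body = (line[1:].split() or [""])[0], []
        else:
            body.append(line)
    if record_id is not None:
        yield record_id, body


def check_pairing(fasta_lines: Iterable[str], qual_lines: Iterable[str]) -> List[str]:
    """Return a list of problems found when pairing reads with quality values."""
    seqs = {rid: "".join(body) for rid, body in read_records(fasta_lines)}
    quals = {
        rid: [int(value) for line in body for value in line.split()]
        for rid, body in read_records(qual_lines)
    }
    problems = []
    for rid, seq in seqs.items():
        if rid not in quals:
            problems.append(f"{rid}: no quality record")
        elif len(quals[rid]) != len(seq):
            problems.append(f"{rid}: {len(seq)} bases but {len(quals[rid])} quality values")
    # Quality records that have no matching sequence, in a stable order.
    for rid in sorted(quals.keys() - seqs.keys()):
        problems.append(f"{rid}: quality record without sequence")
    return problems
```

Open file handles can be passed directly, e.g. check_pairing(open("probename454Reads.fna"), open("probename454.qual")); an empty list means every read has exactly one quality value per base.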

Running TCRProfiler

Download the program from the Download section and start it with the command-line call java -jar TCRProfiler.jar. For a list of input parameters, use the -h option. TCRProfiler requires at least the following parameters:
  • -p: the absolute path of the directory where TCRProfiler writes its results.
  • -fF: the absolute path of the multi-FASTA file containing the sequences to analyze.
  • -qF: the absolute path of the file containing the corresponding quality values.
  • A forward sequencing primer (sequence 5' to 3'): -lap/-lbp (left [alpha/beta] primer) for a conventional primer, or -wa/-wb (wobble [alpha/beta] primer) for a "wobble" primer (short primer motif). Wobble primers are typically characterized by a common nucleotide sequence motif, for example TGGTA, while the remaining primer bases can vary for each TCR V gene.
  • A reverse sequencing primer (sequence 5' to 3'): -rap/-rbp (right [alpha/beta] primer).

Note: if only one chain type (alpha or beta) is analyzed but this is not specified with the input parameters -aC/-bC, the program first determines, for each input sequence, whether it originates from the alpha or the beta locus based on the primers, and only then characterizes its sequence features. This can slightly increase the runtime, so specifying the chain type is advised.
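The locus assignment described in the note can be pictured with the following simplified Python sketch. It is not TCRProfiler's actual algorithm: it looks only for exact matches, assumes reads are in forward orientation, and treats a read as alpha or beta if it contains that chain's forward primer (or wobble motif) as given, or that chain's reverse primer as its reverse complement. The helper names are hypothetical, and the alpha primer in the usage note is a placeholder.

```python
from typing import Optional, Tuple

# (forward primer or wobble motif, reverse primer), both written 5' to 3'.
Primers = Tuple[Optional[str], Optional[str]]

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_primers(read: str, primers: Primers) -> bool:
    """True if the read contains the forward primer/motif as given, or the
    reverse primer as its reverse complement (exact matches only)."""
    forward, reverse = primers
    read = read.upper()
    if forward and forward.upper() in read:
        return True
    return bool(reverse) and reverse_complement(reverse) in read


def classify_read(read: str, alpha: Primers, beta: Primers) -> str:
    """Return 'alpha', 'beta', 'ambiguous' or 'none' for a single read."""
    is_alpha = has_primers(read, alpha)
    is_beta = has_primers(read, beta)
    if is_alpha and is_beta:
        return "ambiguous"
    if is_alpha:
        return "alpha"
    if is_beta:
        return "beta"
    return "none"
```

With the beta primers from the example call, beta = ("TGGTA", "CACAGCGACCTCGGG"), a read containing TGGTA or CCCGAGGTCGCTGTG (the reverse complement of the reverse primer) is classified as beta. Passing -aC or -bC makes this per-read step unnecessary.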

For example, a command-line call that analyzes TCR beta chains (-bC) amplified with a wobble forward primer (-wb) could look like this: java -jar TCRProfiler.jar -bC -p /root/usrhomes/TCRresults/ -fF /root/usrhomes/TCRsources/454Reads.fna -qF /root/usrhomes/TCRsources/454Quals.qual -wb TGGTA -rbp CACAGCGACCTCGGG
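When runs are scripted, the argument list can be assembled programmatically. This Python sketch reproduces the example call for a single chain type using only the flags documented on this page; the function build_command and its parameter names are hypothetical. It checks that all paths are absolute and that exactly one kind of forward primer (-lap/-lbp or -wa/-wb) is given.

```python
import os
import shlex
from typing import List, Optional


def build_command(
    chain: str,
    results_dir: str,
    fasta_file: str,
    qual_file: str,
    reverse_primer: str,
    forward_primer: Optional[str] = None,
    wobble_motif: Optional[str] = None,
    jar: str = "TCRProfiler.jar",
) -> List[str]:
    """Assemble a TCRProfiler call for one chain type ('alpha' or 'beta')."""
    if chain not in ("alpha", "beta"):
        raise ValueError("chain must be 'alpha' or 'beta'")
    for path in (results_dir, fasta_file, qual_file):
        if not os.path.isabs(path):
            raise ValueError(f"not an absolute path: {path}")
    if (forward_primer is None) == (wobble_motif is None):
        raise ValueError("give exactly one of forward_primer or wobble_motif")
    c = chain[0]  # 'a' or 'b', as in -aC/-bC, -lap/-lbp, -wa/-wb, -rap/-rbp
    cmd = ["java", "-jar", jar, f"-{c}C",
           "-p", results_dir, "-fF", fasta_file, "-qF", qual_file]
    if wobble_motif is not None:
        cmd += [f"-w{c}", wobble_motif]
    else:
        cmd += [f"-l{c}p", forward_primer]
    cmd += [f"-r{c}p", reverse_primer]
    return cmd


if __name__ == "__main__":
    example = build_command(
        "beta",
        "/root/usrhomes/TCRresults/",
        "/root/usrhomes/TCRsources/454Reads.fna",
        "/root/usrhomes/TCRsources/454Quals.qual",
        reverse_primer="CACAGCGACCTCGGG",
        wobble_motif="TGGTA",
    )
    # Print the equivalent shell command line.
    print(shlex.join(example))
```

The returned list can be passed to subprocess.run(cmd, check=True) to start the analysis; shlex.join renders it as a shell command line for logging.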

Results

  • Analysed_[Alpha/Beta]_Sequences.fna contains the analyzed sequences in multi-FASTA format. Each FASTA header describes the characteristics of its sequence, with the individual entries delimited by '|'. The header reports the identified rearranged germline V, (D), and J genes (and the C gene), each with the number of matches between the sequence and the identified germline reference and, where reported, the alignment score and the number of errors (mismatches, insertions, or deletions) relative to the reference. These gene entries are followed by the CDR3 and junction features listed after the example.
    For example:
    • 'TRBV7-3*01 : 156, 1210.0 (e:7) | TRBD1*01 : 8 | TRBJ2-1*02 : 34, 205.0 (e:5) | TRBC1*01 :33 (e:0)' means: TCR beta V gene (family 7, subfamily 3, allele 01) identified with 156 matches to the germline reference, an alignment score of 1210.0, and seven errors; TCR beta D gene (family 1, allele 01) identified with eight matches to the reference gene; TCR beta J gene (family 2, subfamily 1, allele 02) identified with 34 matches, an alignment score of 205.0, and five errors; and TCR beta C gene (family 1, allele 01) identified with 33 matches and zero errors.
    • CDR3 nucleotide sequence
    • '-v' V gene nucleotides nibbled (deleted) at the 3' end of the V gene
    • '-j' J gene nucleotides nibbled at the 5' end of the J gene
    • '-d5' D gene nucleotides nibbled at the 5' end of the D gene
    • '-d3' D gene nucleotides nibbled at the 3' end of the D gene
    • 'n1 :' N1 region sequence
    • 'n2 :' N2 region sequence
    • 'vPal = ' V gene P-nucleotides
    • 'd5Pal =' D gene 5' P-nucleotides
    • 'd3Pal = ' D gene 3' P-nucleotides
    • 'jPal =' J gene P-nucleotides
    • CDR3 amino acid sequence and its length in amino acids
    • Whether the CDR3 is in an open reading frame (ORF)
    • Whether the CDR3 contains a stop codon
    • Whether the CDR3 was identified with mutations in its flanking sequence motifs
    • Number of reads with this TCR sequence in the repertoire (clonotype count)
    All fields can easily be parsed for further use by splitting the header at the '|' delimiter.
  • Statistics_[Alpha/Beta]_Result.fna contains statistics on overall V, D, and J gene usage, V-J gene combination frequencies, the CDR3 length distribution, and error counts (relative to the identified reference) for V and J genes. The latter can be used to estimate error frequencies in specific sequencing runs.
  • [Alpha/Beta]_CDR3_nt_Distrib.txt contains CDR3 observation counts at the nucleotide sequence level.
  • [Alpha/Beta]_CDR3_aa_length_Distrib.txt contains CDR3 observation counts at the amino acid sequence level.
  • [Alpha/Beta]_Junctions_Statistics.txt contains statistics on the CDR3 junction: N- and P-nucleotide counts and the N-region length distribution depending on V, D, and J gene usage.
  • [Alpha/Beta]_Unique_Nucleotides.viz contains input for the visualization tool TCRViz.jar. This visualization shows the CDR3 length polymorphism with respect to the recombined V and J genes for all unique nucleotide sequences (clonotypes, i.e. sequences with a unique combination of V gene, J gene, and CDR3 nucleotide sequence).
  • [Alpha/Beta]_Nucleotides.viz contains input for TCRViz.jar. This visualization shows the CDR3 length polymorphism with respect to the recombined V and J genes for all analyzed sequences.
  • [Alpha/Beta]_Protein.viz contains input for TCRViz.jar. This visualization shows the CDR3 length polymorphism with respect to the recombined V and J genes for all unique amino acid sequences (clonotypes, i.e. sequences with a unique combination of V gene, J gene, and CDR3 amino acid sequence).
  • Uncharacterized[Alpha/Beta]Seqs.fna contains all sequences for which no CDR3 was identified, or for which only a partial rearrangement was identified (only the V gene).
  • SeqsWithoutPrimer.fna contains all sequences in which none of the primer sequences was found.
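Because the header entries of Analysed_[Alpha/Beta]_Sequences.fna are '|'-delimited, the gene identifications can be extracted with a few lines of code. This Python sketch parses only V, D, J, and C gene entries in the form shown in the header example ('GENE : matches[, score] [(e:errors)]'); it assumes that layout and skips every other field. The names GeneCall and parse_gene_calls are illustrative, not part of TCRProfiler.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches e.g. "TRBV7-3*01 : 156, 1210.0 (e:7)", "TRBD1*01 : 8" or "TRBC1*01 :33 (e:0)".
_GENE_FIELD = re.compile(
    r"^\s*(TR[AB][VDJC][\w\-*/]*)\s*:\s*(\d+)\s*"
    r"(?:,\s*([0-9.]+))?\s*(?:\(e:(\d+)\))?\s*$"
)


@dataclass
class GeneCall:
    gene: str               # e.g. "TRBV7-3*01"
    segment: str            # "V", "D", "J" or "C"
    matches: int            # matches to the germline reference
    score: Optional[float]  # alignment score, if reported
    errors: Optional[int]   # mismatches/insertions/deletions, if reported


def parse_gene_calls(header: str) -> List[GeneCall]:
    """Extract gene identifications from a '|'-delimited FASTA header."""
    calls = []
    for field in header.lstrip(">").split("|"):
        m = _GENE_FIELD.match(field)
        if m is None:
            continue  # not a gene entry (CDR3, N/P regions, flags, ...)
        gene, matches, score, errors = m.groups()
        calls.append(GeneCall(
            gene=gene,
            segment=gene[3],
            matches=int(matches),
            score=float(score) if score is not None else None,
            errors=int(errors) if errors is not None else None,
        ))
    return calls
```

Applied to the example header, this yields four GeneCall objects (V, D, J, C); summing errors and matches over many reads gives a rough per-gene error estimate comparable to the counts in Statistics_[Alpha/Beta]_Result.fna.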
.viz files can be displayed with TCRViz.jar, which is started with the command-line call java -jar TCRViz.jar. To visualize a repertoire, load the data with the Open a .viz file button and then click the Do Visualisation button. For a short description of the visualization file format, see the File Formats page.
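The clonotype definitions given for the .viz files, together with the ORF and stop-codon indications in the FASTA header, can be illustrated with this Python sketch. It is not TCRProfiler's implementation: it assumes the standard genetic code, translates each CDR3 from its first base, and treats a CDR3 as in frame when its length is a multiple of three. The function names and the (V gene, J gene, CDR3 nucleotide sequence) input tuples are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def translate_cdr3(nt: str) -> Tuple[str, bool, bool]:
    """Return (amino acids, in_frame, has_stop) for a CDR3 nucleotide sequence.

    Translation starts at the first base; a trailing partial codon is dropped
    and codons containing non-ACGT characters become 'X'.
    """
    nt = nt.upper()
    aa = "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))
    return aa, len(nt) % 3 == 0, "*" in aa


def summarize(reads: Iterable[Tuple[str, str, str]]):
    """Count nucleotide clonotypes, amino acid clonotypes and CDR3 aa lengths.

    Each read is a (v_gene, j_gene, cdr3_nt) tuple; lengths are counted per read.
    """
    reads = list(reads)
    nt_clonotypes = Counter(reads)
    aa_reads = [(v, j, translate_cdr3(nt)[0]) for v, j, nt in reads]
    aa_clonotypes = Counter(aa_reads)
    aa_lengths = Counter(len(aa) for _, _, aa in aa_reads)
    return nt_clonotypes, aa_clonotypes, aa_lengths
```

For instance, two reads with CDR3 TGTGCCAGCAGC and one with TGTGCCAGTAGC (same V and J genes) form two nucleotide clonotypes but a single amino acid clonotype CASS, because AGC and AGT both encode serine.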