glossary of terms used by enclone
┌────────────────────┬──────────────────────────────────────────────────────────────┐
│V..J                │ the full sequence of a V(D)J transcript, from the beginning  │
│                    │ of the V segment to the end of the J segment; this sequence  │
│                    │ begins with a start codon and ends with a partial codon (its │
│                    │ first base)                                                  │
│CDR3                │ The terms CDR3 and junction are commonly confused and often  │
│                    │ used interchangeably. In enclone's nomenclature, "CDR3"      │
│                    │ actually refers to the junction (the CDR3 loop plus the      │
│                    │ canonical C and W/F at the N and C termini respectively).    │
├────────────────────┼──────────────────────────────────────────────────────────────┤
│clonotype           │ all the cells descended from a single fully rearranged T or  │
│                    │ B cell (approximated computationally)                        │
│exact subclonotype  │ all cells having identical transcripts ○                     │
│                    │ (every clonotype is a union of exact subclonotypes)          │
│clone               │ a cell in a clonotype, or in an exact subclonotype           │
├────────────────────┼──────────────────────────────────────────────────────────────┤
│onesie              │ a clonotype or exact subclonotype having exactly one chain   │
│twosie              │ a clonotype or exact subclonotype having exactly two chains  │
│threesie            │ a clonotype or exact subclonotype having exactly three       │
│                    │ chains; these frequently represent true biological events,   │
│                    │ arising from expression of both alleles                      │
│foursie             │ a clonotype or exact subclonotype having exactly four        │
│                    │ chains; these very rarely represent true biological events   │
│moresie             │ a clonotype having more than four chains; these sad          │
│                    │ clonotypes do not represent true biological events           │
├────────────────────┼──────────────────────────────────────────────────────────────┤
│donor               │ an individual from whom datasets of an origin are obtained   │
│origin              │ a tube of cells from a donor, from a particular tissue at a  │
│                    │ particular point in time, and possibly enriched for          │
│                    │ particular cells                                             │
│cell group          │ an aliquot from an origin, presumed to be a random draw      │
│dataset             │ all sequencing data obtained from a particular library type  │
│                    │ (e.g. TCR or BCR or GEX or FB), from one cell group,         │
│                    │ processed by running through the Cell Ranger pipeline        │
└────────────────────┴──────────────────────────────────────────────────────────────┘
○ The exact requirements for two cells to be in the same exact subclonotype are that they:
• have the same number of productive contigs identified;
• have identical bases within V..J for these contigs;
• are assigned the same constant region reference sequences;
• and have the same difference between the V stop and the C start
(noting that this difference is nearly always zero).
Note that we allow mutations within the 5'-UTR and constant regions.
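The grouping rule above, together with the chain-count names from the glossary table, can be sketched in Python. This is a minimal illustration under stated assumptions, not enclone's implementation: the `Contig` record, its fields (`productive`, `vj_seq`, `c_ref`, `v_stop`, `c_start`) and all function names are hypothetical. Each cell is reduced to a key built only from its productive contigs, so 5'-UTR and constant-region bases never enter the comparison, and cells that share a key form one exact subclonotype.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contig:
    # Hypothetical record; field names are illustrative, not enclone's.
    productive: bool
    vj_seq: str   # bases of V..J
    c_ref: str    # assigned constant region reference sequence
    v_stop: int   # position the glossary calls the "V stop"
    c_start: int  # position where the constant region starts


def exact_subclonotype_key(contigs):
    """Reduce a cell to the attributes that must match within an exact subclonotype."""
    productive = [c for c in contigs if c.productive]
    # Sort per-chain tuples so the key does not depend on contig order.
    chains = sorted((c.vj_seq, c.c_ref, c.c_start - c.v_stop) for c in productive)
    return (len(productive), tuple(chains))


def group_exact_subclonotypes(cells):
    """Group a dict of barcode -> list of Contig into lists of barcodes sharing a key."""
    groups = defaultdict(list)
    for barcode, contigs in cells.items():
        groups[exact_subclonotype_key(contigs)].append(barcode)
    return list(groups.values())


def chain_count_name(n_chains, is_clonotype=True):
    """Glossary name for a clonotype or exact subclonotype with n_chains chains."""
    names = {1: "onesie", 2: "twosie", 3: "threesie", 4: "foursie"}
    if n_chains in names:
        return names[n_chains]
    if n_chains > 4 and is_clonotype:
        return "moresie"  # the glossary defines "moresie" only for clonotypes
    raise ValueError(f"no glossary name for {n_chains} chains here")


# Toy data: B matches A (same V..J, constant regions and V stop to C start
# differences, contigs in a different order, plus an ignored non-productive contig).
cells = {
    "A": [Contig(True, "ATGAAAG", "IGHM", 100, 150), Contig(True, "ATGCCCT", "IGKC", 90, 140)],
    "B": [Contig(True, "ATGCCCT", "IGKC", 95, 145), Contig(True, "ATGAAAG", "IGHM", 100, 150),
          Contig(False, "ATGTTTA", "IGHM", 0, 40)],
    "C": [Contig(True, "ATGAATG", "IGHM", 100, 150)],
}
for group in group_exact_subclonotypes(cells):
    n = exact_subclonotype_key(cells[group[0]])[0]
    print(group, chain_count_name(n, is_clonotype=False))
# prints:
# ['A', 'B'] twosie
# ['C'] onesie
```

Sorting the per-chain tuples is a design choice of this sketch: it makes the key independent of the order in which contigs are listed for a cell.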
conventions
• When we refer to "V segments", we always include the leader segment.
• Zero or one? We number exact subclonotypes as 1, 2, ..., and likewise chains within a clonotype;
however, DNA and amino-acid positions are numbered starting at zero.
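A short Python sketch of these conventions applied to a V..J sequence, assuming V..J is available as a plain string of bases. Because V segments include the leader, V..J begins with the start codon, so zero-based amino-acid position 0 is that codon and zero-based DNA position p falls in codon p // 3; exact subclonotypes, by contrast, are labeled from 1. All function names here are hypothetical, not part of enclone.

```python
def complete_codons(vj):
    """Number of complete codons in V..J, after checking its framing.

    V..J begins with a start codon and ends with a one-base partial codon,
    so its length is one more than a multiple of three.
    """
    if not vj.startswith("ATG"):
        raise ValueError("V..J should begin with the start codon ATG")
    if len(vj) % 3 != 1:
        raise ValueError("V..J should end with a one-base partial codon")
    return len(vj) // 3


def codon_at(vj, aa_pos):
    """Codon at zero-based amino-acid position aa_pos (0 is the start codon)."""
    start = 3 * aa_pos  # zero-based DNA position of the codon's first base
    codon = vj[start:start + 3]
    if aa_pos < 0 or len(codon) != 3:
        raise IndexError(f"no complete codon at amino-acid position {aa_pos}")
    return codon


def aa_pos_of_base(dna_pos):
    """Zero-based amino-acid position containing zero-based DNA position dna_pos."""
    return dna_pos // 3


def exact_subclonotype_labels(count):
    """Exact subclonotypes (and chains within a clonotype) are numbered from 1."""
    return [str(i) for i in range(1, count + 1)]


vj = "ATGGCCAAAG"                    # toy V..J: three complete codons plus one base
print(complete_codons(vj))           # 3
print(codon_at(vj, 0))               # ATG
print(aa_pos_of_base(7))             # 2 (base 7 lies in codon AAA)
print(exact_subclonotype_labels(3))  # ['1', '2', '3']
```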