other options that control clonotype display
┌───────────────┬───────────────────────────────────────────────────────────────────────────────┐
│PER_CELL │ expand out each exact clonotype line, showing one line per cell, │
│ │ for each such line, displaying the barcode name, the number of UMIs assigned,│
│ │ and the gene expression UMI count, if applicable, under gex_med │
├───────────────┼───────────────────────────────────────────────────────────────────────────────┤
│BARCODES │ print list of all barcodes of the cells in each clonotype, in a │
│ │ single line near the top of the printout for a given clonotype │
├───────────────┼───────────────────────────────────────────────────────────────────────────────┤
│SEQC │ print V..J sequence for each chain in the first exact subclonotype, near │
│ │ the top of the printout for a given clonotype │
├───────────────┼───────────────────────────────────────────────────────────────────────────────┤
│FULL_SEQC │ print full sequence for each chain in the first exact subclonotype, │
│ │ near the top of the printout for a given clonotype │
├───────────────┼───────────────────────────────────────────────────────────────────────────────┤
│SUM │ print sum row for each clonotype (sum is across cells) │
│MEAN │ print mean row for each clonotype (mean is across cells) │
├───────────────┼───────────────────────────────────────────────────────────────────────────────┤
│DIFF_STYLE=C1 │ instead of showing an x for each amino acid column containing a difference, │
│ │ show a C if the column lies within a complementarity-determining region, │
│ │ and F if it lies in a framework region, and an L if it lies in the leader │
│DIFF_STYLE=C2 │ instead of showing an x for each amino acid column containing a difference, │
│ │ show a ◼ if the column lies within a complementarity-determining region, │
│ │ and otherwise show a ▮. │
├───────────────┼───────────────────────────────────────────────────────────────────────────────┤
│CONX │ add an additional row to each clonotype table, showing the amino acid │
│ │ consensus across the clonotype, with X for each variant residue │
│CONP │ add an additional row to each clonotype table, showing the amino acid │
│ │ consensus across the clonotype, with a property symbol whenever two different│
│ │ amino acids are observed, see "enclone help cvars" │
├───────────────┼───────────────────────────────────────────────────────────────────────────────┤
│ALIGN<n> │ exhibit a visual alignment for chain n (for each exact subclonotype) to the │
│ │ donor V(D)J reference, picking the best D for heavy chains / TRB │
│ │ Multiple values of n may be specified using multiple arguments. │
│ALIGN_2ND<n> │ same as ALIGN<n> but use second best D segment │
│JALIGN<n> │ same as ALIGN<n> but only show the region from 15 bases before the end of the│
│ │ V segment to 35 bases into the J segment │
│JALIGN_2ND<n> │ same as JALIGN<n> but use second best D segment │
└───────────────┴───────────────────────────────────────────────────────────────────────────────┘
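As a rough illustration of what a CONX-style row conveys, the sketch below builds a consensus string from amino acid sequences that are already aligned to the same length, writing X wherever the column contains more than one residue. The function name conx_row and the example sequences are hypothetical, and this is not enclone's implementation; in particular, how enclone treats gaps or missing chains is not covered here.

```python
def conx_row(aligned_seqs):
    """Consensus over equal-length sequences, with 'X' at each variant column."""
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    row = []
    for column in zip(*aligned_seqs):
        # A column is invariant when every sequence has the same residue there.
        row.append(column[0] if len(set(column)) == 1 else "X")
    return "".join(row)


# Hypothetical CDR3-like sequences, one per exact subclonotype.
example = ["CARDYW", "CARDFW", "CTRDYW"]
print(conx_row(example))
```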
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
options that control clonotype grouping
By default, enclone organizes clonotypes into groups, and each group contains just one clonotype!
We offer some options to do actual grouping, with the intention of reflecting functional
(antigen-binding) differences, but with many caveats because this is a hard problem.
These options are experimental. There are many natural extensions that we have not implemented.
enclone has two types of grouping: symmetric and asymmetric. Symmetric grouping creates
nonoverlapping groups, whereas asymmetric grouping creates groups that may overlap.
To turn on symmetric grouping, one uses a command of the form
GROUP=c1,...,cn
where each ci is a condition. Two clonotypes are placed in the same group if all the conditions
are satisfied, and that grouping is extended transitively.
In what follows, heavy chain means IGH or TRB, and light chain means IGK or IGL or TRA.
Here are the conditions:
┌───────────────────────┬───────────────────────────────────────────────────────────────────────┐
│vj_refname │ V segments have the same reference sequence name, │
│ │ and likewise for J segments │
│v_heavy_refname │ heavy chain V segments have the same reference sequence name │
│vj_heavy_refname │ heavy chain V segments have the same reference sequence name, │
│ │ and likewise for J segments │
│ │ (only applied to heavy chains) │
│vdj_refname │ V segments have the same reference sequence name, │
│ │ and likewise for D segments, computed from scratch, and J segments │
│vdj_heavy_refname │ V segments have the same reference sequence name, │
│ │ and likewise for D segments, computed from scratch, and J segments │
│ │ (only applied to heavy chains) │
├───────────────────────┼───────────────────────────────────────────────────────────────────────┤
│len │ the lengths of V..J are the same (after correction for indels) │
│cdr3_len │ CDR3 sequences have the same length │
│cdr3_heavy_len │ heavy chain CDR3 sequences have the same length │
│cdr3_light_len │ light chain CDR3 sequences have the same length │
├───────────────────────┼───────────────────────────────────────────────────────────────────────┤
│cdr3_heavy≥n% │ nucleotide identity on heavy chain CDR3 sequences is at least n% │
│cdr3_light≥n% │ nucleotide identity on light chain CDR3 sequences is at least n% │
│cdr3_aa_heavy≥n% │ amino acid identity on heavy chain CDR3 sequences is at least n% │
│cdr3_aa_light≥n% │ amino acid identity on light chain CDR3 sequences is at least n% │
│ │ (note that use of any of these options without at least one of the │
│ │ earlier options may be slow) │
│ │ (in all cases, we also recognize >= (with quoting) and ⩾) │
│ │ (all of the above options use Levenshtein distance) │
│cdr3_aa_heavy≥n%:h:@f │ given a file f containing 20 lines, │
│ │ each having 20 numbers, separated by single spaces, │
│ │ compute the Hamming distance between heavy chain amino acid │
│ │ sequences of the same length, weighted by the matrix defined by f, │
│ │ and require that percent identity is bounded accordingly │
├───────────────────────┼───────────────────────────────────────────────────────────────────────┤
│heavy≥n% │ nucleotide identity on heavy chain V..J sequences is at least n% │
│light≥n% │ nucleotide identity on light chain V..J sequences is at least n% │
│aa_heavy≥n% │ amino acid identity on heavy chain V..J sequences is at least n% │
│aa_light≥n% │ amino acid identity on light chain V..J sequences is at least n% │
│ │ (note that use of any of these options without at least one of the │
│ │ earlier options may be very slow) │
│ │ (in all cases, we also recognize >= (with quoting) and ⩾) │
│ │ (all of the above options use Levenshtein distance) │
└───────────────────────┴───────────────────────────────────────────────────────────────────────┘
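The phrase "extended transitively" can be illustrated with a small union-find sketch: two clonotypes are joined when every condition holds for the pair, and groups are the resulting connected components, so two clonotypes may share a group without satisfying the conditions directly. This is only a sketch under stated assumptions, not enclone's code. The record fields (v, j, cdr3h) and the predicates same_vj, same_cdr3_len and near_cdr3 are hypothetical stand-ins for conditions like those in the table; near_cdr3 in particular uses a simple mismatch count rather than a percent identity.

```python
def symmetric_groups(clonotypes, conditions):
    """Group indices of clonotypes; pairs meeting all conditions are merged,
    and merging is extended transitively (connected components)."""
    parent = list(range(len(clonotypes)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(clonotypes)):
        for j in range(i + 1, len(clonotypes)):
            if all(cond(clonotypes[i], clonotypes[j]) for cond in conditions):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(len(clonotypes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical predicates, loosely modeled on the conditions in the table.
def same_vj(a, b):
    return a["v"] == b["v"] and a["j"] == b["j"]

def same_cdr3_len(a, b):
    return len(a["cdr3h"]) == len(b["cdr3h"])

def near_cdr3(a, b):
    # Assumption: at most one mismatching position, for equal-length CDR3s.
    return sum(x != y for x, y in zip(a["cdr3h"], b["cdr3h"])) <= 1


clonotypes = [
    {"v": "IGHV1-2", "j": "IGHJ4", "cdr3h": "CARDYW"},
    {"v": "IGHV1-2", "j": "IGHJ4", "cdr3h": "CARDFW"},
    {"v": "IGHV1-2", "j": "IGHJ4", "cdr3h": "CARNFW"},
    {"v": "IGHV3-23", "j": "IGHJ4", "cdr3h": "CARDYW"},
]
groups = symmetric_groups(clonotypes, [same_vj, same_cdr3_len, near_cdr3])
print(groups)
```

In this example clonotypes 0 and 2 differ at two positions and so do not satisfy near_cdr3 directly, but both are within one mismatch of clonotype 1, so all three land in one group. Note that the all-pairs loop is quadratic in the number of clonotypes.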
To instead turn on asymmetric grouping, one uses the AGROUP option. To use this, it is in
addition necessary to define "center clonotypes", a "distance formula", and a "distance bound".
Each group will then consist of the center clonotype (which comes first), followed by, in order by
distance (relative to the formula), all those clonotypes that satisfy the distance bound (with
ties broken arbitrarily). For each clonotype in a group, we print its distance from the first
clonotype, and this is also available as a parseable variable dist_center.
Center clonotypes. These are in principle any set of clonotypes. For now we allow two options:
AG_CENTER=from_filters
which causes all the filters described at "enclone help filters" to NOT filter clonotypes in the
usual way, but instead filter to define the center, and
AG_CENTER=copy_filters
which effectively does nothing -- it just says that filters apply to all clonotypes, whether in
the center or not.
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│Please note that asymmetric grouping is very time consuming, and run time is roughly a linear│
│function of (number of center clonotypes) * (number of clonotypes). So it is advisable to │
│restrict the number of center clonotypes. │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
Distance formula. This could in principle be any function that takes as input two clonotypes and
returns a number. For now we allow only:
AG_DIST_FORMULA=cdr3_edit_distance
which is the "Levenshtein CDR3 edit distance between two clonotypes". This is the minimum, over
all pairs of exact subclonotypes, one from each of the two clonotypes, of the edit distance
between two exact subclonotypes, which is the sum of the edit distances between the heavy chains
and between the light chains.
Technical note. The description above covers the case where there are two chains of different
types. The "non-standard" cases are handled as follows. We take the sum, over all pairs of
heavy chains, one from each of the two exact subclonotypes, of the edit distance between the CDR3
sequences for the heavy chains, plus the same for light chains. Exact subclonotypes that lack a
heavy or a light chain are ignored by this computation. Also the distance between two clonotypes
is declared infinite if one of them lacks a heavy chain or one of them lacks a light chain.
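The distance just described can be sketched as follows. This is one reading of the text, not enclone's implementation. Assumptions: a clonotype is modeled as a list of exact subclonotypes, each a dict with hypothetical keys "heavy" and "light" holding lists of CDR3 strings; whether the CDR3s are nucleotide or amino acid sequences is not stated above, so the sketch works on arbitrary strings; and "lacks a heavy chain or a light chain" is interpreted as "no exact subclonotype remains after ignoring those that lack one".

```python
import math


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def exact_distance(x, y):
    """Sum over all heavy-chain pairs, plus the same for light chains."""
    heavy = sum(levenshtein(a, b) for a in x["heavy"] for b in y["heavy"])
    light = sum(levenshtein(a, b) for a in x["light"] for b in y["light"])
    return heavy + light


def cdr3_edit_distance(c1, c2):
    """Minimum exact-subclonotype distance; infinite if nothing is comparable."""
    # Exact subclonotypes lacking a heavy or a light chain are ignored.
    u1 = [x for x in c1 if x["heavy"] and x["light"]]
    u2 = [y for y in c2 if y["heavy"] and y["light"]]
    if not u1 or not u2:
        return math.inf
    return min(exact_distance(x, y) for x in u1 for y in u2)


# Hypothetical clonotypes.
c1 = [{"heavy": ["CARDYW"], "light": ["CQQSYT"]},
      {"heavy": ["CARDFW"], "light": ["CQQSYT"]}]
c2 = [{"heavy": ["CARDFW"], "light": ["CQQSFT"]}]
c3 = [{"heavy": ["CARDYW"], "light": []}]
print(cdr3_edit_distance(c1, c2), cdr3_edit_distance(c1, c3))
```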
Distance bound. For now we allow the following two forms:
AG_DIST_BOUND=top=n
which returns the top n clonotypes (plus the center), and
AG_DIST_BOUND=max=d
which returns all clonotypes having distance ≤ d from the center clonotype.
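The two forms of the distance bound can be sketched as a selection over precomputed distances. This is an illustration only: the function apply_bound, the clonotype labels, and the distance values are hypothetical, and whether enclone's top=n form skips clonotypes at infinite distance is not stated above, so the sketch does not address it.

```python
def apply_bound(center, distances, bound):
    """Return the group for one center clonotype as (clonotype, distance) pairs.

    distances maps each non-center clonotype to its distance from the center;
    bound is the value of AG_DIST_BOUND, either "top=n" or "max=d".
    """
    kind, value = bound.split("=")
    # Order by distance; sorted() is stable, so ties keep their input order.
    ranked = sorted(distances.items(), key=lambda item: item[1])
    if kind == "top":
        kept = ranked[:int(value)]
    elif kind == "max":
        kept = [(c, d) for c, d in ranked if d <= float(value)]
    else:
        raise ValueError("bound must have the form top=n or max=d")
    # The center clonotype comes first, at distance zero from itself.
    return [(center, 0)] + kept


# Hypothetical distances from a center clonotype labeled "c1".
distances = {"c2": 4, "c3": 1, "c4": 7, "c5": 2}
print(apply_bound("c1", distances, "top=2"))
print(apply_bound("c1", distances, "max=4"))
```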
In addition, there are the following grouping options, for both the symmetric and asymmetric
cases:
┌─────────────────────┬──────────────────────────────────────────────────────────────────┐
│MIN_GROUP │ minimum number of clonotypes in group to print (default = 1) │
│MIN_GROUP_DONORS │ minimum number of donors for a group to be printed (default = 1)│
│GROUP_CDR3H_LEN_VAR │ only print groups having variable heavy chain CDR3 length │
│GROUP_CDR3=x │ only print groups containing the CDR3 amino acid sequence x │
│GROUP_DONOR=d │ only print groups containing a cell from the given donor; │
│ │ multiple instances may be used to jointly restrict │
│GROUP_NAIVE │ only show groups having an exact subclonotype with dref = 0 │
│GROUP_NO_NAIVE │ only show groups lacking an exact subclonotype with dref = 0 │
├─────────────────────┼──────────────────────────────────────────────────────────────────┤
│NGROUP │ don't display grouping messages │
└─────────────────────┴──────────────────────────────────────────────────────────────────┘
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
options that display dataset variables
enclone has some variables that are computed for each dataset, and whose values may be printed as
a table in the summary, and not otherwise used (currently). These may be specified using
DVARS=var1,...,varn. The dataset-level variables that are supported currently are:
<feature>_cellular_r
<feature>_cellular_u
which are, respectively, the percentage of reads [UMIs] for the given feature that are in cells
that were called by the cellranger pipeline. A feature is e.g. IGHG1_g etc. as discussed at
"enclone help lvars". To compute the metrics, the cellranger output file per_feature_metrics.csv
is read. In addition, one may also use numeric values defined in the file
metrics_summary_json.json, but this file is in general not available. To get it, it may be
necessary to rerun the cellranger pipeline using --vdrmode=disable and then copy the json file to
outs. Finally, variable names may be prefaced with abbreviation:, and in such cases, it is the
abbreviation that is displayed in the table.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
options that control global variables
enclone has some global variables that can be computed, with values printed in the summary, and
not otherwise used (currently). These may be specified using GVARS=var1,...,varn. The global
variables that are supported currently are:
d_inconsistent_%
d_inconsistent_n
Please see https://10xgenomics.github.io/enclone/pages/auto/d_genes.html for more information.