parseable output
The standard output of enclone is designed to be read by humans, but is not readily parseable by
computers. We supplement this with parseable output that can be easily read by computers.
The default behavior for this is to generate a CSV file having "every possible" field (over a
hundred). We also provide an option to print only selected fields, and some options which enable
inspection, short of generating a separate CSV file.
Parseable output is targeted primarily at R and Python users, because of the ease of wrangling CSV
files with these languages.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃Parseable output is invoked by using the argument ┃
┃POUT=filename ┃
┃specifying the name of the file that is to be written to. ┃
┃ The filename "stdout" may be used for a preview; in that case parseable output is generated ┃
┃ separately for each clonotype and the two output types are integrated. There is also ┃
┃ "stdouth", which is similar, but uses spaces instead of commas, and lines things up in columns.┃
┃By default, we show four chains for each clonotype, regardless of how many chains it ┃
┃has, filling in with null entries. One may instead specify n chains using the argument ┃
┃PCHAINS=n ┃
┃and if you use max in place of n, then the maximum value for your dataset will be used. ┃
┃The parseable output fields may be specified using ┃
┃PCOLS=x1,...,xn ┃
┃where each xi is one of the field names shown below. ┃
┃The argument PNO_HEADER may be used to suppress the CSV header line. ┃
┃If you use POUT, the PCOLS option reduces run time and memory usage, and prevents voluminous ┃
┃output. Please use it! ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
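As a rough illustration of how these arguments combine, the following Python sketch assembles an enclone argument list. The helper name build_parseable_args and its dataset_args parameter are hypothetical. The arguments that select input data are not covered here, so they are passed through untouched, and the example uses an obvious placeholder for them.

```python
def build_parseable_args(dataset_args, pout, pcols=None, pchains=None, no_header=False):
    """Assemble an enclone command line that requests parseable output.

    dataset_args: arguments selecting the input data (passed through as-is).
    pout:         file name for POUT, or "stdout" / "stdouth" for a preview.
    pcols:        optional list of field names for PCOLS.
    pchains:      optional chain count for PCHAINS (an int, or the string "max").
    no_header:    if True, add PNO_HEADER to suppress the CSV header line.
    """
    args = ["enclone", *dataset_args, f"POUT={pout}"]
    if pchains is not None:
        args.append(f"PCHAINS={pchains}")
    if pcols:
        args.append("PCOLS=" + ",".join(pcols))
    if no_header:
        args.append("PNO_HEADER")
    return args


example = build_parseable_args(
    ["<dataset arguments>"],  # placeholder, not a real enclone argument
    "out.csv",
    pcols=["group_id", "clonotype_id", "barcodes"],
    pchains="max",
)
print(" ".join(example))
```

The resulting list could be handed to subprocess.run once the placeholder is replaced by real dataset arguments.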
Over time additional fields may be added and the order of fields may change.
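Because the field order may change, it is safer to look fields up by name than by position. The sketch below does this with Python's csv.DictReader. The sample rows are invented for illustration, and it is an assumption that list-valued cells such as barcodes are quoted in the CSV. The sketch also assumes that the header line is present, i.e. that PNO_HEADER was not used.

```python
import csv
import io

# Made-up sample standing in for a file written with POUT=...; real values will differ.
sample = io.StringIO(
    'group_id,clonotype_id,exact_subclonotype_id,barcodes\n'
    '0,0,1,"AAACCTGAGAAACCAT-1,AAACCTGAGAAACCGC-1"\n'
    '0,0,2,AAACCTGAGAAACCTA-1\n'
)

# Each row becomes a dict keyed by the header names, so column order does not matter.
rows = list(csv.DictReader(sample))

for row in rows:
    barcodes = row["barcodes"].split(",")  # comma-separated list inside one cell
    print(row["group_id"], row["clonotype_id"], row["exact_subclonotype_id"], len(barcodes))
```

For a real file, replace the io.StringIO object with open("out.csv", newline="").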
There is an alternate parseable output mode in which one line is emitted for each cell, rather
than for each exact subclonotype. This mode is enabled by adding the argument PCELL to the command
line. Each exact subclonotype then yields a sequence of output lines that are identical except as
noted below.
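One way to work with PCELL output is to regroup the per-cell lines by the identifiers of the exact subclonotype they came from. The following sketch assumes that group_id, clonotype_id, exact_subclonotype_id and barcode were all requested as fields. The sample rows are invented.

```python
import csv
import io
from collections import defaultdict

# Made-up per-cell sample: one line per cell, as in PCELL mode.
sample = io.StringIO(
    "group_id,clonotype_id,exact_subclonotype_id,barcode\n"
    "0,0,1,AAACCTGAGAAACCAT-1\n"
    "0,0,1,AAACCTGAGAAACCGC-1\n"
    "0,0,2,AAACCTGAGAAACCTA-1\n"
)

# Collect the barcodes of all cells that share the same exact subclonotype.
cells_by_exact = defaultdict(list)
for row in csv.DictReader(sample):
    key = (row["group_id"], row["clonotype_id"], row["exact_subclonotype_id"])
    cells_by_exact[key].append(row["barcode"])

for key, barcodes in cells_by_exact.items():
    print(key, barcodes)
```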
If you want to completely suppress the generation of visual clonotypes, add NOPRINT to the enclone
command line.
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃FASTA output. This is a separate feature. To generate nucleotide FASTA output for each chain in ┃
┃each exact subclonotype, use the argument FASTA=filename. The special case stdout will cause the ┃
┃FASTA records to be shown as part of standard output. The FASTA records that are generated are of┃
┃the form V(D)JC, where V is the full V segment (including the leader) and C is the full constant ┃
┃region, copied verbatim from the reference. If a particular chain in a particular exact ┃
┃subclonotype is not assigned a constant region, then we use the constant region that was assigned ┃
┃to the clonotype. If no constant region at all was assigned, then the FASTA record is omitted. ┃
┃Similarly, FASTA_AA=filename may be used to generate a matching amino acid FASTA file. ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
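The FASTA files can be read with any standard FASTA parser. As a minimal sketch, the generic reader below yields one (header, sequence) pair per record. The record names in the sample are invented, since the exact header format written by enclone is not described here.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA stream.

    Sequence lines belonging to one record are concatenated; blank lines are ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records; real headers and sequences will differ.
sample = io.StringIO(">record_a\nATGGAG\nTTTGGG\n>record_b\nATGAAA\n")
records = list(read_fasta(sample))
for name, seq in records:
    print(name, len(seq))
```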
───────────────────────
parseable output fields
───────────────────────
See also enclone help lvars, enclone help cvars, and the inventory of all variables at
https://10xgenomics.github.io/enclone/pages/auto/inventory.html.
1. per clonotype group fields
┌──────────────┬──────────────────────────────────────────┐
│group_id │ identifier of clonotype group = 0, 1, ...│
├──────────────┼──────────────────────────────────────────┤
│group_ncells │ total number of cells in the group │
│ │ (cannot be used in linear conditions) │
└──────────────┴──────────────────────────────────────────┘
2. per clonotype fields
┌──────────────┬────────────────────────────────────────────────────────────────┐
│clonotype_id │ identifier of clonotype within the clonotype group = 0, 1, ...│
└──────────────┴────────────────────────────────────────────────────────────────┘
3. per chain fields, where <i> is 1,2,... (see above)
each of these has the same value for each exact subclonotype
┌──────────────────────┬──────────────────────────────────────────────────────────────┐
│var_indices_dna<i> │ DNA positions in chain that vary across the clonotype │
│var_indices_aa<i> │ amino acid positions in chain that vary across the clonotype│
│share_indices_dna<i> │ DNA positions in chain that are constant across the │
│ │ clonotype, but differ from the donor ref │
│share_indices_aa<i> │ amino acid positions in chain that are constant across the │
│ │ clonotype, but differ from the donor ref │
│ │ (all of these are comma-separated lists) │
└──────────────────────┴──────────────────────────────────────────────────────────────┘
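Since these four fields are comma-separated lists, each cell has to be split before use. The helper below is a minimal sketch. The name parse_indices is hypothetical, and it is an assumption that the entries are plain integers and that an empty cell means that there are no such positions.

```python
def parse_indices(field):
    """Turn a comma-separated index list such as "3,17,42" into [3, 17, 42].

    An empty (or whitespace-only) cell is treated as an empty list.
    """
    field = field.strip()
    if not field:
        return []
    return [int(tok) for tok in field.split(",")]


# Made-up row fragment for chain 1.
row = {"var_indices_aa1": "3,17,42", "share_indices_aa1": ""}
var_aa = parse_indices(row["var_indices_aa1"])
share_aa = parse_indices(row["share_indices_aa1"])
print(var_aa, share_aa)
```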
4. per exact subclonotype fields
┌───────────────────────┬─────────────────────────────────────────────────────────────────────────┐
│exact_subclonotype_id │ identifier of exact subclonotype = 1, 2, ... │
├───────────────────────┼─────────────────────────────────────────────────────────────────────────┤
│barcodes │ comma-separated list of barcodes for the exact subclonotype │
│<dataset>_barcodes │ like "barcodes", but restricted to the dataset with the given name │
│barcode │ if PCELL is specified, barcode for one cell │
│<dataset>_barcode │ if PCELL is specified, barcode for one cell, or null, if the barcode is│
│ │ not from the given dataset │
├───────────────────────┴─────────────────────────────────────────────────────────────────────────┤
│In addition, every lead variable may be specified as a field. See enclone help lvars. │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
5. per chain, per exact subclonotype fields, where <i> is 1,2,... (see above)
[all apply to chain i of a particular exact subclonotype]
┌──────────────┬────────────────────────────────────────────────────────────────────────────┐
│vj_seq<i> │ DNA sequence of V..J │
│vj_seq_nl<i> │ DNA sequence of V..J, but starting after the leader │
│vj_aa<i> │ amino acid sequence of V..J (excludes last base, in incomplete codon) │
│vj_aa_nl<i> │ amino acid sequence of V..J (excludes last base, in incomplete codon), │
│ │ but starting after the leader │
│seq<i> │ full DNA sequence │
├──────────────┼────────────────────────────────────────────────────────────────────────────┤
│var_aa<i> │ amino acids that vary across the clonotype (synonymous changes included) │
├──────────────┴────────────────────────────────────────────────────────────────────────────┤
│In addition, every chain variable, after suffixing by <i>, may be used as a field. However,│
│parametrizable chain variables, e.g. ndiff1vj1, must be explicitly listed using PCOLS; │
│they are not in the default list. See enclone help cvars. │
└───────────────────────────────────────────────────────────────────────────────────────────┘
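Because per chain fields carry the chain number as a suffix, it can be convenient to regroup them into one record per chain. The sketch below does this for a few of the fields listed above. The helper name chain_fields is hypothetical, and it is an assumption that the null entries used to fill in missing chains appear as empty cells.

```python
import re


def chain_fields(row, names=("vj_seq", "vj_aa", "seq")):
    """Group per-chain columns (e.g. vj_seq1, vj_seq2) into one dict per chain index."""
    pattern = re.compile(r"^(%s)(\d+)$" % "|".join(map(re.escape, names)))
    chains = {}
    for column, value in row.items():
        m = pattern.match(column)
        if m and value:  # skip empty cells, assumed here to mean "no such chain"
            chains.setdefault(int(m.group(2)), {})[m.group(1)] = value
    return chains


# Made-up row with data for chain 1 and an empty cell for chain 2.
row = {"group_id": "0", "vj_seq1": "ATG", "vj_aa1": "M", "seq1": "CATG", "vj_seq2": ""}
chains = chain_fields(row)
print(chains)
```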