enclone provides several mechanisms for creating, displaying, and exporting a phylogenetic
tree for each clonotype. These are initial mechanisms, which are likely to be expanded and/or
improved over time in response to feedback. The initial implementation is inspired by the
Levenshtein-NJ method described by Yermanos et al. 2017. For all of these mechanisms, we
recommend using the argument COMPLETE to remove exact subclonotypes that are missing 1 or more
chains.
Method 1.
This method is invoked using the argument TREE, or TREE=v1,...,vn, where the vi are parseable
variables.
The method first defines the distance between any two exact subclonotypes to be their
Levenshtein distance. We then add a root "virtual" exact subclonotype which equals the donor
reference away from the recombination region and which is undefined within that region (i.e. a
germline-reverted exact subclonotype without the junction). The distance from the root to any
actual exact subclonotype is the Levenshtein distance, computed away from the recombination region.
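As an illustration of the distance used here, the following is a minimal Python sketch of Levenshtein distance and of a pairwise distance matrix. It is not enclone's code: the function names levenshtein and distance_matrix are made up for this example, and feeding it CDR3 amino acid strings (taken from the example table further down) is an assumption made purely for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # prev[j] is the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Symmetric matrix of pairwise Levenshtein distances."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = levenshtein(seqs[i], seqs[j])
    return d


# Illustrative input only: three CDR3 strings from the example table.
cdr3s = ["CARDQNFDESSGYDAFDIW", "CTRDQNFDESSGYDAFDIW", "CARDQNFDDSSGYDAFDIW"]
for row in distance_matrix(cdr3s):
    print(row)
```

The first string differs from each of the other two by a single substitution, and the other two differ from each other by two substitutions.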
Next a tree is created from these data using the neighbor joining algorithm. This sometimes yields negative distances, which we change to zero. We have only observed such negative distances on the edge emanating from the root.
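The tree-building step can be sketched as follows. This is a textbook version of neighbor joining written for illustration, not enclone's implementation: the function name neighbor_joining, the internal node labels I0, I1, ..., and the edge-list output are all assumptions of this sketch, and it does not reproduce how enclone handles the virtual root. Negative branch lengths are replaced by zero, as described above.

```python
def neighbor_joining(d, labels):
    """Return (parent, child, length) edges built from a symmetric distance matrix."""
    # Dict-of-dicts so that nodes can be added and dropped by name.
    dist = {a: {b: float(d[i][j]) for j, b in enumerate(labels)}
            for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(dist[a][b] for b in nodes) for a in nodes}
        # Choose the pair (a, b) that minimizes the Q criterion.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * dist[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"I{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the new node to a and b; negatives become zero.
        la = 0.5 * dist[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        edges.append((u, a, max(la, 0.0)))
        edges.append((u, b, max(lb, 0.0)))
        # Distances from the new node to every remaining node.
        dist[u] = {u: 0.0}
        for c in nodes:
            if c != a and c != b:
                duc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
                dist[u][c] = duc
                dist[c][u] = duc
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, max(dist[a][b], 0.0)))
    return edges


# Small made-up distance matrix on five leaves.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for parent, child, length in neighbor_joining(example, ["a", "b", "c", "d", "e"]):
    print(f"{parent} -> {child}: {length:.2f}")
```

The example matrix is additive, so no clamping occurs there; the max(..., 0.0) calls only matter for matrices that do not fit a tree exactly.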
Note that for a given clonotype, the neighbor joining algorithm is O(n^4), where n is the
number of exact subclonotypes in the clonotype. Thus for sample types having highly complex
clonotypes (e.g. with ~1000 subclonotypes), the algorithm will be very slow. Of course for such
cases, the tree would be so large that it would be difficult to do anything with it. You could
exclude such clonotypes e.g. with MAX_CELLS=100.
Finally, the tree is visualized using plain text, as shown in the example below. The added
field dref shows the distance of each exact subclonotype from the donor reference, away from
the recombination region.
enclone BCR=123085 TREE COMPLETE CDR3=CARDQNFDESSGYDAFDIW LVARSP=dref
[1] GROUP = 1 CLONOTYPES = 42 CELLS
[1.1] CLONOTYPE = 42 CELLS
ββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββ
β β CHAIN 1 β CHAIN 2 β
β β 159|IGHV3-7 β 53|IGHJ3 β 376|IGLV5-37 β 313|IGLJ3 β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ€
β β 11 1111111111111111111 β 11 11111111111 β
β β 23344445667777788999901 1111112222222222333 β 23556679911 11111122222 β
β β 22324893380156725357903 4567890123456789012 β 17068902403 45678901234 β
β β ββββββββCDR3βββββββ β ββββCDR3βββ β
βreference β LPGAGSSSLNKQEKYVRANLLQβ¦ β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦W β VRSYLYYAAAY CMIWβ¦β¦β¦β¦β¦β¦β¦ β
βdonor ref β LPGAGSSSLNKQEKYVRANLLQβ¦ β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦β¦W β VRSYLYYAAAY CMIWβ¦β¦β¦β¦β¦β¦β¦ β
ββββββββββββββββΌβββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββββ€
β# n dref β xxxxxxxxxxxxxxxxxxxxxxx .x......x........x. u const β xxxxxxxxxxx ........... u constβ
β1 10 8 β LPGAGSSSPNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW 2323 IGHG1 β VRGYLYYAAAY CMIWPSNAWVF 14585 IGLC2β
β2 8 9 β LPGAGSSSLNKEEKYMRANLLQY CARDQNFDESSGYDAFDIW 3121 IGHG1 β VRSYLYYAAAY CMIWPSNAWVF 11855 IGLC2β
β3 6 9 β LPGAKSNSLNKEQKYVRANLLQY CARDQNFDESSGYDAFDIW 1171 IGHG1 β VRSYLYYTAAY CMIWPSNAWVF 7613 IGLC2β
β4 5 4 β LPGAGSSSLNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW 11 IGHG1 β VRSYLYYAGAY CMIWPSNAWVF 26 IGLC2β
β5 2 8 β LPGAGSSSLNKEEKYMRANLLQY CARDQNFDESSGYDAFDIW 1005 IGHG1 β VRSYLYYAAAY CMIWPSNAWVF 4779 IGLC2β
β6 1 8 β LPGAKSNSLNKEQKYVRANLLQY CARDQNFDESSGYDAFDIW 5862 IGHG1 β VRSYLYYAAAY CMIWPSNAWVF 14144 IGLC2β
β7 1 7 β LPGAGSNSLNKEEIYVRANLLEY CTRDQNFDESSGYDAFDIW 4384 IGHG1 β VRSYLYYAAAY CMIWPSNAWVF 14621 IGLC2β
β8 1 3 β LPGAGSSSLNKEEKYVRANLLQY CARDQNFDDSSGYDAFDIW 3890 IGHG1 β VRSYLYYAAAY CMIWPSNAWVF 15577 IGLC2β
β9 1 3 β LPGAGSSSLNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW 3302 IGHG1 β VRSYLYYAAAY CMIWPSNAWVF 5256 IGLC2β
β10 1 9 β LPGAGSSSPNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW 3067 IGHG1 β VRGYLYYAAAY CMIWPSNAWVF 6429 IGLC2β
β11 1 13 β LPGAGRNSLNKEEKYVRGNLLQY CARDQNFDESSGYDAFDIW 2724 IGHG3 β VRGYLYYAAAY CMIWPSNAWVF 5775 IGLC2β
β12 1 9 β LPGAGSSSLNKEEKYMRANLLQY CARDQNFDESSGYDAFDIW 2578 IGHA1 β VRSYLYYAAAY CMIWPSNAWVF 14497 IGLC2β
β13 1 7 β LPGAGSSSPNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW 839 IGHG1 β VRGYLYYAAAY CMIWPSNAWVF 7816 IGLC2β
β14 1 7 β LPGAGSSSLNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW 404 IGHG1 β VRGYLYYAAAY CMIWPSNAWVF 3456 IGLC2β
β15 1 1 β LPGAGSSSLNKQEKYVRANLLQY CARDQNFDESSGYDAFDIW 136 IGHG1 β VRSYLYYAAAY CMIWPSNAWVF 1023 IGLC2β
β16 1 9 β LPGAGSNSLNKEEIYVRANLLEY CTRDQNFDESSGYDAFDIW 96 IGHG1 β VRSYLYYAAAY CMIWPSNAWVF 1762 IGLC2β
ββββββββββββββββ΄βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄ββββββββββββββββββββββββββββββββββββββββ
β’
ββ β’ [0.00]
β ββββββββ 15 [1.33]
ββββ β’ [0.50]
β ββββββββββββ β’ [2.00]
β β ββββββ 4 [1.00]
β ββ 9 [0.00]
βββ β’ [0.34]
β ββββββββββββββββββββββββββββββββ β’ [5.53]
β β ββ β’ [0.34]
β β β ββββ β’ [0.61]
β β β β βββ 1 [0.50]
β β β ββββββββββ 10 [1.50]
β β βββ 13 [0.39]
β βββββ 14 [0.66]
ββ β’ [0.10]
β βββββββββββββββ 8 [2.65]
ββββ β’ [0.55]
β βββββββββββββββββββββ β’ [3.58]
β β βββββββββββββββββββββββ β’ [4.00]
β β β βββββ β’ [0.79]
β β β β β 2 [0.00]
β β β ββ 12 [0.00]
β β ββ 5 [0.21]
β βββββββββββββββββββββββββββββββββββββββββββββββ 11 [8.00]
ββ β’ [0.17]
β βββββββββββββββββββββββββββββββββββββββ β’ [6.70]
β β ββββββ 3 [1.00]
β ββ 6 [0.00]
βββββββββββββββββββββββββββββββββββββ β’ [6.30]
β β 7 [0.00]
βββββββββββββ 16 [2.00]
Method 2.
This method is invoked using the argument NEWICK, and is exactly like method 1, except that it
outputs the resulting tree in Newick format.
For example, running enclone BCR=123085 NEWICK COMPLETE CDR3=CARDSWYSSGRNTPNWFDPW
will generate the following Newick tree for the largest clonotype:
(((4:0.00,(11:0.00,19:4.00)I4:1.00)I7:0.66,(6:0.80,((12:0.04,14:6.96)I6:0.96,(((((2:0.00,18:0.00)I1:0.94,5:0.06)I2:0.50,17:0.50)I3:5.95,(3:0.04,(8:0.00,9:1.00)I5:0.96)I8:0.55)I11:0.47,(((7:0.02,16:0.98)I9:0.04,20:2.96)I10:0.97,(1:0.00,(10:2.00,(13:1.00,15:0.00)I19:0.00)I18:0.00)I17:0.03)I16:0.00)I15:0.03)I14:0.17)I13:0.20)I12:0.00)0;
This tree can be copied and pasted or otherwise exported to be viewed in tools such as iTOL.
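Before loading an exported tree into a viewer, it can be useful to check which exact subclonotypes it contains. The following Python sketch pulls the leaf labels and their branch lengths out of a Newick string with a regular expression. The function name newick_leaves is made up for this example, the sample string is a short fragment modeled on the tree above rather than real enclone output, and the approach assumes every leaf has both a label and a length, as in the tree above.

```python
import re


def newick_leaves(newick: str) -> dict:
    """Return {leaf_label: branch_length} for a Newick string."""
    # A leaf label directly follows "(" or ","; internal node labels follow ")".
    pattern = re.compile(r"[(,]\s*([^(),:;\s]+):([0-9.eE+-]+)")
    return {label: float(length) for label, length in pattern.findall(newick)}


# Short made-up fragment in the same style as the tree above.
sample = "((8:0.00,9:1.00)I5:0.96,3:0.04)I8;"
print(newick_leaves(sample))
```

Internal nodes such as I5 are skipped because their labels follow a closing parenthesis.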
Method 3.
This method is invoked using the argument CLUSTAL_DNA=filename or CLUSTAL_AA=filename, where
filename can be stdout, and otherwise must have the extension ".tar". It does not generate a
tree, but instead generates a CLUSTALW alignment for each clonotype (using either bases or
amino acids), with one sequence for each exact subclonotype. This sequence is the concatenation
of the per-chain sequences, with the appropriate number of gap (-) characters shown if a chain
is missing. As above, we recommend using the COMPLETE option to avoid this happening.
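The concatenation rule can be pictured with a small Python sketch. This is only an illustration of the idea, not enclone's code: the function name concat_chains is made up, and the sketch assumes that the width of each chain column is already known and that a missing chain is represented as None.

```python
def concat_chains(chains, widths, gap="-"):
    """Join per-chain sequences; a missing chain (None) becomes a run of gaps."""
    parts = []
    for seq, width in zip(chains, widths):
        parts.append(gap * width if seq is None else seq)
    return "".join(parts)


# Two chain columns of assumed widths 4 and 3; the second chain is missing.
print(concat_chains(["ACGT", None], [4, 3]))
```

With COMPLETE, no chain is missing, so no gap padding of this kind is needed.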
If filename is stdout, then the alignments are printed out after each clonotype picture.
Otherwise, a tar file is generated, which if untarred yields one file per clonotype. To avoid
confusion, it would be best for filename to have the suffix .tar. We also recommend using
MIN_CELLS=... or some other argument to restrict the number of files that would be generated
upon untarring.
This method can be used to provide input to another program that will generate a tree.
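As one way to hand the alignments to another program, the tar file can be read directly without untarring it to disk. The Python sketch below uses the standard tarfile module. The function name read_alignments and the path alignments.tar are made up for this example, and nothing is assumed about how the per-clonotype files inside the archive are named.

```python
import tarfile


def read_alignments(tar_path: str) -> dict:
    """Map each file name inside the tar archive to its text content."""
    alignments = {}
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and other non-file entries
            with tar.extractfile(member) as handle:
                alignments[member.name] = handle.read().decode("utf-8")
    return alignments


# Hypothetical usage, assuming enclone was run with CLUSTAL_AA=alignments.tar:
#     for name, text in read_alignments("alignments.tar").items():
#         print(name, len(text.splitlines()), "lines")
```

Each value in the returned dictionary is the full text of one per-clonotype alignment, ready to be passed to a tree-building tool.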
Method 4.
This method is invoked using the argument PHYLIP_DNA=filename or PHYLIP_AA=filename, and is
just like method 3, except for the output format.