
Making phylogenetic trees

enclone provides several mechanisms for creating, displaying, and exporting a phylogenetic tree for each clonotype. These are initial mechanisms, which are likely to be expanded and/or improved over time in response to feedback. The initial implementation is inspired by the Levenshtein-NJ method described by Yermanos et al. (2017). For all of these mechanisms, we recommend using the argument COMPLETE to remove exact subclonotypes that are missing one or more chains.


Method 1. This method is invoked using the argument TREE, or TREE=v1,...,vn, where the vi are parseable variables. The method first defines the distance between any two exact subclonotypes to be their Levenshtein distance. We then add a root "virtual" exact subclonotype, which equals the donor reference away from the recombination region and is undefined within that region (i.e. a germline-reverted exact subclonotype without the junction). The distance from the root to any actual exact subclonotype is their Levenshtein distance computed away from the recombination region.
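A minimal Python sketch of the distance computation just described, for readers who want to experiment outside enclone. It assumes, hypothetically, that each exact subclonotype is supplied as a string (for example its chain sequences concatenated) together with a copy that has the recombination region removed; how enclone actually delimits that region, and whether it compares bases or amino acids, is not specified here. The function names are illustrative. Row and column 0 of the matrix play the role of the virtual root.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def distance_matrix(full: list[str], outside: list[str], ref_outside: str) -> list[list[int]]:
    """Row/column 0 is the virtual root; 1..n are the exact subclonotypes.

    full[i]     -- hypothetical: whole sequence of subclonotype i
    outside[i]  -- hypothetical: same sequence with the recombination region removed
    ref_outside -- hypothetical: donor reference with that region removed
    """
    n = len(full)
    d = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        # The root is undefined inside the recombination region, so compare outside it only.
        d[0][i + 1] = d[i + 1][0] = levenshtein(ref_outside, outside[i])
        for j in range(i + 1, n):
            d[i + 1][j + 1] = d[j + 1][i + 1] = levenshtein(full[i], full[j])
    return d
```

The resulting symmetric matrix is the kind of input the neighbor-joining step consumes.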

Next, a tree is created from these distances using the neighbor-joining algorithm. This sometimes yields negative branch lengths, which we change to zero. We have only observed such negative lengths on the edge emanating from the root.
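For reference, here is a textbook neighbor-joining sketch (the Saitou and Nei formulation) in Python, including the clamping of negative branch lengths to zero. The dict-of-dicts input, the `symmetric` helper, and the I1, I2, ... names for new internal nodes are illustrative choices, not enclone's implementation, which may differ in tie-breaking and in how the result is rooted.

```python
def symmetric(pairs: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Build a dict-of-dicts distance matrix from {(a, b): distance}."""
    d: dict[str, dict[str, float]] = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def neighbor_joining(dist: dict[str, dict[str, float]]) -> list[tuple[str, str, float]]:
    """Return the edges (parent, child, length) of an unrooted NJ tree.

    Negative branch lengths, which NJ can produce, are clamped to zero.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    active = list(d)
    edges: list[tuple[str, str, float]] = []
    next_id = 1
    while len(active) > 2:
        n = len(active)
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Pick the pair minimizing the NJ criterion Q(a, b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = active[x], active[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"I{next_id}"  # hypothetical name for the new internal node
        next_id += 1
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        edges += [(u, a, max(0.0, la)), (u, b, max(0.0, lb))]
        # Distances from the new node to every remaining node.
        d[u] = {}
        for k in active:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        active = [k for k in active if k not in (a, b)] + [u]
    a, b = active
    edges.append((b, a, max(0.0, d[a][b])))
    return edges
```

On a distance matrix that is exactly tree-like (additive), this recovers the original branch lengths; on noisy data such as Levenshtein distances it can produce the negative lengths mentioned above, which the `max(0.0, ...)` calls reset to zero.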

Note that for a given clonotype, the neighbor-joining algorithm is O(n^4), where n is the number of exact subclonotypes in the clonotype. Thus for sample types having highly complex clonotypes (e.g. with ~1000 exact subclonotypes), the algorithm will be very slow. In such cases the tree would in any event be so large that it would be difficult to do anything with it. Such clonotypes can be excluded, e.g. with MAX_CELLS=100.
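To put the O(n^4) scaling in numbers (a back-of-the-envelope estimate that ignores constant factors), going from 100 to 1000 exact subclonotypes multiplies the running time T by roughly:

```latex
\frac{T(1000)}{T(100)} \approx \left(\frac{1000}{100}\right)^{4} = 10^{4}
```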

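Finally, the tree is visualized as plain text, as shown in the example below. The field dref, added to the display here by LVARSP=dref, shows the distance of each exact subclonotype from the donor reference, away from the recombination region. In the tree, a numbered node is the exact subclonotype with that number in the table, nodes drawn as • have no exact subclonotype of their own, and the bracketed value is the length of the edge leading to the node.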

enclone BCR=123085 TREE COMPLETE CDR3=CARDQNFDESSGYDAFDIW LVARSP=dref
[1] GROUP = 1 CLONOTYPES = 42 CELLS

[1.1] CLONOTYPE = 42 CELLS
┌──────────────┬────────────────────────────────────────────────────────────┬───────────────────────────────────────┐
│              │  CHAIN 1                                                   │  CHAIN 2                              │
│              │  159|IGHV3-7 ◆ 53|IGHJ3                                    │  376|IGLV5-37 ◆ 313|IGLJ3             │
│              ├────────────────────────────────────────────────────────────┼───────────────────────────────────────┤
│              │                       11 1111111111111111111               │           11 11111111111              │
│              │  23344445667777788999901 1111112222222222333               │  23556679911 11111122222              │
│              │  22324893380156725357903 4567890123456789012               │  17068902403 45678901234              │
│              │                          ════════CDR3═══════               │              ════CDR3═══              │
│reference     │  LPGAGSSSLNKQEKYVRANLLQ◦ ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W               │  VRSYLYYAAAY CMIW◦◦◦◦◦◦◦              │
│donor ref     │  LPGAGSSSLNKQEKYVRANLLQ◦ ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W               │  VRSYLYYAAAY CMIW◦◦◦◦◦◦◦              │
├──────────────┼────────────────────────────────────────────────────────────┼───────────────────────────────────────┤
│#    n  dref  │  xxxxxxxxxxxxxxxxxxxxxxx .x......x........x.     u  const  │  xxxxxxxxxxx ...........      u  const│
│1   10     8  │  LPGAGSSSPNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW  2323  IGHG1  │  VRGYLYYAAAY CMIWPSNAWVF  14585  IGLC2│
│2    8     9  │  LPGAGSSSLNKEEKYMRANLLQY CARDQNFDESSGYDAFDIW  3121  IGHG1  │  VRSYLYYAAAY CMIWPSNAWVF  11855  IGLC2│
│3    6     9  │  LPGAKSNSLNKEQKYVRANLLQY CARDQNFDESSGYDAFDIW  1171  IGHG1  │  VRSYLYYTAAY CMIWPSNAWVF   7613  IGLC2│
│4    5     4  │  LPGAGSSSLNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW    11  IGHG1  │  VRSYLYYAGAY CMIWPSNAWVF     26  IGLC2│
│5    2     8  │  LPGAGSSSLNKEEKYMRANLLQY CARDQNFDESSGYDAFDIW  1005  IGHG1  │  VRSYLYYAAAY CMIWPSNAWVF   4779  IGLC2│
│6    1     8  │  LPGAKSNSLNKEQKYVRANLLQY CARDQNFDESSGYDAFDIW  5862  IGHG1  │  VRSYLYYAAAY CMIWPSNAWVF  14144  IGLC2│
│7    1     7  │  LPGAGSNSLNKEEIYVRANLLEY CTRDQNFDESSGYDAFDIW  4384  IGHG1  │  VRSYLYYAAAY CMIWPSNAWVF  14621  IGLC2│
│8    1     3  │  LPGAGSSSLNKEEKYVRANLLQY CARDQNFDDSSGYDAFDIW  3890  IGHG1  │  VRSYLYYAAAY CMIWPSNAWVF  15577  IGLC2│
│9    1     3  │  LPGAGSSSLNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW  3302  IGHG1  │  VRSYLYYAAAY CMIWPSNAWVF   5256  IGLC2│
│10   1     9  │  LPGAGSSSPNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW  3067  IGHG1  │  VRGYLYYAAAY CMIWPSNAWVF   6429  IGLC2│
│11   1    13  │  LPGAGRNSLNKEEKYVRGNLLQY CARDQNFDESSGYDAFDIW  2724  IGHG3  │  VRGYLYYAAAY CMIWPSNAWVF   5775  IGLC2│
│12   1     9  │  LPGAGSSSLNKEEKYMRANLLQY CARDQNFDESSGYDAFDIW  2578  IGHA1  │  VRSYLYYAAAY CMIWPSNAWVF  14497  IGLC2│
│13   1     7  │  LPGAGSSSPNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW   839  IGHG1  │  VRGYLYYAAAY CMIWPSNAWVF   7816  IGLC2│
│14   1     7  │  LPGAGSSSLNKEEKYVRANLLQY CARDQNFDESSGYDAFDIW   404  IGHG1  │  VRGYLYYAAAY CMIWPSNAWVF   3456  IGLC2│
│15   1     1  │  LPGAGSSSLNKQEKYVRANLLQY CARDQNFDESSGYDAFDIW   136  IGHG1  │  VRSYLYYAAAY CMIWPSNAWVF   1023  IGLC2│
│16   1     9  │  LPGAGSNSLNKEEIYVRANLLEY CTRDQNFDESSGYDAFDIW    96  IGHG1  │  VRSYLYYAAAY CMIWPSNAWVF   1762  IGLC2│
└──────────────┴────────────────────────────────────────────────────────────┴───────────────────────────────────────┘

•
╚═ • [0.00]
   ╠════════ 15 [1.33]
   ╚═══ • [0.50]
        ╠════════════ • [2.00]
        ║             ╠══════ 4 [1.00]
        ║             ╚═ 9 [0.00]
        ╚══ • [0.34]
            ╠═════════════════════════════════ • [5.53]
            ║                                  ╠══ • [0.34]
            ║                                  ║   ╠════ • [0.61]
            ║                                  ║   ║     ╠═══ 1 [0.50]
            ║                                  ║   ║     ╚═════════ 10 [1.50]
            ║                                  ║   ╚══ 13 [0.39]
            ║                                  ╚════ 14 [0.66]
            ╚═ • [0.10]
               ╠════════════════ 8 [2.65]
               ╚═══ • [0.55]
                    ╠═════════════════════ • [3.58]
                    ║                      ╠════════════════════════ • [4.00]
                    ║                      ║                         ╠═════ • [0.79]
                    ║                      ║                         ║      ╠═ 2 [0.00]
                    ║                      ║                         ║      ╚═ 12 [0.00]
                    ║                      ║                         ╚═ 5 [0.21]
                    ║                      ╚════════════════════════════════════════════════ 11 [8.00]
                    ╚═ • [0.17]
                       ╠════════════════════════════════════════ • [6.70]
                       ║                                         ╠══════ 3 [1.00]
                       ║                                         ╚═ 6 [0.00]
                       ╚══════════════════════════════════════ • [6.30]
                                                               ╠═ 7 [0.00]
                                                               ╚════════════ 16 [2.00]

Method 2. This method is invoked using the argument NEWICK, and is exactly like method 1, except that it outputs the resulting tree in Newick format.

For example, running enclone BCR=123085 NEWICK COMPLETE CDR3=CARDSWYSSGRNTPNWFDPW will generate the following Newick tree for the largest clonotype:

(((4:0.00,(11:0.00,19:4.00)I4:1.00)I7:0.66,(6:0.80,((12:0.04,14:6.96)I6:0.96,(((((2:0.00,18:0.00)I1:0.94,5:0.06)I2:0.50,17:0.50)I3:5.95,(3:0.04,(8:0.00,9:1.00)I5:0.96)I8:0.55)I11:0.47,(((7:0.02,16:0.98)I9:0.04,20:2.96)I10:0.97,(1:0.00,(10:2.00,(13:1.00,15:0.00)I19:0.00)I18:0.00)I17:0.03)I16:0.00)I15:0.03)I14:0.17)I13:0.20)I12:0.00)0;
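The Newick string above nests each node's children in parentheses, followed by the node's name (I1, I2, ... for internal nodes, 0 for the root) and a colon-prefixed branch length. A minimal Python sketch of that serialization follows; the `Node` class and `to_newick` function are hypothetical, and the two-decimal formatting simply mirrors the example output.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                   # "1", "2", ... for leaves; "I1", "I2", ... or "0" otherwise
    length: float = 0.0         # branch length to the parent (unused for the root)
    children: list[Node] = field(default_factory=list)


def to_newick(node: Node, is_root: bool = True) -> str:
    """Serialize as (child,child,...)name:length, ending the root with ';'."""
    text = node.name
    if node.children:
        inner = ",".join(to_newick(child, is_root=False) for child in node.children)
        text = f"({inner}){text}"
    if is_root:
        return text + ";"
    return f"{text}:{node.length:.2f}"


tree = Node("0", children=[
    Node("I2", 0.20, [Node("1", 0.50), Node("I1", 0.34, [Node("2", 0.00), Node("3", 1.00)])]),
])
print(to_newick(tree))
```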

This tree can be copied and pasted or otherwise exported to be viewed in tools such as iTOL.
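Before exporting, it can be handy to sanity-check such a string programmatically, for example to confirm that every exact subclonotype appears as a leaf. The following is a small, hypothetical recursive-descent reader for the subset of Newick used above (names and branch lengths, no quoting or comments); for arbitrary Newick input a dedicated phylogenetics library would be the safer choice.

```python
def parse_newick(text: str) -> tuple[str, float, list]:
    """Parse into nested (name, branch_length, children) tuples."""
    s = text.strip()
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    s = s[:-1]
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> tuple[str, float, list]:
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1
            children.append(parse_node())
            while pos < len(s) and s[pos] == ",":
                pos += 1
                children.append(parse_node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        name = read_until(",():")
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            length = float(read_until(",()"))
        return (name, length, children)

    tree = parse_node()
    if pos != len(s):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(node: tuple[str, float, list]) -> list[str]:
    """Names of all nodes without children, left to right."""
    name, _, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]
```

For the example tree above, `leaves(parse_newick(...))` yields the exact subclonotype numbers 1 through 20, each exactly once.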


Method 3. This method is invoked using the argument CLUSTAL_DNA=filename (bases) or CLUSTAL_AA=filename (amino acids), where filename can be stdout and otherwise must have the extension .tar. It does not generate a tree, but instead generates a CLUSTALW alignment for each clonotype, with one sequence for each exact subclonotype. This sequence is the concatenation of the per-chain sequences, with the appropriate number of gap (-) characters shown if a chain is missing. As above, we recommend using the COMPLETE option so that no chain is missing.
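The row construction just described (concatenate per-chain sequences, pad a missing chain with gaps) can be sketched in Python as follows. It assumes, hypothetically, that each chain's sequences are already aligned to a common length within the clonotype, and it prints a simplified CLUSTAL-style layout (header word, name column, fixed-width blocks) without the conservation line, so it is not enclone's exact output. Function names are illustrative.

```python
from __future__ import annotations


def alignment_rows(subclonotypes: list[list[str | None]], chain_lengths: list[int]) -> list[str]:
    """One row per exact subclonotype: chains concatenated, missing chains as gaps."""
    rows = []
    for chains in subclonotypes:
        if len(chains) != len(chain_lengths):
            raise ValueError("one entry per chain expected (None if missing)")
        parts = []
        for seq, length in zip(chains, chain_lengths):
            if seq is None:
                parts.append("-" * length)  # missing chain: all gaps
            elif len(seq) != length:
                raise ValueError("chain sequences must already be aligned")
            else:
                parts.append(seq)
        rows.append("".join(parts))
    return rows


def clustal_text(names: list[str], rows: list[str], width: int = 60) -> str:
    """Simplified CLUSTAL-style layout (no conservation line)."""
    pad = max(len(name) for name in names) + 4
    lines = ["CLUSTAL", ""]  # header word only; real tools add a version string
    for start in range(0, len(rows[0]), width):
        for name, row in zip(names, rows):
            lines.append(name.ljust(pad) + row[start:start + width])
        lines.append("")  # blank line between blocks
    return "\n".join(lines)
```

Padding with exactly the chain's aligned length is what keeps every row the same width, which alignment formats require.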

If filename is stdout, the alignments are printed after each clonotype picture. Otherwise, a tar file is generated (hence the .tar extension), which when untarred yields one file per clonotype. We also recommend using MIN_CELLS=... or some other argument to restrict the number of files generated upon untarring.
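When the output goes to a .tar file, the per-clonotype alignments can also be read directly in Python without untarring. A minimal sketch using the standard-library `tarfile` module; the function name is hypothetical, and it makes no assumption about how enclone names the member files.

```python
import io
import tarfile


def read_alignments(tar_bytes: bytes) -> dict[str, str]:
    """Map each regular file in a tar archive to its decoded text."""
    alignments = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():  # skip directories and links
                handle = tar.extractfile(member)
                alignments[member.name] = handle.read().decode("utf-8")
    return alignments
```

For example, `read_alignments(open("alignments.tar", "rb").read())`, where alignments.tar is a placeholder for whatever filename was passed to CLUSTAL_DNA= or CLUSTAL_AA=.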

This method can be used to provide input to another program that will generate a tree.


Method 4. This method is invoked using the argument PHYLIP_DNA=filename or PHYLIP_AA=filename, and is just like method 3, except that the alignments are written in PHYLIP format instead of CLUSTALW format.
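For orientation, here is a sketch of the strict sequential PHYLIP layout: a header line with the number of sequences and the alignment length, then one line per sequence with its name padded to 10 characters. Whether enclone emits sequential or interleaved PHYLIP, and how it names sequences, is not stated here, so treat the function name and layout below as assumptions.

```python
def phylip_text(names: list[str], rows: list[str]) -> str:
    """Strict sequential PHYLIP: header, then 10-character names and sequences."""
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("need at least one row, all of equal length")
    if len(names) != len(rows):
        raise ValueError("one name per row expected")
    lines = [f"{len(rows)} {len(rows[0])}"]  # number of sequences, alignment length
    for name, row in zip(names, rows):
        if len(name) > 10:
            raise ValueError(f"name {name!r} exceeds 10 characters")
        lines.append(name.ljust(10) + row)
    return "\n".join(lines) + "\n"
```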