Detecting illusory clonotype expansions

Please read this! This page was written before we added two major filtering steps, based on UMI counts, which completely annihilate the particular illusory expansion described here. The reason we left the page here is that the approach used to analyze the expansion may have utility for other datasets. To reproduce the actual results shown here, you will need to add to each enclone command the arguments NUMI and NUMI_RATIO that turn off the added filters.

This page explains the origin of certain illusory clonotype expansions, and exhibits one example of how to detect them.

These expansions are known to occur occasionally (see below for one possible mechanism), and we hypothesize that they arise when an individual cell disintegrates or leaks. This leaves fragments that seed multiple GEM partitions, producing a clonotype that appears larger than its true size.

We believe that events of this type usually originate from plasma or plasmablast B cells. We thus focus on B cells in this vignette. However, with the obvious changes, the same methods also apply to T cells.

Disintegration might occur during or after preparation of the sample. One way to document such an event would be to create two libraries from a single tube of cells. If the clonotype is large and appears in only one of two libraries, one could be reasonably certain that a disintegration event occurred during or after cells were drawn from the tube. This method could not be used to detect disintegration events occurring prior to that point.

Here we show that with the aid of gene expression data, illusory clonotype expansions can generally be detected, even if only a single library was made. The easier case would be a sample consisting of pure B cells. The case where one has a mix of cell types is more challenging, because a GEM can contain both a B cell fragment and a cell of a different type, and thus appear to have a normal level of gene expression, with no evidence of mixing from the VDJ assay either. We therefore focus on the case of samples that contain a mixture of cell types.

[Figure: cell bits]

To that end, we show an example using two libraries made from a single tube of PBMCs from a healthy human donor. The two libraries contain 7287 and 9559 cells, respectively, of which ~12% are B cells. All the data shown here are part of the large dataset package described in the download section of the main enclone page.

enclone BCR=128037,128040 NCROSS

The NCROSS option instructs enclone not to filter out expanded clonotypes that appear in only one of the datasets arising from the same sample (and which, based on their sizes, are highly improbable). Normally one would want this filtering, but these clonotypes are exactly what we wish to see now! Here is the top clonotype:

[1.1] CLONOTYPE = 120 CELLS
┌──────────────────┬───────────────────────────────────────┬─────────────────────────────────────┐
│                  │  CHAIN 1                              │  CHAIN 2                            │
│                  │  146.1.1|IGHV3-53 ◆ 55|IGHJ4          │  296|IGKV4-1 ◆ 216|IGKJ4            │
│                  ├───────────────────────────────────────┼─────────────────────────────────────┤
│                  │              1111111111111            │              11111111111 1          │
│                  │  12257777789 1111111222222            │  11345778899 11111112222 2          │
│                  │  35831234686 3456789012345            │  78291346825 34567890123 7          │
│                  │              ═════CDR3════            │              ════CDR3═══            │
│reference         │  STGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦            │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T          │
│donor ref         │  LSGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦            │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T          │
├──────────────────┼───────────────────────────────────────┼─────────────────────────────────────┤
│#  datasets    n  │  ........... .............  u  const  │  ........... ........... .  u  const│
│1  128040    114  │  LSNNGDGNYFV CARGGTTTYFISW  6  IGHA1  │  TNAFSLYRTSE CQQYCDTPLTF T  5  IGKC │
│2  128040      6  │  LSNNGDGNYFV CARGGTTTYFISW  4  IGHA1  │                                     │
└──────────────────┴───────────────────────────────────────┴─────────────────────────────────────┘
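
To give a feel for why a clonotype of this size confined to one library is so improbable, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that each cell of a genuine 120-cell clonotype lands independently in one of the two libraries in proportion to library size. The function name is hypothetical, and this is not a description of the criterion enclone itself applies when NCROSS is absent. Because the text does not say which library size belongs to which dataset, both cases are evaluated.

```python
def prob_all_in_one_library(clonotype_cells, cells_this_lib, cells_other_lib):
    """Probability that every cell of a clonotype falls in 'this' library,
    assuming each cell is assigned independently in proportion to library size
    (an illustrative assumption, not enclone's actual filter)."""
    frac = cells_this_lib / (cells_this_lib + cells_other_lib)
    return frac ** clonotype_cells


# Library sizes and clonotype size quoted in the text.
sizes = (7287, 9559)
clonotype_size = 120

# Case 1: all cells fall in the larger library.
p_larger = prob_all_in_one_library(clonotype_size, max(sizes), min(sizes))
# Case 2: all cells fall in the smaller library.
p_smaller = prob_all_in_one_library(clonotype_size, min(sizes), max(sizes))

print(f"all in larger library:  {p_larger:.3g}")
print(f"all in smaller library: {p_smaller:.3g}")
```

Under this simple model both probabilities are vanishingly small, which is consistent with reading the clonotype as an artifact of a single draw rather than a true expansion present in the tube.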

If we do not use the NCROSS option, and search for the clonotype using the heavy chain CDR3 sequence, we see just one cell (the others having been filtered out):

enclone BCR=128037,128040 CDR3=CARGGTTTYFISW
[1.1] CLONOTYPE = 1 CELLS
┌────────────────┬──────────────────────────────────────────┬────────────────────────────────────────┐
│                │  CHAIN 1                                 │  CHAIN 2                               │
│                │  146.1.1|IGHV3-53 ◆ 55|IGHJ4             │  296|IGKV4-1 ◆ 216|IGKJ4               │
│                ├──────────────────────────────────────────┼────────────────────────────────────────┤
│                │              1111111111111               │              11111111111 1             │
│                │  12257777789 1111111222222               │  11345778899 11111112222 2             │
│                │  35831234686 3456789012345               │  78291346825 34567890123 7             │
│                │              ═════CDR3════               │              ════CDR3═══               │
│reference       │  STGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦               │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T             │
│donor ref       │  LSGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦               │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T             │
├────────────────┼──────────────────────────────────────────┼────────────────────────────────────────┤
│#  datasets  n  │  ........... .............     u  const  │  ........... ........... .     u  const│
│1  128040    1  │  LSNNGDGNYFV CARGGTTTYFISW  1743  IGHA1  │  TNAFSLYRTSE CQQYCDTPLTF T  6319  IGKC │
└────────────────┴──────────────────────────────────────────┴────────────────────────────────────────┘

This is a good answer, but it works only if libraries were made from two separate draws of cells. Now suppose that both a VDJ and a GEX library have been made from a single draw of cells. (We henceforth ignore the data made from the other draw of cells, useful though it is.)

enclone BCR=128040 GEX=127801 CDR3=CARGGTTTYFISW
[1.1] CLONOTYPE = 44 CELLS
┌───────────┬───────────────────────────────────────┬─────────────────────────────────────┐
│           │  CHAIN 1                              │  CHAIN 2                            │
│           │  146.1.2|IGHV3-53 ◆ 55|IGHJ4          │  296|IGKV4-1 ◆ 216|IGKJ4            │
│           ├───────────────────────────────────────┼─────────────────────────────────────┤
│           │              1111111111111            │              11111111111 1          │
│           │  12257777789 1111111222222            │  11345778899 11111112222 2          │
│           │  35831234686 3456789012345            │  78291346825 34567890123 7          │
│           │              ═════CDR3════            │              ════CDR3═══            │
│reference  │  STGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦            │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T          │
│donor ref  │  LSGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦            │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T          │
├───────────┼───────────────────────────────────────┼─────────────────────────────────────┤
│#   n      │  ........... .............  u  const  │  ........... ........... .  u  const│
│1  38      │  LSNNGDGNYFV CARGGTTTYFISW  4  IGHA1  │  TNAFSLYRTSE CQQYCDTPLTF T  3  IGKC │
│2   5      │  LSNNGDGNYFV CARGGTTTYFISW  4  IGHA1  │                                     │
│3   1      │                                       │  TNAFSLYRTSE CQQYCDTPLTF T  6  IGKC │
└───────────┴───────────────────────────────────────┴─────────────────────────────────────┘

Now we see fewer cells. This is because the default behavior of enclone is to filter out cells called by the VDJ pipeline that are not also called by the GEX pipeline. Most of these would have consisted of "nearly empty drops", that is, GEMs containing just a B cell fragment.

Now we add the option PER_CELL, causing data for each cell to be displayed, and we also add two fields to the display. One is gex, the normalized count of gene expression UMIs. The other is cred (short for "credibility"), which is more complicated. We will also hide the onesie (single chain) cells.

enclone BCR=128040 GEX=127801 CDR3=CARGGTTTYFISW PER_CELL LVARSP=gex,cred CHAINS_EXACT=2
[1.1] CLONOTYPE = 38 CELLS
┌────────────────────────────────────────┬──────────────────────────────────────────┬────────────────────────────────────────┐
│                                        │  CHAIN 1                                 │  CHAIN 2                               │
│                                        │  146.1.2|IGHV3-53 ◆ 55|IGHJ4             │  296|IGKV4-1 ◆ 216|IGKJ4               │
│                                        ├──────────────────────────────────────────┼────────────────────────────────────────┤
│                                        │              1111111111111               │              11111111111 1             │
│                                        │  12257777789 1111111222222               │  11345778899 11111112222 2             │
│                                        │  35831234686 3456789012345               │  78291346825 34567890123 7             │
│                                        │              ═════CDR3════               │              ════CDR3═══               │
│reference                               │  STGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦               │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T             │
│donor ref                               │  LSGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦               │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T             │
├────────────────────────────────────────┼──────────────────────────────────────────┼────────────────────────────────────────┤
│#  barcode              n    gex  cred  │  ........... .............     u  const  │  ........... ........... .     u  const│
│1                      38   5080   0.8  │  LSNNGDGNYFV CARGGTTTYFISW     4  IGHA1  │  TNAFSLYRTSE CQQYCDTPLTF T     3  IGKC │
│   AAATGCCCACTGAAGG-1       7309   0.8  │                                2         │                                7       │
│   AACCATGCAAAGAATC-1       4695   0.7  │                                3         │                                2       │
│   AACTGGTGTCGAACAG-1       4342   0.5  │                                8         │                                7       │
│   ACGGGTCGTCGCGGTT-1       2584   0.7  │                                2         │                                3       │
│   AGACGTTAGAGTAAGG-1       5327   0.9  │                                6         │                                3       │
│   AGCATACGTTTCCACC-1       5952   0.8  │                                5         │                                1       │
│   AGTGTCAAGTAGTGCG-1       3236   0.8  │                               10         │                               17       │
│   ATCCGAAAGGACTGGT-1        854   3.2  │                                1         │                                2       │
│   ATCTACTTCAGTTAGC-1       1692   0.5  │                                5         │                                2       │
│   ATCTGCCGTTACGACT-1       6203   1.0  │                                2         │                                2       │
│   CAAGTTGAGTTACGGG-1       4683   0.5  │                                2         │                                3       │
│   CAGAGAGAGATGGGTC-1       7020   0.7  │                                4         │                                1       │
│   CATATTCTCCGCTGTT-1       5069   0.7  │                                7         │                                2       │
│   CGATTGATCCACGCAG-1       4035   0.3  │                                7         │                               11       │
│   CGGCTAGGTCAACTGT-1       5624   0.8  │                                2         │                                2       │
│   CGTAGGCCAAACTGTC-1       1353   2.3  │                                2         │                                1       │
│   CTAGTGACACGGTTTA-1       3982   0.8  │                                1         │                                3       │
│   CTCTAATAGCCGATTT-1       2193   1.6  │                                2         │                                1       │
│   CTGGTCTAGCTGCCCA-1      20213  13.9  │                             1743         │                             6319       │
│   CTTCTCTAGATGCCAG-1       6362   1.1  │                                5         │                                5       │
│   GAAGCAGTCGTTACAG-1       5558   1.0  │                                3         │                                1       │
│   GACGTTATCTACCAGA-1       3989   0.7  │                                2         │                                2       │
│   GAGTCCGTCGGTCTAA-1      11407   9.3  │                                3         │                                1       │
│   GATGAGGAGATCTGCT-1       7682   1.1  │                                4         │                                1       │
│   GCATACATCGACAGCC-1       1680   1.3  │                                3         │                                2       │
│   GGAATAAGTTTGACAC-1       8189   1.3  │                                3         │                                1       │
│   GGCTGGTCAGTGGGAT-1       9925   0.9  │                               16         │                                6       │
│   GGGAGATTCCGCATAA-1       4732   1.0  │                                5         │                                4       │
│   GTACTCCAGGTGTGGT-1       4689   0.5  │                                5         │                                3       │
│   GTTAAGCCACATTAGC-1       7785   0.8  │                                4         │                                2       │
│   TAGTGGTTCGGCGCTA-1       5090   0.8  │                               11         │                               14       │
│   TCAGGATCAAGTTCTG-1       7551   0.4  │                                2         │                                3       │
│   TCAGGATGTTGCCTCT-1       3567   0.5  │                                2         │                                2       │
│   TCCCGATTCTATCCCG-1       6092   0.9  │                                3         │                                6       │
│   TGCGCAGCAAATCCGT-1       5849   0.9  │                                8         │                                5       │
│   TTCCCAGCAAGTTAAG-1       5985   0.7  │                               11         │                               15       │
│   TTGAACGTCCATTCTA-1       4775   0.6  │                                3         │                                2       │
│   TTTGCGCCACACAGAG-1       5054   0.9  │                                4         │                                3       │
└────────────────────────────────────────┴──────────────────────────────────────────┴────────────────────────────────────────┘

The field cred measures the extent to which cells with gene expression similar to that of a given putative B cell are themselves B cells. In more detail, for a given dataset, let n be the number of VDJ cells that are also GEX cells. Then, for a given cell, we find the n GEX cells that are closest to it in PCA space and report the percentage of those that are also VDJ cells. This is cred. The closer this number is to 100, the more the given cell looks like a typical B cell. Conversely, a very low number makes the given cell appear suspect.
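
The definition of cred above can be sketched in a few lines of Python. This is only an illustration of the idea, not enclone's implementation: the function name cred_scores is hypothetical, the PCA coordinates are assumed to be given already, distance is assumed to be Euclidean, and the cell itself is assumed to count among its own n nearest neighbors. The toy coordinates are invented for the example.

```python
import math


def cred_scores(pca, is_vdj):
    """Sketch of cred. pca holds one coordinate tuple per GEX cell;
    is_vdj[i] is True if GEX cell i is also a VDJ cell.
    Returns a cred value for each VDJ cell and None for the other cells."""
    n = sum(is_vdj)  # number of VDJ cells that are also GEX cells
    scores = []
    for i, point in enumerate(pca):
        if not is_vdj[i]:
            scores.append(None)
            continue
        # Rank all GEX cells by Euclidean distance to this cell. The cell
        # itself is at distance zero, so it is included (an assumption).
        order = sorted(range(len(pca)), key=lambda j: math.dist(point, pca[j]))
        nearest = order[:n]
        hits = sum(1 for j in nearest if is_vdj[j])
        scores.append(100.0 * hits / n)  # percent of neighbors that are VDJ cells
    return scores


# Toy data: three VDJ cells that cluster together, plus one VDJ barcode
# sitting in the middle of a cluster of non-VDJ cells.
pca = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
is_vdj = [True, True, True, False, False, False, True]
scores = cred_scores(pca, is_vdj)
print(scores)
```

In this toy case the three clustered VDJ cells score 75, while the VDJ barcode surrounded by non-VDJ cells scores 25, which is the pattern that makes a cell look suspect.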

The values of cred vary considerably from dataset to dataset, requiring somewhat different interpretation. We show the distribution for this one dataset:

[Figure: distribution of cred values for this dataset (cred_gex_dist)]

Thus the cred values of the cells in the reported clonotype are very low indeed, and almost all of the cells are highly suspect. Probably the clonotype originated from a single cell, which broke up into one major piece (the one for barcode CTGGTCTAGCTGCCCA-1) and many smaller pieces. These smaller pieces reside in GEMs that may or may not contain an actual intact cell. In fact, many of the cells (23 of the 38 shown below) are detected as T cells, using TCR data 128024 from the same cell draw. We can mark these cells in the same display using the command

enclone BCR=128040 GEX=127801 BC=128024_cells.csv CDR3=CARGGTTTYFISW PER_CELL LVARSP=gex,cred,T CHAINS_EXACT=2
where the file 128024_cells.csv is a CSV file with header barcode,T and having one line for each barcode in 128024/outs/cell_barcodes.json, e.g. AAACGGGAGAGAACAG-1,◯. (We used the character ◯ as a value just because we liked it.)
[1.1] CLONOTYPE = 38 CELLS
┌───────────────────────────────────────────┬──────────────────────────────────────────┬────────────────────────────────────────┐
│                                           │  CHAIN 1                                 │  CHAIN 2                               │
│                                           │  146.1.2|IGHV3-53 ◆ 55|IGHJ4             │  296|IGKV4-1 ◆ 216|IGKJ4               │
│                                           ├──────────────────────────────────────────┼────────────────────────────────────────┤
│                                           │              1111111111111               │              11111111111 1             │
│                                           │  12257777789 1111111222222               │  11345778899 11111112222 2             │
│                                           │  35831234686 3456789012345               │  78291346825 34567890123 7             │
│                                           │              ═════CDR3════               │              ════CDR3═══               │
│reference                                  │  STGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦               │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T             │
│donor ref                                  │  LSGSSGGSYSL ◦◦◦◦◦◦◦◦◦◦◦◦◦               │  AYVLSIYRSSD CQQ◦◦◦◦◦◦◦◦ T             │
├───────────────────────────────────────────┼──────────────────────────────────────────┼────────────────────────────────────────┤
│#  barcode              n    gex  cred  T  │  ........... .............     u  const  │  ........... ........... .     u  const│
│1                      38   5080   0.8     │  LSNNGDGNYFV CARGGTTTYFISW     4  IGHA1  │  TNAFSLYRTSE CQQYCDTPLTF T     3  IGKC │
│   AAATGCCCACTGAAGG-1       7309   0.8  ◯  │                                2         │                                7       │
│   AACCATGCAAAGAATC-1       4695   0.7  ◯  │                                3         │                                2       │
│   AACTGGTGTCGAACAG-1       4342   0.5  ◯  │                                8         │                                7       │
│   ACGGGTCGTCGCGGTT-1       2584   0.7     │                                2         │                                3       │
│   AGACGTTAGAGTAAGG-1       5327   0.9  ◯  │                                6         │                                3       │
│   AGCATACGTTTCCACC-1       5952   0.8  ◯  │                                5         │                                1       │
│   AGTGTCAAGTAGTGCG-1       3236   0.8     │                               10         │                               17       │
│   ATCCGAAAGGACTGGT-1        854   3.2     │                                1         │                                2       │
│   ATCTACTTCAGTTAGC-1       1692   0.5  ◯  │                                5         │                                2       │
│   ATCTGCCGTTACGACT-1       6203   1.0  ◯  │                                2         │                                2       │
│   CAAGTTGAGTTACGGG-1       4683   0.5     │                                2         │                                3       │
│   CAGAGAGAGATGGGTC-1       7020   0.7  ◯  │                                4         │                                1       │
│   CATATTCTCCGCTGTT-1       5069   0.7     │                                7         │                                2       │
│   CGATTGATCCACGCAG-1       4035   0.3     │                                7         │                               11       │
│   CGGCTAGGTCAACTGT-1       5624   0.8  ◯  │                                2         │                                2       │
│   CGTAGGCCAAACTGTC-1       1353   2.3     │                                2         │                                1       │
│   CTAGTGACACGGTTTA-1       3982   0.8     │                                1         │                                3       │
│   CTCTAATAGCCGATTT-1       2193   1.6     │                                2         │                                1       │
│   CTGGTCTAGCTGCCCA-1      20213  13.9     │                             1743         │                             6319       │
│   CTTCTCTAGATGCCAG-1       6362   1.1  ◯  │                                5         │                                5       │
│   GAAGCAGTCGTTACAG-1       5558   1.0  ◯  │                                3         │                                1       │
│   GACGTTATCTACCAGA-1       3989   0.7     │                                2         │                                2       │
│   GAGTCCGTCGGTCTAA-1      11407   9.3     │                                3         │                                1       │
│   GATGAGGAGATCTGCT-1       7682   1.1  ◯  │                                4         │                                1       │
│   GCATACATCGACAGCC-1       1680   1.3     │                                3         │                                2       │
│   GGAATAAGTTTGACAC-1       8189   1.3     │                                3         │                                1       │
│   GGCTGGTCAGTGGGAT-1       9925   0.9  ◯  │                               16         │                                6       │
│   GGGAGATTCCGCATAA-1       4732   1.0  ◯  │                                5         │                                4       │
│   GTACTCCAGGTGTGGT-1       4689   0.5  ◯  │                                5         │                                3       │
│   GTTAAGCCACATTAGC-1       7785   0.8  ◯  │                                4         │                                2       │
│   TAGTGGTTCGGCGCTA-1       5090   0.8  ◯  │                               11         │                               14       │
│   TCAGGATCAAGTTCTG-1       7551   0.4     │                                2         │                                3       │
│   TCAGGATGTTGCCTCT-1       3567   0.5  ◯  │                                2         │                                2       │
│   TCCCGATTCTATCCCG-1       6092   0.9  ◯  │                                3         │                                6       │
│   TGCGCAGCAAATCCGT-1       5849   0.9  ◯  │                                8         │                                5       │
│   TTCCCAGCAAGTTAAG-1       5985   0.7  ◯  │                               11         │                               15       │
│   TTGAACGTCCATTCTA-1       4775   0.6  ◯  │                                3         │                                2       │
│   TTTGCGCCACACAGAG-1       5054   0.9  ◯  │                                4         │                                3       │
└───────────────────────────────────────────┴──────────────────────────────────────────┴────────────────────────────────────────┘
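
The barcode file passed via BC in the command above could be produced along the following lines. This is a minimal sketch, assuming that cell_barcodes.json holds a flat JSON list of barcode strings. The helper name write_marker_csv and its parameters are hypothetical and not part of enclone.

```python
import csv
import json


def write_marker_csv(json_path, csv_path, field="T", mark="◯"):
    """Write a CSV with header barcode,<field> and one row per barcode.
    Assumes json_path contains a flat JSON list of barcode strings.
    Returns the number of barcodes written."""
    with open(json_path, encoding="utf-8") as f:
        barcodes = json.load(f)
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["barcode", field])  # header line, e.g. barcode,T
        for barcode in barcodes:
            writer.writerow([barcode, mark])  # e.g. AAACGGGAGAGAACAG-1,◯
    return len(barcodes)
```

With the paths used in the text, the call would be write_marker_csv("128024/outs/cell_barcodes.json", "128024_cells.csv"). Any other short value would serve equally well as the mark.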

We thus conclude in this case that the clonotype is likely contaminated with many cells that are not B cells, and in fact that the entire clonotype probably arose from a single true B cell. In other examples we have looked at, there appear to be a few true B cells, along with many that are not, either corresponding to other cell types or nearly empty GEMs.

Overall conclusion: illusory clonotypes are rare, and can generally be detected, either with the aid of a second library made from the same lot of cells, or with gene expression data.