enclone default filters

When enclone is run, a series of filters is applied, resulting in the deletion of barcodes or, in some cases, in changes to which cells are combined together. Here we describe the order of the filters and give technical details about some of them. Please see also the page enclone help special, which describes some details about the filters and how they can be turned off.

Note that if you specify the SUMMARY option when you run enclone, a table will be printed showing the filters that removed cells and how many cells each one removed. You can use this as a guide to which filters are most important for your dataset. If you only want to see the summary, you can use the two options SUMMARY and NOPRINT together.


Filter order. The following table enumerates the filters, in order of application, along with a brief description of what they do. Please understand that in general, artifactual barcodes cannot be surgically removed by sharply defined tests. Rather, the filters are heuristic, and have the cumulative effect of removing nearly all artifacts (in most cases), while removing few valid barcodes. In the enclone codebase, there are regression tests, some of which provide representative examples for filtering. When we modify the filters, we examine the effect on these examples, and add others as needed as protection against accidental deterioration of performance.

number  filter name                 brief description
1       cell filter                 remove barcodes not called cells in Cell Ranger VDJ pipeline
2       maximum contigs filter      remove barcodes having more than four productive contigs
3       graph filter                remove some exact subclonotypes that appear to be background
4       cross filter                use cross-library information to remove spurious exact subclonotypes
5       barcode duplication filter  remove duplicated barcodes within an exact subclonotype
6       whitelist filter            remove rare artifacts arising from gel bead contamination
7       foursie filter              remove some four-chain clonotypes that might represent doublets
8       improper filter             remove exact subclonotypes having multiple chains, all of the same type
9       weak onesie filter          disintegrate some single-chain clonotypes into single cells
10      UMI filter                  remove some B cells having very low UMI counts
11      UMI ratio filter            remove some B cells having very low UMI counts, relative to clonotype
12      GEX filter                  remove cells called by VDJ pipeline but not by GEX pipeline
13      doublet filter              remove some barcodes that appear to represent doublets
14      signature filter            remove some barcodes that appear to be contaminants, based on their chain signature
15      onesie merger               prevent merger of some single-chain clonotypes into other clonotypes
16      weak chain filter           remove cells having a chain that is probably spurious
17      quality merger              filter out exact subclonotypes having a position with low quality scores

The remainder of this page and enclone help special provide more details about the filters.


Maximum contigs filtering. Remove barcodes that were assigned more than four productive contigs. Specifying NMAX turns off this filter. This only has an effect if cell filtering is also turned off. Also, the deletion of cells by this filter is not tracked by the SUMMARY option or by the lvar filter.


Cross filtering. If multiple draws are made from the same tube of cells, and one library made from each, yielding multiple "datasets" having the same "origin", then the clonotypes observed in different libraries should be statistically consistent. Otherwise, they likely represent an artifact, for example, possibly resulting from fragmentation of a plasma cell. We apply the following test as a proxy for statistical consistency (unless NCROSS is specified):

If a V..J segment appears in exactly one dataset, with frequency n, let x be the total number of productive pairs for that dataset, and let y be the total number of productive pairs for all datasets from the same origin. Then (x/y)^n is the probability that, assuming even distribution, all n instances of that V..J would have ended up in that one dataset. If (x/y)^n ≤ 10^-6, delete all the productive pairs for that V..J segment that do not have at least 100 supporting UMIs.

This test could clearly be strengthened.
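
As a rough illustration of the test above, here is a minimal Python sketch. The function names and the list-based input are hypothetical and are not taken from the enclone codebase; only the quantities n, x, y, the 10^-6 threshold and the 100-UMI cutoff come from the description.

```python
def cross_filter_triggers(n: int, x: int, y: int, threshold: float = 1e-6) -> bool:
    """Return True if (x/y)^n <= threshold.

    n: number of times the V..J segment appears (all in one dataset)
    x: productive pairs in that dataset
    y: productive pairs in all datasets from the same origin
    """
    if n < 1 or not 0 < x <= y:
        raise ValueError("expected n >= 1 and 0 < x <= y")
    return (x / y) ** n <= threshold


def pairs_to_delete(umi_support: list[int], min_umis: int = 100) -> list[int]:
    """Indices of productive pairs lacking at least min_umis supporting UMIs.

    Intended to be applied only when cross_filter_triggers(...) is True.
    """
    return [i for i, umis in enumerate(umi_support) if umis < min_umis]
```

For instance, with x = 1000 and y = 4000, the test triggers at n = 10, since 0.25^10 is about 9.5e-7, but not at n = 9.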


Foursie filtering. Foursie exact subclonotypes are highly enriched for cell doublets. Deleting them all might be justified, but because it is hypothetically possible that they sometimes represent the actual biology of single cells, we do not do this. However, we never merge them with other exact subclonotypes, and sometimes we delete them, if we have other evidence that they are doublets. Specifically, for each foursie exact subclonotype, enclone looks at each pair of two chains within it (with one heavy and one light, or TRB/TRA), and if the V..J sequences for those appear in a twosie exact subclonotype having at least ten cells, then the foursie exact subclonotype is deleted, no matter how many cells it has. For example, this shows two foursie clonotypes that are present if the filtering is off:

enclone BCR=123085 CDR3=CARRYFGVVADAFDIW NFOURSIE_KILL
[1] GROUP = 1 CLONOTYPES = 34 CELLS

[1.1] CLONOTYPE = 34 CELLS
┌───────────┬─────────────────────────────────────┬───────────────────────────┐
│           │  CHAIN 1                            │  CHAIN 2                  │
│           │  740.1.1|IGHV4-30-4 ◆ 53|IGHJ3      │  253|IGKV1D-39 ◆ 217|IGKJ5│
│           ├─────────────────────────────────────┼───────────────────────────┤
│           │      1 1111111111111111             │  111111111111             │
│           │  25690 1111122222222223             │  011111111112             │
│           │  01048 5678901234567890             │  901234567890             │
│           │        ══════CDR3══════             │  ════CDR3════             │
│reference  │  LDPSA ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W             │  CQQ◦◦◦◦◦◦◦◦◦             │
│donor ref  │  VGHSA ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W             │  CQQ◦◦◦◦◦◦◦◦◦             │
├───────────┼─────────────────────────────────────┼───────────────────────────┤
│#   n      │  ..... ................   u  const  │  ............    u  const │
│1  34      │  VGHSA CARRYFGVVADAFDIW  57  IGHM   │  CQQSYSTPPITF  207  IGKC  │
└───────────┴─────────────────────────────────────┴───────────────────────────┘

╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸

[2] GROUP = 1 CLONOTYPES = 22 CELLS

[2.1] CLONOTYPE = 22 CELLS
┌───────────┬─────────────────────────────────────┬─────────────────────────────────────────┬─────────────────────────────┬───────────────────────────┐
│           │  CHAIN 1                            │  CHAIN 2                                │  CHAIN 3                    │  CHAIN 4                  │
│           │  740.1.1|IGHV4-30-4 ◆ 53|IGHJ3      │  144.1.2|IGHV3-49 ◆ 737|IGHJ6           │  273|IGKV2D-28 ◆ 213|IGKJ1  │  253|IGKV1D-39 ◆ 217|IGKJ5│
│           ├─────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┼───────────────────────────┤
│           │      1 1111111111111111             │     1111111111111111111111              │  11111111111                │  111111111111             │
│           │  25690 1111122222222223             │  35 1111222222222233333333              │  11111111222                │  011111111112             │
│           │  01048 5678901234567890             │  15 6789012345678901234567              │  23456789012                │  901234567890             │
│           │        ══════CDR3══════             │     ═════════CDR3═════════              │  ════CDR3═══                │  ════CDR3════             │
│reference  │  LDPSA ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W             │  QV ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦GMDVW              │  CMQ◦◦◦◦◦◦◦◦                │  CQQ◦◦◦◦◦◦◦◦◦             │
│donor ref  │  VGHSA ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W             │  QF ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦GMDVW              │  CMQ◦◦◦◦◦◦◦◦                │  CQQ◦◦◦◦◦◦◦◦◦             │
├───────────┼─────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┼───────────────────────────┤
│#   n      │  ..... ................   u  const  │  .. ......................    u  const  │  ...........    u  const    │  ............    u  const │
│1  22      │  VGHSA CARRYFGVVADAFDIW  65  IGHM   │  KF CTRAGFLSYQLLSYYYYGMDVW  308  IGHG1  │  CMQALQTPWTF  647  IGKC     │  CQQSYSTPPITF  264  IGKC  │
└───────────┴─────────────────────────────────────┴─────────────────────────────────────────┴─────────────────────────────┴───────────────────────────┘

╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸

[3] GROUP = 1 CLONOTYPES = 1 CELLS

[3.1] CLONOTYPE = 1 CELLS
┌───────────┬─────────────────────────────────────┬─────────────────────────────────────────┬─────────────────────────────┬───────────────────────────┐
│           │  CHAIN 1                            │  CHAIN 2                                │  CHAIN 3                    │  CHAIN 4                  │
│           │  740.1.1|IGHV4-30-4 ◆ 53|IGHJ3      │  144.1.2|IGHV3-49 ◆ 737|IGHJ6           │  273|IGKV2D-28 ◆ 213|IGKJ1  │  253|IGKV1D-39 ◆ 217|IGKJ5│
│           ├─────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┼───────────────────────────┤
│           │      1 1111111111111111             │     1111111111111111111111              │  11111111111                │  111111111111             │
│           │  25690 1111122222222223             │  35 1111222222222233333333              │  11111111222                │  011111111112             │
│           │  01048 5678901234567890             │  15 6789012345678901234567              │  23456789012                │  901234567890             │
│           │        ══════CDR3══════             │     ═════════CDR3═════════              │  ════CDR3═══                │  ════CDR3════             │
│reference  │  LDPSA ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W             │  QV ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦GMDVW              │  CMQ◦◦◦◦◦◦◦◦                │  CQQ◦◦◦◦◦◦◦◦◦             │
│donor ref  │  VGHSA ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W             │  QF ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦GMDVW              │  CMQ◦◦◦◦◦◦◦◦                │  CQQ◦◦◦◦◦◦◦◦◦             │
├───────────┼─────────────────────────────────────┼─────────────────────────────────────────┼─────────────────────────────┼───────────────────────────┤
│#  n       │  ..... ................   u  const  │  .. ......................    u  const  │  ...........    u  const    │  ............    u  const │
│1  1       │  VGHSA CARRYFGVVADAFDIW  69  ?      │  KF CTRAGFLSYQLLSYYYYGMDVW  304  IGHG1  │  CMQALQTPWTF  562  IGKC     │  CQQSYSTPPITF  266  IGKC  │
└───────────┴─────────────────────────────────────┴─────────────────────────────────────────┴─────────────────────────────┴───────────────────────────┘

and which are deleted if the foursie filtering is on:

enclone BCR=123085 CDR3=CARRYFGVVADAFDIW
[1] GROUP = 1 CLONOTYPES = 34 CELLS

[1.1] CLONOTYPE = 34 CELLS
┌───────────┬─────────────────────────────────────┬───────────────────────────┐
│           │  CHAIN 1                            │  CHAIN 2                  │
│           │  740.1.1|IGHV4-30-4 ◆ 53|IGHJ3      │  253|IGKV1D-39 ◆ 217|IGKJ5│
│           ├─────────────────────────────────────┼───────────────────────────┤
│           │      1 1111111111111111             │  111111111111             │
│           │  25690 1111122222222223             │  011111111112             │
│           │  01048 5678901234567890             │  901234567890             │
│           │        ══════CDR3══════             │  ════CDR3════             │
│reference  │  LDPSA ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W             │  CQQ◦◦◦◦◦◦◦◦◦             │
│donor ref  │  VGHSA ◦◦◦◦◦◦◦◦◦◦◦◦◦◦◦W             │  CQQ◦◦◦◦◦◦◦◦◦             │
├───────────┼─────────────────────────────────────┼───────────────────────────┤
│#   n      │  ..... ................   u  const  │  ............    u  const │
│1  34      │  VGHSA CARRYFGVVADAFDIW  57  IGHM   │  CQQSYSTPPITF  207  IGKC  │
└───────────┴─────────────────────────────────────┴───────────────────────────┘
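
The foursie deletion rule described in this section can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each chain is represented by a hypothetical (chain type, V..J sequence) tuple, and twosie exact subclonotypes are summarized by a dictionary from (heavy-like V..J, light-like V..J) to cell count. None of these names come from the enclone codebase.

```python
from itertools import product

# Chain types treated as "heavy-like" and "light-like" for pairing purposes.
HEAVY_LIKE = {"IGH", "TRB"}
LIGHT_LIKE = {"IGK", "IGL", "TRA"}


def foursie_should_be_deleted(foursie_chains, twosie_cells, min_cells=10):
    """Decide whether a foursie exact subclonotype should be deleted.

    foursie_chains: list of (chain_type, vj_sequence) for the four chains
    twosie_cells:   dict mapping (heavy_vj, light_vj) -> number of cells in
                    the twosie exact subclonotype having those two chains
    """
    heavies = [seq for kind, seq in foursie_chains if kind in HEAVY_LIKE]
    lights = [seq for kind, seq in foursie_chains if kind in LIGHT_LIKE]
    # Delete if any heavy/light pair matches a sufficiently large twosie,
    # regardless of how many cells the foursie itself has.
    return any(
        twosie_cells.get((h, l), 0) >= min_cells
        for h, l in product(heavies, lights)
    )
```

In the example above, the foursie clonotypes share one heavy chain and one light chain with the 34-cell twosie clonotype, which is why they are removed when the filter is on.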

UMI filtering. enclone filters out B cells having low UMI counts, relative to a baseline that is determined for each dataset, according to the heuristic described here. Supplying the argument NUMI turns off this filter.

The motivation for this filter is to mitigate illusory clonotype expansions arising from fragmentation of plasma cells or other physical processes (not all fully understood). These processes all result in "cells" having low UMI counts, many of which do not correspond to intact real cells. Illusory clonotype expansions are generally infrequent, but occasionally cluster in individual datasets.

Nomenclature: for any cell, find the maximum UMI count for its heavy chains, if any, and the maximum for its light chains, if any. The sum of these two maxima is denoted umitot.

The algorithm for this filter first establishes a baseline for the expected value of umitot, for each dataset taken individually. To do this, all clonotypes having exactly one cell and exactly one heavy chain and one light chain are examined. If there are fewer than 20 such cells, the filter is not applied to cells in that dataset. Otherwise, let n_50% denote the median of the umitot values for the dataset, and let n_10% denote the 10th percentile. Let

umin = min( n_10%, n_50% - 4 * sqrt(n_50%) ).

This is the baseline low value for umitot. The reason for having the second part of the min is to prevent filtering in cases where UMI counts are sufficiently low that Poisson variability could cause a real cell to appear fake.
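
The baseline computation can be sketched as below. The function name and input are hypothetical. The text does not say how the 10th percentile is interpolated, so the nearest-rank convention used here is an assumption and may differ from what enclone actually does.

```python
import math
import statistics


def umi_baseline(umitots, min_cells=20):
    """Return umin for one dataset, or None if the filter should not apply.

    umitots: umitot values for the cells of single-cell clonotypes having
             exactly one heavy and one light chain, for a single dataset.
    """
    if len(umitots) < min_cells:
        return None  # too few cells: the filter is skipped for this dataset
    n50 = statistics.median(umitots)
    ordered = sorted(umitots)
    # Nearest-rank 10th percentile (an assumed convention): rank = ceil(n / 10).
    rank = max(1, (len(ordered) + 9) // 10)
    n10 = ordered[rank - 1]
    return min(n10, n50 - 4 * math.sqrt(n50))
```

With a median of 400, for example, the second term is 400 - 4 * 20 = 320, so umin can never exceed 320 for that dataset, however high the 10th percentile is.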

Next we scan each clonotype having at least two cells, and delete every cell having umitot < umin, with the following qualifications:

A better test could probably be devised that started from the expected distribution of UMI counts. The test would trigger based on the number and improbability of low UMI counts. The current test only considers the number of counts that fall below a threshold, and not their particular values.


UMI ratio filtering. enclone filters out B cells having low UMI counts, relative to other UMI counts in a given clonotype, according to a heuristic described here, unless the argument NUMI_RATIO is supplied, to turn off that filter.

First we mark a cell for possible deletion, if the VDJ UMI count for some chain of some other cell is at least 500 times greater than the total VDJ UMI count for the given cell.

Then we scan each clonotype having at least two cells, and delete every cell marked as above, with the following qualification. Let k be the number of cells to be deleted in a clonotype having n cells. Then we require that for a binomial distribution having p = 0.1, the probability of observing k or more events in a sample of size n is less than 0.01.
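
The two steps above can be illustrated with a short Python sketch. The function names and the representation of a clonotype as a list of per-cell UMI count lists are hypothetical; the constants 500, p = 0.1 and 0.01 are those given in the text.

```python
from math import comb


def mark_cells(cells, ratio=500):
    """Mark cells for possible deletion within one clonotype.

    cells: list of cells, each a list of per-chain VDJ UMI counts.
    A cell is marked if some chain of some other cell has a UMI count
    at least `ratio` times the cell's total UMI count.
    """
    marked = []
    for i, chains in enumerate(cells):
        total = sum(chains)
        best_other = max(
            (max(c) for j, c in enumerate(cells) if j != i and c), default=0
        )
        marked.append(best_other >= ratio * total)
    return marked


def binomial_tail(n, k, p=0.1):
    """Probability of k or more events in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def deletion_allowed(n, k, p=0.1, alpha=0.01):
    """True if deleting k marked cells from an n-cell clonotype is permitted."""
    return binomial_tail(n, k, p) < alpha
```

For a clonotype of 10 cells, deleting 5 marked cells passes the test (tail probability about 0.0016), whereas deleting 4 does not (about 0.0128).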


Doublet filtering. This filtering removes some exact subclonotypes that appear to represent doublets (or possibly higher-order multiplets). The first Cell Ranger version in which this appeared was 6.0.

The algorithm works by first computing pure subclonotypes. This is done by taking each clonotype and breaking it apart according to its chain signature. All the exact subclonotypes that have entries for particular chains (and not entries for the other chains) are merged together to form a pure subclonotype.

In the simplest case, where the clonotype has two chains, the clonotype could give rise to three pure subclonotypes: one for the exact subclonotypes that have both chains, and one each for the subclonotypes that have only one chain.
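
A minimal sketch of forming pure subclonotypes, assuming each exact subclonotype is summarized by the set of clonotype chain indices it has entries for, together with its cell count. This representation and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def pure_subclonotypes(exact_subclonotypes):
    """Group the exact subclonotypes of one clonotype by chain signature.

    exact_subclonotypes: iterable of (chain_indices, ncells), where
    chain_indices is the set of chains for which the exact subclonotype
    has an entry.
    Returns a dict mapping each chain signature (a frozenset) to the total
    number of cells in the corresponding pure subclonotype.
    """
    pure = defaultdict(int)
    for chains, ncells in exact_subclonotypes:
        pure[frozenset(chains)] += ncells
    return dict(pure)


# A two-chain clonotype giving rise to three pure subclonotypes.
example = pure_subclonotypes([({0, 1}, 30), ({0, 1}, 4), ({0}, 2), ({1}, 1)])
```

Here `example` has three entries: one for the exact subclonotypes having both chains (34 cells), and one each for those having only chain 0 (2 cells) or only chain 1 (1 cell).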

The algorithm then finds triples (p0, p1, p2) of pure subclonotypes, for which the following three conditions are all satisfied:

Finally, if 5 * ncells(p0) <= min( ncells(p1), ncells(p2) ), the entire pure subclonotype p0 is deleted. After all these operations are completed, some of the original clonotypes may break up into separate clonotypes, as they may no longer be held together by shared chains.

If the argument NDOUBLET is supplied to enclone, then doublet filtering is not applied.


Signature filtering. This filter removes some exact subclonotypes that appear to represent contaminants, based on their chain signature. This filter sometimes breaks up complex clonotypes having many chains and representing multiple true clonotypes that are glued together into a single clonotype via exact subclonotypes whose constituent barcodes do not arise fully from single cells.

This filter can dramatically affect certain datasets, but has almost no effect on typical data (less than one per million clonotypes tested).

The algorithm uses some terminology described at doublet filtering, above. Given a pure subclonotype p having at least two chains, if the total cells in the two-chain pure subclonotypes that are different from it but share a chain with it is at least 20 times greater than the number of cells in p, then p is deleted.
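
The deletion condition can be sketched as follows, assuming the pure subclonotypes of a clonotype are given as a dictionary from chain signature (a frozenset of chain indices) to cell count. The function name and data layout are hypothetical, and "at least 20 times greater" is read here as "at least 20 times the number of cells in p".

```python
def signature_filter_deletes(p, pure, factor=20):
    """Decide whether pure subclonotype p is deleted by signature filtering.

    p:    chain signature of the pure subclonotype under test (frozenset)
    pure: dict mapping chain signature -> number of cells, for one clonotype
    """
    if len(p) < 2:
        return False  # the test applies only to p having at least two chains
    # Total cells in two-chain pure subclonotypes that differ from p
    # but share at least one chain with it.
    neighbor_cells = sum(
        ncells
        for sig, ncells in pure.items()
        if len(sig) == 2 and sig != p and sig & p
    )
    return neighbor_cells >= factor * pure[p]
```

For example, a 5-cell pure subclonotype with chains {1, 2} that bridges two 100-cell pure subclonotypes with chains {0, 1} and {2, 3} would be deleted, since 200 >= 20 * 5.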

If the argument NSIG is supplied to enclone, then signature filtering is not applied.

This filter first appeared after Cell Ranger version 6.1.


Weak chain filtering. If a clonotype has three or more chains, and amongst those there is a chain that appears in a relatively small number of cells, we delete all the cells that support that chain. This filter is turned off if NWEAK_CHAINS is specified. The precise condition is that the number of cells supporting the chain is at most 20, and 8 times that number of cells is less than the total number of cells in the clonotype.
For the current Cell Ranger, replace 20 by 5. This will change at some point after Cell Ranger 6.0.
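
The precise condition can be written as a small predicate. The function and parameter names are hypothetical; the defaults 20 and 8 are those stated above, and passing max_support=5 corresponds to the Cell Ranger variant mentioned in the text.

```python
def is_weak_chain(n_support, n_total, n_chains, max_support=20, factor=8):
    """True if the cells supporting a chain should be deleted.

    n_support: number of cells in the clonotype that support the chain
    n_total:   total number of cells in the clonotype
    n_chains:  number of chains in the clonotype
    """
    return (
        n_chains >= 3
        and n_support <= max_support
        and factor * n_support < n_total
    )
```

For example, a chain supported by 2 cells in a 100-cell, three-chain clonotype is weak (16 < 100), whereas a chain supported by 20 cells in a 160-cell clonotype is not (160 is not less than 160).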