Module debruijn::filter

Methods for converting sequences into kmers, filtering observed kmers before De Bruijn graph construction, and summarizing 'color' annotations.

Structs

CountFilter

A simple KmerSummarizer that only accepts kmers that are observed at least a given number of times. The metadata returned about a Kmer is the number of times it was observed, capped at 2^16.
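
The count-threshold idea can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `count_filter` is a hypothetical helper, kmers are modeled as plain strings, and the counter is assumed to be a `u16` that saturates at `u16::MAX` rather than overflowing.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of the debruijn crate): count how often each
/// kmer string occurs and keep only those seen at least `min_obs` times.
/// Assumption: counts are stored as u16 and saturate at u16::MAX.
fn count_filter(kmers: &[&str], min_obs: u16) -> HashMap<String, u16> {
    let mut counts: HashMap<String, u16> = HashMap::new();
    for kmer in kmers {
        let c = counts.entry(kmer.to_string()).or_insert(0);
        // saturating_add stops at the maximum instead of wrapping around.
        *c = c.saturating_add(1);
    }
    // Drop every kmer that falls below the observation threshold.
    counts.retain(|_, c| *c >= min_obs);
    counts
}
```

With the input `["ACG", "ACG", "CGT"]` and `min_obs = 2`, only `ACG` survives, carrying a count of 2.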

CountFilterEqClass
CountFilterSet

A simple KmerSummarizer that only accepts kmers that are observed at least a given number of times. The metadata returned about a Kmer is a vector of the unique data values observed for that kmer.
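
A rough sketch of the "unique data values" summary follows. The function name `summarize_labels` and the use of `u8` labels are assumptions made for illustration; the crate's own types and signature may differ.

```rust
/// Hypothetical stand-in for a set-style summarizer: given every label
/// observed for one kmer, report whether the kmer passes the count threshold
/// and return the sorted, de-duplicated labels.
fn summarize_labels(labels: &[u8], min_obs: usize) -> (bool, Vec<u8>) {
    let mut unique = labels.to_vec();
    // dedup only removes adjacent duplicates, so sort first.
    unique.sort_unstable();
    unique.dedup();
    (labels.len() >= min_obs, unique)
}
```

Note that the threshold is applied to the number of observations, not to the number of distinct labels.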

Traits

KmerSummarizer

Implement this trait to control how multiple observations of a kmer are carried forward into a De Bruijn graph.
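
To show the shape of such a trait, here is a deliberately simplified sketch. `Summarizer` and `MinCount` are hypothetical names, and the signature is an assumption: it omits the extension data that the real `KmerSummarizer` also handles.

```rust
/// Simplified, hypothetical summarizer trait (not the crate's signature).
/// `D` is the per-observation label type, `S` the per-kmer summary type.
trait Summarizer<D, S> {
    /// Given all label observations of one kmer, return
    /// (is the kmer valid?, summary to attach to the kmer).
    fn summarize(&self, observations: &[D]) -> (bool, S);
}

/// Example implementation: accept kmers seen at least `min_obs` times
/// and summarize them by their observation count.
struct MinCount {
    min_obs: usize,
}

impl<D> Summarizer<D, usize> for MinCount {
    fn summarize(&self, observations: &[D]) -> (bool, usize) {
        let n = observations.len();
        (n >= self.min_obs, n)
    }
}
```

Keeping the validity predicate and the summary in one method means a single pass over the observations can decide both.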

Functions

filter_kmers

Process DNA sequences into kmers, determine the set of valid kmers and their extensions, and summarize the associated label/'color' data. The input sequences are converted to kmers of type K, and like kmers are grouped together. All instances of each kmer, along with their label data, are passed to summarizer, an implementation of KmerSummarizer, which decides whether the kmer is 'valid' by an arbitrary predicate of the kmer data and summarizes the individual labels into a single label data structure for the kmer. Care is taken to keep memory consumption small: less than 4G of temporary memory should be allocated to hold intermediate kmers.
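
The group-then-summarize flow described above can be approximated in a few lines. This sketch is not the crate's `filter_kmers`: the name `filter_kmers_sketch`, the string kmers, and the `u8` labels are assumptions, and it ignores extensions and the bounded-memory processing that the real function provides. Sequences are assumed to be ASCII.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified pipeline: split labeled sequences into kmers,
/// group identical kmers, keep those observed at least `min_obs` times, and
/// summarize each kept kmer by its sorted unique labels.
fn filter_kmers_sketch(
    seqs: &[(&str, u8)],
    k: usize,
    min_obs: usize,
) -> Vec<(String, Vec<u8>)> {
    // Group label observations by kmer; BTreeMap keeps kmers in sorted order.
    let mut groups: BTreeMap<String, Vec<u8>> = BTreeMap::new();
    for &(seq, label) in seqs {
        if k == 0 || seq.len() < k {
            continue; // too short to yield any kmer
        }
        // Slide a window of length k across the sequence (byte offsets,
        // which is safe for ASCII input).
        for i in 0..=seq.len() - k {
            groups.entry(seq[i..i + k].to_string()).or_default().push(label);
        }
    }
    // Apply the validity predicate, then collapse labels into a summary.
    groups
        .into_iter()
        .filter(|(_, labels)| labels.len() >= min_obs)
        .map(|(kmer, mut labels)| {
            labels.sort_unstable();
            labels.dedup();
            (kmer, labels)
        })
        .collect()
}
```

For the sequences `("ACGT", 0)` and `("ACGA", 1)` with `k = 3` and `min_obs = 2`, only `ACG` is observed twice, so the result is a single entry for `ACG` with labels `[0, 1]`.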

remove_censored_exts

Remove extensions in valid_kmers that point to censored kmers. Use this method in a non-partitioned context when valid_kmers includes all kmers that will ultimately be included in the graph.

remove_censored_exts_sharded

Remove extensions in valid_kmers that point to censored kmers. A censored kmer exists in all_kmers but not in valid_kmers. Since the kmer exists in this partition but was censored, we know that we can delete extensions to it. In sharded kmer processing, we will have extensions to kmers in other shards. We don't know whether these are censored until later, so we retain these extensions.
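
The sharded rule can be illustrated with a toy model. Everything here is an assumption for the sake of the sketch: `remove_censored_sketch` is a hypothetical function, and extensions are modeled as a list of neighbor kmer strings rather than the crate's own extension type.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical model of sharded censoring. `all_kmers` holds every kmer seen
/// in this shard; `valid` maps each accepted kmer to the kmers its extensions
/// point to. A target is dropped only when this shard saw it and rejected it.
fn remove_censored_sketch<'a>(
    all_kmers: &HashSet<&'a str>,
    valid: &mut HashMap<&'a str, Vec<&'a str>>,
) {
    // Snapshot the accepted kmers so the map can be mutated afterwards.
    let valid_keys: HashSet<&str> = valid.keys().copied().collect();
    for targets in valid.values_mut() {
        targets.retain(|t| {
            let seen_here = all_kmers.contains(*t);
            let accepted = valid_keys.contains(*t);
            // Censored = seen in this shard but not accepted. Targets never
            // seen here may live in another shard, so they are kept.
            !(seen_here && !accepted)
        });
    }
}
```

In the non-partitioned case every kmer is in the same shard, so any target missing from the valid set can be treated as censored.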

Type Definitions

EqClassIdType