Distribution-based Sketching of Single-Cell Samples. (arXiv:2207.00584v1 [q-bio.QM])

Modern high-throughput single-cell immune profiling technologies, such as
flow and mass cytometry and single-cell RNA sequencing, can readily measure the
expression of a large number of protein or gene features across the millions of
cells in a multi-patient cohort. While bioinformatics approaches can be used to
link immune cell heterogeneity to external variables of interest, such as
clinical outcome or experimental label, they often struggle to accommodate such
a large number of profiled cells. To ease this computational burden, a limited
number of cells are typically sketched, or subsampled, from each patient.
However, existing sketching approaches either fail to adequately sample cells
from rare cell populations or fail to preserve the true frequencies of
particular immune cell types. Here, we propose a novel sketching approach based
on Kernel Herding that selects a limited subsample of all cells while
preserving the underlying frequencies of immune cell types. We tested our
approach on three flow and mass cytometry datasets and on one single-cell RNA
sequencing dataset and demonstrate that the sketched cells (1) more accurately
represent the overall cellular landscape and (2) facilitate increased
performance in downstream analysis tasks, such as classifying patients
according to their clinical outcome. An implementation of sketching with Kernel
Herding is publicly available at
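
To make the idea of sketching with Kernel Herding concrete, here is a minimal
sketch of the generic greedy kernel herding rule applied to one sample. It is
not the authors' implementation: the Gaussian kernel, the exact (n x n) kernel
matrix, selection without replacement, and the names gaussian_kernel,
kernel_herding_sketch, and gamma are all assumptions made for illustration.
At each step it picks the cell that is most similar to the whole sample on
average while being least similar to the cells already selected.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=1.0):
    """Pairwise Gaussian (RBF) kernel between rows of a and rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_herding_sketch(cells, n_select, gamma=1.0):
    """Greedily choose n_select distinct row indices of `cells`.

    Assumption: the full kernel matrix fits in memory, so this is only
    practical for a few thousand cells per sample.
    """
    k = gaussian_kernel(cells, cells, gamma)      # (n, n) kernel matrix
    mean_embedding = k.mean(axis=1)               # average similarity to all cells
    selected_sum = np.zeros(len(cells))           # sum of k(x, already chosen cells)
    chosen = []
    for t in range(n_select):
        # Herding objective: match the sample mean, avoid redundancy.
        score = mean_embedding - selected_sum / (t + 1)
        if chosen:
            score[chosen] = -np.inf               # sample without replacement (assumption)
        idx = int(np.argmax(score))
        chosen.append(idx)
        selected_sum += k[:, idx]
    return np.array(chosen)

# Hypothetical toy sample: 950 cells from a common population and
# 50 cells from a rare one, each described by 5 features.
rng = np.random.default_rng(0)
common = rng.normal(loc=0.0, scale=1.0, size=(950, 5))
rare = rng.normal(loc=4.0, scale=1.0, size=(50, 5))
cells = np.vstack([common, rare])

sketch_idx = kernel_herding_sketch(cells, n_select=20, gamma=0.1)
sketch = cells[sketch_idx]                        # the subsampled cells
```

The kernel bandwidth gamma controls how finely the sketch resolves the
sample's distribution and would need tuning on real data.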

Source: https://arxiv.org/abs/2207.00584


