Cluster cell embeddings using a KNN graph-based algorithm
Source:R/clustering.R
cluster_cells_graph.Rd
Take in a cell embedding matrix, then find k nearest neighbors (KNN) for each cell, convert the KNN into a graph (adjacency matrix), then run a graph-based clustering algorithm. Each of these steps can be customized by passing a function that performs the step (see details).
Usage
cluster_cells_graph(
mat,
knn_method = knn_hnsw,
knn_to_graph_method = knn_to_geodesic_graph,
cluster_graph_method = cluster_graph_leiden,
threads = 0L,
verbose = FALSE
)
Arguments
- mat
(matrix) Cell embeddings matrix of shape
(cells x n_embeddings)
- knn_method
(function) Function to that takes an embedding matrix as the first argument and returns a k nearest neighbors (KNN) object. For example,
knn_hnsw()
,knn_annoy()
, or a parameterized version (see Details).- knn_to_graph_method
(function) Function that takes a KNN object and returns a graph as an undirected graph (lower-triangular dgCMatrix adjacency matrix). For example,
knn_to_graph()
,knn_to_snn_graph()
,knn_to_geodesic_graph()
, or a parameterized version (see Details).- cluster_graph_method
(function) Function that takes an undirected graph of cell similarity and returns a factor with cluster assignments for each cell. For example,
cluster_graph_leiden()
,cluster_graph_louvain()
,cluster_graph_seurat()
, or a parameterized version (see Details).- threads
(integer) Number of threads to use in
knn_method
,knn_to_graph_method
andcluster_graph_method
. If these functions do not utilize athreads
argument, this is silently ignored.- verbose
(logical) Whether to print progress information in
knn_method
,knn_to_graph_method
andcluster_graph_method
. If these functions do not utilize averbose
argument, this is silently ignored.
Details
Customizing clustering steps
All of the BPCells functions named like knn_*
, knn_to_graph_*
, and cluster_graph_*
support customizing parameters
via partial function application. For example, look for 20 neighbors during the k nearest neighbors search, setting
knn_method=knn_hnsw(k=20)
is a convenient shortcut for knn_method=function(x) knn_hnsw(x, k=20)
. Similarly, lowering the
default clustering resolution can be done as cluster_graph_method=cluster_graph_louvain(resolution=0.5)
. This works because
all these functions are written to return a partially parameterized copy of themselves as a function object when their
first argument is missing.
For even more advanced customization, users can manually call the knn
, knn_to_graph
, and cluster_graph
methods rather
than using cluster_cells_graph()
as a convenient wrapper.
Implementing custom clustering steps
The required interfaces for each step are as follows:
knn_method: First argument is a matrix of cell embeddings, shape (cells x n_embeddings)
.
Returns a named list of two matrices of dimension (cells x k):
idx
: Neighbor indices, whereidx[c, n]
is the index of the nth nearest neighbor to cellc
.dist
: Neighbor distances, wheredist[c, n]
is the distance between cellc
and its nth nearest neighbor. Self-neighbors are allowed, so with sufficient search effortidx[c,1] == c
for nearly all cells.
knn_to_graph_method: First argument is a KNN object as returned by knn_method
.
Returns a weighted similarity graph as a lower triangular sparse adjacency matrix (dgCMatrix
).
For cells i
and j
, their similarity score is in adjacency_mat[max(i,j), min(i,j)]
.
cluster_graph_method: First argument is a weighted similarity graph as returned by knn_to_graph_method
.
Returns a factor vector of length cells
with a cluster assignment for each cell.