Cluster cell embeddings using a KNN graph-based algorithm
Source:R/clustering.R
cluster_cells_graph.RdTake in a cell embedding matrix, then find k nearest neighbors (KNN) for each cell, convert the KNN into a graph (adjacency matrix), then run a graph-based clustering algorithm. Each of these steps can be customized by passing a function that performs the step (see details).
Usage
cluster_cells_graph(
mat,
knn_method = knn_hnsw,
knn_to_graph_method = knn_to_geodesic_graph,
cluster_graph_method = cluster_graph_leiden,
threads = 0L,
verbose = FALSE
)Arguments
- mat
(matrix) Cell embeddings matrix of shape
(cells x n_embeddings)- knn_method
(function) Function to that takes an embedding matrix as the first argument and returns a k nearest neighbors (KNN) object. For example,
knn_hnsw(),knn_annoy(), or a parameterized version (see Details).- knn_to_graph_method
(function) Function that takes a KNN object and returns a graph as an undirected graph (lower-triangular dgCMatrix adjacency matrix). For example,
knn_to_graph(),knn_to_snn_graph(),knn_to_geodesic_graph(), or a parameterized version (see Details).- cluster_graph_method
(function) Function that takes an undirected graph of cell similarity and returns a factor with cluster assignments for each cell. For example,
cluster_graph_leiden(),cluster_graph_louvain(),cluster_graph_seurat(), or a parameterized version (see Details).- threads
(integer) Number of threads to use in
knn_method,knn_to_graph_methodandcluster_graph_method. If these functions do not utilize athreadsargument, this is silently ignored.- verbose
(logical) Whether to print progress information in
knn_method,knn_to_graph_methodandcluster_graph_method. If these functions do not utilize averboseargument, this is silently ignored.
Details
Customizing clustering steps
All of the BPCells functions named like knn_*, knn_to_graph_*, and cluster_graph_* support customizing parameters
via partial function application. For example, look for 20 neighbors during the k nearest neighbors search, setting
knn_method=knn_hnsw(k=20) is a convenient shortcut for knn_method=function(x) knn_hnsw(x, k=20). Similarly, lowering the
default clustering resolution can be done as cluster_graph_method=cluster_graph_louvain(resolution=0.5). This works because
all these functions are written to return a partially parameterized copy of themselves as a function object when their
first argument is missing.
For even more advanced customization, users can manually call the knn, knn_to_graph, and cluster_graph methods rather
than using cluster_cells_graph() as a convenient wrapper.
Implementing custom clustering steps
The required interfaces for each step are as follows:
knn_method: First argument is a matrix of cell embeddings, shape (cells x n_embeddings).
Returns a named list of two matrices of dimension (cells x k):
idx: Neighbor indices, whereidx[c, n]is the index of the nth nearest neighbor to cellc.dist: Neighbor distances, wheredist[c, n]is the distance between cellcand its nth nearest neighbor. Self-neighbors are allowed, so with sufficient search effortidx[c,1] == cfor nearly all cells.
knn_to_graph_method: First argument is a KNN object as returned by knn_method.
Returns a weighted similarity graph as a lower triangular sparse adjacency matrix (dgCMatrix).
For cells i and j, their similarity score is in adjacency_mat[max(i,j), min(i,j)].
cluster_graph_method: First argument is a weighted similarity graph as returned by knn_to_graph_method.
Returns a factor vector of length cells with a cluster assignment for each cell.
