Skip to contents

Take in a cell embedding matrix, then find k nearest neighbors (KNN) for each cell, convert the KNN into a graph (adjacency matrix), then run a graph-based clustering algorithm. Each of these steps can be customized by passing a function that performs the step (see details).

Usage

cluster_cells_graph(
  mat,
  knn_method = knn_hnsw,
  knn_to_graph_method = knn_to_geodesic_graph,
  cluster_graph_method = cluster_graph_leiden,
  threads = 0L,
  verbose = FALSE
)

Arguments

mat

(matrix) Cell embeddings matrix of shape (cells x n_embeddings)

knn_method

(function) Function to that takes an embedding matrix as the first argument and returns a k nearest neighbors (KNN) object. For example, knn_hnsw(), knn_annoy(), or a parameterized version (see Details).

knn_to_graph_method

(function) Function that takes a KNN object and returns a graph as an undirected graph (lower-triangular dgCMatrix adjacency matrix). For example, knn_to_graph(), knn_to_snn_graph(), knn_to_geodesic_graph(), or a parameterized version (see Details).

cluster_graph_method

(function) Function that takes an undirected graph of cell similarity and returns a factor with cluster assignments for each cell. For example, cluster_graph_leiden(), cluster_graph_louvain(), cluster_graph_seurat(), or a parameterized version (see Details).

threads

(integer) Number of threads to use in knn_method, knn_to_graph_method and cluster_graph_method. If these functions do not utilize a threads argument, this is silently ignored.

verbose

(logical) Whether to print progress information in knn_method, knn_to_graph_method and cluster_graph_method. If these functions do not utilize a verbose argument, this is silently ignored.

Value

(factor) Factor vector containing the cluster assignment for each cell.

Details

Customizing clustering steps

All of the BPCells functions named like knn_*, knn_to_graph_*, and cluster_graph_* support customizing parameters via partial function application. For example, look for 20 neighbors during the k nearest neighbors search, setting knn_method=knn_hnsw(k=20) is a convenient shortcut for knn_method=function(x) knn_hnsw(x, k=20). Similarly, lowering the default clustering resolution can be done as cluster_graph_method=cluster_graph_louvain(resolution=0.5). This works because all these functions are written to return a partially parameterized copy of themselves as a function object when their first argument is missing.

For even more advanced customization, users can manually call the knn, knn_to_graph, and cluster_graph methods rather than using cluster_cells_graph() as a convenient wrapper.

Implementing custom clustering steps

The required interfaces for each step are as follows:

knn_method: First argument is a matrix of cell embeddings, shape (cells x n_embeddings). Returns a named list of two matrices of dimension (cells x k):

  • idx: Neighbor indices, where idx[c, n] is the index of the nth nearest neighbor to cell c.

  • dist: Neighbor distances, where dist[c, n] is the distance between cell c and its nth nearest neighbor. Self-neighbors are allowed, so with sufficient search effort idx[c,1] == c for nearly all cells.

knn_to_graph_method: First argument is a KNN object as returned by knn_method. Returns a weighted similarity graph as a lower triangular sparse adjacency matrix (dgCMatrix). For cells i and j, their similarity score is in adjacency_mat[max(i,j), min(i,j)].

cluster_graph_method: First argument is a weighted similarity graph as returned by knn_to_graph_method. Returns a factor vector of length cells with a cluster assignment for each cell.