Given a (features x cells) matrix, group cells by cell_groups and aggregate counts by method for each feature.

Usage

pseudobulk_matrix(mat, cell_groups, method = "sum", threads = 0L)

Arguments

mat

IterableMatrix object of dimensions features x cells

cell_groups

(Character/factor) Vector of group/cluster assignments for each cell. Length must be ncol(mat).

method

(Character vector) Method(s) to aggregate counts. If one method is provided, the output will be a matrix. If multiple methods are provided, the output will be a named list of matrices.

Current options are: nonzeros, sum, mean, variance.

threads

(integer) Number of threads to use.

Value

  • If method has length 1, returns a matrix of shape (features x groups).

  • If method has length greater than 1, returns a named list of pseudobulk matrices, one per aggregation method. Each matrix is of shape (features x groups), and the list names are the requested methods (nonzeros, sum, mean, variance).

Details

Simpler statistics are computed as intermediate results of more complex ones. When calculating variance, nonzeros and mean can therefore be requested with no extra calculation time, and when calculating mean, adding nonzeros takes no extra time.
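To illustrate why the simpler statistics cost nothing extra, here is a minimal base R sketch that derives all four statistics from the same per-group accumulations. It is not the BPCells implementation: group_stats is a hypothetical helper that works on a dense matrix, and the (n - 1) denominator for variance is an assumption.

```r
# Hypothetical helper, not part of BPCells: per-group statistics on a dense
# base R matrix of shape (features x cells).
group_stats <- function(mat, cell_groups) {
  stopifnot(length(cell_groups) == ncol(mat))
  g <- factor(cell_groups)
  n <- as.vector(table(g))  # cells per group, in factor-level order

  # rowsum() sums rows by group, so transpose to put cells in rows,
  # then transpose back to (features x groups).
  by_group <- function(x) t(rowsum(t(x), g))

  nonzeros <- by_group((mat != 0) * 1)  # count of nonzero entries
  sums     <- by_group(mat)
  sum_sq   <- by_group(mat^2)

  # mean reuses the sums; variance reuses the mean.
  means <- sweep(sums, 2, n, "/")
  # Assumption: sample variance with an (n - 1) denominator.
  variance <- sweep(sum_sq - sweep(means^2, 2, n, "*"), 2, n - 1, "/")

  list(nonzeros = nonzeros, sum = sums, mean = means, variance = variance)
}

set.seed(1)
dense <- matrix(rpois(40, lambda = 2), nrow = 4,
                dimnames = list(paste0("gene", 1:4), paste0("cell", 1:10)))
stats <- group_stats(dense, rep(c("A", "B"), each = 5))
names(stats)
```

Because the mean is needed to compute the variance, returning it alongside the variance adds no further passes over the data, which mirrors the behavior described above.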

Examples

set.seed(12345)
mat <- matrix(rpois(100, lambda = 5), nrow = 10)
rownames(mat) <- paste0("gene", 1:10)
colnames(mat) <- paste0("cell", 1:10) 
mat <- mat %>% as("dgCMatrix") %>% as("IterableMatrix")
groups <- rep(c("Cluster1", "Cluster2"), each = 5)

## When calculating only sum across two groups
pseudobulk_res <- pseudobulk_matrix(
  mat = mat,
  cell_groups = groups,
  method = "sum"
)
pseudobulk_res
#>        Cluster1 Cluster2
#> gene1        26       38
#> gene2        19       27
#> gene3        32       21
#> gene4        27       19
#> gene5        22       27
#> gene6        20       23
#> gene7        24       37
#> gene8        24       22
#> gene9        20       23
#> gene10       34       21

## Can also request multiple summary statistics for pseudobulking
pseudobulk_res_multi <- pseudobulk_matrix(
  mat = mat,
  cell_groups = groups,
  method = c("mean", "variance")
)
)

names(pseudobulk_res_multi)
#> [1] "mean"     "variance"

pseudobulk_res_multi$mean
#>        Cluster1 Cluster2
#> gene1       5.2      7.6
#> gene2       3.8      5.4
#> gene3       6.4      4.2
#> gene4       5.4      3.8
#> gene5       4.4      5.4
#> gene6       4.0      4.6
#> gene7       4.8      7.4
#> gene8       4.8      4.4
#> gene9       4.0      4.6
#> gene10      6.8      4.2