Aggregate counts matrices by cell group or feature.
Source: R/singlecell_utils.R (pseudobulk_matrix.Rd)
Given a (features x cells) matrix, group cells by cell_groups and aggregate counts by method for each feature.
Arguments
- mat
IterableMatrix object of dimensions features x cells
- cell_groups
(Character/factor) Vector of group/cluster assignments for each cell. Length must be ncol(mat).
- method
(Character vector) Method(s) to aggregate counts. If one method is provided, the output will be a matrix. If multiple methods are provided, the output will be a named list of matrices.
Current options are: nonzeros, sum, mean, variance.
- threads
(integer) Number of threads to use.
Value
If method has length 1, returns a matrix of shape (features x groups).
If method has length greater than 1, returns a named list of matrices, one pseudobulk matrix per aggregation method. Each matrix is of shape (features x groups), and each name is one of nonzeros, sum, mean, variance.
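To make the return shapes concrete, here is a dense base-R sketch of what each method is assumed to compute. pseudobulk_dense() is a hypothetical helper written for illustration, not part of the package, and it works on an ordinary matrix rather than an IterableMatrix. It assumes that mean averages over all cells in a group (zeros included), which matches the example output further down, and that variance is the sample variance with an n - 1 denominator, which should be checked against the package.

```r
# Hypothetical dense reference, for illustration only.
pseudobulk_dense <- function(mat, cell_groups, method = "sum") {
  cell_groups <- as.factor(cell_groups)
  stopifnot(length(cell_groups) == ncol(mat))

  # One aggregation function per supported method name
  agg <- list(
    nonzeros = function(x) rowSums(x != 0),
    sum      = function(x) rowSums(x),
    mean     = function(x) rowMeans(x),
    variance = function(x) apply(x, 1, var)  # assumption: n - 1 denominator
  )
  method <- match.arg(method, names(agg), several.ok = TRUE)

  res <- lapply(agg[method], function(f) {
    # Apply the statistic to the columns (cells) of each group in turn
    out <- vapply(
      levels(cell_groups),
      function(g) f(mat[, cell_groups == g, drop = FALSE]),
      numeric(nrow(mat))
    )
    matrix(out, nrow = nrow(mat),
           dimnames = list(rownames(mat), levels(cell_groups)))
  })

  # Single method: bare matrix. Several methods: named list of matrices.
  if (length(method) == 1) res[[1]] else res
}

toy <- matrix(c(1, 0, 2,  3, 4, 0,  0, 5, 6,  7, 0, 8), nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
toy_groups <- c("A", "A", "B", "B")

single <- pseudobulk_dense(toy, toy_groups, "sum")
multi  <- pseudobulk_dense(toy, toy_groups, c("nonzeros", "mean", "variance"))
dim(single)   # features x groups
names(multi)  # one entry per requested method
```

A dense version like this is only practical for small inputs, but it can serve as a cross-check against the package output on a toy matrix.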
Details
Some simpler stats are calculated in the process of calculating more complex
statistics. So when calculating variance, nonzeros and mean can be included with no
extra calculation time, and when calculating mean, adding nonzeros will take no extra time.
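One way to see why the simpler statistics come for free is a single-pass (Welford-style) accumulator, sketched below in base R. This illustrates the general idea only and is not necessarily how the package implements it. running_stats() is a hypothetical helper that processes one feature's values within one group, and the n - 1 variance denominator is an assumption.

```r
# Hypothetical helper: one pass over the values of a single feature
# within a single group, tracking all three statistics together.
running_stats <- function(x) {
  n <- 0
  nonzeros <- 0
  mean <- 0
  m2 <- 0  # running sum of squared deviations from the current mean
  for (v in x) {
    n <- n + 1
    if (v != 0) nonzeros <- nonzeros + 1
    delta <- v - mean
    mean <- mean + delta / n        # the mean is updated on the way to variance
    m2 <- m2 + delta * (v - mean)   # Welford update
  }
  c(
    nonzeros = nonzeros,
    mean = mean,
    variance = if (n > 1) m2 / (n - 1) else NA_real_  # assumption: n - 1
  )
}

x <- c(0, 4, 5, 0)
stats <- running_stats(x)
stats
```

Because the variance update already maintains the running mean, and the same loop can count nonzero entries, returning those two values alongside the variance adds no further passes over the data.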
Examples
set.seed(12345)
mat <- matrix(rpois(100, lambda = 5), nrow = 10)
rownames(mat) <- paste0("gene", 1:10)
colnames(mat) <- paste0("cell", 1:10)
mat <- mat %>% as("dgCMatrix") %>% as("IterableMatrix")
groups <- rep(c("Cluster1", "Cluster2"), each = 5)
## When calculating only sum across two groups
pseudobulk_res <- pseudobulk_matrix(
  mat = mat,
  cell_groups = groups,
  method = "sum"
)
pseudobulk_res
#> Cluster1 Cluster2
#> gene1 26 38
#> gene2 19 27
#> gene3 32 21
#> gene4 27 19
#> gene5 22 27
#> gene6 20 23
#> gene7 24 37
#> gene8 24 22
#> gene9 20 23
#> gene10 34 21
## Can also request multiple summary statistics for pseudobulking
pseudobulk_res_multi <- pseudobulk_matrix(
  mat = mat,
  cell_groups = groups,
  method = c("mean", "variance")
)
names(pseudobulk_res_multi)
#> [1] "mean" "variance"
pseudobulk_res_multi$mean
#> Cluster1 Cluster2
#> gene1 5.2 7.6
#> gene2 3.8 5.4
#> gene3 6.4 4.2
#> gene4 5.4 3.8
#> gene5 4.4 5.4
#> gene6 4.0 4.6
#> gene7 4.8 7.4
#> gene8 4.8 4.4
#> gene9 4.0 4.6
#> gene10 6.8 4.2
