Aggregate counts matrices by cell group or feature.
Source: R/singlecell_utils.R
Given a (features x cells) matrix, group cells by cell_groups and aggregate counts by method for each feature.
Arguments
- mat
IterableMatrix object of dimensions features x cells.
- cell_groups
(Character/factor) Vector of group/cluster assignments for each cell. Length must be ncol(mat).
- method
(Character vector) Method(s) to aggregate counts. If one method is provided, the output will be a matrix. If multiple methods are provided, the output will be a named list of matrices. Current options are: nonzeros, sum, mean, variance.
- threads
(integer) Number of threads to use.
Value
If method has length 1, returns a matrix of shape (features x groups). If method has length greater than 1, returns a named list of matrices, each a pseudobulk matrix computed with a different aggregation method. Each matrix has shape (features x groups), and the list names are taken from nonzeros, sum, mean, and variance.
Details
Some simpler statistics are computed as intermediate steps of more complex ones. So when calculating variance, nonzeros and mean can be included at no extra computation time, and when calculating mean, adding nonzeros takes no extra time.
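To illustrate why these statistics come bundled, here is a minimal base-R sketch on a dense matrix. It is not the package's implementation (which works on IterableMatrix objects), and the helper name pseudobulk_reference is hypothetical. It assumes variance means the sample variance with an n - 1 denominator. A single pass over the nonzero entries accumulates nonzero counts, sums, and sums of squares together; mean is then sum divided by group size, and variance reuses those same sums.

```r
# Hypothetical reference helper (not part of the package): dense features x cells input.
# Assumes "variance" is the sample variance (n - 1 denominator).
pseudobulk_reference <- function(mat, cell_groups) {
  groups <- factor(cell_groups)
  stopifnot(length(groups) == ncol(mat))
  group_idx <- as.integer(groups)               # group index for each cell
  n <- as.vector(table(groups))                 # number of cells in each group
  empty <- matrix(0, nrow(mat), nlevels(groups),
                  dimnames = list(rownames(mat), levels(groups)))
  nonzeros <- empty
  sums <- empty
  sq_sums <- empty
  # Single pass over the nonzero entries only, as sparse storage would allow
  nz <- which(mat != 0, arr.ind = TRUE)         # column 1: feature, column 2: cell
  for (k in seq_len(nrow(nz))) {
    i <- nz[k, 1]
    cell <- nz[k, 2]
    g <- group_idx[cell]
    x <- mat[i, cell]
    nonzeros[i, g] <- nonzeros[i, g] + 1        # count comes free with the visit
    sums[i, g] <- sums[i, g] + x
    sq_sums[i, g] <- sq_sums[i, g] + x^2
  }
  # Zeros add nothing to sums, so dividing by the full group size is correct
  means <- sweep(sums, 2, n, "/")
  # Sample variance from the sum and sum of squares already accumulated
  variance <- sweep(sq_sums - sweep(sums^2, 2, n, "/"), 2, n - 1, "/")
  list(nonzeros = nonzeros, sum = sums, mean = means, variance = variance)
}
```

The variance here uses the textbook sum-of-squares shortcut for brevity; a production implementation would more likely use a numerically stable update such as Welford's algorithm.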
Examples
library(BPCells)
set.seed(12345)
mat <- matrix(rpois(100, lambda = 5), nrow = 10)
rownames(mat) <- paste0("gene", 1:10)
colnames(mat) <- paste0("cell", 1:10)
mat <- mat %>% as("dgCMatrix") %>% as("IterableMatrix")
groups <- rep(c("Cluster1", "Cluster2"), each = 5)
## When calculating only sum across two groups
pseudobulk_res <- pseudobulk_matrix(
mat = mat,
cell_groups = groups,
method = "sum"
)
pseudobulk_res
#> Cluster1 Cluster2
#> gene1 26 38
#> gene2 19 27
#> gene3 32 21
#> gene4 27 19
#> gene5 22 27
#> gene6 20 23
#> gene7 24 37
#> gene8 24 22
#> gene9 20 23
#> gene10 34 21
## Can also request multiple summary statistics for pseudobulking
pseudobulk_res_multi <- pseudobulk_matrix(
mat = mat,
cell_groups = groups,
method = c("mean", "variance")
)
names(pseudobulk_res_multi)
#> [1] "mean" "variance"
pseudobulk_res_multi$mean
#> Cluster1 Cluster2
#> gene1 5.2 7.6
#> gene2 3.8 5.4
#> gene3 6.4 4.2
#> gene4 5.4 3.8
#> gene5 4.4 5.4
#> gene6 4.0 4.6
#> gene7 4.8 7.4
#> gene8 4.8 4.4
#> gene9 4.0 4.6
#> gene10 6.8 4.2