Transpose the storage order for a matrix
Details
This re-sorts the entries of a matrix to switch its storage order between row-major and col-major. For large matrices this can be slow: around 2 minutes to transpose a 500k-cell RNA-seq matrix. The default load_bytes (4MiB) and sort_bytes (1GiB) parameters allow ~85GB of data to be sorted with two passes through the data, and ~7.3TB of data to be sorted in three passes through the data.
Examples
mat <- matrix(rnorm(50), nrow = 10, ncol = 5)
rownames(mat) <- paste0("gene", seq_len(10))
colnames(mat) <- paste0("cell", seq_len(5))
mat <- mat %>% as("dgCMatrix") %>% as("IterableMatrix")
mat
#> 10 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper
#>
#> Row names: gene1, gene2 ... gene10
#> Col names: cell1, cell2 ... cell5
#>
#> Data type: double
#> Storage order: column major
#>
#> Queued Operations:
#> 1. Load dgCMatrix from memory
## A regular transpose operation switches a user's rows and cols
t(mat)
#> 5 x 10 IterableMatrix object with class Iterable_dgCMatrix_wrapper
#>
#> Row names: cell1, cell2 ... cell5
#> Col names: gene1, gene2 ... gene10
#>
#> Data type: double
#> Storage order: row major
#>
#> Queued Operations:
#> 1. Load dgCMatrix from memory
## Running `transpose_storage_order()` instead changes whether the storage is in row-major or col-major,
## but does not switch the rows and cols
transpose_storage_order(mat)
#> 10 x 5 IterableMatrix object with class MatrixDir
#>
#> Row names: gene1, gene2 ... gene10
#> Col names: cell1, cell2 ... cell5
#>
#> Data type: double
#> Storage order: row major
#>
#> Queued Operations:
#> 1. Load compressed matrix from directory /tmp/RtmpsGFdDm/transpose1588d5621564e
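The buffer parameters described under Details can also be passed explicitly. The following is a minimal sketch, not a reference invocation: it assumes the function accepts an `outdir` argument for the output location (suggested by the temporary directory in the output above, but not stated on this page) alongside the documented `load_bytes` and `sort_bytes` parameters. The `out_dir` path and the random test matrix are illustrative only.

```r
library(BPCells)
library(Matrix)

# Small random sparse matrix, converted the same way as in the examples above
mat <- as(Matrix::rsparsematrix(10, 5, density = 0.5), "dgCMatrix")
mat <- as(mat, "IterableMatrix")

# Hypothetical output location; `outdir` is an assumed argument name
out_dir <- tempfile("transpose_demo")

mat_reordered <- transpose_storage_order(
  mat,
  outdir = out_dir,
  load_bytes = 4194304L,    # 4 MiB, the default stated in Details
  sort_bytes = 1073741824L  # 1 GiB, the default stated in Details
)

# Only the storage order changes; the dimensions stay 10 x 5
stopifnot(identical(dim(mat_reordered), dim(mat)))
```

Writing to an explicit directory rather than a session temporary directory may be useful when the re-sorted matrix should outlive the current R session.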