Transpose the storage order for a matrix

Usage

transpose_storage_order(
  matrix,
  outdir = tempfile("transpose"),
  tmpdir = tempdir(),
  load_bytes = 4194304L,
  sort_bytes = 1073741824L
)

Arguments

matrix

Input matrix

outdir

Directory to store the output

tmpdir

Temporary directory to use for intermediate storage

load_bytes

The minimum contiguous load size during the merge sort passes

sort_bytes

The amount of memory to allocate for re-sorting chunks of entries

Value

A MatrixDir object containing a copy of the input matrix with the storage order flipped

Details

This re-sorts the entries of a matrix to flip the storage order between row-major and col-major, without swapping the rows and columns. For large matrices this can be slow: around 2 minutes to transpose a 500k cell RNA-seq matrix. The default load_bytes (4MiB) and sort_bytes (1GiB) parameters allow ~85GB of data to be sorted with two passes through the data, and ~7.3TB of data to be sorted in three passes through the data.
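
The capacity figures above can be approximated with a short back-of-the-envelope calculation. This is only a sketch under a stated assumption, not a description of the actual implementation: it supposes that the first pass produces sorted runs of sort_bytes each, and that every later merge pass combines sort_bytes / (3 * load_bytes) runs at once. The factor of 3 and the helper `capacity_gib()` are hypothetical, chosen because they happen to reproduce the documented numbers.

```r
# Default parameter values from the Usage section
load_bytes <- 4194304      # 4 MiB
sort_bytes <- 1073741824   # 1 GiB

# ASSUMPTION (not confirmed by the documentation): number of sorted runs
# that one merge pass can combine. The factor of 3 is a guess.
fan_in <- sort_bytes / (3 * load_bytes)

# Hypothetical helper: data size (in GiB) sortable in `passes` passes,
# where pass 1 creates runs of sort_bytes and each later pass merges
# `fan_in` runs into one.
capacity_gib <- function(passes) {
  sort_bytes * fan_in^(passes - 1) / 2^30
}

two_pass_gib <- capacity_gib(2)
three_pass_gib <- capacity_gib(3)

print(round(two_pass_gib))    # about 85 GiB
print(round(three_pass_gib))  # about 7282 GiB, i.e. roughly 7.3 thousand GiB
```

Under this model, raising sort_bytes increases both the run size and the number of runs merged per pass, while raising load_bytes reduces the number of runs merged per pass.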

Examples

mat <- matrix(rnorm(50), nrow = 10, ncol = 5)
rownames(mat) <- paste0("gene", seq_len(10))
colnames(mat) <- paste0("cell", seq_len(5))
mat <- mat %>% as("dgCMatrix") %>% as("IterableMatrix")
mat
#> 10 x 5 IterableMatrix object with class Iterable_dgCMatrix_wrapper
#> 
#> Row names: gene1, gene2 ... gene10
#> Col names: cell1, cell2 ... cell5
#> 
#> Data type: double
#> Storage order: column major
#> 
#> Queued Operations:
#> 1. Load dgCMatrix from memory

## A regular transpose operation switches the matrix's rows and cols
t(mat)
#> 5 x 10 IterableMatrix object with class Iterable_dgCMatrix_wrapper
#> 
#> Row names: cell1, cell2 ... cell5
#> Col names: gene1, gene2 ... gene10
#> 
#> Data type: double
#> Storage order: row major
#> 
#> Queued Operations:
#> 1. Load dgCMatrix from memory

## Running `transpose_storage_order()` instead changes whether the storage is in row-major or col-major,
## but does not switch the rows and cols
transpose_storage_order(mat)
#> 10 x 5 IterableMatrix object with class MatrixDir
#> 
#> Row names: gene1, gene2 ... gene10
#> Col names: cell1, cell2 ... cell5
#> 
#> Data type: double
#> Storage order: row major
#> 
#> Queued Operations:
#> 1. Load compressed matrix from directory /tmp/RtmpsGFdDm/transpose1588d5621564e