Read/write AnnData matrix — open_matrix_anndata

Read or write a matrix in an AnnData HDF5 file. These functions automatically transpose matrices when converting to or from the AnnData format, because the AnnData convention stores cells as rows whereas the R convention stores cells as columns. If this behavior is undesired, call t() manually on the matrix inputs and outputs of these functions.
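The transposition behavior can be sketched as follows. This is a minimal illustration, assuming the BPCells package is installed and using the same get_demo_mat() helper as the Examples section; the file names are made up for the demonstration.

```r
library(BPCells)

# Demo matrix from the Examples section: features as rows, cells as columns
mat <- get_demo_mat()

# Hypothetical file names, chosen only for this sketch
path <- file.path(tempdir(), "orientation_default.h5")
path_t <- file.path(tempdir(), "orientation_transposed.h5")
unlink(c(path, path_t)) # start from fresh files so group X is not already in use

# Default: transposed on write and again on read, so R still sees cells as columns
default_mat <- write_matrix_anndata_hdf5(mat, path)

# Opting out: transpose the input so the file stores features as AnnData rows,
# then transpose the output to return to the original R orientation
flipped <- write_matrix_anndata_hdf5(t(mat), path_t)
restored <- t(flipped)

dim(mat)
dim(default_mat)
dim(restored)
```

All three calls to dim() should report the same shape; only the orientation stored inside the second file differs.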

Most users writing to AnnData files should default to write_matrix_anndata_hdf5() rather than the dense variant (see details for more information).

Usage

open_matrix_anndata_hdf5(path, group = "X", buffer_size = 16384L)

write_matrix_anndata_hdf5(
  mat,
  path,
  group = "X",
  buffer_size = 16384L,
  chunk_size = 1024L,
  gzip_level = 0L
)

write_matrix_anndata_hdf5_dense(
  mat,
  path,
  dataset = "X",
  buffer_size = 16384L,
  chunk_size = 1024L,
  gzip_level = 0L
)

Arguments

mat: The matrix to write, with cells as the columns. Used by the write functions
path: Path to the hdf5 file on disk
group: The group within the hdf5 file to read the data from or write the data to. If writing to an existing hdf5 file, this group must not already be in use
buffer_size: For performance tuning only. The number of items to be buffered in memory before calling writes to disk.
chunk_size: For performance tuning only. The chunk size used for the HDF5 array storage.
gzip_level: Gzip compression level. Default is 0 (no compression)
dataset: The dataset within the hdf5 file to write the matrix to. Used only by write_matrix_anndata_hdf5_dense()
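As a rough illustration of the gzip_level argument, the sketch below writes the demo matrix twice and compares the resulting file sizes. It assumes BPCells is installed; the file names and the compression level of 4 are arbitrary choices for the demonstration, not recommendations.

```r
library(BPCells)

mat <- get_demo_mat()

# Hypothetical file names, chosen only for this sketch
plain_path <- file.path(tempdir(), "demo_gzip0.h5")
gzip_path <- file.path(tempdir(), "demo_gzip4.h5")
unlink(c(plain_path, gzip_path)) # start from fresh files

# Default settings: gzip_level = 0L, i.e. no compression
plain_mat <- write_matrix_anndata_hdf5(mat, plain_path)

# Same matrix with gzip compression enabled
gzip_mat <- write_matrix_anndata_hdf5(mat, gzip_path, gzip_level = 4L)

# Compare the on-disk sizes in bytes
file.size(c(plain_path, gzip_path))
```

How much space compression saves, and how much it slows reads and writes, depends on the data, so it is worth measuring on a representative matrix before changing the default.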

Value

AnnDataMatrixH5 object, with cells as the columns.

Details

Efficiency considerations: Reading from a dense AnnData matrix will generally be slower than reading from a sparse one for single-cell datasets, so it is recommended to re-write any dense AnnData inputs to a sparse format early in processing.

write_matrix_anndata_hdf5() should be used by default, as it always writes in the more efficient sparse format. write_matrix_anndata_hdf5_dense() writes in the AnnData dense format, and can be used for smaller matrices when efficiency and file size are less of a concern than increased portability (e.g. writing to obsm or varm matrices). See the AnnData docs for format details.
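The recommendation to re-write dense inputs early could look like the sketch below. It assumes BPCells is installed; the dense file is created here only as a stand-in for a dense AnnData file obtained elsewhere, and both file names are hypothetical.

```r
library(BPCells)

# Hypothetical file names, chosen only for this sketch
dense_path <- file.path(tempdir(), "demo_dense.h5")
sparse_path <- file.path(tempdir(), "demo_sparse.h5")
unlink(c(dense_path, sparse_path)) # start from fresh files

# Stand-in for a dense AnnData file received from another tool
dense_written <- write_matrix_anndata_hdf5_dense(get_demo_mat(), dense_path)

# Open the dense input, then re-write it once in the sparse format
dense_mat <- open_matrix_anndata_hdf5(dense_path)
sparse_mat <- write_matrix_anndata_hdf5(dense_mat, sparse_path)

# Later steps would use sparse_mat rather than dense_mat
dim(sparse_mat)
```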

Dimension names: Dimnames are inferred from obs/_index or var/_index based on length matching. This helps to infer dimnames for obsp, varm, etc. If len(obs) == len(var), the two axes cannot be told apart by length, so dimname inference is disabled.
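The length-matching rule can be illustrated with a small base R sketch. The helper infer_dimnames_sketch() is hypothetical and not part of BPCells; it only mirrors the rule as described above, works in the AnnData orientation, and ignores the transposition applied on the R side.

```r
# Hypothetical helper: NOT the BPCells implementation, only an illustration
# of matching each matrix dimension against the lengths of obs and var.
infer_dimnames_sketch <- function(dims, obs_index, var_index) {
  n_obs <- length(obs_index)
  n_var <- length(var_index)
  # Equal lengths make the two axes indistinguishable, so nothing is inferred
  if (n_obs == n_var) {
    return(list(NULL, NULL))
  }
  lapply(dims, function(n) {
    if (n == n_obs) {
      obs_index
    } else if (n == n_var) {
      var_index
    } else {
      NULL
    }
  })
}

obs <- c("cell1", "cell2", "cell3")
var <- c("geneA", "geneB")

# obsp-like matrix (obs x obs): both dimensions match obs
obsp_names <- infer_dimnames_sketch(c(3, 3), obs, var)

# varm-like matrix (var x components): only the first dimension matches
varm_names <- infer_dimnames_sketch(c(2, 5), obs, var)

# Equal obs and var lengths: inference is disabled
ambiguous_names <- infer_dimnames_sketch(c(2, 2), c("a", "b"), c("x", "y"))
```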

Examples

## Create temporary directory to keep demo matrix
data_dir <- file.path(tempdir(), "mat_anndata")
if (dir.exists(data_dir)) unlink(data_dir, recursive = TRUE)
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
mat <- get_demo_mat()


#######################################################################
## write_matrix_anndata_hdf5() example
#######################################################################
mat <- write_matrix_anndata_hdf5(
  mat,
  file.path(data_dir, "new_demo_mat.h5")
)
mat
#> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5
#> 
#> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#> 
#> Data type: uint32_t
#> Storage order: column major
#> 
#> Queued Operations:
#> 1. AnnData HDF5 matrix in file /tmp/RtmpsGFdDm/mat_anndata/new_demo_mat.h5, group X


#######################################################################
## open_matrix_anndata_hdf5() example
#######################################################################
mat <- open_matrix_anndata_hdf5(
  file.path(data_dir, "new_demo_mat.h5")
)
mat
#> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5
#> 
#> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#> 
#> Data type: uint32_t
#> Storage order: column major
#> 
#> Queued Operations:
#> 1. AnnData HDF5 matrix in file /tmp/RtmpsGFdDm/mat_anndata/new_demo_mat.h5, group X


#######################################################################
## write_matrix_anndata_hdf5_dense() example
#######################################################################
mat <- write_matrix_anndata_hdf5_dense(
  mat,
  file.path(data_dir, "new_demo_mat_dense.h5")
)
mat
#> 3582 x 2600 IterableMatrix object with class AnnDataMatrixH5
#> 
#> Row names: ENSG00000272602, ENSG00000250312 ... ENSG00000255512
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#> 
#> Data type: uint32_t
#> Storage order: column major
#> 
#> Queued Operations:
#> 1. AnnData HDF5 matrix in file /tmp/RtmpsGFdDm/mat_anndata/new_demo_mat_dense.h5, group X