Import MatrixMarket files — import_matrix_market

Read a sparse matrix from a MatrixMarket file. MatrixMarket is a text-based format used by 10x, Parse, and others to store sparse matrices. Format details are available on the NIST website.
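To make the file layout concrete, the following sketch writes a tiny sparse matrix to a MatrixMarket file and prints the raw text. It relies on the Matrix package (sparseMatrix() and writeMM()), which is an assumption of this example and not part of the functions documented here; the matrix values are made up for illustration.

```r
library(Matrix)

# Small 3 x 4 sparse matrix with four non-zero entries (illustrative values)
m <- sparseMatrix(
  i = c(1, 3, 2, 3),
  j = c(1, 1, 3, 4),
  x = c(5, 2, 7, 1),
  dims = c(3, 4)
)

# Write the matrix in MatrixMarket coordinate format
mtx_path <- tempfile(fileext = ".mtx")
writeMM(m, mtx_path)

# Print the raw text: a "%%MatrixMarket" header line, a
# "rows cols nonzeros" line, then one "row col value" line per entry
writeLines(readLines(mtx_path))
```

Because every non-zero entry occupies its own line of text, large matrices produce very large files, which is why importing them takes time.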

Usage

import_matrix_market(
  mtx_path,
  outdir = tempfile("matrix_market"),
  row_names = NULL,
  col_names = NULL,
  row_major = FALSE,
  tmpdir = tempdir(),
  load_bytes = 4194304L,
  sort_bytes = 1073741824L
)

import_matrix_market_10x(
  mtx_dir,
  outdir = tempfile("matrix_market"),
  feature_type = NULL,
  row_major = FALSE,
  tmpdir = tempdir(),
  load_bytes = 4194304L,
  sort_bytes = 1073741824L
)

Arguments

mtx_path: Path of mtx or mtx.gz file
outdir: Directory to store the output
row_names: Character vector of row names
col_names: Character vector of column names
row_major: If TRUE, store the matrix in row-major orientation
tmpdir: Temporary directory to use for intermediate storage
load_bytes: The minimum contiguous load size during the merge sort passes
sort_bytes: The amount of memory to allocate for re-sorting chunks of entries
mtx_dir: Directory holding matrix.mtx.gz, barcodes.tsv.gz, and features.tsv.gz
feature_type: String or character vector of feature types to include (Cell Ranger 3.0 and newer)

Value

MatrixDir object with the imported matrix

Details

Import MatrixMarket mtx files to the BPCells format. This implementation keeps memory usage fixed even for very large inputs by sorting on disk. Importing is much slower than reading HDF5 inputs, so use the MatrixMarket format only when absolutely necessary.

As a rough speed estimate, importing the 17 GB Parse 1M PBMC DGE_1M_PBMC.mtx file takes about 4 minutes and 1.3 GB of RAM, and produces a compressed output matrix of 1.5 GB. mtx.gz files are slower to import because of gzip decompression.
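A minimal sketch of a generic import, using only the arguments listed under Usage. The input file is generated with Matrix::writeMM() so the example is self-contained. The Matrix dependency, the gene and cell names, and the assumption that the importer accepts the header written by writeMM() are all illustrative and not taken from this page.

```r
library(Matrix)
library(BPCells)

# Tiny illustrative input: 3 rows (genes) x 4 columns (cells)
m <- sparseMatrix(
  i = c(1, 3, 2, 3),
  j = c(1, 1, 3, 4),
  x = c(5, 2, 7, 1),
  dims = c(3, 4)
)
mtx_path <- tempfile(fileext = ".mtx")
writeMM(m, mtx_path)

# Import to an on-disk BPCells matrix; a bare mtx file carries no names,
# so they are supplied here (hypothetical names)
mat <- import_matrix_market(
  mtx_path,
  outdir = tempfile("matrix_market"),
  row_names = paste0("gene", 1:3),
  col_names = paste0("cell", 1:4)
)

# The result is backed by files in outdir rather than held in memory
print(dim(mat))
```

For large inputs, tmpdir should point to a disk with enough free space for the intermediate sort files.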

When importing from 10x mtx files, the row and column names can be read automatically using the import_matrix_market_10x() convenience function.
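A hedged sketch of the 10x convenience function. The directory path is hypothetical and must be replaced with a real directory containing matrix.mtx.gz, barcodes.tsv.gz, and features.tsv.gz. The feature type "Gene Expression" is the label 10x commonly uses for RNA features, but the values present in any given features.tsv.gz should be checked first.

```r
library(BPCells)

# Hypothetical location of a 10x output directory; replace with a real path
mtx_dir <- "path/to/filtered_feature_bc_matrix"

# Guarded so the sketch is a no-op when the placeholder path does not exist
if (dir.exists(mtx_dir)) {
  mat <- import_matrix_market_10x(
    mtx_dir,
    outdir = tempfile("matrix_market_10x"),
    feature_type = "Gene Expression"  # assumed label; keep NULL to skip filtering
  )

  # Row and column names are read from the feature and barcode files
  print(dim(mat))
  print(head(rownames(mat)))
  print(head(colnames(mat)))
}
```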