Read a sparse matrix from a MatrixMarket file. This is a text-based format used by 10x, Parse, and others to store sparse matrices. Format details are available on the NIST website.
Usage
import_matrix_market(
  mtx_path,
  outdir = tempfile("matrix_market"),
  row_names = NULL,
  col_names = NULL,
  row_major = FALSE,
  tmpdir = tempdir(),
  load_bytes = 4194304L,
  sort_bytes = 1073741824L
)

import_matrix_market_10x(
  mtx_dir,
  outdir = tempfile("matrix_market"),
  feature_type = NULL,
  row_major = FALSE,
  tmpdir = tempdir(),
  load_bytes = 4194304L,
  sort_bytes = 1073741824L
)
Arguments
- mtx_path
Path of mtx or mtx.gz file
- outdir
Directory to store the output
- row_names
Character vector of row names
- col_names
Character vector of column names
- row_major
If TRUE, store the matrix in row-major orientation
- tmpdir
Temporary directory to use for intermediate storage
- load_bytes
The minimum contiguous load size during the merge sort passes
- sort_bytes
The amount of memory to allocate for re-sorting chunks of entries
- mtx_dir
Directory holding matrix.mtx.gz, barcodes.tsv.gz, and features.tsv.gz
- feature_type
String or vector of feature types to include (cellranger 3.0 and newer).
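The defaults for load_bytes and sort_bytes in the Usage section are exact binary sizes (4 MiB and 1 GiB). The short sketch below rebuilds them from their units, which may help when choosing other values. The 256 MiB figure is a hypothetical illustration, not a recommendation from the package.

```r
# Defaults from the Usage section, rebuilt from binary units.
load_bytes_default <- as.integer(4 * 1024^2)  # 4 MiB = 4194304 bytes
sort_bytes_default <- as.integer(1024^3)      # 1 GiB = 1073741824 bytes

# Hypothetical smaller sort buffer for a memory-constrained machine (256 MiB).
sort_bytes_small <- as.integer(256 * 1024^2)

print(c(
  load_bytes = load_bytes_default,
  sort_bytes = sort_bytes_default,
  sort_bytes_small = sort_bytes_small
))
```

Values computed this way can be passed as the load_bytes and sort_bytes arguments of either import function.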
Details
Import MatrixMarket mtx files to the BPCells format. This implementation ensures fixed memory usage even for very large inputs by doing on-disk sorts. Importing will be much slower than from HDF5 inputs, so only use MatrixMarket format when absolutely necessary.
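For orientation, a MatrixMarket coordinate file is plain text: a header line, a size line (rows, columns, number of stored entries), then one "row column value" triplet per line with 1-based indices. The format does not require the triplets to be in any particular order, which is why an importer has to sort them before writing an ordered sparse layout. The sketch below writes a tiny file by hand and reads it back with readMM() from the Matrix package (not BPCells) purely to illustrate the format. The matrix contents are made up.

```r
library(Matrix)

# A 3 x 4 matrix with 4 stored entries, written by hand in coordinate format.
mtx_lines <- c(
  "%%MatrixMarket matrix coordinate integer general",
  "3 4 4",  # rows, columns, number of stored entries
  "3 4 7",  # row 3, column 4, value 7 (entries are deliberately unsorted)
  "1 1 5",
  "2 3 1",
  "1 4 2"
)
mtx_path <- tempfile(fileext = ".mtx")
writeLines(mtx_lines, mtx_path)

# Read the file back into memory and view it as a dense matrix.
m <- readMM(mtx_path)
dense <- as.matrix(m)
print(dense)
```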
As a rough speed estimate, importing the 17GB Parse 1M PBMC DGE_1M_PBMC.mtx file takes about 4 minutes and 1.3GB of RAM, producing a compressed output matrix of 1.5GB. mtx.gz files will be slower to import due to gzip decompression.
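A minimal sketch of a plain import, using only the arguments listed under Usage. The input file is a tiny hand-written stand-in, and the gene and cell names are hypothetical. The call is guarded so the block still runs where BPCells is not installed, and no output is shown because the printed form of the result depends on the installed version.

```r
# Tiny stand-in input: 3 rows, 4 columns, 4 stored entries.
mtx_path <- tempfile(fileext = ".mtx")
writeLines(c(
  "%%MatrixMarket matrix coordinate integer general",
  "3 4 4",
  "3 4 7",
  "1 1 5",
  "2 3 1",
  "1 4 2"
), mtx_path)

# Hypothetical names; their lengths must match the matrix dimensions.
row_names <- paste0("gene", 1:3)
col_names <- paste0("cell", 1:4)

if (requireNamespace("BPCells", quietly = TRUE)) {
  mat <- BPCells::import_matrix_market(
    mtx_path,
    outdir = tempfile("matrix_market"),  # where the imported matrix is stored
    row_names = row_names,
    col_names = col_names
  )
  print(mat)
}
```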
When importing from 10x mtx files, the row and column names can be read automatically using the import_matrix_market_10x() convenience function.
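The sketch below builds a tiny mock 10x-style directory containing the three files named in the mtx_dir argument, then imports it. The barcodes, feature IDs, and counts are invented, and the three-column features.tsv.gz layout (ID, name, feature type) follows the cellranger 3.0+ convention. Exactly how the function derives names from these files is not described here, so treat this as an illustration of the call rather than of the result.

```r
# Helper (hypothetical, for this sketch only): write lines to a gzip file.
write_gz <- function(lines, path) {
  con <- gzfile(path, "w")
  on.exit(close(con))
  writeLines(lines, con)
}

mtx_dir <- tempfile("mock_10x")
dir.create(mtx_dir)

# Rows are features, columns are barcodes: 3 features x 4 barcodes.
write_gz(c(
  "%%MatrixMarket matrix coordinate integer general",
  "3 4 4",
  "1 1 5",
  "2 3 1",
  "1 4 2",
  "3 4 7"
), file.path(mtx_dir, "matrix.mtx.gz"))

write_gz(
  c("AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"),
  file.path(mtx_dir, "barcodes.tsv.gz")
)

# Columns: feature ID, feature name, feature type.
write_gz(c(
  "ENSG0001\tGENE1\tGene Expression",
  "ENSG0002\tGENE2\tGene Expression",
  "AB0001\tTAG1\tAntibody Capture"
), file.path(mtx_dir, "features.tsv.gz"))

if (requireNamespace("BPCells", quietly = TRUE)) {
  # Keep only one feature type; leave feature_type = NULL to keep all.
  mat <- BPCells::import_matrix_market_10x(
    mtx_dir,
    feature_type = "Gene Expression"
  )
  print(mat)
}
```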