---
jupytext:
  formats: ipynb,md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.15.1
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---

# Matrix slicing tutorial

BPCells has prototype Python bindings that allow for matrix creation and slicing, with optional multithreaded reads.

## Performance estimates

On the SCimilarity dataset of 15M human cells, compressed storage takes 64 GB (2.2 bytes per non-zero entry).

Read speeds for 10k random cells from the 15M human cells (range across 5 random trials):

| Storage location | 1 thread | 4 threads |
|--------------|-----------|------------|
| Memory | 2.8-4.7 seconds | 1.0-1.1 seconds |
| Local SSD | 4.5-4.9 seconds | 1.5-1.7 seconds |
| Networked FS (warm cache) | 20-21 seconds | 5.5-6.2 seconds |
| Networked FS (cold cache) | 🙁 | 76-115 seconds |

+++

## Demo data setup

```{code-cell} ipython3
import bpcells.experimental
import os
import tempfile
import numpy as np
import scipy.sparse
```

```{code-cell} ipython3
# Work inside a temporary directory so the tutorial leaves no files behind
tmp = tempfile.TemporaryDirectory()
os.chdir(tmp.name)
```

```{code-cell} ipython3
mat = scipy.sparse.csc_matrix(np.array([
    [1, 0, 4, 0],
    [0, 0, 5, 7],
    [2, 3, 6, 0]]
))
mat
```

```{code-cell} ipython3
mat.toarray()
```

## Basic usage from scipy.sparse

```{code-cell} ipython3
bp_mat = bpcells.experimental.DirMatrix.from_scipy_sparse(mat, "basic_mat")
bp_mat
```

Slicing the matrix returns a `scipy.sparse` matrix:

```{code-cell} ipython3
bp_mat[:,:]
```

```{code-cell} ipython3
bp_mat[:,:].toarray()
```

We can use many of the same slicing options as standard numpy arrays:

```{code-cell} ipython3
bp_mat[1:3, [0,2]].toarray()
```

```{code-cell} ipython3
bp_mat[[True, False, True], -2:].toarray()
```

We can also make a transposed view of the matrix, similar to numpy. No work is done; we just switch between row-major and col-major representations.

```{code-cell} ipython3
bp_mat.T
```

## Reopening the matrix later

+++

The matrix path holds 13 files (for compressed integer matrices), which contain the data and metadata:

```{code-cell} ipython3
!ls -l basic_mat
```

```{code-cell} ipython3
bp_mat = bpcells.experimental.DirMatrix("basic_mat")
```

## Import from h5ad

```{code-cell} ipython3
import anndata

anndata.AnnData(mat).write("mat.h5ad")
bp_mat = bpcells.experimental.DirMatrix.from_h5ad("mat.h5ad", "basic_mat_from_h5ad")
bp_mat[:,:].toarray()
```

## Concatenate multiple matrices

+++

We can concatenate multiple matrices into a single on-disk matrix with low memory usage. This allows importing many samples in parallel, then concatenating them together into a single matrix.

```{code-cell} ipython3
bpcells.experimental.DirMatrix.from_hstack(
    [bp_mat, bp_mat], "basic_mat_hstack"
)[:,:].toarray()
```

```{code-cell} ipython3
bpcells.experimental.DirMatrix.from_vstack(
    [bp_mat, bp_mat], "basic_mat_vstack"
)[:,:].toarray()
```

## Multithreaded operation

For larger matrices, it can be desirable to perform matrix reads in a multi-threaded manner. When using multiple threads, BPCells divides the matrix slice query into chunks that are loaded in parallel, then recombined in memory once all threads complete.

When performing random slicing along the major storage axis, seek latency is the primary performance bottleneck. Setting a high thread count (even above the machine's actual core count) can help mitigate filesystem seek latency.
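To see the effect, you can time the same slice at several thread counts. Below is a minimal sketch that uses only the `DirMatrix` API shown above; the 3x4 demo matrix is far too small to show a real difference, so treat it as a pattern to apply to a large matrix rather than a benchmark.

```{code-cell} ipython3
import time

# Reopen the demo matrix from earlier; point this at a large matrix
# directory to see a meaningful difference between thread counts.
timing_mat = bpcells.experimental.DirMatrix("basic_mat")

for n_threads in (1, 4, 8):
    timing_mat.threads = n_threads
    start = time.perf_counter()
    _ = timing_mat[:, [0, 2, 3]]  # stand-in for a random sample of cells
    print(f"{n_threads} thread(s): {time.perf_counter() - start:.4f}s")
```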
When slicing across the non-major storage axis, decompression speed can become the bottleneck instead. Setting the thread count to the number of available cores helps parallelize decompression. For cell-major RNA-seq matrices, each thread can process compressed input at a rate of about 1 GB/s, so filesystems with >1 GB/s sequential read speeds will benefit from some parallelization.

```{code-cell} ipython3
bp_mat.threads = 8
bp_mat[:,:].toarray()  # This read now uses 8 threads
```

## Compressed in-memory storage

For neural network training use-cases, fast slicing performance may be critical to avoid bottlenecking on data loading. For this case, BPCells supports holding the compressed data in memory, which eliminates seek latency while using roughly 4x less memory than an uncompressed scipy sparse matrix. Loading can only be performed from an existing BPCells matrix directory, and the current version re-compresses the data in memory at load time (avoidable, but a bit trickier to code, so direct loading isn't implemented yet).

```{code-cell} ipython3
bp_mat_mem = bpcells.experimental.MemMatrix("basic_mat")
bp_mat_mem.threads = 8
bp_mat_mem[:,:].toarray()
```
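As an example of the training use-case, a data loader can draw random dense minibatches straight from the in-memory compressed matrix. This is a minimal sketch: the `sample_batch` helper and the hard-coded dimensions are hypothetical, not part of the BPCells API, and assume columns are cells as in the demo matrix.

```{code-cell} ipython3
rng = np.random.default_rng(0)

n_cells = 4      # columns in the 3x4 demo matrix; in practice, your cell count
batch_size = 2   # tiny for the demo; typically much larger

def sample_batch():
    # Hypothetical helper: slice a random set of columns (cells) and
    # densify the result for a downstream model
    cols = rng.choice(n_cells, size=batch_size, replace=False).tolist()
    return bp_mat_mem[:, cols].toarray()

sample_batch()
```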