bpcells.experimental.pseudobulk_insertion_counts

bpcells.experimental.pseudobulk_insertion_counts(fragments: str, regions: DataFrame, cell_groups: Sequence[int], bin_size: int = 1) -> ndarray

Calculate a pseudobulk coverage matrix

Coverage is the number of fragment start/end coordinates that fall into each position bin.

Parameters:
  • fragments (str) – Path to BPCells fragments directory

  • regions (pandas.DataFrame) – Pandas dataframe with columns (chrom, start, end) representing genomic ranges (0-based, end-exclusive like BED format). All regions must be the same size. chrom should be a string column; start/end should be numeric.

  • cell_groups (list[int]) – List of pseudobulk groupings as created by build_cell_groups()

  • bin_size (int) – Size in basepairs of the bins within each region. If the region width is not an even multiple of bin_size, then the last bin may be truncated.
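
As a rough illustration of the regions requirements above, the following sketch builds a regions dataframe with pandas and checks that all regions share one width and that the width divides evenly by bin_size. The coordinates, the bin_size value, and the variable names region_width and bins_per_region are made up for this example; they are not part of the BPCells API.

```python
import pandas as pd

# Hypothetical example regions: 0-based, end-exclusive (BED-like), all 200 bp wide.
regions = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr2"],  # string column
        "start": [1000, 5000, 250],         # numeric
        "end": [1200, 5200, 450],           # numeric, exclusive
    }
)

bin_size = 50  # illustrative value, not a recommended default

# All regions must be the same size.
widths = regions["end"] - regions["start"]
assert widths.nunique() == 1, "all regions must have the same width"
region_width = int(widths.iloc[0])

# Choosing a width that is a multiple of bin_size avoids a truncated final bin.
assert region_width % bin_size == 0, "region width is not a multiple of bin_size"
bins_per_region = region_width // bin_size

print(region_width, bins_per_region)
```

A dataframe validated this way can then be passed as the regions argument, together with a fragments path and the cell_groups list.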

Returns:

NumPy array with dimensions (region, pseudobulks, position) and type numpy.int32

Return type:

numpy.ndarray
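
To make the layout of the returned array concrete, here is a toy sketch that counts insertion positions into a (region, pseudobulk, position) array with NumPy. It is not the BPCells implementation. The function toy_insertion_counts and its insertions input are hypothetical, and it assumes that cell_groups[i] gives the pseudobulk index of cell i. How a fragment's end coordinate maps to an insertion position is not specified here, so the sketch takes insertion positions directly.

```python
import numpy as np

def toy_insertion_counts(insertions, regions, cell_groups, bin_size=1):
    """Toy illustration only (not the BPCells code).

    insertions:  iterable of (chrom, position, cell_index)
    regions:     list of (chrom, start, end), 0-based end-exclusive, equal width
    cell_groups: cell_groups[i] is the assumed pseudobulk index of cell i
    """
    width = regions[0][2] - regions[0][1]
    assert all(end - start == width for _, start, end in regions)
    assert width % bin_size == 0  # keep the sketch free of truncated bins
    n_groups = max(cell_groups) + 1
    out = np.zeros((len(regions), n_groups, width // bin_size), dtype=np.int32)
    for chrom, pos, cell in insertions:
        for r, (r_chrom, start, end) in enumerate(regions):
            if chrom == r_chrom and start <= pos < end:
                # Bin index is the offset from the region start, in bins.
                out[r, cell_groups[cell], (pos - start) // bin_size] += 1
    return out

regions = [("chr1", 100, 110), ("chr2", 0, 10)]
cell_groups = [0, 1, 1]  # cell 0 -> group 0, cells 1 and 2 -> group 1
insertions = [
    ("chr1", 100, 0),
    ("chr1", 104, 1),
    ("chr1", 109, 2),
    ("chr2", 5, 1),
    ("chr1", 110, 0),  # outside every region (end is exclusive), so ignored
]

counts = toy_insertion_counts(insertions, regions, cell_groups, bin_size=5)
print(counts.shape)  # (2, 2, 2): 2 regions, 2 pseudobulks, 2 position bins
```

Indexing counts[r, g] in this sketch yields the binned insertion profile of pseudobulk g across region r, which mirrors the dimension order documented for the return value.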