Calculate ranges x cells tile overlap matrix

Usage

tile_matrix(
  fragments,
  ranges,
  mode = c("insertions", "fragments"),
  zero_based_coords = !is(ranges, "GRanges"),
  explicit_tile_names = FALSE
)

Arguments

fragments

Input fragments object

ranges

Tiled regions given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

  • tile_width: Size of each tile in this region in basepairs

Ranges must be non-overlapping and sorted by (chr, start), with chromosomes ordered according to the chromosome names of fragments.
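
As a rough illustration of these requirements, the base-R sketch below builds a small ranges table and checks it. The helper `check_tile_ranges()` and its `chr_order` argument are hypothetical and not part of the package; the sketch assumes 0-based end-exclusive coordinates, and the tile count assumes a partial final tile is still counted (which matches the 951073 tiles reported for a 190214555 bp range at 200 bp in the example further down).

```r
# Hypothetical helper (not part of BPCells): TRUE if `ranges` is sorted by
# (chr, start) in the given chromosome order and contains no overlaps.
check_tile_ranges <- function(ranges, chr_order) {
  chr_idx <- match(ranges$chr, chr_order)
  if (anyNA(chr_idx)) stop("ranges contain a chromosome missing from chr_order")
  n <- nrow(ranges)
  # Sorted means ordering by (chr, start) leaves every row where it is
  is_sorted <- identical(order(chr_idx, ranges$start), seq_len(n))
  # Adjacent rows on the same chromosome must not overlap (end is exclusive)
  same_chr <- chr_idx[-1] == chr_idx[-n]
  overlaps <- same_chr & ranges$start[-1] < ranges$end[-n]
  is_sorted && !any(overlaps)
}

ranges <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(0, 1000, 0),
  end = c(1000, 2000, 500),
  tile_width = c(200, 500, 100)
)

ok <- check_tile_ranges(ranges, chr_order = c("chr1", "chr2"))

# Assumed tile count per range: width divided by tile_width, rounded up
n_tiles <- sum(ceiling((ranges$end - ranges$start) / ranges$tile_width))
```

Each row may use its own `tile_width`, so a single table can mix coarse and fine tiling across regions.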

mode

Mode for counting tile overlaps. (See "value" section for more detail)

zero_based_coords

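Whether the ranges are already given in a 0-based end-exclusive coordinate system. If FALSE, the ranges are converted from a 1-based end-inclusive coordinate system to a 0-based end-exclusive one. By default (`!is(ranges, "GRanges")`), GRanges input is converted and other formats are treated as already 0-based (see this archived UCSC blogpost).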
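
The conversion between the two coordinate systems amounts to shifting the start coordinate by one while leaving the end untouched. The sketch below shows this in base R; `to_zero_based()` is a hypothetical helper written for illustration, not a package function.

```r
# Hypothetical helper: 1-based end-inclusive [start, end]
# becomes 0-based end-exclusive [start - 1, end).
to_zero_based <- function(ranges) {
  ranges$start <- ranges$start - 1
  ranges
}

one_based <- data.frame(chr = "chr1", start = 501, end = 1000)
zero_based <- to_zero_based(one_based)

# The width of the region is the same under either convention
width_one <- one_based$end - one_based$start + 1   # inclusive end
width_zero <- zero_based$end - zero_based$start    # exclusive end
```

This is why a range entered as `start = 0` in a data.frame is printed as `chr4:1-...` in the queued-operations summary in the Examples section.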

explicit_tile_names

Boolean for whether to add rownames to the output matrix in the format e.g. chr1:500-1000, where start and end coords are given in a 0-based coordinate system. For whole-genome tile matrices, the names take ~5 seconds to generate and use ~400 MB of memory. Note that tile names are written when the matrix is saved, regardless of this setting.
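
To make the naming format concrete, here is a minimal base-R sketch that builds names of this shape for a single range. `make_tile_names()` is a hypothetical helper, not the package's implementation, and it assumes that the final tile is truncated at the end of the range.

```r
# Hypothetical helper: names like "chr1:500-1000" for every tile in one range,
# using 0-based end-exclusive coordinates.
make_tile_names <- function(chr, start, end, tile_width) {
  tile_starts <- seq(start, end - 1, by = tile_width)
  # Assumption: the last tile stops at the range end instead of overhanging it
  tile_ends <- pmin(tile_starts + tile_width, end)
  sprintf("%s:%d-%d", chr, as.integer(tile_starts), as.integer(tile_ends))
}

tile_names <- make_tile_names("chr1", start = 0, end = 1100, tile_width = 500)
```

Because one string is produced per tile, the cost grows with the number of tiles, which is consistent with the time and memory note above for whole-genome matrices.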

Value

Iterable matrix object with dimension ranges x cells. When saved, the row names will be in the format chr1:500-1000, where start and end coords are given in a 0-based coordinate system.

mode options

  • "insertions": Start and end coordinates are separately overlapped with each tile

  • "fragments": Like "insertions", but each fragment can contribute at most 1 count to each tile, even if both the start and end coordinates overlap
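
The difference between the two modes can be sketched for a single cell in base R. `count_tiles()` below is a hypothetical toy function, not the package's algorithm. It assumes 0-based end-exclusive fragments, so the two insertion positions are taken as `start` and `end - 1`; the package's exact handling of the end coordinate is not stated here.

```r
# Hypothetical toy: per-tile counts for one cell over one tiled range.
count_tiles <- function(frag_start, frag_end, range_start, range_end,
                        tile_width, mode = c("insertions", "fragments")) {
  mode <- match.arg(mode)
  n_tiles <- ceiling((range_end - range_start) / tile_width)
  # 1-based tile index for a position, NA if it falls outside the range
  tile_of <- function(pos) {
    idx <- floor((pos - range_start) / tile_width) + 1
    idx[pos < range_start | pos >= range_end] <- NA
    idx
  }
  first <- tile_of(frag_start)
  last <- tile_of(frag_end - 1)  # assumed position of the end insertion
  if (mode == "fragments") {
    # Both ends in the same tile: count that fragment only once
    last[!is.na(first) & !is.na(last) & first == last] <- NA
  }
  hits <- c(first, last)
  tabulate(hits[!is.na(hits)], nbins = n_tiles)
}

frag_start <- c(100, 150, 390)
frag_end <- c(151, 450, 401)

ins <- count_tiles(frag_start, frag_end, 0, 1000, 200, mode = "insertions")
frg <- count_tiles(frag_start, frag_end, 0, 1000, 200, mode = "fragments")
```

The first fragment has both ends inside the first tile, so it adds two counts there in "insertions" mode but only one in "fragments" mode; the other two fragments span tile boundaries and are counted identically in both modes.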

Note

When calculating the matrix directly from a fragments tsv, it's necessary to first call select_chromosomes() in order to provide the ordering of chromosomes to expect while reading the tsv.

Examples

## Prep demo data
frags <- get_demo_frags(subset = FALSE)
chrom_sizes <- read_ucsc_chrom_sizes(file.path(tempdir(), "references"), genome="hg38")
blacklist <- read_encode_blacklist(file.path(tempdir(), "references"), genome="hg38")
frags_filter_blacklist <- frags %>% select_regions(blacklist, invert_selection = TRUE)
ranges <- tibble::tibble(
  chr = "chr4",
  start = 0,
  end = 190214555,
  tile_width = 200
)


## Get tile matrix
tile_matrix(frags_filter_blacklist, ranges)
#> 951073 x 2600 IterableMatrix object with class TileMatrix
#> 
#> Row names: unknown names
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#> 
#> Data type: uint32_t
#> Storage order: row major
#> 
#> Queued Operations:
#> 1. Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered
#> 2. Subset to fragments not overlapping 636 ranges: chr10:1-45700 ... chrY:26637301-57227400
#> 3. Calculate 951073 tiles over 1 ranges: chr4:1-190214555 (200bp), chr4:1-190214555 (200bp)