Calculate ranges x cells tile overlap matrix
Arguments
- fragments
Input fragments object
- ranges
Tiled regions given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:
  - chr, start, end: genomic position
  - tile_width: size of each tile in this region, in base pairs
Ranges must be non-overlapping and sorted by (chr, start), with chromosomes ordered according to the chromosome names of fragments.
- mode
Mode for counting tile overlaps, either "insertions" or "fragments". See the "Value" section for details.
- zero_based_coords
Whether to convert the ranges from a 1-based, end-inclusive coordinate system to a 0-based, end-exclusive coordinate system. Defaults to TRUE for GRanges and FALSE for other formats (see this archived UCSC blog post).
- explicit_tile_names
Whether to add row names to the output matrix in the format chr1:500-1000, where start and end coordinates are 0-based. For whole-genome tile matrices, generating the names takes about 5 seconds and 400 MB of memory. Either way, tile names are written when the matrix is saved.
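The two coordinate-related arguments above can be illustrated with a small base R sketch. The helpers to_zero_based() and tile_names() are hypothetical and not part of BPCells; they only show the arithmetic behind zero_based_coords and the chr1:500-1000 naming format. Clipping the last tile at the range end is an assumption made for illustration.

```r
# Hypothetical helpers for illustration only (not BPCells functions).

# zero_based_coords: converting a 1-based, end-inclusive range to 0-based,
# end-exclusive only shifts the start; both describe the same bases.
to_zero_based <- function(start, end) {
  list(start = start - 1, end = end)
}

# explicit_tile_names: "chr:start-end" names in 0-based coordinates for the
# tiles of one range. ASSUMPTION: the last tile is clipped at the range end
# when the range length is not a multiple of tile_width.
tile_names <- function(chr, start, end, tile_width) {
  tile_starts <- seq(start, end - 1, by = tile_width)
  tile_ends <- pmin(tile_starts + tile_width, end)
  # %.0f avoids scientific notation such as "1e+05" for large coordinates
  sprintf("%s:%.0f-%.0f", chr, as.numeric(tile_starts), as.numeric(tile_ends))
}

z <- to_zero_based(501, 1000)  # GRanges-style chr1:501-1000
z$start                        # 500
z$end                          # 1000
tile_names("chr1", z$start, z$end, 200)
# "chr1:500-700" "chr1:700-900" "chr1:900-1000"
```

Formatting with sprintf() rather than paste0() is a deliberate choice: paste0(1e5) yields "1e+05", which would produce malformed names for genomic coordinates.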
Value
Iterable matrix object with dimension tiles x cells (one row per tile across all ranges). When saved, the row names will be in the format chr1:500-1000, where start and end coordinates are 0-based.
mode options
- "insertions"
Start and end coordinates are separately overlapped with each tile.
- "fragments"
Like "insertions", but each fragment can contribute at most 1 count to each tile, even if both its start and end coordinates overlap that tile.
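The difference between the two modes can be seen in a toy base R sketch for a single cell over one range. This is not BPCells code: count_tiles() is a hypothetical helper, and using a fragment's start and end values directly as its two overlap positions is a simplifying assumption (BPCells' exact handling of the end-exclusive end coordinate is not modeled).

```r
# Toy illustration (not BPCells code) of "insertions" vs "fragments" counting
# for one cell over one range split into equal-width tiles.
# Coordinates are 0-based; positions outside [range_start, range_end) are ignored.
count_tiles <- function(frag_start, frag_end, range_start, range_end, tile_width,
                        mode = c("insertions", "fragments")) {
  mode <- match.arg(mode)
  n_tiles <- ceiling((range_end - range_start) / tile_width)
  # 1-based tile index for a coordinate, NA if it falls outside the range
  tile_of <- function(pos) {
    idx <- floor((pos - range_start) / tile_width) + 1
    idx[pos < range_start | pos >= range_end] <- NA
    idx
  }
  start_tile <- tile_of(frag_start)
  end_tile <- tile_of(frag_end)
  if (mode == "fragments") {
    # a fragment whose start and end land in the same tile counts only once
    end_tile[which(start_tile == end_tile)] <- NA
  }
  hits <- c(start_tile, end_tile)
  tabulate(hits[!is.na(hits)], nbins = n_tiles)
}

# Three fragments over the range [0, 600) with 200 bp tiles
frag_start <- c(10, 150, 450)
frag_end <- c(50, 250, 700)  # the last end lies outside the range
count_tiles(frag_start, frag_end, 0, 600, 200, mode = "insertions")  # 3 1 1
count_tiles(frag_start, frag_end, 0, 600, 200, mode = "fragments")   # 2 1 1
```

Only the first tile differs: the fragment spanning 10-50 contributes two counts there under "insertions" but a single count under "fragments".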
Note
When calculating the matrix directly from a fragments TSV, you must first call select_chromosomes() to provide the ordering of chromosomes to expect while reading the TSV.
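The chromosome ordering matters because ranges must be sorted by (chr, start) in that same order and must not overlap. The following base R sketch checks those requirements for a data.frame of ranges. ranges_are_valid() is a hypothetical helper, not a BPCells function, and chr_order stands in for whatever chromosome order the fragments carry (for example, the order passed to select_chromosomes()).

```r
# Hypothetical helper (not a BPCells function): check that ranges are sorted
# by (chr, start) and non-overlapping, with chromosome order taken from
# chr_order. Coordinates are assumed 0-based and end-exclusive.
ranges_are_valid <- function(ranges, chr_order) {
  chr_idx <- match(ranges$chr, chr_order)
  if (anyNA(chr_idx)) return(FALSE)  # chromosome not present in chr_order
  n <- nrow(ranges)
  if (n < 2) return(TRUE)
  prev <- seq_len(n - 1)
  nxt <- prev + 1
  # chromosomes must never go backwards in chr_order
  chr_sorted <- all(chr_idx[nxt] >= chr_idx[prev])
  # on the same chromosome, each range must start at or after the previous end
  same_chr <- chr_idx[nxt] == chr_idx[prev]
  no_overlap <- all(!same_chr | ranges$start[nxt] >= ranges$end[prev])
  chr_sorted && no_overlap
}

ranges <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(0, 1000, 0),
  end = c(1000, 5000, 2000),
  tile_width = c(200, 500, 100)
)
ranges_are_valid(ranges, c("chr1", "chr2"))  # TRUE
ranges_are_valid(ranges, c("chr2", "chr1"))  # FALSE: chr2 would have to come first
```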
Examples
## Prep demo data
library(BPCells)
frags <- get_demo_frags(subset = FALSE)
chrom_sizes <- read_ucsc_chrom_sizes(file.path(tempdir(), "references"), genome = "hg38")
blacklist <- read_encode_blacklist(file.path(tempdir(), "references"), genome = "hg38")
## Drop fragments overlapping ENCODE blacklist regions
frags_filter_blacklist <- select_regions(frags, blacklist, invert_selection = TRUE)
## One range covering all of chr4 (0-based start, end-exclusive), split into 200 bp tiles
ranges <- tibble::tibble(
  chr = "chr4",
  start = 0,
  end = 190214555,
  tile_width = 200
)
## Get tile matrix
tile_matrix(frags_filter_blacklist, ranges)
#> 951073 x 2600 IterableMatrix object with class TileMatrix
#>
#> Row names: unknown names
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#>
#> Data type: uint32_t
#> Storage order: row major
#>
#> Queued Operations:
#> 1. Read compressed fragments from directory /home/imman/.local/share/R/BPCells/demo_data/demo_frags_filtered
#> 2. Subset to fragments not overlapping 636 ranges: chr10:1-45700 ... chrY:26637301-57227400
#> 3. Calculate 951073 tiles over 1 ranges: chr4:1-190214555 (200bp), chr4:1-190214555 (200bp)
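The 951073 rows reported above are consistent with splitting the 190214555 bp range into 200 bp tiles and rounding up, so that a final partial tile is still counted. A minimal base R sketch of that arithmetic, where n_tiles() is a hypothetical helper rather than a BPCells function:

```r
# Tiles per range: ceiling(range length / tile_width), using the example's
# 0-based, end-exclusive coordinates. Hypothetical helper, not a BPCells function.
n_tiles <- function(start, end, tile_width) ceiling((end - start) / tile_width)
n_tiles(0, 190214555, 200)
# [1] 951073
```

For several ranges, summing n_tiles() over all rows would give the total row count of the matrix under the same assumption.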