Given a set of genomic ranges, find the distance to the nearest neighbors both upstream and downstream.
Usage
range_distance_to_nearest(
ranges,
addArchRBug = FALSE,
zero_based_coords = !is(ranges, "GRanges")
)
Arguments
- ranges
Genomic regions given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:
  - chr, start, end: genomic position
  - strand: +/- or TRUE/FALSE for positive or negative strand
- addArchRBug
Logical. If TRUE, reproduce ArchR's bug that incorrectly handles nested genes.
- zero_based_coords
If TRUE, coordinates start at 0 and the end coordinate is not included in the range. If FALSE, coordinates start at 1 and the end coordinate is included in the range. Defaults to TRUE unless ranges is a GRanges object.
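As a minimal sketch of the two coordinate conventions, the hypothetical helper to_one_based() below (not part of the package) converts a 0-based, end-exclusive range into the equivalent 1-based, end-inclusive range. It assumes integer-valued positions and does no input validation.

```r
# Hypothetical helper, not part of the package: convert 0-based, end-exclusive
# coordinates into 1-based, end-inclusive coordinates.
to_one_based <- function(start, end) {
  # Every base number shifts up by one, so the start gains 1. The old end was
  # exclusive, so the last included base now carries the old end value.
  data.frame(start = start + 1, end = end)
}

# The 0-based range [10, 60) covers the same bases as the 1-based range [11, 60]
to_one_based(start = 10, end = 60)
```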
Value
A 2-column data.frame with columns upstream and downstream, containing
the distances to the nearest neighbor in the respective directions.
Ranges with no neighbor in a given direction get a distance of Inf.
For ranges on + or * strand, distance is calculated as:
  upstream = max(start(range) - end(upstreamNeighbor), 0)
  downstream = max(start(downstreamNeighbor) - end(range), 0)
For ranges on - strand, the definitions of upstream and downstream are flipped.
Note that this definition of distance is off by one from
GenomicRanges::distance(): ranges that are adjacent but don't overlap are given
a distance of 1 rather than 0.
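To make the formulas concrete, the sketch below implements them for a single pair of ranges on the + or * strand. The helper names upstream_distance() and downstream_distance() are hypothetical; the sketch assumes 1-based, end-inclusive coordinates and a neighbor that has already been identified, and it does not reproduce how the package finds neighbors or handles nested ranges.

```r
# Hypothetical helpers, not part of the package: the distance formulas above
# for a range on the + or * strand, assuming 1-based, end-inclusive coordinates.
upstream_distance <- function(range_start, upstream_neighbor_end) {
  # Overlapping ranges would give a negative gap, so clamp at 0
  max(range_start - upstream_neighbor_end, 0)
}
downstream_distance <- function(range_end, downstream_neighbor_start) {
  max(downstream_neighbor_start - range_end, 0)
}

# Adjacent ranges [1, 10] and [11, 20]: distance 1
# (GenomicRanges::distance() reports 0 for this pair)
downstream_distance(range_end = 10, downstream_neighbor_start = 11)

# Partially overlapping ranges [11, 60] and [51, 100]: clamped to 0
downstream_distance(range_end = 60, downstream_neighbor_start = 51)

# Ranges [11, 60] and [111, 160], i.e. 0-based [10, 60) and [110, 160)
downstream_distance(range_end = 60, downstream_neighbor_start = 111)
```

Under these assumptions, the last call gives 51, the same distance reported between consecutive ranges in the examples.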
Examples
## Prep data
ranges <- tibble::tibble(
chr = "chr1",
start = seq(10, 410, 100),
end = start + 50,
strand = "+"
)
## Add one range that is completely nested inside another range
ranges_with_nesting <- tibble::add_row(ranges, chr = "chr1", start = 11, end = 20, strand = "+")
## Get range distance to nearest
range_distance_to_nearest(ranges_with_nesting)
#> # A tibble: 6 × 2
#> upstream downstream
#> <dbl> <dbl>
#> 1 Inf 51
#> 2 51 51
#> 3 51 51
#> 4 51 51
#> 5 51 Inf
#> 6 0 0