Given a set of genomic ranges, find the distance to the nearest neighbors both upstream and downstream.

Usage

range_distance_to_nearest(
  ranges,
  addArchRBug = FALSE,
  zero_based_coords = !is(ranges, "GRanges")
)

Arguments

ranges

Genomic regions given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

  • strand: +/- or TRUE/FALSE for positive or negative strand

addArchRBug

Boolean. If TRUE, reproduce ArchR's bug that incorrectly handles nested genes.

zero_based_coords

If true, coordinates start at 0 and the end coordinate is not included in the range. If false, coordinates start at 1 and the end coordinate is included in the range.
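
As an illustration of the two conventions, the sketch below converts a small table from 0-based, end-exclusive coordinates to 1-based, end-inclusive coordinates using base R only. The object names zero_based and one_based are made up for this example and are not part of the package.

```r
# Two ranges in 0-based, end-exclusive coordinates (illustrative values)
zero_based <- data.frame(
  chr = "chr1",
  start = c(10, 110),
  end = c(60, 160)
)

# The same ranges in 1-based, end-inclusive coordinates:
# the start shifts by one and the end stays the same
one_based <- transform(zero_based, start = start + 1)

# Both representations describe ranges of identical width
zero_based$end - zero_based$start    # width with an exclusive end
one_based$end - one_based$start + 1  # width with an inclusive end
```

Both width expressions evaluate to 50 for each range, so the conversion changes only how the same bases are labelled.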

Value

A 2-column data.frame with columns upstream and downstream, containing the distances to the nearest neighbor in the respective directions. For ranges on + or * strand, distance is calculated as:

  • upstream = max(start(range) - end(upstreamNeighbor), 0)

  • downstream = max(start(downstreamNeighbor) - end(range), 0)

For ranges on - strand, the definition of upstream and downstream is flipped. Note that this definition of distance differs by one from GenomicRanges::distance(): ranges that are adjacent but do not overlap are given a distance of 1 rather than 0.
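
The formulas above can be worked through by hand. The sketch below uses a hypothetical helper, gap_distance(), which is not part of the package and only restates the two formulas for a single + strand range. It assumes the formulas are applied to 1-based, end-inclusive coordinates; the page does not state this explicitly, but it is consistent with adjacent ranges receiving a distance of 1.

```r
# Hypothetical helper (not a package function): applies the documented
# formulas to one + strand range and its nearest neighbors.
# Assumes 1-based, end-inclusive coordinates.
gap_distance <- function(range_start, range_end, up_end, down_start) {
  c(
    upstream = max(range_start - up_end, 0),
    downstream = max(down_start - range_end, 0)
  )
}

# Middle range [111, 160] with neighbors [11, 60] and [211, 260]
spaced <- gap_distance(
  range_start = 111, range_end = 160,
  up_end = 60, down_start = 211
)

# Middle range [11, 20] with directly adjacent neighbors [1, 10] and [21, 30]
adjacent <- gap_distance(
  range_start = 11, range_end = 20,
  up_end = 10, down_start = 21
)

spaced    # upstream = 51, downstream = 51
adjacent  # upstream = 1, downstream = 1
```

Under this assumption, the adjacent case gives 1 in both directions, which is the off-by-one difference from GenomicRanges::distance() described above.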

Examples

## Prep data
ranges <- tibble::tibble(
 chr = "chr1",
 start = seq(10, 410, 100),
 end = start + 50,
 strand = "+"
)
## Add one range that is completely nested within another range (the first one)
ranges_with_nesting <- ranges %>% 
 tibble::add_row(chr = "chr1", start = 11, end = 20, strand = "+")


## Get range distance to nearest
range_distance_to_nearest(ranges_with_nesting)
#> # A tibble: 6 × 2
#>   upstream downstream
#>      <dbl>      <dbl>
#> 1      Inf         51
#> 2       51         51
#> 3       51         51
#> 4       51         51
#> 5       51        Inf
#> 6        0          0