
Normalize an object representing genomic ranges

Usage

normalize_ranges(
  ranges,
  metadata_cols = character(0),
  zero_based_coords = !is(ranges, "GRanges"),
  n = 1
)

Arguments

ranges

Genomic regions given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

metadata_cols

Optional character vector of metadata column names to require and extract.
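One plausible reading of "require and extract" is sketched below in base R. It assumes the input is a data.frame or a named list of equal-length vectors. The helper `extract_range_cols()` is hypothetical and is not the package's implementation. It only illustrates checking that `chr`, `start`, `end`, and every name in `metadata_cols` are present, then keeping just those columns.

```r
# Hypothetical helper (not the package's implementation): check that the
# required coordinate columns plus any requested metadata columns exist,
# then keep only those columns.
extract_range_cols <- function(ranges, metadata_cols = character(0)) {
  required <- c("chr", "start", "end", metadata_cols)
  missing <- setdiff(required, names(ranges))
  if (length(missing) > 0) {
    stop("Missing required columns: ", paste(missing, collapse = ", "))
  }
  # `[` by name works for both data.frames and named lists
  ranges[required]
}

df <- data.frame(
  chr = c("chr1", "chr2"),
  start = c(100L, 200L),
  end = c(150L, 260L),
  score = c(1, 2),
  name = c("a", "b")
)
kept <- extract_range_cols(df, metadata_cols = "score")  # drops `name`
names(kept)
```

Requesting a column that is absent, such as `extract_range_cols(df, "GC")`, stops with an error that names the missing column.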

zero_based_coords

If TRUE, coordinates start at 0 and the end coordinate is not included in the range. If FALSE, coordinates start at 1 and the end coordinate is included in the range. Defaults to FALSE for GRanges input and TRUE otherwise.
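The two conventions differ only in the start value. A 1-based, end-inclusive range from s to e covers the same bases as the 0-based, end-exclusive range from s - 1 to e. The base R sketch below shows that conversion. `one_based_to_zero_based()` is a hypothetical helper, not an exported function. Applied to the GRanges example further down, it maps `101-111` to `start = 100, end = 111`, which matches the first row of the normalized output.

```r
# Hypothetical helper: convert 1-based, end-inclusive coordinates
# (the GRanges convention) to 0-based, end-exclusive coordinates.
one_based_to_zero_based <- function(start, end) {
  # Only the start shifts by one; the numeric end value is unchanged.
  data.frame(start = as.integer(start) - 1L, end = as.integer(end))
}

# Width is preserved: (end - start + 1) in 1-based == (end - start) in 0-based
one_based_to_zero_based(101:105, 111:115)
```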

Value

Data frame with zero-based coordinates and elements chr (factor), start (int), and end (int). If ranges does not have chr level information, chr levels are the sorted unique values of chr.
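The level rule for inputs without chromosome level information can be sketched in base R as follows. The helper `chr_to_factor()` is hypothetical. It also assumes that "level information" means an input that is already a factor, whose level order is kept. Note that `sort()` orders strings character by character rather than numerically, so `chr10` sorts before `chr2`.

```r
# Hypothetical helper: assign chr levels as sorted unique values unless
# level information already exists (assumed here to mean a factor input).
chr_to_factor <- function(chr) {
  if (is.factor(chr)) {
    return(chr)  # keep the existing level order
  }
  factor(chr, levels = sort(unique(chr)))
}

# Character input: levels become chr1, chr10, chr2 (lexical order)
levels(chr_to_factor(c("chr2", "chr10", "chr1", "chr2")))
# Factor input: the given level order (chr2, chr1) is kept
levels(chr_to_factor(factor(c("chr2", "chr1"), levels = c("chr2", "chr1"))))
```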

If strand is in metadata_cols, the output strand element is TRUE for the positive strand and FALSE for the negative strand (converted from a character vector of "+"/"-" if necessary). In the example below, unstranded ("*") ranges are reported as TRUE.
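The strand conversion can be sketched as a comparison against `"-"`. The helper `strand_to_logical()` is hypothetical. Treating every value other than `"-"` as `TRUE` is an assumption, chosen so that it matches the documented example, where the `"*"` ranges `d` and `e` come out as `TRUE`.

```r
# Hypothetical helper: map strand characters to logicals.
# Assumption: anything that is not "-" (including "*") counts as positive,
# matching the unstranded rows in the documented example output.
strand_to_logical <- function(strand) {
  if (is.logical(strand)) {
    return(strand)  # already in the output representation
  }
  as.character(strand) != "-"  # as.character() also handles factor input
}

# Same strands as the example: gives FALSE, TRUE, TRUE, TRUE, TRUE
strand_to_logical(c("-", "+", "+", "*", "*"))
strand_to_logical(factor(c("+", "-")))
```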

Examples

## Prep data
ranges <- GenomicRanges::GRanges(
  seqnames = S4Vectors::Rle(c("chr1", "chr2", "chr3"), c(1, 2, 2)),
  ranges = IRanges::IRanges(101:105, end = 111:115, names = head(letters, 5)),
  strand = S4Vectors::Rle(GenomicRanges::strand(c("-", "+", "*")), c(1, 2, 2)),
  score = 1:5,
  GC = seq(1, 0, length.out = 5))
ranges
#> GRanges object with 5 ranges and 2 metadata columns:
#>     seqnames    ranges strand |     score        GC
#>        <Rle> <IRanges>  <Rle> | <integer> <numeric>
#>   a     chr1   101-111      - |         1      1.00
#>   b     chr2   102-112      + |         2      0.75
#>   c     chr2   103-113      + |         3      0.50
#>   d     chr3   104-114      * |         4      0.25
#>   e     chr3   105-115      * |         5      0.00
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths


## Normalize ranges
normalize_ranges(ranges)
#> # A tibble: 5 × 3
#>   chr   start   end
#>   <fct> <int> <int>
#> 1 chr1    100   111
#> 2 chr2    101   112
#> 3 chr2    102   113
#> 4 chr3    103   114
#> 5 chr3    104   115


## With metadata information
normalize_ranges(ranges, metadata_cols = c("strand", "score", "GC"))
#> # A tibble: 5 × 6
#>   strand chr   start   end score    GC
#>   <lgl>  <fct> <int> <int> <int> <dbl>
#> 1 FALSE  chr1    100   111     1  1   
#> 2 TRUE   chr2    101   112     2  0.75
#> 3 TRUE   chr2    102   113     3  0.5 
#> 4 TRUE   chr3    103   114     4  0.25
#> 5 TRUE   chr3    104   115     5  0