ArchR-style gene activity scores are a weighted sum over tiles, where each tile's weight depends on its signed distance to a gene body. This function calculates those signed distances according to ArchR's default parameters.

Usage

gene_score_tiles_archr(
  genes,
  chromosome_sizes = NULL,
  tile_width = 500,
  addArchRBug = FALSE
)

Arguments

genes

Gene coordinates given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

  • strand: +/- or TRUE/FALSE for positive or negative strand

chromosome_sizes

(optional) Size of chromosomes as a genomic-ranges object

tile_width

Width of each tile in base pairs

addArchRBug

If TRUE, replicate ArchR's bug in handling nested genes

Value

Tibble with one range per tile, with additional metadata columns gene_idx (row index of the gene this tile corresponds to) and distance.

The distance is signed: if the tile has a smaller start coordinate than the gene and the gene is on the + strand, the distance is negative. Adjacent but non-overlapping regions have a distance of 1bp, and the distance counts up from there.
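The sign convention can be made concrete with a small base-R sketch. The helper `signed_tile_distance()` is hypothetical and not part of the package; it assumes 0-based, end-exclusive coordinates, assumes that overlapping regions get a distance of 0, and assumes that the sign simply flips for genes on the - strand. The coordinates are made up for illustration.

```r
# Hypothetical helper for illustration only; not part of the package API.
# Assumes 0-based, end-exclusive ranges: [start, end).
signed_tile_distance <- function(tile_start, tile_end,
                                 gene_start, gene_end, strand = "+") {
  if (tile_end <= gene_start) {
    # Tile lies entirely at smaller coordinates than the gene.
    # Adjacent ranges (tile_end == gene_start) are 1bp apart.
    d <- -(gene_start - tile_end + 1)
  } else if (tile_start >= gene_end) {
    # Tile lies entirely at larger coordinates than the gene.
    d <- tile_start - gene_end + 1
  } else {
    # Assumption: overlapping ranges have distance 0.
    d <- 0
  }
  # Assumption: on the - strand, smaller coordinates are downstream,
  # so the sign is flipped.
  if (strand == "-") d <- -d
  d
}

# Made-up gene at [10000, 12000)
upstream_plus  <- signed_tile_distance(9000, 9500, 10000, 12000, "+")
adjacent_plus  <- signed_tile_distance(12000, 12500, 10000, 12000, "+")
upstream_minus <- signed_tile_distance(12000, 12500, 10000, 12000, "-")
```

Under these assumptions, a tile to the left of a + strand gene gets a negative distance, and the same tile next to a - strand gene gets a positive one.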

Details

ArchR's tile distance algorithm works as follows:

  1. Genes are extended 5kb upstream

  2. Genes are linked to any tiles within 1kb-100kb upstream and downstream, but tiles beyond a neighboring gene are not considered
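The two steps above can be sketched roughly in base R. This is a simplified illustration, not the package's implementation: `extend_upstream()` and `candidate_tiles()` are hypothetical helpers, coordinates are assumed to be 0-based and end-exclusive, only the 100kb outer limit is modelled, and truncation at neighboring genes (and the 1kb lower bound) is left out.

```r
# Hypothetical helpers for illustration only; not part of the package API.

# Step 1: extend the gene 5kb upstream, which depends on strand.
extend_upstream <- function(start, end, strand, upstream = 5000) {
  if (strand == "+") {
    c(start = max(0, start - upstream), end = end)
  } else {
    c(start = start, end = end + upstream)
  }
}

# Step 2 (simplified): list tiles overlapping a window of up to max_dist
# on either side of the extended gene. Neighboring genes are ignored here.
candidate_tiles <- function(start, end, max_dist = 100000, tile_width = 500) {
  window_start <- max(0, start - max_dist)  # clip at chromosome start
  window_end <- end + max_dist
  first_tile <- floor(window_start / tile_width)
  last_tile <- ceiling(window_end / tile_width) - 1
  tile_starts <- seq(first_tile, last_tile) * tile_width
  data.frame(start = tile_starts, end = tile_starts + tile_width)
}

# Made-up + strand gene at [20000, 25000)
gene <- extend_upstream(start = 20000, end = 25000, strand = "+")
tiles <- candidate_tiles(gene[["start"]], gene[["end"]])
```

A fuller version would also clip each window at the nearest neighboring gene and at the chromosome end given by `chromosome_sizes`.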

Examples

## Prep data
directory <- file.path(tempdir(), "references")
genes <- read_gencode_genes(
    directory,
    release = "42",
    annotation_set = "basic"
)


## Get gene scores by tile
gene_score_tiles_archr(
    genes
)
#> # A tibble: 6,900,314 × 5
#>    chr   start   end gene_idx distance
#>    <fct> <dbl> <dbl>    <int>    <dbl>
#>  1 chr1      0   500        1    -6369
#>  2 chr1    500  1000        1    -5869
#>  3 chr1   1000  1500        1    -5369
#>  4 chr1   1500  2000        1    -4869
#>  5 chr1   2000  2500        1    -4369
#>  6 chr1   2500  3000        1    -3869
#>  7 chr1   3000  3500        1    -3369
#>  8 chr1   3500  4000        1    -2869
#>  9 chr1   4000  4500        1    -2369
#> 10 chr1   4500  5000        1    -1869
#> # ℹ 6,900,304 more rows
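
As a rough illustration of how signed distances like these feed into a weighted sum, the sketch below turns a few distances into per-tile weights. It assumes ArchR's default gene model, `exp(-abs(x)/5000) + exp(-1)`, and leaves out any further scaling or normalization ArchR applies. The helper name `archr_tile_weight()` and the tile counts are hypothetical.

```r
# Hypothetical helper for illustration only; not part of the package API.
# Assumption: ArchR's default gene model, exp(-abs(x)/5000) + exp(-1).
archr_tile_weight <- function(distance) {
  exp(-abs(distance) / 5000) + exp(-1)
}

# A few signed distances, including two from the output above
distances <- c(-6369, -1869, 0, 5000)
weights <- archr_tile_weight(distances)

# Made-up per-tile accessibility counts for a single cell
tile_counts <- c(2, 0, 5, 1)

# Unnormalized gene activity score: weighted sum over tiles
gene_score <- sum(weights * tile_counts)
```

With this model the weight depends only on the absolute distance, so tiles closer to the gene contribute more regardless of direction.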