Gene activity scores can be calculated as a distance-weighted sum of per-tile accessibility. The tile weights for each gene can be represented as a sparse matrix of dimension genes x tiles. Multiplying this weight matrix by the corresponding tile matrix (tiles x cells) yields a gene activity score matrix of dimension genes x cells. gene_score_weights_archr() calculates the weight matrix (best if you already have a pre-computed tile matrix), while gene_score_archr() provides an easy-to-use wrapper that builds both the weight matrix and the tile matrix.

Usage

gene_score_weights_archr(
  genes,
  chromosome_sizes,
  blacklist = NULL,
  tile_width = 500,
  gene_name_column = "gene_id",
  addArchRBug = FALSE
)

gene_score_archr(
  fragments,
  genes,
  chromosome_sizes,
  blacklist = NULL,
  tile_width = 500,
  gene_name_column = "gene_id",
  addArchRBug = FALSE,
  tile_max_count = 4,
  scale_factor = 10000,
  tile_matrix_path = tempfile(pattern = "gene_score_tile_mat")
)

Arguments

genes

Gene coordinates given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

  • strand: +/- or TRUE/FALSE for positive or negative strand

chromosome_sizes

Chromosome start and end coordinates given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

See read_ucsc_chrom_sizes().

blacklist

Regions to exclude from calculations, given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

tile_width

Width of each tile, in base pairs

gene_name_column

If not NULL, a column name of genes to use as row names

addArchRBug

If TRUE, replicate the ArchR bug in handling nested genes

fragments

Input fragments object

tile_max_count

Maximum value in the tile counts matrix. If not NULL, tile counts higher than this will be clipped to tile_max_count. Equivalent to the ceiling argument of ArchR::addGeneScoreMatrix()

scale_factor

If not NULL, counts for each cell will be scaled to sum to scale_factor. Equivalent to the scaleTo argument of ArchR::addGeneScoreMatrix()

tile_matrix_path

Path of a directory where the intermediate tile matrix will be saved
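
As a rough illustration of what tile_max_count and scale_factor describe, the sketch below clips toy tile counts and rescales each cell's scores using plain base R matrices. All values are made up, and the weight matrix is hand-written rather than produced by gene_score_weights_archr(); the order of steps (clip, multiply, scale columns) is an assumption based on the queued operations printed in the Examples, not a copy of the package's implementation.

```r
tile_max_count <- 4
scale_factor <- 10000

# Toy tile counts: 3 tiles x 2 cells (made-up values, filled column by column)
tiles <- matrix(
  c(1, 7, 0,
    5, 2, 3),
  nrow = 3,
  dimnames = list(paste0("tile", 1:3), c("cell1", "cell2"))
)

# Toy weights: 2 genes x 3 tiles (hand-written, for illustration only)
weights <- matrix(
  c(1, 0.5, 0,
    0, 0.5, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("tile", 1:3))
)

# Clip counts above tile_max_count (the 7 and the 5 both become 4)
clipped <- pmin(tiles, tile_max_count)

# Weighted sum over tiles gives a genes x cells matrix
scores <- weights %*% clipped

# Rescale every cell (column) so that its scores sum to scale_factor
scaled <- sweep(scores, 2, scale_factor / colSums(scores), `*`)
colSums(scaled)
```

Setting either value to NULL in gene_score_archr() skips the corresponding step, as described in the argument descriptions above.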

Value

gene_score_weights_archr

Weight matrix of dimension genes x tiles

gene_score_archr

Gene score matrix of dimension genes x cells.

Details

gene_score_weights_archr:

Given a set of tile coordinates and distances returned by gene_score_tiles_archr(), calculate a weight matrix of dimensions genes x tiles. This matrix can be multiplied with a tile matrix to obtain ArchR-compatible gene activity scores.
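
The matrix dimensions involved can be pictured with a small base R sketch. The weights and counts below are made-up values chosen for easy arithmetic; they are not the weights that gene_score_weights_archr() would compute, and ordinary dense matrices stand in for the sparse and on-disk matrices used in practice.

```r
# Toy weight matrix: 2 genes x 4 tiles (made-up values)
weights <- matrix(
  c(1.0, 0.5, 0.0, 0.0,
    0.0, 0.0, 0.5, 1.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("tile", 1:4))
)

# Toy tile matrix: 4 tiles x 3 cells of per-tile accessibility counts
tiles <- matrix(
  c(2, 0, 1,
    4, 1, 0,
    0, 3, 2,
    1, 1, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("tile", 1:4), paste0("cell", 1:3))
)

# (genes x tiles) %*% (tiles x cells) gives genes x cells
gene_scores <- weights %*% tiles
dim(gene_scores)
gene_scores
```

Each entry of gene_scores is the weighted sum of one cell's tile counts, using the weights in that gene's row. For example, geneA in cell1 is 1.0 * 2 + 0.5 * 4 = 4.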

Examples

## Prep data
reference_dir <- file.path(tempdir(), "references")
frags <- get_demo_frags()
genes <- read_gencode_genes(
  reference_dir,
  release = "42",
  annotation_set = "basic"
) %>% dplyr::filter(chr %in% c("chr4", "chr11"))
blacklist <- read_encode_blacklist(reference_dir, genome = "hg38") %>%
  dplyr::filter(chr %in% c("chr4", "chr11"))
chrom_sizes <- read_ucsc_chrom_sizes(reference_dir, genome = "hg38") %>%
  dplyr::filter(chr %in% c("chr4", "chr11"))
chrom_sizes$tile_width <- 500


#######################################################################
## gene_score_weights_archr() example
#######################################################################
## Get gene score weight matrix (genes x tiles)
gene_score_weights <- gene_score_weights_archr(
    genes, chrom_sizes, blacklist
)

## Get tile matrix (tiles x cells)
tiles <- tile_matrix(frags, chrom_sizes, mode = "fragments")

## Get gene scores per cell 
gene_score_weights %*% tiles
#> 3849 x 2600 IterableMatrix object with class MatrixMultiply
#> 
#> Row names: ENSG00000272602.6, ENSG00000289361.1 ... ENSG00000255512.2
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#> 
#> Data type: double
#> Storage order: row major
#> 
#> Queued Operations:
#> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (650604x3849) * ConvertMatrixType (2600x650604)


#######################################################################
## gene_score_archr() example
#######################################################################
## This is a wrapper that creates both the gene score weight 
## matrix and tile matrix together
gene_score_archr(frags, genes, chrom_sizes, blacklist)
#> 3849 x 2600 IterableMatrix object with class TransformScaleShift
#> 
#> Row names: ENSG00000272602.6, ENSG00000289361.1 ... ENSG00000255512.2
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#> 
#> Data type: double
#> Storage order: row major
#> 
#> Queued Operations:
#> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (650604x3849) * TransformMin (2600x650604)
#> 2. Scale columns by 0.917, 0.495 ... 8.53