Gene activity scores can be calculated as a distance-weighted sum of per-tile accessibility.
The tile weights for each gene can be represented as a sparse matrix of dimension genes x tiles.
Multiplying this weight matrix by the corresponding tile matrix (tiles x cells) yields
a gene activity score matrix of dimension genes x cells. gene_score_weights_archr() calculates the
weight matrix (best if you have a pre-computed tile matrix), while gene_score_archr() provides
an easy-to-use wrapper.
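As a minimal sketch of the matrix product described above, the following base R example builds a toy weight matrix and tile matrix and multiplies them. The gene positions, tile centers, and counts are made up for illustration. The weight function exp(-abs(d)/5000) + exp(-1) is an assumption chosen to resemble ArchR's default gene model; it is not necessarily the exact weighting that gene_score_weights_archr() applies.

```r
# Toy illustration only: all coordinates and counts below are hypothetical.
tile_width <- 500
tile_centers <- seq(250, by = tile_width, length.out = 6)  # 6 tiles
gene_pos <- c(geneA = 500, geneB = 2500)                   # hypothetical gene positions

# Distance from each gene to each tile center (genes x tiles)
dist_mat <- abs(outer(gene_pos, tile_centers, "-"))

# Assumed distance weighting: exponential decay plus a constant
weights <- exp(-dist_mat / 5000) + exp(-1)
colnames(weights) <- paste0("tile", seq_along(tile_centers))

# Hypothetical tile matrix (tiles x cells) of per-tile accessibility counts,
# filled column by column (one column per cell)
tiles <- matrix(
  c(1, 0, 2, 0, 0, 1,
    0, 3, 0, 1, 0, 0,
    0, 0, 0, 0, 4, 1),
  nrow = 6, ncol = 3,
  dimnames = list(colnames(weights), c("cell1", "cell2", "cell3"))
)

# Gene activity scores (genes x cells) as a distance-weighted sum over tiles
gene_scores <- weights %*% tiles
dim(gene_scores)
```

Each entry of gene_scores is the sum over tiles of weight times count, so a gene's score in a cell is dominated by accessibility in tiles close to that gene.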
Usage
gene_score_weights_archr(
genes,
chromosome_sizes,
blacklist = NULL,
tile_width = 500,
gene_name_column = "gene_id",
addArchRBug = FALSE
)
gene_score_archr(
fragments,
genes,
chromosome_sizes,
blacklist = NULL,
tile_width = 500,
gene_name_column = "gene_id",
addArchRBug = FALSE,
tile_max_count = 4,
scale_factor = 10000,
tile_matrix_path = tempfile(pattern = "gene_score_tile_mat")
)
Arguments
- genes
Gene coordinates given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:
  - chr, start, end: genomic position
  - strand: +/- or TRUE/FALSE for positive or negative strand
- chromosome_sizes
Chromosome start and end coordinates given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:
  - chr, start, end: genomic position
- blacklist
Regions to exclude from calculations, given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:
  - chr, start, end: genomic position
- tile_width
Size of tiles to consider
- gene_name_column
If not NULL, a column name of genes to use as row names
- addArchRBug
Replicate ArchR bug in handling nested genes
- fragments
Input fragments object
- tile_max_count
Maximum value in the tile counts matrix. If not NULL, tile counts higher than this will be clipped to tile_max_count. Equivalent to the ceiling argument of ArchR::addGeneScoreMatrix()
- scale_factor
If not NULL, counts for each cell will be scaled to sum to scale_factor. Equivalent to the scaleTo argument of ArchR::addGeneScoreMatrix()
- tile_matrix_path
Path of a directory where the intermediate tile matrix will be saved
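The effect of tile_max_count and scale_factor can be sketched in base R on a toy example. The matrices below are hypothetical, and the order of operations (clip tile counts, multiply by the weights, then rescale each cell) is an assumption made for illustration rather than a statement of the internal implementation.

```r
# Toy illustration only: weights and counts are hypothetical.
tile_max_count <- 4
scale_factor <- 10000

# Hypothetical weight matrix (genes x tiles), filled column by column
weights <- matrix(
  c(1.0, 0.5,
    0.5, 1.0,
    0.2, 0.8),
  nrow = 2, ncol = 3,
  dimnames = list(c("geneA", "geneB"), c("tile1", "tile2", "tile3"))
)

# Hypothetical tile counts (tiles x cells), filled column by column
tiles <- matrix(
  c(7, 1, 0,
    2, 9, 3),
  nrow = 3, ncol = 2,
  dimnames = list(c("tile1", "tile2", "tile3"), c("cell1", "cell2"))
)

# Clip tile counts so no entry exceeds tile_max_count
clipped <- pmin(tiles, tile_max_count)

# Distance-weighted sum of the clipped counts (genes x cells)
scores <- weights %*% clipped

# Rescale each cell (column) so its scores sum to scale_factor
scaled <- sweep(scores, 2, colSums(scores), "/") * scale_factor
colSums(scaled)
```

Clipping limits the influence of a few very high-count tiles, and the per-cell rescaling makes scores comparable across cells with different sequencing depth.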
Value
gene_score_weights_archr
Weight matrix of dimension genes x tiles
gene_score_archr
Gene score matrix of dimension genes x cells.
Details
gene_score_weights_archr:
Given a set of tile coordinates and distances returned by gene_score_tiles_archr(),
calculate a weight matrix of dimensions genes x tiles. This matrix can be
multiplied with a tile matrix to obtain ArchR-compatible gene activity scores.
Examples
## Prep data
reference_dir <- file.path(tempdir(), "references")
frags <- get_demo_frags()
genes <- read_gencode_genes(
  reference_dir,
  release = "42",
  annotation_set = "basic"
) %>% dplyr::filter(chr %in% c("chr4", "chr11"))
blacklist <- read_encode_blacklist(reference_dir, genome = "hg38") %>%
  dplyr::filter(chr %in% c("chr4", "chr11"))
chrom_sizes <- read_ucsc_chrom_sizes(reference_dir, genome = "hg38") %>%
  dplyr::filter(chr %in% c("chr4", "chr11"))
chrom_sizes$tile_width <- 500
#######################################################################
## gene_score_weights_archr() example
#######################################################################
## Get gene score weight matrix (genes x tiles)
gene_score_weights <- gene_score_weights_archr(
  genes, chrom_sizes, blacklist
)
## Get tile matrix (tiles x cells)
tiles <- tile_matrix(frags, chrom_sizes, mode = "fragments")
## Get gene scores per cell
gene_score_weights %*% tiles
#> 3849 x 2600 IterableMatrix object with class MatrixMultiply
#>
#> Row names: ENSG00000272602.6, ENSG00000289361.1 ... ENSG00000255512.2
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#>
#> Data type: double
#> Storage order: row major
#>
#> Queued Operations:
#> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (650604x3849) * ConvertMatrixType (2600x650604)
#######################################################################
## gene_score_archr() example
#######################################################################
## This is a wrapper that creates both the gene score weight
## matrix and tile matrix together
gene_score_archr(frags, genes, chrom_sizes, blacklist)
#> 3849 x 2600 IterableMatrix object with class TransformScaleShift
#>
#> Row names: ENSG00000272602.6, ENSG00000289361.1 ... ENSG00000255512.2
#> Col names: TTTAGCAAGGTAGCTT-1, AGCCGGTTCCGGAACC-1 ... TACTAAGTCCAATAGC-1
#>
#> Data type: double
#> Storage order: row major
#>
#> Queued Operations:
#> 1. Multiply sparse matrices: Iterable_dgCMatrix_wrapper (650604x3849) * TransformMin (2600x650604)
#> 2. Scale columns by 0.917, 0.495 ... 8.53