Call peaks from tiles — call_peaks

Calling peaks from a pre-set list of tiles can be much faster than using dedicated peak-calling software like macs3. The resulting peaks are less precise in terms of exact coordinates, but should be sufficient for most analyses.

Usage

call_peaks_tile(
  fragments,
  chromosome_sizes,
  cell_groups = rep.int("all", length(cellNames(fragments))),
  effective_genome_size = NULL,
  peak_width = 200,
  peak_tiling = 3,
  fdr_cutoff = 0.01,
  merge_peaks = c("all", "group", "none")
)

Arguments

fragments

IterableFragments object

chromosome_sizes

Chromosome start and end coordinates given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

chr, start, end: genomic position

See read_ucsc_chrom_sizes().

cell_groups

Grouping vector with one entry per cell in fragments, e.g. cluster IDs

effective_genome_size

(Optional) effective genome size for poisson background rate estimation. See deeptools for values for common genomes. Defaults to sum of chromosome sizes, which overestimates peak significance

peak_width

Width of candidate peaks

peak_tiling

Number of candidate peaks overlapping each base of genome. E.g. peak_width = 300 and peak_tiling = 3 results in candidate peaks of 300bp spaced 100bp apart

fdr_cutoff

Adjusted p-value significance cutoff

merge_peaks

How to merge significant peaks with merge_peaks_iterative()

"all" Merge the full set of peaks
"group" Merge peaks within each group
"none" Don't perform any merging

Value

tibble with peak calls and the following columns:

chr, start, end: genome coordinates
group: group ID that this peak was identified in
p_val, q_val: Poission p-value and BH-corrected p-value
enrichment: Enrichment of counts in this peak compared to a genome-wide background

Details

Peak calling steps:

Estimate the genome-wide expected insertions per tile based on peak_width, effective_genome_size, and per-group read counts
Tile the genome with nonoverlapping tiles of size peak_width
For each tile and group, calculate p_value based on a Poisson model
Compute adjusted p-values using BH method and using the total number of tiles as the number of hypotheses tested.
Repeat steps 2-4 peak_tiling times, with evenly spaced offsets
If merge_peaks is "all" or "group": use merge_peaks_iterative() within each group to keep only the most significant of the overlapping candidate peaks
If merge_peaks is "all", perform a final round of merge_peaks_iterative(), prioritizing each peak by its within-group significance rank

Examples

## Prep data
reference_dir <- file.path(tempdir(), "references")
frags <- get_demo_frags() 
## Remove blacklist regions from fragments
blacklist <- read_encode_blacklist(reference_dir, genome="hg38")
frags_filter_blacklist <- select_regions(frags, blacklist, invert_selection = TRUE)
chrom_sizes <- read_ucsc_chrom_sizes(reference_dir, genome="hg38") %>% dplyr::filter(chr %in% c("chr4", "chr11"))


## Call peaks
call_peaks_tile(frags_filter_blacklist, chrom_sizes, effective_genome_size = 2.8e9)
#> # A tibble: 73,160 × 7
#>    chr       start       end group p_val q_val enrichment
#>    <fct>     <int>     <int> <chr> <dbl> <dbl>      <dbl>
#>  1 chr11  65615400  65615600 all       0     0      6764.
#>  2 chr4    2262266   2262466 all       0     0      6422.
#>  3 chr11 119057200 119057400 all       0     0      6188.
#>  4 chr11    695133    695333 all       0     0      6180.
#>  5 chr11   2400400   2400600 all       0     0      6166.
#>  6 chr4    1346933   1347133 all       0     0      6109.
#>  7 chr11   3797600   3797800 all       0     0      6017.
#>  8 chr11  64878600  64878800 all       0     0      5948.
#>  9 chr11  57667733  57667933 all       0     0      5946.
#> 10 chr11  83156933  83157133 all       0     0      5913.
#> # ℹ 73,150 more rows