Calling peaks from a pre-set list of tiles can be much faster than using
dedicated peak-calling software like macs3
. The resulting peaks are less
precise in terms of exact coordinates, but should be sufficient for most
analyses.
Arguments
- fragments
IterableFragments object
- chromosome_sizes
Chromosome start and end coordinates given as GRanges, data.frame, or list. See
help("genomic-ranges-like")
for details on format and coordinate systems. Required attributes:chr
,start
,end
: genomic position
- cell_groups
Grouping vector with one entry per cell in fragments, e.g. cluster IDs
- effective_genome_size
(Optional) effective genome size for poisson background rate estimation. See deeptools for values for common genomes. Defaults to sum of chromosome sizes, which overestimates peak significance
- peak_width
Width of candidate peaks
- peak_tiling
Number of candidate peaks overlapping each base of genome. E.g. peak_width = 300 and peak_tiling = 3 results in candidate peaks of 300bp spaced 100bp apart
- fdr_cutoff
Adjusted p-value significance cutoff
- merge_peaks
How to merge significant peaks with
merge_peaks_iterative()
"all"
Merge the full set of peaks"group"
Merge peaks within each group"none"
Don't perform any merging
Value
tibble with peak calls and the following columns:
chr
,start
,end
: genome coordinatesgroup
: group ID that this peak was identified inp_val
,q_val
: Poission p-value and BH-corrected p-valueenrichment
: Enrichment of counts in this peak compared to a genome-wide background
Details
Peak calling steps:
Estimate the genome-wide expected insertions per tile based on
peak_width
,effective_genome_size
, and per-group read countsTile the genome with nonoverlapping tiles of size peak_width
For each tile and group, calculate p_value based on a Poisson model
Compute adjusted p-values using BH method and using the total number of tiles as the number of hypotheses tested.
Repeat steps 2-4
peak_tiling
times, with evenly spaced offsetsIf
merge_peaks
is "all" or "group": usemerge_peaks_iterative()
within each group to keep only the most significant of the overlapping candidate peaksIf
merge_peaks
is "all", perform a final round ofmerge_peaks_iterative()
, prioritizing each peak by its within-group significance rank
Examples
## Prep data
reference_dir <- file.path(tempdir(), "references")
frags <- get_demo_frags()
## Remove blacklist regions from fragments
blacklist <- read_encode_blacklist(reference_dir, genome="hg38")
frags_filter_blacklist <- select_regions(frags, blacklist, invert_selection = TRUE)
chrom_sizes <- read_ucsc_chrom_sizes(reference_dir, genome="hg38") %>% dplyr::filter(chr %in% c("chr4", "chr11"))
## Call peaks
call_peaks_tile(frags_filter_blacklist, chrom_sizes, effective_genome_size = 2.8e9)
#> # A tibble: 73,160 × 7
#> chr start end group p_val q_val enrichment
#> <fct> <int> <int> <chr> <dbl> <dbl> <dbl>
#> 1 chr11 65615400 65615600 all 0 0 6764.
#> 2 chr4 2262266 2262466 all 0 0 6422.
#> 3 chr11 119057200 119057400 all 0 0 6188.
#> 4 chr11 695133 695333 all 0 0 6180.
#> 5 chr11 2400400 2400600 all 0 0 6166.
#> 6 chr4 1346933 1347133 all 0 0 6109.
#> 7 chr11 3797600 3797800 all 0 0 6017.
#> 8 chr11 64878600 64878800 all 0 0 5948.
#> 9 chr11 57667733 57667933 all 0 0 5946.
#> 10 chr11 83156933 83157133 all 0 0 5913.
#> # ℹ 73,150 more rows