Export pseudobulk bed files as input for MACS, then run MACS and read the output peaks as a tibble. Each step can can be run independently, allowing for quickly re-loading the results of an already completed call, or running MACS externally (e.g. via cluster job submisison) for increased parallelization. See details for more information.
Usage
call_peaks_macs(
fragments,
path,
cell_groups = rlang::rep_along(cellNames(fragments), "all"),
effective_genome_size = 2.9e+09,
insertion_mode = c("both", "start_only", "end_only"),
step = c("all", "prep-inputs", "run-macs", "read-outputs"),
macs_executable = NULL,
additional_params =
"--call-summits --keep-dup all --shift -75 --extsize 150 --nomodel --nolambda",
verbose = FALSE,
threads = 1
)
Arguments
- fragments
IterableFragments object
- path
(string) Parent directory to store MACS inputs and outputs. Inputs are stored in
<path>/input/
and outputs in<path>/output/<group>/
. See "File format" in details- cell_groups
Grouping vector with one entry per cell in fragments, e.g. cluster IDs
- effective_genome_size
(numeric) Effective genome size for MACS. Default is
2.9e9
following MACS default for GRCh38. See deeptools for values for other common genomes.- insertion_mode
(string) Which fragment ends to use for coverage calculation. One of
both
,start_only
, orend_only
.- step
(string) Which step to run. One of
all
,prep-inputs
,run-macs
,read-outputs
. Ifprep-inputs
, create the input bed files for macs, and provides a shell script per cell group with the command to run macs. Ifrun-macs
, also run bash scripts to execute macs. Ifread-outputs
, read the outputs into tibbles.- macs_executable
(string) Path to either MACS2/3 executable. Default (
NULL
) will autodetect from PATH.- additional_params
(string) Additional parameters to pass to MACS2/3.
- verbose
(bool) Whether to provide verbose output from MACS. Only used if step is
run-macs
orall
.- threads
(int) Number of threads to use.
Value
If step is
prep-inputs
, return script paths for each cell group given as a character vector.If step is
run-macs
, returnNULL
.If step is
read-outputs
orall
, returns a tibble with all the peaks from each cell group concatenated. Columnns arechr
,start
,end
,group
,name
,score
,strand
,fold_enrichment
,log10_pvalue
,log10_qvalue
,summit_offset
Details
File format:
Inputs are written such that a bed file used as input into MACS, as well as a shell file containing a call to MACS are written for each cell group.
Bed files containing
chr
,start
, andend
coordinates of insertions are written at<path>/input/<group>.bed.gz
.Shell commands to run MACS manually are written at
<path>/input/<group>.sh
.
Outputs are written to an output directory with a subdirectory for each cell group. Each cell group's output directory contains a file for narrowPeaks, peaks, and summits.
NarrowPeaks are written at
<path>/output/<group>/<group>_peaks.narrowPeak
.Peaks are written at
<path>/output/<group>/<group>_peaks.xls
.Summits are written at
<path>/output/<group>/<group>_summits.bed
.
Only the narrowPeaks file is read into a tibble and returned. For more information on outputs from MACS, visit the MACS docs
Performance:
Running on a 2600 cell dataset and taking both start and end insertions into account, written input bedfiles and MACS outputs used 364 MB and 158 MB of space respectively. With 4 threads, running this function end to end took 74 seconds, with 61 of those seconds spent on running MACS.
Running MACS manually:
To run MACS manually, you will first run call_peaks_macs()
with step="prep-inputs
. Then, manually run all of the
shell scripts generated at <path>/input/<group>.sh
. Finally, run call_peaks_macs()
again with the same original arguments, but
setting step="read-outputs"
.