Calculate ArchR-compatible per-cell QC statistics
Arguments
- fragments
IterableFragments object
- genes
Gene coordinates given as GRanges, data.frame, or list. See
help("genomic-ranges-like")
for details on format and coordinate systems. Required attributes:chr
,start
,end
: genomic position
- blacklist
Blacklisted regions given as GRanges, data.frame, or list. See
help("genomic-ranges-like")
for details on format and coordinate systems. Required attributes:chr
,start
,end
: genomic position
Details
This implementation mimics ArchR's default parameters. For uses requiring more flexibility to tweak default parameters, the best option is to re-implement this function with required changes. Output columns of data.frame:
cellName
: cell name for each cellnFrags
: number of fragments per cellsubNucleosomal
,monoNucleosomal
,multiNucleosomal
: number of fragments of size 1-146bp, 147-254bp, and 255bp + respectively. equivalent to ArchR's nMonoFrags, nDiFrags, nMultiFrags respectivelyTSSEnrichment
:AvgInsertInTSS / max(AvgInsertFlankingTSS, 0.1)
, whereAvgInsertInTSS
isReadsInTSS / 101
(window size), andAvgInsertFlankingTSS
isReadsFlankingTSS / (100*2)
(window size). Themax(0.1)
ensures that very low-read cells do not get assigned spuriously high TSSEnrichment.ReadsInPromoter
: Number of reads from 2000bp upstream of TSS to 101bp downstream of TSSReadsInBlacklist
: Number of reads in the provided blacklist regionReadsInTSS
: Number of reads overlapping the 101bp centered around each TSSReadsFlankingTSS
: Number of reads overlapping 1901-2000bp +/- each TSS
Differences from ArchR: Note that ArchR by default uses a different set of annotations to derive TSS sites and promoter sites. This function uses just one annotation for gene start+end sites, so must be called twice to exactly re-calculate the ArchR QC stats.
ArchR's PromoterRatio
and BlacklistRatio
are not included in the output, as they can be easily calculated
from ReadsInPromoter / nFrags
and ReadsInBlacklist / nFrags
. Similarly, ArchR's NucleosomeRatio
can be calculated
as (monoNucleosomal + multiNucleosomal) / subNucleosomal
.
Examples
## Prep data
frags <- get_demo_frags(subset = FALSE)
reference_dir <- file.path(tempdir(), "references")
genes <- read_gencode_transcripts(
reference_dir,
release="42",
transcript_choice="MANE_Select",
annotation_set = "basic",
features="transcript"
)
blacklist <- read_encode_blacklist(reference_dir, genome = "hg38")
## Run qc
head(qc_scATAC(frags, genes, blacklist))
#> # A tibble: 6 × 10
#> cellName TSSEnrichment nFrags subNucleosomal monoNucleosomal multiNucleosomal
#> <chr> <dbl> <int> <int> <int> <int>
#> 1 TTTAGCAA… 45.1 16363 8069 5588 2706
#> 2 AGCCGGTT… 30.9 33313 15855 11868 5590
#> 3 TGATTAGT… 41.9 11908 6103 3817 1988
#> 4 ATTGACTC… 43.9 13075 6932 4141 2002
#> 5 CGTTAGGT… 31.5 14874 6833 5405 2636
#> 6 AAACCGCG… 41.9 30141 15085 10199 4857
#> # ℹ 4 more variables: ReadsInTSS <dbl>, ReadsFlankingTSS <dbl>,
#> # ReadsInPromoter <dbl>, ReadsInBlacklist <dbl>