Skip to contents

Calculate ArchR-compatible per-cell QC statistics

Usage

qc_scATAC(fragments, genes, blacklist)

Arguments

fragments

IterableFragments object

genes

Gene coordinates given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

blacklist

Blacklisted regions given as GRanges, data.frame, or list. See help("genomic-ranges-like") for details on format and coordinate systems. Required attributes:

  • chr, start, end: genomic position

Value

data.frame with QC data

Details

This implementation mimics ArchR's default parameters. For uses requiring more flexibility to tweak default parameters, the best option is to re-implement this function with required changes. Output columns of data.frame:

  • cellName: cell name for each cell

  • nFrags: number of fragments per cell

  • subNucleosomal, monoNucleosomal, multiNucleosomal: number of fragments of size 1-146bp, 147-254bp, and 255bp + respectively. equivalent to ArchR's nMonoFrags, nDiFrags, nMultiFrags respectively

  • TSSEnrichment: AvgInsertInTSS / max(AvgInsertFlankingTSS, 0.1), where AvgInsertInTSS is ReadsInTSS / 101 (window size), and AvgInsertFlankingTSS is ReadsFlankingTSS / (100*2) (window size). The max(0.1) ensures that very low-read cells do not get assigned spuriously high TSSEnrichment.

  • ReadsInPromoter: Number of reads from 2000bp upstream of TSS to 101bp downstream of TSS

  • ReadsInBlacklist: Number of reads in the provided blacklist region

  • ReadsInTSS: Number of reads overlapping the 101bp centered around each TSS

  • ReadsFlankingTSS: Number of reads overlapping 1901-2000bp +/- each TSS

Differences from ArchR: Note that ArchR by default uses a different set of annotations to derive TSS sites and promoter sites. This function uses just one annotation for gene start+end sites, so must be called twice to exactly re-calculate the ArchR QC stats.

ArchR's PromoterRatio and BlacklistRatio are not included in the output, as they can be easily calculated from ReadsInPromoter / nFrags and ReadsInBlacklist / nFrags. Similarly, ArchR's NucleosomeRatio can be calculated as (monoNucleosomal + multiNucleosomal) / subNucleosomal.

Examples

## Prep data
frags <- get_demo_frags(subset = FALSE)
reference_dir <- file.path(tempdir(), "references")
genes <- read_gencode_transcripts(
  reference_dir, 
  release="42", 
  transcript_choice="MANE_Select",
  annotation_set = "basic",
  features="transcript"
)
blacklist <- read_encode_blacklist(reference_dir, genome = "hg38")


## Run qc
head(qc_scATAC(frags, genes, blacklist))
#> # A tibble: 6 × 10
#>   cellName  TSSEnrichment nFrags subNucleosomal monoNucleosomal multiNucleosomal
#>   <chr>             <dbl>  <int>          <int>           <int>            <int>
#> 1 TTTAGCAA…          45.1  16363           8069            5588             2706
#> 2 AGCCGGTT…          30.9  33313          15855           11868             5590
#> 3 TGATTAGT…          41.9  11908           6103            3817             1988
#> 4 ATTGACTC…          43.9  13075           6932            4141             2002
#> 5 CGTTAGGT…          31.5  14874           6833            5405             2636
#> 6 AAACCGCG…          41.9  30141          15085           10199             4857
#> # ℹ 4 more variables: ReadsInTSS <dbl>, ReadsFlankingTSS <dbl>,
#> #   ReadsInPromoter <dbl>, ReadsInBlacklist <dbl>