Plot UMAP or embeddings — plot_embedding • BPCells

Plot one or more features by coloring cells in a UMAP plot.

Usage

plot_embedding(
  source,
  embedding,
  features = NULL,
  quantile_range = c(0.01, 0.99),
  randomize_order = TRUE,
  smooth = NULL,
  smooth_rounds = 3,
  gene_mapping = human_gene_mapping,
  size = NULL,
  rasterize = FALSE,
  raster_pixels = 512,
  legend_continuous = c("auto", "quantile", "value"),
  labels_quantile_range = TRUE,
  colors_continuous = c("lightgrey", "#4682B4"),
  legend_discrete = TRUE,
  labels_discrete = TRUE,
  colors_discrete = discrete_palette("stallion"),
  return_data = FALSE,
  return_plot_list = FALSE,
  apply_styling = TRUE
)

Arguments

source: Matrix, or data frame to pull features from, or a vector of feature values for a single feature. For a matrix, the features must be rows.
embedding: A matrix of dimensions cells x 2 with embedding coordinates
features: Character vector of features to plot if source is not a vector.
quantile_range: (optional) Length 2 vector giving the quantiles at which to clip the minimum and maximum color scale values, as fractions between 0 and 1. Pass NULL or NA to skip clipping.
randomize_order: If TRUE, shuffle cells to prevent overplotting biases. Can pass an integer instead to specify a random seed to use.
smooth: (optional) Sparse matrix of dimensions cells x cells with cell-cell distance weights for smoothing.
smooth_rounds: Number of multiplication rounds to apply when smoothing.
gene_mapping: An optional vector for gene name matching with match_gene_symbol(). Ignored if source is a data frame.
size: Point size for plotting
rasterize: Whether to rasterize the point drawing to speed up display in graphics programs.
raster_pixels: Number of pixels to use when rasterizing. Can provide one number for square dimensions, or two numbers for width x height.
legend_continuous: Whether to label continuous features by quantile or value. "auto" labels by quantile only when all features are continuous and quantile_range is not NULL. Quantile labeling adds text annotation listing the range of displayed values.
labels_quantile_range: Whether to add a text label with the value range of each feature when the legend is set to quantile
colors_continuous: Vector of colors to use for continuous color palette
legend_discrete: Whether to show the legend for discrete (categorical) features.
labels_discrete: Whether to add text labels at the center of each group for discrete (categorical) features.
colors_discrete: Vector of colors to use for discrete (categorical) features.
return_data: If TRUE, return the data from just before plotting rather than a plot.
return_plot_list: If TRUE, return multiple plots as a list, rather than a single plot combined using patchwork::wrap_plots()
apply_styling: If FALSE, return a plot without pretty styling applied.
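
The clipping controlled by quantile_range can be illustrated with a minimal base R sketch. The helper name clip_to_quantiles is hypothetical and is not part of BPCells; the sketch only assumes that clipping means bounding values at the two requested quantiles, and the package's internal implementation may differ.

```r
# Hypothetical helper (not a BPCells function): clip values to a quantile range.
clip_to_quantiles <- function(values, quantile_range = c(0.01, 0.99)) {
  # NULL or NA means "skip clipping", mirroring the argument description
  if (is.null(quantile_range) || anyNA(quantile_range)) {
    return(values)
  }
  # Lower and upper cutoffs as plain numbers (R's default quantile type)
  limits <- stats::quantile(values, probs = quantile_range,
                            na.rm = TRUE, names = FALSE)
  # Bound every value to the interval [limits[1], limits[2]]
  pmin(pmax(values, limits[1]), limits[2])
}

x <- 0:100
clipped <- clip_to_quantiles(x, c(0.1, 0.9))
range(clipped)
```

Clipping like this keeps a handful of extreme cells from stretching the color scale so far that the rest of the cells look uniform.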

Value

By default, returns a ggplot2 object with all the requested features plotted in a grid. If return_data or return_plot_list is TRUE, the return value will match that argument.

Details

Smoothing

Smoothing is performed as follows: first, the smoothing matrix is normalized so the sum of incoming weights to every cell is 1. Then, the raw data values are repeatedly multiplied by the smoothing matrix and re-scaled so the average value stays the same.
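
The smoothing steps above can be sketched in base R with a small dense matrix. This is an illustration only, not the BPCells implementation: the helper name smooth_values is hypothetical, and the sketch assumes that row i of the weight matrix holds the weights flowing into cell i, which the description above does not specify.

```r
# Hypothetical helper (not a BPCells function): smooth one feature's values.
# Assumption: weights[i, j] is the weight flowing from cell j into cell i.
smooth_values <- function(values, weights, rounds = 3) {
  # Normalize so the incoming weights of every cell (each row) sum to 1
  weights <- weights / rowSums(weights)
  target_mean <- mean(values)
  for (i in seq_len(rounds)) {
    # One round: replace each value with a weighted average of its neighbors
    values <- as.numeric(weights %*% values)
    # Re-scale so the average value stays the same as in the raw data
    # (this sketch does not handle an all-zero feature, where the mean is 0)
    values <- values * target_mean / mean(values)
  }
  values
}

# Three cells in a chain, each also weighted with itself
w <- matrix(c(1, 1, 0,
              1, 1, 1,
              0, 1, 1), nrow = 3, byrow = TRUE)
x <- c(0, 3, 0)
smoothed <- smooth_values(x, w, rounds = 1)
smoothed
```

In this toy input the signal starts entirely in the middle cell and spreads to its neighbors after one round, while the mean of the values is unchanged.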

Examples

set.seed(123)
mat <- get_demo_mat()
## Normalize matrix
mat_norm <- log1p(multiply_cols(mat, 1/colSums(mat)) * 10000) %>% write_matrix_memory(compress = FALSE)
## Get variable genes
stats <- matrix_stats(mat_norm, row_stats = "variance")
variable_genes <- order(stats$row_stats["variance",], decreasing=TRUE) %>% 
  head(1000) %>% 
  sort()
## Z-score normalize genes (subtract the mean, divide by the standard deviation)
mat_norm <- mat_norm[variable_genes, ]
gene_means <- stats$row_stats['mean', variable_genes]
gene_vars <- stats$row_stats['variance', variable_genes]
mat_norm <- (mat_norm - gene_means) / sqrt(gene_vars)
## Save matrix to memory
mat_norm <- mat_norm %>% write_matrix_memory(compress = FALSE)
## Run SVD
svd <- BPCells::svds(mat_norm, k = 10)
pca <- multiply_cols(svd$v, svd$d)
## Get UMAP
umap <- uwot::umap(pca)
## Get clusters
clusts <- knn_hnsw(pca, ef = 500) %>%
  knn_to_snn_graph() %>%
  cluster_graph_louvain()
#> 14:20:50 Building HNSW index with metric 'euclidean' ef = 200 M = 16 using 1 threads
#> 14:20:50 Finished building index
#> 14:20:50 Searching HNSW index with ef = 500 and 1 threads
#> 14:20:50 Finished searching


## Plot embedding
print(length(clusts))
#> [1] 2600

plot_embedding(clusts, umap)
#> Error in names(x) <- value: 'names' attribute [175] must be the same length as the vector [1]


### Can also plot by features
#plot_embedding(
#  source = mat,
#  embedding = umap,
#  features = c("MS4A1", "CD3E")
#)