dotools_py.pp.quality_control

dotools_py.pp.quality_control(adata, batch_key, min_genes_in_cell=300, min_cells_with_genes=5, cut_mt=5, n_reads=10000, min_counts=None, max_counts=None, min_genes=None, max_genes=None, low_quantile=None, high_quantile=None, remove_doublets=False, doublet_tool='scDblFinder', normalization_method='LogNormalisation', log_data=True, metrics_patterns=('mt-', ('rbs', 'rpl')), metrics_names=('mt', 'ribo'), random_state=0, technology='snrna')

Basic quality control for snRNA-seq / Spatial Transcriptomics.

For each sample in an AnnData object, several quality and filtering steps are applied:

Filter genes expressed in a low number of cells.
Filter cells with a low number of genes.
Filter cells with high mitochondrial content (recommended: 5% for scRNA, 3% for snRNA).
Filter cells based on nUMI and features using two modes:
1. Absolute filtering: Sets absolute values for min/max UMI and features.
2. Quantile filtering: Filters top/lower quantiles.
Remove doublets using scDblFinder, Scrublet, or DoubletDetection.
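The first three steps above can be sketched on a toy count matrix. This is only an illustration in plain Python, not the library's implementation: the gene names, the counts, the helper `basic_qc`, the order in which the mitochondrial percentage is computed, and the `mt-` prefix match are all assumptions made for the example.

```python
# Toy count matrix: rows are cells, columns are genes (hypothetical data).
genes = ["mt-Nd1", "Actb", "Gapdh", "Rare1"]
counts = [
    [1, 10, 9, 0],   # typical cell
    [8, 1, 1, 0],    # high mitochondrial fraction
    [0, 3, 0, 0],    # only one gene detected
    [0, 7, 12, 1],   # typical cell
]


def basic_qc(counts, genes, min_cells_with_genes, min_genes_in_cell, cut_mt):
    """Hypothetical helper: return (kept cell indices, kept gene names)."""
    # Step 1: keep genes detected (count > 0) in enough cells.
    keep_gene = [
        sum(row[j] > 0 for row in counts) >= min_cells_with_genes
        for j in range(len(genes))
    ]
    kept_genes = [g for g, keep in zip(genes, keep_gene) if keep]

    kept_cells = []
    for i, row in enumerate(counts):
        kept = [c for c, keep in zip(row, keep_gene) if keep]
        # Step 2: number of genes detected in this cell.
        n_genes = sum(c > 0 for c in kept)
        # Step 3: percentage of counts coming from mitochondrial genes.
        total = sum(kept)
        mt = sum(c for c, g in zip(kept, kept_genes) if g.lower().startswith("mt-"))
        pct_mt = 100 * mt / total if total else 0.0
        if n_genes >= min_genes_in_cell and pct_mt <= cut_mt:
            kept_cells.append(i)
    return kept_cells, kept_genes


# Thresholds are scaled down to suit the toy data.
kept_cells, kept_genes = basic_qc(
    counts, genes, min_cells_with_genes=2, min_genes_in_cell=2, cut_mt=10
)
print(kept_cells, kept_genes)
```

With these toy thresholds, the gene seen in a single cell is dropped, and the cells with a high mitochondrial fraction or a single detected gene are removed.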

Note

This function reproduces the quality control steps of dotools_py.pp.importer_py() but takes an AnnData object as input. It assumes that adata.X contains raw counts.

Warning

Depending on the technology, some steps are omitted or adapted.

Parameters:

adata AnnData: Annotated data matrix with raw counts in adata.X.
batch_key str: Column in adata.obs with sample information.
min_genes_in_cell int (default: 300): Minimum number of genes per cell.
min_cells_with_genes int (default: 5): Minimum number of cells expressing a gene.
cut_mt int (default: 5): Maximum percentage of mitochondrial genes per cell.
n_reads int (default: 10000): Target sum after normalization per cell.
min_counts int | None (default: None): Minimum number of counts per cell.
max_counts int | None (default: None): Maximum number of counts per cell.
min_genes int | None (default: None): Minimum number of genes per cell.
max_genes int | None (default: None): Maximum number of genes per cell.
low_quantile int | None (default: None): Lower quantile to filter cells based on counts.
high_quantile int | None (default: None): Upper quantile to filter cells based on counts.
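The two filtering modes for counts can be compared with a small sketch. It assumes the quantiles are expressed as percentages, which the `int` type hints at but the documentation does not confirm. The helper names, the nearest-rank quantile definition, and the inclusive bounds are choices made for this illustration only.

```python
import math


def nearest_rank_quantile(values, percent):
    """Return the value at the given percentile using the nearest-rank method."""
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100, clamped to at least rank 1.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def filter_by_counts(total_counts, min_counts=None, max_counts=None,
                     low_quantile=None, high_quantile=None):
    """Hypothetical helper: indices of cells whose total counts pass the filter."""
    lower = -math.inf if min_counts is None else min_counts
    upper = math.inf if max_counts is None else max_counts
    # Quantile mode derives the bounds from the data instead of fixed values.
    if low_quantile is not None:
        lower = max(lower, nearest_rank_quantile(total_counts, low_quantile))
    if high_quantile is not None:
        upper = min(upper, nearest_rank_quantile(total_counts, high_quantile))
    return [i for i, c in enumerate(total_counts) if lower <= c <= upper]


# Hypothetical total counts per cell.
total_counts = [120, 900, 1500, 2100, 2600, 3000, 3400, 4100, 5200, 40000]

absolute = filter_by_counts(total_counts, min_counts=500, max_counts=10000)
quantile = filter_by_counts(total_counts, low_quantile=20, high_quantile=80)
print(absolute)
print(quantile)
```

Absolute thresholds are easy to reason about across samples, while quantile thresholds adapt to each sample's count distribution.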
remove_doublets bool (default: False): Identify and remove doublets.
doublet_tool Literal['scDblFinder', 'DoubletDetection', 'Scrublet'] (default: 'scDblFinder'): Method to use for the removal of doublets.
normalization_method Literal['LogNormalisation', 'PearsonResiduals'] (default: 'LogNormalisation'): Normalization method to use.
log_data bool (default: True): Whether to log-transform the data after normalization.
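As a rough illustration of how n_reads and log_data interact, the sketch below assumes that 'LogNormalisation' means scaling each cell to a total of n_reads and then applying log(1 + x). That is the common convention (for example Scanpy's normalize_total followed by log1p), but the documentation does not spell it out. The helper `log_normalise` is hypothetical.

```python
import math


def log_normalise(cell_counts, n_reads=10000, log_data=True):
    """Hypothetical helper: scale one cell to n_reads, optionally log1p."""
    total = sum(cell_counts)
    if total == 0:
        # An empty cell stays all zeros; avoids division by zero.
        return [0.0 for _ in cell_counts]
    scaled = [c * n_reads / total for c in cell_counts]
    return [math.log1p(v) for v in scaled] if log_data else scaled


cell = [1, 3, 0, 6]  # hypothetical raw counts for one cell
scaled = log_normalise(cell, log_data=False)
logged = log_normalise(cell)
print(scaled)
print(logged)
```

After scaling, every cell sums to n_reads, so cells with different sequencing depth become comparable. Zero counts remain zero after the log transform.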
metrics_patterns tuple (default: ('mt-', ('rbs', 'rpl'))): Patterns used to annotate features. Use mt- for mitochondrial, rps and rpl for ribosomal, and ^hb*- for hemoglobin. Patterns should be written in lowercase.
metrics_names list (default: ('mt', 'ribo')): Names for the patterns. Use “mt” for mitochondrial, “ribo” for ribosomal, and “hb” for hemoglobin.
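The pairing of metrics_patterns with metrics_names can be sketched as follows. The example assumes that each pattern is matched as a prefix of the lowercased gene name and that a nested tuple groups several prefixes under one name. The library's actual matching rule is not documented here, and `annotate_features` and the gene names are hypothetical.

```python
def annotate_features(gene_names, metrics_patterns, metrics_names):
    """Hypothetical helper: flag genes per metric by lowercase prefix match."""
    annotations = {}
    for name, pattern in zip(metrics_names, metrics_patterns):
        # str.startswith accepts a single prefix or a tuple of prefixes.
        annotations[name] = [g.lower().startswith(pattern) for g in gene_names]
    return annotations


gene_names = ["mt-Co1", "Rps6", "Rpl13", "Actb"]  # hypothetical mouse genes
flags = annotate_features(
    gene_names,
    metrics_patterns=("mt-", ("rps", "rpl")),
    metrics_names=("mt", "ribo"),
)
print(flags)
```

Each entry in metrics_names labels the pattern at the same position, so both arguments should have the same length.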
technology Literal['snrna', 'scrna', 'visium', 'xenium'] (default: 'snrna'): Type of the input dataset.
random_state int (default: 0): Seed for the random number generator.

Return type:

AnnData

Returns:

Returns a processed AnnData object.
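A call might look like the sketch below. The keyword names come from the signature documented above, while the values are illustrative. The column name "sample" is hypothetical, and the sketch assumes that `import dotools_py` exposes the `pp` submodule.

```python
# Keyword arguments taken from the documented signature; values are illustrative.
qc_kwargs = {
    "batch_key": "sample",            # hypothetical column in adata.obs
    "min_genes_in_cell": 300,
    "min_cells_with_genes": 5,
    "cut_mt": 3,                      # documented recommendation for snRNA
    "remove_doublets": True,
    "doublet_tool": "Scrublet",
    "normalization_method": "LogNormalisation",
    "technology": "snrna",
    "random_state": 0,
}


def run_quality_control(adata):
    """Run QC on an AnnData object whose .X holds raw counts."""
    # Imported lazily so this sketch can be loaded without the package installed.
    import dotools_py

    return dotools_py.pp.quality_control(adata, **qc_kwargs)
```

The function returns the processed AnnData object, so the result can be assigned back, for example `adata = run_quality_control(adata)`.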
