dotools_py.pp.quality_control


dotools_py.pp.quality_control(adata, batch_key, min_genes_in_cell=300, min_cells_with_genes=5, cut_mt=5, n_reads=10000, min_counts=None, max_counts=None, min_genes=None, max_genes=None, low_quantile=None, high_quantile=None, remove_doublets=False, doublet_tool='scDblFinder', normalization_method='LogNormalisation', log_data=True, metrics_patterns=('mt-', ('rbs', 'rpl')), metrics_names=('mt', 'ribo'), random_state=0, technology='snrna')

Basic quality control for single-cell/single-nucleus RNA-seq and spatial transcriptomics data.

For each sample in an AnnData object, several quality and filtering steps are applied:

  • Filter genes expressed in a low number of cells.

  • Filter cells with a low number of genes.

  • Filter cells with high mitochondrial content (recommended: 5% for scRNA, 3% for snRNA).

  • Filter cells based on UMI counts (nUMI) and number of features using two modes:
    1. Absolute filtering: sets absolute minimum/maximum values for UMI counts and features.

    2. Quantile filtering: removes cells below the lower quantile or above the upper quantile.

  • Remove doublets using scDblFinder, Scrublet, or DoubletDetection.
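The first three filtering steps can be illustrated with a minimal NumPy sketch on a toy count matrix. This is not the library's implementation: the toy data, the inclusive comparisons, and the prefix matching on lowercased gene names are assumptions made for illustration, and the thresholds are shrunk to fit the toy data.

```python
import numpy as np

# Toy raw-count matrix: 6 cells (rows) x 5 genes (columns). Illustrative data only.
gene_names = np.array(["mt-Co1", "Rpl3", "Actb", "Gapdh", "Hbb-bs"])
counts = np.array([
    [0, 2, 5, 3, 0],
    [9, 0, 1, 0, 0],
    [1, 4, 6, 2, 1],
    [0, 0, 0, 1, 0],
    [0, 3, 4, 5, 0],
    [1, 1, 8, 2, 0],
])

# Thresholds named after the function's parameters, scaled down for the toy data.
min_cells_with_genes = 2
min_genes_in_cell = 3
cut_mt = 5.0  # maximum mitochondrial percentage

# Step 1: keep genes expressed in enough cells.
cells_per_gene = (counts > 0).sum(axis=0)
keep_genes = cells_per_gene >= min_cells_with_genes
counts = counts[:, keep_genes]
gene_names = gene_names[keep_genes]

# Step 2: keep cells with enough detected genes.
genes_per_cell = (counts > 0).sum(axis=1)
keep_cells = genes_per_cell >= min_genes_in_cell

# Step 3: keep cells whose mitochondrial percentage does not exceed cut_mt.
# Assumption: mitochondrial genes are those whose lowercased name starts with "mt-".
is_mt = np.char.startswith(np.char.lower(gene_names), "mt-")
total_per_cell = counts.sum(axis=1)
pct_mt = 100.0 * counts[:, is_mt].sum(axis=1) / np.maximum(total_per_cell, 1)
keep_cells &= pct_mt <= cut_mt

filtered = counts[keep_cells]
kept_cell_indices = np.flatnonzero(keep_cells)
print(kept_cell_indices, filtered.shape)
```

In the function itself these steps are applied per sample (see batch_key), which the sketch omits for brevity.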

Note

This function reproduces the quality control steps of dotools_py.pp.importer_py() but takes an AnnData object as input. It assumes that adata.X contains raw counts.

Warning

Depending on the technology, some steps are omitted or adapted.

Parameters:
adata AnnData

Annotated data matrix with raw counts in adata.X.

batch_key str

Column in adata.obs with sample information.

min_genes_in_cell int (default: 300)

Minimum number of genes per cell.

min_cells_with_genes int (default: 5)

Minimum number of cells expressing a gene.

cut_mt int (default: 5)

Maximum percentage of mitochondrial counts per cell.

n_reads int (default: 10000)

Target sum after normalization per cell.

min_counts int | None (default: None)

Minimum number of counts per cell.

max_counts int | None (default: None)

Maximum number of counts per cell.

min_genes int | None (default: None)

Minimum number of genes per cell.

max_genes int | None (default: None)

Maximum number of genes per cell.

low_quantile int | None (default: None)

Low quantile to filter cells based on counts.

high_quantile int | None (default: None)

Upper quantile to filter cells based on counts.
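The difference between the absolute parameters (min_counts, max_counts, min_genes, max_genes) and the quantile parameters (low_quantile, high_quantile) can be sketched as follows. The helper names are hypothetical, and the sketch assumes quantiles are given as fractions between 0 and 1 with inclusive bounds; the scale and comparison rules the function actually uses are not stated above.

```python
import numpy as np

# Illustrative total counts per cell (toy data).
total_counts = np.array([120, 450, 800, 950, 1100, 1300, 1500, 2100, 5200, 15000])


def absolute_mask(values, min_value=None, max_value=None):
    """Keep values inside fixed bounds; a bound of None is ignored."""
    mask = np.ones(values.shape, dtype=bool)
    if min_value is not None:
        mask &= values >= min_value
    if max_value is not None:
        mask &= values <= max_value
    return mask


def quantile_mask(values, low_q=None, high_q=None):
    """Keep values between data-derived quantile cutoffs (fractions in [0, 1])."""
    mask = np.ones(values.shape, dtype=bool)
    if low_q is not None:
        mask &= values >= np.quantile(values, low_q)
    if high_q is not None:
        mask &= values <= np.quantile(values, high_q)
    return mask


abs_keep = absolute_mask(total_counts, min_value=500, max_value=6000)
q_keep = quantile_mask(total_counts, low_q=0.1, high_q=0.9)
print(abs_keep.sum(), q_keep.sum())
```

Absolute thresholds are the same for every sample, whereas quantile cutoffs adapt to each sample's distribution, which can matter when sequencing depth differs between samples.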

remove_doublets bool (default: False)

Identify and remove doublets.

doublet_tool Literal['scDblFinder', 'DoubletDetection', 'Scrublet'] (default: 'scDblFinder')

Method to use for the removal of doublets.

normalization_method Literal['LogNormalisation', 'PearsonResiduals'] (default: 'LogNormalisation')

Normalization method to use.

log_data bool (default: True)

Whether to log-transform the data after normalization.
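As a rough illustration of how n_reads and log_data interact, the sketch below scales each cell to a common total and then applies log1p. It assumes that 'LogNormalisation' follows this common total-count scaling convention; the library's actual implementation may differ, and 'PearsonResiduals' is not covered.

```python
import numpy as np

# Toy raw counts: 3 cells x 3 genes (illustrative data only).
counts = np.array([
    [1.0, 3.0, 6.0],
    [0.0, 5.0, 5.0],
    [2.0, 2.0, 0.0],
])
n_reads = 10_000  # target sum per cell
log_data = True

# Scale every cell so that its counts sum to n_reads.
totals = counts.sum(axis=1, keepdims=True)
normalized = counts / totals * n_reads

# Optionally apply log(1 + x) to compress the dynamic range.
if log_data:
    normalized = np.log1p(normalized)

print(normalized.round(2))
```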

metrics_patterns tuple (default: ('mt-', ('rbs', 'rpl')))

Patterns used to annotate features. Use mt- for mitochondrial, rps and rpl for ribosomal, and ^hb*- for hemoglobin. Patterns should be written in lowercase.

metrics_names list (default: ('mt', 'ribo'))

Names for the patterns. Use “mt” for mitochondrial, “ribo” for ribosomal, and “hb” for hemoglobin.
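The pairing of metrics_patterns with metrics_names can be sketched as below. The helper annotate_features is hypothetical, and the sketch assumes each pattern (or tuple of patterns) is matched as a prefix against lowercased gene names; how the function matches patterns internally is not stated above. The ribosomal prefixes follow the description (rps and rpl).

```python
metrics_patterns = ("mt-", ("rps", "rpl"))
metrics_names = ("mt", "ribo")
gene_names = ["mt-Nd1", "Rps6", "Rpl13", "Actb", "MT-CO1"]


def annotate_features(genes, patterns, names):
    """Return one boolean list per metric name, flagging matching genes."""
    flags = {}
    for name, pattern in zip(names, patterns):
        # A single string and a tuple of strings are handled the same way.
        prefixes = (pattern,) if isinstance(pattern, str) else tuple(pattern)
        # Gene names are lowercased first, so patterns must be lowercase too.
        flags[name] = [gene.lower().startswith(prefixes) for gene in genes]
    return flags


flags = annotate_features(gene_names, metrics_patterns, metrics_names)
print(flags)
```

Each entry of metrics_names labels the entry at the same position in metrics_patterns, so the two should have the same length.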

technology Literal['snrna', 'scrna', 'visium', 'xenium'] (default: 'snrna')

Type of the input dataset.

random_state int (default: 0)

Seed for the random number generator.

Return type:

AnnData

Returns:

A processed AnnData object.
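A call might look like the sketch below, which only uses keyword arguments from the signature above. The column name "sample", the chosen thresholds, and the wrapper run_quality_control are illustrative assumptions; the import is placed inside the wrapper so that the parameter dictionary can be inspected without dotools_py installed.

```python
# Keyword arguments taken from the documented signature; values are illustrative.
qc_params = dict(
    batch_key="sample",          # hypothetical column in adata.obs
    min_genes_in_cell=300,
    min_cells_with_genes=5,
    cut_mt=3,                    # 3% is the recommendation given above for snRNA
    remove_doublets=True,
    doublet_tool="Scrublet",
    normalization_method="LogNormalisation",
    log_data=True,
    technology="snrna",
    random_state=0,
)


def run_quality_control(adata):
    """Run the documented QC function on an AnnData object holding raw counts."""
    # Lazy import: dotools_py is only needed when the function is actually called.
    from dotools_py.pp import quality_control

    return quality_control(adata, **qc_params)
```

The input object should hold raw counts in adata.X, as noted above, and the result is the processed AnnData object returned by the function.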