dotools_py.pp.quality_control

dotools_py.pp.quality_control(adata, batch_key=None, min_genes_in_cell=300, min_cells_with_genes=5, cut_mt=5, min_counts=None, max_counts=None, min_genes=None, max_genes=None, low_quantile=None, high_quantile=None, include_rbs=True, remove_doublets=False, doublet_tool='scDblFinder', metrics=True, qc_path=None, random_state=0)

Basic quality control for sc/snRNA-seq.

For each sample in an AnnData object, several quality and filtering steps are applied:

  • Filter genes expressed in a low number of cells.

  • Filter cells with a low number of genes.

  • Filter cells with high mitochondrial content (recommended: 5% for scRNA, 3% for snRNA).

  • Filter cells based on the number of UMIs (nUMI) and features using two modes:
    1. Absolute filtering: Sets absolute minimum/maximum values for UMIs and features.

    2. Quantile filtering: Filters out cells in the lower/upper quantiles.

  • Remove doublets using scDblFinder, Scrublet, or DoubletDetection.
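The two count-filtering modes in the list above differ only in where the bounds come from. The following is a minimal standard-library sketch of that difference, not the library's implementation. The helper names (`absolute_bounds`, `quantile_bounds`, `keep_cells`), the inclusive comparisons, and the reading of quantiles as whole percentiles are all assumptions made for illustration; this page does not state which units low_quantile and high_quantile expect.

```python
from statistics import quantiles


def absolute_bounds(min_value=None, max_value=None):
    """Absolute mode: the caller supplies the bounds directly."""
    lo = float("-inf") if min_value is None else min_value
    hi = float("inf") if max_value is None else max_value
    return lo, hi


def quantile_bounds(values, low_pct=None, high_pct=None):
    """Quantile mode: bounds are derived from the data.

    Assumption: low_pct/high_pct are whole percentiles in 1..99.
    """
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    lo = float("-inf") if low_pct is None else cuts[low_pct - 1]
    hi = float("inf") if high_pct is None else cuts[high_pct - 1]
    return lo, hi


def keep_cells(values, bounds):
    """Return indices of cells whose value lies within the bounds (inclusive)."""
    lo, hi = bounds
    return [i for i, v in enumerate(values) if lo <= v <= hi]


# Toy per-cell total counts: one near-empty droplet and one extreme outlier.
total_counts = [120, 800, 950, 1000, 1100, 1200, 1300, 1500, 2000, 40000]

kept_abs = keep_cells(total_counts, absolute_bounds(min_value=500, max_value=5000))
kept_q = keep_cells(total_counts, quantile_bounds(total_counts, low_pct=10, high_pct=90))
```

On this toy data both modes drop the first and last cell. Absolute bounds are easier to report and reproduce across datasets, while quantile bounds adapt to each sample's sequencing depth.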

An Excel sheet summarizing how many cells/genes were removed at each step will be generated, along with violin plots showing the distribution of total_counts, n_genes_by_counts, and pct_mt_content before and after QC.
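To make the per-step summary concrete, the sketch below applies the first three filters to a toy count matrix and records how many genes or cells each step removes. It is an illustration in plain Python, not the function's actual code: `run_basic_filters`, the step labels, the `MT-` gene-name prefix, the step order, and the inclusive cut-off comparison are assumptions. The real function works on an AnnData object and applies the steps per sample.

```python
def run_basic_filters(counts, gene_names, min_cells_with_genes=5,
                      min_genes_in_cell=300, cut_mt=5, mt_prefix="MT-"):
    """Filter a toy matrix given as a list of cells, each a list of raw counts.

    Returns the kept cells, the kept gene names, and a per-step removal summary.
    """
    summary = []
    n_genes = len(gene_names)

    # Step 1: drop genes detected in too few cells.
    keep_g = [g for g in range(n_genes)
              if sum(1 for cell in counts if cell[g] > 0) >= min_cells_with_genes]
    counts = [[cell[g] for g in keep_g] for cell in counts]
    gene_names = [gene_names[g] for g in keep_g]
    summary.append(("genes_low_cells", n_genes - len(keep_g)))

    # Step 2: drop cells with too few detected genes.
    kept = [cell for cell in counts
            if sum(1 for v in cell if v > 0) >= min_genes_in_cell]
    summary.append(("cells_low_genes", len(counts) - len(kept)))
    counts = kept

    # Step 3: drop cells whose mitochondrial percentage exceeds cut_mt.
    # Assumption: mitochondrial genes are recognised by a name prefix.
    mt_idx = [i for i, name in enumerate(gene_names)
              if name.upper().startswith(mt_prefix)]

    def pct_mt(cell):
        total = sum(cell)
        return 100.0 * sum(cell[i] for i in mt_idx) / total if total else 0.0

    kept = [cell for cell in counts if pct_mt(cell) <= cut_mt]
    summary.append(("cells_high_mt", len(counts) - len(kept)))
    return kept, gene_names, summary


genes = ["GeneA", "GeneB", "GeneC", "MT-CO1"]
cells = [
    [5, 3, 0, 0],   # two genes detected, no mitochondrial counts
    [4, 0, 0, 0],   # only one gene detected
    [2, 2, 0, 6],   # 60% mitochondrial counts
    [10, 8, 1, 1],  # about 5% mitochondrial counts
]
kept_cells, kept_genes, summary = run_basic_filters(
    cells, genes, min_cells_with_genes=2, min_genes_in_cell=2, cut_mt=10
)
```

Here each step removes exactly one item: GeneC is detected in a single cell, the second cell expresses only one gene, and the third cell exceeds the mitochondrial cut-off.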

Note

This function reproduces the quality control steps of dotools_py.pp.importer_py() but takes an AnnData object as input. It assumes that adata.X contains raw counts.

Parameters:
adata AnnData

Annotated data matrix with raw counts in adata.X.

batch_key str | None (default: None)

Column in adata.obs with sample information.

min_genes_in_cell int (default: 300)

Minimum number of genes per cell.

min_cells_with_genes int (default: 5)

Minimum number of cells expressing a gene.

cut_mt int (default: 5)

Maximum percentage of mitochondrial content per cell.

min_counts int | None (default: None)

Minimum number of counts per cell.

max_counts int | None (default: None)

Maximum number of counts per cell.

min_genes int | None (default: None)

Minimum number of genes per cell (absolute filtering).

max_genes int | None (default: None)

Maximum number of genes per cell.

low_quantile int | None (default: None)

Lower quantile to filter cells based on counts.

high_quantile int | None (default: None)

Upper quantile to filter cells based on counts.

include_rbs bool (default: True)

Calculate statistics for ribosomal genes.

remove_doublets bool (default: False)

Identify and remove doublets.

doublet_tool Literal['scDblFinder', 'DoubletDetection', 'Scrublet'] (default: 'scDblFinder')

Method to use for the removal of doublets.

metrics bool (default: True)

Whether to compute statistics on how many cells and genes are removed at each step.

qc_path str | Path | None (default: None)

Directory where the quality control plots and metrics are saved.

random_state int (default: 0)

Seed for the random number generator.

Return type:

AnnData

Returns:

Returns a processed AnnData object.
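Example

A minimal usage sketch based only on the signature documented above. The wrapper `run_qc`, the `"batch"` column name, and the output directory are hypothetical. Whether the directory must already exist and whether the submodule is reachable through a plain `import dotools_py` are not stated on this page, so treat both as assumptions.

```python
def run_qc(adata, out_dir="qc_output"):
    """Hypothetical wrapper around dotools_py.pp.quality_control.

    Expects an AnnData object with raw counts in adata.X and a sample
    column in adata.obs (assumed here to be called "batch").
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    import dotools_py

    return dotools_py.pp.quality_control(
        adata,
        batch_key="batch",        # assumed column name in adata.obs
        min_genes_in_cell=300,    # documented default
        min_cells_with_genes=5,   # documented default
        cut_mt=5,                 # the text above suggests 5 for scRNA, 3 for snRNA
        remove_doublets=False,    # documented default; doublet removal stays off
        qc_path=out_dir,          # where plots and metrics are saved
        random_state=0,
    )
```

Calling `run_qc(adata)` would return the processed AnnData object. Comparing `adata.n_obs` before and after the call is a quick check on how many cells were filtered out.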