dotools_py.pp.quality_control
- dotools_py.pp.quality_control(adata, batch_key, min_genes_in_cell=300, min_cells_with_genes=5, cut_mt=5, n_reads=10000, min_counts=None, max_counts=None, min_genes=None, max_genes=None, low_quantile=None, high_quantile=None, remove_doublets=False, doublet_tool='scDblFinder', normalization_method='LogNormalisation', log_data=True, metrics_patterns=('mt-', ('rbs', 'rpl')), metrics_names=('mt', 'ribo'), random_state=0, technology='snrna')
Basic quality control for snRNA-seq / Spatial Transcriptomics.
For each sample in an AnnData object, several quality and filtering steps are applied:
- Filter genes expressed in a low number of cells.
- Filter cells with a low number of genes.
- Filter cells with high mitochondrial content (recommended: 5% for scRNA, 3% for snRNA).
- Filter cells based on nUMI and features using two modes:
  - Absolute filtering: Sets absolute values for min/max UMI and features.
  - Quantile filtering: Filters top/lower quantiles.
- Remove doublets using scDblFinder, Scrublet, or DoubletDetection.
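The cell-level filters listed above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names (absolute_mask, quantile_mask, mito_mask), the toy values, the nearest-rank quantile cutoff, the inclusive bounds, and the use of fractions in [0, 1] for quantiles are all assumptions made for the example.

```python
def absolute_mask(values, min_value=None, max_value=None):
    """Keep entries within [min_value, max_value]; a bound of None is disabled."""
    return [
        (min_value is None or v >= min_value)
        and (max_value is None or v <= max_value)
        for v in values
    ]


def quantile_mask(values, low_quantile=None, high_quantile=None):
    """Keep entries between the empirical low and high quantiles.

    Assumption: quantiles are fractions in [0, 1] and the cutoff is taken by
    nearest rank; the real function may use another convention.
    """
    ordered = sorted(values)

    def cutoff(q):
        idx = round(q * (len(ordered) - 1))
        return ordered[min(len(ordered) - 1, max(0, idx))]

    low = cutoff(low_quantile) if low_quantile is not None else None
    high = cutoff(high_quantile) if high_quantile is not None else None
    return absolute_mask(values, low, high)


def mito_mask(pct_mt, cut_mt=5):
    """Keep cells whose mitochondrial percentage does not exceed cut_mt."""
    return [p <= cut_mt for p in pct_mt]


# Toy per-cell metrics (hypothetical values): total UMI counts and percent mitochondrial.
total_counts = [200, 1500, 3000, 4500, 50000]
pct_mt = [1.0, 2.0, 12.0, 3.0, 4.0]

by_absolute = absolute_mask(total_counts, min_value=500)  # absolute mode
by_quantile = quantile_mask(total_counts, 0.25, 0.75)     # quantile mode
by_mito = mito_mask(pct_mt, cut_mt=5)

# Combine the mitochondrial filter with one of the two count-filtering modes.
kept = [c for c, q, m in zip(total_counts, by_quantile, by_mito) if q and m]
print(kept)
```

Absolute thresholds are easy to reason about when typical count ranges are known, whereas quantile cutoffs adapt to each sample's distribution.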
Note
This function reproduces the quality control steps of dotools_py.pp.importer_py() but accepts an AnnData object as input. It assumes that adata.X contains raw counts.
Warning
Depending on the technology, some steps will be omitted or adapted.
- Parameters:
- adata
AnnData Annotated data matrix with raw counts in adata.X.
- batch_key
str Column in adata.obs with sample information.
- min_genes_in_cell
int(default:300) Minimum number of genes per cell.
- min_cells_with_genes
int(default:5) Minimum number of cells expressing a gene.
- cut_mt
int(default:5) Maximum percentage of mitochondrial genes per cell.
- n_reads
int(default:10000) Target sum after normalization per cell.
- min_counts
int|None(default:None) Minimum number of counts per cell.
- max_counts
int|None(default:None) Maximum number of counts per cell.
- min_genes
int|None(default:None) Minimum number of genes per cell.
- max_genes
int|None(default:None) Maximum number of genes per cell.
- low_quantile
int|None(default:None) Low quantile to filter cells based on counts.
- high_quantile
int|None(default:None) Upper quantile to filter cells based on counts.
- remove_doublets
bool(default:False) Identify and remove doublets.
- doublet_tool
Literal['scDblFinder','DoubletDetection','Scrublet'] (default:'scDblFinder') Method to use for the removal of doublets.
- normalization_method
Literal['LogNormalisation','PearsonResiduals'] (default:'LogNormalisation') Normalization method to use.
- log_data
bool(default:True) Whether to log the data after normalization or not.
- metrics_patterns
tuple(default:('mt-', ('rbs', 'rpl'))) Patterns to use to annotate features. Use mt- for mitochondrial, rps and rpl for ribosomal, and ^hb*- for hemoglobin. Should be written in lowercase.
- metrics_names
list(default:('mt', 'ribo')) Names for the patterns: use “mt” for mitochondrial, “ribo” for ribosomal, and “hb” for hemoglobin.
- technology
Literal['snrna','scrna','visium','xenium'] (default:'snrna') Type of the input dataset.
- random_state
int(default:0) Seed for the random number generator.
- Return type:
AnnData
- Returns:
A processed AnnData object.
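A call might look like the sketch below. The keyword names come from the signature documented above; the values are illustrative, the "sample" column name is hypothetical, and it is assumed that importing dotools_py exposes the pp submodule. The import is done lazily inside the helper so the snippet can be loaded without the package installed.

```python
# Keyword arguments taken from the documented signature; values are illustrative only.
qc_kwargs = dict(
    batch_key="sample",        # hypothetical column name in adata.obs
    min_genes_in_cell=300,
    min_cells_with_genes=5,
    cut_mt=3,                  # 3% is the value recommended above for snRNA
    remove_doublets=True,
    doublet_tool="Scrublet",
    normalization_method="LogNormalisation",
    technology="snrna",
    random_state=0,
)


def run_quality_control(adata):
    """Run quality control on an AnnData object whose adata.X holds raw counts."""
    # Assumption: `import dotools_py` makes `dotools_py.pp` available.
    import dotools_py

    return dotools_py.pp.quality_control(adata, **qc_kwargs)
```

Keeping the settings in one dictionary makes it straightforward to record them alongside the processed object or to reuse them across datasets.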