dotools_py.pp.find_doublets

dotools_py.pp.find_doublets(adata, batch_key=None, cluster_key=None, doublet_rate=None, scdblfinder_metric='logloss', method='scDblFinder', ovrlpy_keys=None, ovrlpy_report_path=None, random_state=0)

Detect doublets in scRNA-seq and iST data.

Detect doublets in sc/snRNA-seq and image-based spatial transcriptomics (iST) data. For iST, vertical doublets are detected, i.e. regions where doublets are found along the Z axis.

Note

For iST, a report is generated, but vertical doublets are not removed.

Parameters:
adata AnnData | DataFrame

Annotated data matrix, or a pandas DataFrame of transcripts if method='Ovrlpy'.

batch_key str | None (default: None)

Column in adata.obs with batch information. If omitted, doublets are searched for across all cells together. If given, doublets are searched for independently within each batch, which is preferable when batches represent different captures.
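Why per-batch detection matters can be illustrated with a toy example. The sketch below is not dotools_py's implementation: `toy_doublet_score` is a hypothetical stand-in for a real detector that scores each cell by its total counts relative to the other cells it is scored with. When a deeper-sequenced capture is pooled with a shallower one, its singlets look doublet-like; scoring each batch separately removes that artefact.

```python
import numpy as np
import pandas as pd


def toy_doublet_score(counts: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a real detector.

    Scores each cell by its total counts relative to the median total of the
    cells it is scored together with (doublets tend to have more counts).
    """
    totals = counts.sum(axis=1)
    return totals / np.median(totals)


def score_per_batch(counts: np.ndarray, batches: pd.Series) -> np.ndarray:
    """Score every batch independently, mirroring what batch_key requests."""
    labels = batches.to_numpy()
    scores = np.empty(counts.shape[0], dtype=float)
    for batch in np.unique(labels):
        mask = labels == batch
        scores[mask] = toy_doublet_score(counts[mask])
    return scores


# Two captures containing only singlets; batch2 was sequenced about twice as deeply.
rng = np.random.default_rng(0)
counts = np.vstack([rng.poisson(5, size=(100, 50)), rng.poisson(10, size=(100, 50))])
batches = pd.Series(["batch1"] * 100 + ["batch2"] * 100)

pooled = toy_doublet_score(counts)            # analogous to batch_key=None
per_batch = score_per_batch(counts, batches)  # analogous to batch_key="batch"
in_batch2 = (batches == "batch2").to_numpy()
print(f"mean batch2 score, pooled:    {pooled[in_batch2].mean():.2f}")
print(f"mean batch2 score, per batch: {per_batch[in_batch2].mean():.2f}")
```

Real detectors use far richer features than total counts, but the same confounding applies whenever captures differ in depth or composition.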

cluster_key str | bool | None (default: None)

Column in adata.obs with clustering information, used to generate artificial doublets more efficiently. Alternatively, if cluster_key=True, a fast clustering is performed first. If cluster_key is None or False, purely random artificial doublets are generated.
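A simplified illustration of why cluster labels help: artificial doublets are built by summing the profiles of two cells, and pairs drawn from different clusters yield heterotypic doublets, which are the kind a detector can learn to recognise. Purely random pairs often combine two cells of the same type, which are indistinguishable from a single deeper cell. This is a toy sketch with made-up data, not scDblFinder's actual sampling scheme.

```python
import numpy as np


def random_doublets(counts: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Sum the profiles of n random cell pairs (like cluster_key=None or False)."""
    i = rng.integers(0, counts.shape[0], size=n)
    j = rng.integers(0, counts.shape[0], size=n)
    return counts[i] + counts[j]


def cross_cluster_doublets(counts: np.ndarray, clusters: np.ndarray, n: int,
                           rng: np.random.Generator) -> np.ndarray:
    """Sum the profiles of n cell pairs taken from two different clusters."""
    labels = np.unique(clusters)
    out = np.empty((n, counts.shape[1]), dtype=counts.dtype)
    for k in range(n):
        a, b = rng.choice(labels, size=2, replace=False)
        i = rng.choice(np.flatnonzero(clusters == a))
        j = rng.choice(np.flatnonzero(clusters == b))
        out[k] = counts[i] + counts[j]
    return out


def is_heterotypic(doublets: np.ndarray) -> np.ndarray:
    """True where a doublet expresses markers of both toy clusters."""
    return (doublets[:, :5].sum(axis=1) > 0) & (doublets[:, 5:].sum(axis=1) > 0)


# Toy data: cluster "A" expresses genes 0-4, cluster "B" expresses genes 5-9.
counts = np.vstack([np.repeat([[3] * 5 + [0] * 5], 20, axis=0),
                    np.repeat([[0] * 5 + [3] * 5], 20, axis=0)])
clusters = np.array(["A"] * 20 + ["B"] * 20)

rng = np.random.default_rng(0)
rand = random_doublets(counts, 200, rng)
mixed = cross_cluster_doublets(counts, clusters, 200, rng)
print("heterotypic fraction, random pairs:       ", is_heterotypic(rand).mean())
print("heterotypic fraction, cross-cluster pairs:", is_heterotypic(mixed).mean())
```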

doublet_rate float | None (default: None)

The expected doublet rate, i.e. the proportion of cells expected to be doublets. If omitted, it is calculated automatically for scDblFinder and set to 0.05 for Scrublet.
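A sketch of what the default rates amount to for a given number of cells. For scDblFinder it assumes the default described in the scDblFinder documentation (roughly 1% per 1,000 cells captured, aimed at 10x data); whether dotools_py relies on that default unchanged, and whether `n_cells` is counted per batch when batch_key is set, are assumptions. The Scrublet value is the one stated above. `default_doublet_rate` is a hypothetical helper, not part of dotools_py.

```python
from typing import Optional


def default_doublet_rate(n_cells: int, method: str) -> Optional[float]:
    """Hypothetical helper: rate used when doublet_rate=None (see assumptions above)."""
    if method == "scDblFinder":
        # Assumption: scDblFinder's documented default of about 1% per 1,000 cells.
        return 0.01 * n_cells / 1000
    if method == "Scrublet":
        # Fixed value stated in this documentation.
        return 0.05
    # Not specified in this documentation for other methods (e.g. DoubletDetection).
    return None


n_cells = 4000
for method in ("scDblFinder", "Scrublet"):
    rate = default_doublet_rate(n_cells, method)
    print(f"{method}: rate={rate:.3f}, expected doublets ~ {round(rate * n_cells)}")
```

The rate scales with the number of cells for scDblFinder because more cells loaded into a droplet run means more co-encapsulation.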

scdblfinder_metric Literal['merror', 'logloss', 'auc', 'aucpr'] (default: 'logloss')

Error metric that scDblFinder optimizes during classifier training; one of 'merror', 'logloss', 'auc' or 'aucpr'. Only relevant when method='scDblFinder'.

method Literal['scDblFinder', 'DoubletDetection', 'Scrublet', 'Ovrlpy'] (default: 'scDblFinder')

Library used to detect doublets. For scRNA-seq data, the available methods are scDblFinder, DoubletDetection and Scrublet. For single-cell-resolution image-based spatial transcriptomics such as Xenium, the available method is Ovrlpy, which detects vertical doublets.

ovrlpy_keys dict | None (default: None)

Dictionary with the keys gene_key, x_key, y_key and z_key, mapping each to the name of the DataFrame column that holds the gene names and the x, y and z coordinates, respectively.
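A minimal sketch of the expected ovrlpy_keys structure together with a matching transcript table. The column names follow the Xenium transcripts table (an assumption; adapt them to your data), the rows are invented toy values, and `check_ovrlpy_input` is a hypothetical helper, not part of dotools_py. The actual call is shown as a comment because it needs dotools_py and ovrlpy installed.

```python
import pandas as pd

# Assumed Xenium-style column names; replace with the columns in your table.
ovrlpy_keys = {
    "gene_key": "feature_name",
    "x_key": "x_location",
    "y_key": "y_location",
    "z_key": "z_location",
}

# Toy transcript table: one row per detected transcript.
transcripts = pd.DataFrame({
    "feature_name": ["EPCAM", "PTPRC", "EPCAM"],
    "x_location": [10.2, 10.4, 55.0],
    "y_location": [20.1, 20.3, 12.7],
    "z_location": [14.0, 18.5, 15.2],
})


def check_ovrlpy_input(df: pd.DataFrame, keys: dict) -> None:
    """Hypothetical helper: ensure all four keys exist and point to real columns."""
    required = {"gene_key", "x_key", "y_key", "z_key"}
    missing_keys = required - set(keys)
    if missing_keys:
        raise KeyError(f"ovrlpy_keys is missing: {sorted(missing_keys)}")
    missing_cols = [keys[k] for k in sorted(required) if keys[k] not in df.columns]
    if missing_cols:
        raise KeyError(f"columns not found in DataFrame: {missing_cols}")


check_ovrlpy_input(transcripts, ovrlpy_keys)

# With dotools_py and ovrlpy installed, the call would then be:
# import dotools_py as do
# do.pp.find_doublets(transcripts, method="Ovrlpy",
#                     ovrlpy_keys=ovrlpy_keys, ovrlpy_report_path="ovrlpy_report/")
```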

ovrlpy_report_path str | PathLike[str] | Path (default: None)

Directory where the quality control plots and the ovrlpy object will be saved.

random_state int (default: 0)

Seed for the random number generator.

Return type:

None

Returns:

None. Sets the following fields:

adata.obs['doublet_class'] : pandas.Series (dtype str)

Predicted doublet status of each cell (e.g. 'singlet').

adata.obs['doublet_score'] : pandas.Series (dtype float)

Doublet score of each observed transcriptome (cell).

Examples

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> do.pp.find_doublets(adata, batch_key="batch", method="scDblFinder")
>>> adata.obs[["doublet_class", "doublet_score"]].head()
                              doublet_class doublet_score
CAAAGAATCAGATTGC-1-batch2       singlet      0.692706
AGCTTCCCAGTCAACT-1-batch1       singlet      0.014858
GAGAGGTTCCCTCTAG-1-batch1       singlet      0.172094
CTAACTTCAGATCATC-1-batch1       singlet      0.092695
CATGGTACAAACGGCA-1-batch1       singlet      0.237514
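A typical follow-up is to check how many cells were flagged per batch and then keep only predicted singlets. The sketch below uses a small invented DataFrame standing in for adata.obs so that it runs without dotools_py; it assumes doublets are labelled 'doublet', as scDblFinder does, so check adata.obs['doublet_class'].unique() on your own data first.

```python
import pandas as pd

# Toy stand-in for adata.obs after find_doublets; all values are invented.
obs = pd.DataFrame(
    {
        "batch": ["batch1", "batch1", "batch2", "batch2"],
        "doublet_class": ["singlet", "doublet", "singlet", "singlet"],
        "doublet_score": [0.01, 0.97, 0.17, 0.09],
    },
    index=["cell1", "cell2", "cell3", "cell4"],
)

# Fraction of predicted doublets in each batch.
doublet_fraction = (obs["doublet_class"] == "doublet").groupby(obs["batch"]).mean()
print(doublet_fraction)

# Keep only predicted singlets. With a real AnnData object the equivalent is:
# adata = adata[adata.obs["doublet_class"] == "singlet"].copy()
singlets = obs[obs["doublet_class"] == "singlet"]
print(singlets.index.tolist())
```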