dotools_py.pp.find_doublets
- dotools_py.pp.find_doublets(adata, batch_key=None, cluster_key=None, doublet_rate=None, scdblfinder_metric='logloss', method='scDblFinder', ovrlpy_keys=None, ovrlpy_report_path=None, random_state=0)
Detect doublets in scRNA-seq and iST.
Detect doublets in sc/snRNA-seq and image-based spatial transcriptomics (iST). For iST data, vertical doublets are detected, i.e., regions where cells overlap along the Z axis.
Note
For iST, a report will be generated but no vertical doublets will be removed.
- Parameters:
- adata
AnnData|DataFrame Annotated data matrix, or a pandas DataFrame if method is set to Ovrlpy.
- batch_key
str|None (default:None) Column in adata.obs with batch information. If omitted, doublets are searched for across all cells together. If given, doublets are searched for independently in each sample, which is preferable if the samples represent different captures.
- cluster_key
str|bool|None (default:None) Column in adata.obs with clustering information. This is used to generate artificial doublets more efficiently. Alternatively, if cluster_key=True, fast clustering will be performed. If cluster_key is None or False, purely random artificial doublets will be generated.
- doublet_rate
float (default:None) The expected doublet rate, i.e. the proportion of cells expected to be doublets. If omitted, it is calculated automatically for scDblFinder and set to 0.05 for Scrublet.
- scdblfinder_metric
Literal['merror','logloss','auc','aucpr'] (default:'logloss') Error metric to optimize during training (e.g. ‘merror’, ‘logloss’, ‘auc’, ‘aucpr’).
- method
Literal['scDblFinder','DoubletDetection','Scrublet','Ovrlpy'] (default:'scDblFinder') Library to use for detecting doublets. For scRNA-seq data, the available methods are scDblFinder, DoubletDetection, and Scrublet. For spatial transcriptomics at single-cell resolution, such as Xenium, the available method is Ovrlpy, which detects vertical doublets in image-based ST.
- ovrlpy_keys
Dict (default:None) Dictionary with the following keys: gene_key, x_key, y_key and z_key, giving the names of the DataFrame columns that hold the gene names and the x, y and z coordinates.
- ovrlpy_report_path
str|PathLike[str]|Path (default:None) Directory where the quality control plots and the ovrlpy object will be saved.
- random_state
int (default:0) Seed for the random number generator.
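To give a feel for what a doublet_rate value means, here is a minimal sketch that converts a number of captured cells into an expected doublet proportion. It assumes the commonly quoted droplet rule of thumb of roughly 1% doublets per 1,000 cells captured. The helper name `expected_doublet_rate` and the `rate_per_1k` default are illustrative assumptions, not part of dotools_py, and the rate that scDblFinder derives automatically may differ.

```python
def expected_doublet_rate(n_cells: int, rate_per_1k: float = 0.01) -> float:
    """Rule-of-thumb doublet proportion for one capture (hypothetical helper)."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    # The proportion grows linearly with the number of cells captured
    # and is capped at 1.0 because it is a proportion.
    return min(1.0, rate_per_1k * n_cells / 1000)


# Compute one rate per capture, mirroring the fact that each batch is
# screened independently when batch_key is given.
cells_per_batch = {"batch1": 4000, "batch2": 8000}
rates = {name: expected_doublet_rate(n) for name, n in cells_per_batch.items()}
```

A value obtained this way could be passed as doublet_rate; leaving the parameter as None keeps the per-method defaults described above.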
- Return type:
None
- Returns:
None. Sets the following fields:
- adata.obs['doublet_class'] pandas.Series (dtype str) Class indicating predicted doublet status.
- adata.obs['doublet_score'] pandas.Series (dtype float) Doublet scores for each observed transcriptome.
Examples
>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> do.pp.find_doublets(adata, batch_key="batch", method="scDblFinder")
>>> adata.obs[["doublet_class", "doublet_score"]].head()
                          doublet_class  doublet_score
CAAAGAATCAGATTGC-1-batch2       singlet       0.692706
AGCTTCCCAGTCAACT-1-batch1       singlet       0.014858
GAGAGGTTCCCTCTAG-1-batch1       singlet       0.172094
CTAACTTCAGATCATC-1-batch1       singlet       0.092695
CATGGTACAAACGGCA-1-batch1       singlet       0.237514
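For the Ovrlpy path, the input is a transcript-level DataFrame together with an ovrlpy_keys dictionary. The sketch below only shows how such a dictionary might be assembled and checked before the call. The column names (`feature_name`, `x_location`, and so on) are assumed Xenium-style names, and `check_ovrlpy_keys` is a hypothetical helper that is not part of dotools_py.

```python
REQUIRED_KEYS = ("gene_key", "x_key", "y_key", "z_key")


def check_ovrlpy_keys(keys: dict, columns: list) -> dict:
    """Validate an ovrlpy_keys mapping against the table columns (hypothetical helper)."""
    # All four entries documented for ovrlpy_keys must be present.
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise KeyError(f"ovrlpy_keys is missing: {missing}")
    # Every referenced column must exist in the transcript table.
    absent = [keys[k] for k in REQUIRED_KEYS if keys[k] not in columns]
    if absent:
        raise ValueError(f"columns not found in the transcript table: {absent}")
    return keys


# Assumed Xenium-style column names; adjust them to your own transcript table.
transcript_columns = ["feature_name", "x_location", "y_location", "z_location", "qv"]
ovrlpy_keys = check_ovrlpy_keys(
    {
        "gene_key": "feature_name",
        "x_key": "x_location",
        "y_key": "y_location",
        "z_key": "z_location",
    },
    transcript_columns,
)
```

The checked dictionary would then be passed along with the DataFrame, method="Ovrlpy", and an ovrlpy_report_path. As the note above states, this run produces a report and removes no vertical doublets.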