dotools_py.tl.full_recluster
- dotools_py.tl.full_recluster(adata, cluster_key, batch_key, recluster_approach, hvg_batch=False, use_rep=None, bbknn=False, resolution=0.3, neighbors_batch=3, majority=True, convert=True, key_added='annotation_fullrecluster')
Re-cluster all clusters in a dataset.
Perform reclustering on an integrated AnnData object over all clusters. The following integration methods are supported: CCA (v4/v5) integration from Seurat, Harmony, BBKNN, scVI, Scanorama, and PCA. Assumes that X holds logcounts.
Note
For CCA (v4/v5) and scVI, the corrected expression matrix (CCA v4), the CCA representation (CCA v5), and the latent space (scVI) are expected to be in .obsm. When re-clustering with Harmony or BBKNN, the pipeline is re-run over the clusters.
- Parameters:
- adata
AnnData Annotated data matrix.
- cluster_key
str Metadata column in obs with cluster groups.
- batch_key
str Metadata column in obs with batch groups.
- hvg_batch
bool (default: False) If set to True, the highly variable genes that are shared across samples will be used.
- recluster_approach
Literal['cca4', 'cca5', 'harmony', 'scanorama', 'pca', 'scvi'] Reclustering approach to use.
- use_rep
str (default: None) Name in obsm with the representation. Required for the scVI, CCA, and Scanorama approaches.
- bbknn
bool (default: False) Use BBKNN to compute neighbors.
- resolution
float (default: 0.3) Resolution for the Leiden clustering.
- neighbors_batch
int (default: 3) The nearest-neighbor distance matrix and the neighborhood graph of observations are computed with BBKNN, which calculates a batch-balanced KNN graph. It is recommended to use 3 when there are <100000 cells and 25 when there are >100000. If there are not enough cells per batch, the default approach (sc.pp.neighbors) will be used.
- majority
bool (default: True) Whether to refine the predicted labels by running the majority voting classifier after over-clustering.
- convert
bool (default: True) Convert the gene format of the model. If a human model is provided and this is set to True, then genes in mouse format will be used, and vice versa.
- key_added
str (default: 'annotation_fullrecluster') Metadata column name in obs to save the reclustering information.
- Return type:
None
- Returns:
Returns None. The following fields will be set:
adata.obs['annotation_fullrecluster' | key_added]: pandas.Series (dtype category)
Array that stores the re-clustered groups, each named as the original group ID plus the new cluster ID (e.g., for a monocyte cluster with 3 sub-clusters, the new clusters are monocyte_0, monocyte_1, and monocyte_2).
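The naming scheme described for the returned column can be illustrated with a minimal, library-free sketch. The helper `compose_labels` and the toy inputs are hypothetical and are not part of dotools_py; the sketch only mirrors the "original group + new cluster id" pattern from the description and makes no claim about how the function builds the labels internally.

```python
def compose_labels(groups, sub_ids):
    """Join each cell's original group with its sub-cluster id (hypothetical helper)."""
    if len(groups) != len(sub_ids):
        raise ValueError("groups and sub_ids must have the same length")
    # One label per cell: "<original group>_<sub-cluster id>"
    return [f"{group}_{sub_id}" for group, sub_id in zip(groups, sub_ids)]


# Toy input: three monocyte cells falling into three sub-clusters, plus one other cell.
groups = ["monocyte", "monocyte", "monocyte", "T_cell"]
sub_ids = [0, 1, 2, 0]

labels = compose_labels(groups, sub_ids)
print(labels)
```

Because the sub-cluster ids restart within each original group, the combined label stays unique across the whole dataset.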
See also
dotools_py.tl.reclustering(): re-cluster specific clusters.
Example
>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> do.tl.full_recluster(
...     adata, cluster_key="annotation", batch_key="batch", recluster_approach="cca5", use_rep="X_CCA"
... )
>>> adata
AnnData object with n_obs × n_vars = 700 × 1851
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_fullrecluster'
    var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'annotation_colors', 'annotation_recluster_colors', 'batch_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_CCA', 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'connectivities', 'distances'
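Two rules from the parameter descriptions (the `neighbors_batch` recommendation and the approaches that require `use_rep`) can be checked before calling the function. The sketch below is only an illustration: `suggest_neighbors_batch` and `check_use_rep` are hypothetical helpers, not part of dotools_py, and the behaviour at exactly 100000 cells is an assumption because the documentation leaves that case unspecified.

```python
from typing import Optional

# Hypothetical helpers for illustration; they are not part of dotools_py.

# Approaches that the parameter list says need a representation stored in .obsm.
APPROACHES_NEEDING_USE_REP = {"cca4", "cca5", "scvi", "scanorama"}


def suggest_neighbors_batch(n_cells: int) -> int:
    """Return 3 below 100000 cells and 25 otherwise.

    Treating exactly 100000 cells as the "large" case is an assumption
    of this sketch; the documentation only covers < and >.
    """
    return 3 if n_cells < 100_000 else 25


def check_use_rep(recluster_approach: str, use_rep: Optional[str]) -> None:
    """Raise early if an approach that needs use_rep is called without it."""
    if recluster_approach in APPROACHES_NEEDING_USE_REP and use_rep is None:
        raise ValueError(
            f"use_rep is required for recluster_approach={recluster_approach!r}"
        )


# The example dataset above has 700 cells and uses the 'cca5' approach.
print(suggest_neighbors_batch(700))
check_use_rep("cca5", "X_CCA")  # passes silently
```

Running such checks up front gives a clear error message before any neighbor graph or clustering is computed.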