dotools_py.tl.reclustering
- dotools_py.tl.reclustering(adata, cluster_key, batch_key, recluster_approach, use_clusters=None, bbknn=False, hvg_batch=False, use_rep=None, resolution=0.3, neighbors_batch=3, automatic_annot=False, majority=True, convert=True, model='Healthy_Adult_Heart.pkl', get_subset=False, key_added='annotation_recluster', key_added_autoannot='autoAnnot_recluster', random_state=0)
Re-clustering of a dataset.
- Perform re-clustering on an integrated AnnData object. The following integration methods are supported:
CCA (v4/v5) integration from Seurat.
Harmony integration.
BBKNN integration.
scVI integration.
Scanorama integration.
PCA.
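The supported approaches differ in what they need as input. Based on the use_rep description (a representation stored in .obsm is required for the scVI, CCA and Scanorama approaches), the check below is a minimal sketch of validating the arguments before re-clustering. The helper validate_recluster_args and the example key X_scVI are hypothetical and not part of dotools_py; the function only inspects arguments and performs no re-clustering.

```python
from typing import Optional

# Values accepted by recluster_approach (taken from the documented Literal type).
SUPPORTED_APPROACHES = {"cca4", "cca5", "harmony", "scanorama", "pca", "scvi"}

# Approaches whose representation must be passed via use_rep (per the use_rep docs).
NEEDS_USE_REP = {"cca4", "cca5", "scanorama", "scvi"}


def validate_recluster_args(recluster_approach: str, use_rep: Optional[str], obsm_keys: set) -> None:
    """Hypothetical pre-flight check; raises ValueError on inconsistent arguments."""
    if recluster_approach not in SUPPORTED_APPROACHES:
        raise ValueError(f"Unknown recluster_approach: {recluster_approach!r}")
    if recluster_approach in NEEDS_USE_REP:
        if use_rep is None:
            raise ValueError(f"use_rep is required for {recluster_approach!r}")
        if use_rep not in obsm_keys:
            raise ValueError(f"{use_rep!r} not found in .obsm")
```

In this sketch Harmony and PCA pass without use_rep, since they work from X (log-counts), while validate_recluster_args("scvi", "X_scVI", {"X_pca", "X_scVI"}) passes only because the (hypothetical) latent-space key is present.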
Assumes that X contains log-normalized counts (logcounts).
Note
For CCA (v4/v5) and scVI, the corrected expression matrix (CCA v4), the CCA representation (CCA v5) or the latent space (scVI) is expected to be stored in .obsm and passed via use_rep. When re-clustering with Harmony or BBKNN, the integration pipeline is re-run on the selected clusters.
- Parameters:
- adata
AnnData Annotated data matrix.
- cluster_key
str Metadata column in obs with cluster groups.
- batch_key
str Metadata column in obs with batch groups.
- use_clusters
str|list|None(default:None) Clusters in cluster_key to re-cluster. If several clusters are provided, the re-clustering is performed on a subset containing all of the specified clusters.
- hvg_batch
bool(default:False) If set to True, the highly variable genes shared across samples are used.
- recluster_approach
Literal['cca4','cca5','harmony','scanorama','pca','scvi'] Reclustering approach to use.
- bbknn
bool(default:False) Use BBKNN to compute neighbors.
- use_rep
str|None(default:None) Key in obsm with the representation. Required for the scVI, CCA and Scanorama approaches.
- resolution
float(default:0.3) Resolution for the Leiden clustering.
- neighbors_batch
int(default:3) Number of neighbors per batch used by BBKNN, which computes the nearest-neighbor distance matrix and the neighborhood graph of observations as a batch-balanced KNN graph. It is recommended to use 3 for fewer than 100,000 cells and 25 for more than 100,000 cells. If there are not enough cells per batch, the default approach (sc.pp.neighbors) is used instead.
- automatic_annot
bool(default:False) Perform semi-automatic annotation with CellTypist.
- majority
bool(default:True) Whether to refine the predicted labels by running the majority voting classifier after over-clustering.
- convert
bool(default:True) Convert the gene format of the model. If set to True and a human model is provided, the model genes are converted to mouse gene format, and vice versa.
- model
str(default:'Healthy_Adult_Heart.pkl') CellTypist model to use for the prediction.
- get_subset
bool(default:False) If set to True, return an AnnData object containing only the use_clusters cells after re-clustering.
- key_added
str(default:'annotation_recluster') Column name in obs in which to store the re-clustering labels.
- key_added_autoannot
str(default:'autoAnnot_recluster') Column name in obs in which to store the labels from the automatic annotation after re-clustering.
- random_state
int(default:0) Seed for the random number generator.
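The neighbors_batch recommendation and the sc.pp.neighbors fallback described above can be expressed as a small rule for the case where BBKNN is used. This is a sketch under assumptions: the helper names are hypothetical, exactly 100,000 cells is treated as "large" (the documentation leaves that case open), and "not enough cells per batch" is assumed to mean that some batch has fewer cells than neighbors_batch; dotools_py may apply a different criterion.

```python
from collections import Counter
from typing import Iterable


def suggest_neighbors_batch(n_cells: int) -> int:
    """Recommended neighbors_batch: 3 below 100,000 cells, 25 otherwise.

    Exactly 100,000 cells is treated as "large" here (assumption).
    """
    return 3 if n_cells < 100_000 else 25


def neighbor_method(batch_labels: Iterable[str], neighbors_batch: int) -> str:
    """Pick BBKNN or the sc.pp.neighbors fallback from per-batch cell counts.

    Assumption: "enough cells" means every batch has at least neighbors_batch cells.
    """
    batch_sizes = Counter(batch_labels)
    if batch_sizes and min(batch_sizes.values()) >= neighbors_batch:
        return "bbknn"
    return "sc.pp.neighbors"
```

For example, neighbor_method(["s1"] * 10 + ["s2"] * 2, 3) falls back to "sc.pp.neighbors" because batch s2 has only two cells.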
- Return type:
AnnData | None
- Returns:
Returns None if get_subset is set to False; otherwise, returns the subsetted AnnData after re-clustering. Additionally, the following fields are set:
adata.obs['annotation_recluster' | key_added] : pandas.Series (dtype category)
Array that stores the re-clustered groups, consisting of the original group ID plus the new cluster ID (e.g., for a monocyte cluster with 3 sub-clusters, the new clusters are monocyte_0, monocyte_1 and monocyte_2).
adata.obs['autoAnnot_recluster' | key_added_autoannot] : pandas.Series (dtype category)
Array that stores the re-clustered groups after re-running the automatic annotation pipeline.
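The majority-voting refinement controlled by the majority parameter can be illustrated in a few lines: within each over-clustering group, every cell receives the most frequent predicted label of that group. This is a simplified sketch of the idea, not CellTypist's implementation (which also performs the over-clustering itself and may resolve ties differently); the function name majority_vote is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def majority_vote(predicted: Sequence[str], over_clusters: Sequence[str]) -> List[str]:
    """Hypothetical helper: replace each cell's predicted label with the
    most common predicted label within its over-cluster."""
    if len(predicted) != len(over_clusters):
        raise ValueError("predicted and over_clusters must have the same length")
    votes: Dict[str, Counter] = defaultdict(Counter)
    for label, group in zip(predicted, over_clusters):
        votes[group][label] += 1
    # most_common(1) breaks ties by first occurrence within the group.
    winner = {group: counts.most_common(1)[0][0] for group, counts in votes.items()}
    return [winner[group] for group in over_clusters]
```

With predicted labels ["T", "T", "NK", "B", "B", "T"] and over-clusters ["0", "0", "0", "1", "1", "1"], the lone NK call in group 0 becomes T and the lone T call in group 1 becomes B.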
See also
dotools_py.tl.full_recluster()
Re-cluster all clusters automatically.
Example
>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> t_cells = do.tl.reclustering(adata, "annotation", "batch", "harmony", use_clusters="T_cells", get_subset=True)
>>> t_cells
AnnData object with n_obs × n_vars = 464 × 1851
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
    var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'annotation_colors', 'annotation_recluster_colors', 'batch_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_CCA', 'X_pca', 'X_umap', 'X_pca_harmony'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'connectivities', 'distances'
>>> adata
AnnData object with n_obs × n_vars = 700 × 1851
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
    var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'annotation_colors', 'annotation_recluster_colors', 'batch_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_CCA', 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'connectivities', 'distances'
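The annotation_recluster labels described under Returns combine the original group with the new sub-cluster ID from the Leiden re-clustering. The sketch below reproduces that naming scheme on plain Python lists. It assumes (this page does not confirm it) that cells outside use_clusters keep their original label, and make_recluster_labels is a hypothetical helper, not a dotools_py function.

```python
from typing import List, Optional, Sequence, Union


def make_recluster_labels(
    original: Sequence[str],
    subclusters: Sequence[Optional[str]],
    use_clusters: Union[str, List[str]],
) -> List[str]:
    """Hypothetical helper: build '<original>_<subcluster>' labels.

    subclusters holds the new Leiden IDs for re-clustered cells (None elsewhere).
    Cells outside use_clusters are assumed to keep their original label.
    """
    if len(original) != len(subclusters):
        raise ValueError("original and subclusters must have the same length")
    # A single cluster name is treated like a one-element list, since use_clusters accepts both.
    selected = {use_clusters} if isinstance(use_clusters, str) else set(use_clusters)
    labels = []
    for orig, sub in zip(original, subclusters):
        if orig in selected and sub is not None:
            labels.append(f"{orig}_{sub}")
        else:
            labels.append(orig)
    return labels
```

Applied to the example above, re-clustered T cells would receive names such as T_cells_0 and T_cells_1, while the remaining cells would keep labels like B_cells under this sketch's assumption.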