dotools_py.tl.full_recluster

dotools_py.tl.full_recluster(adata, cluster_key, batch_key, recluster_approach, hvg_batch=False, use_rep=None, bbknn=False, resolution=0.3, neighbors_batch=3, majority=True, convert=True, key_added='annotation_fullrecluster')

Re-clustering of all clusters in dataset.

Perform re-clustering on an integrated AnnData object over all clusters. The following integration methods are supported: CCA (v4/v5) integration from Seurat, Harmony, BBKNN, scVI, Scanorama, and PCA. Assumes that X contains logcounts.

Note

For CCA (v4/v5) and scVI, the corrected expression matrix (CCA v4), the CCA representation (CCA v5), and the latent space (scVI) are expected to be in .obsm. When re-clustering with Harmony or BBKNN, the pipeline is re-run over the clusters.
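The requirement in this note can be checked before calling the function. The snippet below is a minimal sketch and not part of dotools_py: the helper name `check_use_rep` and the set `REQUIRES_REP` are hypothetical, and the set is assumed from the `use_rep` parameter description (required for the scVI, CCA, and Scanorama approaches).

```python
# Approaches assumed to need a precomputed representation in .obsm
# (taken from the use_rep parameter description; hypothetical constant).
REQUIRES_REP = {"cca4", "cca5", "scvi", "scanorama"}


def check_use_rep(recluster_approach, use_rep, obsm_keys):
    """Raise early if a required representation is missing (hypothetical helper)."""
    if recluster_approach not in REQUIRES_REP:
        return  # harmony and pca do not need use_rep in this sketch
    if use_rep is None:
        raise ValueError(f"use_rep is required for '{recluster_approach}'")
    if use_rep not in obsm_keys:
        raise KeyError(f"'{use_rep}' not found in .obsm")


# With a real object, obsm_keys would be list(adata.obsm.keys()).
check_use_rep("cca5", "X_CCA", ["X_CCA", "X_pca", "X_umap"])
```

Running a check like this up front gives a clear error message instead of a failure partway through the re-clustering.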

Parameters:
adata AnnData

Annotated data matrix.

cluster_key str

Metadata column in obs with cluster groups.

batch_key str

Metadata column in obs with batch groups.

hvg_batch bool (default: False)

If set to True, the highly variable genes that are shared across samples will be used.

recluster_approach Literal['cca4', 'cca5', 'harmony', 'scanorama', 'pca', 'scvi']

Reclustering approach to use.

use_rep str (default: None)

Name of the representation in obsm. Required for the scVI, CCA, and Scanorama approaches.

bbknn bool (default: False)

Use BBKNN to compute neighbors.

resolution float (default: 0.3)

Resolution for the Leiden clustering.

neighbors_batch int (default: 3)

BBKNN, which calculates a batch-balanced KNN graph, is employed to compute the nearest-neighbor distance matrix and the neighborhood graph of observations. It is recommended to use 3 when there are fewer than 100,000 cells and 25 when there are more than 100,000. If there are not enough cells per batch, the default approach (sc.pp.neighbors) will be used.

majority bool (default: True)

Whether to refine the predicted labels by running the majority voting classifier after over-clustering.

convert bool (default: True)

Convert the gene format of the model. If a human model is provided and this is set to True, genes in mouse format will be used, and vice versa.

key_added str (default: 'annotation_fullrecluster')

Metadata column name in obs to save the reclustering information.
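The sizing guidance given for neighbors_batch can be sketched as a small helper. This is an illustration only: `suggest_neighbors_batch` is a hypothetical name, not a dotools_py function, and the behavior at exactly 100,000 cells is an assumption because the guidance does not cover that case.

```python
def suggest_neighbors_batch(n_cells: int, threshold: int = 100_000) -> int:
    """Return 3 below the threshold and 25 otherwise (hypothetical helper).

    Treating exactly `threshold` cells as "large" is an assumption.
    """
    return 3 if n_cells < threshold else 25


print(suggest_neighbors_batch(700))      # 3
print(suggest_neighbors_batch(250_000))  # 25
```

With a real dataset, the result could be passed as `neighbors_batch=suggest_neighbors_batch(adata.n_obs)`.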

Return type:

None

Returns:

Returns None. The following fields will be set:

adata.obs['annotation_fullrecluster' | key_added]: pandas.Series (dtype category)

Array that stores the re-clustered groups, each named with the original group ID followed by the new cluster ID (e.g., for a monocyte cluster with 3 sub-clusters, the new clusters are monocyte_0, monocyte_1, and monocyte_2).
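The naming scheme of the new labels can be illustrated in plain Python. This is a sketch of the label format described above, not the library's implementation; both helper names are hypothetical. Splitting on the last underscore is a deliberate choice so that original group names that themselves contain underscores are recovered intact.

```python
def make_recluster_label(group: str, sub_id: int) -> str:
    """Join the original group name and the new cluster id (hypothetical helper)."""
    return f"{group}_{sub_id}"


def split_recluster_label(label: str) -> tuple[str, int]:
    """Recover (original group, sub-cluster id) by splitting on the last underscore."""
    group, sub_id = label.rsplit("_", 1)
    return group, int(sub_id)


labels = [make_recluster_label("monocyte", i) for i in range(3)]
print(labels)                             # ['monocyte_0', 'monocyte_1', 'monocyte_2']
print(split_recluster_label("T_cell_1"))  # ('T_cell', 1)
```

The same split can be used to map the values of the new obs column back to their parent clusters.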

See also

dotools_py.tl.reclustering()

Re-cluster specific clusters.

Example

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> do.tl.full_recluster(
...     adata, cluster_key="annotation", batch_key="batch", recluster_approach="cca5", use_rep="X_CCA"
... )
>>> adata
AnnData object with n_obs × n_vars = 700 × 1851
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts',
         'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo',
         'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score',
         'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_fullrecluster'
    var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches',
         'highly_variable_intersection'
    uns: 'annotation_colors', 'annotation_recluster_colors', 'batch_colors', 'hvg', 'leiden', 'leiden_colors',
         'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_CCA', 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'connectivities', 'distances'