dotools_py.tl.integrate_data

dotools_py.tl.integrate_data(adata, batch_key, hvg_batch=True, integration_method='scvi', bbknn=False, resolution=0.3, categorical_covariates=None, continuous_covariates=None, technology='scrna', get_model=False, random_state=0, workers=1, spatial_neigh_kwargs=None, **kwargs)

Integrate a concatenated AnnData.

Integrate and perform batch correction for an AnnData with several samples. Different batch correction methods are available: Harmony, Scanorama, BBKNN, scVI and CCA (v4 or v5).

Note

The CCA integration method is based on Seurat. Version 4 (cca4) generates a corrected expression matrix for all highly variable genes (HVGs), which is then used for dimensionality reduction. In version 5 (cca5), dimensionality reduction is performed before the CCA embeddings are produced.

Parameters:
adata AnnData

Annotated data matrix.

batch_key str

Metadata column in obs with batch information.

hvg_batch bool (default: True)

If set to True, the highly variable genes shared across samples will be used for the integration.

integration_method Literal['scanorama', 'scvi', 'cca4', 'cca5', 'harmony', 'pca'] (default: 'scvi')

Method to use for the integration.

bbknn bool (default: False)

Use BBKNN to compute neighbors instead of sc.pp.neighbors().

resolution float (default: 0.3)

Resolution for the leiden clustering.

categorical_covariates list (default: None)

Categorical covariates for scVI.

continuous_covariates list (default: None)

Continuous covariates for scVI.

technology Literal['scrna', 'spatial'] (default: 'scrna')

The type of technology of the input.

get_model bool (default: False)

Set to True to return the scVI model.

random_state int (default: 0)

Seed for the random number generator.

workers int (default: 1)

Number of threads to use for Harmony.

spatial_neigh_kwargs dict (default: None)

Additional arguments for computing the spatial neighborhood graph. See Squidpy.

kwargs

Additional arguments for the scVI model.
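
Because integration_method and technology accept only a fixed set of strings, it can be convenient to check them before starting a long integration run. The sketch below uses a hypothetical helper, check_integration_args, which is not part of dotools_py. The allowed values are copied from the signature above; whether the library performs the same validation internally is not stated here.

```python
# Hypothetical pre-flight check; NOT part of dotools_py.
# The allowed values are copied from the documented Literal types.
ALLOWED_METHODS = ("scanorama", "scvi", "cca4", "cca5", "harmony", "pca")
ALLOWED_TECHNOLOGIES = ("scrna", "spatial")


def check_integration_args(integration_method="scvi", technology="scrna"):
    """Return the validated pair, or raise ValueError naming the bad argument."""
    if integration_method not in ALLOWED_METHODS:
        raise ValueError(
            f"integration_method must be one of {ALLOWED_METHODS}, "
            f"got {integration_method!r}"
        )
    if technology not in ALLOWED_TECHNOLOGIES:
        raise ValueError(
            f"technology must be one of {ALLOWED_TECHNOLOGIES}, "
            f"got {technology!r}"
        )
    return integration_method, technology
```

The comparison in this sketch is case-sensitive, so "Harmony" is rejected while "harmony" passes.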

Return type:

SCVI | None

Returns:

Returns None, or the scVI model if get_model is True. The following fields will be set:

adata.obsm['X_pca']: numpy.ndarray (dtype float)

PCA representation of data.

adata.varm['PCs']: numpy.ndarray

The principal components containing the loadings.

adata.uns['pca']['variance_ratio']: numpy.ndarray (shape (n_comps,))

Ratio of explained variance.

adata.uns['pca']['variance']: numpy.ndarray (shape (n_comps,))

Explained variance, equivalent to the eigenvalues of the covariance matrix.

adata.obsm[representation]: numpy.ndarray (dtype float)

Representation will be set to X_pca_harmony for harmony, X_scanorama for scanorama, X_CCA for CCA4/CCA5, and X_scVI for scVI.

adata.obsp['distances']: scipy.sparse.csr_matrix (dtype float)

Distance matrix of the nearest neighbors search.

adata.obsp['connectivities']: scipy.sparse.csr_matrix (dtype float)

Weighted adjacency matrix of the neighborhood graph of data points. Weights should be interpreted as connectivities.

adata.uns['neighbors']: dict

Neighbors parameters.

adata.obsm['X_umap']: numpy.ndarray (dtype float)

UMAP coordinates of the data.

adata.obs['leiden']: pandas.Series (dtype category)

Array that stores the cluster groups.

adata.uns['leiden']['params']: dict

A dict with the values for the parameters resolution, random_state, and n_iterations.
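
Downstream code often needs to know which adata.obsm key holds the integrated embedding for a given method. The sketch below encodes the mapping listed above in a hypothetical helper, representation_key, which is not part of dotools_py. The 'pca' method is left out on purpose, since the list above does not name a separate representation key for it.

```python
# Hypothetical lookup; NOT part of dotools_py.
# Keys are integration_method values, values are the documented obsm keys.
REPRESENTATION_KEYS = {
    "harmony": "X_pca_harmony",
    "scanorama": "X_scanorama",
    "cca4": "X_CCA",
    "cca5": "X_CCA",
    "scvi": "X_scVI",
}


def representation_key(integration_method):
    """Return the obsm key documented for the given integration method."""
    try:
        return REPRESENTATION_KEYS[integration_method]
    except KeyError:
        raise KeyError(
            f"no documented representation for {integration_method!r}"
        ) from None
```

After an integration run, the embedding could then be read with adata.obsm[representation_key("harmony")], assuming the run used the matching method.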

Example

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> do.tl.integrate_data(adata, batch_key="batch", integration_method="harmony")
>>> adata
AnnData object with n_obs × n_vars = 700 × 1851
obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts',
     'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo',
      'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type',
      'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches',
     'highly_variable_intersection'
uns: 'annotation_colors', 'annotation_recluster_colors', 'batch_colors', 'hvg', 'leiden', 'leiden_colors',
     'log1p', 'neighbors', 'pca', 'umap'
obsm: 'X_CCA', 'X_pca', 'X_umap', 'X_harmony'
varm: 'PCs'
layers: 'counts', 'logcounts'
obsp: 'connectivities', 'distances'