dotools_py.tl.integrate_data
- dotools_py.tl.integrate_data(adata, batch_key, hvg_batch=True, integration_method='scvi', bbknn=False, resolution=0.3, categorical_covariates=None, continuous_covariates=None, technology='scrna', get_model=False, random_state=0, workers=1, spatial_neigh_kwargs=None, **kwargs)
Integrate a concatenated AnnData.
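Integrate and batch-correct an AnnData object that contains several samples. The available batch correction methods are Harmony, Scanorama, scVI and CCA (v4 or v5), selected with integration_method. BBKNN can additionally be used to build the neighborhood graph by setting bbknn=True.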
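BBKNN replaces the ordinary k-nearest-neighbor search with a batch-balanced one: every cell receives a fixed number of neighbors from each batch, so no single batch can dominate its neighborhood. The sketch below illustrates only that idea, using exact Euclidean distances in pure Python. The function batch_balanced_knn and its parameter k_per_batch are hypothetical (k_per_batch plays roughly the role of bbknn's neighbors_within_batch); the real bbknn package also uses approximate search and turns the neighbors into a weighted connectivity graph.

```python
import math
from collections import defaultdict


def batch_balanced_knn(points, batches, k_per_batch):
    """Hypothetical sketch: pick up to k_per_batch nearest neighbors from every batch.

    points: list of coordinate tuples (for example rows of a PCA embedding).
    batches: list of batch labels, one per point.
    Returns a dict mapping each cell index to a sorted list of neighbor indices.
    """
    # Group cell indices by their batch label.
    by_batch = defaultdict(list)
    for idx, label in enumerate(batches):
        by_batch[label].append(idx)

    neighbors = {}
    for i, p in enumerate(points):
        chosen = []
        for members in by_batch.values():
            # Rank this batch's cells by distance to cell i, excluding cell i itself.
            candidates = sorted(
                (math.dist(p, points[j]), j) for j in members if j != i
            )
            chosen.extend(j for _, j in candidates[:k_per_batch])
        neighbors[i] = sorted(chosen)
    return neighbors
```

With points [(0, 0), (1, 0), (2, 0), (10, 0)] and batches ["a", "a", "a", "b"], cell 0 receives cell 1 (its nearest batch "a" neighbor) and cell 3 (the only batch "b" cell), whereas a plain 2-nearest-neighbor search would return cells 1 and 2.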
Note
The CCA integration method is based on Seurat. With v4, a corrected expression matrix of all highly variable genes (HVGs) is generated and then used for dimensionality reduction. With v5, dimensionality reduction is performed before the CCA embeddings are produced.
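The CCA step itself can be sketched, following the Seurat approach the note refers to, as a singular value decomposition of the cross-product between two scaled batches: the left and right singular vectors become the canonical embeddings of the cells in batch 1 and batch 2. This is a NumPy illustration under stated assumptions (exactly two batches, genes × cells input with the same genes in the same order, per-cell standardization), not the dotools_py implementation; cca_embedding is a hypothetical name.

```python
import numpy as np


def cca_embedding(x1, x2, n_components=2):
    """Hypothetical Seurat-style CCA sketch for two batches.

    x1: genes x cells array for batch 1; x2: genes x cells array for batch 2.
    n_components must not exceed the number of cells in either batch.
    Returns an (n_cells1 + n_cells2) x n_components embedding.
    """
    def standardize(m):
        # Assumption: scale each cell (column) to zero mean, unit variance over genes.
        m = m - m.mean(axis=0, keepdims=True)
        sd = m.std(axis=0, keepdims=True)
        sd[sd == 0] = 1.0
        return m / sd

    a, b = standardize(x1), standardize(x2)
    # Cross-product between the cells of the two batches (cells1 x cells2).
    cross = a.T @ b
    # Left singular vectors embed batch-1 cells, right singular vectors embed batch-2 cells.
    u, _, vt = np.linalg.svd(cross, full_matrices=False)
    k = n_components
    return np.vstack([u[:, :k], vt[:k, :].T])
```

Stacking both sets of singular vectors places all cells in one shared low-dimensional space, which is what a representation such as X_CCA holds.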
- Parameters:
- adata
AnnData Annotated data matrix.
- batch_key
str Metadata column in obs with batch information.
- hvg_batch
bool (default: True) If True, the highly variable genes shared across samples are used for the integration.
- integration_method
Literal['scanorama', 'scvi', 'cca4', 'cca5', 'harmony', 'pca'] (default: 'scvi') Method to use for the integration.
- bbknn
bool (default: False) Use BBKNN to compute neighbors instead of sc.pp.neighbors().
- resolution
float (default: 0.3) Resolution for the Leiden clustering.
- categorical_covariates
list (default: None) Categorical covariates for scVI.
- continuous_covariates
list (default: None) Continuous covariates for scVI.
- technology
Literal['scrna', 'spatial'] (default: 'scrna') Technology used to generate the input data.
- get_model
bool (default: False) If True, return the scVI model.
- random_state
int (default: 0) Seed for the random number generator.
- workers
int (default: 1) Number of threads to use for Harmony.
- spatial_neigh_kwargs
dict (default: None) Additional arguments for computing the spatial neighborhood graph. See Squidpy.
- kwargs
Additional arguments for the scVI model.
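With hvg_batch=True, genes are selected by how consistently they are highly variable across samples. The pure-Python sketch below assumes the HVGs have already been computed per batch and orders genes by the number of batches in which they are highly variable, similar in spirit to the highly_variable_nbatches column that scanpy's sc.pp.highly_variable_genes writes when given a batch_key (and which appears in the example output). shared_hvgs is a hypothetical helper; scanpy breaks ties by normalized dispersion rather than alphabetically.

```python
from collections import Counter


def shared_hvgs(hvgs_per_batch, n_top):
    """Hypothetical sketch: rank genes by how many batches flag them as highly variable.

    hvgs_per_batch: dict mapping batch label -> iterable of HVG names in that batch.
    Returns up to n_top genes, most widely shared first (ties broken alphabetically).
    """
    counts = Counter()
    for genes in hvgs_per_batch.values():
        # Count each gene at most once per batch.
        counts.update(set(genes))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:n_top]]
```

For example, shared_hvgs({"b1": ["A", "B", "C"], "b2": ["B", "C", "D"], "b3": ["C", "E"]}, n_top=2) keeps C (variable in three batches) and B (two batches) and drops genes seen in only one batch.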
- Return type:
None, or the scVI model when get_model is True.
- Returns:
Returns None, or the scVI model if get_model is True. The following fields are set:
- adata.obsm['X_pca']: numpy.ndarray (dtype float) PCA representation of the data.
- adata.varm['PCs']: numpy.ndarray The principal components containing the loadings.
- adata.uns['pca']['variance_ratio']: numpy.ndarray (shape (n_comps,)) Ratio of explained variance.
- adata.uns['pca']['variance']: numpy.ndarray (shape (n_comps,)) Explained variance, equivalent to the eigenvalues of the covariance matrix.
- adata.obsm[representation]: numpy.ndarray (dtype float) Integrated representation, stored as X_pca_harmony for Harmony, X_scanorama for Scanorama, X_CCA for CCA4/CCA5, and X_scVI for scVI.
- adata.obsp['distances']: scipy.sparse.csr_matrix (dtype float) Distance matrix of the nearest-neighbor search.
- adata.obsp['connectivities']: scipy.sparse.csr_matrix (dtype float) Weighted adjacency matrix of the neighborhood graph of data points. Weights should be interpreted as connectivities.
- adata.uns['neighbors']: dict Neighbors parameters.
- adata.obsm['X_umap']: numpy.ndarray (dtype float) UMAP coordinates of the data.
- adata.obs['leiden']: pandas.Series (dtype category) Array that stores the cluster groups.
- adata.uns['leiden']['params']: dict A dict with the values for the parameters resolution, random_state, and n_iterations.
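The resolution value recorded in adata.uns['leiden']['params'] scales the null-model penalty of the modularity-type objective that Leiden clustering optimizes: lower values favor fewer, larger clusters and higher values favor more, smaller ones. The pure-Python function below only scores a given partition with Q = sum over clusters c of [L_c / m - resolution * (d_c / 2m)^2], where L_c is the edge weight inside c, d_c the total degree of c and m the total edge weight; it does not implement the Leiden search itself. The name modularity is illustrative, and the objective used by a given Leiden backend may differ from this normalized form by a constant factor.

```python
from collections import defaultdict


def modularity(edges, membership, resolution=1.0):
    """Illustrative modularity with a resolution parameter.

    edges: list of undirected (u, v, weight) tuples, no duplicates or self-loops.
    membership: dict mapping node -> cluster label.
    """
    m = sum(w for _, _, w in edges)
    inside = defaultdict(float)  # L_c: edge weight with both ends in cluster c
    degree = defaultdict(float)  # d_c: summed degree of nodes in cluster c
    for u, v, w in edges:
        degree[membership[u]] += w
        degree[membership[v]] += w
        if membership[u] == membership[v]:
            inside[membership[u]] += w
    return sum(inside[c] / m - resolution * (degree[c] / (2 * m)) ** 2 for c in degree)
```

On two triangles joined by a single edge, splitting into the two triangles scores higher at resolution 1.0, while merging everything into one cluster scores higher at resolution 0.1, which matches the intuition that a small resolution merges clusters.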
Example
>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> do.tl.integrate_data(adata, batch_key="batch", integration_method="harmony")
>>> adata
AnnData object with n_obs × n_vars = 700 × 1851
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
    var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'annotation_colors', 'annotation_recluster_colors', 'batch_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_CCA', 'X_pca', 'X_umap', 'X_harmony'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'connectivities', 'distances'
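The adata.uns['pca'] entries listed under Returns are linked through the covariance eigenvalues. The NumPy sketch below assumes the usual convention (also used by scikit-learn's explained_variance_ and explained_variance_ratio_): variance is the squared singular value of the centered data divided by n_cells - 1, and variance_ratio divides it by the total variance summed over all genes. pca_variance is a hypothetical name, and a truncated PCA solver may give slightly different numbers.

```python
import numpy as np


def pca_variance(x, n_comps):
    """Hypothetical sketch returning (variance, variance_ratio) for the top n_comps PCs.

    x: cells x genes array.
    """
    centered = x - x.mean(axis=0)
    # Singular values of the centered data relate to the covariance eigenvalues:
    # eigenvalue_k = s_k**2 / (n_cells - 1).
    s = np.linalg.svd(centered, compute_uv=False)
    variance = s[:n_comps] ** 2 / (x.shape[0] - 1)
    # Normalize by the total variance across all genes, not only the kept components.
    total = centered.var(axis=0, ddof=1).sum()
    return variance, variance / total
```

Because the ratio is taken against the total variance, it sums to 1 only when all components are kept; with fewer components it shows how much of the signal the stored PCA retains.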