dotools_py.pp.pearson_residuals_normalize
- dotools_py.pp.pearson_residuals_normalize(adata, batch_key=None, layer=None, backend='scanpy', theta=100)
Apply analytic Pearson residual normalization.
The residuals are based on a negative binomial offset model with an overdispersion parameter theta shared across genes. By default, residuals are clipped to sqrt(n_obs) and theta=100 is used. The input must contain raw counts.
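The model described above can be sketched in a few lines of NumPy. This is an illustrative sketch of the standard analytic Pearson residual formula, not the code that dotools_py or scanpy actually runs; the function name `analytic_pearson_residuals` and its `clip` argument are hypothetical. It assumes a dense cells × genes matrix of raw counts in which every gene has at least one count.

```python
import numpy as np


def analytic_pearson_residuals(counts, theta=100.0, clip=None):
    """Hypothetical helper: analytic Pearson residuals for a dense cells x genes matrix."""
    counts = np.asarray(counts, dtype=float)
    n_obs = counts.shape[0]

    # Expected count under the offset model:
    # mu[i, j] = (total of cell i) * (total of gene j) / (grand total)
    cell_totals = counts.sum(axis=1, keepdims=True)
    gene_totals = counts.sum(axis=0, keepdims=True)
    mu = cell_totals @ gene_totals / counts.sum()

    # Negative binomial variance with a shared overdispersion theta:
    # var = mu + mu^2 / theta
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)

    # Clip to +/- sqrt(n_obs) unless another bound is given
    if clip is None:
        clip = np.sqrt(n_obs)
    return np.clip(residuals, -clip, clip)


# Toy raw-count matrix: 4 cells x 3 genes, no all-zero genes
counts = np.array([[0, 3, 1], [2, 0, 4], [5, 1, 0], [1, 1, 1]])
residuals = analytic_pearson_residuals(counts, theta=100)
print(residuals.shape)
```

Larger values of theta bring the variance closer to the Poisson case (variance equal to mu), and the clipping bound keeps a few very large counts from dominating downstream steps.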
- Parameters:
- adata
AnnData Annotated data matrix.
- batch_key
str (default: None) Column in adata.obs with batch information.
- layer
str (default: None) Layer to use instead of adata.X.
- backend
Literal['scanpy', 'seurat'] (default: 'scanpy') If set to scanpy, the scanpy implementation is used. Set to seurat to use SCTransform instead.
- theta
int (default: 100) The negative binomial overdispersion parameter for Pearson residuals.
- Return type:
AnnData
- Returns:
Returns AnnData. Depending on the backend, new layers are added. The normalized values are also set in adata.X.
Example
>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> adata = do.pp.pearson_residuals_normalize(adata, batch_key="batch", layer="counts", backend="scanpy")
normalizing counts per cell
    finished (0:00:00)
computing analytic Pearson residuals on counts
    finished (0:00:00)
computing analytic Pearson residuals on counts
    finished (0:00:00)
>>> adata
AnnData object with n_obs × n_vars = 700 × 1851
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
    obsm: 'X_CCA', 'X_pca', 'X_umap'
    layers: 'counts', 'logcounts', 'sqrt_norm', 'pearson_norm'
>>> adata = do.dt.example_10x_processed()
>>> adata = do.pp.pearson_residuals_normalize(adata, batch_key="batch", layer="counts", backend="seurat")
2026-03-05 15:45:26,911 - Preparing to transfer to R
2026-03-05 15:45:26,928 - Running SCTransform in R
>>> adata
AnnData object with n_obs × n_vars = 700 × 1181
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
    var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection', 'SCT_rm'
    obsm: 'SCT_rm'
    varm: 'PCs'
    layers: 'counts', 'logcounts', 'SCT_norm', 'SCT_counts'
    obsp: 'connectivities', 'distances'