dotools_py.pp.pearson_residuals_normalize

dotools_py.pp.pearson_residuals_normalize(adata, batch_key=None, layer=None, backend='scanpy', theta=100)

Apply analytic Pearson residual normalization.

The residuals are based on a negative binomial offset model with an overdispersion parameter theta shared across genes. By default, residuals are clipped to the interval [-sqrt(n_obs), sqrt(n_obs)] and theta=100 is used. The input must contain raw counts.
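The computation described above can be illustrated with a minimal NumPy sketch. It is not the code used by this function or by either backend; it only assumes the standard analytic formulation, where the expected count is the product of the cell and gene totals divided by the grand total. The helper name `analytic_pearson_residuals` and the toy matrix are hypothetical.

```python
import numpy as np

def analytic_pearson_residuals(counts, theta=100.0, clip=None):
    """Illustrative helper (hypothetical name), not part of dotools_py."""
    counts = np.asarray(counts, dtype=float)
    n_obs = counts.shape[0]
    if clip is None:
        # Default clipping threshold: sqrt(number of cells)
        clip = np.sqrt(n_obs)
    cell_totals = counts.sum(axis=1, keepdims=True)   # shape (n_obs, 1)
    gene_totals = counts.sum(axis=0, keepdims=True)   # shape (1, n_vars)
    # Expected counts under the offset model
    mu = cell_totals @ gene_totals / counts.sum()
    # Negative binomial variance: mu + mu^2 / theta
    # (genes with zero total counts would give mu = 0 and must be filtered first)
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)
    return np.clip(residuals, -clip, clip)

# Toy raw count matrix: 4 cells x 3 genes
counts = np.array([[0, 3, 1],
                   [2, 0, 5],
                   [1, 1, 1],
                   [4, 2, 0]])
res = analytic_pearson_residuals(counts)
```

With four cells, the clipping threshold in this toy example is sqrt(4) = 2, so every residual lies in [-2, 2].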

Parameters:
adata AnnData

Annotated data matrix.

batch_key str (default: None)

Column in adata.obs with batch information.

layer str (default: None)

Layer to use instead of adata.X.

backend Literal['scanpy', 'seurat'] (default: 'scanpy')

If set to 'scanpy', the scanpy implementation is used. Set to 'seurat' to use SCTransform instead.

theta int (default: 100)

The negative binomial overdispersion parameter for Pearson residuals.

Return type:

AnnData

Returns:

The AnnData object. Depending on the backend, new layers are added. The normalized values are also stored in adata.X.

Example

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> adata = do.pp.pearson_residuals_normalize(adata, batch_key="batch", layer="counts", backend="scanpy")
normalizing counts per cell
finished (0:00:00)
computing analytic Pearson residuals on counts
    finished (0:00:00)
computing analytic Pearson residuals on counts
    finished (0:00:00)
>>> adata
AnnData object with n_obs × n_vars = 700 × 1851
obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts',
     'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo',
     'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot',
     'celltypist_conf_score', 'annotation', 'annotation_recluster'
obsm: 'X_CCA', 'X_pca', 'X_umap'
layers: 'counts', 'logcounts', 'sqrt_norm', 'pearson_norm'
>>> adata = do.dt.example_10x_processed()
>>> adata = do.pp.pearson_residuals_normalize(adata, batch_key="batch", layer="counts", backend="seurat")
2026-03-05 15:45:26,911 - Preparing to transfer to R
2026-03-05 15:45:26,928 - Running SCTransform in R
>>> adata
AnnData object with n_obs × n_vars = 700 × 1181
obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts',
     'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo',
     'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot',
     'celltypist_conf_score', 'annotation', 'annotation_recluster'
var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection', 'SCT_rm'
obsm: 'SCT_rm'
varm: 'PCs'
layers: 'counts', 'logcounts', 'SCT_norm', 'SCT_counts'
obsp: 'connectivities', 'distances'