dotools_py.get.pseudobulk

dotools_py.get.pseudobulk(adata, batch_key, cluster_key, keep_metadata=None, min_cells=10, pseudobulk_approach='sum', technical_replicates=1, min_counts=10, layer=None, workers=5, random_state=0)

Generate pseudobulk AnnData of clusters.

Generate a pseudobulk AnnData for each cluster. The input is expected to contain raw counts. Counts can be aggregated in one of two modes: sum or mean. Optionally, pseudo-replicates can be generated for each group.

Parameters:
adata AnnData

Annotated data matrix.

batch_key str

Metadata column in obs with batch groups.

cluster_key str

Metadata column in obs with cluster groups.

keep_metadata list (default: None)

Metadata columns in obs to keep. If a group has more than one value, the first one is taken.

min_cells int (default: 10)

Minimum number of cells a cluster must have in each sample for a pseudobulk to be generated. Clusters with fewer cells are excluded.

pseudobulk_approach Literal['sum', 'mean'] (default: 'sum')

Aggregation mode: sum or mean of the counts.

technical_replicates int (default: 1)

Number of technical replicates to generate.

min_counts int (default: 10)

Minimum number of total counts for a gene to be included. Genes with fewer total counts are removed.

layer str (default: None)

Layer to use.

workers int (default: 5)

Number of threads used to parallelize the pseudo-bulking.

random_state int (default: 0)

Seed for random number generator.

Return type:

AnnData

Returns:

AnnData with pseudobulk counts for each cluster.
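
As an illustration of the aggregation described above, the following is a minimal NumPy sketch, not the library's implementation. The function name pseudobulk_sketch and the toy data are hypothetical. It assumes that groups are defined by each (batch, cluster) pair and that min_counts is applied to a gene's total over all pseudobulk profiles; the actual function may differ on both points.

```python
import numpy as np


def pseudobulk_sketch(counts, batches, clusters, mode="sum", min_cells=10, min_counts=10):
    """Aggregate a cells x genes matrix per (batch, cluster) group (hypothetical helper)."""
    if mode not in ("sum", "mean"):
        raise ValueError("mode must be 'sum' or 'mean'")
    counts = np.asarray(counts)
    batches = np.asarray(batches)
    clusters = np.asarray(clusters)

    groups, profiles = [], []
    for batch in np.unique(batches):
        for cluster in np.unique(clusters):
            mask = (batches == batch) & (clusters == cluster)
            # Skip groups with too few cells (the role of min_cells).
            if mask.sum() < min_cells:
                continue
            block = counts[mask]
            profiles.append(block.sum(axis=0) if mode == "sum" else block.mean(axis=0))
            groups.append((str(batch), str(cluster)))

    if not profiles:
        empty = np.empty((0, counts.shape[1]))
        return groups, empty, np.zeros(counts.shape[1], dtype=bool)

    profiles = np.vstack(profiles)
    # Assumption: a gene is kept if its total over all pseudobulks reaches min_counts.
    keep_genes = profiles.sum(axis=0) >= min_counts
    return groups, profiles[:, keep_genes], keep_genes


# Toy data: 6 cells x 3 genes, two batches, two clusters.
toy_counts = [[1, 0, 5], [2, 0, 5], [9, 9, 9], [0, 1, 4], [0, 0, 6], [3, 0, 0]]
toy_batches = ["a", "a", "a", "b", "b", "b"]
toy_clusters = ["T", "T", "B", "T", "T", "T"]

groups, profiles, keep_genes = pseudobulk_sketch(
    toy_counts, toy_batches, toy_clusters, mode="sum", min_cells=2, min_counts=2
)
```

In this toy run the single "B" cell in batch "a" falls below min_cells and is dropped, and the second gene is removed because its total across the remaining pseudobulks is below min_counts.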

Example

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> pdata = do.get.pseudobulk(adata, batch_key="batch", cluster_key="annotation")
Pseudo-bulked groups: 100%|██████████| 10/10 [00:08<00:00,  1.19it/s]
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
2025-08-01 16:41:13,927 - Removed 796 genes for having less than 10 total counts
>>> pdata
AnnData object with n_obs × n_vars = 7 × 1055
    obs: 'annotation', 'batch', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts',
         'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes',
         'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
    var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts',
         'log1p_total_counts'
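
The page does not describe how pseudo-replicates are built when technical_replicates is greater than 1. One plausible approach, shown here purely as an assumption, is to shuffle the cells of a group with a seeded random number generator and split them into roughly equal parts, aggregating each part separately. The helper split_into_replicates is hypothetical and not part of dotools_py.

```python
import numpy as np


def split_into_replicates(n_cells, n_replicates, random_state=0):
    """Randomly partition cell indices into n_replicates parts (hypothetical helper)."""
    rng = np.random.default_rng(random_state)
    shuffled = rng.permutation(n_cells)
    # array_split tolerates group sizes that are not divisible by n_replicates.
    return np.array_split(shuffled, n_replicates)


# Toy group: 8 cells x 3 genes of raw counts.
group_counts = np.arange(24).reshape(8, 3)

parts = split_into_replicates(group_counts.shape[0], n_replicates=2, random_state=0)
# One summed profile per pseudo-replicate.
replicate_profiles = np.vstack([group_counts[idx].sum(axis=0) for idx in parts])
```

With sum aggregation, the replicate profiles add up to the profile of the whole group, and fixing random_state makes the split reproducible.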