dotools_py.get.pseudobulk
- dotools_py.get.pseudobulk(adata, batch_key, cluster_key, keep_metadata=None, min_cells=10, pseudobulk_approach='sum', technical_replicates=1, min_counts=10, layer=None, workers=5, random_state=0)
Generate pseudobulk AnnData of clusters.
Generate a pseudobulk AnnData for each cluster. The input is expected to contain raw counts. Counts can be aggregated in one of two modes: 'sum' or 'mean'. Additionally, pseudo-replicates can be generated if specified.
- Parameters:
- adata
AnnData Annotated data matrix.
- batch_key
str Metadata column in obs with batch groups.
- cluster_key
str Metadata column in obs with cluster groups.
- keep_metadata
list(default:None) Metadata columns in obs to keep. If a group has more than one value for a column, the first one is taken.
- min_cells
int(default:10) Minimum number of cells a cluster must have in a sample for a pseudobulk to be generated. Clusters with fewer cells are excluded.
- pseudobulk_approach
Literal['sum','mean'] (default:'sum') Aggregation mode.
- technical_replicates
int(default:1) Number of technical replicates to generate.
- min_counts
int(default:10) Minimum number of counts for a gene to be included.
- layer
str(default:None) Layer to use.
- workers
int(default:5) Number of threads used to parallelize the pseudobulking.
- random_state
int(default:0) Seed for random number generator.
- Return type:
AnnData
- Returns:
AnnData with pseudobulk counts for each cluster.
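To make the interplay of cluster_key, batch_key, min_cells, pseudobulk_approach and min_counts more concrete, here is a minimal, dependency-free sketch of the aggregation idea. It is not the dotools_py implementation: toy_pseudobulk and its plain-list inputs are hypothetical, and applying the min_counts filter to gene totals across the pseudobulk profiles is an assumption.

```python
from collections import defaultdict


def toy_pseudobulk(counts, batches, clusters, min_cells=10,
                   approach="sum", min_counts=10):
    """Aggregate per-cell count rows into one profile per (batch, cluster).

    Hypothetical helper for illustration only, not the dotools_py code.
    counts is a list of per-cell lists of raw counts (cells x genes).
    """
    if approach not in ("sum", "mean"):
        raise ValueError("approach must be 'sum' or 'mean'")

    # Collect the row indices of the cells that belong to each group.
    groups = defaultdict(list)
    for idx, key in enumerate(zip(batches, clusters)):
        groups[key].append(idx)

    profiles = {}
    for key, cell_idx in groups.items():
        # Skip groups with too few cells (mirrors the documented min_cells).
        if len(cell_idx) < min_cells:
            continue
        # Column-wise sum over the cells of this group.
        profile = [sum(col) for col in zip(*(counts[i] for i in cell_idx))]
        if approach == "mean":
            profile = [value / len(cell_idx) for value in profile]
        profiles[key] = profile

    # Assumption: a gene is kept when its total over all profiles
    # reaches min_counts.
    n_genes = len(counts[0]) if counts else 0
    kept_genes = [g for g in range(n_genes)
                  if sum(p[g] for p in profiles.values()) >= min_counts]
    filtered = {key: [p[g] for g in kept_genes]
                for key, p in profiles.items()}
    return filtered, kept_genes


counts = [
    [5, 0, 1],  # cell 0: batch s1, cluster T
    [7, 1, 0],  # cell 1: batch s1, cluster T
    [2, 0, 0],  # cell 2: batch s1, cluster B (single cell, dropped below)
    [4, 0, 3],  # cell 3: batch s2, cluster T
    [6, 0, 2],  # cell 4: batch s2, cluster T
]
batches = ["s1", "s1", "s1", "s2", "s2"]
clusters = ["T", "T", "B", "T", "T"]

profiles, kept_genes = toy_pseudobulk(
    counts, batches, clusters, min_cells=2, min_counts=5
)
print(kept_genes)  # indices of the genes that survive the filter
print(profiles)    # one aggregated profile per (batch, cluster) group
```

In this toy input the ("s1", "B") group has a single cell and is excluded by min_cells=2, and the middle gene is removed because its total count stays below min_counts.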
Example
>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> pdata = do.get.pseudobulk(adata, batch_key="batch", cluster_key="annotation")
Pseudo-bulked groups: 100%|██████████| 10/10 [00:08<00:00, 1.19it/s]
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
2025-08-01 16:41:13,927 - Removed 796 genes for having less than 10 total counts
>>> pdata
AnnData object with n_obs × n_vars = 7 × 1055
    obs: 'annotation', 'batch', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes'
    var: 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts'
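The description mentions pseudo-replicates, controlled by technical_replicates and random_state, without stating how they are built. One common way to obtain them is to randomly partition the cells of each group and aggregate every subset separately. The sketch below illustrates only that partitioning step under this assumption; split_into_replicates is a hypothetical helper, not part of dotools_py.

```python
import random


def split_into_replicates(cell_indices, n_replicates=1, random_state=0):
    """Randomly partition a group's cells into n_replicates subsets.

    Hypothetical helper for illustration only; the source does not state
    how dotools_py builds its replicates.
    """
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")

    # A seeded generator makes the partition reproducible.
    rng = random.Random(random_state)
    shuffled = list(cell_indices)
    rng.shuffle(shuffled)

    # Deal the shuffled cells round-robin so that subset sizes differ
    # by at most one cell.
    return [sorted(shuffled[r::n_replicates]) for r in range(n_replicates)]


replicates = split_into_replicates(range(10), n_replicates=3, random_state=0)
for number, cells in enumerate(replicates, start=1):
    print(f"replicate {number}: cells {cells}")
```

Each subset could then be summed or averaged on its own to give one pseudobulk profile per replicate. Reusing the same random_state reproduces the same partition.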