dotools_py.tl.rank_genes_pseudobulk


dotools_py.tl.rank_genes_pseudobulk(adata, ctrl_cond, disease_cond, cluster_key, method='deseq2', batch_key='batch', condition_key='condition', design='~condition', layer='counts', min_cells=50, pseudobulk_approach='sum', technical_replicates=1, min_counts=10, workers=8, path=None, filename='DEA_Pseudobulk.xlsx', get_results=True, key_added='rank_genes_pseudobulk', random_state=0)

Run differential expression analysis using a pseudobulk approach.

Perform differential expression analysis (DEA) with DESeq2 or edgeR. This function behaves similarly to dotools_py.tl.rank_genes_condition(): for each cluster, it tests for differential gene expression between two conditions, but it first aggregates the cells of each sample into pseudobulk profiles. The input is expected to be raw counts.
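Conceptually, the pseudobulk step collapses the cells of each cluster into one count profile per sample before DESeq2 or edgeR is run. The plain-Python sketch below illustrates that aggregation; it is not the dotools_py implementation. The helper `pseudobulk` and the toy cell records are hypothetical, and its `approach` argument only mirrors the `pseudobulk_approach` parameter.

```python
from collections import defaultdict


def pseudobulk(cells, approach="sum"):
    """Aggregate per-cell counts into one profile per (cluster, sample).

    Hypothetical helper for illustration only. Each cell is a tuple
    (cluster, sample, counts), where counts is a list of per-gene counts.
    """
    if approach not in ("sum", "mean"):
        raise ValueError("approach must be 'sum' or 'mean'")
    sums = {}
    n_cells = defaultdict(int)
    for cluster, sample, counts in cells:
        key = (cluster, sample)
        # Start a zero vector the first time this (cluster, sample) pair is seen
        acc = sums.setdefault(key, [0] * len(counts))
        for i, c in enumerate(counts):
            acc[i] += c
        n_cells[key] += 1
    if approach == "sum":
        return sums
    # 'mean' divides each summed profile by the number of cells in its group
    return {key: [v / n_cells[key] for v in acc] for key, acc in sums.items()}


# Toy data: (cluster, sample, counts for 3 genes)
cells = [
    ("T", "ctrl_1", [1, 0, 3]),
    ("T", "ctrl_1", [2, 1, 0]),
    ("T", "dis_1", [0, 4, 1]),
    ("B", "ctrl_1", [5, 0, 0]),
]

print(pseudobulk(cells, "sum"))
print(pseudobulk(cells, "mean"))
```

Summing keeps integer counts, which matches the negative binomial count model used by DESeq2; averaging produces non-integer values.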

Parameters:
adata AnnData

Annotated data matrix.

ctrl_cond str

Name of the control condition in the condition_key column.

disease_cond str

Name of the disease condition in the condition_key column.

cluster_key str

Metadata column in obs with cluster groups.

method Literal['deseq2', 'edger'] (default: 'deseq2')

Differential expression method to use: 'deseq2' for DESeq2 or 'edger' for edgeR.

batch_key str (default: 'batch')

Metadata column in obs with batch (sample) groups. Pseudobulk profiles are generated per batch.

condition_key str (default: 'condition')

Metadata column in obs with condition groups.

design str (default: '~condition')

Design formula for DESeq2.

layer str (default: 'counts')

Layer to use. It must contain raw counts.

min_cells int (default: 50)

Minimum number of cells per batch/sample required to generate a pseudobulk profile. If there are fewer cells, DESeq2/edgeR is not run on that cluster.

pseudobulk_approach Literal['sum', 'mean'] (default: 'sum')

How to aggregate the cell counts into pseudobulk counts: 'sum' adds the counts of the cells in each sample, 'mean' averages them.

technical_replicates int (default: 1)

How many technical replicates should be generated per sample.

min_counts int (default: 10)

Minimum total count a gene must have after pseudobulking to be tested.

workers int (default: 8)

Number of CPUs to use for DESeq2.

path str | PathLike[str] | Path (default: None)

Directory in which to save the results file.

filename str (default: 'DEA_Pseudobulk.xlsx')

Name of the output Excel file.

get_results bool (default: True)

If True, return a DataFrame with the DEA results.

key_added str (default: 'rank_genes_pseudobulk')

Key in adata.uns under which the results are stored.

random_state int (default: 0)

Seed for the random number generator.
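The min_cells, technical_replicates, random_state and min_counts parameters all act on the pseudobulk construction. The plain-Python sketch below shows one plausible reading of them and is not the dotools_py implementation: a cluster is tested only if every sample reaches min_cells cells (an assumed rule), a sample's cells are randomly split into technical_replicates groups using a seeded generator, and genes whose total pseudobulk count is below min_counts are excluded. The helpers passes_min_cells, split_replicates and filter_genes are hypothetical.

```python
import random


def passes_min_cells(n_cells_per_sample, min_cells=50):
    """Assumed rule: test a cluster only if every sample has >= min_cells cells."""
    return all(n >= min_cells for n in n_cells_per_sample.values())


def split_replicates(cell_ids, technical_replicates=1, random_state=0):
    """Randomly split one sample's cells into technical replicate groups."""
    rng = random.Random(random_state)  # seeded, so the split is reproducible
    shuffled = list(cell_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled cells round-robin into groups of (nearly) equal size
    return [shuffled[i::technical_replicates] for i in range(technical_replicates)]


def filter_genes(pseudobulk_counts, min_counts=10):
    """Return indices of genes whose total count over all samples is >= min_counts."""
    n_genes = len(next(iter(pseudobulk_counts.values())))
    totals = [
        sum(profile[g] for profile in pseudobulk_counts.values())
        for g in range(n_genes)
    ]
    return [g for g, total in enumerate(totals) if total >= min_counts]


# Toy example for a single cluster
n_cells = {"ctrl_1": 120, "ctrl_2": 80, "dis_1": 40}
print(passes_min_cells(n_cells, min_cells=50))  # dis_1 has too few cells

groups = split_replicates(range(10), technical_replicates=2, random_state=0)
print([len(g) for g in groups])

profiles = {"ctrl_1": [12, 0, 7], "ctrl_2": [9, 1, 2], "dis_1": [15, 3, 0]}
print(filter_genes(profiles, min_counts=10))
```

Seeding a dedicated random.Random instance, rather than the global generator, keeps the replicate split reproducible for a given random_state without affecting other code.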

Return type:

None | DataFrame

Returns:

A DataFrame with the DEA results if get_results is True, otherwise None. The following field is also set:

adata.uns['rank_genes_pseudobulk' | key_added]

DataFrame with the results of the differential expression analysis.

See also

dotools_py.tl.rank_genes_condition()

Run DEA at the single-cell level between conditions for all clusters.

dotools_py.tl.rank_genes_consensus()

Run DEA at the pseudobulk and single-cell levels between conditions for all clusters.