dotools_py.tl.rank_genes_consensus

dotools_py.tl.rank_genes_consensus(adata, ctrl_cond, disease_cond, cluster_key, batch_key='batch', condition_key='condition', design='~condition', count_layer='counts', logcounts_layer='logcounts', min_cells=50, pseudobulk_approach='sum', technical_replicates=1, min_counts=10, workers=8, path=None, filename='DEA.xlsx', test_pseudobulk='deseq2', test='wilcoxon', mast_covariates=None, pval_cutoff=0.05, get_results=True, key_added='rank_genes_consensus', random_state=0)

Run single-cell and pseudo-bulk differential expression analysis.

This function performs differential gene expression analysis between two conditions for all the clusters in the AnnData object, using both a single-cell-level and a pseudo-bulk-level approach. At the single-cell level, it tests for DEGs using the Wilcoxon test, MAST, t-test, logistic regression, or t-test with overestimated variance. At the pseudo-bulk level, it tests for DEGs using DESeq2 or edgeR.

A dataframe is produced with the results of both tests, including the fold changes, p-values, test statistics, the percentage of cells in each group expressing the gene, and the mean expression per sample in each cluster for each gene. The dataframe is stored in the uns attribute and can also be written to disk if a path and a filename are provided.
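The description above does not spell out how the two result sets are combined. As a rough illustration only, the sketch below assumes a gene counts as a consensus hit when both tests fall under pval_cutoff and agree on the direction of the fold change. The gene names, field names (log2fc, padj), values, and the helper consensus_genes are all hypothetical and are not part of dotools_py.

```python
# Hypothetical per-gene results for one cluster at the two levels.
# Field names and numbers are invented for illustration.
single_cell = {
    "GeneA": {"log2fc": 1.8, "padj": 0.001},
    "GeneB": {"log2fc": -0.9, "padj": 0.030},
    "GeneC": {"log2fc": 0.4, "padj": 0.200},
}
pseudobulk = {
    "GeneA": {"log2fc": 1.5, "padj": 0.004},
    "GeneB": {"log2fc": 0.7, "padj": 0.020},
    "GeneC": {"log2fc": 0.5, "padj": 0.010},
}


def consensus_genes(sc, pb, pval_cutoff=0.05):
    """Return genes significant in both tests with the same fold-change sign.

    This rule is an assumption made for the sketch, not the library's
    documented behaviour.
    """
    hits = []
    for gene in sorted(sc.keys() & pb.keys()):
        both_significant = (
            sc[gene]["padj"] < pval_cutoff and pb[gene]["padj"] < pval_cutoff
        )
        same_direction = (sc[gene]["log2fc"] > 0) == (pb[gene]["log2fc"] > 0)
        if both_significant and same_direction:
            hits.append(gene)
    return hits


print(consensus_genes(single_cell, pseudobulk))
```

Under this assumed rule, GeneB is excluded because the two tests disagree on direction, and GeneC because only the pseudo-bulk test passes the cutoff.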

Parameters:
adata AnnData

Annotated data matrix.

ctrl_cond str

Control condition.

disease_cond str

Disease or alternative condition to test.

cluster_key str

Metadata column in obs with clustering groups.

batch_key str (default: 'batch')

Metadata column in obs with batch groups.

condition_key str (default: 'condition')

Metadata column in obs with condition groups.

design str (default: '~condition')

Design for the differential expression analysis in DESeq2.

count_layer str (default: 'counts')

Layer with counts. Required for DESeq2.

logcounts_layer str (default: 'logcounts')

Layer with logcounts.

min_cells int (default: 50)

Minimum number of cells per batch/sample required when generating the pseudo-bulk. If there are fewer cells, DESeq2 / edgeR will not be run on the cluster.

pseudobulk_approach Literal['sum', 'mean'] (default: 'sum')

How to generate the pseudobulk counts.

technical_replicates int (default: 1)

How many technical replicates should be generated per sample.

min_counts int (default: 10)

Minimum number of total counts for a gene to be tested in DESeq2 after pseudobulking.

workers int (default: 8)

Number of CPUs to use for DESeq2.

path str | PathLike[str] | Path (default: None)

Path to save results.

filename str (default: 'DEA.xlsx')

Name of the file.

test_pseudobulk Literal['deseq2', 'edger'] (default: 'deseq2')

Test to use for differential expression analysis at the pseudobulk level.

test Literal['wilcoxon', 'mast', 't-test', 'logreg', 't-test_overestim_var'] (default: 'wilcoxon')

Test to use for differential expression analysis at the single-cell level.

mast_covariates list (default: None)

Covariates for MAST test.

pval_cutoff float (default: 0.05)

Cutoff for considering a gene significant.

get_results bool (default: True)

Whether to return a dataframe with the consensus results.

key_added str (default: 'rank_genes_consensus')

Name of the uns attribute with the results.

random_state int (default: 0)

Seed for the random number generator.
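To make the interplay of min_cells, pseudobulk_approach and min_counts more concrete, here is a minimal pure-Python sketch of pseudo-bulk aggregation for a single cluster. It is not the dotools_py implementation: the helper make_pseudobulk, its input format, and the exact filtering order are assumptions made for illustration.

```python
from collections import defaultdict


def make_pseudobulk(cells, approach="sum", min_cells=50, min_counts=10):
    """Aggregate per-cell counts into one profile per sample (hypothetical helper).

    cells: list of (sample_id, {gene: count}) tuples for a single cluster.
    Returns {sample_id: {gene: value}}, or None when any sample has fewer
    than min_cells cells (assumed to mean the cluster is skipped).
    """
    if approach not in ("sum", "mean"):
        raise ValueError("approach must be 'sum' or 'mean'")

    # Group the cells of this cluster by sample.
    per_sample = defaultdict(list)
    for sample_id, counts in cells:
        per_sample[sample_id].append(counts)

    # Too few cells in any sample: do not test this cluster.
    if any(len(group) < min_cells for group in per_sample.values()):
        return None

    # Sum raw counts per gene within each sample.
    genes = sorted({gene for _, counts in cells for gene in counts})
    sums = {
        sample_id: {g: sum(c.get(g, 0) for c in group) for g in genes}
        for sample_id, group in per_sample.items()
    }

    # Keep genes whose total count across samples reaches min_counts.
    kept = [g for g in genes if sum(s[g] for s in sums.values()) >= min_counts]

    profiles = {}
    for sample_id, totals in sums.items():
        n_cells = len(per_sample[sample_id])
        if approach == "mean":
            profiles[sample_id] = {g: totals[g] / n_cells for g in kept}
        else:
            profiles[sample_id] = {g: totals[g] for g in kept}
    return profiles


# Toy input: two samples with two cells each.
toy_cells = [
    ("s1", {"GeneA": 6, "GeneB": 1}),
    ("s1", {"GeneA": 4}),
    ("s2", {"GeneA": 5, "GeneB": 2}),
    ("s2", {"GeneA": 7, "GeneB": 1}),
]
print(make_pseudobulk(toy_cells, approach="sum", min_cells=2, min_counts=10))
```

In the toy input, GeneB has only 4 counts in total and is filtered out by min_counts=10, while raising min_cells to 3 would cause the whole cluster to be skipped.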

Return type:

None | DataFrame

Returns:

Returns a DataFrame with DEA results if get_results is set to True. The following field will also be set:

adata.uns['rank_genes_consensus' | key_added]

Dataframe with results of the differential expression analysis
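A call might look like the sketch below. The keyword names are taken from the signature above; the values "healthy", "disease" and "leiden", the wrapper run_consensus_dea, and the assumption that dotools_py.tl is reachable after a plain import dotools_py are illustrative and should be checked against your own data and installation.

```python
# Keyword arguments named after the documented parameters.
# "healthy", "disease" and "leiden" are placeholder values.
dea_kwargs = dict(
    ctrl_cond="healthy",
    disease_cond="disease",
    cluster_key="leiden",
    batch_key="batch",
    condition_key="condition",
    test="wilcoxon",
    test_pseudobulk="deseq2",
    pval_cutoff=0.05,
    get_results=True,
    key_added="rank_genes_consensus",
)


def run_consensus_dea(adata, **kwargs):
    """Hypothetical wrapper: run the analysis and fetch the stored copy."""
    # Imported lazily so this sketch can be loaded without the package.
    import dotools_py

    # With get_results=True a DataFrame is returned, otherwise None.
    results = dotools_py.tl.rank_genes_consensus(adata, **kwargs)
    # The same results are documented to be stored under adata.uns[key_added].
    stored = adata.uns[kwargs.get("key_added", "rank_genes_consensus")]
    return results, stored


# Usage, given an AnnData object `adata` with the required layers and columns:
# results, stored = run_consensus_dea(adata, **dea_kwargs)
```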

See also

dotools_py.tl.rank_genes_condition()

Run DEA at the single-cell level between conditions for all clusters.

dotools_py.tl.rank_genes_pseudobulk()

Run DEA at the pseudobulk level between conditions for all clusters.