dotools_py.tl.rank_genes_consensus
- dotools_py.tl.rank_genes_consensus(adata, ctrl_cond, disease_cond, cluster_key, batch_key='batch', condition_key='condition', design='~condition', count_layer='counts', logcounts_layer='logcounts', min_cells=50, pseudobulk_approach='sum', technical_replicates=1, min_counts=10, workers=8, path=None, filename='DEA.xlsx', test_pseudobulk='deseq2', test='wilcoxon', mast_covariates=None, pval_cutoff=0.05, get_results=True, key_added='rank_genes_consensus', random_state=0)
Run single-cell and pseudo-bulk differential expression analysis.
This function performs differential gene expression analysis between two conditions for all clusters in the AnnData object, using both a single-cell-level and a pseudobulk-level approach. At the single-cell level, it tests for DEGs using the Wilcoxon test, MAST, t-test, logistic regression, or t-test with overestimated variance. At the pseudobulk level, it tests for DEGs using DESeq2 or edgeR.
A DataFrame is produced with the results of both tests, including the fold changes, p-values, test statistics, the percentage of cells in each group expressing the gene, and the mean expression per sample in each cluster for each gene. The DataFrame is stored in the uns attribute and is also written to disk if a path and a filename are provided.
- Parameters:
- adata
AnnData Annotated data matrix.
- ctrl_cond
str Control condition.
- disease_cond
str Disease or alternative condition to test.
- cluster_key
str Metadata column in obs with clustering groups.
- batch_key
str(default:'batch') Metadata column in obs with batch groups.
- condition_key
str(default:'condition') Metadata column in obs with condition groups.
- design
str(default:'~condition') Design for the differential expression analysis in DESeq2.
- count_layer
str(default:'counts') Layer with counts. Required for DESeq2.
- logcounts_layer
str(default:'logcounts') Layer with logcounts.
- min_cells
int(default:50) Minimum number of cells per batch/sample required when generating the pseudobulk. If there are fewer cells, DESeq2 / edgeR will not be run on the cluster.
- pseudobulk_approach
Literal['sum','mean'] (default:'sum') How to generate the pseudobulk counts.
- technical_replicates
int(default:1) How many technical replicates should be generated per sample.
- min_counts
int(default:10) Minimum number of total counts for a gene to be tested in DESeq2 after pseudobulking.
- workers
int(default:8) Number of CPUs to use for DESeq2.
- path
str | PathLike[str] | Path (default:None) Path to save results.
- filename
str(default:'DEA.xlsx') Name of the file.
- test_pseudobulk
Literal['deseq2','edger'] (default:'deseq2') Test to use for doing differential expression analysis on pseudobulk level.
- test
Literal['wilcoxon','mast','t-test','logreg','t-test_overestim_var'] (default:'wilcoxon') Test to use for doing differential expression analysis on single-cell level.
- mast_covariates
list(default:None) Covariates for MAST test.
- pval_cutoff
float(default:0.05) Cutoff for considering a gene significant.
- get_results
bool(default:True) Return a DataFrame with the consensus results.
- key_added
str(default:'rank_genes_consensus') Name of the uns attribute in which the results are stored.
- random_state
int(default:0) Seed for the random number generator.
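To make the roles of pseudobulk_approach, min_cells and min_counts concrete, the following is a minimal pure-Python sketch of pseudobulk aggregation for a single cluster. It is not the dotools_py implementation: the helper name pseudobulk_cluster, the input layout, and the reading that one under-populated sample causes the whole cluster to be skipped are assumptions made for illustration only.

```python
from collections import defaultdict


def pseudobulk_cluster(cells, approach="sum", min_cells=50, min_counts=10):
    """Aggregate the cells of one cluster into per-sample profiles.

    Hypothetical helper for illustration, not part of dotools_py.
    cells: iterable of (sample_id, counts), counts being per-gene raw counts.
    Returns (profiles, kept_genes), or None when the cluster is skipped.
    """
    by_sample = defaultdict(list)
    for sample_id, counts in cells:
        by_sample[sample_id].append(counts)

    # Assumed reading of min_cells: one under-populated sample skips the cluster.
    if not by_sample or any(len(rows) < min_cells for rows in by_sample.values()):
        return None

    profiles = {}
    for sample_id, rows in by_sample.items():
        sums = [sum(gene) for gene in zip(*rows)]  # per-gene total over cells
        if approach == "sum":
            profiles[sample_id] = sums
        elif approach == "mean":
            profiles[sample_id] = [s / len(rows) for s in sums]
        else:
            raise ValueError(f"unknown approach: {approach!r}")

    # Keep genes whose total across all pseudobulk samples reaches min_counts.
    n_genes = len(next(iter(profiles.values())))
    kept_genes = [
        g for g in range(n_genes)
        if sum(profile[g] for profile in profiles.values()) >= min_counts
    ]
    return profiles, kept_genes


# Toy cluster: two samples, three cells each, three genes.
toy_cells = [
    ("ctrl_1", [5, 0, 1]), ("ctrl_1", [3, 0, 0]), ("ctrl_1", [4, 1, 0]),
    ("dis_1", [9, 0, 2]), ("dis_1", [7, 0, 0]), ("dis_1", [8, 0, 1]),
]
result = pseudobulk_cluster(toy_cells, approach="sum", min_cells=3, min_counts=10)
```

With the toy data, only the first gene accumulates enough counts to be kept, and raising min_cells above three makes the helper return None, mirroring the documented behaviour that the pseudobulk test is not run on clusters with too few cells.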
- Return type:
DataFrame | None
- Returns:
Returns a DataFrame with DEA results if get_results is set to True. The following field will also be set:
adata.uns['rank_genes_consensus' | key_added] DataFrame with the results of the differential expression analysis.
See also
dotools_py.tl.rank_genes_condition(): run DEA at single-cell level between conditions for all clusters.
dotools_py.tl.rank_genes_pseudobulk(): run DEA at pseudobulk level between conditions for all clusters.
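Examples

A hedged usage sketch based only on the signature documented above. The condition labels ('healthy', 'disease') and the cluster column ('cell_type') are hypothetical placeholders, and the wrapper run_consensus is not part of dotools_py. The AnnData object is assumed to carry the 'counts' and 'logcounts' layers and the 'batch' and 'condition' columns in obs that the defaults expect.

```python
# Keyword arguments taken from the documented signature; the condition labels
# and the cluster column name are hypothetical placeholders.
consensus_kwargs = dict(
    ctrl_cond="healthy",
    disease_cond="disease",
    cluster_key="cell_type",
    batch_key="batch",
    condition_key="condition",
    test="wilcoxon",
    test_pseudobulk="deseq2",
    min_cells=50,
    pval_cutoff=0.05,
    get_results=True,
    key_added="rank_genes_consensus",
)


def run_consensus(adata, **overrides):
    """Hypothetical wrapper: run the consensus DEA on a prepared AnnData object."""
    # Imported lazily so that this snippet loads even without the package.
    from dotools_py.tl import rank_genes_consensus

    params = {**consensus_kwargs, **overrides}
    results = rank_genes_consensus(adata, **params)
    # Per the documentation, the same table is also stored in
    # adata.uns[params["key_added"]].
    return results
```

Calling run_consensus(adata, test="mast", mast_covariates=[...]) would switch the single-cell test while leaving the pseudobulk settings unchanged; the covariate names depend on the columns available in obs.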