dotools_py.tl.DGEAnalysis

class dotools_py.tl.DGEAnalysis(adata, groupby, batch_key='batch', pseudobulk_mode='sum', pseudobulk_groups=None, technical_replicates=None, is_pseudobulk=False)

Class to perform differential gene expression (DGE) analysis on AnnData objects at the single-cell or sample (pseudobulk) level.

At the sample (pseudobulk) level, the available methods are edgeR, DESeq2, and a t-test. At the single-cell level, the available methods are the Wilcoxon rank-sum test, MAST, the t-test, the t-test with overestimated variance, and logistic regression.

Parameters:
adata AnnData

Annotated data matrix.

groupby str

Column in adata.obs containing the conditions to compare.

batch_key str (default: 'batch')

Column in adata.obs containing batch information.

pseudobulk_mode Literal['sum', 'mean'] (default: 'sum')

How cell-level counts are aggregated into pseudobulk profiles ('sum' or 'mean').

pseudobulk_groups str | None (default: None)

Column in adata.obs used to additionally group observations when generating pseudobulk profiles (e.g. cell type annotation). Differential gene expression is performed between the groups in groupby within each category of pseudobulk_groups.

technical_replicates int | None (default: None)

Number of technical replicates to generate for each sample (experimental).
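The effect of pseudobulk_mode and pseudobulk_groups can be illustrated with a minimal, dependency-free sketch. It assumes a dense cells × genes count matrix given as nested lists, plus per-cell sample labels and optional group labels (e.g. cell types); the function name pseudobulk is hypothetical and not part of dotools_py, which operates on AnnData objects.

```python
from collections import defaultdict
from typing import Literal, Optional, Sequence


def pseudobulk(
    counts: Sequence[Sequence[float]],
    samples: Sequence[str],
    groups: Optional[Sequence[str]] = None,
    mode: Literal["sum", "mean"] = "sum",
) -> dict:
    """Collapse a cells x genes matrix into one profile per sample, or per
    (sample, group) pair when `groups` is given (hypothetical helper)."""
    if mode not in ("sum", "mean"):
        raise ValueError(f"unknown pseudobulk mode: {mode!r}")
    n_genes = len(counts[0])
    sums = defaultdict(lambda: [0.0] * n_genes)  # key -> per-gene running sum
    n_cells = defaultdict(int)  # key -> number of cells aggregated
    for i, row in enumerate(counts):
        key = samples[i] if groups is None else (samples[i], groups[i])
        acc = sums[key]
        for j, value in enumerate(row):
            acc[j] += value
        n_cells[key] += 1
    if mode == "sum":
        return dict(sums)
    # mode == "mean": divide each summed profile by its number of cells.
    return {key: [v / n_cells[key] for v in acc] for key, acc in sums.items()}
```

With pseudobulk_groups set, each sample contributes one profile per cell type, so the comparison between conditions can then be run separately inside every cell type.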

Examples

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> tester = do.tl.DGEAnalysis(adata, groupby="condition")
>>> tester.find_methods("single-cell")
['logreg', 'mast', 'ttest', 'ttest_overtim_var', 'wilcoxon']
>>> tester.find_methods("pseudobulk")
['cluster_ttest', 'deseq', 'edger']

Attributes table

get_dge

Get DGE results.

Methods table

cluster_ttest(reference, groups[, ...])

Differential Gene Expression Analysis with a t-test on pseudobulk profiles.

deseq(design, reference, groups[, ...])

Differential Gene Expression Analysis with DESeq2.

edger(design, reference, groups[, ...])

Differential Gene Expression Analysis with edgeR.

find_methods(label)

Get the list of pseudobulk or single-cell methods.

logreg([reference, groups, logcounts, layer])

Differential Gene Expression Analysis with logistic regression.

mast(reference[, groups, covariates, layer])

Differential Gene Expression Analysis with MAST.

ttest([reference, groups, logcounts, layer])

Differential Gene Expression Analysis with t-test.

ttest_overtim_var([reference, groups, ...])

Differential Gene Expression Analysis with t-test with overestimated variance.

wilcoxon([reference, groups, logcounts, layer])

Differential Gene Expression Analysis with Wilcoxon.

Attributes

DGEAnalysis.get_dge

Get DGE results.

Returns:

Returns a dictionary with the results.

Methods

DGEAnalysis.cluster_ttest(reference, groups, equal_var=True, layer=None)

Differential Gene Expression Analysis with a t-test on pseudobulk profiles.

Parameters:
reference str

Control condition.

groups str | list

Alternative conditions to test against.

equal_var bool (default: True)

Whether to assume equal variances in the two conditions.

layer str | None (default: None)

Layer in adata.layers to use.

Return type:

None

Returns:

Returns None.
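To make equal_var concrete, the sketch below computes the two-sample t statistic for one gene from its per-sample pseudobulk values in the test and reference conditions. It assumes equal_var follows the usual convention (as in scipy.stats.ttest_ind): True gives Student's pooled-variance t-test, False gives Welch's t-test. The helper name t_statistic is hypothetical, and p-values are omitted to keep the sketch dependency-free.

```python
import math
from statistics import mean, variance
from typing import Sequence


def t_statistic(test: Sequence[float], ref: Sequence[float], equal_var: bool = True) -> float:
    """Two-sample t statistic for one gene (test minus reference)."""
    n1, n2 = len(test), len(ref)
    v1, v2 = variance(test), variance(ref)  # sample variances (n - 1 denominator)
    if equal_var:
        # Student: pool both variances, weighted by their degrees of freedom.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch: keep each condition's variance separate.
        se = math.sqrt(v1 / n1 + v2 / n2)
    return (mean(test) - mean(ref)) / se
```

For per-sample values [4, 6, 8] (test) and [1, 2, 3, 2] (reference), Student's statistic is 4/sqrt(7/6) ≈ 3.70 and Welch's is 4/sqrt(1.5) ≈ 3.27; the two statistics coincide when the conditions have equal numbers of samples or equal variances.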

DGEAnalysis.deseq(design, reference, groups, sample_min_cells=10, sample_min_counts=100, gene_min_count=0, gene_min_total_count=0, layer='counts')

Differential Gene Expression Analysis with DESeq2.

Parameters:
design str

Design for the test.

reference str

Control condition.

groups str | list

Alternative conditions to test against.

sample_min_cells int (default: 10)

Minimum number of cells to retain a pseudobulk sample.

sample_min_counts int (default: 100)

Minimum number of counts to retain a pseudobulk sample.

gene_min_count int (default: 0)

Minimum number of counts to retain a gene.

gene_min_total_count int (default: 0)

Minimum number of total counts to retain a gene.

layer str (default: 'counts')

Layer in adata.layers containing raw counts.

Return type:

None

Returns:

Returns None.
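The sample-level thresholds can be read as a filter applied before fitting. The sketch below assumes a pseudobulk sample is kept only when it aggregates at least sample_min_cells cells and at least sample_min_counts total counts; the dictionary inputs and the helper name filter_samples are hypothetical, and the exact filtering code in dotools_py may differ.

```python
from typing import Dict, List


def filter_samples(
    profiles: Dict[str, List[int]],
    n_cells: Dict[str, int],
    sample_min_cells: int = 10,
    sample_min_counts: int = 100,
) -> Dict[str, List[int]]:
    """Keep pseudobulk samples built from enough cells and with enough counts.

    profiles maps sample -> summed raw counts per gene;
    n_cells maps sample -> number of cells aggregated into that sample.
    """
    return {
        sample: counts
        for sample, counts in profiles.items()
        if n_cells[sample] >= sample_min_cells and sum(counts) >= sample_min_counts
    }
```

For example, with profiles {"s1": [50, 60, 0], "s2": [5, 5, 0], "s3": [100, 1, 2]} and cell numbers {"s1": 20, "s2": 30, "s3": 5}, the defaults keep only s1: s2 has just 10 counts and s3 is built from only 5 cells.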

DGEAnalysis.edger(design, reference, groups, sample_min_cells=10, sample_min_counts=100, gene_min_count=0, gene_min_total_count=0, layer='counts')

Differential Gene Expression Analysis with edgeR.

Parameters:
design str | DataFrame

Design for the test.

reference str

Control condition.

groups str | list

Alternative conditions to test against.

sample_min_cells int (default: 10)

Minimum number of cells to retain a pseudobulk sample.

sample_min_counts int (default: 100)

Minimum number of counts to retain a pseudobulk sample.

gene_min_count int (default: 0)

Minimum number of counts to retain a gene.

gene_min_total_count int (default: 0)

Minimum number of total counts to retain a gene.

layer str (default: 'counts')

Layer in adata.layers containing raw counts. Set to None if the raw counts are in adata.X.

Return type:

None

Returns:

Returns None.
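The gene-level thresholds can be sketched in the same spirit. This is a simplification and an assumption: it keeps a gene when its count reaches gene_min_count in at least one retained sample and its total count across samples reaches gene_min_total_count. edgeR's own filterByExpr applies a more elaborate, CPM-based rule, and the helper name filter_genes is hypothetical.

```python
from typing import Dict, List


def filter_genes(
    profiles: Dict[str, List[int]],
    gene_min_count: int = 0,
    gene_min_total_count: int = 0,
) -> List[int]:
    """Return the indices of genes passing simple count thresholds.

    profiles maps sample -> raw counts per gene (all lists have equal length).
    """
    per_gene = list(zip(*profiles.values()))  # one tuple of per-sample counts per gene
    return [
        gene
        for gene, counts in enumerate(per_gene)
        if max(counts) >= gene_min_count and sum(counts) >= gene_min_total_count
    ]
```

With the defaults of 0, no gene is removed; raising either threshold drops lowly detected genes, which reduces the number of tests and stabilises dispersion estimates.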

classmethod DGEAnalysis.find_methods(label)

Get the list of pseudobulk or single-cell methods.

Parameters:
label Literal['pseudobulk', 'single-cell']

Level of analysis for which to list the methods.

Return type:

list

Returns:

Returns a list with the names of the methods implemented to perform differential gene expression on pseudobulk or single-cell level.
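Because find_methods is a classmethod, it can be called on the class itself as well as on an instance. One way such a lookup can be organised is sketched below, assuming a class-level registry that maps each label to method names; the class ToyDGE and the attribute _METHODS are hypothetical and only mirror the lists printed in the example above.

```python
from typing import ClassVar, Dict, List, Literal


class ToyDGE:
    # Hypothetical registry: analysis level -> names of the DGE methods.
    _METHODS: ClassVar[Dict[str, List[str]]] = {
        "pseudobulk": ["cluster_ttest", "deseq", "edger"],
        "single-cell": ["logreg", "mast", "ttest", "ttest_overtim_var", "wilcoxon"],
    }

    @classmethod
    def find_methods(cls, label: Literal["pseudobulk", "single-cell"]) -> List[str]:
        """Return the sorted method names available for the given level."""
        if label not in cls._METHODS:
            raise ValueError(f"label must be one of {sorted(cls._METHODS)}, got {label!r}")
        return sorted(cls._METHODS[label])
```

Both ToyDGE.find_methods("pseudobulk") and ToyDGE().find_methods("pseudobulk") return the same list, since a classmethod receives the class regardless of how it is called.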

DGEAnalysis.logreg(reference='rest', groups=None, logcounts=True, layer=None)

Differential Gene Expression Analysis with logistic regression.

Parameters:
reference str (default: 'rest')

Reference condition.

groups str | list | None (default: None)

Alternative condition(s) to test.

logcounts bool (default: True)

Whether the data is log-normalized.

layer str | None (default: None)

Layer in adata.layers to use.

Return type:

None

Returns:

Returns None.
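Logistic regression treats DGE as a classification problem: a model is trained to tell test cells from reference cells, and genes are ranked by their fitted coefficients. The dependency-free sketch below assumes that ranking scheme (the one used by scanpy's rank_genes_groups with method='logreg'); whether dotools_py scores genes exactly this way is not stated here, and logreg_coefficients is a hypothetical name. It fits an L2-penalised model by plain gradient descent on log-normalized values.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logreg_coefficients(
    X: Sequence[Sequence[float]],
    is_test: Sequence[int],
    l2: float = 0.1,
    lr: float = 0.5,
    epochs: int = 2000,
) -> List[float]:
    """Fit p(test | x) = sigmoid(b + w . x) and return one weight per gene.

    X holds log-normalized expression (cells x genes); is_test is 1 for cells
    of the test group and 0 for reference cells. Large positive weights tend
    to point to genes that are higher in the test group.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for x, y in zip(X, is_test):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            for j in range(p):
                grad_w[j] += err * x[j]
            grad_b += err
        # Gradient step on the mean log-loss plus an L2 penalty on the weights.
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w
```

Unlike per-gene tests, the coefficients are fitted jointly, so a gene's weight reflects its contribution given all other genes in the model.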

DGEAnalysis.mast(reference, groups=None, covariates=None, layer=None)

Differential Gene Expression Analysis with MAST.

Parameters:
reference str

Reference condition.

groups str | list | None (default: None)

Alternative condition(s) to test.

covariates str | None (default: None)

Covariates to correct for.

layer str | None (default: None)

Layer in adata.layers to use.

Return type:

None

Returns:

Returns None.
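MAST describes each gene with a two-part hurdle model: a logistic component for whether the gene is detected in a cell, and a Gaussian component for its expression level in the cells where it is detected. The sketch below does not fit MAST; it only computes, for one group, the two summary quantities those components describe (detection rate and mean expression among expressing cells). The function name hurdle_summary is hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def hurdle_summary(values: Sequence[float]) -> Tuple[float, float]:
    """Split one gene's expression in one group into the two hurdle parts.

    Returns (detection_rate, mean_if_expressed): the fraction of cells with
    non-zero expression and the mean over those cells only (0.0 if none).
    """
    expressed = [v for v in values if v > 0]
    detection_rate = len(expressed) / len(values)
    mean_if_expressed = mean(expressed) if expressed else 0.0
    return detection_rate, mean_if_expressed
```

Comparing a reference group [0, 0, 0, 1.0] with a test group [0, 2.0, 1.0, 3.0] gives (0.25, 1.0) versus (0.75, 2.0): the gene differs both in how often it is detected and in how strongly it is expressed when detected, which is why MAST models the two parts separately.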

DGEAnalysis.ttest(reference='rest', groups=None, logcounts=True, layer=None)

Differential Gene Expression Analysis with t-test.

Parameters:
reference str (default: 'rest')

Reference condition.

groups str | list | None (default: None)

Alternative condition(s) to test.

logcounts bool (default: True)

Whether the data is log-normalized.

layer str | None (default: None)

Layer in adata.layers to use.

Return type:

None

Returns:

Returns None.
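In the single-cell t-test, each cell is one observation. The sketch below computes a Welch t statistic for one gene between the cells of a test group and the reference cells (presumably all remaining cells when reference='rest'). It assumes the method behaves like scanpy's rank_genes_groups with method='t-test', which uses Welch's unequal-variance form; this is an assumption, and welch_t is a hypothetical helper.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t(test: Sequence[float], ref: Sequence[float]) -> float:
    """Welch t statistic of one gene, test cells versus reference cells."""
    se = math.sqrt(variance(test) / len(test) + variance(ref) / len(ref))
    return (mean(test) - mean(ref)) / se


# Hypothetical usage: group "A" versus the rest of the cells.
labels = ["A", "A", "A", "B", "B", "C", "C"]
expr = [4.0, 6.0, 8.0, 1.0, 2.0, 3.0, 2.0]  # log-normalized values of one gene
test = [e for e, g in zip(expr, labels) if g == "A"]
rest = [e for e, g in zip(expr, labels) if g != "A"]
score = welch_t(test, rest)  # 4 / sqrt(1.5), about 3.27
```

Because thousands of cells typically enter each group, even small mean differences can yield large statistics, so effect sizes such as log fold changes are usually inspected alongside the scores.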

DGEAnalysis.ttest_overtim_var(reference='rest', groups=None, logcounts=True, layer=None)

Differential Gene Expression Analysis with t-test with overestimated variance.

Parameters:
reference str (default: 'rest')

Reference condition.

groups str | list | None (default: None)

Alternative condition(s) to test.

logcounts bool (default: True)

Whether the data is log-normalized.

layer str | None (default: None)

Layer in adata.layers to use.

Return type:

None

Returns:

Returns None.
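In scanpy's rank_genes_groups, the 't-test_overestim_var' variant keeps Welch's formula but divides the reference variance by the size of the test group instead of the size of the reference, which inflates the variance term whenever the reference is larger (for example, with reference='rest'). The sketch below assumes ttest_overtim_var follows the same idea; the helper name t_overestim_var is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def t_overestim_var(test: Sequence[float], ref: Sequence[float]) -> float:
    """Welch-style t statistic in which the reference variance is divided by
    the test-group size rather than the reference size."""
    n_test = len(test)
    se = math.sqrt(variance(test) / n_test + variance(ref) / n_test)
    return (mean(test) - mean(ref)) / se
```

For a test group [4, 6, 8] against a reference [1, 2, 3, 2], this gives 4/sqrt(14/9) ≈ 3.21, slightly below the plain Welch value of 4/sqrt(1.5) ≈ 3.27; the gap widens as the reference grows relative to the test group, making the test more conservative for small groups.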

DGEAnalysis.wilcoxon(reference='rest', groups=None, logcounts=True, layer=None)

Differential Gene Expression Analysis with Wilcoxon.

Parameters:
reference str (default: 'rest')

Reference condition.

groups str | list | None (default: None)

Alternative conditions.

logcounts bool (default: True)

Whether the data is log-normalized.

layer str | None (default: None)

Layer in adata.layers to use.

Return type:

None

Returns:

Returns None.
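The Wilcoxon rank-sum (Mann-Whitney) test compares the ranks of a gene's expression values rather than the values themselves. The sketch below computes the U statistic for the test group, assigning tied values their average rank; it omits the normal approximation used to obtain p-values, and the helper names average_ranks and mann_whitney_u are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the last position of the block of values tied with order[i].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(test: Sequence[float], ref: Sequence[float]) -> float:
    """U statistic of the test group: rank sum minus its minimum possible value."""
    ranks = average_ranks(list(test) + list(ref))
    rank_sum_test = sum(ranks[: len(test)])
    return rank_sum_test - len(test) * (len(test) + 1) / 2
```

U ranges from 0 to len(test) * len(ref); a value near half of that maximum indicates no shift between the groups, while the extremes mean every test cell ranks below or above every reference cell.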