dotools_py.tl.rank_genes_groups

dotools_py.tl.rank_genes_groups(adata, groupby, *, mask_var=None, use_raw=None, groups='all', reference='rest', n_genes=None, rankby_abs=False, pts=True, key_added=None, copy=False, method=None, corr_method='benjamini-hochberg', tie_correct=True, layer=None, logcounts=True, **kwds)

Rank genes for characterizing groups.

Adapted from sc.tl.rank_genes_groups. The input is expected to be logarithmized when logcounts is set to True.

Parameters:

adata AnnData: Annotated data matrix.
groupby str: The column in adata.obs that defines the groups.
mask_var numpy.ndarray | str | None (default: None): Select a subset of genes to use in statistical tests.
use_raw bool | None (default: None): Use raw attribute of adata if present.
layer str | None (default: None): Key from adata.layers whose value will be used to perform tests on.
logcounts bool (default: True): Whether the input contains logarithmized counts.
groups Union[Literal['all'], Iterable[str]] (default: 'all'): Subset of groups, e.g. ['g1', 'g2', 'g3'], to which comparison shall be restricted, or 'all' (default), for all groups. Note that if reference='rest' all groups will still be used as the reference, not just those specified in groups.
reference str (default: 'rest'): If 'rest', compare each group to the union of the remaining groups. If a group identifier, compare with respect to this group.
n_genes int | None (default: None): The number of genes that appear in the returned tables. Defaults to all genes.
method Optional[Literal['logreg', 't-test', 'wilcoxon', 't-test_overestim_var']] (default: None): The statistical test to use. The default method is 'wilcoxon', which uses the Wilcoxon rank-sum test. Other options are 't-test'; 't-test_overestim_var', which overestimates the variance of each group; and 'logreg', which uses logistic regression. See here and here for why this is meaningful.
corr_method Literal['benjamini-hochberg', 'bonferroni'] (default: 'benjamini-hochberg'): p-value correction method. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.
tie_correct bool (default: True): Use tie correction for 'wilcoxon' scores. Used only for 'wilcoxon'.
rankby_abs bool (default: False): Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
pts bool (default: True): Compute the fraction of cells expressing the genes.
key_added str | None (default: None): The key in adata.uns under which the results are saved.
copy bool (default: False): Whether to copy adata or modify it in place.
kwds: Are passed to test methods. Currently, this affects only parameters that are passed to sklearn.linear_model.LogisticRegression. For instance, you can pass penalty='l1' to try to come up with a minimal set of genes that are good predictors (sparse solution meaning few non-zero fitted coefficients).
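
To make the two corr_method options concrete, the sketch below implements the textbook Bonferroni and Benjamini-Hochberg adjustments in plain Python. It is an illustration only: the helper names bonferroni and benjamini_hochberg are hypothetical, and the code is not taken from the dotools_py or scanpy sources, whose implementations may differ in detail.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvals)
    return [min(p * n, 1.0) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes in one group.
pvals = [0.01, 0.04, 0.03, 0.20]
bh = benjamini_hochberg(pvals)
bonf = bonferroni(pvals)
print("benjamini-hochberg:", bh)
print("bonferroni:        ", bonf)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two, so its adjusted values are never smaller than the Benjamini-Hochberg ones for the same input.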

Return type:

AnnData | None

Returns:

Returns None if copy=False, else returns an AnnData object. Sets the following fields:

adata.uns['rank_genes_groups' | key_added]['names'] structured numpy.ndarray (dtype object): Structured array to be indexed by group id storing the gene names. Ordered according to scores.
adata.uns['rank_genes_groups' | key_added]['scores'] structured numpy.ndarray (dtype object): Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
adata.uns['rank_genes_groups' | key_added]['logfoldchanges'] structured numpy.ndarray (dtype object): Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is 't-test'-like. Note: this is an approximation calculated from mean-log values.
adata.uns['rank_genes_groups' | key_added]['pvals'] structured numpy.ndarray (dtype float): p-values.
adata.uns['rank_genes_groups' | key_added]['pvals_adj'] structured numpy.ndarray (dtype float): Corrected p-values.
adata.uns['rank_genes_groups' | key_added]['pts'] pandas.DataFrame (dtype float): Fraction of cells expressing the genes for each group.
adata.uns['rank_genes_groups' | key_added]['pts_rest'] pandas.DataFrame (dtype float): Only if reference is set to 'rest'. Fraction of cells from the union of the rest of each group expressing the genes.
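
The note that logfoldchanges is an approximation calculated from mean-log values can be illustrated with a small sketch. It assumes natural-log log1p-transformed counts and a small pseudocount eps, both assumptions modeled on the common log1p convention rather than confirmed for dotools_py. The helpers approx_log2_fold_change and fraction_expressing are hypothetical names, and treating a value greater than zero as "expressing" is likewise an assumption.

```python
import math


def approx_log2_fold_change(log_group, log_rest, eps=1e-9):
    """Approximate log2 fold change from means of log1p values."""
    mean_group = sum(log_group) / len(log_group)
    mean_rest = sum(log_rest) / len(log_rest)
    # Undo log1p on the *means*; this is not the same as averaging raw counts.
    fold = (math.expm1(mean_group) + eps) / (math.expm1(mean_rest) + eps)
    return math.log2(fold)


def fraction_expressing(values):
    """Fraction of cells with a value above zero (assumed meaning of pts)."""
    return sum(v > 0 for v in values) / len(values)


# Hypothetical raw counts for one gene in a group and in the rest.
group_counts = [3, 3, 3, 3]
rest_counts = [0, 0, 3, 3]
log_group = [math.log1p(c) for c in group_counts]
log_rest = [math.log1p(c) for c in rest_counts]

approx = approx_log2_fold_change(log_group, log_rest)
exact = math.log2(
    (sum(group_counts) / len(group_counts)) / (sum(rest_counts) / len(rest_counts))
)
pts_group = fraction_expressing(log_group)
pts_rest = fraction_expressing(log_rest)
print(f"approximate log2FC: {approx:.3f}, from raw means: {exact:.3f}")
print(f"pts: {pts_group}, pts_rest: {pts_rest}")
```

Because the mean of log values is not the log of the mean, the approximate value (about log2 of 3) differs from the fold change computed on raw means (log2 of 2) for these counts, which is why the field is documented as an approximation.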

Notes

There are slight inconsistencies depending on whether sparse or dense data are passed. See here.
