dotools_py.tl.rank_genes_groups
- dotools_py.tl.rank_genes_groups(adata, groupby, *, mask_var=None, use_raw=None, groups='all', reference='rest', n_genes=None, rankby_abs=False, pts=True, key_added=None, copy=False, method=None, corr_method='benjamini-hochberg', tie_correct=True, layer=None, logcounts=True, **kwds)
Rank genes for characterizing groups.
Adaptation of sc.tl.rank_genes_groups that expects logarithmized data if logcounts is set to True.
- Parameters:
- adata
AnnData Annotated data matrix.
- groupby
str The column in obs to group by.
- mask_var
ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]] | str | None (default: None) Select a subset of genes to use in statistical tests.
- use_raw
bool | None (default: None) Use the raw attribute of adata if present.
- layer
str | None (default: None) Key from adata.layers whose value will be used to perform tests on.
- logcounts
bool (default: True) Whether the input contains logarithmized counts.
- groups
Union[Literal['all'], Iterable[str]] (default: 'all') Subset of groups, e.g. ['g1', 'g2', 'g3'], to which comparison shall be restricted, or 'all' (default), for all groups. Note that if reference='rest', all groups will still be used as the reference, not just those specified in groups.
- reference
str (default: 'rest') If 'rest', compare each group to the union of the rest of the groups. If a group identifier, compare with respect to this group.
- n_genes
int | None (default: None) The number of genes that appear in the returned tables. Defaults to all genes.
- method
Optional[Literal['logreg', 't-test', 'wilcoxon', 't-test_overestim_var']] (default: None) The default method is 'wilcoxon', which uses the Wilcoxon rank-sum test. Other options are 't-test'; 't-test_overestim_var', which overestimates the variance of each group; and 'logreg', which uses logistic regression. See here and here for why this is meaningful.
- corr_method
Literal['benjamini-hochberg', 'bonferroni'] (default: 'benjamini-hochberg') p-value correction method. Used only for 't-test', 't-test_overestim_var', and 'wilcoxon'.
- tie_correct
bool (default: True) Use tie correction for 'wilcoxon' scores. Used only for 'wilcoxon'.
- rankby_abs
bool (default: False) Rank genes by the absolute value of the score, not by the score. The returned scores are never the absolute values.
- pts
bool (default: True) Compute the fraction of cells expressing the genes.
- key_added
str | None (default: None) The key in adata.uns that the information is saved to.
- copy
bool (default: False) Whether to copy adata or modify it in place.
- kwds
Are passed to test methods. Currently, this affects only parameters that are passed to sklearn.linear_model.LogisticRegression. For instance, you can pass penalty='l1' to try to come up with a minimal set of genes that are good predictors (sparse solution meaning few non-zero fitted coefficients).
- Return type:
- Returns:
Returns None if copy=False, else returns an AnnData object. Sets the following fields:
- adata.uns['rank_genes_groups' | key_added]['names']
structured numpy.ndarray (dtype object) Structured array to be indexed by group id storing the gene names. Ordered according to scores.
- adata.uns['rank_genes_groups' | key_added]['scores']
structured numpy.ndarray (dtype object) Structured array to be indexed by group id storing the z-score underlying the computation of a p-value for each gene for each group. Ordered according to scores.
- adata.uns['rank_genes_groups' | key_added]['logfoldchanges']
structured numpy.ndarray (dtype object) Structured array to be indexed by group id storing the log2 fold change for each gene for each group. Ordered according to scores. Only provided if method is 't-test'-like. Note: this is an approximation calculated from mean-log values.
- adata.uns['rank_genes_groups' | key_added]['pvals']
structured numpy.ndarray (dtype float) p-values.
- adata.uns['rank_genes_groups' | key_added]['pvals_adj']
structured numpy.ndarray (dtype float) Corrected p-values.
- adata.uns['rank_genes_groups' | key_added]['pts']
pandas.DataFrame (dtype float) Fraction of cells expressing the genes for each group.
- adata.uns['rank_genes_groups' | key_added]['pts_rest']
pandas.DataFrame (dtype float) Only if reference is set to 'rest'. Fraction of cells from the union of the rest of each group expressing the genes.
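The logfoldchanges field is described as an approximation calculated from mean-log values. The following is a minimal sketch of what such an approximation can look like. It assumes log1p-transformed input, and it assumes the back-transform with expm1 and the small constant 1e-9 used in scanpy, from which this function is adapted; whether dotools_py computes the value in exactly this way is not confirmed by this page. The helper name approx_log2_fold_change is hypothetical.

```python
import math


def approx_log2_fold_change(mean_log_group, mean_log_rest, eps=1e-9):
    """Approximate a log2 fold change from means of log1p-transformed values.

    Hypothetical helper for illustration only, not part of dotools_py.
    """
    # The back-transform is applied to the mean of the logged values,
    # not to each value, which is why the result is an approximation.
    back_group = math.expm1(mean_log_group)
    back_rest = math.expm1(mean_log_rest)
    # eps (assumed value) avoids division by zero for unexpressed genes.
    return math.log2((back_group + eps) / (back_rest + eps))


def mean_log1p(values):
    """Mean of log1p-transformed values."""
    return sum(math.log1p(v) for v in values) / len(values)


rest = [1.0, 1.0]

# Uniform expression: the approximation matches the true fold change (4 vs 1).
lfc_uniform = approx_log2_fold_change(mean_log1p([4.0, 4.0]), mean_log1p(rest))

# Same raw mean (4), but uneven expression: the approximation is lower,
# because the mean of the logs is smaller than the log of the mean.
lfc_varied = approx_log2_fold_change(mean_log1p([0.0, 8.0]), mean_log1p(rest))

print(round(lfc_uniform, 3), round(lfc_varied, 3))
```

In this toy case both groups have a raw mean of 4 against a reference mean of 1, yet the second value comes out near 1 rather than 2. Under these assumptions, the reported fold changes are best read as a ranking aid rather than as exact effect sizes.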
Notes
There are slight inconsistencies depending on whether sparse or dense data are passed. See here.
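For the corr_method parameter and the pvals_adj field, the two options name standard multiple-testing corrections. The sketch below implements the textbook Benjamini-Hochberg step-up procedure and the Bonferroni correction in plain Python to show how the adjusted values relate to the raw p-values. It is an illustration of the procedures themselves; the function names are hypothetical and the code is not taken from dotools_py.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, scaling by n / rank and
    # enforcing monotonicity with a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvals):
    """Bonferroni adjusted p-values, capped at 1."""
    n = len(pvals)
    return [min(1.0, p * n) for p in pvals]


raw = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(raw)
bonf = bonferroni(raw)
print(bh)
print(bonf)
```

Bonferroni multiplies every p-value by the number of tests, so it is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate and typically retains more genes at the same threshold.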
See also
dotools_py.tl.grouped_ttest() Run DEA at the pseudobulk level between conditions for all genes.