dotools_py.get.subset

dotools_py.get.subset(adata, obs_key=None, obs_groups=None, var_key=None, var_groups=None, comparison='include', copy=False)

Subset AnnData object.

Subset an AnnData object based on a column in obs or var. Subsetting by multiple obs/var columns in a single call is not currently supported.

Parameters:
adata AnnData

AnnData object to subset.

obs_key str | None (default: None)

Column in obs to subset by.

obs_groups str | list | float | bool | None (default: None)

Groups or values in the obs_key column to include, exclude, or compare against, depending on comparison.

var_key str | None (default: None)

Column in var to subset by.

var_groups str | list | float | bool | None (default: None)

Groups or values in the var_key column to include, exclude, or compare against, depending on comparison.

comparison Literal['>=', '>', '==', '<', '<=', 'include', 'exclude'] (default: 'include')

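Comparison used to filter the AnnData object: 'include' or 'exclude' for matching groups, or an operator to compare against a value (for example, obs_groups=1000 with comparison='>=').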

copy bool (default: False)

If True, a copy is returned; otherwise a view of the AnnData object is returned.

Return type:

AnnData

Returns:

An AnnData object if copy is True; otherwise a view of the AnnData object after subsetting.
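
The comparison values listed above can be read as different ways of turning a column and a target into a per-row boolean mask. The sketch below is only an illustration in plain Python, not the implementation used by dotools_py; the helper name build_mask and the sample lists are hypothetical.

```python
import operator

# Hypothetical helper: NOT dotools_py's code, only an illustration of how
# each `comparison` value could translate into a per-row boolean mask.
_OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}


def build_mask(values, groups, comparison="include"):
    """Return a list of booleans, one per entry in `values`."""
    if comparison in ("include", "exclude"):
        # Accept either a single value or a collection of values.
        wanted = groups if isinstance(groups, (list, tuple, set)) else [groups]
        keep_matches = comparison == "include"
        return [(v in wanted) == keep_matches for v in values]
    if comparison in _OPERATORS:
        compare = _OPERATORS[comparison]
        return [compare(v, groups) for v in values]
    raise ValueError(f"Unknown comparison: {comparison!r}")


# Made-up columns standing in for adata.obs["annotation"] and adata.obs["total_counts"].
annotation = ["T_cells", "B_cells", "T_cells", "NK"]
total_counts = [1500.0, 800.0, 1000.0, 2300.0]

include_mask = build_mask(annotation, "T_cells")
exclude_mask = build_mask(annotation, ["T_cells", "NK"], "exclude")
threshold_mask = build_mask(total_counts, 1000, ">=")

print(include_mask)    # [True, False, True, False]
print(exclude_mask)    # [False, True, False, False]
print(threshold_mask)  # [True, False, True, True]
```

Under this reading, 'include' and 'exclude' suit categorical columns, while the operators suit numeric columns such as total_counts.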

Example

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> tcells = do.get.subset(adata, obs_key="annotation", obs_groups="T_cells")
>>> tcells
View of AnnData object with n_obs × n_vars = 464 × 1851
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
    var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'annotation_colors', 'annotation_recluster_colors', 'batch_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_CCA', 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'connectivities', 'distances'
>>> adata_subset = do.get.subset(adata, obs_key="total_counts", obs_groups=1000, comparison=">=", copy=True)
>>> adata_subset
AnnData object with n_obs × n_vars = 699 × 1851
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'cell_type', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
    var: 'mean', 'std', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'annotation_colors', 'annotation_recluster_colors', 'batch_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'umap'
    obsm: 'X_CCA', 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts', 'logcounts'
    obsp: 'connectivities', 'distances'