dotools_py.get.subset_df
- dotools_py.get.subset_df(df, col=None, col_groups=None, comparison='include')
Subset Pandas DataFrame.
Subset a Pandas DataFrame based on one or multiple columns. To subset for multiple columns, provide a list to col and col_groups in the same order. For example, you have a dataframe df with two columns: sex, which contains “male” and “female”, and age, with values from 20 to 90. You can subset to select only males below 50 by providing col = ["sex", "age"] and col_groups = ["male", 50], and then specifying the type of comparison for each column: comparison = ["include", "<"]. More examples in the section below.
- Parameters:
- df
DataFrame Pandas DataFrame to subset.
- col
str|list|None (default: None) Name of the column(s) to use for subsetting. To subset based on multiple columns, provide a list.
- col_groups
str|list|float|bool|None (default: None) Values to use for subsetting each column. If several columns are provided, provide a list.
- comparison
Union[Literal['>=','>','==','<','<=','include','exclude'],list] (default: 'include') How the subset will be performed. If several columns are provided, provide a list.
- Return type:
DataFrame
- Returns:
The subsetted pandas DataFrame.
Examples
>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> df = adata.obs.copy()
>>> df.shape
(700, 22)
>>> df_subset = do.get.subset_df(df, col=["condition", "annotation"], col_groups=["healthy", "NK"], comparison="include")
>>> df_subset.shape
(111, 22)
>>> df_subset.value_counts(["condition", "annotation"])
condition  annotation
healthy    NK            111
Name: count, dtype: int64
>>> df_subset = do.get.subset_df(df, col=["total_counts", "annotation"], col_groups=[1000, ["B_cells", "T_cells"]], comparison=["<", "include"])
>>> df_subset.shape
(1, 22)
>>> df_subset.head(5)[["total_counts", "annotation"]].sort_values("total_counts", ascending=False)
                           total_counts annotation
GGAGAACCAAAGCTCT-1-batch2         938.0    T_cells
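
The "males below 50" case from the description can also be reproduced with plain pandas boolean masks, which may help when reasoning about what each comparison keyword selects. The sketch below is not the dotools_py implementation: the helper subset_sketch and the toy data are hypothetical, and it assumes that 'include'/'exclude' mean membership tests and that the symbolic operators are applied element-wise to the column.

```python
import operator

import pandas as pd

# Toy data matching the description: a "sex" column and an "age" column.
df = pd.DataFrame(
    {
        "sex": ["male", "female", "male", "female", "male"],
        "age": [25, 40, 63, 48, 49],
    }
)

# Assumed meaning of each symbolic comparison (not taken from the dotools_py source).
_OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}


def subset_sketch(frame, col, col_groups, comparison):
    """Hypothetical stand-in for subset_df, for illustration only."""
    # Start with every row selected, then narrow down column by column.
    mask = pd.Series(True, index=frame.index)
    for name, value, how in zip(col, col_groups, comparison):
        if how in ("include", "exclude"):
            # Accept a single value or a list of values for membership tests.
            values = value if isinstance(value, list) else [value]
            hit = frame[name].isin(values)
            mask &= hit if how == "include" else ~hit
        else:
            mask &= _OPS[how](frame[name], value)
    return frame[mask]


males_under_50 = subset_sketch(df, ["sex", "age"], ["male", 50], ["include", "<"])
print(males_under_50)
```

Each entry of col is paired with the entry at the same position in col_groups and comparison, which is why the three lists have to be given in the same order.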