dotools_py.bm.kbet

dotools_py.bm.kbet(adata, batch_key='batch', annotation_key='annotation', integration_type='knn', use_rep=None, scale=True, get_data=True)

kBET score.

Compute the k-nearest-neighbor batch effect test (kBET) score, averaged over annotation labels. This function wraps the implementation by Büttner et al. 2019. kBET measures how strongly a batch variable biases the kNN graph: it is quantified as the average rejection rate of chi-squared tests that compare the local batch label distribution in each neighborhood with the global distribution. A smaller rejection rate therefore indicates better batch mixing. By default (scale=True), the original kBET score is rescaled to lie between 0 and 1 so that larger scores indicate better batch mixing.
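The idea of a rejection rate can be illustrated with a small standard-library sketch. This is not the code used by dotools_py or by the kBET implementation it wraps: the names chi2_sf and rejection_rate, the fixed significance level alpha=0.05, the toy neighborhoods, and the scaling 1 - rate are all assumptions made for illustration, and the real method includes steps (such as choosing the neighborhood size and subsampling cells) that are omitted here.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution (integer df >= 1)."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: start from Q(1, y) = exp(-y).
        a, q = 1.0, math.exp(-y)
    else:
        # Odd df: start from Q(1/2, y) = erfc(sqrt(y)).
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def rejection_rate(batches, neighbors, alpha=0.05):
    """Fraction of neighborhoods whose batch mix differs from the global mix.

    Assumes at least two distinct batch labels (so that df >= 1).
    """
    n = len(batches)
    global_freq = {b: c / n for b, c in Counter(batches).items()}
    df = len(global_freq) - 1
    rejected = 0
    for neigh in neighbors:
        local = Counter(batches[j] for j in neigh)
        k = len(neigh)
        # Pearson statistic: observed local counts vs. counts expected
        # from the global batch frequencies.
        stat = sum(
            (local.get(b, 0) - k * f) ** 2 / (k * f)
            for b, f in global_freq.items()
        )
        if chi2_sf(stat, df) < alpha:
            rejected += 1
    return rejected / len(neighbors)


# Toy data: 20 cells, each with a "neighborhood" of 4 consecutive indices.
windows = [[(i + d) % 20 for d in range(4)] for i in range(20)]
mixed = ["a", "b"] * 10             # batches alternate: well mixed
separated = ["a"] * 10 + ["b"] * 10  # batches form two blocks

for name, labels in [("mixed", mixed), ("separated", separated)]:
    rate = rejection_rate(labels, windows)
    # Assumed scaling: 1 - rate, so that larger means better mixing.
    print(name, "rejection rate:", rate, "scaled:", 1.0 - rate)
```

In the well-mixed toy case every neighborhood matches the global 50/50 split, so no test rejects; in the separated case most neighborhoods contain a single batch and are rejected, which gives a high rejection rate and a low scaled score.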

Parameters:
adata AnnData

Annotated data matrix.

batch_key str (default: 'batch')

Column in adata.obs with batch information.

annotation_key str (default: 'annotation')

Column in adata.obs with cell type or cluster information.

integration_type Literal['embedding', 'knn', 'full'] (default: 'knn')

Type of data integration. If set to knn, the neighborhood graph already present in the object is used. If set to embedding, the neighborhood graph is recomputed from the representation given by use_rep. If set to full, PCA is recomputed and the PCA embedding is used to build the neighborhood graph.

use_rep str (default: None)

Representation to use to compute neighborhood when integration_type is set to embedding.

scale bool (default: True)

Whether to rescale the output values to lie between 0 and 1.

get_data bool (default: True)

If set to True, a pd.DataFrame with the kBET observed rejection rate per cluster is returned.

Return type:

float | DataFrame

Returns:

Returns the kBET score (average of kBET per label) based on the observed rejection rate. If get_data is set to True, it returns a pd.DataFrame with the kBET observed rejection rate per cluster.

Examples

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> do.bm.kbet(adata, "batch", "annotation")  # Estimation of score per cell type
     cluster      kBET
0    B_cells  1.000000
1  Monocytes  1.000000
2         NK  1.000000
3    T_cells  0.323617
4        pDC  1.000000
>>> do.bm.kbet(adata, "batch", "annotation", get_data=False)  # Estimation of the overall score
np.float64(0.13540425531914901)
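
The three documented integration_type values can be compared side by side with a small helper. This is only a sketch: kbet_by_integration_type is a hypothetical name, and the default use_rep="X_pca" assumes the object stores an embedding under that key in adata.obsm, which the documentation above does not state.

```python
def kbet_by_integration_type(kbet_fn, adata, use_rep="X_pca"):
    """Call a kbet-style function once per documented integration_type.

    kbet_fn is expected to accept the keyword arguments documented for
    dotools_py.bm.kbet. use_rep="X_pca" is an assumed key, not a default
    of the library.
    """
    common = dict(batch_key="batch", annotation_key="annotation", get_data=False)
    return {
        # Use the neighborhood graph already stored in the object.
        "knn": kbet_fn(adata, integration_type="knn", **common),
        # Recompute the neighborhood graph from the given representation.
        "embedding": kbet_fn(
            adata, integration_type="embedding", use_rep=use_rep, **common
        ),
        # Recompute PCA and build the neighborhood graph from it.
        "full": kbet_fn(adata, integration_type="full", **common),
    }


# Intended usage (requires dotools_py and its example data):
# import dotools_py as do
# adata = do.dt.example_10x_processed()
# scores = kbet_by_integration_type(do.bm.kbet, adata)
```

Passing the function as an argument keeps the helper independent of the installed package; with get_data=False each entry of the returned dictionary is a single score.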