dotools_py.bm.silhouette_batch

dotools_py.bm.silhouette_batch(adata, batch_key, annotation_key, use_rep, metric='euclidean', scale=True, get_all=False)

Batch ASW.

Modified average silhouette width (ASW) of batch. This metric measures the silhouette of a given batch. It assumes that a silhouette width close to 0 represents perfect overlap of the batches, so the absolute value of the silhouette width is used to measure how well batches are mixed. Scores are computed per group, where a group is a category in annotation_key. If scale is set to True, the absolute ASW per group is subtracted from 1 before averaging, so that 0 indicates poor batch mixing and 1 indicates optimal batch mixing.
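The computation described above can be sketched with scikit-learn. This is a hypothetical re-implementation written from the description, not the dotools_py source: the function name `batch_asw_sketch`, the use of batch labels for per-cell silhouettes within each group, and the rule for skipping groups that contain a single batch are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import silhouette_samples


def batch_asw_sketch(embedding, batch, groups, metric="euclidean", scale=True):
    """Hypothetical batch ASW sketch; not the dotools_py implementation."""
    embedding = np.asarray(embedding)
    batch = np.asarray(batch)
    groups = np.asarray(groups)

    per_group = {}
    for group in np.unique(groups):
        mask = groups == group
        n_batches = len(np.unique(batch[mask]))
        # Assumption: skip groups where the silhouette is undefined
        # (one batch only, or as many batches as cells).
        if n_batches < 2 or n_batches >= mask.sum():
            continue
        # Per-cell silhouette within this group, using batches as labels
        sil = np.abs(silhouette_samples(embedding[mask], batch[mask], metric=metric))
        if scale:
            # Values near 1 now mean that batches overlap well
            sil = 1 - sil
        per_group[group] = sil.mean()

    per_group = pd.Series(per_group, name="silhouette_score")
    # Overall score: unweighted mean over groups
    return float(per_group.mean()), per_group
```

With `scale=True`, well-mixed batches give a score close to 1 and fully separated batches give a score close to 0.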

Parameters:
adata AnnData

Annotated data matrix.

batch_key str

Column in adata.obs with batch information.

annotation_key str

Column in adata.obs with cell type or cluster information.

use_rep str

Key in adata.obsm that holds the embedding to evaluate (e.g. "X_CCA").

metric str (default: 'euclidean')

Distance metric used to compute the silhouette widths. See sklearn.metrics.silhouette_score for accepted values.

scale bool (default: True)

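If set to True, subtract the absolute silhouette width from 1 so that scores lie between 0 and 1, with higher values indicating better batch mixing.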

get_all bool (default: False)

If set to True, return the overall silhouette score, the average silhouette score per cluster, and all individual silhouette scores.

Return type:

float | tuple

Returns:

If get_all is set to True, a tuple of (1) the average silhouette width (ASW), (2) the average silhouette score per cluster, and (3) all silhouette scores. Otherwise, only the ASW.

Examples

>>> import dotools_py as do
>>> adata = do.dt.example_10x_processed()
>>> do.bm.silhouette_batch(adata, batch_key="batch", annotation_key="annotation", use_rep="X_CCA")
np.float64(0.8107897347900055)
>>> score, mean_score, all_scores = do.bm.silhouette_batch(adata, batch_key="batch", annotation_key="annotation", use_rep="X_CCA", get_all=True)
>>> mean_score
               silhouette_score
group
B_cells            0.795807
Monocytes          0.603867
NK                 0.878482
T_cells            0.961296
pDC                0.814496
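In this example, the overall score equals the unweighted mean of the per-cluster scores. The snippet below rebuilds the printed table from the values shown above (rounded to six decimals, so the match holds only up to rounding), checks that relationship, and lists the cell type with the least batch mixing. It uses only pandas and does not call dotools_py.

```python
import pandas as pd

# Per-cluster scores copied from the printed `mean_score` table above
mean_score = pd.DataFrame(
    {"silhouette_score": [0.795807, 0.603867, 0.878482, 0.961296, 0.814496]},
    index=pd.Index(["B_cells", "Monocytes", "NK", "T_cells", "pDC"], name="group"),
)

# The overall batch ASW matches the unweighted mean over groups (up to rounding)
overall = mean_score["silhouette_score"].mean()
print(round(overall, 6))  # 0.81079

# The group with the lowest score is the one where batches are least mixed
print(mean_score.sort_values("silhouette_score").index[0])  # Monocytes
```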