Usage of functions#

In this tutorial we compile examples of how to use the different functions available in the package. Refer to the previous tutorials to see how to run CellBender and how to perform quality control and basic downstream analysis of sc/snRNA-seq data. Here we use the object generated in the Quality control of sc/snRNA-seq tutorial.

Environment setup#

# Set-up
import anndata as ad
import dotools_py as do
import session_info


adata = ad.read_h5ad("/Users/david/Downloads/Data10x/adata.h5ad")
adata
2025-10-22 17:13:17,278 - Jupyter enviroment detected. Using "inline" backend
AnnData object with n_obs × n_vars = 2783 × 18517
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score', 'leiden', 'autoAnnot', 'celltypist_conf_score', 'annotation', 'annotation_recluster'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'annotation_recluster_colors', 'hvg', 'leiden', 'log1p', 'neighbors', 'umap'
    obsm: 'X_CCA', 'X_pca', 'X_umap'
    layers: 'counts', 'logcounts'
    obsp: 'connectivities', 'distances'

Dotplot#

The dotplot function builds on scanpy's dotplot, but it also allows three variables to be visualised at the same time. For example, we can visualise the expression of a gene across cell types and conditions.

do.pl.dotplot(adata, x_axis="condition", features="NKG7", y_axis="annotation_recluster", figsize=(3, 4))
../_images/7ece238a6ffa0a93819b72abb9185ede90b274d479a892a425cda47417855b5b.png

We can also perform differential testing between groups. Here, add_stats="x_axis" requests a test between the groups on the x-axis (the conditions).

do.pl.dotplot(adata, x_axis="condition", features="NKG7", y_axis="annotation_recluster", figsize=(3, 4), add_stats="x_axis")
2025-10-22 16:45:41,616 - Error while testing: division by zero
../_images/ddb3cd908063fb1be7b8fb449237fe464fdac9557fbf9957d821020cd50e1cb7.png
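For intuition, a dotplot typically encodes two summary statistics per group: the mean expression (colour) and the fraction of cells expressing the gene (dot size). The following is a minimal sketch of how such a summary could be computed with pandas on a toy table. The data are made up and the column names `mean_expr` and `frac_expressing` are illustrative assumptions, not part of the dotools_py API.

```python
import numpy as np
import pandas as pd

# Toy per-cell table (values are made up for illustration only).
rng = np.random.default_rng(0)
cells = pd.DataFrame(
    {
        "condition": np.repeat(["healthy", "disease"], 50),
        "annotation": np.tile(["NK", "T_cells"], 50),
        "NKG7": rng.poisson(1.0, size=100).astype(float),
    }
)

# One row per (cell type, condition) group:
#   mean_expr       -> what the dot colour would show
#   frac_expressing -> what the dot size would show
grouped = cells.groupby(["annotation", "condition"])["NKG7"]
summary = pd.DataFrame(
    {
        "mean_expr": grouped.mean(),
        "frac_expressing": grouped.apply(lambda x: (x > 0).mean()),
    }
)
print(summary)
```

Each row of `summary` corresponds to one dot in a plot with conditions on the x-axis and cell types on the y-axis.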

Heatmap#

The heatmap function visualises the mean expression of genes across different groups. We can also test for significance and identify, for each gene, the group with the highest expression compared to the rest.

do.pl.heatmap(
    adata,
    group_by="annotation_recluster",
    features=["CD79A", "NKG7", "CD3D", "CD8A", "CD14"],
    add_stats=True,
    xticks_rotation=45,
)
../_images/8f6af61fb9e44139695e2180f3f0ba2c0cc3636446b9320a61d399e7d06412fe.png
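As a rough illustration of the matrix behind such a heatmap, the sketch below computes the mean expression per group with pandas and reports the group with the highest mean for each gene. The expression values are invented, and this does not reproduce the significance test performed by the package.

```python
import pandas as pd

# Toy log-normalised expression values (made up): one row per cell.
expr = pd.DataFrame(
    {
        "CD79A": [2.0, 1.8, 0.1, 0.0, 0.2, 0.1],
        "NKG7": [0.1, 0.0, 2.5, 2.2, 0.3, 0.4],
        "CD3D": [0.0, 0.2, 0.4, 0.3, 1.9, 2.1],
    }
)
groups = pd.Series(
    ["B_cells", "B_cells", "NK", "NK", "T_cells", "T_cells"], name="annotation"
)

# Mean expression per group: rows are groups, columns are genes.
mean_expr = expr.groupby(groups).mean()

# For each gene, the group with the highest mean expression.
top_group = mean_expr.idxmax(axis=0)

print(mean_expr.round(2))
print(top_group)
```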

UMAP#

With do.pl.umap() we can visualise gene expression as well as metadata on the UMAP embedding, optionally split by a categorical variable such as condition.

do.pl.umap(adata, "NKG7", split_by="condition", share_legend=True, size=20, labels="annotation_recluster", figsize=(12, 6))
../_images/5b35bcd66731dd0f10310888dd49b4338a4f9d40895af49c8b6bfc526a2a0056.png

Split embeddings#

If we have categorical metadata, we can also visualise it by highlighting one category in each subplot.

do.pl.split_embeddding(adata, split_by="annotation_recluster", ncols=4, figsize=(12, 6))
../_images/490f28e94217952b0ed073060095c183cf86f4b8523249e201028491bab7fb36.png
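The idea of highlighting one category per panel can be sketched with plain matplotlib: every panel draws all cells in grey and overlays the cells of a single category in colour. This is only an illustration on random coordinates, assuming a 2D embedding array like `adata.obsm["X_umap"]`. It is not the package's implementation.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Random 2D coordinates standing in for a UMAP embedding (made-up data).
rng = np.random.default_rng(0)
categories = ["B_cells", "NK", "T_cells"]
labels = np.repeat(categories, 100)
coords = rng.normal(size=(len(labels), 2))

fig, axes = plt.subplots(1, len(categories), figsize=(12, 4), sharex=True, sharey=True)
for ax, category in zip(axes, categories):
    in_category = labels == category
    # Background: all other cells in grey.
    ax.scatter(coords[~in_category, 0], coords[~in_category, 1], s=5, color="lightgrey")
    # Foreground: cells of the highlighted category.
    ax.scatter(coords[in_category, 0], coords[in_category, 1], s=5, color="tab:blue")
    ax.set_title(category)
    ax.set_xticks([])
    ax.set_yticks([])

fig.tight_layout()
```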

Changes in cell proportion#

As shown in the previous tutorial, we can also test for significant changes in cell-population proportions between conditions.

do.pl.cell_composition(
    adata, "annotation_recluster", "condition", "batch", condition_order=["healthy", "disease"], transform="arcsin"
)
[INFO] Your data doesn't have replicates! Artificial replicates will be simulated to run scanpro.
[INFO] Simulation may take some minutes...
[INFO] Generating 3 replicates and running 100 simulations...
[INFO] Finished 100 simulations in 1.09 seconds
2025-10-22 16:46:36,057 - There are 5 populations with a significant change
../_images/30ce2392a9099f0d3a02941c9d74e0755c6cdfa25c4471f72f9e35eed2ca5a4e.png
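To make the transformation step concrete, here is a small sketch that computes cell-type proportions per sample and applies an arcsine square-root transform. It assumes that transform="arcsin" refers to this variance-stabilising transform, which is commonly applied to proportions. The sample names and counts are invented, and the statistical test itself is not reproduced.

```python
import math
from collections import Counter

# Toy cell-type labels per sample (made up for illustration).
samples = {
    "healthy_1": ["NK"] * 10 + ["B_cells"] * 30 + ["T_cells"] * 60,
    "disease_1": ["NK"] * 30 + ["B_cells"] * 20 + ["T_cells"] * 50,
}


def proportions(labels):
    """Fraction of cells belonging to each cell type in one sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def arcsin_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


props = {name: proportions(labels) for name, labels in samples.items()}
transformed = {
    name: {cell_type: arcsin_sqrt(p) for cell_type, p in sample_props.items()}
    for name, sample_props in props.items()
}

for name in samples:
    print(name, props[name], transformed[name])
```

The transformed values, rather than the raw proportions, would then be compared between conditions.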

Expression of genes and continuous metadata#

We can visualise the expression of a gene in a cell type, or continuous metadata, across conditions with bar plots, violin plots, box plots and line plots. Additionally, we can test for significance between a reference group and one or more other groups.

nk = adata[adata.obs.annotation == "NK"]
do.pl.violinplot(nk, feature="NKG7", x_axis="condition", reference="healthy", groups="disease", figsize=(5, 6))
../_images/7da1e97ec85a3800c9d59b1ec506999e9957e873cddeedd229316eff7293748f.png
do.pl.barplot(nk, feature="NKG7", x_axis="condition", reference="healthy", groups="disease", figsize=(5, 6))
../_images/b2714a316edb44b3224d80544c9827e79e5a6145ab0031e8cb4e4c9ca78f8066.png
do.pl.boxplot(nk, feature="NKG7", x_axis="condition", reference="healthy", groups="disease", figsize=(5, 6))
../_images/7a53389f8d2a0fbc7553b603a8a384e1f88d80fd0e970d01fb3abb304a383dd7.png
do.pl.boxplot(
    adata,
    "annotation",
    "total_counts",
    reference="B_cells",
    groups=["Monocytes", "NK", "T_cells", "pDC"],
    figsize=(6, 8),
    ylabel="Total counts per cell",
)
../_images/e4d9304c809c91cc5c41603b6dfe176d0bde982360ca7b0c4cbe29bca1c8cd26.png
do.pl.lineplot(
    adata,
    x_axis="condition",
    features="NKG7",
    hue="annotation",
    xticks_order=["healthy", "disease"],
)
{'mainplot_ax': <Axes: title={'center': 'NKG7'}, ylabel='LogMean(nUMI)'>,
 'legend_ax': <Axes: >}
../_images/8d06e7129d4aff6b17a7edcb1c7b044b852b67142c943808f07fbeb79deebe49.png

Correlation between conditions#

We can also visualise the overall correlation between the groups of a categorical metadata column, such as sample or condition.

do.pl.correlation(
    adata,
    group_by="batch",
    method="pearson",
    mask="upper",  # Hide the upper triangle
    mode="letters",  # alternative use colors
)
../_images/67f3e33bba283cb10550e5ad0649c69c17964a1e13bb0522ed7d44b2b614d14a.png
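One plausible way to obtain such a group-by-group correlation matrix is to average expression per group and correlate the resulting profiles. The sketch below does this with pandas on random data and hides the upper triangle. It is an assumption about the general approach, not a description of how do.pl.correlation is implemented.

```python
import numpy as np
import pandas as pd

# Random expression matrix standing in for real data: rows are cells, columns are genes.
rng = np.random.default_rng(1)
n_cells, n_genes = 90, 20
expr = pd.DataFrame(rng.gamma(2.0, 1.0, size=(n_cells, n_genes)))
batch = pd.Series(np.repeat(["batch1", "batch2", "batch3"], 30), name="batch")

# One mean-expression profile per batch (rows: batches, columns: genes).
profiles = expr.groupby(batch).mean()

# Pearson correlation between the batch profiles (3 x 3 matrix).
corr = profiles.T.corr(method="pearson")

# Hide the upper triangle (strictly above the diagonal) by setting it to NaN.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
masked = corr.mask(upper)

print(masked.round(3))
```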
session_info.show(na=False, cpu=True, excludes=["backports"], std_lib=True, dependencies=True, html=True)
-----
anndata             0.11.4
dotools_py          0.0.1
session_info        v1.0.1
sys                 3.10.16 (main, Dec 11 2024, 10:22:29) [Clang 14.0.6 ]
-----
Modules imported as dependencies:
Cython                      3.0.12
PIL                         11.2.1
adjustText                  1.3.0
appnope                     0.1.2
argparse                    1.1
arrow                       1.3.0
attr                        25.3.0
attrs                       25.3.0
babel                       2.16.0
bbknn                       1.6.0
brotli                      1.0.9
celltypist                  1.6.3
certifi                     2025.04.26
cffi                        1.17.1
chardet                     4.0.0
charset_normalizer          3.3.2
cloudpickle                 3.1.1
comm                        0.2.1
csv                         1.0
ctypes                      1.1.0
cycler                      0.12.1
cython                      3.0.12
dask                        2024.11.2
dateutil                    2.9.0.post0
debugpy                     1.8.11
decimal                     1.70
decorator                   5.1.1
defusedxml                  0.7.1
distutils                   3.10.16
doubletdetection            4.3
exceptiongroup              1.2.0
executing                   0.8.3
formulaic                   1.1.1
formulaic_contrasts         1.0.0
gseapy                      1.1.8
h5py                        3.13.0
idna                        3.7
igraph                      0.11.8
interface_meta              1.3.0
ipaddress                   1.0
ipykernel                   6.29.5
ipywidgets                  8.1.7
jedi                        0.19.2
jinja2                      3.1.6
joblib                      1.4.2
json                        2.0.9
json5                       0.9.25
jsonpointer                 2.1
jsonschema                  4.23.0
jupyter_events              0.12.0
jupyter_server              2.15.0
jupyterlab_server           2.27.3
kiwisolver                  1.4.8
leidenalg                   0.10.2
llvmlite                    0.44.0
logging                     0.5.1.2
markupsafe                  3.0.2
marshal                     4
matplotlib                  3.10.0
matplotlib_inline           0.1.6
more_itertools              10.3.0
msgpack                     1.1.0
natsort                     8.4.0
nbformat                    5.10.4
numba                       0.61.0
numcodecs                   0.13.1
numpy                       1.26.4
packaging                   24.2
pandas                      2.2.3
parso                       0.8.4
patsy                       1.0.1
phenograph                  1.5.7
platform                    1.0.8
platformdirs                4.3.7
pluggy                      1.5.0
polars                      1.30.0
prompt_toolkit              3.0.43
psutil                      5.9.0
pure_eval                   0.2.2
pyarrow                     19.0.1
pycparser                   2.21
pydeseq2                    0.5.1
pygments                    2.19.1
pynndescent                 0.5.13
pyparsing                   3.2.3
pytz                        2025.2
re                          2.2.1
requests                    2.32.3
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
scanpy                      1.11.1
scipy                       1.15.2
seaborn                     0.13.2
setuptools                  75.8.0
simplejson                  3.20.1
six                         1.17.0
sklearn                     1.5.2
sniffio                     1.3.0
socketserver                0.4
socks                       1.7.1
sparse                      0.16.0
sqlite3                     2.6.0
stack_data                  0.2.0
statsmodels                 0.14.4
stdlib_list                 0.11.1
tarfile                     0.9.0
tblib                       3.1.0
texttable                   1.7.0
threadpoolctl               3.6.0
tlz                         1.0.0
tomli                       2.0.1
toolz                       1.0.0
torch                       2.6.0
tornado                     6.5.1
tqdm                        4.67.1
traitlets                   5.14.3
umap                        0.5.7
urllib3                     2.3.0
wcwidth                     0.2.5
websocket                   1.8.0
wrapt                       1.17.2
yaml                        6.0.2
zarr                        2.18.3
zlib                        1.0
zmq                         26.2.0
zstandard                   0.23.0
-----
IPython             8.30.0
jupyter_client      8.6.3
jupyter_core        5.7.2
jupyterlab          4.3.4
notebook            7.3.2
-----
Python 3.10.16 (main, Dec 11 2024, 10:22:29) [Clang 14.0.6 ]
macOS-15.5-arm64-arm-64bit
10 logical CPU cores, arm
-----
Session information updated at 2025-07-08 17:11