Quality control of sc/snRNA-seq#

To perform quality control of single-cell or single-nucleus RNA sequencing (sc/snRNA-seq) data, we use the dotools_py.pp.importer_py() function, which bundles several methods for processing samples. We need to provide a list of H5 files generated by a mapping tool such as CellRanger or STARsolo. If CellBender has been run, we can also provide the paths to the H5 files it generated. Additionally, we need to provide the batch names for the samples; further metadata can be supplied as a dictionary.

The quality control involves filtering genes and cells:

  • Genes: are removed based on their expression levels. Genes expressed in only a few cells are excluded. By default, a gene is excluded if it is expressed in fewer than 5 cells.

  • Cells: are removed based on different parameters, including: number of genes, mitochondrial content, doublets and number of UMI counts.

    • Mitochondrial content: cells with high mitochondrial content are excluded. We recommend a maximum of 5% for scRNA-seq and 3% for snRNA-seq.

    • Doublets: we implemented three different approaches for the identification of neotypic doublets (i.e., doublets originating from the combination of two or more different cell types). The available implementations are scDblFinder, DoubletDetection and Scrublet.

    • Number of genes: cells are removed by absolute number of genes. A lower and upper threshold can be set.

    • Number of UMI counts: cells can be removed using two approaches: absolute or quantiles. A lower and upper threshold can be set and a combination of both approaches can be used (e.g., an absolute lower threshold and filter cells on the upper quantile).
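The filters listed above can be sketched on a toy count matrix. This is a minimal illustration with NumPy, not the implementation used by importer_py(): the thresholds, the variable names and the choice of which genes count as mitochondrial are all assumptions made for the example, and the filters are applied in one pass rather than step by step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy count matrix (cells x genes); the last 5 genes stand in for mitochondrial genes.
n_cells, n_genes = 200, 100
counts = rng.poisson(1.0, size=(n_cells, n_genes))
is_mt = np.zeros(n_genes, dtype=bool)
is_mt[-5:] = True

# Per-cell QC metrics.
n_genes_per_cell = (counts > 0).sum(axis=1)
total_umi = counts.sum(axis=1)
pct_mt = 100 * counts[:, is_mt].sum(axis=1) / np.maximum(total_umi, 1)

# Hypothetical thresholds, scaled down to suit the toy data.
min_genes, max_pct_mt = 50, 8.0
min_umi, high_quantile = 80, 95

keep_cells = (
    (n_genes_per_cell >= min_genes)
    & (pct_mt <= max_pct_mt)
    & (total_umi >= min_umi)  # absolute lower threshold
    & (total_umi <= np.percentile(total_umi, high_quantile))  # quantile upper threshold
)

# Gene filter: keep genes detected in at least `min_cells` of the retained cells.
min_cells = 5
filtered = counts[keep_cells]
keep_genes = (filtered > 0).sum(axis=0) >= min_cells
filtered = filtered[:, keep_genes]

print(counts.shape, "->", filtered.shape)
```

Combining an absolute lower bound with a quantile upper bound, as in the last two conditions, mirrors the mixed approach described for UMI counts.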

After the per-sample quality control, the individual samples are combined into one AnnData object; the data are then log-normalised and scaled, and highly variable genes are identified. To evaluate the quality control, the distributions of total UMI counts, number of genes and mitochondrial content per cell are plotted as violin plots before and after filtering. These plots are saved in the folders containing the H5 files. Additionally, we keep track of the number of cells and genes removed in each quality control step.
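As a reminder of what log-normalisation does, here is a minimal NumPy sketch. It assumes the common scheme of scaling each cell to a fixed total and applying log1p; the scaling factor of 10,000 and the variable names are assumptions for illustration, and the exact procedure inside importer_py() may differ.

```python
import numpy as np

# Toy raw counts (3 cells x 3 genes).
counts = np.array(
    [
        [10.0, 0.0, 90.0],
        [1.0, 1.0, 2.0],
        [0.0, 5.0, 5.0],
    ]
)

target_sum = 10_000  # assumed scaling factor (counts per 10,000)

# Scale every cell to the same total, then apply log(1 + x).
size_factors = counts.sum(axis=1, keepdims=True)
logcounts = np.log1p(counts / size_factors * target_sum)

print(logcounts.round(2))
```

After this step, cells with very different sequencing depth become comparable, which is why the raw counts and the log-normalised values are usually kept in separate layers.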

First, we set up the environment and load the required libraries.

Environment setup#

import dotools_py as do
import pandas as pd
from IPython.display import display, SVG
import session_info
2025-10-22 16:16:29,365 - Jupyter enviroment detected. Using "inline" backend

To show how the quality control works, we are going to use a public 10x Genomics dataset of human blood from healthy donors and donors with a malignant tumor. We get the raw and the filtered H5 files generated by 10x.

do.dt.example_10x("/Users/david/Downloads/PublicData10x")
2025-10-22 16:16:29,452 - Downloading data to /Users/david/Downloads/PublicData10x

Sequential preprocessing#

paths = [
    "/Users/david/Downloads/PublicData10x/healthy/outs/filtered_feature_bc_matrix.h5",
    "/Users/david/Downloads/PublicData10x/disease/outs/filtered_feature_bc_matrix.h5",
]

adata = do.pp.importer_py(
    paths=paths,
    ids=["batch1", "batch2"],
    metadata={"condition": ["healthy", "disease"]},  # Additional metadata information
    batch_key="batch",  # Column in obs to save batch information
    remove_doublets=True,
    doublet_tool="scDblFinder",  # Tool to identify neotypic doublets (Also available Scrublet and DoubletDetection)
    min_genes_in_cell=300,
    min_cells_with_genes=5,
    cut_mt=5,
    n_reads=10_000,
    min_counts=500,  # Filter cells with fewer than 500 UMI counts
    high_quantile=95,  # Filter cells with the top 5% high number of UMI counts
)
2025-10-22 16:16:33,139 - Reading batch1
2025-10-22 16:16:33,896 - Remove Cells with low number of genes
2025-10-22 16:16:33,943 - Remove Genes lowly expressed
2025-10-22 16:16:34,007 - Remove cells with high MT-content
2025-10-22 16:16:34,025 - Remove cells based on nUMI counts
2025-10-22 16:16:34,340 - Finding Neotypic doublets
2025-10-22 16:16:34,409 - Running scDblFinder
2025-10-22 16:16:51,987 - Remove 86 doublets
2025-10-22 16:16:52,346 - Reading batch2
2025-10-22 16:16:52,840 - Remove Cells with low number of genes
2025-10-22 16:16:52,878 - Remove Genes lowly expressed
2025-10-22 16:16:52,932 - Remove cells with high MT-content
2025-10-22 16:16:52,937 - Remove cells based on nUMI counts
2025-10-22 16:16:52,944 - Finding Neotypic doublets
2025-10-22 16:16:52,978 - Running scDblFinder
2025-10-22 16:17:09,388 - Remove 26 doublets
2025-10-22 16:17:09,571 - Concatenating samples
2025-10-22 16:17:09,653 - Normalisation of the expression
2025-10-22 16:17:09,675 - Finding Highly Variable Genes shared across samples
2025-10-22 16:17:09,970 - Run PCA
adata
AnnData object with n_obs × n_vars = 2783 × 18517
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg'
    obsm: 'X_pca'
    layers: 'counts', 'logcounts'

Evaluation of the preprocessing#

We can now check the quality control plots that were generated:

files = [
    "/Users/david/Downloads/PublicData10x/healthy/outs/Vln_PreQC_batch1.svg",
    "/Users/david/Downloads/PublicData10x/healthy/outs/Vln_PostQC_batch1.svg",
    "/Users/david/Downloads/PublicData10x/healthy/outs/251022_QC_Metricsbatch1.svg",
]

display(
    SVG(files[0]),
    SVG(files[1]),
    SVG(files[2]),
)
../_images/e2b65c676cb011f6df16f85c8557778a717072528f4b12c4b33b5ee75c64567d.svg ../_images/39b52374feddf6c519a1cf4c836f34ce840fddd61909f52224b6a19a05ee4b73.svg ../_images/4e4919ddf776fe402ed2a2823ef775296457012a44715369ae31f0e48d835033.svg

We can observe that the majority of cells were removed due to high mitochondrial content. Depending on the experimental set-up, we might want to increase the mitochondrial content threshold if we do not want to lose too many cells. Besides these plots, we also have an Excel sheet that keeps track of the thresholds used during the quality control.

table = pd.read_excel("/Users/david/Downloads/PublicData10x/healthy/outs/251022_Metrics_batch1.xlsx")
table
   QC_Step               nCells  nFeatures  Comments
0  Input_Shape             7865      33538  NaN
1  Rm_Cells_lowGenes       7851      33538  Remove cells with <300 genes
2  Rm_Genes_lowCells       7851      16844  Remove genes express in less than 5 cells
3  Rm_Cell_HighMT          2254      16844  Remove cells with >5% of Mitochondrial genes
4  Rm_Cells_nUMI_nGenes    2141      16844  Remove cells based on nUMI counts[Absolute (Mi...
5  Rm_Doublets             2055      16844  Remove neotypic doublets using scDblFinder
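To see at a glance which step costs the most cells, the per-step losses can be derived from the nCells column. The sketch below re-types the values from the table above into a small DataFrame so that it runs on its own; the column names cells_removed and pct_of_input are introduced here for illustration only.

```python
import pandas as pd

# Values copied from the QC metrics table for batch1.
qc = pd.DataFrame(
    {
        "QC_Step": [
            "Input_Shape",
            "Rm_Cells_lowGenes",
            "Rm_Genes_lowCells",
            "Rm_Cell_HighMT",
            "Rm_Cells_nUMI_nGenes",
            "Rm_Doublets",
        ],
        "nCells": [7865, 7851, 7851, 2254, 2141, 2055],
        "nFeatures": [33538, 33538, 16844, 16844, 16844, 16844],
    }
)

# Cells lost at each step, and the loss as a percentage of the input cells.
qc["cells_removed"] = -qc["nCells"].diff().fillna(0).astype(int)
qc["pct_of_input"] = (100 * qc["cells_removed"] / qc.loc[0, "nCells"]).round(1)

print(qc[["QC_Step", "cells_removed", "pct_of_input"]])
```

The mitochondrial filter removes 5597 of the 7865 input cells, about 71%, which is the loss referred to in the text.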

Integration and clustering#

After the quality control, we can proceed to batch correction and integration of the samples. For this, we can use different batch correction methods: Harmony, Scanorama, BBKNN, scVI or CCA from Seurat (v4 or v5 approach). After the integration of the samples, we run the Leiden algorithm to find clusters and generate the UMAP embeddings for visualisation.

do.tl.integrate_data(
    adata,
    batch_key="batch",
    hvg_batch=True,
    cca5=True,
    resolution=0.3,  # Resolution for leiden algorithm
)
2025-10-22 16:17:10,416 - Computing HVGs
2025-10-22 16:17:11,051 - Integration using CCA (Seurat v5 approach)
2025-10-22 16:17:11,053 - Preprocessing to export to Seurat
2025-10-22 16:17:11,069 - Running CCA Integration
                          integratedcca_1 integratedcca_2 integratedcca_3
AAACCCAGTGCATTTG-1-batch1      -29.240194       0.4496809        4.260958
AAACCCATCTCAACGA-1-batch1        4.484761      -1.2218207        4.033311
AAACCCATCTCTCGAC-1-batch1        4.340738      -1.4006172        3.692713
2025-10-22 16:17:31,545 - Loading corrected matrix
2025-10-22 16:17:31,577 - Finding neighbors
2025-10-22 16:17:33,748 - Run UMAP
2025-10-22 16:17:36,709 - Clustering cells using Leiden (resolution 0.3)

After the integration, adata.obsm contains X_CCA, the CCA matrix after dimensionality reduction. In contrast to the Seurat v4 approach, where this matrix has dimensions n_cells x n_hvg, here it has dimensions n_cells x 50.

adata.obsm["X_CCA"].shape
(2783, 50)

Evaluation of integration#

We can now visualise the integrated object and the identified clusters:

do.pl.split_embeddding(adata, "batch", figsize=(8, 5))
do.pl.umap(adata, "leiden", labels="leiden", figsize=(6, 5))
../_images/f7a955ff90489965a171f3a60480025b274b42399193966be7764d5ddf032927.png ../_images/68404bc91fe8c4237c46a7087d7dd4647a1788988a16eba551de7b9f67d679cb.png
adata.write("/Users/david/Downloads/PublicData10x/adata.h5ad")

Semi-automatic annotation with CellTypist#

We can also perform a semi-automatic annotation using CellTypist. In this case, we use the Healthy_COVID19_PBMC.pkl model.

do.tl.auto_annot(adata, "leiden", model="Healthy_COVID19_PBMC.pkl", convert=False, pl_cell_prob=True)
../_images/759b98bbb356ea788ac3eeef49ba3a837ea076bfdb98978d5a95abf8b56d755b.png
do.pl.umap(adata, "leiden", labels="autoAnnot")
../_images/0a0a0fb11408492ba00270f0b35a12dafbe2d2de63d9afaae946f9aa26b2aec8.png

Besides the semi-automatic annotation, we should also validate the findings with known markers for these cell types.

markers = {
    "ImmuneCells": ["PTPRC"],
    "B_cells": ["CD79A", "BANK1", "MS4A1"],
    "T_cells": ["CD3E", "CD4", "IL7R"],
    "NK": ["NKG7", "KLRD1"],
    "Myeloid": ["CD68", "CD14", "ITGAM"],
    "pDC": ["LILRA4", "CLEC4C", "LRRC26"],
}

do.pl.dotplot(adata, "leiden", markers, swap_axes=False, var_group_rotation=90)
../_images/56dbe49938a2f9672ef808871ce897f7190a228857968070516bba03f44d0cf7.png
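For intuition, a dot plot typically encodes two quantities per gene and cluster: the mean expression and the fraction of cells expressing the gene. The following pandas sketch computes both on a tiny, made-up expression table; the values and cluster labels are invented for the example and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical log-normalised expression for 6 cells and 2 marker genes.
expr = pd.DataFrame(
    {
        "CD79A": [0.0, 0.0, 2.1, 1.8, 0.0, 0.0],
        "CD3E": [1.5, 2.0, 0.0, 0.0, 1.2, 0.0],
    }
)
clusters = pd.Series(["1", "1", "4", "4", "1", "4"], name="leiden")

# Mean expression per cluster (usually mapped to dot colour).
mean_expr = expr.groupby(clusters).mean()

# Fraction of cells with non-zero expression per cluster (usually mapped to dot size).
frac_expr = (expr > 0).groupby(clusters).mean()

print(mean_expr)
print(frac_expr)
```

In the actual dot plot above, the same two quantities are shown for every marker and Leiden cluster.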

Overall, the marker expression agrees with the semi-automatic annotation, so we can go ahead and assign cell-type labels to the clusters.

adata.obs["annotation"] = adata.obs.leiden.map(
    {"0": "Monocytes", "1": "T_cells", "2": "T_cells", "3": "NK", "4": "B_cells", "5": "pDC"}
)
do.pl.umap(adata, "annotation", labels="annotation")
../_images/a7faabe16416f6bf44047861beeecb02e0dcc226a0c56295410e038db0367f49.png

Evaluate changes in cell population#

After annotating the cell-type populations, we can evaluate whether their proportions change significantly between the healthy and disease conditions using scanpro.

do.pl.cell_composition(
    adata,
    annot_key="annotation",
    cond_key="condition",
    batch_key="batch",
    transform="arcsin",  # Produce more accurate results for simulated data
    condition_order=["healthy", "disease"],
)
[INFO] Your data doesn't have replicates! Artificial replicates will be simulated to run scanpro.
[INFO] Simulation may take some minutes...
[INFO] Generating 3 replicates and running 100 simulations...
[INFO] Finished 100 simulations in 0.99 seconds
2025-10-22 16:17:55,491 - There are 3 populations with a significant change
../_images/24e80801a7890d4246fc682628cc0889e75a76a29c08b067e26fbb9777f7f299.png

Cell populations with a significant change are connected by dashed lines, and the p-value is indicated in the legend. In this case, we see a significant change in B cells, Monocytes and NK cells.
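The quantity being compared is the proportion of each cell type per condition. The sketch below computes these proportions with pandas on invented labels and applies an arcsine square-root transform; reading transform="arcsin" as this variance-stabilising transform is an assumption about scanpro's internals, and the statistical test itself is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical per-cell metadata (10 cells).
obs = pd.DataFrame(
    {
        "condition": ["healthy"] * 6 + ["disease"] * 4,
        "annotation": [
            "T_cells", "T_cells", "T_cells", "B_cells", "NK", "NK",
            "T_cells", "B_cells", "B_cells", "NK",
        ],
    }
)

# Proportion of each cell type within each condition (rows sum to 1).
props = pd.crosstab(obs["condition"], obs["annotation"], normalize="index")

# Arcsine square-root transform of the proportions.
transformed = np.arcsin(np.sqrt(props))

print(props.round(3))
print(transformed.round(3))
```

With only one sample per condition, as in this tutorial, there is no replicate variance to estimate, which is why scanpro reports that it simulates artificial replicates.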

Reclustering of a cell population#

If we are interested in specific states for a cell-type, we can also perform re-clustering. In this case, we are going to focus on the biggest cluster, the T cells.

tcell = do.tl.reclustering(
    adata,
    cluster_key="annotation",  # Metadata column with clusters
    batch_key="batch",  # Metadata column with batch information
    recluster_apporach="cca5",  # Integration approach used
    use_clusters=["T_cells"],  # Cluster we want to re-cluster
    use_rep="X_CCA",  # representation to use
    get_subset=True,  # Get AnnData of T_cells re-clusters
    resolution=0.6,
)
do.pl.umap(tcell, "leiden")
2025-10-22 16:22:43,257 - annotation_recluster will be overwritten
2025-10-22 16:22:43,268 - Reclustering using CCA5 approach
../_images/fab14e4bc7fd1a94c85d867070f23482c92d4db1863b651a5763e383aef9eb15.png

We identified 5 clusters. To evaluate whether they correspond to subtypes of T cells, we can identify the top markers for each cluster.

do.tl.rank_genes_groups(tcell, groupby="leiden", method="wilcoxon", tie_correct=True, pts=True)
table = do.get.dge_results(tcell)
table_filt = table[(table.log2fc > 0.25) & (table.padj < 0.05)]

for group in table_filt.group.unique():
    display(table_filt[table_filt.group == group].head(6))
group GeneName wilcox_score log2fc pvals padj pts_group pts_ref
0 0 RPS3A 24.965307 0.591396 1.456458e-137 2.696923e-133 0.998952 0.995619
1 0 RPS13 24.810595 0.664970 6.890172e-136 4.252844e-132 0.997904 0.995619
2 0 RPL30 22.808558 0.455121 3.770908e-115 1.396518e-111 1.000000 0.995619
3 0 RPL32 22.482384 0.463160 6.173390e-112 1.905211e-108 1.000000 0.996714
4 0 RPS23 22.206753 0.486430 2.955463e-109 7.818045e-106 1.000000 0.995619
5 0 RPS27A 21.341381 0.439174 4.688915e-101 1.085308e-97 1.000000 0.995619
group GeneName wilcox_score log2fc pvals padj pts_group pts_ref
18517 1 LINC02446 34.184128 6.524057 4.162212e-256 7.707169e-252 0.821918 0.009476
18518 1 CD8B 32.883751 5.815713 3.752366e-237 3.474128e-233 0.958904 0.025084
18519 1 CD8A 29.624027 5.087067 7.329576e-193 4.524059e-189 0.808219 0.021182
18520 1 CD8B2 15.841774 7.049018 1.602366e-56 7.417755e-53 0.164384 0.001115
18521 1 CTSW 14.294889 2.647650 2.354791e-46 8.720735e-43 0.726027 0.129877
18522 1 S100B 11.258095 5.809854 2.112769e-29 6.520357e-26 0.123288 0.003344
group GeneName wilcox_score log2fc pvals padj pts_group pts_ref
37034 2 ANXA1 22.154280 2.242573 9.486497e-109 1.756615e-104 0.744783 0.242765
37035 2 B2M 20.324232 0.452009 7.850685e-92 7.268556e-88 1.000000 0.999196
37036 2 S100A4 20.198523 1.883775 1.008713e-90 6.226114e-87 0.855538 0.509646
37037 2 ITGB1 18.013584 1.975382 1.524329e-72 4.704333e-69 0.568218 0.164791
37038 2 S100A11 17.528395 1.584508 8.698970e-69 2.301126e-65 0.691814 0.295016
37039 2 ANXA2 16.908760 2.318137 3.877802e-64 7.978363e-61 0.399679 0.072347
group GeneName wilcox_score log2fc pvals padj pts_group pts_ref
55551 3 IKZF2 21.439062 3.990067 5.776706e-102 1.069673e-97 0.471503 0.034050
55552 3 RTKN2 19.659430 3.815890 4.800974e-86 4.444981e-82 0.455959 0.043011
55553 3 TIGIT 18.316952 2.897526 6.061429e-75 3.741316e-71 0.544041 0.080645
55554 3 FOXP3 17.922539 4.867512 7.865529e-72 3.641150e-68 0.284974 0.013740
55555 3 CTLA4 16.903746 2.850002 4.222203e-64 1.563651e-60 0.497409 0.080048
55556 3 PMAIP1 15.972801 2.978600 1.977037e-57 6.101466e-54 0.528497 0.109916
group GeneName wilcox_score log2fc pvals padj pts_group pts_ref
74068 4 MYL9 23.222256 31.723900 2.713621e-119 5.024813e-115 0.291667 0.000000
74069 4 PTCRA 21.493868 31.799917 1.776754e-102 1.645008e-98 0.250000 0.000000
74070 4 SPIB 20.267651 6.568335 2.482264e-91 1.532136e-87 0.375000 0.003256
74071 4 ACY3 19.442804 6.555577 3.353267e-84 1.552311e-80 0.416667 0.005426
74072 4 LINC01857 19.355385 6.336043 1.836193e-83 6.800157e-80 0.333333 0.002713
74073 4 GSN 19.210054 7.703021 3.049471e-82 9.411177e-79 0.458333 0.007596

From the list of markers, cluster 3 seems to express markers of regulatory T cells, while cluster 1 seems to be enriched in cytotoxic markers. We can visualise the expression of representative genes.

do.pl.umap(tcell, ["FOXP3", "CTLA4", "CD8A", "GZMK"], ncols=2, labels="leiden")
../_images/51c24f7a48699bf665aa77bec5a0ad56ac541a5eb0184a7696727d6240f36aae.png

From these plots, we can see that cluster 1 is enriched for cytotoxic markers. We can transfer this annotation to our original object and evaluate the changes in cell populations again.

tcell.obs["annotation_recluster"] = tcell.obs.leiden.map(
    {"0": "T_cells", "1": "T_cytotoxic", "2": "T_cells", "3": "Tregs", "4": "T_cells"}
)
adata.obs["annotation_recluster"] = adata.obs["annotation"].copy()
do.utility.transfer_labels(
    adata_original=adata,
    adata_subset=tcell,
    original_key="annotation_recluster",
    subset_key="annotation_recluster",
    original_labels=["T_cells"],
)
do.pl.umap(adata, "annotation_recluster", labels="annotation_recluster")
../_images/0c42f9fabc8e11b9646bc1b0aad50d00c970101fb43626ab2a0e8bd5c9c19fee.png
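Conceptually, transferring labels from a subset back to the full object is an index-aligned update on the cell barcodes. The pandas sketch below shows the idea on hypothetical barcodes c1 to c5; it is an illustration of the concept, not the implementation of do.utility.transfer_labels().

```python
import pandas as pd

# Labels in the full object, indexed by (hypothetical) cell barcodes.
full = pd.Series(
    ["T_cells", "B_cells", "T_cells", "NK", "T_cells"],
    index=["c1", "c2", "c3", "c4", "c5"],
    name="annotation_recluster",
)

# Refined labels for the re-clustered T cells only.
sub = pd.Series(["Tregs", "T_cytotoxic", "T_cells"], index=["c1", "c3", "c5"])

# Overwrite the labels of the cells present in the subset; all others stay untouched.
updated = full.copy()
updated.loc[sub.index] = sub

print(updated)
```

If the column is categorical, as is common in AnnData obs, the new categories would have to be added first (for example with Series.cat.add_categories) before assigning them.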
do.pl.cell_composition(
    adata,
    annot_key="annotation_recluster",
    cond_key="condition",
    batch_key="batch",
    transform="arcsin",
    condition_order=["healthy", "disease"],
)
[INFO] Your data doesn't have replicates! Artificial replicates will be simulated to run scanpro.
[INFO] Simulation may take some minutes...
[INFO] Generating 3 replicates and running 100 simulations...
[INFO] Finished 100 simulations in 1.37 seconds
2025-10-22 16:29:48,075 - There are 5 populations with a significant change
../_images/30ce2392a9099f0d3a02941c9d74e0755c6cdfa25c4471f72f9e35eed2ca5a4e.png

We can see that even though there is a decrease in the proportion of T_cytotoxic, the change is not significant. On the other hand, the regulatory T cells increase significantly.

Gene Ontology analysis#

We can also evaluate which biological processes are enriched in a cell type in each condition by performing gene ontology analysis. First, we need to identify differentially expressed genes. We are going to focus on T cells. We can use do.tl.go_analysis() to run gene set analysis using the Enrichr API. This function splits the differentially expressed genes into up- and down-regulated sets and runs the analysis for each set.

tcell = adata[adata.obs.annotation == "T_cells"].copy()  # copy so results are not written to a view
do.tl.rank_genes_groups(
    tcell, groupby="condition", method="wilcoxon", tie_correct=True, pts=True, reference="healthy", groups=["disease"]
)
table = do.get.dge_results(tcell)
df = do.tl.go_analysis(
    table,
    gene_key="GeneName",
    pval_key="padj",
    log2fc_key="log2fc",
    log2fc_cutoff=0.25,  # It will take -0.25 and +0.25
    specie="Human",
    go_catgs=["GO_Biological_Process_2023"],
)
df.head(10)
2025-10-22 16:31:14,783 - Running GSA on Up- and Down-regulated genes
Gene_set Term Overlap P-value Adjusted P-value Old P-value Old Adjusted P-value Odds Ratio Combined Score Genes state
0 GO_Biological_Process_2023 Regulation Of Apoptotic Process (GO:0042981) 115/705 3.547929e-11 1.465882e-07 0 0 2.105323 50.658438 TFRC;ARL6IP1;CIB1;FAIM2;TNF;IKZF3;CCND2;EPC1;P... enriched
1 GO_Biological_Process_2023 Positive Regulation Of Cytokine Production (GO... 65/320 9.629913e-11 1.465882e-07 0 0 2.722232 62.784364 IL21;ITK;CD40;CD80;RORA;PTPN22;TNF;PNP;PDE4B;C... enriched
2 GO_Biological_Process_2023 Regulation Of Gene Expression (GO:0010468) 162/1127 1.092209e-10 1.465882e-07 0 0 1.827286 41.913650 ZNF331;TFRC;NAB1;NAB2;JMJD1C;RORA;PRDM2;AHR;NR... enriched
3 GO_Biological_Process_2023 Regulation Of B Cell Proliferation (GO:0030888) 20/44 1.404102e-10 1.465882e-07 0 0 8.779383 199.173072 IL21;IL10;LYN;VAV3;CD74;CD40;MEF2C;TFRC;TNFRSF... enriched
4 GO_Biological_Process_2023 Positive Regulation Of Apoptotic Process (GO:0... 57/270 2.948489e-10 2.462578e-07 0 0 2.851035 62.564711 TOP2A;PRR7;BTG1;CTSV;TNF;ADAMTSL4;CTSL;CASP3;P... enriched
5 GO_Biological_Process_2023 Response To Unfolded Protein (GO:0006986) 18/44 9.314947e-09 6.483203e-06 0 0 7.284420 134.700912 HSPA8;PTPN1;HSP90AA1;HSP90AB1;HSPA4;RHBDD1;HSP... enriched
6 GO_Biological_Process_2023 Regulation Of DNA-templated Transcription (GO:... 237/1922 1.516467e-08 8.673926e-06 0 0 1.539930 27.725356 ZNF296;JMJD1C;IKZF2;IKZF3;BACH1;IKZF5;SPIB;GPB... enriched
7 GO_Biological_Process_2023 Negative Regulation Of Apoptotic Process (GO:0... 80/482 1.661672e-08 8.673926e-06 0 0 2.126851 38.097973 ARF4;TFRC;ARL6IP1;CITED2;CIB1;FAIM2;MTRNR2L8;T... enriched
8 GO_Biological_Process_2023 Regulation Of B Cell Activation (GO:0050864) 13/25 3.212989e-08 1.443806e-05 0 0 11.374688 196.252944 IL10;FCRL3;TNFAIP3;IKZF3;SAMSN1;SUPT6H;ZFP36L2... enriched
9 GO_Biological_Process_2023 Antigen Receptor-Mediated Signaling Pathway (G... 33/134 3.457389e-08 1.443806e-05 0 0 3.453623 59.333826 IGHM;ITK;PTPN22;PTPRJ;MALT1;CD79A;IGHG1;CD19;Z... enriched
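The split into up- and down-regulated genes that precedes the enrichment can be pictured with a few lines of pandas. This is a sketch on an invented table: the gene values are made up, the adjusted p-value cutoff of 0.05 is an assumption, and the exact filtering done inside do.tl.go_analysis() may differ.

```python
import pandas as pd

# Hypothetical differential expression results (not taken from the dataset).
dge = pd.DataFrame(
    {
        "GeneName": ["TNF", "IL10", "CD3E", "B2M", "FOXP3"],
        "log2fc": [1.2, 0.8, -0.1, -0.9, 0.3],
        "padj": [1e-5, 0.01, 0.5, 1e-3, 0.2],
    }
)

log2fc_cutoff = 0.25  # applied symmetrically: > +0.25 is up, < -0.25 is down
padj_cutoff = 0.05  # assumed significance cutoff

# Keep significant genes, then split them by the direction of change.
sig = dge[dge["padj"] < padj_cutoff]
up = sig.loc[sig["log2fc"] > log2fc_cutoff, "GeneName"].tolist()
down = sig.loc[sig["log2fc"] < -log2fc_cutoff, "GeneName"].tolist()

print("up:", up)
print("down:", down)
```

Each gene list would then be submitted separately to the enrichment service, so every resulting term can be tagged with the direction it came from, as in the state column.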

We can visualise the top terms enriched in each condition with do.pl.split_bar_gsea(). Before plotting, we filter the table to keep only significant terms.

df_filt = df[df["Adjusted P-value"] < 0.05]
do.pl.split_bar_gsea(
    df_filt,
    term_col="Term",
    col_split="Combined Score",  # Column to use for the x-axis
    cond_col="state",  # Column that splits the up and down-regulated terms
    pos_cond="enriched",  # value in cond_col that should be in the positive axis
)
2025-10-22 16:31:32,711 - !!! Assuming GO Terms are preprocessed (Only Significant terms included)
../_images/869152c01c238b0b0ba95a482aee6768613074a8c516e3e4569d65b281acd8bb.png
adata.write("/Users/david/Downloads/PublicData10x/adata.h5ad")
session_info.show(na=False, cpu=True, excludes=["backports"], std_lib=True, dependencies=True, html=True)
Session information
-----
anndata             0.11.4
dotools_py          0.0.1
pandas              2.3.2
session_info        v1.0.1
sys                 3.11.13 (main, Jun  5 2025, 08:21:08) [Clang 14.0.6 ]
-----
Modules imported as dependencies
Cython                      3.1.4
PIL                         11.3.0
absl                        2.3.1
adjustText                  1.3.0
appnope                     0.1.4
argparse                    1.1
array_api_compat            1.12.0
arrow                       1.3.0
attr                        25.3.0
attrs                       25.3.0
babel                       2.17.0
celltypist                  1.7.1
certifi                     2025.08.03
cffi                        2.0.0
charset_normalizer          3.4.3
cloudpickle                 3.1.1
comm                        0.2.3
coverage                    7.11.0
csv                         1.0
ctypes                      1.1.0
cycler                      0.12.1
cython                      3.1.4
dask                        2024.11.2
dateutil                    2.9.0.post0
debugpy                     1.8.17
decimal                     1.70
decorator                   5.2.1
defusedxml                  0.7.1
deprecated                  1.2.18
docrep                      0.3.2
doubletdetection            4.3.0.post1
et_xmlfile                  2.0.0
executing                   2.2.1
fsspec                      2025.9.0
geopandas                   1.1.1
gseapy                      1.1.10
h5py                        3.14.0
idna                        3.10
igraph                      0.11.9
ipaddress                   1.0
ipykernel                   6.30.1
ipywidgets                  8.1.7
jedi                        0.19.2
jinja2                      3.1.6
joblib                      1.5.2
json                        2.0.9
json5                       0.12.1
jsonpointer                 3.0.0
jsonschema                  4.25.1
jupyter_events              0.12.0
jupyter_server              2.17.0
jupyterlab_server           2.27.3
kiwisolver                  1.4.9
lark                        1.2.2
leidenalg                   0.10.2
lightning                   2.5.5
lightning_utilities         0.15.2
llvmlite                    0.45.0
logging                     0.5.1.2
markupsafe                  3.0.2
marshal                     4
matplotlib                  3.10.6
matplotlib_inline           0.1.7
ml_collections              1.1.0
mpmath                      1.3.0
mudata                      0.3.2
natsort                     8.4.0
nbformat                    5.10.4
numba                       0.62.0
numcodecs                   0.15.1
numpy                       2.3.3
openpyxl                    3.1.5
opt_einsum                  3.4.0
packaging                   25.0
parso                       0.8.5
patsy                       1.0.1
phenograph                  1.5.7
platform                    1.0.8
platformdirs                4.4.0
polars                      1.33.1
prompt_toolkit              3.0.52
psutil                      7.1.0
pure_eval                   0.2.3
pyarrow                     21.0.0
pycparser                   2.23
pygments                    2.19.2
pynndescent                 0.5.13
pyparsing                   3.2.4
pyproj                      3.7.2
pyro                        1.9.1
pytz                        2025.2
re                          2.2.1
requests                    2.32.5
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
scanpro                     0.4.0
scanpy                      1.11.3
scipy                       1.15.3
scvi                        1.4.0
seaborn                     0.13.2
shapely                     2.1.2
six                         1.17.0
sklearn                     1.7.2
sniffio                     1.3.1
socketserver                0.4
sparse                      0.17.0
sqlite3                     2.6.0
stack_data                  0.6.3
statsmodels                 0.14.5
stdlib_list                 0.11.1
sympy                       1.14.0
tarfile                     0.9.0
texttable                   1.7.0
threadpoolctl               3.6.0
tlz                         1.0.0
toolz                       1.0.0
torch                       2.8.0
torchmetrics                1.8.2
tornado                     6.5.2
tqdm                        4.67.1
traitlets                   5.14.3
umap                        0.5.9.post2
urllib3                     2.5.0
wcwidth                     0.2.13
websocket                   1.8.0
wrapt                       1.17.3
xarray                      2025.9.0
yaml                        6.0.2
zarr                        2.18.7
zlib                        1.0
zmq                         27.1.0
-----
IPython             9.5.0
jupyter_client      8.6.3
jupyter_core        5.8.1
jupyterlab          4.4.7
notebook            7.4.5
-----
Python 3.11.13 (main, Jun  5 2025, 08:21:08) [Clang 14.0.6 ]
macOS-26.0.1-arm64-arm-64bit
10 logical CPU cores, arm
-----
Session information updated at 2025-10-22 16:31