Quality control of sc/snRNA-seq

To perform quality control (QC) of single-cell or single-nucleus RNA sequencing (sc/snRNA-seq) data, we provide the dotools_py.pp.importer_py() function, which combines several processing steps into a single call. As input, it takes a list of H5 files generated by a mapping tool such as CellRanger or STARsolo. If CellBender has been run, we can also provide the paths to the H5 files generated by CellBender. In addition, we need to provide a batch name for each sample, and further metadata can be passed as a dictionary.

The quality control involves filtering genes and cells:

Genes: removed based on their expression levels. Genes expressed in only a small number of cells are excluded; by default, a gene is removed if it is expressed in fewer than 5 cells.
Cells: removed based on several parameters: number of genes, mitochondrial content, doublets, and number of UMI counts.
- Mitochondrial content: cells with high mitochondrial content are excluded. We recommend a maximum of 5% for scRNA-seq and 3% for snRNA-seq.
- Doublets: we implemented three approaches for identifying neotypic doublets (i.e., doublets formed by two or more cells of different cell types): scDblFinder, DoubletDetection and Scrublet.
- Number of genes: cells are filtered on the absolute number of detected genes. A lower and an upper threshold can be set.
- Number of UMI counts: cells can be filtered using absolute thresholds or quantiles. A lower and an upper threshold can be set, and both approaches can be combined (e.g., an absolute lower threshold together with an upper-quantile cut-off).
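The filtering logic described above can be sketched on a dense cells x genes count matrix. This is a minimal illustration that follows the step order reported in importer_py's QC metrics table. It is not dotools_py's implementation. The helper `qc_filter` is hypothetical. It assumes human mitochondrial genes carry an "MT-" prefix, and it omits doublet detection and the upper gene-count threshold.

```python
import numpy as np


def qc_filter(counts, gene_names, min_genes=300, min_cells=5, max_pct_mt=5.0,
              min_counts=500, high_quantile=95):
    """Hypothetical helper: sequential QC on a dense cells x genes count matrix.

    Illustration only; not the dotools_py implementation.
    """
    counts = np.asarray(counts)
    gene_names = np.asarray(gene_names, dtype=str)

    # 1. Cells: keep those with at least `min_genes` detected genes
    counts = counts[(counts > 0).sum(axis=1) >= min_genes]

    # 2. Genes: keep those detected in at least `min_cells` cells
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    counts, gene_names = counts[:, keep_genes], gene_names[keep_genes]

    # 3. Cells: drop those whose mitochondrial share (assumed "MT-" prefix) exceeds max_pct_mt
    is_mt = np.char.startswith(gene_names, "MT-")
    total = counts.sum(axis=1)
    pct_mt = 100 * counts[:, is_mt].sum(axis=1) / np.maximum(total, 1)
    counts = counts[pct_mt <= max_pct_mt]

    # 4. Cells: absolute lower UMI threshold plus an upper-quantile cut-off
    total = counts.sum(axis=1)
    upper = np.percentile(total, high_quantile)
    counts = counts[(total >= min_counts) & (total <= upper)]
    return counts, gene_names
```

With toy thresholds, a cell with 50% mitochondrial counts is dropped in step 3. The cell with the largest library is dropped by the upper-quantile cut in step 4.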

After per-sample QC, the samples are combined into a single AnnData object, and log-normalisation, scaling and highly variable gene selection are performed. To evaluate the QC, the distributions of total UMI counts, number of genes and percentage of mitochondrial counts per cell are shown as violin plots before and after QC. These plots are saved in the folders containing the H5 files. We also keep track of the number of cells and genes removed at each QC step.
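Log-normalisation and scaling can be sketched in a few lines of numpy. This is a simplified version, not the dotools_py code. It assumes a target sum of 10,000 counts per cell, a common choice; the value importer_py actually uses is not stated here. It also assumes that scaled values are clipped at ±10. The helper names are hypothetical.

```python
import numpy as np


def log_normalise(counts, target_sum=1e4):
    """Scale every cell to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    size = counts.sum(axis=1, keepdims=True)
    size[size == 0] = 1.0  # leave empty cells at zero instead of dividing by zero
    return np.log1p(counts / size * target_sum)


def scale_genes(x, max_value=10.0):
    """Centre each gene to zero mean and unit variance, then clip extreme values."""
    mean = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # constant genes stay at zero after centring
    return np.clip((x - mean) / sd, -max_value, max_value)
```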

First, we set up the environment and load the required libraries.

Environment setup

import dotools_py as do
import pandas as pd
from IPython.display import display, SVG
import session_info

2025-10-22 16:16:29,365 - Jupyter enviroment detected. Using "inline" backend

To show how the quality control works, we use a public 10x Genomics dataset of human blood from healthy donors and donors with a malignant tumor. We download the raw and filtered H5 files generated by 10x.

do.dt.example_10x("/Users/david/Downloads/PublicData10x")

2025-10-22 16:16:29,452 - Downloading data to /Users/david/Downloads/PublicData10x

Sequential preprocessing

paths = [
    "/Users/david/Downloads/PublicData10x/healthy/outs/filtered_feature_bc_matrix.h5",
    "/Users/david/Downloads/PublicData10x/disease/outs/filtered_feature_bc_matrix.h5",
]

adata = do.pp.importer_py(
    paths=paths,
    ids=["batch1", "batch2"],
    metadata={"condition": ["healthy", "disease"]},  # Additional metadata information
    batch_key="batch",  # Column in obs to save batch information
    remove_doublets=True,
    doublet_tool="scDblFinder",  # Tool to identify neotypic doublets (Also available Scrublet and DoubletDetection)
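    min_genes_in_cell=300,  # Remove cells with fewer than 300 detected genes
    min_cells_with_genes=5,  # Remove genes expressed in fewer than 5 cells
    cut_mt=5,  # Remove cells with more than 5% mitochondrial counts
    n_reads=10_000,
    min_counts=500,  # Remove cells with fewer than 500 UMI counts
    high_quantile=95,  # Remove the top 5% of cells by number of UMI counts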
)

2025-10-22 16:16:33,139 - Reading batch1
2025-10-22 16:16:33,896 - Remove Cells with low number of genes
2025-10-22 16:16:33,943 - Remove Genes lowly expressed
2025-10-22 16:16:34,007 - Remove cells with high MT-content
2025-10-22 16:16:34,025 - Remove cells based on nUMI counts
2025-10-22 16:16:34,340 - Finding Neotypic doublets
2025-10-22 16:16:34,409 - Running scDblFinder
2025-10-22 16:16:51,987 - Remove 86 doublets
2025-10-22 16:16:52,346 - Reading batch2
2025-10-22 16:16:52,840 - Remove Cells with low number of genes
2025-10-22 16:16:52,878 - Remove Genes lowly expressed
2025-10-22 16:16:52,932 - Remove cells with high MT-content
2025-10-22 16:16:52,937 - Remove cells based on nUMI counts
2025-10-22 16:16:52,944 - Finding Neotypic doublets
2025-10-22 16:16:52,978 - Running scDblFinder
2025-10-22 16:17:09,388 - Remove 26 doublets
2025-10-22 16:17:09,571 - Concatenating samples
2025-10-22 16:17:09,653 - Normalisation of the expression
2025-10-22 16:17:09,675 - Finding Highly Variable Genes shared across samples
2025-10-22 16:17:09,970 - Run PCA

adata

AnnData object with n_obs × n_vars = 2783 × 18517
    obs: 'batch', 'condition', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'total_counts_mt', 'log1p_total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'log1p_total_counts_ribo', 'pct_counts_ribo', 'n_genes', 'n_counts', 'doublet_class', 'doublet_score'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'highly_variable_nbatches', 'highly_variable_intersection'
    uns: 'log1p', 'hvg'
    obsm: 'X_pca'
    layers: 'counts', 'logcounts'

Evaluation of the preprocessing

We can now check the quality control plots that were generated:

files = [
    "/Users/david/Downloads/PublicData10x/healthy/outs/Vln_PreQC_batch1.svg",
    "/Users/david/Downloads/PublicData10x/healthy/outs/Vln_PostQC_batch1.svg",
    "/Users/david/Downloads/PublicData10x/healthy/outs/251022_QC_Metricsbatch1.svg"
]

display(
    SVG(files[0]),
    SVG(files[1]),
    SVG(files[2]),
)


We can observe that most cells were removed because of high mitochondrial content. Depending on the experimental set-up, we might want to raise the mitochondrial threshold to avoid losing too many cells. Besides these plots, an Excel sheet records the thresholds used and the number of cells and genes remaining after each QC step.

table = pd.read_excel("/Users/david/Downloads/PublicData10x/healthy/outs/251022_Metrics_batch1.xlsx")
table

	QC_Step	nCells	nFeatures	Comments
0	Input_Shape	7865	33538	NaN
1	Rm_Cells_lowGenes	7851	33538	Remove cells with <300 genes
2	Rm_Genes_lowCells	7851	16844	Remove genes express in less than 5 cells
3	Rm_Cell_HighMT	2254	16844	Remove cells with >5% of Mitochondrial genes
4	Rm_Cells_nUMI_nGenes	2141	16844	Remove cells based on nUMI counts[Absolute (Mi...
5	Rm_Doublets	2055	16844	Remove neotypic doublets using scDblFinder
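To quantify how much each step contributes, we can compute the number of cells removed per step from the batch1 table above. The values below are typed in by hand from that table. Relative to the cells entering each step, the mitochondrial filter removes about 71% of the cells.

```python
import pandas as pd

# nCells values copied from the batch1 QC metrics table above
qc = pd.DataFrame({
    "QC_Step": ["Input_Shape", "Rm_Cells_lowGenes", "Rm_Genes_lowCells",
                "Rm_Cell_HighMT", "Rm_Cells_nUMI_nGenes", "Rm_Doublets"],
    "nCells": [7865, 7851, 7851, 2254, 2141, 2055],
})
# Cells removed at each step, and as a percentage of the cells entering that step
qc["removed"] = -qc["nCells"].diff().fillna(0).astype(int)
qc["pct_removed"] = (100 * qc["removed"] / qc["nCells"].shift(1)).round(1)
print(qc)
print(f"Retained {qc.nCells.iloc[-1] / qc.nCells.iloc[0]:.1%} of input cells")
```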

Integration and clustering

After quality control, we can proceed to batch correction and integration of the samples. Several batch-correction methods are available: Harmony, Scanorama, BBKNN, scVI, or CCA from Seurat (v4 or v5 approach). After integration, we run the Leiden algorithm to find clusters and compute UMAP embeddings for visualisation.
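Both the UMAP and the Leiden clustering are computed on a k-nearest-neighbour graph of the integrated embedding. The brute-force numpy sketch below illustrates only the neighbour search. It assumes Euclidean distance and k=15, which is scanpy's default `n_neighbors`; the value used by integrate_data is not stated here. Production tools typically use approximate search (e.g. pynndescent) rather than a full O(n²) distance matrix.

```python
import numpy as np


def knn_indices(embedding, k=15):
    """Indices of the k nearest neighbours (Euclidean) of each cell, excluding itself."""
    x = np.asarray(embedding, dtype=float)
    sq = (x ** 2).sum(axis=1)
    # Squared pairwise distances via |a|^2 + |b|^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2 * x @ x.T
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]
```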

do.tl.integrate_data(
    adata,
    batch_key="batch",
    hvg_batch=True,
    cca5=True,  # Integrate using CCA (Seurat v5 approach)
    resolution=0.3,  # Resolution for leiden algorithm
)

2025-10-22 16:17:10,416 - Computing HVGs
2025-10-22 16:17:11,051 - Integration using CCA (Seurat v5 approach)
2025-10-22 16:17:11,053 - Preprocessing to export to Seurat
2025-10-22 16:17:11,069 - Running CCA Integration
                          integratedcca_1 integratedcca_2 integratedcca_3
AAACCCAGTGCATTTG-1-batch1      -29.240194       0.4496809        4.260958
AAACCCATCTCAACGA-1-batch1        4.484761      -1.2218207        4.033311
AAACCCATCTCTCGAC-1-batch1        4.340738      -1.4006172        3.692713
2025-10-22 16:17:31,545 - Loading corrected matrix
2025-10-22 16:17:31,577 - Finding neighbors
2025-10-22 16:17:33,748 - Run UMAP
2025-10-22 16:17:36,709 - Clustering cells using Leiden (resolution 0.3)

After integration, adata.obsm contains X_CCA, the integrated CCA embedding after dimensionality reduction. Unlike the Seurat v4 approach, where the corrected matrix has dimensions n_cells × n_hvg, here its dimensions are n_cells × 50.

adata.obsm["X_CCA"].shape

(2783, 50)
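The sketch below shows the canonical correlation step that Seurat's CCA-based integration builds on. Each batch is standardised per gene, and the cell-by-cell cross-product between batches is decomposed by SVD. The cell embeddings of both batches are then stacked and L2-normalised. This is a simplified numpy illustration that omits anchor finding and the subsequent correction, so its output is not equivalent to X_CCA. The helper name `cca_embedding` is hypothetical.

```python
import numpy as np


def cca_embedding(x1, x2, n_components=50):
    """Shared CCA embedding for two batches (hypothetical helper).

    x1, x2: genes x cells matrices over the same genes (e.g. shared HVGs).
    Returns a (cells1 + cells2) x k matrix, with k <= n_components.
    """
    def standardise(m):
        m = np.asarray(m, dtype=float)
        sd = m.std(axis=1, keepdims=True)
        sd[sd == 0] = 1.0
        return (m - m.mean(axis=1, keepdims=True)) / sd

    z1, z2 = standardise(x1), standardise(x2)
    # Cross-product between the cells of the two batches: cells1 x cells2
    u, s, vt = np.linalg.svd(z1.T @ z2, full_matrices=False)
    k = min(n_components, s.size)
    emb = np.vstack([u[:, :k], vt[:k].T])
    # L2-normalise each cell's vector before neighbour search
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)
```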

Evaluation of integration

We can now visualise the integrated object and the identified clusters:

do.pl.split_embeddding(adata, "batch", figsize=(8, 5))
do.pl.umap(adata, "leiden", labels="leiden", figsize=(6, 5))


adata.write("/Users/david/Downloads/PublicData10x/adata.h5ad")

Semi-automatic annotation with CellTypist

We can also perform a semi-automatic annotation using CellTypist. In this case, we use the Healthy_COVID19_PBMC.pkl model.

do.tl.auto_annot(adata, "leiden", model="Healthy_COVID19_PBMC.pkl", convert=False, pl_cell_prob=True)


do.pl.umap(adata, "leiden", labels="autoAnnot")


Besides the semi-automatic annotation, we should also validate the predictions using known markers for these cell types.

markers = {
    "ImmuneCells": ["PTPRC"],
    "B_cells": ["CD79A", "BANK1", "MS4A1"],
    "T_cells": ["CD3E", "CD4", "IL7R"],
    "NK": ["NKG7", "KLRD1"],
    "Myeloid": ["CD68", "CD14", "ITGAM"],
    "pDC": ["LILRA4", "CLEC4C", "LRRC26"],
}

do.pl.dotplot(adata, "leiden", markers, swap_axes=False, var_group_rotation=90)

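A dot plot typically encodes two statistics for each (cluster, gene) pair: the mean expression as colour and the fraction of cells expressing the gene as dot size. The pandas sketch below computes both. It assumes the mean is taken over all cells of the group, which is scanpy's default behaviour; it is not the dotools_py implementation.

```python
import numpy as np
import pandas as pd


def dotplot_stats(expr, genes, groups):
    """Mean expression and fraction of expressing cells per group and gene.

    expr: cells x genes array of log-normalised expression
    genes: column names for expr
    groups: cluster label of each cell (e.g. values of adata.obs["leiden"])
    """
    df = pd.DataFrame(np.asarray(expr, dtype=float), columns=genes)
    labels = np.asarray(groups)
    mean_expr = df.groupby(labels).mean()        # colour of each dot
    frac_expr = (df > 0).groupby(labels).mean()  # size of each dot
    return mean_expr, frac_expr
```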

Overall, the marker expression agrees with the automatic annotation, so we can proceed with annotating the clusters.

adata.obs["annotation"] = adata.obs.leiden.map(
    {"0": "Monocytes", "1": "T_cells", "2": "T_cells", "3": "NK", "4": "B_cells", "5": "pDC"}
)
do.pl.umap(adata, "annotation", labels="annotation")


Evaluate changes in cell populations

After annotating the cell-type populations, we can also use scanpro to evaluate whether their proportions change significantly between the healthy and disease conditions.
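scanpro is based on the propeller approach. It works on per-sample cell-type proportions, which are transformed before testing; `transform="arcsin"` selects the arcsine square-root transform. The sketch below illustrates only the proportion and transform step, not the statistical test or the replicate simulation that scanpro performs. The helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def arcsin_proportions(samples, cell_types):
    """Per-sample cell-type proportions and their arcsine square-root transform."""
    props = pd.crosstab(pd.Series(samples, name="sample"),
                        pd.Series(cell_types, name="cell_type"),
                        normalize="index")  # each row sums to 1
    return props, np.arcsin(np.sqrt(props))
```

In this dataset, the inputs would correspond to `adata.obs["batch"]` and `adata.obs["annotation"]`.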

do.pl.cell_composition(
    adata,
    annot_key="annotation",
    cond_key="condition",
    batch_key="batch",
    transform="arcsin",  # Produce more accurate results for simulated data
    condition_order=["healthy", "disease"],
)

[INFO] Your data doesn't have replicates! Artificial replicates will be simulated to run scanpro.
[INFO] Simulation may take some minutes...
[INFO] Generating 3 replicates and running 100 simulations...
[INFO] Finished 100 simulations in 0.99 seconds
2025-10-22 16:17:55,491 - There are 3 populations with a significant change


Cell populations with a significant change are connected by broken lines, and the p-value is indicated in the legend. In this case, B cells, monocytes and NK cells change significantly.

Reclustering of a cell population

If we are interested in specific states within a cell type, we can also perform re-clustering. Here, we focus on the largest population, the T cells.

tcell = do.tl.reclustering(
    adata,
    cluster_key="annotation",  # Metadata column with clusters
    batch_key="batch",  # Metadata column with batch information
    recluster_apporach="cca5",  # Integration approach used
    use_clusters=["T_cells"],  # Cluster we want to re-cluster
    use_rep="X_CCA",  # representation to use
    get_subset=True,  # Get AnnData of T_cells re-clusters
    resolution=0.6,
)
do.pl.umap(tcell, "leiden")

2025-10-22 16:22:43,257 - annotation_recluster will be overwritten
2025-10-22 16:22:43,268 - Reclustering using CCA5 approach


Re-clustering yields 5 clusters. To evaluate whether they correspond to T cell subtypes, we identify the top markers for each cluster.

do.tl.rank_genes_groups(tcell, groupby="leiden", method="wilcoxon", tie_correct=True, pts=True)
table = do.get.dge_results(tcell)
table_filt = table[(table.log2fc > 0.25) & (table.padj < 0.05)]

for group in table_filt.group.unique():
    display(table_filt[table_filt.group == group].head(6))

	GeneName	wilcox_score	log2fc	pvals	padj	pts_group	pts_ref
0	RPS3A	24.965307	0.591396	1.456458e-137	2.696923e-133	0.998952	0.995619
1	RPS13	24.810595	0.664970	6.890172e-136	4.252844e-132	0.997904	0.995619
2	RPL30	22.808558	0.455121	3.770908e-115	1.396518e-111	1.000000	0.995619
3	RPL32	22.482384	0.463160	6.173390e-112	1.905211e-108	1.000000	0.996714
4	RPS23	22.206753	0.486430	2.955463e-109	7.818045e-106	1.000000	0.995619
5	RPS27A	21.341381	0.439174	4.688915e-101	1.085308e-97	1.000000	0.995619

	group	GeneName	wilcox_score	log2fc	pvals	padj	pts_group	pts_ref
18517	1	LINC02446	34.184128	6.524057	4.162212e-256	7.707169e-252	0.821918	0.009476
18518	1	CD8B	32.883751	5.815713	3.752366e-237	3.474128e-233	0.958904	0.025084
18519	1	CD8A	29.624027	5.087067	7.329576e-193	4.524059e-189	0.808219	0.021182
18520	1	CD8B2	15.841774	7.049018	1.602366e-56	7.417755e-53	0.164384	0.001115
18521	1	CTSW	14.294889	2.647650	2.354791e-46	8.720735e-43	0.726027	0.129877
18522	1	S100B	11.258095	5.809854	2.112769e-29	6.520357e-26	0.123288	0.003344

	group	GeneName	wilcox_score	log2fc	pvals	padj	pts_group	pts_ref
37034	2	ANXA1	22.154280	2.242573	9.486497e-109	1.756615e-104	0.744783	0.242765
37035	2	B2M	20.324232	0.452009	7.850685e-92	7.268556e-88	1.000000	0.999196
37036	2	S100A4	20.198523	1.883775	1.008713e-90	6.226114e-87	0.855538	0.509646
37037	2	ITGB1	18.013584	1.975382	1.524329e-72	4.704333e-69	0.568218	0.164791
37038	2	S100A11	17.528395	1.584508	8.698970e-69	2.301126e-65	0.691814	0.295016
37039	2	ANXA2	16.908760	2.318137	3.877802e-64	7.978363e-61	0.399679	0.072347

	group	GeneName	wilcox_score	log2fc	pvals	padj	pts_group	pts_ref
55551	3	IKZF2	21.439062	3.990067	5.776706e-102	1.069673e-97	0.471503	0.034050
55552	3	RTKN2	19.659430	3.815890	4.800974e-86	4.444981e-82	0.455959	0.043011
55553	3	TIGIT	18.316952	2.897526	6.061429e-75	3.741316e-71	0.544041	0.080645
55554	3	FOXP3	17.922539	4.867512	7.865529e-72	3.641150e-68	0.284974	0.013740
55555	3	CTLA4	16.903746	2.850002	4.222203e-64	1.563651e-60	0.497409	0.080048
55556	3	PMAIP1	15.972801	2.978600	1.977037e-57	6.101466e-54	0.528497	0.109916

	group	GeneName	wilcox_score	log2fc	pvals	padj	pts_group	pts_ref
74068	4	MYL9	23.222256	31.723900	2.713621e-119	5.024813e-115	0.291667	0.000000
74069	4	PTCRA	21.493868	31.799917	1.776754e-102	1.645008e-98	0.250000	0.000000
74070	4	SPIB	20.267651	6.568335	2.482264e-91	1.532136e-87	0.375000	0.003256
74071	4	ACY3	19.442804	6.555577	3.353267e-84	1.552311e-80	0.416667	0.005426
74072	4	LINC01857	19.355385	6.336043	1.836193e-83	6.800157e-80	0.333333	0.002713
74073	4	GSN	19.210054	7.703021	3.049471e-82	9.411177e-79	0.458333	0.007596

Based on these markers, cluster 3 appears to consist of regulatory T cells (FOXP3, CTLA4, IKZF2), while cluster 1 appears to be enriched in cytotoxic markers. We can visualise the distribution of these genes.

do.pl.umap(tcell, ["FOXP3", "CTLA4", "CD8A", "GZMK"], ncols=2, labels="leiden")


These plots confirm that cluster 1 is enriched for cytotoxic markers. We can now transfer this annotation to the original object and re-evaluate changes in the cell populations.

tcell.obs["annotation_recluster"] = tcell.obs.leiden.map(
    {"0": "T_cells", "1": "T_cytotoxic", "2": "T_cells", "3": "Tregs", "4": "T_cells"}
)
adata.obs["annotation_recluster"] = adata.obs["annotation"].copy()
do.utility.transfer_labels(
    adata_original=adata,
    adata_subset=tcell,
    original_key="annotation_recluster",
    subset_key="annotation_recluster",
    original_labels=["T_cells"],
)
do.pl.umap(adata, "annotation_recluster", labels="annotation_recluster")

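Conceptually, transferring labels from a subset overwrites the original annotation for the cells present in the subset, matched by cell barcode, and keeps every other label. The pandas sketch below illustrates this idea only. `transfer_labels_sketch` is a hypothetical helper, not the signature or behaviour of `do.utility.transfer_labels`.

```python
import pandas as pd


def transfer_labels_sketch(original, subset):
    """Hypothetical helper: copy subset labels back onto the matching cells.

    original: labels for all cells, indexed by cell barcode
    subset: new labels for a subset of cells, indexed by the same barcodes
    """
    out = original.astype(str)  # convert from categorical so new labels can be assigned
    shared = subset.index.intersection(out.index)
    out.loc[shared] = subset.loc[shared].astype(str)
    return out.astype("category")
```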

do.pl.cell_composition(
    adata,
    annot_key="annotation_recluster",
    cond_key="condition",
    batch_key="batch",
    transform="arcsin",
    condition_order=["healthy", "disease"],
)

[INFO] Your data doesn't have replicates! Artificial replicates will be simulated to run scanpro.
[INFO] Simulation may take some minutes...
[INFO] Generating 3 replicates and running 100 simulations...
[INFO] Finished 100 simulations in 1.37 seconds
2025-10-22 16:29:48,075 - There are 5 populations with a significant change


Although the proportion of T_cytotoxic cells decreases, the change is not significant. In contrast, the proportion of regulatory T cells increases significantly.

Gene Ontology analysis

We can also evaluate which biological processes are enriched in a cell type in each condition by performing Gene Ontology analysis. First, we need to identify differentially expressed genes; here we focus on T cells. We then use do.tl.go_analysis() to run gene set analysis through the Enrichr API. This function splits the differentially expressed genes into up- and down-regulated sets and runs the analysis for each set.
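Splitting a DGE table into up- and down-regulated gene lists can be sketched as follows. The sketch reuses the column names and the ±0.25 log2FC cutoff from the call below. It assumes a significance threshold of padj < 0.05, which is not stated for go_analysis. The helper `split_degs` is hypothetical.

```python
import pandas as pd


def split_degs(table, gene_key="GeneName", pval_key="padj", log2fc_key="log2fc",
               log2fc_cutoff=0.25, pval_cutoff=0.05):
    """Hypothetical helper: split a DGE table into up- and down-regulated gene lists."""
    sig = table[table[pval_key] < pval_cutoff]  # assumed significance threshold
    up = sig.loc[sig[log2fc_key] > log2fc_cutoff, gene_key].tolist()
    down = sig.loc[sig[log2fc_key] < -log2fc_cutoff, gene_key].tolist()
    return up, down
```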

tcell = adata[adata.obs.annotation == "T_cells"].copy()  # copy, so results are not written to a view
do.tl.rank_genes_groups(
    tcell, groupby="condition", method="wilcoxon", tie_correct=True, pts=True, reference="healthy", groups=["disease"]
)
table = do.get.dge_results(tcell)
df = do.tl.go_analysis(
    table,
    gene_key="GeneName",
    pval_key="padj",
    log2fc_key="log2fc",
    log2fc_cutoff=0.25,  # It will take -0.25 and +0.25
    specie="Human",
    go_catgs=["GO_Biological_Process_2023"],
)
df.head(10)

2025-10-22 16:31:14,783 - Running GSA on Up- and Down-regulated genes

	Gene_set	Term	Overlap	P-value	Adjusted P-value	Odds Ratio	Combined Score	Genes	state
0	GO_Biological_Process_2023	Regulation Of Apoptotic Process (GO:0042981)	115/705	3.547929e-11	1.465882e-07	2.105323	50.658438	TFRC;ARL6IP1;CIB1;FAIM2;TNF;IKZF3;CCND2;EPC1;P...	enriched
1	GO_Biological_Process_2023	Positive Regulation Of Cytokine Production (GO...	65/320	9.629913e-11	1.465882e-07	2.722232	62.784364	IL21;ITK;CD40;CD80;RORA;PTPN22;TNF;PNP;PDE4B;C...	enriched
2	GO_Biological_Process_2023	Regulation Of Gene Expression (GO:0010468)	162/1127	1.092209e-10	1.465882e-07	1.827286	41.913650	ZNF331;TFRC;NAB1;NAB2;JMJD1C;RORA;PRDM2;AHR;NR...	enriched
3	GO_Biological_Process_2023	Regulation Of B Cell Proliferation (GO:0030888)	20/44	1.404102e-10	1.465882e-07	8.779383	199.173072	IL21;IL10;LYN;VAV3;CD74;CD40;MEF2C;TFRC;TNFRSF...	enriched
4	GO_Biological_Process_2023	Positive Regulation Of Apoptotic Process (GO:0...	57/270	2.948489e-10	2.462578e-07	2.851035	62.564711	TOP2A;PRR7;BTG1;CTSV;TNF;ADAMTSL4;CTSL;CASP3;P...	enriched
5	GO_Biological_Process_2023	Response To Unfolded Protein (GO:0006986)	18/44	9.314947e-09	6.483203e-06	7.284420	134.700912	HSPA8;PTPN1;HSP90AA1;HSP90AB1;HSPA4;RHBDD1;HSP...	enriched
6	GO_Biological_Process_2023	Regulation Of DNA-templated Transcription (GO:...	237/1922	1.516467e-08	8.673926e-06	1.539930	27.725356	ZNF296;JMJD1C;IKZF2;IKZF3;BACH1;IKZF5;SPIB;GPB...	enriched
7	GO_Biological_Process_2023	Negative Regulation Of Apoptotic Process (GO:0...	80/482	1.661672e-08	8.673926e-06	2.126851	38.097973	ARF4;TFRC;ARL6IP1;CITED2;CIB1;FAIM2;MTRNR2L8;T...	enriched
8	GO_Biological_Process_2023	Regulation Of B Cell Activation (GO:0050864)	13/25	3.212989e-08	1.443806e-05	11.374688	196.252944	IL10;FCRL3;TNFAIP3;IKZF3;SAMSN1;SUPT6H;ZFP36L2...	enriched
9	GO_Biological_Process_2023	Antigen Receptor-Mediated Signaling Pathway (G...	33/134	3.457389e-08	1.443806e-05	3.453623	59.333826	IGHM;ITK;PTPN22;PTPRJ;MALT1;CD79A;IGHG1;CD19;Z...	enriched
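For the rows shown, the Combined Score equals the odds ratio multiplied by the negative natural logarithm of the P-value, which matches Enrichr's current scoring. The check below uses values copied from rows 0, 1 and 3 of the table above.

```python
import math

# (P-value, Odds Ratio, Combined Score) copied from rows 0, 1 and 3 of the table above
rows = [
    (3.547929e-11, 2.105323, 50.658438),
    (9.629913e-11, 2.722232, 62.784364),
    (1.404102e-10, 8.779383, 199.173072),
]
for p, odds_ratio, reported in rows:
    combined = -math.log(p) * odds_ratio
    print(f"{combined:.3f} vs reported {reported:.3f}")
```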

We can visualise the top terms enriched in each condition with do.pl.split_bar_gsea(), but we first need to filter the results to keep only significant terms.

df_filt = df[df["Adjusted P-value"] < 0.05]
do.pl.split_bar_gsea(
    df_filt,
    term_col="Term",
    col_split="Combined Score",  # Column to use for the x-axis
    cond_col="state",  # Column that splits the up and down-regulated terms
    pos_cond="enriched",  # value in cond_col that should be in the positive axis
)

2025-10-22 16:31:32,711 - !!! Assuming GO Terms are preprocessed (Only Significant terms included)


adata.write("/Users/david/Downloads/PublicData10x/adata.h5ad")

session_info.show(na=False, cpu=True, excludes=["backports"], std_lib=True, dependencies=True, html=True)


-----
anndata             0.11.4
dotools_py          0.0.1
pandas              2.3.2
session_info        v1.0.1
sys                 3.11.13 (main, Jun  5 2025, 08:21:08) [Clang 14.0.6 ]
-----


Cython                      3.1.4
PIL                         11.3.0
absl                        2.3.1
adjustText                  1.3.0
appnope                     0.1.4
argparse                    1.1
array_api_compat            1.12.0
arrow                       1.3.0
attr                        25.3.0
attrs                       25.3.0
babel                       2.17.0
celltypist                  1.7.1
certifi                     2025.08.03
cffi                        2.0.0
charset_normalizer          3.4.3
cloudpickle                 3.1.1
comm                        0.2.3
coverage                    7.11.0
csv                         1.0
ctypes                      1.1.0
cycler                      0.12.1
cython                      3.1.4
dask                        2024.11.2
dateutil                    2.9.0.post0
debugpy                     1.8.17
decimal                     1.70
decorator                   5.2.1
defusedxml                  0.7.1
deprecated                  1.2.18
docrep                      0.3.2
doubletdetection            4.3.0.post1
et_xmlfile                  2.0.0
executing                   2.2.1
fsspec                      2025.9.0
geopandas                   1.1.1
gseapy                      1.1.10
h5py                        3.14.0
idna                        3.10
igraph                      0.11.9
ipaddress                   1.0
ipykernel                   6.30.1
ipywidgets                  8.1.7
jedi                        0.19.2
jinja2                      3.1.6
joblib                      1.5.2
json                        2.0.9
json5                       0.12.1
jsonpointer                 3.0.0
jsonschema                  4.25.1
jupyter_events              0.12.0
jupyter_server              2.17.0
jupyterlab_server           2.27.3
kiwisolver                  1.4.9
lark                        1.2.2
leidenalg                   0.10.2
lightning                   2.5.5
lightning_utilities         0.15.2
llvmlite                    0.45.0
logging                     0.5.1.2
markupsafe                  3.0.2
marshal                     4
matplotlib                  3.10.6
matplotlib_inline           0.1.7
ml_collections              1.1.0
mpmath                      1.3.0
mudata                      0.3.2
natsort                     8.4.0
nbformat                    5.10.4
numba                       0.62.0
numcodecs                   0.15.1
numpy                       2.3.3
openpyxl                    3.1.5
opt_einsum                  3.4.0
packaging                   25.0
parso                       0.8.5
patsy                       1.0.1
phenograph                  1.5.7
platform                    1.0.8
platformdirs                4.4.0
polars                      1.33.1
prompt_toolkit              3.0.52
psutil                      7.1.0
pure_eval                   0.2.3
pyarrow                     21.0.0
pycparser                   2.23
pygments                    2.19.2
pynndescent                 0.5.13
pyparsing                   3.2.4
pyproj                      3.7.2
pyro                        1.9.1
pytz                        2025.2
re                          2.2.1
requests                    2.32.5
rfc3339_validator           0.1.4
rfc3986_validator           0.1.1
scanpro                     0.4.0
scanpy                      1.11.3
scipy                       1.15.3
scvi                        1.4.0
seaborn                     0.13.2
shapely                     2.1.2
six                         1.17.0
sklearn                     1.7.2
sniffio                     1.3.1
socketserver                0.4
sparse                      0.17.0
sqlite3                     2.6.0
stack_data                  0.6.3
statsmodels                 0.14.5
stdlib_list                 0.11.1
sympy                       1.14.0
tarfile                     0.9.0
texttable                   1.7.0
threadpoolctl               3.6.0
tlz                         1.0.0
toolz                       1.0.0
torch                       2.8.0
torchmetrics                1.8.2
tornado                     6.5.2
tqdm                        4.67.1
traitlets                   5.14.3
umap                        0.5.9.post2
urllib3                     2.5.0
wcwidth                     0.2.13
websocket                   1.8.0
wrapt                       1.17.3
xarray                      2025.9.0
yaml                        6.0.2
zarr                        2.18.7
zlib                        1.0
zmq                         27.1.0

-----
IPython             9.5.0
jupyter_client      8.6.3
jupyter_core        5.8.1
jupyterlab          4.4.7
notebook            7.4.5
-----
Python 3.11.13 (main, Jun  5 2025, 08:21:08) [Clang 14.0.6 ]
macOS-26.0.1-arm64-arm-64bit
10 logical CPU cores, arm
-----
Session information updated at 2025-10-22 16:31