scCS Tutorial — Single-Condition Analysis

Single-cell Commitment Scores with Radial Star Embedding

scCS v0.7.0 computes RNA velocity-based commitment scores for k-furcation cell fate decisions.

What scCS does

  1. Takes a user-defined bifurcation cluster (progenitor) and k terminal fate clusters

  2. Builds a radial star embedding — progenitor at origin, each fate on its own arm

  3. Projects RNA velocity into this embedding

  4. Computes commitment scores (unCS, nCS) and per-cell fate affinities

  5. Optionally bootstraps confidence intervals on all scores

  6. Identifies driver genes (velocity-based + DEG-based) per fate arm

  7. Runs pathway enrichment (KEGG, GO BP, Reactome) per fate

Reference

Kriukov et al. (2025) Single-cell transcriptome of myeloid cells in response to transplantation of human retinal neurons reveals reversibility of microglial activation


Notebook outline

Section

What you learn

  1. Installation

pip install

  1. Load & preprocess

scVelo pancreas dataset

  1. RNA velocity

scVelo dynamical model

  1. Build star embedding

SingleScorer.build_embedding()

  1. Fix arm coverage

rebuild_embedding_with_subset_pseudotime()

  1. Fit & score

fit(), score(), bootstrap CI

  1. Visualize

star plot, rose, pairwise heatmap, bar chart

  1. Per-cell scores

fate affinity, entropy, NN-smoothed entropy

  1. Transfer labels

write scores back to full adata

  1. Subset scoring

score_per_subset()

  1. Driver genes

velocity drivers, DEG drivers

  1. Pathway enrichment

KEGG / GO BP per fate

  1. Expression trends

plot_expression_trends()

1. Installation

[1]:
%matplotlib inline
# Install from GitHub (recommended)
# !pip install git+https://github.com/mcrewcow/scCS.git

# Or from a local directory:
#!pip install -e /...

import scCS
print(f"scCS version: {scCS.__version__}")
scCS version: 0.7.4

2. Load & preprocess data

We use the built-in pancreas dataset from scVelo — a well-studied k=4 bifurcation (Ductal → Alpha, Beta, Delta, Epsilon). The dataset already contains spliced/unspliced counts and cluster annotations.

[2]:
import numpy as np
import pandas as pd
import anndata
import matplotlib.pyplot as plt
import scanpy as sc
import scvelo as scv

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
try:
    from statsmodels.tools.sm_exceptions import ConvergenceWarning
    warnings.filterwarnings("ignore", category=ConvergenceWarning)
except ImportError:
    pass

sc.settings.verbosity = 1
scv.settings.verbosity = 1

adata = scv.datasets.pancreas()
print(adata)
print("\nCluster labels:", adata.obs['clusters'].unique().tolist())

# --- Tutorial display helpers (compact tables instead of per-fate loops) ---
pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 15)
pd.set_option("display.width", 160)


def _stack_drivers(driver_dict, n_per_fate=5, cols=None):
    """Concatenate a {fate: DataFrame} dict into one tidy long-form table.

    Shows the top ``n_per_fate`` rows per fate; adds a ``fate`` column at the
    front. Pass ``cols`` to pick a subset of columns to display.
    """
    out = []
    for fate, df in driver_dict.items():
        if df is None or len(df) == 0:
            continue
        block = df.head(n_per_fate).copy()
        block.insert(0, "fate", fate)
        out.append(block)
    if not out:
        return pd.DataFrame()
    big = pd.concat(out, ignore_index=True)
    if cols is not None:
        keep = ["fate"] + [c for c in cols if c in big.columns]
        big = big[keep]
    return big


def _stack_enrichment(enrichment_dict, n_per_group=3):
    """Flatten enrichment {fate: {direction: df}} into one tidy table."""
    rows = []
    for fate, fate_results in enrichment_dict.items():
        for direction in ("up", "down"):
            df = fate_results.get(direction, None)
            if df is None or df.empty:
                continue
            top = df.head(n_per_group).copy()
            top.insert(0, "direction", direction)
            top.insert(0, "fate", fate)
            rows.append(top)
    if not rows:
        return pd.DataFrame()
    return pd.concat(rows, ignore_index=True)

AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
    var: 'highly_variable_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'distances', 'connectivities'

Cluster labels: ['Pre-endocrine', 'Ductal', 'Alpha', 'Ngn3 high EP', 'Delta', 'Beta', 'Ngn3 low EP', 'Epsilon']
[3]:
# Standard scVelo preprocessing
import inspect as _inspect
_fn_sig = _inspect.signature(scv.pp.filter_and_normalize).parameters
if "n_top_genes" in _fn_sig:
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
else:
    # scvelo builds without n_top_genes: do filter_and_normalize then HVG.
    # Use flavor="cell_ranger" not the default "seurat": seurat passes
    # int n_bins to pd.cut on log-dispersions which contain +/-inf for
    # low-mean genes — pandas >= 2.2 rejects this with ValueError.
    # cell_ranger uses explicit bin edges including +/-inf and is
    # robust across pandas/py versions (scCS changelog v0.7.4).
    scv.pp.filter_and_normalize(adata, min_shared_counts=20)
    sc.pp.highly_variable_genes(
        adata, n_top_genes=2000, subset=True, flavor="cell_ranger"
    )
sc.pp.neighbors(adata)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)

3. RNA velocity

We run the dynamical model (most accurate) via scVelo. If your data already has velocity and velocity_graph, skip this section.

[4]:
scv.tl.recover_dynamics(adata, n_jobs=32)
scv.tl.velocity(adata, mode="dynamical")
scv.tl.velocity_graph(adata)
scv.tl.velocity_pseudotime(adata)

# Visualise velocity on UMAP
scv.pl.velocity_embedding_stream(adata, basis="umap", color="clusters",
                                  title="RNA velocity — pancreas")
/home/emil/miniforge3/envs/lab-py312/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=3974) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
/home/emil/miniforge3/envs/lab-py312/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=3974) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()
_images/scCS_tutorial_single_7_4.png

Alternatively, you can let scCS run the full velocity pipeline for you:

scorer = scCS.SingleScorer(adata, ...)
scorer.compute_velocity(mode="dynamical", n_top_genes=2000)

4. Build the radial star embedding

The pancreas dataset has a clear bifurcation from Ductal progenitors into four terminal fates: Alpha, Beta, Delta, Epsilon.

SingleScorer requires:

  • root — the progenitor cluster label

  • branches — list of terminal fate cluster labels

  • obs_key — column in adata.obs with cluster labels

[5]:
scorer = scCS.SingleScorer(
    adata,
    root="Pre-endocrine",
    branches=["Alpha", "Beta", "Delta", "Epsilon"],
    obs_key="clusters",
    n_angle_bins=36,          # 10° angular bins (default)
    sector_method="centroid",  # anchor sectors to fate centroids (recommended)
)

# Build the star embedding using scVelo pseudotime
scorer.build_embedding(
    ordering_metric="pseudotime",  # uses adata.obs['velocity_pseudotime']
    arm_scale=10.0,   # maximum radial distance
    jitter=0.3,       # perpendicular noise to avoid overplotting
)

print("Embedding shape:", scorer.adata_sub.obsm["X_sccs"].shape)
print("Cells in embedding:", scorer.adata_sub.n_obs)
[scCS] Building star embedding: root='Pre-endocrine', k=4 fates, metric='pseudotime'
[scCS] Subsetting: 1876 / 3696 cells kept
       (1820 cells from other populations excluded)
       Alpha: 481 cells (fate)
       Beta: 591 cells (fate)
       Delta: 70 cells (fate)
       Epsilon: 142 cells (fate)
       Pre-endocrine: 592 cells (progenitor)

[scCS] Star embedding built → adata_sub.obsm["X_sccs"] shape: (1876, 2)
       Arm angles: {'Alpha': 0.0, 'Beta': 90.0, 'Delta': 180.0, 'Epsilon': 270.0}
[scCS] Star embedding stored in scorer.adata_sub.obsm['X_sccs']. (1876 cells)
Embedding shape: (1876, 2)
Cells in embedding: 1876

5. Fix arm coverage with subset-local pseudotime

Problem: scVelo pseudotime is computed on the full adata. When we subset to bifurcation + terminal fate cells, the pseudotime range is compressed — cells cluster near the origin instead of spanning the full arm length.

Solution: refit_pseudotime() extracts the velocity_graph submatrix for the subset cells, recomputes pseudotime locally, scales it to [0, 1], and rebuilds the embedding.

When to use ``scale_01=True`` vs ``False``:

  • scale_01=True (default) — cells span the full arm; best for single-condition analysis

  • scale_01=False — preserves absolute pseudotime ordering; use for multi-condition comparisons where cross-condition pseudotime comparability matters

[6]:
# Recompute pseudotime on the subset subgraph and rebuild the embedding
scorer.refit_pseudotime(scale_01=True)

# Inspect the corrected pseudotime
pt_sub = scorer.adata_sub.obs["velocity_pseudotime"]
print(f"Subset pseudotime range: [{pt_sub.min():.3f}, {pt_sub.max():.3f}]")
[scCS] Recomputing pseudotime on subset (1876 / 3696 cells)...
[scCS] Used scanpy DPT (with diffmap) as pseudotime fallback.
[scCS] Subset pseudotime scaled to [0, 1].
[scCS] Subset pseudotime stored in adata_sub.obs['sccs_pseudotime']. Range: [0.000, 1.000]
[scCS] Rebuilding star embedding with subset-local pseudotime...
[scCS] Subsetting: 1876 / 3696 cells kept
       (1820 cells from other populations excluded)
       Alpha: 481 cells (fate)
       Beta: 591 cells (fate)
       Delta: 70 cells (fate)
       Epsilon: 142 cells (fate)
       Pre-endocrine: 592 cells (progenitor)

[scCS] Star embedding built → adata_sub.obsm["X_sccs"] shape: (1876, 2)
       Arm angles: {'Alpha': 0.0, 'Beta': 90.0, 'Delta': 180.0, 'Epsilon': 270.0}
[scCS] Embedding rebuilt. Call fit() again to update the FateMap and velocity projection.
Subset pseudotime range: [0.495, 1.000]

6. Fit and compute commitment scores

fit() builds the FateMap (validates clusters, computes fate centroids, projects velocity). score() computes all commitment scores.

Key outputs in CommitmentScoreResult

Attribute

Description

pairwise_unCS

Unnormalized CS matrix (k × k)

pairwise_nCS

Cell-count-normalized CS matrix (k × k)

commitment_vector

Fraction of velocity mass per fate

population_entropy

Population-level commitment entropy

mean_cell_entropy

Mean per-cell fate entropy

per_fate_entropy

Per-fate binary entropy

cell_scores

Per-cell fate affinity matrix (n_cells × k)

bootstrap_ci

Bootstrap CI dict (if n_bootstrap > 0)

[7]:
scorer.fit()
[scCS] Root cluster 'Pre-endocrine': 592 cells, centroid=(-0.00, -0.00)
[scCS]   Fate 'Alpha': 481 cells, centroid=(3.51, -0.01)
[scCS]   Fate 'Beta': 591 cells, centroid=(0.03, 4.38)
[scCS]   Fate 'Delta': 70 cells, centroid=(-4.75, -0.05)
[scCS]   Fate 'Epsilon': 142 cells, centroid=(-0.02, -2.55)
[scCS] FateMap built: k=4 fates
[scCS] Projecting velocity via scVelo on full adata → slicing to subset...
[scCS] Velocity projected. Shape: (1876, 2)
FateMap  (root='Pre-endocrine', k=4)
  Cluster key : 'clusters'
  Root cells  : 592
  Root centroid: (-0.005, -0.003)
  Fate 0: 'Alpha'  n_cells=481  centroid=(3.51, -0.01)  arm_angle=0.0°
  Fate 1: 'Beta'  n_cells=591  centroid=(0.03, 4.38)  arm_angle=90.0°
  Fate 2: 'Delta'  n_cells=70  centroid=(-4.75, -0.05)  arm_angle=180.0°
  Fate 3: 'Epsilon'  n_cells=142  centroid=(-0.02, -2.55)  arm_angle=270.0°
[7]:
SingleScorer(root='Pre-endocrine', branches=['Alpha', 'Beta', 'Delta', 'Epsilon'], k=4, status='fitted')
[8]:
fig = scorer.plot_nn_entropy_elbow(k_nn_range=range(5, 51, 5))
plt.show()
_images/scCS_tutorial_single_15_0.png
[9]:
# Score with bootstrap confidence intervals (n_bootstrap=500 recommended)
result = scorer.score(
    cell_level=True,   # compute per-cell fate affinities
    k_nn=15,                   # NN-smoothed entropy (15 nearest neighbours)
    n_bootstrap=500,           # bootstrap CI on nCS
    bootstrap_ci=0.95,         # 95% CI
)
[scCS] Computing bootstrap CI (n=500)...
=== CommitmentScoreResult ===
  Fates (4): Alpha, Beta, Delta, Epsilon
  Dominant fate: Beta

  Entropy metrics:
    Population entropy:           0.7052  [aggregate velocity-mass balance]
    Mean cell entropy:            0.9361  [per-cell average, k-way]
    Per-fate cell entropy:
      Alpha: 0.8246
      Beta: 0.8608
      Delta: 0.7541
      Epsilon: 0.6399
    NN-smoothed entropy (k=15):  mean=0.9547  [per-cell, stored in adata_sub.obs['cs_nn_entropy']]

  Commitment vector (normalized):
    Alpha: 0.2210
    Beta: 0.6449
    Delta: 0.0765
    Epsilon: 0.0576

  Pairwise nCS matrix:
            Alpha      Beta     Delta   Epsilon
Alpha    1.000000  0.421157  0.420532  1.133082
Beta     2.374413  1.000000  0.998516  2.690405
Delta    2.377942  1.001486  1.000000  2.694404
Epsilon  0.882548  0.371691  0.371140  1.000000

  Bootstrap 95% CI on nCS (n=500):
    CI low:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.369  0.352    0.922
Beta     2.119  1.000  0.851    2.270
Delta    1.989  0.855  1.000    2.103
Epsilon  0.723  0.303  0.290    1.000
    CI high:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.472  0.503    1.383
Beta     2.712  1.000  1.170    3.297
Delta    2.838  1.175  1.000    3.449
Epsilon  1.084  0.440  0.476    1.000
[10]:
# Summary table
print(result.summary())
=== CommitmentScoreResult ===
  Fates (4): Alpha, Beta, Delta, Epsilon
  Dominant fate: Beta

  Entropy metrics:
    Population entropy:           0.7052  [aggregate velocity-mass balance]
    Mean cell entropy:            0.9361  [per-cell average, k-way]
    Per-fate cell entropy:
      Alpha: 0.8246
      Beta: 0.8608
      Delta: 0.7541
      Epsilon: 0.6399
    NN-smoothed entropy (k=15):  mean=0.9547  [per-cell, stored in adata_sub.obs['cs_nn_entropy']]

  Commitment vector (normalized):
    Alpha: 0.2210
    Beta: 0.6449
    Delta: 0.0765
    Epsilon: 0.0576

  Pairwise nCS matrix:
            Alpha      Beta     Delta   Epsilon
Alpha    1.000000  0.421157  0.420532  1.133082
Beta     2.374413  1.000000  0.998516  2.690405
Delta    2.377942  1.001486  1.000000  2.694404
Epsilon  0.882548  0.371691  0.371140  1.000000

  Bootstrap 95% CI on nCS (n=500):
    CI low:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.369  0.352    0.922
Beta     2.119  1.000  0.851    2.270
Delta    1.989  0.855  1.000    2.103
Epsilon  0.723  0.303  0.290    1.000
    CI high:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.472  0.503    1.383
Beta     2.712  1.000  1.170    3.297
Delta    2.838  1.175  1.000    3.449
Epsilon  1.084  0.440  0.476    1.000
[11]:
# Access individual scores
print("Pairwise nCS matrix:")
print(pd.DataFrame(
    result.pairwise_nCS,
    index=result.fate_names,
    columns=result.fate_names
).round(3))

print("\nCommitment vector (fraction of velocity mass per fate):")
for fate, cv in zip(result.fate_names, result.commitment_vector):
    print(f"  {fate}: {cv:.3f}")

print(f"\nPopulation entropy: {result.population_entropy:.3f}")
print(f"Mean per-cell entropy: {result.mean_cell_entropy:.3f}")
Pairwise nCS matrix:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.421  0.421    1.133
Beta     2.374  1.000  0.999    2.690
Delta    2.378  1.001  1.000    2.694
Epsilon  0.883  0.372  0.371    1.000

Commitment vector (fraction of velocity mass per fate):
  Alpha: 0.221
  Beta: 0.645
  Delta: 0.076
  Epsilon: 0.058

Population entropy: 0.705
Mean per-cell entropy: 0.936
[12]:
# Bootstrap CI (if n_bootstrap > 0)
if result.bootstrap_ci is not None:
    ci = result.bootstrap_ci
    print(f"Bootstrap CI ({int(ci['ci_level']*100)}%, n={ci['n_bootstrap']}):")
    print("  nCS CI low:")
    print(pd.DataFrame(ci['ci_low'], index=result.fate_names, columns=result.fate_names).round(3))
    print("\n  nCS CI high:")
    print(pd.DataFrame(ci['ci_high'], index=result.fate_names, columns=result.fate_names).round(3))
Bootstrap CI (95%, n=500):
  nCS CI low:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.369  0.352    0.922
Beta     2.119  1.000  0.851    2.270
Delta    1.989  0.855  1.000    2.103
Epsilon  0.723  0.303  0.290    1.000

  nCS CI high:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.472  0.503    1.383
Beta     2.712  1.000  1.170    3.297
Delta    2.838  1.175  1.000    3.449
Epsilon  1.084  0.440  0.476    1.000

7. Visualizations

scCS provides several built-in plots. All are accessible as methods on the scorer or as standalone functions in scCS.plot.

7.1 Radial star embedding

[13]:
# Primary visualization: cells in the star embedding, coloured by fate
fig = scorer.plot_star(result, figsize=(8, 8))
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_22_0.png
[14]:
# Colour by per-cell entropy instead of fate
fig = scorer.plot_star(result, color_by="entropy", figsize=(8, 8))
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_23_0.png
[15]:
# Colour by NN-smoothed entropy (requires k_nn in score())
fig = scorer.plot_star(result, color_by="nn_entropy", figsize=(8, 8))
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_24_0.png

7.2 Rose / polar plot

[16]:
# Rose plot: cumulative velocity magnitude per angular bin
fig = scorer.plot_rose(result, figsize=(6, 6))
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_26_0.png

7.3 Pairwise nCS heatmap

[17]:
# Heatmap of pairwise normalized commitment scores
fig = scorer.plot_pairwise_cs(result, figsize=(5, 4))
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_28_0.png

7.4 Commitment bar chart

[18]:
# Bar chart: unCS vs nCS per fate pair
fig = scorer.plot_commitment_bar(result, figsize=(8, 4))
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_30_0.png

7.5 Per-cell fate affinity heatmap

[19]:
# Heatmap of per-cell fate affinities (rows = cells, cols = fates)
fig = scorer.plot_commitment_heatmap(result, figsize=(6, 5), max_cells=500)
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_32_0.png

7.6 NN entropy elbow plot

[20]:
# Elbow plot to choose k_nn for NN-smoothed entropy
# Sweeps k_nn values and plots mean NN entropy
fig = scorer.plot_nn_entropy_elbow(k_nn_range=range(5, 51, 5), figsize=(8, 4))
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_34_0.png

8. Per-cell fate affinity scores

After score(cell_level=True), per-cell scores are stored in scorer.adata_sub.obs:

Column

Description

cs_{fate_name}

Per-cell affinity for each fate (sums to 1 per cell)

cs_dominant_fate

Fate with highest affinity

cs_entropy

Per-cell commitment entropy (0 = fully committed, 1 = uniform)

cs_nn_entropy

NN-smoothed entropy (if k_nn was set)

velocity_pseudotime_sub

Subset-local pseudotime

[21]:
# Inspect per-cell scores
obs_cols = [c for c in scorer.adata_sub.obs.columns if c.startswith("cs_")]
print("Per-cell score columns:", obs_cols)
scorer.adata_sub.obs[obs_cols].head(10)
Per-cell score columns: ['cs_Alpha', 'cs_Beta', 'cs_Delta', 'cs_Epsilon', 'cs_dominant_fate', 'cs_entropy', 'cs_nn_entropy']
[21]:
cs_Alpha cs_Beta cs_Delta cs_Epsilon cs_dominant_fate cs_entropy cs_nn_entropy
index
AAACCTGAGAGGGATA 0.345656 0.252071 0.154299 0.247973 Alpha 0.972886 0.992053
AAACCTGAGGCAATTA 0.247055 0.296687 0.252668 0.203590 Beta 0.993692 0.992740
AAACCTGTCCCTCTTT 0.324502 0.244887 0.175503 0.255107 Alpha 0.983665 0.958981
AAACGGGAGTAGCGGT 0.361040 0.324278 0.138481 0.176201 Alpha 0.946906 0.977282
AAACGGGCAAAGAATC 0.274640 0.476842 0.224002 0.024516 Beta 0.818077 0.875318
AAACGGGGTACAGTTC 0.251495 0.463168 0.247236 0.038101 Beta 0.846585 0.884746
AAACGGGGTCGGGTCT 0.268075 0.221469 0.232088 0.278367 Epsilon 0.996724 0.998418
AAACGGGTCCGCGCAA 0.328827 0.381496 0.170364 0.119313 Beta 0.929480 0.988123
AAACGGGTCGCATGGC 0.272847 0.479499 0.225780 0.021874 Beta 0.812556 0.817800
AAAGATGAGAGTACCG 0.213046 0.270671 0.286844 0.229439 Delta 0.994829 0.992597
[22]:
# Distribution of dominant fate assignments
print(scorer.adata_sub.obs["cs_dominant_fate"].value_counts())
cs_dominant_fate
Beta       836
Alpha      604
Delta      246
Epsilon    190
Name: count, dtype: int64
[23]:
# Entropy distribution per fate
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, len(result.fate_names), figsize=(12, 3), sharey=True)
for ax, fate in zip(axes, result.fate_names):
    mask = scorer.adata_sub.obs["cs_dominant_fate"] == fate
    ax.hist(scorer.adata_sub.obs.loc[mask, "cs_entropy"], bins=20, color="steelblue", edgecolor="white")
    ax.set_title(fate)
    ax.set_xlabel("Entropy")
axes[0].set_ylabel("# cells")
plt.suptitle("Per-cell commitment entropy by dominant fate", y=1.02)
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_38_0.png

9. Transfer labels to full adata

transfer_labels() writes per-cell scores from adata_sub back to the full adata, so you can use them in UMAP plots, downstream analyses, or integration with other tools.

Cells not in the embedding subset receive NaN (numeric) or 'unassigned' (categorical).

[24]:
scorer.transfer_labels(adata, result, prefix="cs_")

# Columns added to adata.obs
cs_cols = [c for c in adata.obs.columns if c.startswith("cs_")]
print("Columns added to adata.obs:", cs_cols)
adata.obs[cs_cols].head()
[scCS] Labels transferred to adata.obs for 1876 / 3696 cells. Columns: ['cs_Alpha', 'cs_Beta', 'cs_Delta', 'cs_Epsilon', 'cs_dominant_fate', 'cs_entropy']
Columns added to adata.obs: ['cs_Alpha', 'cs_Beta', 'cs_Delta', 'cs_Epsilon', 'cs_dominant_fate', 'cs_entropy', 'cs_nn_entropy']
[24]:
cs_Alpha cs_Beta cs_Delta cs_Epsilon cs_dominant_fate cs_entropy cs_nn_entropy
index
AAACCTGAGAGGGATA 0.345656 0.252071 0.154299 0.247973 Alpha 0.972886 0.992053
AAACCTGAGCCTTGAT NaN NaN NaN NaN unassigned NaN NaN
AAACCTGAGGCAATTA 0.247055 0.296687 0.252668 0.203590 Beta 0.993692 0.992740
AAACCTGCATCATCCC NaN NaN NaN NaN unassigned NaN NaN
AAACCTGGTAAGTGGC NaN NaN NaN NaN unassigned NaN NaN
[25]:
# Visualize on UMAP
sc.pl.umap(adata, color=["cs_dominant_fate", "cs_entropy", "cs_Alpha", "cs_Beta"],
           ncols=2, wspace=0.4)
_images/scCS_tutorial_single_41_0.png

10. Subset scoring

score_per_subset() computes commitment scores separately for each value of a metadata column in adata_sub.obs. Useful for comparing commitment across time points, batches, or any categorical variable.

Note: The mask is applied to adata_sub (the embedding subset), not the full adata.

Note on ``inf`` in pairwise nCS: When a subset contains only progenitor cells (e.g., ‘Pre-endocrine’) with no cells from any fate arm, pairwise_nCS will be inf for all off-diagonal entries. This is mathematically correct — nCS is undefined when a fate arm has 0 cells. A UserWarning is emitted automatically. Progenitor-only subsets are still useful for inspecting the commitment vector and population entropy.

[26]:
# Score separately for each cluster within the embedding
# (Here we use 'clusters' as a demo; in practice use a condition/batch column)
subset_results = scorer.score_per_subset(
    split_by="clusters",
    cell_level=False,
    n_bootstrap=0,
    verbose=False,
)

# Tidy summary: one row per subset with all off-diagonal nCS values + status
def _subset_summary(subset_results):
    rows = []
    for subset_name, res in subset_results.items():
        n = len(res.fate_names)
        ncs = res.pairwise_nCS
        row = {"subset": subset_name, "n_cells_in_fate": int(res.n_cells_per_fate.sum())}
        for i in range(n):
            for j in range(i + 1, n):
                col = f"nCS[{res.fate_names[i]} vs {res.fate_names[j]}]"
                row[col] = float(ncs[i, j])
        # Status flag: any inf?
        off_diag = [ncs[i, j] for i in range(n) for j in range(n) if i != j]
        if all(np.isinf(v) for v in off_diag):
            row["status"] = "progenitor-only (inf)"
        elif any(np.isinf(v) for v in off_diag):
            row["status"] = "partial (some inf)"
        else:
            row["status"] = "ok"
        rows.append(row)
    return pd.DataFrame(rows)

summary_df = _subset_summary(subset_results)
summary_df

[26]:
subset n_cells_in_fate nCS[Alpha vs Beta] nCS[Alpha vs Delta] nCS[Alpha vs Epsilon] nCS[Beta vs Delta] nCS[Beta vs Epsilon] nCS[Delta vs Epsilon] status
0 Pre-endocrine 0 inf inf inf inf inf inf progenitor-only (inf)
1 Alpha 481 0.0 0.0 0.0 inf inf inf partial (some inf)
2 Delta 70 inf inf inf inf inf 0.0 partial (some inf)
3 Beta 591 inf inf inf 0.0 0.0 inf partial (some inf)
4 Epsilon 142 inf inf inf inf inf inf partial (some inf)
[27]:
# Visualize subset comparison
fig = scorer.plot_subset_comparison(subset_results, figsize=(10, 4))
plt.tight_layout()
plt.show()
_images/scCS_tutorial_single_44_0.png

11. Driver genes

scCS provides three complementary approaches to identify fate-driving genes:

  1. Velocity drivers (get_velocity_drivers) — genes with highest mean RNA velocity in each fate arm (high positive velocity = gene is being upregulated along that fate)

  2. DEG drivers — differentially expressed genes between each fate arm and the bifurcation cluster (Wilcoxon rank-sum test)

  3. Velocity-fate correlation drivers (get_velocity_fate_drivers) — Spearman correlation between each gene’s velocity and per-cell fate affinity. Stronger signal than mean velocity because it filters out genes that are fast everywhere.

[28]:
# Velocity-based driver genes (top 20 per fate)
vel_drivers = scorer.get_velocity_drivers(n_top_genes=20)

# Compact long-form table: top 5 per fate, one DataFrame
print("Top velocity drivers (top 5 per fate):")
_stack_drivers(vel_drivers, n_per_fate=5, cols=["gene", "mean_velocity", "rank"])


── Velocity drivers: Alpha (top 20, sorted by delta_velocity) ──
 rank     gene  delta_velocity  mean_velocity
    1      Clu        1.488299       1.532060
    2     Cpa2        1.242302       0.851096
    3   Cryba2        1.057625       1.217162
    4     Ghrl        0.986426       0.384223
    5    Meis2        0.821635       0.665699
    6     Pax6        0.726139       0.883325
    7   Npepl1        0.725493       0.182687
    8   Tm4sf4        0.668253       0.466062
    9     Vasp        0.611339       0.097546
   10     Pax4        0.603224       0.034867
   11   Btbd17        0.577881       0.930656
   12     Krt7        0.547158      -0.280244
   13    Cd200        0.535196       0.330072
   14    Celf3        0.534336       0.370460
   15     Tox3        0.497146       0.251629
   16     Pdx1        0.496890       0.392475
   17   Ppp3ca        0.487356       0.200745
   18      Hn1        0.468364       0.031799
   19 BC023829        0.455840       0.336861
   20     Cpt2        0.432719       0.246379

── Velocity drivers: Beta (top 20, sorted by delta_velocity) ──
 rank          gene  delta_velocity  mean_velocity
    1          Ins2        3.672153       3.689225
    2          Nnat        3.631306       4.051527
    3          Isl1        3.562601       1.070217
    4           Clu        1.605093       1.648854
    5 1700086L19Rik        1.411600       0.453118
    6          Krt7        1.125878       0.298475
    7          Pax4        0.901425       0.333068
    8       Ppp1r1a        0.861976       1.539815
    9        Tm4sf4        0.842035       0.639844
   10       Igfbpl1        0.771356       0.268164
   11          Cpa2        0.714384       0.323178
   12          Pax6        0.697922       0.855108
   13          Dbn1        0.615245       0.314196
   14          Pdx1        0.535203       0.430788
   15         Meis2        0.518604       0.362668
   16      BC023829        0.514133       0.395154
   17         Cldn6        0.508870      -0.301879
   18         Glud1        0.481521       0.737995
   19       Smarcd2        0.460625       0.598269
   20       Lurap1l        0.456940       0.285098

── Velocity drivers: Delta (top 20, sorted by delta_velocity) ──
 rank     gene  delta_velocity  mean_velocity
    1     Cpa2        2.483908       2.092702
    2     Krt8        2.349361       1.407045
    3   Maged2        2.340670       3.380898
    4   Cryba2        1.744062       1.903599
    5     Ldha        1.710826       1.581080
    6      Hn1        1.123444       0.686879
    7   Cdkn1c        1.072009       1.007896
    8     Nnat        1.059840       1.480061
    9  Ppp1r1a        1.017116       1.694955
   10     Ambp        0.908994       0.752210
   11     Gnas        0.861592       6.604826
   12     Pax6        0.851152       1.008338
   13  Akr1c19        0.804932       1.123999
   14    Sparc        0.687036       0.649847
   15    Meis2        0.684853       0.528917
   16    Rpl12        0.670421      -0.154604
   17    Celf3        0.637237       0.473361
   18 BC023829        0.635596       0.516617
   19    Krt18        0.603725      -0.206350
   20    Foxa3        0.588107       0.411777

── Velocity drivers: Epsilon (top 20, sorted by delta_velocity) ──
 rank     gene  delta_velocity  mean_velocity
    1     Cpa2        1.345101       0.953895
    2   Cryba2        1.310028       1.469565
    3      Clu        1.293649       1.337410
    4    Meis2        1.209729       1.053792
    5     Pax4        0.843907       0.275551
    6    Syt13        0.799221       0.825314
    7    Siva1        0.731447       0.615843
    8     Pax6        0.731163       0.888349
    9  Lurap1l        0.654209       0.482367
   10  Arl6ip1        0.602216      -0.050500
   11   Nudt19        0.557069       0.466408
   12   Npepl1        0.499464      -0.043343
   13     Pdx1        0.488036       0.383621
   14    Rgs17        0.487121       0.559517
   15     Gars        0.450410       0.620145
   16     Cpt2        0.450140       0.263800
   17   Cdkn1c        0.401330       0.337217
   18 Ppp1r14b        0.393609      -0.294763
   19     Cdk1        0.393504       0.399177
   20      Tkt        0.385895       0.397531
Top velocity drivers (top 5 per fate):
[28]:
fate gene mean_velocity rank
0 Alpha Clu 1.532060 1
1 Alpha Cpa2 0.851096 2
2 Alpha Cryba2 1.217162 3
3 Alpha Ghrl 0.384223 4
4 Alpha Meis2 0.665699 5
5 Beta Ins2 3.689225 1
6 Beta Nnat 4.051527 2
7 Beta Isl1 1.070217 3
8 Beta Clu 1.648854 4
9 Beta 1700086L19Rik 0.453118 5
10 Delta Cpa2 2.092702 1
11 Delta Krt8 1.407045 2
12 Delta Maged2 3.380898 3
13 Delta Cryba2 1.903599 4
14 Delta Ldha 1.581080 5
15 Epsilon Cpa2 0.953895 1
16 Epsilon Cryba2 1.469565 2
17 Epsilon Clu 1.337410 3
18 Epsilon Meis2 1.053792 4
19 Epsilon Pax4 0.275551 5
[29]:
# DEG-based driver genes (top 20 per fate, vs the bifurcation cluster)
deg_drivers = scorer.get_deg_drivers(
    n_top_genes=20,
    pval_threshold=0.05,
    logfc_threshold=0.25,
)

print("Top DEG drivers (top 5 per fate):")
_stack_drivers(
    deg_drivers, n_per_fate=5,
    cols=["gene", "logfoldchange", "pval_adj", "pct_fate", "pct_progenitor"],
)


── DEG drivers: Alpha vs progenitor ──
   Significant: 761  (up: 376, down: 385)
   gene  logfoldchange      pval_adj  pct_fate  pct_progenitor
    Pyy     209.768677  1.181761e-93  0.968815             NaN
    Gcg     209.614731  9.196179e-64  0.656965             NaN
   Iapp      99.370232  4.065913e-99  0.854470             NaN
    Ttr      67.718864 2.617159e-103  1.000000             NaN
   Gnas      34.805965  2.937512e-52  1.000000             NaN
   Rbp4      33.815922  7.031753e-50  0.962578             NaN
  Hpgds      26.723766  3.763407e-02  0.087318             NaN
    Tnr      26.722797  2.795890e-02  0.091476             NaN
 Tmem27      17.364475 1.123328e-141  0.981289             NaN
Slc38a5      15.393972  4.940813e-60  0.920998             NaN
 Pcsk1n      12.127839  4.762578e-59  0.997921             NaN
   Gast       9.842274  1.419605e-32  0.538462             NaN
    Cpe       7.672148  1.079724e-40  1.000000             NaN
  Pcsk2       7.646369  6.491126e-63  0.846154             NaN
   Gpx3       7.507235  1.191563e-82  0.937630             NaN
  Meis2       6.603501  2.217035e-78  0.970894             NaN
  Peg10       6.532975  2.869438e-65  0.700624             NaN
   Ssr2       6.296807  1.585608e-69  0.985447             NaN
    Ppy       6.134921  1.251811e-15  0.378378             NaN
Tmsb15l       5.803368  2.070066e-73  0.711019             NaN

── DEG drivers: Beta vs progenitor ──
   Significant: 830  (up: 422, down: 408)
   gene  logfoldchange      pval_adj  pct_fate  pct_progenitor
   Iapp     185.345184 6.106240e-152  0.944162             NaN
    Pyy     139.123322  1.052978e-85  0.962775             NaN
   Ins1      99.108521  3.512617e-47  0.582064             NaN
   Ins2      47.560135  3.515790e-67  0.610829             NaN
   Nnat      42.420864 2.029962e-103  0.825719             NaN
   Rbp4      41.067871 6.750219e-103  0.996616             NaN
  P2ry1      27.845049  4.690784e-06  0.164129             NaN
   Fmo1      26.452681  3.894453e-02  0.081218             NaN
 Plppr1      26.415466  3.894453e-02  0.081218             NaN
   Gnas      26.366089  3.362044e-60  1.000000             NaN
    Ttr      22.955372  6.047775e-52  0.994924             NaN
   Dlk1      12.507628  4.599450e-97  0.966159             NaN
 Sec61b      12.047029 6.698179e-114  1.000000             NaN
    Npy      12.042974  1.086211e-07  0.191201             NaN
Ppp1r1a      11.322408  1.013795e-94  0.712352             NaN
  Pcsk2       9.181964 2.013853e-127  0.984772             NaN
   Calr       8.696516  3.976336e-95  0.959391             NaN
  Rpl10       8.650661  6.788600e-35  1.000000             NaN
  Hspa5       8.530251 2.027799e-109  0.969543             NaN
 Sdf2l1       7.616785  3.683485e-85  0.747885             NaN

── DEG drivers: Delta vs progenitor ──
   Significant: 386  (up: 156, down: 230)
  gene  logfoldchange     pval_adj  pct_fate  pct_progenitor
   Pyy     484.160583 5.419924e-34  1.000000             NaN
   Sst     332.010773 9.905999e-30  0.871429             NaN
  Rbp4     194.580841 2.024127e-37  1.000000             NaN
  Iapp     102.586014 2.319858e-35  0.985714             NaN
Ptprz1      28.902637 3.369656e-03  0.257143             NaN
 Cd24a      20.200207 1.633998e-31  1.000000             NaN
   Ppy      18.444256 6.494096e-07  0.485714             NaN
  Hhex      16.970627 6.566081e-31  0.985714             NaN
  Dlk1      16.967428 4.469675e-28  1.000000             NaN
 Rpl10      13.510839 4.188939e-13  1.000000             NaN
  Ssr2      10.061992 2.442933e-24  1.000000             NaN
  Gpx3       9.890513 3.689698e-25  0.985714             NaN
  Isl1       9.400220 3.580594e-12  1.000000             NaN
  Hadh       9.289947 7.724100e-31  0.971429             NaN
  Arg1       8.830122 9.905999e-30  0.900000             NaN
 Pcsk2       8.306222 1.427407e-20  0.928571             NaN
   Ttr       7.740336 4.406957e-03  0.971429             NaN
Spock3       7.235566 4.338918e-05  0.342857             NaN
  Mest       6.525658 6.399791e-18  0.714286             NaN
Igfbp7       6.383225 4.517170e-17  0.685714             NaN

── DEG drivers: Epsilon vs progenitor ──
   Significant: 518  (up: 266, down: 252)
   gene  logfoldchange     pval_adj  pct_fate  pct_progenitor
   Ghrl     495.486511 3.376669e-72  1.000000             NaN
    Pyy      81.501137 3.980708e-18  0.873239             NaN
   Rbp4      74.675728 6.843394e-44  0.992958             NaN
 Lrpprc      22.582724 8.565872e-16  0.873239             NaN
 Cdkn1a      21.258612 4.533111e-15  0.887324             NaN
    Gcg      20.178659 1.772858e-04  0.338028             NaN
   Iapp      18.956543 1.029570e-23  0.760563             NaN
   Isl1      18.905531 3.368229e-35  0.992958             NaN
    Ttr      16.562450 4.734949e-04  0.929577             NaN
 Maged2      15.220592 1.711104e-41  0.943662             NaN
  Hspa5       9.455750 8.303204e-39  0.957746             NaN
 Tmem27       8.894045 2.939794e-24  0.739437             NaN
   Calr       8.733923 2.532672e-39  0.950704             NaN
   Ssr2       7.860883 2.188520e-39  0.992958             NaN
   Arg1       7.192320 3.382770e-34  0.732394             NaN
  Rpl10       7.168509 2.855208e-09  1.000000             NaN
Slc38a5       7.034479 3.542731e-11  0.838028             NaN
  Ccnd2       6.007015 4.025146e-26  0.661972             NaN
  Acsl1       5.986801 1.903084e-05  0.267606             NaN
  Anpep       5.644922 5.435417e-36  0.802817             NaN
Top DEG drivers (top 5 per fate):
[29]:
fate gene logfoldchange pval_adj pct_fate pct_progenitor
0 Alpha Pyy 209.768677 1.181761e-93 0.968815 NaN
1 Alpha Gcg 209.614731 9.196179e-64 0.656965 NaN
2 Alpha Iapp 99.370232 4.065913e-99 0.854470 NaN
3 Alpha Ttr 67.718864 2.617159e-103 1.000000 NaN
4 Alpha Gnas 34.805965 2.937512e-52 1.000000 NaN
5 Beta Iapp 185.345184 6.106240e-152 0.944162 NaN
6 Beta Pyy 139.123322 1.052978e-85 0.962775 NaN
7 Beta Ins1 99.108521 3.512617e-47 0.582064 NaN
8 Beta Ins2 47.560135 3.515790e-67 0.610829 NaN
9 Beta Nnat 42.420864 2.029962e-103 0.825719 NaN
10 Delta Pyy 484.160583 5.419924e-34 1.000000 NaN
11 Delta Sst 332.010773 9.905999e-30 0.871429 NaN
12 Delta Rbp4 194.580841 2.024127e-37 1.000000 NaN
13 Delta Iapp 102.586014 2.319858e-35 0.985714 NaN
14 Delta Ptprz1 28.902637 3.369656e-03 0.257143 NaN
15 Epsilon Ghrl 495.486511 3.376669e-72 1.000000 NaN
16 Epsilon Pyy 81.501137 3.980708e-18 0.873239 NaN
17 Epsilon Rbp4 74.675728 6.843394e-44 0.992958 NaN
18 Epsilon Tnr 26.569187 2.522357e-01 0.091549 NaN
19 Epsilon Hpgds 26.106819 4.514074e-01 0.070423 NaN

11.3 Velocity-fate correlation drivers

get_velocity_fate_drivers() computes the Spearman correlation between each gene’s velocity and the per-cell fate affinity score for each fate arm. Genes with high positive correlation are being upregulated specifically as cells commit to that fate — a stronger signal than mean velocity alone, because it filters out genes that are fast everywhere.

Requires score(cell_level=True) to have been called first (needs result.cell_scores).

[30]:
# Velocity-fate correlation drivers
# Correlates each gene's velocity with per-cell fate affinity scores
# (requires cell_level=True in score())
try:
    vf_drivers = scorer.get_velocity_fate_drivers(
        result,
        n_top_genes=20,
        pval_threshold=0.05,
    )
    print("Top velocity-fate drivers (top 5 per fate):")
    display(_stack_drivers(
        vf_drivers, n_per_fate=5,
        cols=["gene", "spearman_r", "pval_adj", "mean_velocity", "delta_velocity"],
    ))
except ValueError as e:
    print(f"Skipped: {e}")
    print("Tip: call scorer.score(cell_level=True) first.")


── Velocity-fate drivers: Alpha (top 20, sorted by Spearman r) ──
   Significant (FDR < 0.05): 683 / 2000
 rank          gene  spearman_r     pval_adj  mean_velocity
    1        Adam23    0.332275 2.726568e-46       0.035280
    2           Klb    0.309560 6.025260e-40       0.045717
    3       Tmem206    0.265787 4.237692e-29       0.099139
    4        Notch2    0.257026 2.741812e-27       0.073804
    5        Sorcs2    0.235934 5.877863e-23       0.037468
    6       Rasgrf2    0.235511 6.661376e-23       0.025164
    7         Epha4    0.235026 7.813431e-23       0.052760
    8        Megf11    0.228079 1.824037e-21       0.007886
    9          Sgce    0.224234 9.184396e-21       0.052389
   10         Kif2c    0.223713 1.097454e-20       0.083291
   11         Sel1l    0.223152 1.274883e-20       0.055911
   12          Bst2    0.222681 1.434448e-20       0.190410
   13          Ctsb    0.219757 4.981190e-20      -0.068126
   14        Gabrb3    0.217725 1.071893e-19       0.055429
   15        Tmem51    0.214352 4.378911e-19       0.109053
   16       Fam234a    0.214279 4.378911e-19       0.044228
   17 1700011H14Rik    0.212987 7.086263e-19       0.285092
   18          Gch1    0.211215 1.369510e-18       0.472799
   19         Cdh11    0.211206 1.369510e-18       0.021027
   20       Zcchc16    0.210129 2.081817e-18       0.022147

── Velocity-fate drivers: Beta (top 20, sorted by Spearman r) ──
   Significant (FDR < 0.05): 924 / 2000
 rank    gene  spearman_r      pval_adj  mean_velocity
    1   Gng12    0.687204 4.410440e-259       0.364720
    2    Nnat    0.675808 1.430221e-247       4.051527
    3    Sdk2    0.571355 3.286960e-160       0.011898
    4   Dact2    0.559535 2.499373e-152       0.045325
    5   Ap1s2    0.549641 5.731258e-146       0.087439
    6   Vat1l    0.548981 1.264673e-145       0.064852
    7  Sphkap    0.541680 4.498258e-141       0.119721
    8    Mxd4    0.515216 2.594995e-125      -0.069785
    9  Slc2a2    0.511380 2.850628e-123       0.166863
   10    Nol4    0.506439 1.524017e-120       0.096487
   11 Shroom3    0.501075 1.259699e-117      -0.008355
   12  Zbtb7c    0.497687 7.505370e-116       0.024958
   13    Ins2    0.488525 5.173891e-111       3.689225
   14 Igfbpl1    0.474913 3.992365e-104       0.268164
   15 Slc7a14    0.461875  7.903311e-98       0.072172
   16  Ptprz1    0.459658  8.304841e-97       0.021035
   17   Ccnd2    0.458424  3.072975e-96       0.365967
   18      Gc    0.455471  7.272239e-95       0.137919
   19     Ghr    0.454304  2.464647e-94       0.248989
   20 Slc30a8    0.449188  5.200257e-92       0.450795

── Velocity-fate drivers: Delta (top 20, sorted by Spearman r) ──
   Significant (FDR < 0.05): 685 / 2000
 rank     gene  spearman_r     pval_adj  mean_velocity
    1   Zbtb20    0.303609 1.783641e-38       0.166360
    2   Fndc3a    0.293026 9.165943e-36      -0.005701
    3    Pbld2    0.258827 1.428849e-27       0.038566
    4    Vat1l    0.252901 2.293744e-26       0.065950
    5   Trim35    0.250792 5.950762e-26       0.184455
    6     Hap1    0.248696 1.537300e-25       0.031227
    7      Arx    0.245404 7.175120e-25      -0.018569
    8     Asb4    0.236969 3.565400e-23      -0.003783
    9   Vps37a    0.227608 2.126823e-21      -0.083059
   10     Chd3    0.223959 9.343843e-21      -0.079594
   11   Snap25    0.222951 1.330732e-20       0.048876
   12     Mia3    0.221878 2.046525e-20       0.047561
   13     Ufm1    0.217722 1.073379e-19      -0.142988
   14  Stard10    0.214950 3.289204e-19      -0.088858
   15 BC048546    0.214442 3.949377e-19      -0.140049
   16     Pon2    0.208136 4.756439e-18      -0.013737
   17     Tmx1    0.203896 2.536844e-17      -0.220162
   18      Cpm    0.202291 4.706904e-17       0.038128
   19     Gcsh    0.201279 6.868679e-17      -0.086339
   20     Hagh    0.198569 1.954275e-16       0.038047

── Velocity-fate drivers: Epsilon (top 20, sorted by Spearman r) ──
   Significant (FDR < 0.05): 924 / 2000
 rank    gene  spearman_r      pval_adj  mean_velocity
    1  Rab27a    0.514252 8.184805e-125       0.055813
    2    Pdk2    0.512925 4.186366e-124       0.051594
    3  Sorcs1    0.499450 8.951898e-117       0.005206
    4  Il17re    0.485971 1.042002e-109      -0.002558
    5   Auts2    0.484087 9.271443e-109       0.256760
    6   Pycr2    0.474232 8.299458e-104      -0.452137
    7  Nusap1    0.467321 2.024908e-100       0.014434
    8    Mnx1    0.465603  1.321285e-99      -0.004751
    9    Chga    0.460968  2.055227e-97      -2.422998
   10  Npepl1    0.449508  3.960569e-92      -0.043343
   11   Gm609    0.449416  4.222258e-92       0.024659
   12 Efcab10    0.448844  7.240713e-92      -0.045064
   13   Gsto1    0.445875  1.588429e-90      -0.086271
   14   Igf1r    0.445424  2.468092e-90      -0.019749
   15  Eps8l2    0.442219  6.679705e-89       0.126253
   16   Kcnb2    0.441447  1.439386e-88       0.029580
   17 Cbfa2t2    0.432747  9.568963e-85       0.159800
   18 Fam213a    0.432126  1.687532e-84       0.043521
   19   Josd2    0.428882  4.017044e-83       0.046612
   20  Kif20a    0.428003  9.327305e-83       0.081612
Top velocity-fate drivers (top 5 per fate):
fate gene spearman_r pval_adj mean_velocity delta_velocity
0 Alpha Adam23 0.332275 2.726568e-46 0.035280 0.026601
1 Alpha Klb 0.309560 6.025260e-40 0.045717 0.037019
2 Alpha Tmem206 0.265787 4.237692e-29 0.099139 0.149552
3 Alpha Notch2 0.257026 2.741812e-27 0.073804 0.069658
4 Alpha Sorcs2 0.235934 5.877863e-23 0.037468 0.026698
5 Beta Gng12 0.687204 4.410440e-259 0.364720 0.065286
6 Beta Nnat 0.675808 1.430221e-247 4.051527 3.631306
7 Beta Sdk2 0.571355 3.286960e-160 0.011898 0.011638
8 Beta Dact2 0.559535 2.499373e-152 0.045325 0.018165
9 Beta Ap1s2 0.549641 5.731258e-146 0.087439 0.024644
10 Delta Zbtb20 0.303609 1.783641e-38 0.166360 -0.036171
11 Delta Fndc3a 0.293026 9.165943e-36 -0.005701 -0.049914
12 Delta Pbld2 0.258827 1.428849e-27 0.038566 0.025424
13 Delta Vat1l 0.252901 2.293744e-26 0.065950 0.013837
14 Delta Trim35 0.250792 5.950762e-26 0.184455 0.063726
15 Epsilon Rab27a 0.514252 8.184805e-125 0.055813 0.007661
16 Epsilon Pdk2 0.512925 4.186366e-124 0.051594 0.072639
17 Epsilon Sorcs1 0.499450 8.951898e-117 0.005206 0.004127
18 Epsilon Il17re 0.485971 1.042002e-109 -0.002558 -0.027604
19 Epsilon Auts2 0.484087 9.271443e-109 0.256760 0.302319

12. Pathway enrichment

get_enrichment() runs over-representation analysis (ORA) on the DEG driver genes for each fate using gseapy (KEGG, GO Biological Process, Reactome).

Requires an internet connection and a valid organism code.

[31]:
# Pathway enrichment per fate
# Note: requires gseapy and internet access for Enrichr API
# Results will be empty with synthetic data (too few significant DEGs)

try:
    enrichment = scorer.get_enrichment(
        deg_drivers,
        organism="mouse",     # "mouse" or "human"
        pval_threshold=0.05,
        logfc_threshold=0.25,
        plot=True,            # dot plots saved per fate
        n_top_pathways=15,
    )

    # Single compact summary: top 3 pathways per (fate, direction)
    enrich_df = _stack_enrichment(enrichment, n_per_group=3)
    if not enrich_df.empty:
        display(enrich_df[["fate", "direction", "Gene_set", "Term", "Overlap", "Adjusted P-value"]])
    else:
        print("(No enriched terms — typical with small / synthetic datasets.)")

except ImportError:
    print("gseapy not installed. Run: pip install gseapy")
except Exception as e:
    print(f"Enrichment skipped: {e}")
    print("(Expected with synthetic data — use real data for meaningful results.)")


============================================================
  Pathway enrichment: Alpha
  Gene sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'Reactome_2022']
  Up-regulated genes  : 376
  Down-regulated genes: 385
============================================================

  [up] Significant terms: 121
       Gene_set                                                       Term Overlap  Adjusted P-value
KEGG_2019_Mouse                Protein processing in endoplasmic reticulum  17/163          0.000003
KEGG_2019_Mouse                                  Thyroid hormone synthesis   10/73          0.000127
KEGG_2019_Mouse                                                   Lysosome  12/124          0.000313
KEGG_2019_Mouse                                     Gastric acid secretion    9/74          0.000600
  Reactome_2022   Response To Elevated Platelet Cytosolic Ca2+ R-HSA-76005  13/130          0.000783
KEGG_2019_Mouse                                      Circadian entrainment   10/99          0.000825
  Reactome_2022                        Platelet Degranulation R-HSA-114608  12/125          0.000966
  Reactome_2022                                    Hemostasis R-HSA-109582  28/576          0.000966
  Reactome_2022                     G-protein Mediated Events R-HSA-112040    8/52          0.000966
  Reactome_2022 Platelet Activation, Signaling And Aggregation R-HSA-76002  17/254          0.001008

  [down] Significant terms: 252
                  Gene_set                                                                                  Term Overlap  Adjusted P-value
GO_Biological_Process_2021                            cotranslational protein targeting to membrane (GO:0006613)   24/94      1.627570e-17
GO_Biological_Process_2021                                                  cytoplasmic translation (GO:0002181)   24/93      1.627570e-17
GO_Biological_Process_2021              SRP-dependent cotranslational protein targeting to membrane (GO:0006614)   23/90      6.790532e-17
             Reactome_2022                                        Eukaryotic Translation Elongation R-HSA-156842   22/90      5.465181e-16
             Reactome_2022                                        Eukaryotic Translation Termination R-HSA-72764   22/90      5.465181e-16
             Reactome_2022                                                 Peptide Chain Elongation R-HSA-156902   22/86      5.465181e-16
             Reactome_2022 Nonsense Mediated Decay (NMD) Independent Of Exon Junction Complex (EJC) R-HSA-975956   22/92      6.859847e-16
GO_Biological_Process_2021      nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184)  24/113      8.686504e-16
GO_Biological_Process_2021                                                  protein targeting to ER (GO:0045047)  23/103      1.101897e-15
             Reactome_2022                                                   Viral mRNA Translation R-HSA-192823   21/90      4.696474e-15
_images/scCS_tutorial_single_51_1.png
_images/scCS_tutorial_single_51_2.png

============================================================
  Pathway enrichment: Beta
  Gene sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'Reactome_2022']
  Up-regulated genes  : 422
  Down-regulated genes: 408
============================================================

  [up] Significant terms: 121
                  Gene_set                                                     Term Overlap  Adjusted P-value
           KEGG_2019_Mouse              Protein processing in endoplasmic reticulum  23/163      1.407868e-10
GO_Biological_Process_2021     IRE1-mediated unfolded protein response (GO:0036498)   12/53      1.863342e-06
           KEGG_2019_Mouse                                        Insulin secretion   13/86      3.415612e-06
             Reactome_2022 Amino Acid Transport Across Plasma Membrane R-HSA-352230    9/33      1.321921e-05
             Reactome_2022              IRE1alpha Activates Chaperones R-HSA-381070   10/48      1.321921e-05
             Reactome_2022             Unfolded Protein Response (UPR) R-HSA-381119   13/89      1.321921e-05
             Reactome_2022            Integration Of Energy Metabolism R-HSA-163685  13/105      6.483513e-05
             Reactome_2022           XBP1(S) Activates Chaperone Genes R-HSA-381038    9/46      6.628341e-05
             Reactome_2022             Regulation Of Insulin Secretion R-HSA-422356   11/77      8.152971e-05
GO_Biological_Process_2021                        amino acid transport (GO:0006865)   10/50      8.704602e-05

  [down] Significant terms: 245
                  Gene_set                                                                                  Term Overlap  Adjusted P-value
GO_Biological_Process_2021                                                  cytoplasmic translation (GO:0002181)   28/93      4.333388e-22
GO_Biological_Process_2021                            cotranslational protein targeting to membrane (GO:0006613)   28/94      4.333388e-22
GO_Biological_Process_2021              SRP-dependent cotranslational protein targeting to membrane (GO:0006614)   27/90      1.679608e-21
             Reactome_2022                                                 Peptide Chain Elongation R-HSA-156902   26/86      1.148227e-20
             Reactome_2022                                        Eukaryotic Translation Elongation R-HSA-156842   26/90      1.428872e-20
             Reactome_2022                                        Eukaryotic Translation Termination R-HSA-72764   26/90      1.428872e-20
             Reactome_2022 Nonsense Mediated Decay (NMD) Independent Of Exon Junction Complex (EJC) R-HSA-975956   26/92      2.014986e-20
GO_Biological_Process_2021      nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184)  28/113      5.547475e-20
GO_Biological_Process_2021                                                  protein targeting to ER (GO:0045047)  27/103      5.547475e-20
             Reactome_2022                                                   Viral mRNA Translation R-HSA-192823   25/90      1.461345e-19
_images/scCS_tutorial_single_51_4.png
_images/scCS_tutorial_single_51_5.png

============================================================
  Pathway enrichment: Delta
  Gene sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'Reactome_2022']
  Up-regulated genes  : 156
  Down-regulated genes: 230
============================================================

  [up] Significant terms: 41
                  Gene_set                                                                        Term Overlap  Adjusted P-value
           KEGG_2019_Mouse                                 Protein processing in endoplasmic reticulum  14/163      6.690234e-09
           KEGG_2019_Mouse                                                   Thyroid hormone synthesis    7/73      1.429359e-04
             Reactome_2022                                         Metabolism Of Proteins R-HSA-392499 35/1890      3.776032e-04
             Reactome_2022                                 IRE1alpha Activates Chaperones R-HSA-381070    5/48      3.289099e-03
             Reactome_2022                Incretin Synthesis, Secretion, And Inactivation R-HSA-400508    4/24      3.289099e-03
             Reactome_2022   SRP-dependent Cotranslational Protein Targeting To Membrane R-HSA-1799339   7/108      3.289099e-03
           KEGG_2019_Mouse                                                              Protein export    4/28      3.697039e-03
GO_Biological_Process_2021                                regulation of insulin secretion (GO:0050796)   7/104      4.368748e-03
GO_Biological_Process_2021 posttranslational protein targeting to membrane, translocation (GO:0031204)     3/7      4.368748e-03
GO_Biological_Process_2021                                 protein N-linked glycosylation (GO:0006487)    6/67      4.368748e-03
_images/scCS_tutorial_single_51_7.png

============================================================
  Pathway enrichment: Epsilon
  Gene sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'Reactome_2022']
  Up-regulated genes  : 266
  Down-regulated genes: 252
============================================================

  [up] Significant terms: 47
                  Gene_set                                                                                      Term Overlap  Adjusted P-value
           KEGG_2019_Mouse                                               Protein processing in endoplasmic reticulum  17/163      1.392463e-08
             Reactome_2022                                                  Peptide Hormone Metabolism R-HSA-2980736   10/89      1.607304e-04
             Reactome_2022 Aberrant Regulation Of Mitotic G1/S Transition In Cancer Due To RB1 Defects R-HSA-9659787    5/17      4.275598e-04
             Reactome_2022                                               IRE1alpha Activates Chaperones R-HSA-381070    7/48      4.275598e-04
             Reactome_2022                                              Unfolded Protein Response (UPR) R-HSA-381119    9/89      4.275598e-04
           KEGG_2019_Mouse                                                                                  Lysosome  10/124      6.781878e-04
             Reactome_2022                                                       Platelet Degranulation R-HSA-114608  10/125      7.374527e-04
GO_Biological_Process_2021                                      IRE1-mediated unfolded protein response (GO:0036498)    8/53      8.563951e-04
             Reactome_2022                                  Response To Elevated Platelet Cytosolic Ca2+ R-HSA-76005  10/130      8.709228e-04
             Reactome_2022                              Incretin Synthesis, Secretion, And Inactivation R-HSA-400508    5/24      1.076303e-03

  [down] Significant terms: 361
                  Gene_set                                                                                  Term Overlap  Adjusted P-value
GO_Biological_Process_2021                            cotranslational protein targeting to membrane (GO:0006613)   27/94      1.419635e-26
GO_Biological_Process_2021                                                  cytoplasmic translation (GO:0002181)   27/93      1.419635e-26
GO_Biological_Process_2021              SRP-dependent cotranslational protein targeting to membrane (GO:0006614)   26/90      9.147683e-26
             Reactome_2022                                                 Peptide Chain Elongation R-HSA-156902   25/86      1.231498e-24
             Reactome_2022                                        Eukaryotic Translation Elongation R-HSA-156842   25/90      1.482618e-24
             Reactome_2022                                        Eukaryotic Translation Termination R-HSA-72764   25/90      1.482618e-24
GO_Biological_Process_2021      nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184)  27/113      1.781313e-24
             Reactome_2022 Nonsense Mediated Decay (NMD) Independent Of Exon Junction Complex (EJC) R-HSA-975956   25/92      2.059459e-24
GO_Biological_Process_2021                                                  protein targeting to ER (GO:0045047)  26/103      2.772652e-24
             Reactome_2022                                                   Viral mRNA Translation R-HSA-192823   24/90      2.428940e-23
_images/scCS_tutorial_single_51_9.png
_images/scCS_tutorial_single_51_10.png
fate direction Gene_set Term Overlap Adjusted P-value
0 Alpha up KEGG_2019_Mouse Protein processing in endoplasmic reticulum 17/163 3.014548e-06
1 Alpha up KEGG_2019_Mouse Thyroid hormone synthesis 10/73 1.267707e-04
2 Alpha up KEGG_2019_Mouse Lysosome 12/124 3.133244e-04
3 Alpha down GO_Biological_Process_2021 cotranslational protein targeting to membrane ... 24/94 1.627570e-17
4 Alpha down GO_Biological_Process_2021 cytoplasmic translation (GO:0002181) 24/93 1.627570e-17
... ... ... ... ... ... ...
16 Epsilon up Reactome_2022 Peptide Hormone Metabolism R-HSA-2980736 10/89 1.607304e-04
17 Epsilon up Reactome_2022 Aberrant Regulation Of Mitotic G1/S Transition... 5/17 4.275598e-04
18 Epsilon down GO_Biological_Process_2021 cotranslational protein targeting to membrane ... 27/94 1.419635e-26
19 Epsilon down GO_Biological_Process_2021 cytoplasmic translation (GO:0002181) 27/93 1.419635e-26
20 Epsilon down GO_Biological_Process_2021 SRP-dependent cotranslational protein targetin... 26/90 9.147683e-26

21 rows × 6 columns

[32]:
# ── Custom gene sets ──────────────────────────────────────────────────────
# You can specify any Enrichr library:
# https://maayanlab.cloud/Enrichr/#libraries

custom_gene_sets = [
    'KEGG_2019_Mouse',
    'GO_Biological_Process_2021',
    'GO_Molecular_Function_2021',
    'Reactome_2022',
    'WikiPathway_2021_Mouse',
]

# enrichment = scorer.get_enrichment(
#     deg_drivers,
#     gene_sets=custom_gene_sets,
#     organism='mouse',
# )
print("Custom gene sets configured (commented out to avoid API call with synthetic data).")
print(f"Available sets: {custom_gene_sets}")

Custom gene sets configured (commented out to avoid API call with synthetic data).
Available sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'GO_Molecular_Function_2021', 'Reactome_2022', 'WikiPathway_2021_Mouse']
[33]:
# Export enrichment tables to CSV
try:
    scCS.export_enrichment_tables(enrichment, output_dir="enrichment_results/")
    print("Tables saved to enrichment_results/")
except Exception as e:
    print(f"Export skipped: {e}")
Saved: enrichment_results/enrichment_Alpha_up.csv
Saved: enrichment_results/enrichment_Alpha_down.csv
Saved: enrichment_results/enrichment_Beta_up.csv
Saved: enrichment_results/enrichment_Beta_down.csv
Saved: enrichment_results/enrichment_Delta_up.csv
Saved: enrichment_results/enrichment_Epsilon_up.csv
Saved: enrichment_results/enrichment_Epsilon_down.csv
Tables saved to enrichment_results/

Summary — scCS single-condition workflow

import scCS

# 1. Create scorer
scorer = scCS.SingleScorer(
    adata,
    root="Ductal",
    branches=["Alpha", "Beta", "Delta", "Epsilon"],
    obs_key="clusters",
)

# 2. Build embedding (fix arm coverage)
scorer.build_embedding(ordering_metric="pseudotime")
scorer.refit_pseudotime(scale_01=True)

# 3. Fit & score
scorer.fit()
result = scorer.score(cell_level=True, k_nn=15, n_bootstrap=500)
print(result.summary())

# 4. Visualize
scorer.plot_star(result)
scorer.plot_rose(result)
scorer.plot_pairwise_cs(result)
scorer.plot_commitment_bar(result)

# 5. Transfer labels to full adata
scorer.transfer_labels(adata, result)

# 6. Driver genes & enrichment
vel_drivers = scorer.get_velocity_drivers()
deg_drivers = scorer.get_deg_drivers()
enrichment  = scorer.get_enrichment(deg_drivers, organism="Human")

For multi-condition analysis, see the companion notebooks