scCS Tutorial — Single-Condition Analysis¶

Single-cell Commitment Scores with Radial Star Embedding¶

scCS v0.7.0 computes RNA velocity-based commitment scores for k-furcation cell fate decisions.

What scCS does¶

Takes a user-defined bifurcation cluster (progenitor) and k terminal fate clusters
Builds a radial star embedding — progenitor at origin, each fate on its own arm
Projects RNA velocity into this embedding
Computes commitment scores (unCS, nCS) and per-cell fate affinities
Optionally bootstraps confidence intervals on all scores
Identifies driver genes (velocity-based + DEG-based) per fate arm
Runs pathway enrichment (KEGG, GO BP, Reactome) per fate

Reference¶

Kriukov et al. (2025) Single-cell transcriptome of myeloid cells in response to transplantation of human retinal neurons reveals reversibility of microglial activation

Notebook outline

Section	What you learn
Installation	pip install
Load & preprocess	scVelo pancreas dataset
RNA velocity	scVelo dynamical model
Build star embedding	`SingleScorer.build_embedding()`
Fix arm coverage	`rebuild_embedding_with_subset_pseudotime()`
Fit & score	`fit()`, `score()`, bootstrap CI
Visualize	star plot, rose, pairwise heatmap, bar chart
Per-cell scores	fate affinity, entropy, NN-smoothed entropy
Transfer labels	write scores back to full adata
Subset scoring	`score_per_subset()`
Driver genes	velocity drivers, DEG drivers
Pathway enrichment	KEGG / GO BP per fate
Expression trends	`plot_expression_trends()`

1. Installation¶

[1]:

%matplotlib inline
# Install from GitHub (recommended)
# !pip install git+https://github.com/mcrewcow/scCS.git

# Or from a local directory:
#!pip install -e /...

import scCS
print(f"scCS version: {scCS.__version__}")

scCS version: 0.7.4

2. Load & preprocess data¶

We use the built-in pancreas dataset from scVelo — a well-studied k=4 bifurcation (Ductal → Alpha, Beta, Delta, Epsilon). The dataset already contains spliced/unspliced counts and cluster annotations.

[2]:

import numpy as np
import pandas as pd
import anndata
import matplotlib.pyplot as plt
import scanpy as sc
import scvelo as scv

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
try:
    from statsmodels.tools.sm_exceptions import ConvergenceWarning
    warnings.filterwarnings("ignore", category=ConvergenceWarning)
except ImportError:
    pass

sc.settings.verbosity = 1
scv.settings.verbosity = 1

adata = scv.datasets.pancreas()
print(adata)
print("\nCluster labels:", adata.obs['clusters'].unique().tolist())

# --- Tutorial display helpers (compact tables instead of per-fate loops) ---
pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 15)
pd.set_option("display.width", 160)


def _stack_drivers(driver_dict, n_per_fate=5, cols=None):
    """Concatenate a {fate: DataFrame} dict into one tidy long-form table.

    Shows the top ``n_per_fate`` rows per fate; adds a ``fate`` column at the
    front. Pass ``cols`` to pick a subset of columns to display.
    """
    out = []
    for fate, df in driver_dict.items():
        if df is None or len(df) == 0:
            continue
        block = df.head(n_per_fate).copy()
        block.insert(0, "fate", fate)
        out.append(block)
    if not out:
        return pd.DataFrame()
    big = pd.concat(out, ignore_index=True)
    if cols is not None:
        keep = ["fate"] + [c for c in cols if c in big.columns]
        big = big[keep]
    return big


def _stack_enrichment(enrichment_dict, n_per_group=3):
    """Flatten enrichment {fate: {direction: df}} into one tidy table."""
    rows = []
    for fate, fate_results in enrichment_dict.items():
        for direction in ("up", "down"):
            df = fate_results.get(direction, None)
            if df is None or df.empty:
                continue
            top = df.head(n_per_group).copy()
            top.insert(0, "direction", direction)
            top.insert(0, "fate", fate)
            rows.append(top)
    if not rows:
        return pd.DataFrame()
    return pd.concat(rows, ignore_index=True)

AnnData object with n_obs × n_vars = 3696 × 27998
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
    var: 'highly_variable_genes'
    uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'distances', 'connectivities'

Cluster labels: ['Pre-endocrine', 'Ductal', 'Alpha', 'Ngn3 high EP', 'Delta', 'Beta', 'Ngn3 low EP', 'Epsilon']

[3]:

# Standard scVelo preprocessing
import inspect as _inspect
_fn_sig = _inspect.signature(scv.pp.filter_and_normalize).parameters
if "n_top_genes" in _fn_sig:
    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
else:
    # scvelo builds without n_top_genes: do filter_and_normalize then HVG.
    # Use flavor="cell_ranger" not the default "seurat": seurat passes
    # int n_bins to pd.cut on log-dispersions which contain +/-inf for
    # low-mean genes — pandas >= 2.2 rejects this with ValueError.
    # cell_ranger uses explicit bin edges including +/-inf and is
    # robust across pandas/py versions (scCS changelog v0.7.4).
    scv.pp.filter_and_normalize(adata, min_shared_counts=20)
    sc.pp.highly_variable_genes(
        adata, n_top_genes=2000, subset=True, flavor="cell_ranger"
    )
sc.pp.neighbors(adata)
scv.pp.moments(adata, n_pcs=None, n_neighbors=None)

3. RNA velocity¶

We run the dynamical model (most accurate) via scVelo. If your data already has velocity and velocity_graph, skip this section.

[4]:

scv.tl.recover_dynamics(adata, n_jobs=32)
scv.tl.velocity(adata, mode="dynamical")
scv.tl.velocity_graph(adata)
scv.tl.velocity_pseudotime(adata)

# Visualise velocity on UMAP
scv.pl.velocity_embedding_stream(adata, basis="umap", color="clusters",
                                  title="RNA velocity — pancreas")

/home/emil/miniforge3/envs/lab-py312/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=3974) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()

/home/emil/miniforge3/envs/lab-py312/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=3974) is multi-threaded, use of fork() may lead to deadlocks in the child.
  self.pid = os.fork()

Alternatively, you can let scCS run the full velocity pipeline for you:

scorer = scCS.SingleScorer(adata, ...)
scorer.compute_velocity(mode="dynamical", n_top_genes=2000)

4. Build the radial star embedding¶

The pancreas dataset has a clear bifurcation from Ductal progenitors into four terminal fates: Alpha, Beta, Delta, Epsilon.

SingleScorer requires:

root — the progenitor cluster label
branches — list of terminal fate cluster labels
obs_key — column in adata.obs with cluster labels

[5]:

scorer = scCS.SingleScorer(
    adata,
    root="Pre-endocrine",
    branches=["Alpha", "Beta", "Delta", "Epsilon"],
    obs_key="clusters",
    n_angle_bins=36,          # 10° angular bins (default)
    sector_method="centroid",  # anchor sectors to fate centroids (recommended)
)

# Build the star embedding using scVelo pseudotime
scorer.build_embedding(
    ordering_metric="pseudotime",  # uses adata.obs['velocity_pseudotime']
    arm_scale=10.0,   # maximum radial distance
    jitter=0.3,       # perpendicular noise to avoid overplotting
)

print("Embedding shape:", scorer.adata_sub.obsm["X_sccs"].shape)
print("Cells in embedding:", scorer.adata_sub.n_obs)

[scCS] Building star embedding: root='Pre-endocrine', k=4 fates, metric='pseudotime'
[scCS] Subsetting: 1876 / 3696 cells kept
       (1820 cells from other populations excluded)
       Alpha: 481 cells (fate)
       Beta: 591 cells (fate)
       Delta: 70 cells (fate)
       Epsilon: 142 cells (fate)
       Pre-endocrine: 592 cells (progenitor)

[scCS] Star embedding built → adata_sub.obsm["X_sccs"] shape: (1876, 2)
       Arm angles: {'Alpha': 0.0, 'Beta': 90.0, 'Delta': 180.0, 'Epsilon': 270.0}
[scCS] Star embedding stored in scorer.adata_sub.obsm['X_sccs']. (1876 cells)
Embedding shape: (1876, 2)
Cells in embedding: 1876

5. Fix arm coverage with subset-local pseudotime¶

Problem: scVelo pseudotime is computed on the full adata. When we subset to bifurcation + terminal fate cells, the pseudotime range is compressed — cells cluster near the origin instead of spanning the full arm length.

Solution: refit_pseudotime() extracts the velocity_graph submatrix for the subset cells, recomputes pseudotime locally, scales it to [0, 1], and rebuilds the embedding.

When to use ``scale_01=True`` vs ``False``:

scale_01=True (default) — cells span the full arm; best for single-condition analysis

scale_01=False — preserves absolute pseudotime ordering; use for multi-condition comparisons where cross-condition pseudotime comparability matters

[6]:

# Recompute pseudotime on the subset subgraph and rebuild the embedding
scorer.refit_pseudotime(scale_01=True)

# Inspect the corrected pseudotime
pt_sub = scorer.adata_sub.obs["velocity_pseudotime"]
print(f"Subset pseudotime range: [{pt_sub.min():.3f}, {pt_sub.max():.3f}]")

[scCS] Recomputing pseudotime on subset (1876 / 3696 cells)...
[scCS] Used scanpy DPT (with diffmap) as pseudotime fallback.
[scCS] Subset pseudotime scaled to [0, 1].
[scCS] Subset pseudotime stored in adata_sub.obs['sccs_pseudotime']. Range: [0.000, 1.000]
[scCS] Rebuilding star embedding with subset-local pseudotime...
[scCS] Subsetting: 1876 / 3696 cells kept
       (1820 cells from other populations excluded)
       Alpha: 481 cells (fate)
       Beta: 591 cells (fate)
       Delta: 70 cells (fate)
       Epsilon: 142 cells (fate)
       Pre-endocrine: 592 cells (progenitor)

[scCS] Star embedding built → adata_sub.obsm["X_sccs"] shape: (1876, 2)
       Arm angles: {'Alpha': 0.0, 'Beta': 90.0, 'Delta': 180.0, 'Epsilon': 270.0}
[scCS] Embedding rebuilt. Call fit() again to update the FateMap and velocity projection.
Subset pseudotime range: [0.495, 1.000]

6. Fit and compute commitment scores¶

fit() builds the FateMap (validates clusters, computes fate centroids, projects velocity). score() computes all commitment scores.

Key outputs in `CommitmentScoreResult`¶

Attribute	Description
`pairwise_unCS`	Unnormalized CS matrix (k × k)
`pairwise_nCS`	Cell-count-normalized CS matrix (k × k)
`commitment_vector`	Fraction of velocity mass per fate
`population_entropy`	Population-level commitment entropy
`mean_cell_entropy`	Mean per-cell fate entropy
`per_fate_entropy`	Per-fate binary entropy
`cell_scores`	Per-cell fate affinity matrix (n_cells × k)
`bootstrap_ci`	Bootstrap CI dict (if `n_bootstrap > 0`)

[7]:

scorer.fit()

[scCS] Root cluster 'Pre-endocrine': 592 cells, centroid=(-0.00, -0.00)
[scCS]   Fate 'Alpha': 481 cells, centroid=(3.51, -0.01)
[scCS]   Fate 'Beta': 591 cells, centroid=(0.03, 4.38)
[scCS]   Fate 'Delta': 70 cells, centroid=(-4.75, -0.05)
[scCS]   Fate 'Epsilon': 142 cells, centroid=(-0.02, -2.55)
[scCS] FateMap built: k=4 fates
[scCS] Projecting velocity via scVelo on full adata → slicing to subset...
[scCS] Velocity projected. Shape: (1876, 2)
FateMap  (root='Pre-endocrine', k=4)
  Cluster key : 'clusters'
  Root cells  : 592
  Root centroid: (-0.005, -0.003)
  Fate 0: 'Alpha'  n_cells=481  centroid=(3.51, -0.01)  arm_angle=0.0°
  Fate 1: 'Beta'  n_cells=591  centroid=(0.03, 4.38)  arm_angle=90.0°
  Fate 2: 'Delta'  n_cells=70  centroid=(-4.75, -0.05)  arm_angle=180.0°
  Fate 3: 'Epsilon'  n_cells=142  centroid=(-0.02, -2.55)  arm_angle=270.0°

[7]:

SingleScorer(root='Pre-endocrine', branches=['Alpha', 'Beta', 'Delta', 'Epsilon'], k=4, status='fitted')

[8]:

fig = scorer.plot_nn_entropy_elbow(k_nn_range=range(5, 51, 5))
plt.show()

[9]:

# Score with bootstrap confidence intervals (n_bootstrap=500 recommended)
result = scorer.score(
    cell_level=True,   # compute per-cell fate affinities
    k_nn=15,                   # NN-smoothed entropy (15 nearest neighbours)
    n_bootstrap=500,           # bootstrap CI on nCS
    bootstrap_ci=0.95,         # 95% CI
)

[scCS] Computing bootstrap CI (n=500)...
=== CommitmentScoreResult ===
  Fates (4): Alpha, Beta, Delta, Epsilon
  Dominant fate: Beta

  Entropy metrics:
    Population entropy:           0.7052  [aggregate velocity-mass balance]
    Mean cell entropy:            0.9361  [per-cell average, k-way]
    Per-fate cell entropy:
      Alpha: 0.8246
      Beta: 0.8608
      Delta: 0.7541
      Epsilon: 0.6399
    NN-smoothed entropy (k=15):  mean=0.9547  [per-cell, stored in adata_sub.obs['cs_nn_entropy']]

  Commitment vector (normalized):
    Alpha: 0.2210
    Beta: 0.6449
    Delta: 0.0765
    Epsilon: 0.0576

  Pairwise nCS matrix:
            Alpha      Beta     Delta   Epsilon
Alpha    1.000000  0.421157  0.420532  1.133082
Beta     2.374413  1.000000  0.998516  2.690405
Delta    2.377942  1.001486  1.000000  2.694404
Epsilon  0.882548  0.371691  0.371140  1.000000

  Bootstrap 95% CI on nCS (n=500):
    CI low:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.369  0.352    0.922
Beta     2.119  1.000  0.851    2.270
Delta    1.989  0.855  1.000    2.103
Epsilon  0.723  0.303  0.290    1.000
    CI high:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.472  0.503    1.383
Beta     2.712  1.000  1.170    3.297
Delta    2.838  1.175  1.000    3.449
Epsilon  1.084  0.440  0.476    1.000

[10]:

# Summary table
print(result.summary())

=== CommitmentScoreResult ===
  Fates (4): Alpha, Beta, Delta, Epsilon
  Dominant fate: Beta

  Entropy metrics:
    Population entropy:           0.7052  [aggregate velocity-mass balance]
    Mean cell entropy:            0.9361  [per-cell average, k-way]
    Per-fate cell entropy:
      Alpha: 0.8246
      Beta: 0.8608
      Delta: 0.7541
      Epsilon: 0.6399
    NN-smoothed entropy (k=15):  mean=0.9547  [per-cell, stored in adata_sub.obs['cs_nn_entropy']]

  Commitment vector (normalized):
    Alpha: 0.2210
    Beta: 0.6449
    Delta: 0.0765
    Epsilon: 0.0576

  Pairwise nCS matrix:
            Alpha      Beta     Delta   Epsilon
Alpha    1.000000  0.421157  0.420532  1.133082
Beta     2.374413  1.000000  0.998516  2.690405
Delta    2.377942  1.001486  1.000000  2.694404
Epsilon  0.882548  0.371691  0.371140  1.000000

  Bootstrap 95% CI on nCS (n=500):
    CI low:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.369  0.352    0.922
Beta     2.119  1.000  0.851    2.270
Delta    1.989  0.855  1.000    2.103
Epsilon  0.723  0.303  0.290    1.000
    CI high:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.472  0.503    1.383
Beta     2.712  1.000  1.170    3.297
Delta    2.838  1.175  1.000    3.449
Epsilon  1.084  0.440  0.476    1.000

[11]:

# Access individual scores
print("Pairwise nCS matrix:")
print(pd.DataFrame(
    result.pairwise_nCS,
    index=result.fate_names,
    columns=result.fate_names
).round(3))

print("\nCommitment vector (fraction of velocity mass per fate):")
for fate, cv in zip(result.fate_names, result.commitment_vector):
    print(f"  {fate}: {cv:.3f}")

print(f"\nPopulation entropy: {result.population_entropy:.3f}")
print(f"Mean per-cell entropy: {result.mean_cell_entropy:.3f}")

Pairwise nCS matrix:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.421  0.421    1.133
Beta     2.374  1.000  0.999    2.690
Delta    2.378  1.001  1.000    2.694
Epsilon  0.883  0.372  0.371    1.000

Commitment vector (fraction of velocity mass per fate):
  Alpha: 0.221
  Beta: 0.645
  Delta: 0.076
  Epsilon: 0.058

Population entropy: 0.705
Mean per-cell entropy: 0.936

[12]:

# Bootstrap CI (if n_bootstrap > 0)
if result.bootstrap_ci is not None:
    ci = result.bootstrap_ci
    print(f"Bootstrap CI ({int(ci['ci_level']*100)}%, n={ci['n_bootstrap']}):")
    print("  nCS CI low:")
    print(pd.DataFrame(ci['ci_low'], index=result.fate_names, columns=result.fate_names).round(3))
    print("\n  nCS CI high:")
    print(pd.DataFrame(ci['ci_high'], index=result.fate_names, columns=result.fate_names).round(3))

Bootstrap CI (95%, n=500):
  nCS CI low:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.369  0.352    0.922
Beta     2.119  1.000  0.851    2.270
Delta    1.989  0.855  1.000    2.103
Epsilon  0.723  0.303  0.290    1.000

  nCS CI high:
         Alpha   Beta  Delta  Epsilon
Alpha    1.000  0.472  0.503    1.383
Beta     2.712  1.000  1.170    3.297
Delta    2.838  1.175  1.000    3.449
Epsilon  1.084  0.440  0.476    1.000

7. Visualizations¶

scCS provides several built-in plots. All are accessible as methods on the scorer or as standalone functions in scCS.plot.

7.1 Radial star embedding¶

[13]:

# Primary visualization: cells in the star embedding, coloured by fate
fig = scorer.plot_star(result, figsize=(8, 8))
plt.tight_layout()
plt.show()

[14]:

# Colour by per-cell entropy instead of fate
fig = scorer.plot_star(result, color_by="entropy", figsize=(8, 8))
plt.tight_layout()
plt.show()

[15]:

# Colour by NN-smoothed entropy (requires k_nn in score())
fig = scorer.plot_star(result, color_by="nn_entropy", figsize=(8, 8))
plt.tight_layout()
plt.show()

7.2 Rose / polar plot¶

[16]:

# Rose plot: cumulative velocity magnitude per angular bin
fig = scorer.plot_rose(result, figsize=(6, 6))
plt.tight_layout()
plt.show()

7.3 Pairwise nCS heatmap¶

[17]:

# Heatmap of pairwise normalized commitment scores
fig = scorer.plot_pairwise_cs(result, figsize=(5, 4))
plt.tight_layout()
plt.show()

7.4 Commitment bar chart¶

[18]:

# Bar chart: unCS vs nCS per fate pair
fig = scorer.plot_commitment_bar(result, figsize=(8, 4))
plt.tight_layout()
plt.show()

7.5 Per-cell fate affinity heatmap¶

[19]:

# Heatmap of per-cell fate affinities (rows = cells, cols = fates)
fig = scorer.plot_commitment_heatmap(result, figsize=(6, 5), max_cells=500)
plt.tight_layout()
plt.show()

7.6 NN entropy elbow plot¶

[20]:

# Elbow plot to choose k_nn for NN-smoothed entropy
# Sweeps k_nn values and plots mean NN entropy
fig = scorer.plot_nn_entropy_elbow(k_nn_range=range(5, 51, 5), figsize=(8, 4))
plt.tight_layout()
plt.show()

8. Per-cell fate affinity scores¶

After score(cell_level=True), per-cell scores are stored in scorer.adata_sub.obs:

Column	Description
`cs_{fate_name}`	Per-cell affinity for each fate (sums to 1 per cell)
`cs_dominant_fate`	Fate with highest affinity
`cs_entropy`	Per-cell commitment entropy (0 = fully committed, 1 = uniform)
`cs_nn_entropy`	NN-smoothed entropy (if `k_nn` was set)
`velocity_pseudotime_sub`	Subset-local pseudotime

[21]:

# Inspect per-cell scores
obs_cols = [c for c in scorer.adata_sub.obs.columns if c.startswith("cs_")]
print("Per-cell score columns:", obs_cols)
scorer.adata_sub.obs[obs_cols].head(10)

Per-cell score columns: ['cs_Alpha', 'cs_Beta', 'cs_Delta', 'cs_Epsilon', 'cs_dominant_fate', 'cs_entropy', 'cs_nn_entropy']

[21]:

	cs_Alpha	cs_Beta	cs_Delta	cs_Epsilon	cs_dominant_fate	cs_entropy	cs_nn_entropy
index
AAACCTGAGAGGGATA	0.345656	0.252071	0.154299	0.247973	Alpha	0.972886	0.992053
AAACCTGAGGCAATTA	0.247055	0.296687	0.252668	0.203590	Beta	0.993692	0.992740
AAACCTGTCCCTCTTT	0.324502	0.244887	0.175503	0.255107	Alpha	0.983665	0.958981
AAACGGGAGTAGCGGT	0.361040	0.324278	0.138481	0.176201	Alpha	0.946906	0.977282
AAACGGGCAAAGAATC	0.274640	0.476842	0.224002	0.024516	Beta	0.818077	0.875318
AAACGGGGTACAGTTC	0.251495	0.463168	0.247236	0.038101	Beta	0.846585	0.884746
AAACGGGGTCGGGTCT	0.268075	0.221469	0.232088	0.278367	Epsilon	0.996724	0.998418
AAACGGGTCCGCGCAA	0.328827	0.381496	0.170364	0.119313	Beta	0.929480	0.988123
AAACGGGTCGCATGGC	0.272847	0.479499	0.225780	0.021874	Beta	0.812556	0.817800
AAAGATGAGAGTACCG	0.213046	0.270671	0.286844	0.229439	Delta	0.994829	0.992597

[22]:

# Distribution of dominant fate assignments
print(scorer.adata_sub.obs["cs_dominant_fate"].value_counts())

cs_dominant_fate
Beta       836
Alpha      604
Delta      246
Epsilon    190
Name: count, dtype: int64

[23]:

# Entropy distribution per fate
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, len(result.fate_names), figsize=(12, 3), sharey=True)
for ax, fate in zip(axes, result.fate_names):
    mask = scorer.adata_sub.obs["cs_dominant_fate"] == fate
    ax.hist(scorer.adata_sub.obs.loc[mask, "cs_entropy"], bins=20, color="steelblue", edgecolor="white")
    ax.set_title(fate)
    ax.set_xlabel("Entropy")
axes[0].set_ylabel("# cells")
plt.suptitle("Per-cell commitment entropy by dominant fate", y=1.02)
plt.tight_layout()
plt.show()

9. Transfer labels to full adata¶

transfer_labels() writes per-cell scores from adata_sub back to the full adata, so you can use them in UMAP plots, downstream analyses, or integration with other tools.

Cells not in the embedding subset receive NaN (numeric) or 'unassigned' (categorical).

[24]:

scorer.transfer_labels(adata, result, prefix="cs_")

# Columns added to adata.obs
cs_cols = [c for c in adata.obs.columns if c.startswith("cs_")]
print("Columns added to adata.obs:", cs_cols)
adata.obs[cs_cols].head()

[scCS] Labels transferred to adata.obs for 1876 / 3696 cells. Columns: ['cs_Alpha', 'cs_Beta', 'cs_Delta', 'cs_Epsilon', 'cs_dominant_fate', 'cs_entropy']
Columns added to adata.obs: ['cs_Alpha', 'cs_Beta', 'cs_Delta', 'cs_Epsilon', 'cs_dominant_fate', 'cs_entropy', 'cs_nn_entropy']

[24]:

	cs_Alpha	cs_Beta	cs_Delta	cs_Epsilon	cs_dominant_fate	cs_entropy	cs_nn_entropy
index
AAACCTGAGAGGGATA	0.345656	0.252071	0.154299	0.247973	Alpha	0.972886	0.992053
AAACCTGAGCCTTGAT	NaN	NaN	NaN	NaN	unassigned	NaN	NaN
AAACCTGAGGCAATTA	0.247055	0.296687	0.252668	0.203590	Beta	0.993692	0.992740
AAACCTGCATCATCCC	NaN	NaN	NaN	NaN	unassigned	NaN	NaN
AAACCTGGTAAGTGGC	NaN	NaN	NaN	NaN	unassigned	NaN	NaN

[25]:

# Visualize on UMAP
sc.pl.umap(adata, color=["cs_dominant_fate", "cs_entropy", "cs_Alpha", "cs_Beta"],
           ncols=2, wspace=0.4)

10. Subset scoring¶

score_per_subset() computes commitment scores separately for each value of a metadata column in adata_sub.obs. Useful for comparing commitment across time points, batches, or any categorical variable.

Note: The mask is applied to adata_sub (the embedding subset), not the full adata.

Note on ``inf`` in pairwise nCS: When a subset contains only progenitor cells (e.g., ‘Pre-endocrine’) with no cells from any fate arm, pairwise_nCS will be inf for all off-diagonal entries. This is mathematically correct — nCS is undefined when a fate arm has 0 cells. A UserWarning is emitted automatically. Progenitor-only subsets are still useful for inspecting the commitment vector and population entropy.

[26]:

# Score separately for each cluster within the embedding
# (Here we use 'clusters' as a demo; in practice use a condition/batch column)
subset_results = scorer.score_per_subset(
    split_by="clusters",
    cell_level=False,
    n_bootstrap=0,
    verbose=False,
)

# Tidy summary: one row per subset with all off-diagonal nCS values + status
def _subset_summary(subset_results):
    rows = []
    for subset_name, res in subset_results.items():
        n = len(res.fate_names)
        ncs = res.pairwise_nCS
        row = {"subset": subset_name, "n_cells_in_fate": int(res.n_cells_per_fate.sum())}
        for i in range(n):
            for j in range(i + 1, n):
                col = f"nCS[{res.fate_names[i]} vs {res.fate_names[j]}]"
                row[col] = float(ncs[i, j])
        # Status flag: any inf?
        off_diag = [ncs[i, j] for i in range(n) for j in range(n) if i != j]
        if all(np.isinf(v) for v in off_diag):
            row["status"] = "progenitor-only (inf)"
        elif any(np.isinf(v) for v in off_diag):
            row["status"] = "partial (some inf)"
        else:
            row["status"] = "ok"
        rows.append(row)
    return pd.DataFrame(rows)

summary_df = _subset_summary(subset_results)
summary_df

[26]:

	subset	n_cells_in_fate	nCS[Alpha vs Beta]	nCS[Alpha vs Delta]	nCS[Alpha vs Epsilon]	nCS[Beta vs Delta]	nCS[Beta vs Epsilon]	nCS[Delta vs Epsilon]	status
0	Pre-endocrine	0	inf	inf	inf	inf	inf	inf	progenitor-only (inf)
1	Alpha	481	0.0	0.0	0.0	inf	inf	inf	partial (some inf)
2	Delta	70	inf	inf	inf	inf	inf	0.0	partial (some inf)
3	Beta	591	inf	inf	inf	0.0	0.0	inf	partial (some inf)
4	Epsilon	142	inf	inf	inf	inf	inf	inf	partial (some inf)

[27]:

# Visualize subset comparison
fig = scorer.plot_subset_comparison(subset_results, figsize=(10, 4))
plt.tight_layout()
plt.show()

11. Driver genes¶

scCS provides three complementary approaches to identify fate-driving genes:

Velocity drivers (get_velocity_drivers) — genes with highest mean RNA velocity in each fate arm (high positive velocity = gene is being upregulated along that fate)
DEG drivers — differentially expressed genes between each fate arm and the bifurcation cluster (Wilcoxon rank-sum test)
Velocity-fate correlation drivers (get_velocity_fate_drivers) — Spearman correlation between each gene’s velocity and per-cell fate affinity. Stronger signal than mean velocity because it filters out genes that are fast everywhere.

[28]:

# Velocity-based driver genes (top 20 per fate)
vel_drivers = scorer.get_velocity_drivers(n_top_genes=20)

# Compact long-form table: top 5 per fate, one DataFrame
print("Top velocity drivers (top 5 per fate):")
_stack_drivers(vel_drivers, n_per_fate=5, cols=["gene", "mean_velocity", "rank"])


── Velocity drivers: Alpha (top 20, sorted by delta_velocity) ──
 rank     gene  delta_velocity  mean_velocity
    1      Clu        1.488299       1.532060
    2     Cpa2        1.242302       0.851096
    3   Cryba2        1.057625       1.217162
    4     Ghrl        0.986426       0.384223
    5    Meis2        0.821635       0.665699
    6     Pax6        0.726139       0.883325
    7   Npepl1        0.725493       0.182687
    8   Tm4sf4        0.668253       0.466062
    9     Vasp        0.611339       0.097546
   10     Pax4        0.603224       0.034867
   11   Btbd17        0.577881       0.930656
   12     Krt7        0.547158      -0.280244
   13    Cd200        0.535196       0.330072
   14    Celf3        0.534336       0.370460
   15     Tox3        0.497146       0.251629
   16     Pdx1        0.496890       0.392475
   17   Ppp3ca        0.487356       0.200745
   18      Hn1        0.468364       0.031799
   19 BC023829        0.455840       0.336861
   20     Cpt2        0.432719       0.246379

── Velocity drivers: Beta (top 20, sorted by delta_velocity) ──
 rank          gene  delta_velocity  mean_velocity
    1          Ins2        3.672153       3.689225
    2          Nnat        3.631306       4.051527
    3          Isl1        3.562601       1.070217
    4           Clu        1.605093       1.648854
    5 1700086L19Rik        1.411600       0.453118
    6          Krt7        1.125878       0.298475
    7          Pax4        0.901425       0.333068
    8       Ppp1r1a        0.861976       1.539815
    9        Tm4sf4        0.842035       0.639844
   10       Igfbpl1        0.771356       0.268164
   11          Cpa2        0.714384       0.323178
   12          Pax6        0.697922       0.855108
   13          Dbn1        0.615245       0.314196
   14          Pdx1        0.535203       0.430788
   15         Meis2        0.518604       0.362668
   16      BC023829        0.514133       0.395154
   17         Cldn6        0.508870      -0.301879
   18         Glud1        0.481521       0.737995
   19       Smarcd2        0.460625       0.598269
   20       Lurap1l        0.456940       0.285098

── Velocity drivers: Delta (top 20, sorted by delta_velocity) ──
 rank     gene  delta_velocity  mean_velocity
    1     Cpa2        2.483908       2.092702
    2     Krt8        2.349361       1.407045
    3   Maged2        2.340670       3.380898
    4   Cryba2        1.744062       1.903599
    5     Ldha        1.710826       1.581080
    6      Hn1        1.123444       0.686879
    7   Cdkn1c        1.072009       1.007896
    8     Nnat        1.059840       1.480061
    9  Ppp1r1a        1.017116       1.694955
   10     Ambp        0.908994       0.752210
   11     Gnas        0.861592       6.604826
   12     Pax6        0.851152       1.008338
   13  Akr1c19        0.804932       1.123999
   14    Sparc        0.687036       0.649847
   15    Meis2        0.684853       0.528917
   16    Rpl12        0.670421      -0.154604
   17    Celf3        0.637237       0.473361
   18 BC023829        0.635596       0.516617
   19    Krt18        0.603725      -0.206350
   20    Foxa3        0.588107       0.411777

── Velocity drivers: Epsilon (top 20, sorted by delta_velocity) ──
 rank     gene  delta_velocity  mean_velocity
    1     Cpa2        1.345101       0.953895
    2   Cryba2        1.310028       1.469565
    3      Clu        1.293649       1.337410
    4    Meis2        1.209729       1.053792
    5     Pax4        0.843907       0.275551
    6    Syt13        0.799221       0.825314
    7    Siva1        0.731447       0.615843
    8     Pax6        0.731163       0.888349
    9  Lurap1l        0.654209       0.482367
   10  Arl6ip1        0.602216      -0.050500
   11   Nudt19        0.557069       0.466408
   12   Npepl1        0.499464      -0.043343
   13     Pdx1        0.488036       0.383621
   14    Rgs17        0.487121       0.559517
   15     Gars        0.450410       0.620145
   16     Cpt2        0.450140       0.263800
   17   Cdkn1c        0.401330       0.337217
   18 Ppp1r14b        0.393609      -0.294763
   19     Cdk1        0.393504       0.399177
   20      Tkt        0.385895       0.397531
Top velocity drivers (top 5 per fate):

[28]:

	fate	gene	mean_velocity	rank
0	Alpha	Clu	1.532060	1
1	Alpha	Cpa2	0.851096	2
2	Alpha	Cryba2	1.217162	3
3	Alpha	Ghrl	0.384223	4
4	Alpha	Meis2	0.665699	5
5	Beta	Ins2	3.689225	1
6	Beta	Nnat	4.051527	2
7	Beta	Isl1	1.070217	3
8	Beta	Clu	1.648854	4
9	Beta	1700086L19Rik	0.453118	5
10	Delta	Cpa2	2.092702	1
11	Delta	Krt8	1.407045	2
12	Delta	Maged2	3.380898	3
13	Delta	Cryba2	1.903599	4
14	Delta	Ldha	1.581080	5
15	Epsilon	Cpa2	0.953895	1
16	Epsilon	Cryba2	1.469565	2
17	Epsilon	Clu	1.337410	3
18	Epsilon	Meis2	1.053792	4
19	Epsilon	Pax4	0.275551	5

[29]:

# DEG-based driver genes (top 20 per fate, vs the bifurcation cluster)
deg_drivers = scorer.get_deg_drivers(
    n_top_genes=20,
    pval_threshold=0.05,
    logfc_threshold=0.25,
)

print("Top DEG drivers (top 5 per fate):")
_stack_drivers(
    deg_drivers, n_per_fate=5,
    cols=["gene", "logfoldchange", "pval_adj", "pct_fate", "pct_progenitor"],
)


── DEG drivers: Alpha vs progenitor ──
   Significant: 761  (up: 376, down: 385)
   gene  logfoldchange      pval_adj  pct_fate  pct_progenitor
    Pyy     209.768677  1.181761e-93  0.968815             NaN
    Gcg     209.614731  9.196179e-64  0.656965             NaN
   Iapp      99.370232  4.065913e-99  0.854470             NaN
    Ttr      67.718864 2.617159e-103  1.000000             NaN
   Gnas      34.805965  2.937512e-52  1.000000             NaN
   Rbp4      33.815922  7.031753e-50  0.962578             NaN
  Hpgds      26.723766  3.763407e-02  0.087318             NaN
    Tnr      26.722797  2.795890e-02  0.091476             NaN
 Tmem27      17.364475 1.123328e-141  0.981289             NaN
Slc38a5      15.393972  4.940813e-60  0.920998             NaN
 Pcsk1n      12.127839  4.762578e-59  0.997921             NaN
   Gast       9.842274  1.419605e-32  0.538462             NaN
    Cpe       7.672148  1.079724e-40  1.000000             NaN
  Pcsk2       7.646369  6.491126e-63  0.846154             NaN
   Gpx3       7.507235  1.191563e-82  0.937630             NaN
  Meis2       6.603501  2.217035e-78  0.970894             NaN
  Peg10       6.532975  2.869438e-65  0.700624             NaN
   Ssr2       6.296807  1.585608e-69  0.985447             NaN
    Ppy       6.134921  1.251811e-15  0.378378             NaN
Tmsb15l       5.803368  2.070066e-73  0.711019             NaN

── DEG drivers: Beta vs progenitor ──
   Significant: 830  (up: 422, down: 408)
   gene  logfoldchange      pval_adj  pct_fate  pct_progenitor
   Iapp     185.345184 6.106240e-152  0.944162             NaN
    Pyy     139.123322  1.052978e-85  0.962775             NaN
   Ins1      99.108521  3.512617e-47  0.582064             NaN
   Ins2      47.560135  3.515790e-67  0.610829             NaN
   Nnat      42.420864 2.029962e-103  0.825719             NaN
   Rbp4      41.067871 6.750219e-103  0.996616             NaN
  P2ry1      27.845049  4.690784e-06  0.164129             NaN
   Fmo1      26.452681  3.894453e-02  0.081218             NaN
 Plppr1      26.415466  3.894453e-02  0.081218             NaN
   Gnas      26.366089  3.362044e-60  1.000000             NaN
    Ttr      22.955372  6.047775e-52  0.994924             NaN
   Dlk1      12.507628  4.599450e-97  0.966159             NaN
 Sec61b      12.047029 6.698179e-114  1.000000             NaN
    Npy      12.042974  1.086211e-07  0.191201             NaN
Ppp1r1a      11.322408  1.013795e-94  0.712352             NaN
  Pcsk2       9.181964 2.013853e-127  0.984772             NaN
   Calr       8.696516  3.976336e-95  0.959391             NaN
  Rpl10       8.650661  6.788600e-35  1.000000             NaN
  Hspa5       8.530251 2.027799e-109  0.969543             NaN
 Sdf2l1       7.616785  3.683485e-85  0.747885             NaN

── DEG drivers: Delta vs progenitor ──
   Significant: 386  (up: 156, down: 230)
  gene  logfoldchange     pval_adj  pct_fate  pct_progenitor
   Pyy     484.160583 5.419924e-34  1.000000             NaN
   Sst     332.010773 9.905999e-30  0.871429             NaN
  Rbp4     194.580841 2.024127e-37  1.000000             NaN
  Iapp     102.586014 2.319858e-35  0.985714             NaN
Ptprz1      28.902637 3.369656e-03  0.257143             NaN
 Cd24a      20.200207 1.633998e-31  1.000000             NaN
   Ppy      18.444256 6.494096e-07  0.485714             NaN
  Hhex      16.970627 6.566081e-31  0.985714             NaN
  Dlk1      16.967428 4.469675e-28  1.000000             NaN
 Rpl10      13.510839 4.188939e-13  1.000000             NaN
  Ssr2      10.061992 2.442933e-24  1.000000             NaN
  Gpx3       9.890513 3.689698e-25  0.985714             NaN
  Isl1       9.400220 3.580594e-12  1.000000             NaN
  Hadh       9.289947 7.724100e-31  0.971429             NaN
  Arg1       8.830122 9.905999e-30  0.900000             NaN
 Pcsk2       8.306222 1.427407e-20  0.928571             NaN
   Ttr       7.740336 4.406957e-03  0.971429             NaN
Spock3       7.235566 4.338918e-05  0.342857             NaN
  Mest       6.525658 6.399791e-18  0.714286             NaN
Igfbp7       6.383225 4.517170e-17  0.685714             NaN

── DEG drivers: Epsilon vs progenitor ──
   Significant: 518  (up: 266, down: 252)
   gene  logfoldchange     pval_adj  pct_fate  pct_progenitor
   Ghrl     495.486511 3.376669e-72  1.000000             NaN
    Pyy      81.501137 3.980708e-18  0.873239             NaN
   Rbp4      74.675728 6.843394e-44  0.992958             NaN
 Lrpprc      22.582724 8.565872e-16  0.873239             NaN
 Cdkn1a      21.258612 4.533111e-15  0.887324             NaN
    Gcg      20.178659 1.772858e-04  0.338028             NaN
   Iapp      18.956543 1.029570e-23  0.760563             NaN
   Isl1      18.905531 3.368229e-35  0.992958             NaN
    Ttr      16.562450 4.734949e-04  0.929577             NaN
 Maged2      15.220592 1.711104e-41  0.943662             NaN
  Hspa5       9.455750 8.303204e-39  0.957746             NaN
 Tmem27       8.894045 2.939794e-24  0.739437             NaN
   Calr       8.733923 2.532672e-39  0.950704             NaN
   Ssr2       7.860883 2.188520e-39  0.992958             NaN
   Arg1       7.192320 3.382770e-34  0.732394             NaN
  Rpl10       7.168509 2.855208e-09  1.000000             NaN
Slc38a5       7.034479 3.542731e-11  0.838028             NaN
  Ccnd2       6.007015 4.025146e-26  0.661972             NaN
  Acsl1       5.986801 1.903084e-05  0.267606             NaN
  Anpep       5.644922 5.435417e-36  0.802817             NaN
Top DEG drivers (top 5 per fate):

[29]:

	fate	gene	logfoldchange	pval_adj	pct_fate	pct_progenitor
0	Alpha	Pyy	209.768677	1.181761e-93	0.968815	NaN
1	Alpha	Gcg	209.614731	9.196179e-64	0.656965	NaN
2	Alpha	Iapp	99.370232	4.065913e-99	0.854470	NaN
3	Alpha	Ttr	67.718864	2.617159e-103	1.000000	NaN
4	Alpha	Gnas	34.805965	2.937512e-52	1.000000	NaN
5	Beta	Iapp	185.345184	6.106240e-152	0.944162	NaN
6	Beta	Pyy	139.123322	1.052978e-85	0.962775	NaN
7	Beta	Ins1	99.108521	3.512617e-47	0.582064	NaN
8	Beta	Ins2	47.560135	3.515790e-67	0.610829	NaN
9	Beta	Nnat	42.420864	2.029962e-103	0.825719	NaN
10	Delta	Pyy	484.160583	5.419924e-34	1.000000	NaN
11	Delta	Sst	332.010773	9.905999e-30	0.871429	NaN
12	Delta	Rbp4	194.580841	2.024127e-37	1.000000	NaN
13	Delta	Iapp	102.586014	2.319858e-35	0.985714	NaN
14	Delta	Ptprz1	28.902637	3.369656e-03	0.257143	NaN
15	Epsilon	Ghrl	495.486511	3.376669e-72	1.000000	NaN
16	Epsilon	Pyy	81.501137	3.980708e-18	0.873239	NaN
17	Epsilon	Rbp4	74.675728	6.843394e-44	0.992958	NaN
18	Epsilon	Tnr	26.569187	2.522357e-01	0.091549	NaN
19	Epsilon	Hpgds	26.106819	4.514074e-01	0.070423	NaN

11.3 Velocity-fate correlation drivers¶

get_velocity_fate_drivers() computes the Spearman correlation between each gene’s velocity and the per-cell fate affinity score for each fate arm. Genes with high positive correlation are being upregulated specifically as cells commit to that fate — a stronger signal than mean velocity alone, because it filters out genes that are fast everywhere.

Requires score(cell_level=True) to have been called first (needs result.cell_scores).

[30]:

# Velocity-fate correlation drivers
# Correlates each gene's velocity with per-cell fate affinity scores
# (requires cell_level=True in score())
try:
    vf_drivers = scorer.get_velocity_fate_drivers(
        result,
        n_top_genes=20,
        pval_threshold=0.05,
    )
    print("Top velocity-fate drivers (top 5 per fate):")
    display(_stack_drivers(
        vf_drivers, n_per_fate=5,
        cols=["gene", "spearman_r", "pval_adj", "mean_velocity", "delta_velocity"],
    ))
except ValueError as e:
    print(f"Skipped: {e}")
    print("Tip: call scorer.score(cell_level=True) first.")


── Velocity-fate drivers: Alpha (top 20, sorted by Spearman r) ──
   Significant (FDR < 0.05): 683 / 2000
 rank          gene  spearman_r     pval_adj  mean_velocity
    1        Adam23    0.332275 2.726568e-46       0.035280
    2           Klb    0.309560 6.025260e-40       0.045717
    3       Tmem206    0.265787 4.237692e-29       0.099139
    4        Notch2    0.257026 2.741812e-27       0.073804
    5        Sorcs2    0.235934 5.877863e-23       0.037468
    6       Rasgrf2    0.235511 6.661376e-23       0.025164
    7         Epha4    0.235026 7.813431e-23       0.052760
    8        Megf11    0.228079 1.824037e-21       0.007886
    9          Sgce    0.224234 9.184396e-21       0.052389
   10         Kif2c    0.223713 1.097454e-20       0.083291
   11         Sel1l    0.223152 1.274883e-20       0.055911
   12          Bst2    0.222681 1.434448e-20       0.190410
   13          Ctsb    0.219757 4.981190e-20      -0.068126
   14        Gabrb3    0.217725 1.071893e-19       0.055429
   15        Tmem51    0.214352 4.378911e-19       0.109053
   16       Fam234a    0.214279 4.378911e-19       0.044228
   17 1700011H14Rik    0.212987 7.086263e-19       0.285092
   18          Gch1    0.211215 1.369510e-18       0.472799
   19         Cdh11    0.211206 1.369510e-18       0.021027
   20       Zcchc16    0.210129 2.081817e-18       0.022147

── Velocity-fate drivers: Beta (top 20, sorted by Spearman r) ──
   Significant (FDR < 0.05): 924 / 2000
 rank    gene  spearman_r      pval_adj  mean_velocity
    1   Gng12    0.687204 4.410440e-259       0.364720
    2    Nnat    0.675808 1.430221e-247       4.051527
    3    Sdk2    0.571355 3.286960e-160       0.011898
    4   Dact2    0.559535 2.499373e-152       0.045325
    5   Ap1s2    0.549641 5.731258e-146       0.087439
    6   Vat1l    0.548981 1.264673e-145       0.064852
    7  Sphkap    0.541680 4.498258e-141       0.119721
    8    Mxd4    0.515216 2.594995e-125      -0.069785
    9  Slc2a2    0.511380 2.850628e-123       0.166863
   10    Nol4    0.506439 1.524017e-120       0.096487
   11 Shroom3    0.501075 1.259699e-117      -0.008355
   12  Zbtb7c    0.497687 7.505370e-116       0.024958
   13    Ins2    0.488525 5.173891e-111       3.689225
   14 Igfbpl1    0.474913 3.992365e-104       0.268164
   15 Slc7a14    0.461875  7.903311e-98       0.072172
   16  Ptprz1    0.459658  8.304841e-97       0.021035
   17   Ccnd2    0.458424  3.072975e-96       0.365967
   18      Gc    0.455471  7.272239e-95       0.137919
   19     Ghr    0.454304  2.464647e-94       0.248989
   20 Slc30a8    0.449188  5.200257e-92       0.450795

── Velocity-fate drivers: Delta (top 20, sorted by Spearman r) ──
   Significant (FDR < 0.05): 685 / 2000
 rank     gene  spearman_r     pval_adj  mean_velocity
    1   Zbtb20    0.303609 1.783641e-38       0.166360
    2   Fndc3a    0.293026 9.165943e-36      -0.005701
    3    Pbld2    0.258827 1.428849e-27       0.038566
    4    Vat1l    0.252901 2.293744e-26       0.065950
    5   Trim35    0.250792 5.950762e-26       0.184455
    6     Hap1    0.248696 1.537300e-25       0.031227
    7      Arx    0.245404 7.175120e-25      -0.018569
    8     Asb4    0.236969 3.565400e-23      -0.003783
    9   Vps37a    0.227608 2.126823e-21      -0.083059
   10     Chd3    0.223959 9.343843e-21      -0.079594
   11   Snap25    0.222951 1.330732e-20       0.048876
   12     Mia3    0.221878 2.046525e-20       0.047561
   13     Ufm1    0.217722 1.073379e-19      -0.142988
   14  Stard10    0.214950 3.289204e-19      -0.088858
   15 BC048546    0.214442 3.949377e-19      -0.140049
   16     Pon2    0.208136 4.756439e-18      -0.013737
   17     Tmx1    0.203896 2.536844e-17      -0.220162
   18      Cpm    0.202291 4.706904e-17       0.038128
   19     Gcsh    0.201279 6.868679e-17      -0.086339
   20     Hagh    0.198569 1.954275e-16       0.038047

── Velocity-fate drivers: Epsilon (top 20, sorted by Spearman r) ──
   Significant (FDR < 0.05): 924 / 2000
 rank    gene  spearman_r      pval_adj  mean_velocity
    1  Rab27a    0.514252 8.184805e-125       0.055813
    2    Pdk2    0.512925 4.186366e-124       0.051594
    3  Sorcs1    0.499450 8.951898e-117       0.005206
    4  Il17re    0.485971 1.042002e-109      -0.002558
    5   Auts2    0.484087 9.271443e-109       0.256760
    6   Pycr2    0.474232 8.299458e-104      -0.452137
    7  Nusap1    0.467321 2.024908e-100       0.014434
    8    Mnx1    0.465603  1.321285e-99      -0.004751
    9    Chga    0.460968  2.055227e-97      -2.422998
   10  Npepl1    0.449508  3.960569e-92      -0.043343
   11   Gm609    0.449416  4.222258e-92       0.024659
   12 Efcab10    0.448844  7.240713e-92      -0.045064
   13   Gsto1    0.445875  1.588429e-90      -0.086271
   14   Igf1r    0.445424  2.468092e-90      -0.019749
   15  Eps8l2    0.442219  6.679705e-89       0.126253
   16   Kcnb2    0.441447  1.439386e-88       0.029580
   17 Cbfa2t2    0.432747  9.568963e-85       0.159800
   18 Fam213a    0.432126  1.687532e-84       0.043521
   19   Josd2    0.428882  4.017044e-83       0.046612
   20  Kif20a    0.428003  9.327305e-83       0.081612
Top velocity-fate drivers (top 5 per fate):

	fate	gene	spearman_r	pval_adj	mean_velocity	delta_velocity
0	Alpha	Adam23	0.332275	2.726568e-46	0.035280	0.026601
1	Alpha	Klb	0.309560	6.025260e-40	0.045717	0.037019
2	Alpha	Tmem206	0.265787	4.237692e-29	0.099139	0.149552
3	Alpha	Notch2	0.257026	2.741812e-27	0.073804	0.069658
4	Alpha	Sorcs2	0.235934	5.877863e-23	0.037468	0.026698
5	Beta	Gng12	0.687204	4.410440e-259	0.364720	0.065286
6	Beta	Nnat	0.675808	1.430221e-247	4.051527	3.631306
7	Beta	Sdk2	0.571355	3.286960e-160	0.011898	0.011638
8	Beta	Dact2	0.559535	2.499373e-152	0.045325	0.018165
9	Beta	Ap1s2	0.549641	5.731258e-146	0.087439	0.024644
10	Delta	Zbtb20	0.303609	1.783641e-38	0.166360	-0.036171
11	Delta	Fndc3a	0.293026	9.165943e-36	-0.005701	-0.049914
12	Delta	Pbld2	0.258827	1.428849e-27	0.038566	0.025424
13	Delta	Vat1l	0.252901	2.293744e-26	0.065950	0.013837
14	Delta	Trim35	0.250792	5.950762e-26	0.184455	0.063726
15	Epsilon	Rab27a	0.514252	8.184805e-125	0.055813	0.007661
16	Epsilon	Pdk2	0.512925	4.186366e-124	0.051594	0.072639
17	Epsilon	Sorcs1	0.499450	8.951898e-117	0.005206	0.004127
18	Epsilon	Il17re	0.485971	1.042002e-109	-0.002558	-0.027604
19	Epsilon	Auts2	0.484087	9.271443e-109	0.256760	0.302319

12. Pathway enrichment¶

get_enrichment() runs over-representation analysis (ORA) on the DEG driver genes for each fate using gseapy (KEGG, GO Biological Process, Reactome).

Requires an internet connection and a valid organism code.

[31]:

# Pathway enrichment per fate
# Note: requires gseapy and internet access for Enrichr API
# Results will be empty with synthetic data (too few significant DEGs)

try:
    enrichment = scorer.get_enrichment(
        deg_drivers,
        organism="mouse",     # "mouse" or "human"
        pval_threshold=0.05,
        logfc_threshold=0.25,
        plot=True,            # dot plots saved per fate
        n_top_pathways=15,
    )

    # Single compact summary: top 3 pathways per (fate, direction)
    enrich_df = _stack_enrichment(enrichment, n_per_group=3)
    if not enrich_df.empty:
        display(enrich_df[["fate", "direction", "Gene_set", "Term", "Overlap", "Adjusted P-value"]])
    else:
        print("(No enriched terms — typical with small / synthetic datasets.)")

except ImportError:
    print("gseapy not installed. Run: pip install gseapy")
except Exception as e:
    print(f"Enrichment skipped: {e}")
    print("(Expected with synthetic data — use real data for meaningful results.)")


============================================================
  Pathway enrichment: Alpha
  Gene sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'Reactome_2022']
  Up-regulated genes  : 376
  Down-regulated genes: 385
============================================================

  [up] Significant terms: 121
       Gene_set                                                       Term Overlap  Adjusted P-value
KEGG_2019_Mouse                Protein processing in endoplasmic reticulum  17/163          0.000003
KEGG_2019_Mouse                                  Thyroid hormone synthesis   10/73          0.000127
KEGG_2019_Mouse                                                   Lysosome  12/124          0.000313
KEGG_2019_Mouse                                     Gastric acid secretion    9/74          0.000600
  Reactome_2022   Response To Elevated Platelet Cytosolic Ca2+ R-HSA-76005  13/130          0.000783
KEGG_2019_Mouse                                      Circadian entrainment   10/99          0.000825
  Reactome_2022                        Platelet Degranulation R-HSA-114608  12/125          0.000966
  Reactome_2022                                    Hemostasis R-HSA-109582  28/576          0.000966
  Reactome_2022                     G-protein Mediated Events R-HSA-112040    8/52          0.000966
  Reactome_2022 Platelet Activation, Signaling And Aggregation R-HSA-76002  17/254          0.001008

  [down] Significant terms: 252
                  Gene_set                                                                                  Term Overlap  Adjusted P-value
GO_Biological_Process_2021                            cotranslational protein targeting to membrane (GO:0006613)   24/94      1.627570e-17
GO_Biological_Process_2021                                                  cytoplasmic translation (GO:0002181)   24/93      1.627570e-17
GO_Biological_Process_2021              SRP-dependent cotranslational protein targeting to membrane (GO:0006614)   23/90      6.790532e-17
             Reactome_2022                                        Eukaryotic Translation Elongation R-HSA-156842   22/90      5.465181e-16
             Reactome_2022                                        Eukaryotic Translation Termination R-HSA-72764   22/90      5.465181e-16
             Reactome_2022                                                 Peptide Chain Elongation R-HSA-156902   22/86      5.465181e-16
             Reactome_2022 Nonsense Mediated Decay (NMD) Independent Of Exon Junction Complex (EJC) R-HSA-975956   22/92      6.859847e-16
GO_Biological_Process_2021      nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184)  24/113      8.686504e-16
GO_Biological_Process_2021                                                  protein targeting to ER (GO:0045047)  23/103      1.101897e-15
             Reactome_2022                                                   Viral mRNA Translation R-HSA-192823   21/90      4.696474e-15


============================================================
  Pathway enrichment: Beta
  Gene sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'Reactome_2022']
  Up-regulated genes  : 422
  Down-regulated genes: 408
============================================================

  [up] Significant terms: 121
                  Gene_set                                                     Term Overlap  Adjusted P-value
           KEGG_2019_Mouse              Protein processing in endoplasmic reticulum  23/163      1.407868e-10
GO_Biological_Process_2021     IRE1-mediated unfolded protein response (GO:0036498)   12/53      1.863342e-06
           KEGG_2019_Mouse                                        Insulin secretion   13/86      3.415612e-06
             Reactome_2022 Amino Acid Transport Across Plasma Membrane R-HSA-352230    9/33      1.321921e-05
             Reactome_2022              IRE1alpha Activates Chaperones R-HSA-381070   10/48      1.321921e-05
             Reactome_2022             Unfolded Protein Response (UPR) R-HSA-381119   13/89      1.321921e-05
             Reactome_2022            Integration Of Energy Metabolism R-HSA-163685  13/105      6.483513e-05
             Reactome_2022           XBP1(S) Activates Chaperone Genes R-HSA-381038    9/46      6.628341e-05
             Reactome_2022             Regulation Of Insulin Secretion R-HSA-422356   11/77      8.152971e-05
GO_Biological_Process_2021                        amino acid transport (GO:0006865)   10/50      8.704602e-05

  [down] Significant terms: 245
                  Gene_set                                                                                  Term Overlap  Adjusted P-value
GO_Biological_Process_2021                                                  cytoplasmic translation (GO:0002181)   28/93      4.333388e-22
GO_Biological_Process_2021                            cotranslational protein targeting to membrane (GO:0006613)   28/94      4.333388e-22
GO_Biological_Process_2021              SRP-dependent cotranslational protein targeting to membrane (GO:0006614)   27/90      1.679608e-21
             Reactome_2022                                                 Peptide Chain Elongation R-HSA-156902   26/86      1.148227e-20
             Reactome_2022                                        Eukaryotic Translation Elongation R-HSA-156842   26/90      1.428872e-20
             Reactome_2022                                        Eukaryotic Translation Termination R-HSA-72764   26/90      1.428872e-20
             Reactome_2022 Nonsense Mediated Decay (NMD) Independent Of Exon Junction Complex (EJC) R-HSA-975956   26/92      2.014986e-20
GO_Biological_Process_2021      nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184)  28/113      5.547475e-20
GO_Biological_Process_2021                                                  protein targeting to ER (GO:0045047)  27/103      5.547475e-20
             Reactome_2022                                                   Viral mRNA Translation R-HSA-192823   25/90      1.461345e-19


============================================================
  Pathway enrichment: Delta
  Gene sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'Reactome_2022']
  Up-regulated genes  : 156
  Down-regulated genes: 230
============================================================

  [up] Significant terms: 41
                  Gene_set                                                                        Term Overlap  Adjusted P-value
           KEGG_2019_Mouse                                 Protein processing in endoplasmic reticulum  14/163      6.690234e-09
           KEGG_2019_Mouse                                                   Thyroid hormone synthesis    7/73      1.429359e-04
             Reactome_2022                                         Metabolism Of Proteins R-HSA-392499 35/1890      3.776032e-04
             Reactome_2022                                 IRE1alpha Activates Chaperones R-HSA-381070    5/48      3.289099e-03
             Reactome_2022                Incretin Synthesis, Secretion, And Inactivation R-HSA-400508    4/24      3.289099e-03
             Reactome_2022   SRP-dependent Cotranslational Protein Targeting To Membrane R-HSA-1799339   7/108      3.289099e-03
           KEGG_2019_Mouse                                                              Protein export    4/28      3.697039e-03
GO_Biological_Process_2021                                regulation of insulin secretion (GO:0050796)   7/104      4.368748e-03
GO_Biological_Process_2021 posttranslational protein targeting to membrane, translocation (GO:0031204)     3/7      4.368748e-03
GO_Biological_Process_2021                                 protein N-linked glycosylation (GO:0006487)    6/67      4.368748e-03


============================================================
  Pathway enrichment: Epsilon
  Gene sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'Reactome_2022']
  Up-regulated genes  : 266
  Down-regulated genes: 252
============================================================

  [up] Significant terms: 47
                  Gene_set                                                                                      Term Overlap  Adjusted P-value
           KEGG_2019_Mouse                                               Protein processing in endoplasmic reticulum  17/163      1.392463e-08
             Reactome_2022                                                  Peptide Hormone Metabolism R-HSA-2980736   10/89      1.607304e-04
             Reactome_2022 Aberrant Regulation Of Mitotic G1/S Transition In Cancer Due To RB1 Defects R-HSA-9659787    5/17      4.275598e-04
             Reactome_2022                                               IRE1alpha Activates Chaperones R-HSA-381070    7/48      4.275598e-04
             Reactome_2022                                              Unfolded Protein Response (UPR) R-HSA-381119    9/89      4.275598e-04
           KEGG_2019_Mouse                                                                                  Lysosome  10/124      6.781878e-04
             Reactome_2022                                                       Platelet Degranulation R-HSA-114608  10/125      7.374527e-04
GO_Biological_Process_2021                                      IRE1-mediated unfolded protein response (GO:0036498)    8/53      8.563951e-04
             Reactome_2022                                  Response To Elevated Platelet Cytosolic Ca2+ R-HSA-76005  10/130      8.709228e-04
             Reactome_2022                              Incretin Synthesis, Secretion, And Inactivation R-HSA-400508    5/24      1.076303e-03

  [down] Significant terms: 361
                  Gene_set                                                                                  Term Overlap  Adjusted P-value
GO_Biological_Process_2021                            cotranslational protein targeting to membrane (GO:0006613)   27/94      1.419635e-26
GO_Biological_Process_2021                                                  cytoplasmic translation (GO:0002181)   27/93      1.419635e-26
GO_Biological_Process_2021              SRP-dependent cotranslational protein targeting to membrane (GO:0006614)   26/90      9.147683e-26
             Reactome_2022                                                 Peptide Chain Elongation R-HSA-156902   25/86      1.231498e-24
             Reactome_2022                                        Eukaryotic Translation Elongation R-HSA-156842   25/90      1.482618e-24
             Reactome_2022                                        Eukaryotic Translation Termination R-HSA-72764   25/90      1.482618e-24
GO_Biological_Process_2021      nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (GO:0000184)  27/113      1.781313e-24
             Reactome_2022 Nonsense Mediated Decay (NMD) Independent Of Exon Junction Complex (EJC) R-HSA-975956   25/92      2.059459e-24
GO_Biological_Process_2021                                                  protein targeting to ER (GO:0045047)  26/103      2.772652e-24
             Reactome_2022                                                   Viral mRNA Translation R-HSA-192823   24/90      2.428940e-23

	fate	direction	Gene_set	Term	Overlap	Adjusted P-value
0	Alpha	up	KEGG_2019_Mouse	Protein processing in endoplasmic reticulum	17/163	3.014548e-06
1	Alpha	up	KEGG_2019_Mouse	Thyroid hormone synthesis	10/73	1.267707e-04
2	Alpha	up	KEGG_2019_Mouse	Lysosome	12/124	3.133244e-04
3	Alpha	down	GO_Biological_Process_2021	cotranslational protein targeting to membrane ...	24/94	1.627570e-17
4	Alpha	down	GO_Biological_Process_2021	cytoplasmic translation (GO:0002181)	24/93	1.627570e-17
...	...	...	...	...	...	...
16	Epsilon	up	Reactome_2022	Peptide Hormone Metabolism R-HSA-2980736	10/89	1.607304e-04
17	Epsilon	up	Reactome_2022	Aberrant Regulation Of Mitotic G1/S Transition...	5/17	4.275598e-04
18	Epsilon	down	GO_Biological_Process_2021	cotranslational protein targeting to membrane ...	27/94	1.419635e-26
19	Epsilon	down	GO_Biological_Process_2021	cytoplasmic translation (GO:0002181)	27/93	1.419635e-26
20	Epsilon	down	GO_Biological_Process_2021	SRP-dependent cotranslational protein targetin...	26/90	9.147683e-26

21 rows × 6 columns

[32]:

# ── Custom gene sets ──────────────────────────────────────────────────────
# You can specify any Enrichr library:
# https://maayanlab.cloud/Enrichr/#libraries

custom_gene_sets = [
    'KEGG_2019_Mouse',
    'GO_Biological_Process_2021',
    'GO_Molecular_Function_2021',
    'Reactome_2022',
    'WikiPathway_2021_Mouse',
]

# enrichment = scorer.get_enrichment(
#     deg_drivers,
#     gene_sets=custom_gene_sets,
#     organism='mouse',
# )
print("Custom gene sets configured (commented out to avoid API call with synthetic data).")
print(f"Available sets: {custom_gene_sets}")

Custom gene sets configured (commented out to avoid API call with synthetic data).
Available sets: ['KEGG_2019_Mouse', 'GO_Biological_Process_2021', 'GO_Molecular_Function_2021', 'Reactome_2022', 'WikiPathway_2021_Mouse']

[33]:

# Export enrichment tables to CSV
try:
    scCS.export_enrichment_tables(enrichment, output_dir="enrichment_results/")
    print("Tables saved to enrichment_results/")
except Exception as e:
    print(f"Export skipped: {e}")

Saved: enrichment_results/enrichment_Alpha_up.csv
Saved: enrichment_results/enrichment_Alpha_down.csv
Saved: enrichment_results/enrichment_Beta_up.csv
Saved: enrichment_results/enrichment_Beta_down.csv
Saved: enrichment_results/enrichment_Delta_up.csv
Saved: enrichment_results/enrichment_Epsilon_up.csv
Saved: enrichment_results/enrichment_Epsilon_down.csv
Tables saved to enrichment_results/

13. Expression trends along fate arms¶

plot_expression_trends() shows how gene expression changes along each fate arm, ordered by a differentiation metric.

The x_axis parameter controls the ordering:

'affinity' (default) — per-cell fate affinity score
'pseudotime' — subset-local pseudotime
'radial_distance' — distance from origin in the star embedding

[34]:

# Pick a few marker genes
genes_of_interest = ["Ins1", "Gcg", "Sst", "Ppy"]  # Beta, Alpha, Delta, Epsilon markers

# Expression trends ordered by fate affinity
# Signature: plot_expression_trends(adata_sub, result, genes, x_axis=...)
try:
    fig = scCS.plot_expression_trends(
        scorer.adata_sub,
        result,
        genes=genes_of_interest,
        x_axis="affinity",   # or 'pseudotime' or 'radial_distance'
        figsize=(12, 8),
    )
    plt.tight_layout()
    plt.show()
except Exception as e:
    print(f"Expression trends skipped: {e}")

[35]:

# Ordered by pseudotime
try:
    fig = scCS.plot_expression_trends(
        scorer.adata_sub,
        result,
        genes=genes_of_interest,
        x_axis="pseudotime",
        figsize=(12, 8),
    )
    plt.tight_layout()
    plt.show()
except Exception as e:
    print(f"Expression trends skipped: {e}")

[36]:

obs_key = "clusters"   # <-- change to your obs column

color_map = dict(zip(
    adata.obs[obs_key].cat.categories,
    adata.uns[f"{obs_key}_colors"],
))
print(color_map)

{'Ductal': '#8fbc8f', 'Ngn3 low EP': '#f4a460', 'Ngn3 high EP': '#fdbf6f', 'Pre-endocrine': '#ff7f00', 'Beta': '#b2df8a', 'Alpha': '#1f78b4', 'Delta': '#6a3d9a', 'Epsilon': '#cab2d6'}

[37]:

for fate_name in result.fate_names:
    fig = scCS.plot_expression_trends(
        adata, result,
        genes=genes_of_interest,
        fate=fate_name,
        color_map=color_map,
        ncols=4,
    )
    fig.suptitle(f"Expression trends — {fate_name}", y=1.02)
    fig.savefig(f"expression_trends_{fate_name}.svg", bbox_inches="tight")

[38]:

# CellRank-style: cells binned by per-cell CS for a chosen fate,
# mean expression per bin plotted with LOWESS smooth.
# Requires compute_cell_level=True (Cell 2 above).

genes_of_interest = ["Ins2"]   # <-- your genes

fig = scCS.plot_expression_trends(
    adata,
    result,
    genes=genes_of_interest,
    fate=result.dominant_fate,   # or e.g. fate="Activated"
    n_bins=60,                   # bins along CS axis
    layer=None,             # or None to use adata.X
    smooth=True,
    smooth_frac=0.4,             # LOWESS bandwidth (0–1)
    color_map=color_map,
    ncols=2,
)
plt.show()

[39]:

fig = scCS.plot_expression_trends(
    adata,
    result,
    genes=genes_of_interest,
    fate=result.dominant_fate,   # or e.g. fate="Activated"
    n_bins=60,                   # bins along CS axis
    layer=None,             # or None to use adata.X
    smooth=True,
    smooth_frac=0.4,             # LOWESS bandwidth (0–1)
    color_map=color_map,
    ncols=2, x_axis="pseudotime",
)
plt.show()

Summary — scCS single-condition workflow¶

import scCS

# 1. Create scorer
scorer = scCS.SingleScorer(
    adata,
    root="Ductal",
    branches=["Alpha", "Beta", "Delta", "Epsilon"],
    obs_key="clusters",
)

# 2. Build embedding (fix arm coverage)
scorer.build_embedding(ordering_metric="pseudotime")
scorer.refit_pseudotime(scale_01=True)

# 3. Fit & score
scorer.fit()
result = scorer.score(cell_level=True, k_nn=15, n_bootstrap=500)
print(result.summary())

# 4. Visualize
scorer.plot_star(result)
scorer.plot_rose(result)
scorer.plot_pairwise_cs(result)
scorer.plot_commitment_bar(result)

# 5. Transfer labels to full adata
scorer.transfer_labels(adata, result)

# 6. Driver genes & enrichment
vel_drivers = scorer.get_velocity_drivers()
deg_drivers = scorer.get_deg_drivers()
enrichment  = scorer.get_enrichment(deg_drivers, organism="Human")

For multi-condition analysis, see the companion notebooks