Mathematical Framework
This page provides full mathematical derivations for all scCS scoring metrics and statistical tests. For practical usage, see the tutorial notebooks.
Radial Star Embedding
Given a progenitor cluster (root) and k terminal fate clusters (branches), scCS constructs a 2D radial star embedding where:
The progenitor population is placed at the origin.
Each fate arm \(j\) extends from the origin at angle:
\[ \theta_j = \frac{2\pi j}{k}, \quad j = 0, \ldots, k-1 \]
for equal spacing, or at the centroid-derived angle for centroid-based sector assignment.
Cells are positioned along their assigned arm \(j\), at angle \(\theta_j\), at a radial distance proportional to their differentiation metric (pseudotime, CytoTRACE2, or any custom score):
\[ (x_i, y_i) = s \, m_i \, (\cos\theta_j, \sin\theta_j) \]
where \(m_i \in [0, 1]\) is the scaled metric value and \(s\) is the arm scale parameter (default: 10.0).
The resulting 2D coordinates are stored in adata.obsm['X_sccs'].
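As a rough illustration of the placement rule, the following NumPy sketch puts cells on \(k\) equally spaced arms at radius \(s \cdot m_i\). The function name `star_coordinates`, the zero-based arm indexing, and the choice of arm 0 at angle 0 are assumptions made for this example and are not taken from the scCS source.

```python
import numpy as np

def star_coordinates(arm_index, metric, k, scale=10.0):
    """Hypothetical helper: place cells on k equally spaced arms."""
    arm_index = np.asarray(arm_index)
    metric = np.asarray(metric, dtype=float)   # assumed already scaled to [0, 1]
    theta = 2.0 * np.pi * arm_index / k        # assumed convention: arm 0 at angle 0
    radius = scale * metric                    # radial distance s * m_i
    return np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

# Four cells on three arms; the last cell has metric 0 and sits at the origin.
coords = star_coordinates(arm_index=[0, 1, 2, 0], metric=[1.0, 0.5, 0.25, 0.0], k=3)
```

Under these assumptions, a cell with metric 0 lands on the origin regardless of its arm, which matches the placement of progenitors described above.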
Velocity Projection
Per-cell RNA velocity vectors \((v_x^{\text{orig}}, v_y^{\text{orig}})\) from scVelo are projected into the star embedding coordinate system:
where \(w_{xj}, w_{yj}\) are the projection weights derived from the PCA loadings of the star embedding space.
Three projection strategies are available:
PCA projection — project velocity through the same PCA loadings used to construct the star embedding.
Delta expression — compute velocity as the difference between spliced and inferred full (spliced + unspliced) expression, then project.
Cosine similarity — compute per-cell velocity direction as cosine similarity with each arm direction.
Commitment Scores
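Magnitude (Eq. 1):
\[ \text{magnitude}_i = \sqrt{v_{x,i}^2 + v_{y,i}^2} \]
where \((v_{x,i}, v_{y,i})\) is the projected velocity of cell \(i\) in the star embedding.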
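Angle (Eq. 2–3):
\[ \theta_i = \operatorname{atan2}(v_{y,i}, v_{x,i}) \]
\[ \theta_i^{\circ} = \left(\theta_i \cdot \frac{180}{\pi}\right) \bmod 360 \]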
Angular binning (Eq. 4–6):
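Angles are binned into \(N\) equal sectors of width \(360°/N\) (default \(N = 36\), i.e., 10° per bin). Each cell contributes its magnitude to the corresponding bin:
\[ b_i = \left\lfloor \frac{\theta_i^{\circ}}{360°/N} \right\rfloor, \qquad M_b = \sum_{i \,:\, b_i = b} \text{magnitude}_i \]
where \(\theta_i^{\circ}\) is the velocity angle of cell \(i\) in degrees.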
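The magnitude, angle, and binning steps can be sketched in NumPy as follows. This is a minimal illustration, not the scCS implementation: the function name `bin_velocity` is hypothetical, and it assumes bin 0 starts at 0° with angles measured counter-clockwise from the positive x-axis.

```python
import numpy as np

def bin_velocity(vx, vy, n_bins=36):
    """Hypothetical helper: per-cell magnitude, angle in degrees, and binned magnitude."""
    vx = np.asarray(vx, dtype=float)
    vy = np.asarray(vy, dtype=float)
    magnitude = np.hypot(vx, vy)                           # Euclidean norm of each vector
    angle_deg = np.degrees(np.arctan2(vy, vx)) % 360.0     # map (-180, 180] to [0, 360)
    # The final modulo guards against an angle that rounds to exactly 360.0.
    bin_idx = np.floor(angle_deg / (360.0 / n_bins)).astype(int) % n_bins
    bin_magnitude = np.bincount(bin_idx, weights=magnitude, minlength=n_bins)
    return magnitude, angle_deg, bin_magnitude

# Four cells pointing along +x, +y, -x, -y with different speeds.
magnitude, angle_deg, bin_magnitude = bin_velocity([1, 0, -2, 0], [0, 1, 0, -3])
```

With the default 36 bins, these four cells fall into bins 0, 9, 18, and 27, and the binned magnitudes sum to the total magnitude across cells.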
Sector assignment:
Each fate arm \(j\) is assigned a set of bins (a sector). Two methods:
Equal sectors: each fate gets exactly \(N/k\) consecutive bins.
Centroid sectors: bins are assigned to the fate whose centroid direction is closest to the bin center.
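Sector magnitude:
\[ M_{\text{sector}}(j) = \sum_{b \in S_j} M_b \]
where \(S_j\) is the set of bins assigned to fate \(j\) and \(M_b\) is the total velocity magnitude in bin \(b\).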
Unnormalized Commitment Score (Eq. 8):
\[ \text{unCS}(i, j) = \frac{M_{\text{sector}}(i)}{M_{\text{sector}}(j)} \]
Values > 1 indicate stronger commitment to fate \(i\) than fate \(j\).
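A small sketch of equal-sector assignment and the resulting ratio score is given below. It assumes that sectors are laid out as consecutive blocks of bins starting at bin 0 and that the unnormalized score is the plain ratio of two sector magnitudes; both are assumptions for illustration, and the helper names are hypothetical.

```python
import numpy as np

def sector_magnitudes(bin_magnitude, k):
    """Hypothetical helper: sum binned magnitudes over k equal, consecutive sectors."""
    bin_magnitude = np.asarray(bin_magnitude, dtype=float)
    n_bins = bin_magnitude.size
    if n_bins % k != 0:
        raise ValueError("number of bins must be divisible by the number of fates")
    # Row j holds the N/k consecutive bins assumed to belong to fate j.
    return bin_magnitude.reshape(k, n_bins // k).sum(axis=1)

def uncs(sector_mag, i, j):
    """Ratio of sector magnitudes for fates i and j (assumed form of the score)."""
    return sector_mag[i] / sector_mag[j]

bins = np.zeros(36)
bins[2] = 6.0    # magnitude falling in the first sector (bins 0..17)
bins[20] = 3.0   # magnitude falling in the second sector (bins 18..35)
sector_mag = sector_magnitudes(bins, k=2)
score = uncs(sector_mag, 0, 1)
```

Here the first sector carries twice the magnitude of the second, so the ratio is 2, which would read as stronger commitment toward fate 0.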
Normalized Commitment Score (Eq. 9):
This corrects for differences in population size. The manuscript values are \(\text{unCS}(0,1) = 9.335\) and \(\text{nCS}(0,1) = 8.066\).
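Commitment vector:
\[ \mathbf{c} = (c_1, \ldots, c_k), \qquad c_j = \frac{M_{\text{sector}}(j)}{\sum_{l} M_{\text{sector}}(l)} \]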
This is a probability distribution over fates (sums to 1).
Per-Cell Fate Affinity
Per-cell fate affinity scores are computed from the cosine similarity between each cell’s velocity vector \(\vec{v}_i\) and the unit direction toward each fate centroid, shifted to [0, 1]:
\[ s_{ij} = \frac{1 + \cos(\vec{v}_i, \hat{d}_j)}{2} \]
where \(\hat{d}_j\) is the unit vector from the root centroid toward fate \(j\)’s centroid.
Magnitude weighting: Cells with near-zero velocity magnitude (typically progenitors at the origin) are blended toward the uniform distribution:
where \(w_i = 1\) if \(\text{magnitude}_i > q_{\alpha}\) (5th percentile threshold), else \(w_i = \text{magnitude}_i / q_{\alpha}\).
Row normalization: Scores are normalized to sum to 1 per cell:
\[ \tilde{s}_{ij} = \frac{s_{ij}}{\sum_{l=1}^{k} s_{il}} \]
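The cosine-shift and row-normalization steps can be illustrated with the NumPy sketch below. It deliberately omits the magnitude weighting, and the function name `fate_affinity` and the `eps` guard against division by zero are assumptions introduced for this example.

```python
import numpy as np

def fate_affinity(velocity, fate_dirs, eps=1e-12):
    """Hypothetical helper: shifted cosine similarity, row-normalized per cell."""
    velocity = np.asarray(velocity, dtype=float)     # shape (n_cells, 2)
    fate_dirs = np.asarray(fate_dirs, dtype=float)   # shape (k, 2), root-to-fate directions
    v_norm = np.linalg.norm(velocity, axis=1, keepdims=True)
    v_hat = velocity / np.maximum(v_norm, eps)       # zero vectors stay zero
    d_hat = fate_dirs / np.linalg.norm(fate_dirs, axis=1, keepdims=True)
    cosine = v_hat @ d_hat.T                         # values in [-1, 1]
    shifted = (1.0 + cosine) / 2.0                   # values in [0, 1]
    return shifted / shifted.sum(axis=1, keepdims=True)

# Two opposite fates; the first cell moves toward fate 0, the second is stationary.
affinity = fate_affinity(velocity=[[1.0, 0.0], [0.0, 0.0]],
                         fate_dirs=[[1.0, 0.0], [-1.0, 0.0]])
```

In this sketch a stationary cell has cosine 0 with every fate and therefore ends up with uniform affinity even without explicit magnitude weighting.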
Entropy Metrics
Population entropy:
\[ H_{\text{pop}} = -\sum_{j=1}^{k} p_j \log p_j \]
where \(p_j = M_{\text{sector}}(j) / \sum_l M_{\text{sector}}(l)\). Normalized to [0, 1] by dividing by \(\log(k)\).
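Mean cell entropy (primary metric):
\[ \bar{H}_{\text{cell}} = \frac{1}{n} \sum_{i=1}^{n} \frac{-\sum_{j=1}^{k} \tilde{s}_{ij} \log \tilde{s}_{ij}}{\log(k)} \]
where \(\tilde{s}_{ij}\) is the row-normalized affinity of cell \(i\) for fate \(j\) and \(n\) is the number of cells.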
Each cell’s entropy is normalized by \(\log(k)\) to [0, 1], then averaged.
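Both entropy variants reduce to the same normalized Shannon entropy applied to different probability vectors. The sketch below shows one way to compute it in NumPy, treating \(0 \log 0\) as 0; the function name `normalized_entropy` is hypothetical.

```python
import numpy as np

def normalized_entropy(p, axis=-1):
    """Hypothetical helper: Shannon entropy divided by log(k), with 0*log(0) = 0."""
    p = np.asarray(p, dtype=float)
    k = p.shape[axis]
    # Only take the log where p > 0; other entries keep the 0 from `out`.
    log_p = np.log(p, where=p > 0, out=np.zeros_like(p))
    return -(p * log_p).sum(axis=axis) / np.log(k)

# Population entropy from sector magnitudes (here: perfectly balanced fates).
sector_mag = np.array([4.0, 4.0])
population_entropy = normalized_entropy(sector_mag / sector_mag.sum())

# Mean cell entropy from row-normalized affinities.
affinity = np.array([[0.5, 0.5],    # fully undecided cell
                     [1.0, 0.0]])   # fully committed cell
mean_cell_entropy = normalized_entropy(affinity, axis=1).mean()
```

The example yields a population entropy of 1 (balanced fates) and a mean cell entropy of 0.5 (one undecided and one committed cell).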
Per-fate cell entropy:
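For each fate \(j\), compute the mean binary entropy of the affinity score \(s_{ij}\) treated as a Bernoulli distribution:
\[ H_j = \frac{1}{n} \sum_{i=1}^{n} h(s_{ij}) \]
where:
\[ h(p) = -p \log p - (1 - p) \log(1 - p) \]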
NN-smoothed per-cell entropy:
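For each cell \(i\), average the cell scores over its \(k_{\text{nn}}\) nearest neighbors \(\mathcal{N}(i)\) in the scCS embedding:
\[ \bar{s}_{ij} = \frac{1}{k_{\text{nn}}} \sum_{l \in \mathcal{N}(i)} s_{lj} \]
Then compute k-way Shannon entropy on the smoothed scores:
\[ H_i = -\sum_{j=1}^{k} \bar{s}_{ij} \log \bar{s}_{ij} \]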
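The smoothing step can be sketched with a brute-force neighbor search. This example assumes that each cell counts as its own nearest neighbor and that the entropy is normalized by \(\log(k)\); neither detail is confirmed by the text above, and the function name is hypothetical. For large datasets a dedicated neighbor index would replace the dense distance matrix.

```python
import numpy as np

def nn_smoothed_entropy(embedding, scores, k_nn):
    """Hypothetical helper: average scores over k_nn neighbors, then take entropy."""
    embedding = np.asarray(embedding, dtype=float)   # shape (n_cells, 2)
    scores = np.asarray(scores, dtype=float)         # shape (n_cells, k), rows sum to 1
    # Dense pairwise squared distances; fine for a small illustration only.
    d2 = ((embedding[:, None, :] - embedding[None, :, :]) ** 2).sum(axis=-1)
    nn_idx = np.argsort(d2, axis=1)[:, :k_nn]        # assumed: neighbor set includes the cell
    smoothed = scores[nn_idx].mean(axis=1)           # shape (n_cells, k)
    log_s = np.log(smoothed, where=smoothed > 0, out=np.zeros_like(smoothed))
    return -(smoothed * log_s).sum(axis=1) / np.log(scores.shape[1])

embedding = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
scores = np.array([[1.0, 0.0], [0.0, 1.0],   # neighbors that disagree
                   [1.0, 0.0], [1.0, 0.0]])  # neighbors that agree
entropy = nn_smoothed_entropy(embedding, scores, k_nn=2)
```

The first pair of cells disagrees on fate, so smoothing produces maximal entropy there, while the second pair agrees and stays at zero.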
Statistical Framework
Pairwise comparison (PairScorer)
Permutation test (default for k=2 conditions):
For each fate arm, test whether per-cell affinity scores differ between conditions A and B:
Compute observed mean difference: \(\Delta_0 = \bar{s}_A - \bar{s}_B\)
For \(b = 1, \ldots, B\) (default \(B = 1000\)): shuffle the condition labels and recompute the mean difference \(\Delta_b\).
Empirical p-value: \(p = \frac{\#(|\Delta_b| \geq |\Delta_0|) + 1}{B + 1}\)
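The three steps above can be sketched as a short NumPy routine. The function name `permutation_pvalue` and the fixed seed are choices made for this illustration; the two-sided comparison and the \(+1\) correction follow the formula given above.

```python
import numpy as np

def permutation_pvalue(scores_a, scores_b, n_perm=1000, seed=0):
    """Hypothetical helper: two-sided permutation test on the difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    pooled = np.concatenate([a, b])
    observed = a.mean() - b.mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)               # shuffle condition labels
        delta = perm[:a.size].mean() - perm[a.size:].mean()
        count += abs(delta) >= abs(observed)
    return observed, (count + 1) / (n_perm + 1)      # +1 avoids a p-value of exactly 0

rng = np.random.default_rng(1)
cond_a = 0.8 + 0.05 * rng.standard_normal(20)        # synthetic affinities, condition A
cond_b = 0.2 + 0.05 * rng.standard_normal(20)        # synthetic affinities, condition B
delta_0, p_value = permutation_pvalue(cond_a, cond_b)
_, p_same = permutation_pvalue(cond_a, cond_a)       # identical groups
```

Because of the \(+1\) correction the smallest attainable p-value is \(1/(B+1)\), so \(B\) bounds the resolution of the test.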
Delta-CS with bootstrap CI:
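The bootstrap CI is obtained by resampling cells within each condition \(B\) times (default 500) and computing the empirical \((1-\alpha)/2\) and \((1+\alpha)/2\) quantiles, where \(\alpha\) denotes the confidence level (for example, \(\alpha = 0.95\) gives the 0.025 and 0.975 quantiles).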
Multi-comparison (MultiScorer)
Omnibus tests:
For each fate arm, test whether per-cell affinity scores differ across ALL conditions simultaneously.
Kruskal-Wallis H test (non-parametric, recommended):
\[ H = \frac{12}{N(N+1)} \sum_{g} n_g \bar{R}_g^2 - 3(N+1) \]
where \(N\) is total cells, \(n_g\) is cells in group \(g\), and \(\bar{R}_g\) is the mean rank of group \(g\).
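For intuition, the H statistic can be computed by hand from pooled ranks. The sketch below assumes there are no tied values, so it skips the tie correction; in practice a library routine such as `scipy.stats.kruskal` handles ties and also returns a p-value. The function name `kruskal_wallis_h` is hypothetical.

```python
import numpy as np

def kruskal_wallis_h(groups):
    """Hypothetical helper: Kruskal-Wallis H without tie correction (assumes no ties)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    values = np.concatenate(groups)
    n_total = values.size
    ranks = np.empty(n_total)
    ranks[np.argsort(values)] = np.arange(1, n_total + 1)   # ranks 1..N over pooled data
    weighted = 0.0
    start = 0
    for g in groups:
        mean_rank = ranks[start:start + g.size].mean()      # mean rank of this group
        weighted += g.size * mean_rank ** 2
        start += g.size
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

# Three perfectly separated groups of three observations each.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For this toy input the mean ranks are 2, 5, and 8, giving \(H = 7.2\).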
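One-way ANOVA (parametric):
\[ F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} = \frac{\sum_{g} n_g (\bar{y}_g - \bar{y})^2 / (G - 1)}{\sum_{g} \sum_{i \in g} (y_i - \bar{y}_g)^2 / (N - G)} \]
where \(G\) is the number of conditions, \(\bar{y}_g\) is the mean score in group \(g\), and \(\bar{y}\) is the overall mean.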
Post-hoc pairwise comparisons:
These are only meaningful after an omnibus test rejects \(H_0\).
Dunn’s test (non-parametric, recommended with Kruskal-Wallis):
Uses rank-based pairwise comparisons with multiple testing correction.
Implemented via scikit-posthocs.
Tukey HSD (parametric, for balanced designs):
Conover-Iman test (more powerful than Dunn, non-parametric):
Uses rank-based pairwise t-statistics with multiple testing correction.
Implemented via scikit-posthocs.
Multiple testing correction:
Benjamini-Hochberg FDR:
Sort p-values: \(p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m)}\)
Adjusted p-value: \(p_{(i)}^{\text{adj}} = \min\left(\frac{m \cdot p_{(i)}}{i}, 1\right)\)
Enforce monotonicity from right to left.
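The Benjamini-Hochberg steps above can be sketched in a few lines of NumPy. The function name `bh_adjust` is hypothetical; established implementations exist in statistics libraries and would normally be preferred.

```python
import numpy as np

def bh_adjust(pvals):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                              # ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)        # m * p_(i) / i
    # Running minimum from the largest p-value down enforces monotonicity.
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adj_sorted, 1.0)      # cap at 1, restore input order
    return adjusted

bh = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

For this input the adjusted values are 0.02, 0.04, 0.04, and 0.02, returned in the original order of the p-values.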
Bonferroni:
\[ p_i^{\text{adj}} = \min(m \cdot p_i, 1) \]
Holm-Bonferroni step-down:
Sort p-values ascending.
Adjusted p-value: \(p_{(i)}^{\text{adj}} = \max\left(p_{(i-1)}^{\text{adj}}, \min\left((m - i + 1) \cdot p_{(i)}, 1\right)\right)\), with \(p_{(0)}^{\text{adj}} = 0\).
Enforce monotonicity from left to right.
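A matching sketch for the Holm-Bonferroni step-down procedure is shown below, using a running maximum to enforce monotonicity from the smallest p-value upward. The function name `holm_adjust` is hypothetical.

```python
import numpy as np

def holm_adjust(pvals):
    """Hypothetical helper: Holm-Bonferroni adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                              # ascending p-values
    scaled = (m - np.arange(m)) * p[order]             # (m - i + 1) * p_(i) for i = 1..m
    adj_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adj_sorted                       # restore input order
    return adjusted

holm = holm_adjust([0.01, 0.04, 0.03, 0.005])
```

For the same four p-values used to illustrate Benjamini-Hochberg, Holm gives 0.03, 0.06, 0.06, and 0.02, which is more conservative, as expected for a family-wise error rate procedure.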
Mixed-effects models
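Linear mixed model (PairScorer + MultiScorer):
\[ y_{ij} = \beta_0 + \beta_{\text{condition}} + u_{\text{sample}} + \epsilon_{ij} \]
Here \(\beta_0\) is the intercept and \(\beta_{\text{condition}}\) is the fixed effect of the condition that cell \(i\) belongs to.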
where \(y_{ij}\) is the affinity score of cell \(i\) for fate \(j\), \(u_{\text{sample}} \sim N(0, \sigma^2_u)\) is the random intercept for biological replicate, and \(\epsilon_{ij} \sim N(0, \sigma^2)\).
Contrast testing (MultiScorer):
For a contrast between conditions A and B:
Trajectory shift
For each fate arm, test whether pseudotime distributions differ across conditions:
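Kolmogorov-Smirnov test:
\[ D = \sup_{t} \left| F_A(t) - F_B(t) \right| \]
where \(F_A\) and \(F_B\) are the empirical cumulative distribution functions of pseudotime in conditions A and B.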
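Wasserstein distance:
\[ W_1 = \int_{-\infty}^{\infty} \left| F_A(t) - F_B(t) \right| \, dt \]
where \(F_A\) and \(F_B\) are the empirical cumulative distribution functions of pseudotime in conditions A and B.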
Bootstrap CI on \(W_1\) obtained by resampling cells within each condition.
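Both trajectory-shift statistics can be read off the two empirical CDFs. The sketch below evaluates them on the pooled sample points; the function name `ks_and_w1` is hypothetical, and library routines such as `scipy.stats.ks_2samp` and `scipy.stats.wasserstein_distance` would normally be used instead.

```python
import numpy as np

def ks_and_w1(x, y):
    """Hypothetical helper: two-sample KS statistic and 1-Wasserstein distance."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.sort(np.concatenate([x, y]))             # all jump points of both ECDFs
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    gap = np.abs(cdf_x - cdf_y)
    ks_stat = gap.max()                                # largest vertical ECDF gap
    # ECDFs are piecewise constant, so the integral is a sum of rectangles.
    w1 = np.sum(gap[:-1] * np.diff(grid))
    return ks_stat, w1

# Pseudotime in condition B is condition A shifted by one unit.
ks_shift, w1_shift = ks_and_w1([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
# Completely separated distributions.
ks_far, w1_far = ks_and_w1([0.0, 0.1, 0.2], [5.0, 5.1, 5.2])
```

A pure shift of one pseudotime unit gives \(W_1 = 1\), which illustrates why the Wasserstein distance is reported alongside the KS statistic: it is expressed in the units of pseudotime itself.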