Mathematical Framework ====================== This page provides full mathematical derivations for all scCS scoring metrics and statistical tests. For practical usage, see the tutorial notebooks. Radial Star Embedding --------------------- Given a progenitor cluster (root) and k terminal fate clusters (branches), scCS constructs a 2D radial star embedding where: - The progenitor population is placed at the origin. - Each fate arm extends from the origin at angle: .. math:: \alpha_j = \frac{2\pi j}{k} \quad (j = 0, 1, \ldots, k-1) for equal spacing, or at the centroid-derived angle for centroid-based sector assignment. - Cells are positioned along their assigned arm at radial distance proportional to their differentiation metric (pseudotime, CytoTRACE2, or any custom score): .. math:: r_i = m_i \cdot s where :math:`m_i \in [0, 1]` is the scaled metric value and :math:`s` is the arm scale parameter (default: 10.0). The resulting 2D coordinates are stored in ``adata.obsm['X_sccs']``. Velocity Projection ------------------- Per-cell RNA velocity vectors :math:`(v_x^{\text{orig}}, v_y^{\text{orig}})` from scVelo are projected into the star embedding coordinate system: .. math:: v_x^{\text{sccs}} = \sum_j v_j^{\text{orig}} \cdot w_{xj} .. math:: v_y^{\text{sccs}} = \sum_j v_j^{\text{orig}} \cdot w_{yj} where :math:`w_{xj}, w_{yj}` are the projection weights derived from the PCA loadings of the star embedding space. Three projection strategies are available: 1. **PCA projection** — project velocity through the same PCA loadings used to construct the star embedding. 2. **Delta expression** — compute velocity as the difference between spliced and inferred full (spliced + unspliced) expression, then project. 3. **Cosine similarity** — compute per-cell velocity direction as cosine similarity with each arm direction. Commitment Scores ----------------- **Magnitude** (Eq. 1): .. math:: \text{magnitude}_i = \sqrt{v_{x,i}^2 + v_{y,i}^2} **Angle** (Eq. 2–3): .. math:: \theta_i = \text{atan2}(v_{y,i}, v_{x,i}) \quad \in [0°, 360°) **Angular binning** (Eq. 4–6): Angles are binned into :math:`N` equal sectors of width :math:`360°/N` (default :math:`N = 36`, i.e., 10° per bin). Each cell contributes its magnitude to the corresponding bin: .. math:: M_{\text{bin}}(b) = \sum_{i: \theta_i \in \text{bin}_b} \text{magnitude}_i **Sector assignment**: Each fate arm :math:`j` is assigned a set of bins (a sector). Two methods: - **Equal sectors**: each fate gets exactly :math:`N/k` consecutive bins. - **Centroid sectors**: bins are assigned to the fate whose centroid direction is closest to the bin center. **Sector magnitude**: .. math:: M_{\text{sector}}(j) = \sum_{b \in \text{sector}_j} M_{\text{bin}}(b) **Unnormalized Commitment Score** (Eq. 8): .. math:: \text{unCS}(i, j) = \frac{M_{\text{sector}}(i)}{M_{\text{sector}}(j)} Values > 1 indicate stronger commitment to fate :math:`i` than fate :math:`j`. **Normalized Commitment Score** (Eq. 9): .. math:: \text{nCS}(i, j) = \text{unCS}(i, j) \times \frac{n_{\text{cells}}(j)}{n_{\text{cells}}(i)} This corrects for differences in population size. The manuscript values are :math:`\text{unCS}(0,1) = 9.335` and :math:`\text{nCS}(0,1) = 8.066`. **Commitment vector**: .. math:: p_j = \frac{M_{\text{sector}}(j)}{\sum_{l} M_{\text{sector}}(l)} This is a probability distribution over fates (sums to 1). Per-Cell Fate Affinity ---------------------- Per-cell fate affinity scores are computed from the cosine similarity between each cell's velocity vector and the unit direction toward each fate centroid, shifted to [0, 1]: .. math:: s_{ij} = \frac{1}{2}\left(1 + \frac{\vec{v}_i \cdot \hat{d}_j}{|\vec{v}_i|}\right) where :math:`\hat{d}_j` is the unit vector from the root centroid toward fate :math:`j`'s centroid. **Magnitude weighting**: Cells with near-zero velocity magnitude (typically progenitors at the origin) are blended toward the uniform distribution: .. math:: s_{ij}^{\text{weighted}} = w_i \cdot s_{ij} + (1 - w_i) \cdot \frac{1}{k} where :math:`w_i = 1` if :math:`\text{magnitude}_i > q_{\alpha}` (5th percentile threshold), else :math:`w_i = \text{magnitude}_i / q_{\alpha}`. **Row normalization**: Scores are normalized to sum to 1 per cell: .. math:: s_{ij}^{\text{final}} = \frac{s_{ij}^{\text{weighted}}}{\sum_l s_{il}^{\text{weighted}}} Entropy Metrics --------------- **Population entropy**: .. math:: H_{\text{pop}} = -\frac{\sum_{j=1}^{k} p_j \log(p_j)}{\log(k)} where :math:`p_j = M_{\text{sector}}(j) / \sum_l M_{\text{sector}}(l)`. Normalized to [0, 1] by dividing by :math:`\log(k)`. **Mean cell entropy** (primary metric): .. math:: H_{\text{cell}} = \frac{1}{n}\sum_{i=1}^{n} \left[-\frac{\sum_{j=1}^{k} s_{ij} \log(s_{ij})}{\log(k)}\right] Each cell's entropy is normalized by :math:`\log(k)` to [0, 1], then averaged. **Per-fate cell entropy**: For each fate :math:`j`, compute the mean binary entropy of the affinity score treated as a Bernoulli distribution: .. math:: h_j = \frac{1}{n}\sum_{i=1}^{n} H_{\text{bin}}(s_{ij}, 1 - s_{ij}) where: .. math:: H_{\text{bin}}(p, 1-p) = -\frac{p \log(p) + (1-p) \log(1-p)}{\log(2)} **NN-smoothed per-cell entropy**: For each cell :math:`i`, average the cell scores over its :math:`k_{\text{nn}}` nearest neighbors in the scCS embedding: .. math:: \bar{s}_{ij} = \frac{1}{k_{\text{nn}}} \sum_{n \in \text{NN}(i)} s_{nj} Then compute k-way Shannon entropy on the smoothed scores: .. math:: H_{\text{nn},i} = -\frac{\sum_{j=1}^{k} \bar{s}_{ij} \log(\bar{s}_{ij})}{\log(k)} Statistical Framework --------------------- Pairwise comparison (PairScorer) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Permutation test** (default for k=2 conditions): For each fate arm, test whether per-cell affinity scores differ between conditions A and B: 1. Compute observed mean difference: :math:`\Delta_0 = \bar{s}_A - \bar{s}_B` 2. For :math:`b = 1, \ldots, B` (default B=1000): - Shuffle condition labels - Recompute mean difference :math:`\Delta_b` 3. Empirical p-value: :math:`p = \frac{\#(|\Delta_b| \geq |\Delta_0|) + 1}{B + 1}` **Delta-CS with bootstrap CI**: .. math:: \Delta\text{nCS}(i,j) = \text{nCS}_A(i,j) - \text{nCS}_B(i,j) Bootstrap CI obtained by resampling cells within each condition :math:`B` times (default 500) and computing the empirical :math:`(1-\alpha)/2` and :math:`(1+\alpha)/2` quantiles. Multi-comparison (MultiScorer) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Omnibus tests**: For each fate arm, test whether per-cell affinity scores differ across ALL conditions simultaneously. *Kruskal-Wallis H test* (non-parametric, recommended): .. math:: H = \frac{12}{N(N+1)} \sum_{g=1}^{G} n_g \left(\bar{R}_g - \frac{N+1}{2}\right)^2 where :math:`N` is total cells, :math:`n_g` is cells in group :math:`g`, and :math:`\bar{R}_g` is the mean rank of group :math:`g`. *One-way ANOVA* (parametric): .. math:: F = \frac{\text{SSB} / (G-1)}{\text{SSW} / (N-G)} **Post-hoc pairwise comparisons**: Only meaningful after an omnibus test rejects :math:`H_0`. *Dunn's test* (non-parametric, recommended with Kruskal-Wallis): Uses rank-based pairwise comparisons with multiple testing correction. Implemented via ``scikit-posthocs``. *Tukey HSD* (parametric, for balanced designs): .. math:: q = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\text{MSW} / n}} *Conover-Iman test* (more powerful than Dunn, non-parametric): Uses rank-based pairwise t-statistics with multiple testing correction. Implemented via ``scikit-posthocs``. **Multiple testing correction**: *Benjamini-Hochberg FDR*: 1. Sort p-values: :math:`p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m)}` 2. Adjusted p-value: :math:`p_{(i)}^{\text{adj}} = \min\left(\frac{m \cdot p_{(i)}}{i}, 1\right)` 3. Enforce monotonicity from right to left. *Bonferroni*: .. math:: p^{\text{adj}} = \min(m \cdot p, 1) *Holm-Bonferroni step-down*: 1. Sort p-values ascending. 2. Adjusted p-value: :math:`p_{(i)}^{\text{adj}} = \max\left(p_{(i-1)}^{\text{adj}}, \frac{(m - i + 1) \cdot p_{(i)}}{1}\right)` 3. Enforce monotonicity from left to right. Mixed-effects models ~~~~~~~~~~~~~~~~~~~~ **Linear mixed model** (PairScorer + MultiScorer): .. math:: y_{ij} = \beta_0 + \beta_1 \cdot \text{condition}_i + u_{\text{sample}(i)} + \epsilon_{ij} where :math:`y_{ij}` is the affinity score of cell :math:`i` for fate :math:`j`, :math:`u_{\text{sample}} \sim N(0, \sigma^2_u)` is the random intercept for biological replicate, and :math:`\epsilon_{ij} \sim N(0, \sigma^2)`. **Contrast testing** (MultiScorer): For a contrast between conditions A and B: .. math:: \hat{\delta} = \hat{\beta}_A - \hat{\beta}_B .. math:: \text{SE}(\hat{\delta}) = \sqrt{\text{SE}(\hat{\beta}_A)^2 + \text{SE}(\hat{\beta}_B)^2} .. math:: z = \frac{\hat{\delta}}{\text{SE}(\hat{\delta})}, \quad p = 2 \cdot \Phi(-|z|) Trajectory shift ~~~~~~~~~~~~~~~~ For each fate arm, test whether pseudotime distributions differ across conditions: *Kolmogorov-Smirnov test*: .. math:: D = \sup_x |F_A(x) - F_B(x)| *Wasserstein distance*: .. math:: W_1 = \int_0^1 |F_A^{-1}(u) - F_B^{-1}(u)| du Bootstrap CI on :math:`W_1` obtained by resampling cells within each condition.