scCS.embedding

embedding.py — Radial star embedding for scCS.

Constructs a custom 2D layout where:
  • The bifurcation cluster (progenitor) sits at the origin (0, 0).

  • Each of the k terminal fate populations occupies its own radial arm; the arms are evenly spaced 360/k degrees apart around the origin.

  • Within each arm, cells are ordered along the radial axis by a differentiation metric (pseudotime, CytoTRACE2, pathway score, etc.) so that less-differentiated cells are close to the center and more-differentiated cells are at the periphery.

  • ONLY cells belonging to the bifurcation cluster or a terminal fate are included. All other populations are excluded from the embedding.

The result is stored in adata_sub.obsm['X_sccs'] on the returned subset AnnData. When plotted, it looks like a star or sunburst: one arm per fate, radiating from the progenitor.
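The layout rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the scCS implementation: the function name star_layout_sketch is hypothetical, root cells are pinned exactly to the origin, and a single min-max over all fate cells sets the radius.

```python
import numpy as np

def star_layout_sketch(labels, scores, root, branches,
                       arm_scale=10.0, jitter=0.3, seed=42):
    """Illustrative radial star layout (hypothetical helper, not scCS code)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)

    # Keep only root and fate cells; every other population is dropped.
    keep = np.isin(labels, [root, *branches])
    labels, scores = labels[keep], scores[keep]
    coords = np.zeros((labels.size, 2))  # root cells stay at (0, 0)

    # Shared min-max range over fate cells (assumed normalization).
    fate = labels != root
    s_min, s_max = scores[fate].min(), scores[fate].max()
    span = s_max - s_min if s_max > s_min else 1.0

    for i, branch in enumerate(branches):
        theta = 2.0 * np.pi * i / len(branches)  # arms 360/k degrees apart
        mask = labels == branch
        r = (scores[mask] - s_min) / span * arm_scale
        offset = rng.normal(0.0, jitter, size=mask.sum())
        radial = np.array([np.cos(theta), np.sin(theta)])
        normal = np.array([-np.sin(theta), np.cos(theta)])  # perpendicular to arm
        coords[mask] = r[:, None] * radial + offset[:, None] * normal
    return coords, keep
```

The returned keep mask records which input cells survived the subsetting, which is the information needed to map subset rows back to the full dataset.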

Velocity projection

RNA velocity vectors (from scVelo) are projected into this custom 2D space by computing the transition-probability-weighted displacement of each cell in the scCS coordinate system.
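A transition-probability-weighted displacement can be written as v_i = sum_j P[i, j] * (x_j - x_i), where P is the row-normalized transition matrix and x are the embedding coordinates. The sketch below assumes a dense matrix and a hypothetical helper name, expected_displacement; scCS and scVelo may apply further normalization that is not shown here.

```python
import numpy as np

def expected_displacement(T, X):
    """v_i = sum_j P[i, j] * (X[j] - X[i]), with P the row-normalized T."""
    T = np.asarray(T, dtype=float)
    X = np.asarray(X, dtype=float)
    row_sums = T.sum(axis=1, keepdims=True)
    # Rows without outgoing transitions get zero velocity instead of NaN.
    safe = np.where(row_sums > 0, row_sums, 1.0)
    P = T / safe
    # sum_j P_ij x_j  -  (sum_j P_ij) x_i
    return P @ X - P.sum(axis=1, keepdims=True) * X
```

For a chain of three cells where cell 0 moves to cell 1 and cell 1 moves to cell 2, each velocity vector points from a cell to its successor, and the terminal cell has zero velocity.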

Differentiation metrics supported

  • 'pseudotime' : scVelo velocity_pseudotime (default)

  • 'cytotrace' : CytoTRACE2 score (column in adata.obs)

  • any other str : name of a per-cell numeric column in adata.obs (custom metric)

  • np.ndarray : directly supplied per-cell scores, shape (n_cells,)

In all cases, higher score = more differentiated = farther from center. If the metric is inverted (e.g., CytoTRACE2 where high = less differentiated), pass invert_ordering=True.
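One way to picture the inversion is as a reflection of the scores about the midpoint of their range, so that the value range is preserved while the order flips. This is an assumption for illustration only; the source does not state how invert_ordering is implemented, and invert_scores is a hypothetical name.

```python
import numpy as np

def invert_scores(scores):
    """Flip the ordering of a metric while keeping its min and max."""
    scores = np.asarray(scores, dtype=float)
    # NaN-aware so that missing scores remain NaN after inversion.
    return np.nanmax(scores) + np.nanmin(scores) - scores
```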

Functions

build_star_embedding(→ anndata.AnnData)

Build the radial star embedding on a subset of adata.

project_velocity_star(→ Tuple[numpy.ndarray, ...)

Project RNA velocity into the scCS star embedding space.

run_velocity_pipeline(→ None)

Run the full scVelo RNA velocity pipeline.

compute_local_pseudotime(→ numpy.ndarray)

Recompute velocity pseudotime on the subset's induced subgraph.

scale_metric_01(→ numpy.ndarray)

Min-max scale a per-cell metric to [0, 1].

Module Contents

scCS.embedding.build_star_embedding(adata, root: str, branches: List[str], obs_key: str = 'leiden', ordering_metric: str | numpy.ndarray = 'pseudotime', invert_ordering: bool = False, arm_scale: float = 10.0, jitter: float = 0.3, seed: int = 42, arm_norm: str = 'global') → anndata.AnnData

Build the radial star embedding on a subset of adata.

Only cells belonging to the bifurcation cluster or a terminal fate cluster are included. All other populations are excluded entirely.

Parameters:
  • adata (AnnData) – Full dataset. Will NOT be modified.

  • root (str) – Label of the progenitor/bifurcation cluster in adata.obs[obs_key]. These cells are placed at the origin.

  • branches (list of str) – Labels of the k terminal fate populations. Each gets one radial arm.

  • obs_key (str) – Column in adata.obs with cluster labels.

  • ordering_metric (str or np.ndarray) – How to order cells along each arm. Higher value = more differentiated = farther from center.

    • 'pseudotime' : uses adata.obs['velocity_pseudotime'] (computed if absent)

    • 'cytotrace' : uses adata.obs['cytotrace2_score'] (must be pre-computed)

    • any other str : uses adata.obs[ordering_metric] directly

    • np.ndarray : per-cell scores, shape (n_cells,), for the FULL adata

  • invert_ordering (bool) – If True, invert the metric so that high values map to the center (use for metrics where high = less differentiated, e.g. raw CytoTRACE2).

  • arm_scale (float) – Maximum radial distance (length of each arm).

  • jitter (float) – Gaussian noise added perpendicular to each arm to avoid overplotting.

  • seed (int) – Random seed for jitter.

  • arm_norm ({"global", "per_arm"}, default "global") –

    How to normalize the ordering metric onto the radial arms. The rescaling r = (s - s_min) / (s_max - s_min) * arm_scale is applied to fate cells only (bifurcation cells sit at the origin). Since v0.7.4, s_min and s_max are computed from fate cells only in both modes, so the closest fate cell always maps to r = 0 and the furthest to r = arm_scale.

    • "global" (default, v0.7.3+): compute one (s_min, s_max) = (fate_scores.min(), fate_scores.max()) over all fate cells and apply uniformly to every arm. Arms whose cells span shorter pseudotime intervals stay visibly shorter. Preserves the relative ordering of cells across arms — if Alpha cells span a wider pseudotime range than Delta cells, the Alpha arm extends further. Biologically meaningful: arm length reflects how far each fate has differentiated from the progenitor on a shared scale.

    • "per_arm" (legacy, pre-v0.7.3 default): each arm gets its own (s_min, s_max) = (fate_mask_scores.min(), fate_mask_scores.max()) and is mapped to [0, arm_scale] independently. All arms reach the full arm_scale regardless of how compressed/extended their pseudotime range is. Provided for reproducibility of older plots.

    Changed in version 0.7.4: Both modes now compute s_min/s_max from fate cells only, instead of including bifurcation cells. This removes a visible gap between the origin and the start of each arm in v0.7.3 "global" mode.
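The difference between the two arm_norm modes comes down to which cells define (s_min, s_max). The sketch below applies the documented rescale formula to fate cells only; radial_positions is a hypothetical helper, and the handling of a zero-width range (all radii left at 0) is an assumption.

```python
import numpy as np

def radial_positions(scores, arm_labels, arm_scale=10.0, arm_norm="global"):
    """Map fate-cell scores to radii; inputs cover fate cells only."""
    scores = np.asarray(scores, dtype=float)
    arm_labels = np.asarray(arm_labels)
    if arm_norm not in ("global", "per_arm"):
        raise ValueError("arm_norm must be 'global' or 'per_arm'")

    if arm_norm == "global":
        # One shared (s_min, s_max) over all fate cells.
        groups = [np.ones(scores.size, dtype=bool)]
    else:
        # Each arm gets its own (s_min, s_max).
        groups = [arm_labels == arm for arm in np.unique(arm_labels)]

    r = np.zeros_like(scores)
    for mask in groups:
        s_min, s_max = scores[mask].min(), scores[mask].max()
        span = s_max - s_min
        if span > 0:  # assumed: constant scores stay at radius 0
            r[mask] = (scores[mask] - s_min) / span * arm_scale
    return r
```

With scores [0.0, 1.0, 0.2, 0.6] for two Alpha and two Delta cells, "global" keeps the Delta arm shorter (radii 2 and 6), whereas "per_arm" stretches both arms to the full arm_scale.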

Returns:

adata_sub – Subset containing ONLY bifurcation + terminal fate cells. Star embedding stored in adata_sub.obsm['X_sccs']. Metadata stored in adata_sub.uns['sccs'].

Return type:

AnnData

scCS.embedding.project_velocity_star(adata_sub, adata_full=None, verbose: bool = True) → Tuple[numpy.ndarray, numpy.ndarray]

Project RNA velocity into the scCS star embedding space.

Uses the transition probability matrix from the full (unsubsetted) adata to compute the expected displacement of each subset cell in the X_sccs coordinate system.

This is necessary because subsetting breaks the velocity/neighbor graph matrices (they retain full-dataset dimensions). We always use the full graph and restrict to subset cell indices.
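Restricting a full-dataset graph to the subset amounts to selecting the same rows and columns by the subset's positions in the full data. A minimal sketch, assuming parent_indices holds integer row positions into the full dataset and using a dense array for brevity (restrict_graph is a hypothetical name):

```python
import numpy as np

def restrict_graph(graph_full, parent_indices):
    """Return the (n_sub, n_sub) block of an (n_full, n_full) graph."""
    idx = np.asarray(parent_indices, dtype=int)
    n_full = graph_full.shape[0]
    if graph_full.shape != (n_full, n_full):
        raise ValueError("graph must be square with full-dataset dimensions")
    # Select rows first, then the matching columns.
    return graph_full[idx][:, idx]
```

Transitions from subset cells to cells outside the subset are dropped by this selection, so rows of the restricted matrix may no longer sum to one.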

Parameters:
  • adata_sub (AnnData) – Subset returned by build_star_embedding(). Must have X_sccs in obsm and a 'sccs_parent_indices' entry in uns (set automatically).

  • adata_full (AnnData, optional) – The original full dataset with intact velocity_graph in uns. If None, falls back to using adata_sub directly (only works if velocity_graph was computed on the subset).

Returns:

vx, vy – x and y velocity components in the scCS embedding. Also stored in adata_sub.obsm['velocity_sccs'].

Return type:

tuple of two np.ndarray, each of shape (n_sub_cells,)

scCS.embedding.run_velocity_pipeline(adata, mode: str = 'dynamical', n_top_genes: int = 2000, n_pcs: int = 30, n_neighbors: int = 30, min_shared_counts: int = 20, verbose: bool = True) → None

Run the full scVelo RNA velocity pipeline.

Requires spliced and unspliced count layers.

Parameters:
  • adata (AnnData) – Must contain layers 'spliced' and 'unspliced'.

  • mode ({'dynamical', 'stochastic', 'steady_state'})

  • n_top_genes (int)

  • n_pcs (int)

  • n_neighbors (int)

  • min_shared_counts (int)

  • verbose (bool)

scCS.embedding.compute_local_pseudotime(adata_sub, adata_full, scale_01: bool = True, verbose: bool = True) → numpy.ndarray

Recompute velocity pseudotime on the subset’s induced subgraph.

When build_star_embedding uses ordering_metric='pseudotime', the pseudotime is resolved on the full adata before subsetting. This means the pseudotime range within the bifurcation+fate subset is compressed and non-uniform: cells that span the full differentiation axis in the subset may all cluster near 0 or 1 on the arm, leaving large empty stretches.

This function extracts the velocity_graph submatrix for the subset cells, recomputes pseudotime locally, and optionally scales it to [0, 1]. The result is stored in adata_sub.obs['sccs_pseudotime'] and returned as an array.

Call this after build_embedding() and before (or instead of) using the full-adata pseudotime for arm ordering. To rebuild the embedding with the corrected pseudotime, pass the returned array as a custom metric:

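import numpy as np
scorer.build_embedding(ordering_metric='pseudotime')
pt_sub = compute_local_pseudotime(scorer.adata_sub, adata)
# Map the subset scores back to full-adata positions. Assumes parent_indices
# holds integer row positions into adata; cells outside the subset stay NaN.
pt_sub_full = np.full(adata.n_obs, np.nan)
pt_sub_full[scorer.adata_sub.uns['sccs']['parent_indices']] = pt_sub
scorer.build_embedding(ordering_metric=pt_sub_full)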

Alternatively, use the convenience method SingleScorer.refit_pseudotime().

Parameters:
  • adata_sub (AnnData) – Subset returned by build_star_embedding(). Must have uns['sccs']['parent_indices'] set (done automatically).

  • adata_full (AnnData) – Full dataset with intact uns['velocity_graph'].

  • scale_01 (bool) – If True (default), min-max scale the recomputed pseudotime to [0, 1] within the subset. This ensures cells span the full arm length regardless of where the subset sits in the global pseudotime range. If False, the raw pseudotime values are returned (useful when you want to compare absolute pseudotime across conditions).

  • verbose (bool)

Returns:

pt_sub – Subset-local pseudotime, stored in adata_sub.obs['sccs_pseudotime'].

Return type:

np.ndarray, shape (n_sub_cells,)

scCS.embedding.scale_metric_01(scores: numpy.ndarray) → numpy.ndarray

Min-max scale a per-cell metric to [0, 1].

Useful for normalizing any differentiation metric (pseudotime, CytoTRACE2, pathway score, etc.) before passing it to build_star_embedding so that cells span the full arm length uniformly.

Parameters:

scores (np.ndarray, shape (n_cells,)) – Per-cell metric values. NaN values are preserved.

Returns:

scaled – Values in [0, 1]. Returns zeros if all values are identical.

Return type:

np.ndarray, shape (n_cells,)
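The documented behavior (NaN preserved, zeros for a constant input) can be reproduced with a short NumPy function. This is a sketch of that contract under a hypothetical name, scale_01_sketch, not the library's source; in particular, returning zeros only at the non-NaN positions of a constant input is an assumption.

```python
import numpy as np

def scale_01_sketch(scores):
    """Min-max scale to [0, 1], ignoring NaN when finding the range."""
    scores = np.asarray(scores, dtype=float)
    out = np.full(scores.shape, np.nan)
    valid = ~np.isnan(scores)
    if not valid.any():
        return out  # nothing to scale; all entries stay NaN
    lo, hi = scores[valid].min(), scores[valid].max()
    if hi == lo:
        out[valid] = 0.0  # constant input maps to zeros
    else:
        out[valid] = (scores[valid] - lo) / (hi - lo)
    return out
```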