scCS.drivers
drivers.py — Driver gene identification for scCS fate arms.
Two complementary strategies:
- Velocity-based drivers: For each fate arm, rank genes by their mean scVelo velocity in arm cells. A high positive velocity means the gene is being actively upregulated along that fate. Requires the ‘velocity’ layer (produced by the scVelo pipeline).
- DEG-based drivers: For each fate arm, run a Wilcoxon rank-sum test comparing arm cells against the bifurcation (progenitor) cluster. Returns the log fold-change and adjusted p-value per gene, together with a significance flag.
All functions in this module operate on adata_sub (the subset returned by build_star_embedding), which contains only bifurcation + terminal fate cells.
Functions
| get_velocity_drivers | Rank genes by mean scVelo velocity in each fate arm's cells. |
| get_deg_drivers | Find DEGs for each fate arm vs the bifurcation cluster (Wilcoxon). |
| get_velocity_fate_drivers | Identify driver genes by correlating gene velocity with fate affinity. |
Module Contents
- scCS.drivers.get_velocity_drivers(adata_sub, fate_names: List[str], obs_key: str, root: str, n_top_genes: int = 50) -> Dict[str, pandas.DataFrame]
Rank genes by mean scVelo velocity in each fate arm’s cells.
- Parameters:
adata_sub (AnnData) – Subset containing only bifurcation + terminal fate cells. Must have the ‘velocity’ layer (from scVelo).
fate_names (list of str) – Terminal fate cluster labels.
obs_key (str) – Column in adata_sub.obs with cluster labels.
root (str) – Label of the progenitor cluster (used for context only).
n_top_genes (int) – Number of top driver genes to print per fate.
- Returns:
Mapping fate_name -> DataFrame with columns [gene, mean_velocity, rank], sorted by mean_velocity descending (most upregulated first).
- Return type:
dict
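The ranking described for get_velocity_drivers can be illustrated with a minimal sketch. It is not the module's implementation: the helper name rank_by_mean_velocity and the toy data are hypothetical, and the sketch assumes a dense velocity matrix (as might be taken from the ‘velocity’ layer) plus a plain array of cluster labels instead of an AnnData object.

```python
import numpy as np
import pandas as pd


def rank_by_mean_velocity(velocity, labels, gene_names, fate_names):
    """Rank genes by mean velocity within each fate arm (hypothetical helper).

    velocity: dense (n_cells, n_genes) array, e.g. a densified 'velocity' layer.
    labels:   per-cell cluster labels, e.g. the obs_key column.
    """
    velocity = np.asarray(velocity, dtype=float)
    labels = np.asarray(labels)
    drivers = {}
    for fate in fate_names:
        arm = velocity[labels == fate]  # cells assigned to this fate arm
        # nanmean is an assumption: a velocity layer may hold NaN for some genes
        mean_v = np.nanmean(arm, axis=0)
        df = pd.DataFrame({"gene": list(gene_names), "mean_velocity": mean_v})
        df = df.sort_values("mean_velocity", ascending=False).reset_index(drop=True)
        df["rank"] = np.arange(1, len(df) + 1)  # 1 = most upregulated
        drivers[fate] = df
    return drivers


# Toy data: two cells per fate, one progenitor cell, three genes.
toy_velocity = np.array([
    [1.0, -0.5, 0.2],
    [0.8, -0.3, 0.0],
    [-0.2, 0.9, 0.1],
    [-0.4, 1.1, 0.3],
    [0.0, 0.0, 0.0],
])
toy_labels = ["A", "A", "B", "B", "root"]
toy_drivers = rank_by_mean_velocity(toy_velocity, toy_labels, ["g1", "g2", "g3"], ["A", "B"])
print(toy_drivers["A"])
```

Progenitor cells do not enter the ranking here, which is consistent with root being documented as "used for context only".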
- scCS.drivers.get_deg_drivers(adata_sub, fate_names: List[str], obs_key: str, root: str, n_top_genes: int = 50, pval_threshold: float = 0.05, logfc_threshold: float = 0.25) -> Dict[str, pandas.DataFrame]
Find DEGs for each fate arm vs the bifurcation cluster (Wilcoxon).
For each fate arm, compares arm cells against progenitor (bifurcation) cells using a Wilcoxon rank-sum test via scanpy.
- Parameters:
adata_sub (AnnData) – Subset containing only bifurcation + terminal fate cells.
fate_names (list of str) – Terminal fate cluster labels.
obs_key (str) – Column in adata_sub.obs with cluster labels.
root (str) – Label of the progenitor cluster (reference group).
n_top_genes (int) – Number of top significant DEGs to print per fate.
pval_threshold (float) – Adjusted p-value threshold for significance.
logfc_threshold (float) – Minimum absolute log fold-change for significance.
- Returns:
Mapping fate_name -> DataFrame with columns [gene, logfoldchange, pval, pval_adj, significant], sorted by logfoldchange descending.
- Return type:
dict
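The arm-versus-progenitor comparison can be sketched without scanpy. The module runs the Wilcoxon test through scanpy; the sketch below swaps in scipy.stats.mannwhitneyu (the same rank-sum test) and a hand-written Benjamini-Hochberg correction so that it stands alone. The helper names, the fold-change definition (log2 ratio of group means) and the rule that a gene must pass both thresholds to be flagged are assumptions, not taken from the source.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, starting from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.clip(scaled, 0.0, 1.0)
    return adj


def deg_vs_root(expr, labels, gene_names, fate, root,
                pval_threshold=0.05, logfc_threshold=0.25):
    """Compare one fate arm against the progenitor cluster, gene by gene."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    arm, ref = expr[labels == fate], expr[labels == root]
    pvals = np.array([
        mannwhitneyu(arm[:, g], ref[:, g], alternative="two-sided").pvalue
        for g in range(expr.shape[1])
    ])
    eps = 1e-9  # avoids division by zero; assumes non-negative expression
    logfc = np.log2((arm.mean(axis=0) + eps) / (ref.mean(axis=0) + eps))
    pval_adj = bh_adjust(pvals)
    # assumption: both thresholds must be met for a gene to be flagged
    significant = (pval_adj < pval_threshold) & (np.abs(logfc) > logfc_threshold)
    df = pd.DataFrame({"gene": list(gene_names), "logfoldchange": logfc,
                       "pval": pvals, "pval_adj": pval_adj,
                       "significant": significant})
    return df.sort_values("logfoldchange", ascending=False).reset_index(drop=True)


# Toy data: gene g_up is clearly higher in the arm, g_flat is interleaved.
arm_block = np.column_stack([np.linspace(5, 12, 8), np.arange(1, 16, 2)])
ref_block = np.column_stack([np.linspace(1, 2, 8), np.arange(2, 17, 2)])
toy_expr = np.vstack([arm_block, ref_block])
toy_labels = ["fateA"] * 8 + ["root"] * 8
toy_deg = deg_vs_root(toy_expr, toy_labels, ["g_up", "g_flat"], "fateA", "root")
print(toy_deg)
```

With real data, scanpy's own fold-change and p-value columns may differ in detail from this sketch, so the numbers are not expected to match get_deg_drivers exactly.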
- scCS.drivers.get_velocity_fate_drivers(adata_sub, cell_scores: numpy.ndarray, fate_names: List[str], obs_key: str, root: str, n_top_genes: int = 50, pval_threshold: float = 0.05, min_cells: int = 10) -> Dict[str, pandas.DataFrame]
Identify driver genes by correlating gene velocity with fate affinity.
For each fate arm, computes the Spearman correlation between each gene’s velocity (from the ‘velocity’ layer) and each cell’s fate affinity score (from cell_scores[:, j]). Genes with a high positive Spearman correlation are being upregulated specifically as cells commit to that fate. This is a stronger signal than mean velocity alone, because it filters out genes whose velocity is high in every cell regardless of fate.
Algorithm
1. For each fate j, extract the velocity matrix V (n_cells × n_genes).
2. Extract the fate affinity vector a (n_cells,) = cell_scores[:, j].
3. Compute the Spearman correlation between a and each gene’s velocity column.
4. Compute FDR-corrected p-values (Benjamini-Hochberg via statsmodels).
5. Return a DataFrame sorted by spearman_r descending.
- Parameters:
adata_sub (AnnData) – Subset containing only bifurcation + terminal fate cells. Must have the ‘velocity’ layer (from scVelo).
cell_scores (np.ndarray, shape (n_cells, k)) – Per-cell fate affinity scores from CommitmentScoreResult.cell_scores.
fate_names (list of str) – Terminal fate cluster labels (length k).
obs_key (str) – Column in adata_sub.obs with cluster labels.
root (str) – Label of the progenitor cluster.
n_top_genes (int) – Number of top driver genes to print per fate.
pval_threshold (float) – FDR-adjusted p-value threshold for significance.
min_cells (int) – Minimum number of cells required to compute correlations.
- Returns:
Mapping fate_name -> DataFrame with columns [gene, spearman_r, pval, pval_adj, mean_velocity, delta_velocity, significant], sorted by spearman_r descending.
- Return type:
dict
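The algorithm listed for get_velocity_fate_drivers can be sketched for a single fate as follows. This is an illustration under stated assumptions, not the module's code: the helper names are hypothetical, the Benjamini-Hochberg step is written by hand in NumPy rather than taken from statsmodels, the significance flag uses only pval_adj < pval_threshold, and the delta_velocity column is omitted because its definition is not given in the documentation above.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adj = np.empty(n)
    adj[order] = np.clip(scaled, 0.0, 1.0)
    return adj


def velocity_fate_correlation(V, a, gene_names, pval_threshold=0.05, min_cells=10):
    """Correlate each gene's velocity column with one fate affinity vector.

    V: (n_cells, n_genes) velocity matrix; a: (n_cells,) affinity scores.
    Assumes no constant or NaN columns, which would yield undefined correlations.
    """
    V = np.asarray(V, dtype=float)
    a = np.asarray(a, dtype=float)
    columns = ["gene", "spearman_r", "pval", "pval_adj", "mean_velocity", "significant"]
    if V.shape[0] < min_cells:
        return pd.DataFrame(columns=columns)  # too few cells to correlate
    results = [spearmanr(a, V[:, g]) for g in range(V.shape[1])]
    r = np.array([res[0] for res in results])
    p = np.array([res[1] for res in results])
    pval_adj = bh_adjust(p)
    df = pd.DataFrame({
        "gene": list(gene_names),
        "spearman_r": r,
        "pval": p,
        "pval_adj": pval_adj,
        "mean_velocity": V.mean(axis=0),
        "significant": pval_adj < pval_threshold,  # assumption: p-value only
    })
    return df.sort_values("spearman_r", ascending=False).reset_index(drop=True)


# Toy data: 12 cells with increasing affinity for one fate.
affinity = np.arange(12) / 11.0
up = affinity.copy()
up[[3, 4]] = up[[4, 3]]  # nearly monotone in affinity
down = -up               # nearly monotone, opposite direction
mixed = np.array([3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0, 5.3, 3.5, 8.0, 7.0])
toy_V = np.column_stack([up, down, mixed])
toy_result = velocity_fate_correlation(toy_V, affinity, ["g_up", "g_down", "g_mixed"])
print(toy_result)
```

In the toy data, g_down has a large mean-velocity magnitude but a negative correlation with affinity, so it sorts last; this is the kind of separation that ranking by mean velocity alone would not provide.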