scCS.drivers
============

.. py:module:: scCS.drivers

.. autoapi-nested-parse::

   drivers.py: driver gene identification for scCS fate arms.

   Two complementary strategies:

   1. **Velocity-based drivers.** For each fate arm, rank genes by their mean
      scVelo velocity in the arm's cells. A high positive velocity means the
      gene is being actively upregulated along that fate.
      ``get_velocity_fate_drivers`` refines this by correlating each gene's
      velocity with per-cell fate affinity. Requires the ``velocity`` layer
      (from the scVelo pipeline).

   2. **DEG-based drivers.** For each fate arm, run a Wilcoxon rank-sum test
      comparing arm cells against the bifurcation (progenitor) cluster.
      Returns the log fold-change and adjusted p-value per gene, together with
      a significance flag.

   All three functions operate on ``adata_sub`` (the subset returned by
   ``build_star_embedding``), which contains only bifurcation and terminal
   fate cells.



Functions
---------

.. autoapisummary::

   scCS.drivers.get_velocity_drivers
   scCS.drivers.get_deg_drivers
   scCS.drivers.get_velocity_fate_drivers


Module Contents
---------------

.. py:function:: get_velocity_drivers(adata_sub, fate_names: List[str], obs_key: str, root: str, n_top_genes: int = 50) -> Dict[str, pandas.DataFrame]

   Rank genes by mean scVelo velocity in each fate arm's cells.

   :param adata_sub: Subset containing only bifurcation + terminal fate cells.
      Must have the ``velocity`` layer (from scVelo).
   :type adata_sub: AnnData
   :param fate_names: Terminal fate cluster labels.
   :type fate_names: list of str
   :param obs_key: Column in ``adata_sub.obs`` with cluster labels.
   :type obs_key: str
   :param root: Label of the progenitor cluster (used for context only).
   :type root: str
   :param n_top_genes: Number of top driver genes to print per fate.
   :type n_top_genes: int
   :returns: Mapping ``fate_name -> DataFrame`` with columns
      ``[gene, mean_velocity, rank]``, sorted by ``mean_velocity`` descending
      (most upregulated first).
   :rtype: Dict[str, pandas.DataFrame]


.. py:function:: get_deg_drivers(adata_sub, fate_names: List[str], obs_key: str, root: str, n_top_genes: int = 50, pval_threshold: float = 0.05, logfc_threshold: float = 0.25) -> Dict[str, pandas.DataFrame]

   Find differentially expressed genes (DEGs) for each fate arm against the
   bifurcation cluster (Wilcoxon).

   For each fate arm, compares arm cells against progenitor (bifurcation)
   cells using a Wilcoxon rank-sum test via scanpy.

   :param adata_sub: Subset containing only bifurcation + terminal fate cells.
   :type adata_sub: AnnData
   :param fate_names: Terminal fate cluster labels.
   :type fate_names: list of str
   :param obs_key: Column in ``adata_sub.obs`` with cluster labels.
   :type obs_key: str
   :param root: Label of the progenitor cluster (reference group).
   :type root: str
   :param n_top_genes: Number of top significant DEGs to print per fate.
   :type n_top_genes: int
   :param pval_threshold: Adjusted p-value threshold for significance.
   :type pval_threshold: float
   :param logfc_threshold: Minimum absolute log fold-change for significance.
   :type logfc_threshold: float
   :returns: Mapping ``fate_name -> DataFrame`` with columns
      ``[gene, logfoldchange, pval, pval_adj, significant]``, sorted by
      ``logfoldchange`` descending.
   :rtype: Dict[str, pandas.DataFrame]


.. py:function:: get_velocity_fate_drivers(adata_sub, cell_scores: numpy.ndarray, fate_names: List[str], obs_key: str, root: str, n_top_genes: int = 50, pval_threshold: float = 0.05, min_cells: int = 10) -> Dict[str, pandas.DataFrame]

   Identify driver genes by correlating gene velocity with fate affinity.

   For each fate arm, computes the Spearman correlation between each gene's
   velocity (from the ``velocity`` layer) and each cell's fate affinity score
   (``cell_scores[:, j]``). Genes with a high positive Spearman correlation
   are being upregulated specifically as cells commit to that fate. This is a
   stronger signal than mean velocity alone, because it filters out genes
   that are fast everywhere: a gene with uniformly high velocity has a high
   mean but little correlation with affinity.

   .. rubric:: Algorithm

   1. For each fate :math:`j`, extract the velocity matrix :math:`V`
      (n_cells × n_genes).
   2. Take the fate affinity vector :math:`a` = ``cell_scores[:, j]``
      (length n_cells).
   3. Compute the Spearman correlation between :math:`a` and each gene's
      velocity column.
   4. Compute FDR-corrected p-values (Benjamini-Hochberg via statsmodels).
   5. Return a DataFrame sorted by ``spearman_r`` descending.

   :param adata_sub: Subset containing only bifurcation + terminal fate cells.
      Must have the ``velocity`` layer (from scVelo).
   :type adata_sub: AnnData
   :param cell_scores: Per-cell fate affinity scores from
      ``CommitmentScoreResult.cell_scores``.
   :type cell_scores: np.ndarray, shape (n_cells, k)
   :param fate_names: Terminal fate cluster labels (length k).
   :type fate_names: list of str
   :param obs_key: Column in ``adata_sub.obs`` with cluster labels.
   :type obs_key: str
   :param root: Label of the progenitor cluster.
   :type root: str
   :param n_top_genes: Number of top driver genes to print per fate.
   :type n_top_genes: int
   :param pval_threshold: FDR-adjusted p-value threshold for significance.
   :type pval_threshold: float
   :param min_cells: Minimum number of cells required to compute correlations.
   :type min_cells: int
   :returns: Mapping ``fate_name -> DataFrame`` with columns
      ``[gene, spearman_r, pval, pval_adj, mean_velocity, delta_velocity, significant]``,
      sorted by ``spearman_r`` descending.
   :rtype: Dict[str, pandas.DataFrame]
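
The mean-velocity ranking that ``get_velocity_drivers`` describes can be sketched with plain NumPy and pandas. This is a minimal illustration, not the scCS implementation. ``mean_velocity_ranking`` is a hypothetical helper, and the sketch assumes that the ``velocity`` layer has already been extracted as a dense array (call ``.toarray()`` first if it is sparse) and that the labels come from ``adata_sub.obs[obs_key]``. NaN entries, which the velocity layer may contain for genes without a fitted velocity, are ignored via ``np.nanmean``.

.. code-block:: python

   import numpy as np
   import pandas as pd


   def mean_velocity_ranking(velocity, labels, gene_names, fate_names):
       """Hypothetical helper: rank genes by mean velocity in each fate arm.

       velocity   : dense array, shape (n_cells, n_genes)
       labels     : cluster label per cell, shape (n_cells,)
       gene_names : gene identifiers, shape (n_genes,)
       fate_names : terminal fate labels to rank
       """
       velocity = np.asarray(velocity, dtype=float)
       labels = np.asarray(labels)
       results = {}
       for fate in fate_names:
           arm = velocity[labels == fate]
           # Ignore NaN entries (e.g. genes without a fitted velocity model)
           mean_v = np.nanmean(arm, axis=0)
           df = pd.DataFrame({"gene": np.asarray(gene_names), "mean_velocity": mean_v})
           # Most upregulated first; genes whose mean is NaN sort last
           df = df.sort_values("mean_velocity", ascending=False, ignore_index=True)
           df["rank"] = np.arange(1, len(df) + 1)
           results[fate] = df
       return results

For example, ``mean_velocity_ranking(adata_sub.layers["velocity"], adata_sub.obs[obs_key], adata_sub.var_names, fate_names)`` returns one ranked table per fate, which has the same shape as the documented return value.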
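
``get_deg_drivers`` runs its per-arm comparison through scanpy. The sketch below reproduces the same idea with SciPy so that each step is visible. It is an approximation under stated assumptions, not the scCS code:

* ``bh_adjust`` and ``wilcoxon_arm_vs_root`` are hypothetical helpers.
* ``expr`` is assumed to be dense, log1p-normalized expression.
* The log fold-change is modeled on the log2 ratio of ``expm1`` group means that scanpy reports.
* A gene is flagged as significant when ``pval_adj < pval_threshold`` and ``|logfoldchange| >= logfc_threshold``. The exact inequalities used by scCS are an assumption.

SciPy's normal approximation applies tie and continuity corrections, so p-values can differ slightly from scanpy's.

.. code-block:: python

   import numpy as np
   import pandas as pd
   from scipy.stats import mannwhitneyu


   def bh_adjust(pvals):
       """Benjamini-Hochberg (step-up) adjusted p-values."""
       p = np.asarray(pvals, dtype=float)
       n = p.size
       order = np.argsort(p)
       scaled = p[order] * n / np.arange(1, n + 1)
       # Running minimum from the largest p-value down keeps the result monotone
       scaled = np.minimum.accumulate(scaled[::-1])[::-1]
       adjusted = np.empty(n)
       adjusted[order] = np.minimum(scaled, 1.0)
       return adjusted


   def wilcoxon_arm_vs_root(expr, labels, gene_names, fate, root,
                            pval_threshold=0.05, logfc_threshold=0.25):
       """Hypothetical helper: rank-sum test of one fate arm against the root cluster."""
       expr = np.asarray(expr, dtype=float)
       labels = np.asarray(labels)
       arm, ref = expr[labels == fate], expr[labels == root]

       # Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, one test per gene column
       pvals = mannwhitneyu(arm, ref, alternative="two-sided",
                            method="asymptotic", axis=0).pvalue
       # Map undefined or out-of-range p-values (e.g. constant genes) into [0, 1]
       pvals = np.clip(np.nan_to_num(pvals, nan=1.0), 0.0, 1.0)

       # log2 ratio of back-transformed group means, with a small pseudocount
       logfc = np.log2((np.expm1(arm.mean(axis=0)) + 1e-9)
                       / (np.expm1(ref.mean(axis=0)) + 1e-9))

       df = pd.DataFrame({"gene": np.asarray(gene_names), "logfoldchange": logfc,
                          "pval": pvals, "pval_adj": bh_adjust(pvals)})
       df["significant"] = ((df["pval_adj"] < pval_threshold)
                            & (df["logfoldchange"].abs() >= logfc_threshold))
       # Sorted by logfoldchange descending, as in the documented return value
       return df.sort_values("logfoldchange", ascending=False, ignore_index=True)

Calling ``wilcoxon_arm_vs_root`` once per entry of ``fate_names``, with ``root`` as the reference, yields one table per fate arm.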
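
The five algorithm steps of ``get_velocity_fate_drivers`` can be sketched as follows. This is a hedged illustration, not the scCS implementation. Its assumptions are:

* ``fate_drivers``, ``velocity_fate_correlation`` and ``bh_adjust`` are hypothetical names.
* Benjamini-Hochberg is written directly in NumPy instead of calling statsmodels. ``statsmodels.stats.multitest.multipletests(..., method="fdr_bh")`` computes the same adjustment.
* ``min_cells`` applies per gene after NaN velocities are dropped.
* Genes with constant velocity get ``spearman_r = NaN`` and ``pval = 1``.
* The ``delta_velocity`` column is omitted because its definition is not given here.

.. code-block:: python

   import numpy as np
   import pandas as pd
   from scipy.stats import spearmanr


   def bh_adjust(pvals):
       """Benjamini-Hochberg (step-up) adjusted p-values."""
       p = np.asarray(pvals, dtype=float)
       n = p.size
       order = np.argsort(p)
       scaled = p[order] * n / np.arange(1, n + 1)
       # Running minimum from the largest p-value down keeps the result monotone
       scaled = np.minimum.accumulate(scaled[::-1])[::-1]
       adjusted = np.empty(n)
       adjusted[order] = np.minimum(scaled, 1.0)
       return adjusted


   def velocity_fate_correlation(velocity, affinity, gene_names,
                                 pval_threshold=0.05, min_cells=10):
       """Hypothetical helper: Spearman rho between each gene's velocity and one fate's affinity."""
       velocity = np.asarray(velocity, dtype=float)   # step 1: V, (n_cells, n_genes)
       affinity = np.asarray(affinity, dtype=float)   # step 2: a, (n_cells,)
       n_genes = velocity.shape[1]
       rho = np.full(n_genes, np.nan)
       pvals = np.ones(n_genes)
       for g in range(n_genes):
           ok = np.isfinite(velocity[:, g]) & np.isfinite(affinity)
           vo, ao = velocity[ok, g], affinity[ok]
           # Skip genes with too few usable cells or a constant input (rho undefined)
           if vo.size < max(min_cells, 3) or np.ptp(vo) == 0 or np.ptp(ao) == 0:
               continue
           rho[g], pvals[g] = spearmanr(ao, vo)       # step 3
       pvals = np.where(np.isfinite(pvals), pvals, 1.0)

       df = pd.DataFrame({"gene": np.asarray(gene_names), "spearman_r": rho,
                          "pval": pvals, "pval_adj": bh_adjust(pvals),  # step 4
                          "mean_velocity": np.nanmean(velocity, axis=0)})
       df["significant"] = df["pval_adj"] < pval_threshold
       # Step 5: highest correlation first; undefined correlations (NaN) sort last
       return df.sort_values("spearman_r", ascending=False, ignore_index=True)


   def fate_drivers(velocity, cell_scores, gene_names, fate_names, **kwargs):
       """Run the correlation once per fate j, using column j of cell_scores."""
       cell_scores = np.asarray(cell_scores, dtype=float)
       return {fate: velocity_fate_correlation(velocity, cell_scores[:, j], gene_names, **kwargs)
               for j, fate in enumerate(fate_names)}

Because the statistic is rank-based, a gene whose velocity is equally high in every cell contributes no correlation. This is the filtering effect that distinguishes this approach from ranking by mean velocity.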