scCS.enrichment

enrichment.py — Pathway enrichment analysis for scCS fate arms.

Runs Enrichr ORA (over-representation analysis) on DEG driver genes for each fate arm, separately for up- and down-regulated genes.

Default gene sets (mouse):

KEGG_2019_Mouse
GO_Biological_Process_2021
Reactome_2022

Requires gseapy >= 1.0. Install with: pip install gseapy

Results are returned as DataFrames and optionally visualized as dot plots (dot size = gene ratio, color = -log10 adjusted p-value).
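The two dot plot quantities can be derived from a result row roughly as follows. This is a minimal sketch, not the module's plotting code: it assumes the Overlap column holds Enrichr-style strings of the form "k/n" and that gene ratio means k divided by n. The helper names `gene_ratio` and `neg_log10` are hypothetical.

```python
import math


def gene_ratio(overlap: str) -> float:
    """Convert an overlap string 'k/n' into the fraction k / n (assumed format)."""
    hits, set_size = overlap.split("/")
    return int(hits) / int(set_size)


def neg_log10(p_adj: float, floor: float = 1e-300) -> float:
    """Return -log10 of an adjusted p-value, flooring at `floor` to avoid log(0)."""
    return -math.log10(max(p_adj, floor))


# Example row values: 5 input genes hit a 50-gene term, adjusted p = 0.001.
dot_size = gene_ratio("5/50")
dot_color = neg_log10(0.001)
```

The floor on the p-value is a design choice for this sketch only; it keeps the color scale finite when a term reports an adjusted p-value of exactly zero.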

Functions

`run_enrichment_per_fate`(→ Dict[str, Dict[str, ...)	Run Enrichr ORA on DEG driver genes for each fate arm.
`export_enrichment_tables`(→ List[str])	Save enrichment result DataFrames to CSV files.

Module Contents

scCS.enrichment.run_enrichment_per_fate(deg_drivers: Dict[str, pandas.DataFrame], fate_names: List[str] | None = None, gene_sets: List[str] | None = None, organism: str = 'mouse', pval_threshold: float = 0.05, logfc_threshold: float = 0.25, plot: bool = True, n_top_pathways: int = 15) → Dict[str, Dict[str, pandas.DataFrame]]

Run Enrichr ORA on DEG driver genes for each fate arm.

Runs separately for up-regulated and down-regulated genes. Requires gseapy >= 1.0.
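The up/down split might look like the sketch below. It is an illustration under stated assumptions, not the function's actual code: plain dicts stand in for DataFrame rows so the snippet runs without pandas, the keys follow the deg_drivers columns listed in the parameter list, and both the use of the `significant` flag and the inclusive comparison against `logfc_threshold` are assumptions. The helper name `split_by_direction` is hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_direction(
    rows: List[Dict], logfc_threshold: float = 0.25
) -> Tuple[List[str], List[str]]:
    """Return (up_genes, down_genes) from DEG rows.

    Assumption: only rows flagged `significant` are considered, and a gene
    counts as up (down) when its logfoldchange is at least +threshold
    (at most -threshold).
    """
    up = [r["gene"] for r in rows
          if r["significant"] and r["logfoldchange"] >= logfc_threshold]
    down = [r["gene"] for r in rows
            if r["significant"] and r["logfoldchange"] <= -logfc_threshold]
    return up, down


# Toy DEG table for one fate arm (values are illustrative only).
deg_rows = [
    {"gene": "Sox9", "logfoldchange": 1.2, "significant": True},
    {"gene": "Gata1", "logfoldchange": -0.8, "significant": True},
    {"gene": "Actb", "logfoldchange": 0.1, "significant": True},   # below threshold
    {"gene": "Myc", "logfoldchange": 2.0, "significant": False},   # not significant
]
up_genes, down_genes = split_by_direction(deg_rows)
```

Each of the two lists would then be submitted to Enrichr as its own query, which is why results come back keyed by direction.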

Parameters:

deg_drivers (dict) – Output of get_deg_drivers(): a mapping fate_name -> DataFrame with columns [gene, logfoldchange, pval, pval_adj, significant].
fate_names (list of str, optional) – Terminal fate cluster labels (determines iteration order). If omitted (default None), the fate names are inferred from deg_drivers.keys() in their natural insertion order. If provided but missing entries that appear in deg_drivers, a warning is emitted and only the intersection is used.
gene_sets (list of str, optional) – Enrichr gene set library names. Defaults to KEGG + GO BP + Reactome for the specified organism.
organism (str) – ‘mouse’ or ‘human’. Selects the default gene sets and the organism passed to Enrichr.
pval_threshold (float) – Adjusted p-value threshold for reporting enriched terms.
logfc_threshold (float) – Minimum absolute logFC used to split up/down gene lists.
plot (bool) – If True, generate dot plots per fate per direction.
n_top_pathways (int) – Number of top enriched terms to show in dot plots.
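The fate_names behaviour described above can be sketched as follows. This is a hedged reconstruction from the parameter description only; the helper name `resolve_fate_names` and the warning text are hypothetical, and the real function may word or categorise its warning differently.

```python
import warnings
from typing import Dict, List, Optional


def resolve_fate_names(
    deg_drivers: Dict[str, object], fate_names: Optional[List[str]] = None
) -> List[str]:
    """Decide which fate arms to process and in what order."""
    if fate_names is None:
        # Default: every key of deg_drivers, in insertion order.
        return list(deg_drivers.keys())
    omitted = [name for name in deg_drivers if name not in fate_names]
    if omitted:
        warnings.warn(f"fate_names omits {omitted}; using the intersection only")
    # Keep the caller's order, restricted to fates that have DEG tables.
    return [name for name in fate_names if name in deg_drivers]


# Placeholder values stand in for the per-fate DataFrames.
drivers = {"Alpha": None, "Beta": None, "Delta": None}
default_order = resolve_fate_names(drivers)
```

Passing an explicit list is mainly useful for controlling the order in which fates are processed and plotted.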

Returns:

fate_name -> {‘up’: DataFrame, ‘down’: DataFrame}. Each DataFrame has columns [Gene_set, Term, Overlap, P-value, Adjusted P-value, Genes] and is sorted by Adjusted P-value ascending. A DataFrame is empty if no significant terms were found.

Return type:

dict
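Walking the nested result is straightforward; the sketch below summarises it per fate and direction. To stay runnable without pandas or gseapy, each table is mocked as a list of row dicts, so the emptiness check and row access differ from what real DataFrames need (see the comments). The helper name `summarize` and the example terms are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

Table = List[Dict]  # stand-in for a pandas DataFrame of enriched terms


def summarize(
    results: Dict[str, Dict[str, Table]]
) -> List[Tuple[str, str, int, Optional[str]]]:
    """Return (fate, direction, n_terms, top_term) for every result table."""
    summary = []
    for fate, by_direction in results.items():
        for direction in ("up", "down"):
            table = by_direction.get(direction, [])
            # With real DataFrames, use `table.empty` and `table.iloc[0]["Term"]`.
            # Tables are sorted by adjusted p-value, so the first row is the best term.
            top_term = table[0]["Term"] if table else None
            summary.append((fate, direction, len(table), top_term))
    return summary


mock_results = {
    "FateA": {
        "up": [
            {"Term": "Cell cycle", "Adjusted P-value": 0.001},
            {"Term": "DNA replication", "Adjusted P-value": 0.01},
        ],
        "down": [],  # no significant terms for this direction
    }
}
overview = summarize(mock_results)
```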

scCS.enrichment.export_enrichment_tables(enrichment_results: Dict[str, Dict[str, pandas.DataFrame]], output_dir: str = '.', prefix: str = 'enrichment') → List[str]

Save enrichment result DataFrames to CSV files.

Parameters:

enrichment_results (dict) – Output of run_enrichment_per_fate().
output_dir (str) – Directory to save files.
prefix (str) – Filename prefix.

Returns:

Paths of the saved files.

Return type:

list of str
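An export of this shape could be written as below. This is a sketch under assumptions, not the library's implementation: the file naming scheme `{prefix}_{fate}_{direction}.csv` is hypothetical, empty tables are written as header-only files (the real function may skip them), and row dicts with the standard `csv` module replace DataFrames so the example has no third-party dependencies.

```python
import csv
import os
import tempfile
from typing import Dict, List

COLUMNS = ["Gene_set", "Term", "Overlap", "P-value", "Adjusted P-value", "Genes"]


def export_tables(
    results: Dict[str, Dict[str, List[Dict]]],
    output_dir: str = ".",
    prefix: str = "enrichment",
) -> List[str]:
    """Write one CSV per fate and direction; return the paths written."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for fate, by_direction in results.items():
        for direction, rows in by_direction.items():
            # Hypothetical naming scheme, chosen for this sketch only.
            path = os.path.join(output_dir, f"{prefix}_{fate}_{direction}.csv")
            with open(path, "w", newline="") as handle:
                writer = csv.DictWriter(handle, fieldnames=COLUMNS)
                writer.writeheader()
                writer.writerows(rows)
            paths.append(path)
    return paths


example_row = {
    "Gene_set": "KEGG_2019_Mouse", "Term": "Cell cycle", "Overlap": "5/50",
    "P-value": 0.0001, "Adjusted P-value": 0.001, "Genes": "CCNB1;CDK1",
}
with tempfile.TemporaryDirectory() as tmp_dir:
    written = export_tables({"FateA": {"up": [example_row], "down": []}}, tmp_dir)
    file_names = [os.path.basename(p) for p in written]
```

With real DataFrames the inner write would typically be a single `DataFrame.to_csv(path, index=False)` call.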