scCS.enrichment

enrichment.py — Pathway enrichment analysis for scCS fate arms.

Runs Enrichr ORA (over-representation analysis) on DEG driver genes for each fate arm, separately for up- and down-regulated genes.

Default gene sets (mouse):
  • KEGG_2019_Mouse

  • GO_Biological_Process_2021

  • Reactome_2022

Requires gseapy >= 1.0. Install with: pip install gseapy

Results are returned as DataFrames and optionally visualized as dot plots (dot size = gene ratio, color = -log10 adjusted p-value).
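The dot-plot encoding can be illustrated with a short sketch. It assumes that the Overlap column holds strings of the form "k/n" (overlapping genes over gene-set size) and that "gene ratio" means k/n; neither detail is confirmed by this page. The helper name dot_plot_encoding is hypothetical and is not part of scCS.

```python
import math
from typing import Tuple


def dot_plot_encoding(overlap: str, adj_pval: float) -> Tuple[float, float]:
    """Return (gene_ratio, neg_log10_padj) for one enriched term.

    Illustrative helper only, not exported by scCS.
    Assumes `overlap` looks like "k/n" and `adj_pval` is greater than 0.
    """
    hits, set_size = (int(part) for part in overlap.split("/"))
    gene_ratio = hits / set_size            # mapped to dot size
    neg_log10_padj = -math.log10(adj_pval)  # mapped to dot color
    return gene_ratio, neg_log10_padj


# A term with 5 of 50 genes overlapping and an adjusted p-value of 0.001.
size, color = dot_plot_encoding("5/50", 0.001)
```

Smaller adjusted p-values give larger color values, so the most significant terms stand out on the color scale.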

Functions

run_enrichment_per_fate(→ Dict[str, Dict[str, ...)

Run Enrichr ORA on DEG driver genes for each fate arm.

export_enrichment_tables(→ List[str])

Save enrichment result DataFrames to CSV files.

Module Contents

scCS.enrichment.run_enrichment_per_fate(deg_drivers: Dict[str, pandas.DataFrame], fate_names: List[str] | None = None, gene_sets: List[str] | None = None, organism: str = 'mouse', pval_threshold: float = 0.05, logfc_threshold: float = 0.25, plot: bool = True, n_top_pathways: int = 15) → Dict[str, Dict[str, pandas.DataFrame]]

Run Enrichr ORA on DEG driver genes for each fate arm.

Runs separately for up-regulated and down-regulated genes. Requires gseapy >= 1.0.

Parameters:
  • deg_drivers (dict) – Output of get_deg_drivers(): a mapping fate_name -> DataFrame with columns [gene, logfoldchange, pval, pval_adj, significant].

  • fate_names (list of str, optional) – Terminal fate cluster labels (determines iteration order). If omitted (default None), the fate names are inferred from deg_drivers.keys() in their natural insertion order. If provided but missing entries that appear in deg_drivers, a warning is emitted and only the intersection is used.

  • gene_sets (list of str, optional) – Enrichr gene set library names. Defaults to KEGG + GO BP + Reactome for the specified organism.

  • organism (str) – 'mouse' or 'human'. Selects the default gene sets and the Enrichr organism.

  • pval_threshold (float) – Adjusted p-value threshold for reporting enriched terms.

  • logfc_threshold (float) – Minimum absolute logFC used to split up/down gene lists.

  • plot (bool) – If True, generate dot plots per fate per direction.

  • n_top_pathways (int) – Number of top enriched terms to show in dot plots.
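Two of the parameters above describe behavior that is easy to misread: how fate_names is reconciled with deg_drivers, and how logfc_threshold splits genes by direction. The following is a minimal sketch of that logic, not the scCS implementation. Plain dict rows stand in for DataFrame rows, the helper names resolve_fate_names and split_by_direction are hypothetical, and both the restriction to rows flagged significant and the inclusive comparisons are assumptions.

```python
import warnings
from typing import Dict, List, Optional, Tuple


def resolve_fate_names(available: List[str],
                       fate_names: Optional[List[str]] = None) -> List[str]:
    """Pick the fates to iterate over, in the caller's order if given."""
    if fate_names is None:
        return list(available)  # insertion order of deg_drivers
    omitted = [name for name in available if name not in fate_names]
    if omitted:
        warnings.warn(f"fate_names omits {omitted}; using the intersection only")
    return [name for name in fate_names if name in available]


def split_by_direction(rows: List[dict],
                       logfc_threshold: float = 0.25) -> Tuple[List[str], List[str]]:
    """Split significant genes into (up, down) lists by signed logFC."""
    up = [r["gene"] for r in rows
          if r["significant"] and r["logfoldchange"] >= logfc_threshold]
    down = [r["gene"] for r in rows
            if r["significant"] and r["logfoldchange"] <= -logfc_threshold]
    return up, down


# Toy input shaped like deg_drivers, with dict rows instead of DataFrames.
deg_drivers: Dict[str, List[dict]] = {
    "FateA": [
        {"gene": "Sox2", "logfoldchange": 1.2, "significant": True},
        {"gene": "Gata1", "logfoldchange": -0.8, "significant": True},
        {"gene": "Actb", "logfoldchange": 0.1, "significant": True},
    ],
    "FateB": [{"gene": "Myc", "logfoldchange": 0.9, "significant": False}],
}

# Capture the warning raised because "FateB" is left out of fate_names.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    order = resolve_fate_names(list(deg_drivers), ["FateA"])

up_genes, down_genes = split_by_direction(deg_drivers["FateA"])
```

Genes whose absolute logFC falls below the threshold (Actb in this toy input) land in neither list, so each direction is tested only on genes with a clear shift.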

Returns:

fate_name -> {'up': DataFrame, 'down': DataFrame}. Each DataFrame has the columns [Gene_set, Term, Overlap, P-value, Adjusted P-value, Genes] and is sorted by Adjusted P-value in ascending order. A DataFrame is empty if no significant terms were found.

Return type:

dict
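The filtering and ordering of the returned tables can be sketched as below. This is an illustration under stated assumptions rather than the library's code: rows are plain dicts keyed by the column names listed above, the comparison against pval_threshold is assumed to be strict, and the helper name filter_and_sort is hypothetical.

```python
from typing import List


def filter_and_sort(rows: List[dict], pval_threshold: float = 0.05) -> List[dict]:
    """Keep terms below the adjusted p-value threshold, most significant first."""
    kept = [r for r in rows if r["Adjusted P-value"] < pval_threshold]
    return sorted(kept, key=lambda r: r["Adjusted P-value"])


# Toy Enrichr-style rows for one fate and one direction.
rows = [
    {"Term": "Cell cycle", "Adjusted P-value": 0.002},
    {"Term": "Apoptosis", "Adjusted P-value": 0.2},
    {"Term": "Wnt signaling pathway", "Adjusted P-value": 0.0004},
]

table = filter_and_sort(rows)  # what would be reported for this direction
n_top_pathways = 1
plotted = table[:n_top_pathways]  # only the leading terms reach the dot plot
```

Note that n_top_pathways limits only what is plotted; the reported table keeps every term that passes the threshold.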

scCS.enrichment.export_enrichment_tables(enrichment_results: Dict[str, Dict[str, pandas.DataFrame]], output_dir: str = '.', prefix: str = 'enrichment') → List[str]

Save enrichment result DataFrames to CSV files.

Parameters:
  • enrichment_results (dict) – Output of run_enrichment_per_fate().

  • output_dir (str) – Directory to save files.

  • prefix (str) – Filename prefix.

Returns:

Paths of the saved files.

Return type:

list of str