scCS.enrichment
enrichment.py — Pathway enrichment analysis for scCS fate arms.
Runs Enrichr ORA (over-representation analysis) on DEG driver genes for each fate arm, separately for up- and down-regulated genes.
Default gene sets (mouse):
- KEGG_2019_Mouse
- GO_Biological_Process_2021
- Reactome_2022
Requires gseapy >= 1.0. Install with: pip install gseapy
Results are returned as DataFrames and optionally visualized as dot plots (dot size = gene ratio, color = -log10 adjusted p-value).
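The two dot-plot quantities can be derived from a single result row. The sketch below assumes that "gene ratio" means the fraction encoded in the Overlap column (hits / gene set size); the helper name `dot_plot_metrics` and the zero-clamp are illustrative and are not part of scCS.

```python
import math
from typing import Tuple


def dot_plot_metrics(overlap: str, adjusted_p: float) -> Tuple[float, float]:
    """Return (gene_ratio, neg_log10_padj) for one enriched term.

    Assumption: `overlap` is an Enrichr-style "hits/set_size" string.
    """
    hits, set_size = (int(part) for part in overlap.split("/"))
    gene_ratio = hits / set_size
    # Clamp so that a p-value that underflowed to 0.0 does not raise.
    neg_log10_padj = -math.log10(max(adjusted_p, 1e-300))
    return gene_ratio, neg_log10_padj


# Illustrative values only: 12 of 48 pathway genes hit, adjusted p = 0.001.
ratio, score = dot_plot_metrics("12/48", 0.001)
```

A larger dot therefore means a larger share of the pathway is covered by the input gene list, and a stronger color means a smaller adjusted p-value.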
Functions
| run_enrichment_per_fate | Run Enrichr ORA on DEG driver genes for each fate arm. |
| export_enrichment_tables | Save enrichment result DataFrames to CSV files. |
Module Contents
- scCS.enrichment.run_enrichment_per_fate(deg_drivers: Dict[str, pandas.DataFrame], fate_names: List[str] | None = None, gene_sets: List[str] | None = None, organism: str = 'mouse', pval_threshold: float = 0.05, logfc_threshold: float = 0.25, plot: bool = True, n_top_pathways: int = 15) -> Dict[str, Dict[str, pandas.DataFrame]]
Run Enrichr ORA on DEG driver genes for each fate arm.
Runs separately for up-regulated and down-regulated genes. Requires gseapy >= 1.0.
- Parameters:
deg_drivers (dict) – Output of get_deg_drivers(). Maps fate_name -> DataFrame[gene, logfoldchange, pval, pval_adj, significant].
fate_names (list of str, optional) – Terminal fate cluster labels (determines iteration order). If omitted (default None), the fate names are inferred from deg_drivers.keys() in their natural insertion order. If provided but missing entries that appear in deg_drivers, a warning is emitted and only the intersection is used.
gene_sets (list of str, optional) – Enrichr gene set library names. Defaults to KEGG + GO BP + Reactome for the specified organism.
organism (str) – ‘mouse’ or ‘human’. Used for default gene sets and Enrichr organism.
pval_threshold (float) – Adjusted p-value threshold for reporting enriched terms.
logfc_threshold (float) – Minimum absolute logFC used to split up/down gene lists.
plot (bool) – If True, generate dot plots per fate per direction.
n_top_pathways (int) – Number of top enriched terms to show in dot plots.
- Returns:
Mapping fate_name -> {‘up’: DataFrame, ‘down’: DataFrame}. Each DataFrame has columns [Gene_set, Term, Overlap, P-value, Adjusted P-value, Genes] and is sorted by Adjusted P-value ascending. An empty DataFrame is returned if no significant terms are found.
- Return type:
dict
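The fate-name resolution and the up/down split described in the parameters can be sketched as follows. This is an illustration under stated assumptions, not the scCS implementation: plain dicts stand in for DataFrame rows, the helper names are hypothetical, and both the use of the `significant` flag and the inclusive comparison against `logfc_threshold` are guesses about behavior the documentation does not spell out.

```python
import warnings
from typing import Dict, List, Optional, Tuple

# One DEG table row; a plain-dict stand-in for a pandas DataFrame row.
Row = Dict[str, object]


def resolve_fate_names(deg_drivers: Dict[str, List[Row]],
                       fate_names: Optional[List[str]] = None) -> List[str]:
    """Pick the iteration order over fates (hypothetical helper)."""
    if fate_names is None:
        # Dicts preserve insertion order, so this is the "natural" order.
        return list(deg_drivers)
    omitted = [name for name in deg_drivers if name not in fate_names]
    if omitted:
        warnings.warn(f"fate_names omits {omitted}; using the intersection only")
    return [name for name in fate_names if name in deg_drivers]


def split_up_down(rows: List[Row],
                  logfc_threshold: float = 0.25) -> Tuple[List[str], List[str]]:
    """Split genes into up/down lists by absolute logFC (hypothetical helper)."""
    up: List[str] = []
    down: List[str] = []
    for row in rows:
        if not row["significant"]:  # assumption: only significant DEGs are used
            continue
        logfc = float(row["logfoldchange"])
        if logfc >= logfc_threshold:
            up.append(str(row["gene"]))
        elif logfc <= -logfc_threshold:
            down.append(str(row["gene"]))
    return up, down


# Toy input with placeholder fate and gene names.
toy_drivers = {
    "FateA": [
        {"gene": "GeneA", "logfoldchange": 1.2, "significant": True},
        {"gene": "GeneB", "logfoldchange": -0.8, "significant": True},
        {"gene": "GeneC", "logfoldchange": 0.1, "significant": True},
    ],
    "FateB": [
        {"gene": "GeneD", "logfoldchange": 2.0, "significant": False},
    ],
}

inferred_order = resolve_fate_names(toy_drivers)
up_genes, down_genes = split_up_down(toy_drivers["FateA"])
```

Each of the two gene lists would then be submitted to Enrichr separately, which is what produces the ‘up’ and ‘down’ entries in the returned mapping.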
- scCS.enrichment.export_enrichment_tables(enrichment_results: Dict[str, Dict[str, pandas.DataFrame]], output_dir: str = '.', prefix: str = 'enrichment') -> List[str]
Save enrichment result DataFrames to CSV files.
- Parameters:
enrichment_results (dict) – Output of run_enrichment_per_fate().
output_dir (str) – Directory to save files.
prefix (str) – Filename prefix.
- Returns:
Paths of the saved files.
- Return type:
list of str
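To make the shape of the export concrete, here is a standard-library sketch that writes one CSV per fate and direction and returns the paths. The file naming pattern `{prefix}_{fate}_{direction}.csv` is an assumption, since the naming scheme is not documented above; lists of dicts stand in for the DataFrames, and the row values are placeholders rather than real enrichment results.

```python
import csv
import os
import tempfile
from typing import Dict, List

COLUMNS = ["Gene_set", "Term", "Overlap", "P-value", "Adjusted P-value", "Genes"]

# A list of row dicts stands in for one enrichment result DataFrame.
Table = List[Dict[str, object]]


def export_tables_sketch(results: Dict[str, Dict[str, Table]],
                         output_dir: str = ".",
                         prefix: str = "enrichment") -> List[str]:
    """Write one CSV per fate and direction; return the saved paths."""
    os.makedirs(output_dir, exist_ok=True)
    paths: List[str] = []
    for fate, by_direction in results.items():
        for direction, rows in by_direction.items():
            # Assumed naming pattern, not confirmed by the scCS docs.
            path = os.path.join(output_dir, f"{prefix}_{fate}_{direction}.csv")
            with open(path, "w", newline="") as handle:
                writer = csv.DictWriter(handle, fieldnames=COLUMNS)
                writer.writeheader()
                writer.writerows(rows)  # an empty table yields a header-only file
            paths.append(path)
    return paths


# Placeholder result: one term for 'up', nothing significant for 'down'.
toy_results = {
    "FateA": {
        "up": [{"Gene_set": "KEGG_2019_Mouse", "Term": "Example pathway",
                "Overlap": "12/48", "P-value": 1e-5,
                "Adjusted P-value": 1e-3, "Genes": "GeneA;GeneB"}],
        "down": [],
    }
}

saved_paths = export_tables_sketch(toy_results, output_dir=tempfile.mkdtemp())
```

With real results, the dictionary returned by run_enrichment_per_fate() would be passed directly to export_enrichment_tables() instead.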