scCS.enrichment
===============

.. py:module:: scCS.enrichment

.. autoapi-nested-parse::

   enrichment.py — Pathway enrichment analysis for scCS fate arms.

   Runs Enrichr ORA (over-representation analysis) on DEG driver genes
   for each fate arm, separately for up- and down-regulated genes.

   Default gene sets (mouse):
     - KEGG_2019_Mouse
     - GO_Biological_Process_2021
     - Reactome_2022

   Requires gseapy >= 1.0.  Install with: pip install gseapy

   Results are returned as DataFrames and optionally visualized as dot plots
   (dot size = gene ratio, color = -log10 adjusted p-value).
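
   As a rough illustration of the two plotted quantities, the sketch below
   derives them from a small result table.  It assumes that ``Overlap`` is an
   Enrichr-style ``"hits/set size"`` string and that "gene ratio" means hits
   divided by set size; the module's own definition may differ.  The table
   values are made up.

   .. code-block:: python

      import numpy as np
      import pandas as pd

      # Toy result table with made-up values, using the documented column names.
      res = pd.DataFrame({
          "Term": ["Pathway A", "Pathway B"],
          "Overlap": ["5/50", "2/80"],
          "Adjusted P-value": [0.001, 0.01],
      })

      # Assumption: Overlap is "hits/set_size", so gene ratio = hits / set_size.
      parts = res["Overlap"].str.split("/", expand=True).astype(int)
      res["gene_ratio"] = parts[0] / parts[1]

      # Dot color: -log10 of the adjusted p-value.
      res["neg_log10_padj"] = -np.log10(res["Adjusted P-value"])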


Functions
---------

.. autoapisummary::

   scCS.enrichment.run_enrichment_per_fate
   scCS.enrichment.export_enrichment_tables


Module Contents
---------------

.. py:function:: run_enrichment_per_fate(deg_drivers: Dict[str, pandas.DataFrame], fate_names: Optional[List[str]] = None, gene_sets: Optional[List[str]] = None, organism: str = 'mouse', pval_threshold: float = 0.05, logfc_threshold: float = 0.25, plot: bool = True, n_top_pathways: int = 15) -> Dict[str, Dict[str, pandas.DataFrame]]

   Run Enrichr ORA on DEG driver genes for each fate arm.

   Runs separately for up-regulated and down-regulated genes.
   Requires gseapy >= 1.0.

   :param deg_drivers: Output of ``get_deg_drivers()``: a mapping of
                       fate_name -> DataFrame with columns
                       [gene, logfoldchange, pval, pval_adj, significant].
   :type deg_drivers: dict
   :param fate_names: Terminal fate cluster labels, in the order they should be
                      processed.  If omitted (default ``None``), the fate names are
                      taken from ``deg_drivers.keys()`` in insertion order.  If
                      provided but lacking some keys of ``deg_drivers``, a warning
                      is emitted and only the intersection is used.
   :type fate_names: list of str, optional
   :param gene_sets: Enrichr gene set library names.  Defaults to KEGG + GO BP + Reactome
                     for the specified organism.
   :type gene_sets: list of str, optional
   :param organism: 'mouse' or 'human'.  Used for default gene sets and Enrichr organism.
   :type organism: str
   :param pval_threshold: Adjusted p-value threshold for reporting enriched terms.
   :type pval_threshold: float
   :param logfc_threshold: Minimum absolute logFC used to split up/down gene lists.
   :type logfc_threshold: float
   :param plot: If True, generate dot plots per fate per direction.
   :type plot: bool
   :param n_top_pathways: Number of top enriched terms to show in dot plots.
   :type n_top_pathways: int

   :returns: Mapping of fate_name -> {'up': DataFrame, 'down': DataFrame}.
             Each DataFrame has the columns
             [Gene_set, Term, Overlap, P-value, Adjusted P-value, Genes],
             is sorted by Adjusted P-value in ascending order, and is empty
             if no significant terms were found.
   :rtype: dict
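
   The input handling described above can be sketched in plain pandas.  Both
   helpers below are hypothetical and are not part of scCS.  The sketch assumes
   that only rows flagged ``significant`` are used and that the logFC threshold
   is inclusive; the actual implementation may differ on both points.

   .. code-block:: python

      import warnings
      from typing import Dict, List, Optional, Tuple

      import pandas as pd


      def resolve_fate_names(deg_drivers: Dict[str, pd.DataFrame],
                             fate_names: Optional[List[str]] = None) -> List[str]:
          """Hypothetical helper: choose which fates to process, and in what order."""
          if fate_names is None:
              return list(deg_drivers)  # dict insertion order
          kept = [f for f in fate_names if f in deg_drivers]
          if set(kept) != set(deg_drivers):
              # Some keys of deg_drivers were not listed in fate_names.
              warnings.warn("fate_names omits some deg_drivers keys; "
                            "using the intersection only")
          return kept


      def split_up_down(df: pd.DataFrame,
                        logfc_threshold: float = 0.25) -> Tuple[List[str], List[str]]:
          """Hypothetical helper: split one fate's DEG table into up/down gene lists."""
          sig = df[df["significant"]]  # assumption: keep significant rows only
          up = sig.loc[sig["logfoldchange"] >= logfc_threshold, "gene"].tolist()
          down = sig.loc[sig["logfoldchange"] <= -logfc_threshold, "gene"].tolist()
          return up, down


      # Toy input with made-up genes, shaped like the documented deg_drivers.
      deg = {
          "FateA": pd.DataFrame({
              "gene": ["G1", "G2", "G3", "G4"],
              "logfoldchange": [1.2, -0.8, 0.1, 2.0],
              "significant": [True, True, True, False],
          }),
      }
      fates = resolve_fate_names(deg)
      up, down = split_up_down(deg["FateA"])

   In this toy input, ``G3`` falls below the logFC threshold and ``G4`` is not
   flagged significant, so neither would be submitted for enrichment.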


.. py:function:: export_enrichment_tables(enrichment_results: Dict[str, Dict[str, pandas.DataFrame]], output_dir: str = '.', prefix: str = 'enrichment') -> List[str]

   Save enrichment result DataFrames to CSV files.

   :param enrichment_results: Output of run_enrichment_per_fate().
   :type enrichment_results: dict
   :param output_dir: Directory to save files.
   :type output_dir: str
   :param prefix: Filename prefix.
   :type prefix: str

   :returns: Paths of the saved files.
   :rtype: list of str
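
   For orientation, a minimal pandas-only sketch of this kind of export is
   shown below.  The function name and the ``{prefix}_{fate}_{direction}.csv``
   naming scheme are assumptions made for illustration and are not confirmed by
   the module.  The real function may also treat empty tables differently.

   .. code-block:: python

      import os
      import tempfile
      from typing import Dict, List

      import pandas as pd


      def export_tables_sketch(results: Dict[str, Dict[str, pd.DataFrame]],
                               output_dir: str = ".",
                               prefix: str = "enrichment") -> List[str]:
          """Hypothetical stand-in: write one CSV per fate and direction."""
          os.makedirs(output_dir, exist_ok=True)
          paths = []
          for fate, by_direction in results.items():
              for direction, df in by_direction.items():
                  # Assumed file naming; the module's actual scheme is not documented.
                  path = os.path.join(output_dir, f"{prefix}_{fate}_{direction}.csv")
                  df.to_csv(path, index=False)
                  paths.append(path)
          return paths


      # Toy results shaped like the documented return value of
      # run_enrichment_per_fate(); the term name is made up.
      results = {
          "FateA": {
              "up": pd.DataFrame({"Term": ["Pathway A"]}),
              "down": pd.DataFrame(),
          },
      }
      out_dir = tempfile.mkdtemp()
      saved = export_tables_sketch(results, output_dir=out_dir)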