scCS.enrichment
===============

.. py:module:: scCS.enrichment

.. autoapi-nested-parse::

   enrichment.py: Pathway enrichment analysis for scCS fate arms.

   Runs Enrichr ORA (over-representation analysis) on DEG driver genes for each
   fate arm, separately for up- and down-regulated genes.

   Default gene sets (mouse):

   - KEGG_2019_Mouse
   - GO_Biological_Process_2021
   - Reactome_2022

   Requires gseapy >= 1.0. Install with: ``pip install gseapy``

   Results are returned as DataFrames and optionally visualized as dot plots
   (dot size = gene ratio, color = -log10 adjusted p-value).


Functions
---------

.. autoapisummary::

   scCS.enrichment.run_enrichment_per_fate
   scCS.enrichment.export_enrichment_tables


Module Contents
---------------

.. py:function:: run_enrichment_per_fate(deg_drivers: Dict[str, pandas.DataFrame], fate_names: Optional[List[str]] = None, gene_sets: Optional[List[str]] = None, organism: str = 'mouse', pval_threshold: float = 0.05, logfc_threshold: float = 0.25, plot: bool = True, n_top_pathways: int = 15) -> Dict[str, Dict[str, pandas.DataFrame]]

   Run Enrichr ORA on DEG driver genes for each fate arm.

   Runs separately for up-regulated and down-regulated genes.
   Requires gseapy >= 1.0.

   :param deg_drivers: Output of ``get_deg_drivers()``: a mapping
                       ``fate_name -> DataFrame[gene, logfoldchange, pval, pval_adj, significant]``.
   :type deg_drivers: dict
   :param fate_names: Terminal fate cluster labels (determines iteration order).
                      If omitted (default ``None``), the fate names are inferred from
                      ``deg_drivers.keys()`` in their natural insertion order. If provided
                      but missing entries that appear in ``deg_drivers``, a warning is
                      emitted and only the intersection is used.
   :type fate_names: list of str, optional
   :param gene_sets: Enrichr gene set library names.
                     Defaults to KEGG + GO BP + Reactome for the specified organism.
   :type gene_sets: list of str, optional
   :param organism: ``'mouse'`` or ``'human'``. Used for default gene sets and Enrichr organism.
   :type organism: str
   :param pval_threshold: Adjusted p-value threshold for reporting enriched terms.
   :type pval_threshold: float
   :param logfc_threshold: Minimum absolute logFC used to split up/down gene lists.
   :type logfc_threshold: float
   :param plot: If True, generate dot plots per fate per direction.
   :type plot: bool
   :param n_top_pathways: Number of top enriched terms to show in dot plots.
   :type n_top_pathways: int

   :returns: Mapping ``fate_name -> {'up': DataFrame, 'down': DataFrame}``.
             Each DataFrame has columns
             [Gene_set, Term, Overlap, P-value, Adjusted P-value, Genes] and is
             sorted by Adjusted P-value ascending. A DataFrame is empty if no
             significant terms are found.
   :rtype: dict


.. py:function:: export_enrichment_tables(enrichment_results: Dict[str, Dict[str, pandas.DataFrame]], output_dir: str = '.', prefix: str = 'enrichment') -> List[str]

   Save enrichment result DataFrames to CSV files.

   :param enrichment_results: Output of ``run_enrichment_per_fate()``.
   :type enrichment_results: dict
   :param output_dir: Directory to save files.
   :type output_dir: str
   :param prefix: Filename prefix.
   :type prefix: str

   :returns: Paths of saved files.
   :rtype: list of str
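
The ``logfc_threshold`` parameter of ``run_enrichment_per_fate`` is described as the minimum absolute logFC used to split genes into up- and down-regulated lists. The sketch below illustrates one plausible reading of that split on plain Python records that mirror the documented ``deg_drivers`` columns. The helper ``split_up_down`` is hypothetical and not part of scCS, the gene names and values are made up, and restricting the split to rows flagged ``significant`` is an assumption rather than documented behaviour.

```python
from typing import Dict, List


def split_up_down(rows: List[dict], logfc_threshold: float = 0.25) -> Dict[str, List[str]]:
    """Hypothetical helper (not part of scCS): split DEG records by logFC sign."""
    up: List[str] = []
    down: List[str] = []
    for row in rows:
        # Assumption: only genes flagged as significant enter either list.
        if not row["significant"]:
            continue
        lfc = row["logfoldchange"]
        if lfc >= logfc_threshold:
            up.append(row["gene"])
        elif lfc <= -logfc_threshold:
            down.append(row["gene"])
        # Genes with |logFC| below the threshold are dropped from both lists.
    return {"up": up, "down": down}


# Toy records with the documented column names; all values are invented.
toy_rows = [
    {"gene": "GeneA", "logfoldchange": 1.2, "pval_adj": 0.001, "significant": True},
    {"gene": "GeneB", "logfoldchange": -0.8, "pval_adj": 0.01, "significant": True},
    {"gene": "GeneC", "logfoldchange": 0.1, "pval_adj": 0.02, "significant": True},
    {"gene": "GeneD", "logfoldchange": 2.0, "pval_adj": 0.4, "significant": False},
]

split = split_up_down(toy_rows)
print(split)  # {'up': ['GeneA'], 'down': ['GeneB']}
```

Each of the two resulting lists would then be submitted to Enrichr separately, which is why the return value of ``run_enrichment_per_fate`` carries one ``'up'`` and one ``'down'`` DataFrame per fate.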