# DCR Triage Agent — Architecture Overview

> Last updated: 2026-05-25
> Status: **V0.16 — Pure-CLI release-readiness** (zero Azure / GitHub Models dependency, Claude Opus 4.7 locked, per-step error attribution, live progress polling, clean release hygiene)
> Previous: V0.15 (2026-05-21) — Fresh-clone reliability + ownership-first triage

## V0.16 changes at a glance

| Area | What changed | Where |
|------|--------------|-------|
| **Pure-CLI architecture** | Every LLM call now routes through ``copilot --allow-all --model claude-opus-4.7`` (or ``claude --print``). **All Azure OpenAI and GitHub Models direct-API code paths were ripped out of the web server**: ``_build_cli_engine`` no longer creates ``fast_llm``, ``_get_lesson_pair`` uses ``CliLLMClient``, the legacy ``build_mock_pipeline()`` Azure init was bypassed entirely. Rationale: every PM downloads the agent and uses their own copilot CLI — they don't have an Azure subscription. The IcM / ADO / Bing / MS-Learn MCP tools are reachable only via the CLI subprocess anyway, so Azure-direct was costing us MCP access while gaining a dependency. | ``scripts/server.py`` (Azure block deleted, ``fast_llm = cli_llm`` shorthand); ``src/shared/cli_llm.py`` (locked ``--model claude-opus-4.7``, env override ``DCR_COPILOT_MODEL``); ``scripts/auto_setup.py`` (``--with-azure-openai`` becomes a deprecated no-op); ``start.bat`` / ``start.ps1`` (drop the flag); ``src/agents/qualification_checker.py`` + ``src/tools/knowledge_retriever.py`` (drop hardcoded ``model="gpt-4o"``) |
| **Per-step error attribution** | New ``TriageResult.step_status: dict[str, str]`` field. Each LLM-using step is wrapped in try/except so a single CLI hiccup degrades only that step — the pipeline continues with safe defaults. Each step writes one of ``ok`` / ``ok:<note>`` / ``skipped:<reason>`` / ``failed:<ExceptionClass>:<msg>``. Surfaced verbatim in ``TriageResponse.step_status`` on the API. A grep-friendly summary line ``[icm-X] step_status: 9/9 OK`` is logged at pipeline end (ERROR level if anything failed). | ``src/shared/models.py`` ``TriageResult.step_status``; ``src/orchestration/triage_engine.py`` ``_step_failed()`` + try/except around each step + ``asyncio.gather(return_exceptions=True)``; ``scripts/server.py`` ``TriageResponse.step_status`` |
| **Live progress polling** | Module-level ``_LIVE_TRIAGE_PROGRESS: dict[dcr_id, step_status]`` registry, populated by ``triage()`` and cleared in ``finally``. Each step calls ``_step_start("step2-classify")`` to write ``in_progress`` BEFORE actual work begins. New ``GET /api/triage/progress/{incident_id}`` endpoint returns the live snapshot. Frontend polls every 2 s during a run and updates per-step icons (⬜ / 🔄 / ✅ / ❌ / ⏭) in real time — replaces the old cosmetic timer that pretended to make progress on fake timings. | ``src/orchestration/triage_engine.py`` ``_LIVE_TRIAGE_PROGRESS`` + ``get_live_progress()``; ``scripts/server.py`` ``GET /api/triage/progress/{id}``; ``frontend/index.html`` ``showTriageLoading()`` polling + ``_STEP_META`` 9-row mapping |
| **L1 JSON-repair hardening** | ``_try_loads_repaired`` now handles 13 distinct LLM malformed-JSON patterns (was 4): Python literals (``True``/``False``/``None``), missing commas between key-value pairs, auto-close unbalanced ``{}``/``[]`` (max_tokens truncation), trim unterminated trailing strings, Python-dict-style single-quote bulk swap, prefix-truncation last-resort walk, plus the original `` ``` `` fence strip / trailing-comma fix / smart-quote normalize. Repairs 13/13 cases on the real-world malformed-JSON corpus. | ``src/agents/classifier.py`` ``_try_loads_repaired`` |
| **L1 validator hardening** | New ``_unwrap_llm_value()`` helper handles ``dict`` (extracts ``value`` / ``label`` / ``name`` / ``text`` / ``choice`` / ``answer`` key), ``list`` (takes first element), ``bool`` (True→"true" / False→"false"). Wired into all 5 enum validators (``confidence`` / ``change_type`` / ``initial_judgment`` / ``impact_scope`` / ``urgency``). ``confidence=True`` → ``high``, ``confidence=False`` → ``low``. | ``src/shared/models.py`` ``_unwrap_llm_value`` + 5 validator updates |
| **``need_info_reason`` attribution** | New ``Classification.need_info_reason: Optional[str]`` field tags 5 fallback paths so PMs can tell "LLM legitimately needs info" from "pipeline bailed out": ``empty-judgment-field`` / ``unmapped-judgment:<raw>`` / ``l4-safety-net-total`` / ``infra:classify_stage_b`` / ``exception:<class>``. Exposed at ``classification.need_info_reason`` in API. Frontend shows a yellow ⚠️ Fallback badge when set. | ``src/shared/models.py`` ``Classification.need_info_reason``; ``src/agents/classifier.py`` (5 fallback sites annotated) |
| **Per-DCR "Clear cache & Re-run" button** | New button on the detail-header beside "▶ Run Triage". Wipes both server-side cache (``DELETE /api/triage/cache/{id}``) and localStorage entry, then triggers a fresh CLI triage. Driven by the new cache-management endpoints: ``GET /api/triage/cache`` (list), ``DELETE /api/triage/cache/{id}?variant=all\|cli\|llm``, ``DELETE /api/triage/cache?confirm=yes``. CLI equivalent: ``python -m scripts.clear_cache --list \| --id 770883931 \| --all --yes \| --runtime --yes``. | ``frontend/index.html`` ``clearCacheAndRerun()``; ``scripts/server.py`` 4 cache endpoints; ``scripts/clear_cache.py`` (NEW) |
| **Release hygiene** | ``data/triage_cache/``, ``data/observability/``, ``data/llm_cache/``, ``data/eval_goldens/``, ``data/splits/`` are now ``gitignored``. Previously these accumulated per-PM run artefacts and shipped via clone — PM B would open the UI and see PM A's verdicts. SHIPPED: ``data/icm_cache/`` (team-shared baseline), ``data/feedback/overrides.jsonl`` (training data), ``data/learning/learned_lessons.jsonl`` + ``.regression_p1_snapshot.json`` (PM-Override-distilled rules so new PMs get day-1 intelligence). ``scripts/clear_cache.py --runtime`` provides the equivalent reset for existing clones. | ``.gitignore``; ``scripts/clear_cache.py`` ``cmd_delete_runtime`` |
| **zh-CN console fix** | Server boot used to crash on ``UnicodeEncodeError`` when ``announce_log_file`` printed the 📝 emoji on a gbk-default console (Melissa's machine and every PM with zh-CN Windows). Fixed at 3 layers: ``scripts/server.py`` reconfigures ``sys.stdout`` / ``sys.stderr`` to UTF-8 + ``errors="replace"`` before any print; ``start.bat`` / ``start.ps1`` set ``chcp 65001`` + ``PYTHONIOENCODING=utf-8`` + ``PYTHONUTF8=1`` + ``[Console]::OutputEncoding`` so emoji actually renders cleanly. | ``scripts/server.py`` (top-of-file UTF-8 reconfigure); ``start.bat`` + ``start.ps1`` |
| **Frontend: 1-9 step numbering + real-time icons** | Loading panel and post-triage status card both show 9 sequential rows (1-9 visual numbering, 1:1 mapped to the 9 backend ``step_status`` keys). The earlier cosmetic timer that pretended to advance through steps on ``stepTimings=[1,5,2,8,30,30,40,30]`` was removed — every PM hit it and complained the UI "stalled at Step 5". Replaced with real polling from ``/api/triage/progress``. On a clean run the post-triage status card stays hidden — only shows when at least one step failed or used a fallback path, with an inline Problem + Fix block. | ``frontend/index.html`` ``_STEP_META`` (9 rows), ``showTriageLoading()`` (real-time poll), ``renderPipelineStatusCard()`` (hidden on clean run), ``_explainStepFailure()`` (problem + fix per error type) |

---

## V0.15 changes at a glance

| Area | What changed | Where |
|------|--------------|-------|
| **Pre-flight pipeline** | 7-layer startup check (PowerShell, settings.yaml, copilot/claude CLI, version, `--allow-all`, `gh auth`, IcM MCP OAuth token) gates server boot. Aborts with the exact fix command on any critical failure. Root-caused the "every fresh-clone triage = pending" failure mode (hardcoded `pwsh.exe` on stock Windows). Since v0.17 the IcM check probes `oauth_token_resolver.resolve_icm_access_token()` (reuses the copilot/claude CLI's IcM MCP OAuth session) — no `az login` required. | `scripts/preflight_check.py`; `start.bat` / `start.ps1` / `start.sh`; `GET /api/preflight` endpoint; frontend top banner |
| **`Classification.infra_error` plumbing** | When the LLM subprocess plumbing breaks, `cli_llm._run_cli_sync` returns a `{"_cli_infra_error": "<cause>"}` sentinel. The classifier short-circuits to a `Classification(..., infra_error=<cause>)` without retry, `response_generator` skips Stage B, and the server exposes the field. The frontend swaps "Pending" for a red **"❌ Classification skipped — &lt;cause&gt;"** card with the fix command, never the misleading "Pending: Need Info" verdict. | `src/shared/cli_llm.py` (`_detect_powershell`, sentinel); `src/shared/models.py` `Classification.infra_error`; `src/agents/classifier.py` `[CLI-INFRA-ERROR]` marker; `src/agents/response_generator.py` `_is_failed_classification`; `frontend/index.html` setup banner + infra-error card |
| **Step 0a — Ownership cross-check (FIRST)** | Classifier prompt's Decision Precedence promoted re-assign from old Step 4 to **Step 0a — runs BEFORE risk / reject / accept-existing**. Source of truth is `docs/pm-feature-area-mapping.md` (injected as `[ref: pm-ownership-map]`, always-on, mtime-cached so edits take effect on next triage without restart). Covers both intra-team (e.g., audit DCR routed to recording PM, should be Tristan not Melissa) and cross-team (→ Purview / ODSP / Loop / Forms) mismatches. | `config/context/judgment_catalog.yaml` `global_rules.decision_precedence`; `docs/pm-feature-area-mapping.md` |
| **OwnershipRouter bug reversed** | Old code: `external_hits ≥ 2 → needs_reroute=False` (silently suppressed legitimate cross-team reassigns when DCR mentioned compliance/Purview/DLP). New: same hit count → recommends the specific external team (Purview, ODSP, Loop, Forms, Exchange, Entra ID) by signal-to-team map. | `src/agents/ownership_router.py:60-95` |
| **Customer / Tenant chips in UI** | `/api/customer-info/{id}` now exposes `customer_name` + `tenant_id` (from `data/icm_cache/powerbi_import.json` via `PRESERVED_FIELDS`). Frontend renders 🏢 Customer + 🆔 Tenant ID chips in info bar **and** left-side DCR list. | `scripts/server.py` `PRESERVED_FIELDS`, `_patch_local_fields()`, `/api/customer-info`; `frontend/index.html` `_renderCustomerBar` + left-list row |
| **Customer-info perf + timeout** | Cache hit fast path: 40 ms (was 12 s+ when stale cache fell through to MCP). Each MCP call wrapped in `asyncio.wait_for(timeout=3.0)`. Frontend AbortController 8 s safety net replaces eternal "Loading customer info…" spinner with `⚠ customer info unavailable (timeout)`. | `scripts/server.py`; `frontend/index.html` `loadCustomerInfo` |
| **PM-feedback distillation workflow** | Defined the canonical flow: PM correction is auto-captured raw in `data/learning/learned_lessons.jsonl` (audit trail only) → human distills the architectural insight into ONE sentence → adds to `judgment_catalog.yaml` `when_to_use` of the right judgment, citing `(PM &lt;name&gt; &lt;date&gt;, IcM #&lt;id&gt;)`. Do NOT also enrich the lesson or write to `layer3_special.yaml` (keep that for stable platform invariants only). | `config/context/judgment_catalog.yaml` (3 new bullets today: storage-bundle, admin-disclosure, wrong-PM mapping) |
| **Code hygiene** | Deleted `src/shared/mock_llm.py` (155 lines, 0 references). Cleaned 17 unused imports + 2 redefinitions across 14 files. Fixed `preflight_check.check_settings_yaml` missing `def` header (was 100% NameError on boot) + `server.py:985` undefined `logger` (should be `logging.warning`). | (see commit) |

---

## V0.14 changes at a glance

| Area | What changed | Where |
|------|--------------|-------|
| **L2 dynamic MCP discovery** | Startup queries `<cli> mcp list` and injects real server names (`WebSearch`, `icm`, `WorkIQ-*` on Copilot CLI; `mcp__bing_search` etc. on Claude Code) into prompts. Hardcoded Claude-style tool names removed from `mcp_guidance.yaml`. | `src/tools/mcp_discovery.py` (NEW); `src/context/assembler.py` `assemble_mcp_guidance()` |
| **L3 backend MCP retrieval** | Before Step 2 classify runs, the backend spawns a CLI subprocess to invoke the discovered web-search server with the DCR title — results are injected as `mcp:ms-learn` / `mcp:bing` pre-retrieved evidence. LLM no longer decides whether to call MCP. | `src/tools/knowledge_retriever.py` `_search_via_cli_mcp()` |
| **L3 + L4 schema safety net** | `Classification` model has `@field_validator(mode="before")` on every LLM-emitted field — wild values (`confidence="0.55"`, capitalized enums, sentence-as-enum) get normalized instead of crashing the triage. `_safe_build_classification` adds a belt-and-braces layer that progressively strips suspect fields rather than fall back to pending: need-info. | `src/shared/models.py`; `src/agents/classifier.py`; `tests/unit/test_classification_robustness.py` (117-case fuzz) |
| **Empty-judgment retry** | When first-pass classify returns no `initial_judgment`, automatic retry with a focused ~5KB prompt that preserves the critical hard rules (reject_check Q1 patterns + MCP-FIRST). | `src/agents/classifier.py` `classify()` |
| **Prompt slim + cross-source dedup** | system prompt 47 KB → 21 KB (catalog concise mode); classify.user 33 KB → 6 KB; mcp_guidance 5.4 KB → 1.6 KB. Each hard rule now appears in exactly one place. | `src/feedback/catalog_loader.py` `concise=True`; `config/prompts.yaml`; `config/context/mcp_guidance.yaml` |
| **Suggested Response highlighting REQUIRED** | Bold the verdict short phrase + action sentence + 1-2 key product nouns per major paragraph. Self-Eval rule 6c flags drafts with 0 bold or misused bold as DOWNGRADE. | `config/prompts.yaml` `suggest_response.system` Part 1 |
| **Tier-A `summary` REQUIRED** | Every `mcp:*` / `icm-history` EvidenceRef MUST carry a non-empty `summary` so PMs can scan References at a glance. Self-Eval rule 6b enforces this. | `config/prompts.yaml` schema for classify/dedup/escalate; `self_eval.system` |
| **Customer-draft scrubber unjammed** | `_scrub_internal_leakage` was deleting legitimate phrases ("low confidence", "the agent", "will route this", "fallback to", "is not a valid") — when too much got stripped the draft fell to the 455-char generic template with no PM Reference, even though Stage A worked fine. Pattern list shrunk from 20 → 8 (kept only true internal-state phrases like "classification crashed" + a Pydantic stack-trace regex). When scrubbing leaves <40 chars, return the original draft unchanged instead of substituting the static template. Guard test pins the 12 false-positive patterns so they can't be added back without proof. | `src/agents/response_generator.py` `_INTERNAL_LEAK_PATTERNS` + `_scrub_internal_leakage`; `tests/unit/test_response_scrub.py` (14 new tests, including 5 false-positive regressions and a guard against re-adding the deleted patterns) |
| **Azure OpenAI strict-JSON retry** | ~~When CLI Copilot's first-pass classify either fails to parse (trailing comma / unfenced markdown / missing comma like the 5/20 `787719188` failure) or returns empty judgment, the retry call goes to Azure OpenAI (gpt-5.5) with `response_format=json_schema strict:true`.~~ **❌ REMOVED in V0.16** — pure-CLI now, retry uses the SAME CLI engine with a focused ~5KB prompt; re-sampling on an independent subprocess usually fixes the rare bad-JSON cases. JSON repair (next row) absorbs most of them before retry is needed. | (removed) |
| **JSON repair helper** | Before falling through to retry, light repair tries to fix common LLM JSON sins (strip ` ```json ``` ` fences, remove trailing commas, normalize smart quotes). Cuts retry rate by ~70% on observed failure modes. | `src/agents/classifier.py` `_try_loads_repaired()`; `tests/unit/test_classification_robustness.py` 8 new repair tests |
| **Coercion observability** | Module-level `COERCION_EVENTS` counter records every LLM-output coercion; observability writes `coercion_counts` per LLM call so the dashboard can spot emerging schema drift before it becomes a user-visible bug. | `src/shared/models.py`; `src/feedback/observability.py` |

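A minimal sketch of the three light repairs named in the JSON repair helper row (fence strip, trailing-comma removal, smart-quote normalization). It is an illustration, not the actual `_try_loads_repaired()`; the regexes are assumptions and are deliberately naive: they can also rewrite a `,}` sequence or a curly quote that sits inside a legitimate string value.

```python
import json
import re

_TICKS = "`" * 3  # a Markdown code fence marker
_FENCE = re.compile(rf"^\s*{_TICKS}(?:json)?\s*|\s*{_TICKS}\s*$", re.IGNORECASE)
_TRAILING_COMMA = re.compile(r",\s*([}\]])")
_SMART_QUOTES = str.maketrans(
    {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
)


def try_loads_repaired(raw: str):
    """Parse LLM output as JSON, applying light repairs only when needed."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    text = _FENCE.sub("", raw)               # strip leading/trailing fences
    text = text.translate(_SMART_QUOTES)     # curly quotes -> straight quotes
    text = _TRAILING_COMMA.sub(r"\1", text)  # drop a comma before } or ]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None  # the caller would fall through to the retry path
```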
> Last updated: 2026-05-19  (V0.13 historical line, kept for context below)

---

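## One-sentence summary

Input an IcM Incident ID → fetch the incident data in real time → **progressive context assembly** → 7-step LLM analysis (**with a traceable evidence chain** + **Self-Eval self-supervision**) → output classification / dedup / suggested response / escalation recommendation, auto-mapped to the IcM DCR Decision and How Fixed fields. Every conclusion hangs off `reasoning_steps[].evidence[]`, so a PM can open it and verify.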

---

## System architecture diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        Frontend (browser)                              │
│                     http://127.0.0.1:8080                              │
│                      frontend/index.html                               │
│              Single-page HTML + vanilla JS, dark theme                 │
│                                                                         │
│  Display order:                                                         │
│    Judgment Banner (Self-Eval ✅/⚠️/❌ + Trust bar)                     │
│    → Suggested Response (customer email only)                           │
│    → References Card (Why / Evidence / Implication sections)            │
│      ├─ Doc (Tier-A green verbatim blockquote + Agent summary)          │
│      ├─ Product Behavior (Tier-B blue inference + "no clickable doc")   │
│      └─ Past DCR (Tier-C gray, IcM link)                                │
│    → Classification (PM Feedback button + reasoning_steps chain)        │
│    → Escalation (each of the 5 factors carries evidence)                │
│    → Safety → Dedup → Routing                                           │
│  Customer info bar: S500 / CritSit / SR count / type / submitter        │
│    (live IcM API + disk cache)                                          │
│  Analytics: 📊 Dashboard footer, LLM call analytics (evidence_quality)  │
│  Persistence: localStorage + server-side disk cache                     │
│  Shown first: CLI result (CLI-only project)                             │
└───────────────────────────┬─────────────────────────────────────────────┘
                            │ POST /api/triage/cli (CLI Mode, the project's only path)
                            │ GET  /api/customer-info/{id} (customer info)
                            │ GET  /api/analytics  (LLM observability + evidence_quality)
                            │ POST /api/feedback   (PM feedback)
                            ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                   FastAPI Server (scripts/server.py)                    │
│                        uvicorn, port 8080                              │
│                                                                         │
│  ┌──────────────┐  ┌───────────────┐  ┌───────────────────────────────┐│
│  │ IcmIntake     │  │ TriageEngine  │  │  Harness Engineering          ││
│  │ (normalize)   │─▶│ (7-step)      │  │  ├ ContextAssembler           ││
│  └──────┬───────┘  │               │  │  │   + get_injection_inventory ││
│         │          │  Step 0~5     │  │  │   + assemble_mcp_guidance   ││
│         │          │  + Step 2b    │  │  ├ ObservabilityLogger        ││
│         │          │  (Self-Eval)  │  │  │   (incl. evidence_quality) ││
│         │          │  + 2nd-pass   │  │  ├ SelfReviewer (tier-aware,  ││
│         │          │  dedup enrich │  │  │   MCP-bypass detect)        ││
│         │          └──────┬────────┘  │  └ FeedbackStore              ││
│         │                 │            └───────────────────────────────┘│
│         │          ┌──────┴────────┐                                   │
│         │          │ Triage Cache   │                                   │
│         │          │ + Customer Cache│                                  │
│         │          └───────────────┘                                   │
└─────────┼───────────────────────────────────────────────────────────────┘
          │
          ▼
┌──────────────────┐  ┌──────────────────────────────────────────────────┐
│   IcM MCP Server │  │         LLM engine (CLI-only)                    │
│   (real-time)    │  │                                                  │
│                  │  │  Core reasoning: Copilot CLI / Claude Code       │
│  PROD endpoint:  │  │    → CliLLMClient (subprocess per call)          │
│  icm-mcp-prod.   │  │    → Tier-A evidence via in-LLM MCP tool calls   │
│  azure-api.net   │  │      (mcp:ms-learn / mcp:bing / mcp:ado-wiki     │
│  /v1/            │  │       / mcp:bluebird / mcp:icm-incident / ...)   │
│                  │  │                                                  │
│  Auth: Azure AD  │  │  V0.16+: every step runs through CLI Copilot     │
│  JSON-RPC 2.0    │  │    (Claude Opus 4.7)                             │
│  over SSE        │  │    no direct-API dependency on Azure OpenAI /    │
│                  │  │    GitHub Models                                 │
│                  │  │    PMs use their own copilot/claude CLI;         │
│                  │  │    no subscription needed                        │
└──────────────────┘  └──────────────────────────────────────────────────┘
```
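The "subprocess per call" note in the LLM engine box implies building a CLI command line for each call. The sketch below assumes only what the V0.16 table states: the `copilot --allow-all --model claude-opus-4.7` and `claude --print` invocations and the `DCR_COPILOT_MODEL` override. `build_cli_argv` is a hypothetical helper, and how the prompt reaches the subprocess is intentionally left out because this document does not specify it.

```python
import os

DEFAULT_MODEL = "claude-opus-4.7"  # locked default per the V0.16 notes


def build_cli_argv(engine: str = "copilot") -> list[str]:
    """Return the base argv for one LLM subprocess call (hypothetical helper)."""
    if engine == "copilot":
        # The env override named in the V0.16 table takes precedence.
        model = os.environ.get("DCR_COPILOT_MODEL", DEFAULT_MODEL)
        return ["copilot", "--allow-all", "--model", model]
    if engine == "claude":
        return ["claude", "--print"]
    raise ValueError(f"unknown CLI engine: {engine!r}")


if __name__ == "__main__":
    print(build_cli_argv("copilot"))
    print(build_cli_argv("claude"))
```

Keeping the argv as a list (rather than a shell string) avoids quoting problems when it is later handed to `subprocess.run`.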

---

## 7-step Triage Pipeline

```
IcM Incident
     │
     ▼
┌─ Step 0: Safety Boundary ───────────────────────── rule engine (no LLM) ─┐
│  Checks Sev0/1, S500/ACE + broad impact, compliance/legal/privacy,       │
│  Declared Outage                                                         │
│  Triggered → forced PM Review                                            │
└──────────────────────────────────────────────────────────────────────────┘
     │
     ▼
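┌─ Step 0b: Qualification Check ────────────────── rules + LLM (optional) ─┐
│  Checks whether the description is sufficient, whether customer          │
│  symptoms are missing, and whether the S500 severity is reasonable       │
│  Insufficient → generate clarifying questions                            │
└──────────────────────────────────────────────────────────────────────────┘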
     │
     ▼
┌─ Step 0c (legacy name) / Step 0a (v0.15) — Ownership Cross-Check ────────┐
│  ★ v0.15: promoted to Decision Precedence Step 0a, ahead of risk /       │
│    reject / accept                                                       │
│  Source of truth: the AGENT-CONTEXT block of                             │
│    `docs/pm-feature-area-mapping.md`                                     │
│    (always injected into the system prompt's `[ref: pm-ownership-map]`   │
│    section; mtime-cached, so saved edits apply on the next triage        │
│    without a restart)                                                    │
│  Match the DCR's actual topic → find the mapping-table row → compare     │
│    the assignee → on mismatch, `re-assign` immediately; covers           │
│    intra-team (e.g., audit DCR → Tristan) and cross-team                 │
│    (→ Purview / ODSP / Loop / Forms / Exchange / Entra)                  │
│  OwnershipRouter (`src/agents/ownership_router.py`) runs in the          │
│    backend; its `external signal → team` map proactively recommends      │
│    the external team                                                     │
└──────────────────────────────────────────────────────────────────────────┘
     │
     ▼
┌─ Step 1: Doc-Answerable Pre-Check + L3 Backend MCP Retrieval ────────────┐
│  Layer 1: "Is there an existing document that answers this?"             │
│    (Tristan interview)                                                   │
│  Searches local docs (`doc/` and `docs/` Markdown as fallback)           │
│                                                                          │
│  ★ v0.14: synchronously triggers L3 backend MCP retrieval                │
│  → KnowledgeRetriever._search_via_cli_mcp() calls the web-search         │
│    server discovered at startup                                          │
│    (Copilot CLI: `WebSearch`; Claude Code: `mcp__bing_search`)           │
│  → query = "<DCR title> Microsoft Teams site:learn.microsoft.com"        │
│  → results are injected as `mcp:ms-learn` / `mcp:bing` into the          │
│    classify prompt's "Pre-Retrieved External Evidence" section           │
│  → every DCR is guaranteed at least 1 piece of external evidence,        │
│    no longer relying on the LLM to call MCP on its own                   │
└──────────────────────────────────────────────────────────────────────────┘
     │
     ▼
┌─ Step 2: Stage A — Classification ───────────────────────── LLM Call #1 ─┐
│  ★ Progressive context injection (ContextAssembler):                     │
│    Layer 0 (always): role identity + judgment-priority rules             │
│    Layer 1 (always): Change Type enum                                    │
│    Layer 1.5 (SSOT): judgment_catalog.yaml: 9 judgments + decision_tree  │
│      + risk_tier + reject_evidence_bar + rate_caps + seed_examples       │
│      (each example carries a [ref: ...] tag)                             │
│    Layer 2 (on demand): matched PM profiles (1-2) + routing signals      │
│    Layer 3 (conditional): safety/compliance context                      │
│      (only when safety is triggered)                                     │
│    ★ MCP Guidance (always appended): MCP-FIRST HARD CONSTRAINT           │
│       budget classify ≤12, MIN 1; skip MCP for Tier-B → Self-Eval FAIL   │
│    ★ v0.14: the L2 dynamically discovered tool list (parsed from         │
│       `copilot mcp list` / `claude mcp list`) is **auto-prepended** to   │
│       the MCP guidance, so the tool names the LLM sees are ones it       │
│       **can actually call**, with no reliance on hardcoding              │
│    ★ v0.14: the Pre-Retrieved External Evidence section injects docs     │
│       retrieved by the L3 backend into the prompt; the LLM cites them    │
│       directly without calling MCP itself (extra calls are allowed)      │
│                                                                          │
│  ★ Since v0.13, Stage A outputs only the classification and no longer    │
│      writes the customer email (moved to Stage B)                        │
│  ★ Since v0.14, an empty Stage A judgment triggers one automatic         │
│      retry (compact ~5KB prompt that keeps the key reject_check Q1 +     │
│      MCP-FIRST rules) to rescue empty-judgment failures                  │
│                                                                          │
│  Judgment priority: PRIMARY = independent analysis of the DCR content    │
│    > SECONDARY = PM historical patterns (tone reference only)            │
│                                                                          │
│  Step 0 pre-check: "Is this a DCR?"                                      │
│  → Bug/regression → non-dcr (route to bug triage)                        │
│  → Customer misunderstanding → non-dcr (explain expected behavior)       │
│  → Is a DCR → continue to classification                                 │
│                                                                          │
│  Output (Stage A JSON, ~2K tokens):                                      │
│  • change_type: 8 values (feature-request, interaction-change,           │
│      accessibility..)                                                    │
│  • initial_judgment: 9 values, mapped to DCR Decision                    │
│      (Accepted/Rejected/Re-assign/Pending):                              │
│      Accepted:  accepted: accept-backlog, accepted: accept-existing      │
│      Rejected:  rejected: reject-by-design, rejected: reject-technical,  │
│                 rejected: non-dcr, rejected: close-no-response           │
│      Re-assign: re-assign: re-assign                                     │
│      Pending:   pending: need-info, pending: discuss (≤5%)               │
│  • dcr_decision / how_fixed: auto-mapped to IcM system fields            │
│  • impact_scope / urgency / confidence / reasoning                       │
│  • ★ reasoning_steps[]: argument chain; each step has claim +            │
│       inference_type + evidence[]: list[EvidenceRef]                     │
│       EvidenceRef comes in three kinds:                                  │
│         Tier-A: icm-history + mcp:* (with verbatim quote + URL)          │
│         Tier-B: product-behavior / design-pattern (agent inference,      │
│                 optional summary)                                        │
│         Internal: catalog-rule / lesson / pm-pattern etc.                │
│                   (not visible to PMs)                                   │
│                                                                          │
│  ObservabilityLogger: records tokens/latency/judgment/confidence/layers  │
│    + evidence_quality (external/inferential/mcp counts)                  │
└──────────────────────────────────────────────────────────────────────────┘
     │
     ▼
┌─ Step 2b: Self-Eval (Actor-Critic QA review) ──── CLI LLM (strict mode) ─┐
│  An independent Critic LLM validates the quality of the Actor's triage   │
│  output; it calibrates only and produces no new judgment:                │
│                                                                          │
│  1) Evidence back-check:                                                 │
│     • Internal class-A ref → checked against                             │
│       ContextAssembler.get_injection_inventory();                        │
│       not in the inventory → unknown_refs                                │
│     • MCP class-B ref → verify mcp_tool is on the config/mcp_tools.yaml  │
│       allowlist + URL format + quote ↔ claim semantic alignment          │
│     • 7 anti-paraphrase heuristics (v0.12): detect whether a quote is    │
│       truly verbatim (list-structure / third-person self-reference /     │
│       section-label / verb-led abstraction / Agent first-person /        │
│       back-derived / vague short sentence) → hallucinated_quotes         │
│     • MCP-bypass detection (v0.12): 0 mcp + 0 attempt_log +              │
│       Tier-B-only → lazy bypass → DOWNGRADE/FAIL                         │
│     • ★ v0.14 6b Tier-A summary check: mcp:* / icm-history without a     │
│       summary → weak_mcp_refs + DOWNGRADE (≥3 missing → FAIL)            │
│     • ★ v0.14 6c response_draft bold check: 0 bold / misused bold /      │
│       whole paragraph bolded → DOWNGRADE                                 │
│                                                                          │
│  2) TrustReport output:                                                  │
│     evidence_strength / external_evidence_count / inferential_count /    │
│     assumption_ratio / unknown_refs / weak_mcp / hallucinated_quotes     │
│                                                                          │
│  3) Enforced verdict (SelfReviewer._derive_verdict, tier-aware):         │
│     • FAIL: hallucinated_quotes / assumption>0.6 / ≥3 issues /           │
│             external==0 AND inferential==0 (v0.12)                       │
│     • DOWNGRADE: 1-2 issues / assumption 0.4-0.6 / Tier-B-only           │
│     • PASS: otherwise                                                    │
│                                                                          │
│  Frontend: Trust bar + Summary ✅/⚠️/❌ status badge                     │
│            the Classification card expands to show issues                │
└──────────────────────────────────────────────────────────────────────────┘
     │
     ▼
┌─ Step 3: Dedup + 2nd-pass evidence fold ────── LLM (parallel to Step 2) ─┐
│  Layer 2: compare against historical DCRs to find duplicates/similar     │
│  Data sources: IcM get_similar_incidents API + local icm_cache           │
│  False-link guard: match strictly on the core ask; shared keywords       │
│    ≠ same request                                                        │
│  Output: duplicate / similar / new + matches[].evidence[]                │
│                                                                          │
│  ★ TriageEngine._enrich_classification_with_dedup (v0.11+):              │
│    after classify + dedup finish in parallel, uncited matches with       │
│    similarity≥0.6 are folded into a trailing pattern step                │
│    (icm-history evidence with PM-resolution quote); idempotent           │
└──────────────────────────────────────────────────────────────────────────┘
     │
     ▼
┌─ Step 4: Stage B — Response Agent ───────────────────────── LLM Call #2 ─┐
│  ★ Since v0.13 a separate LLM call (parallel with Step 5). Input is      │
│  Stage A's full classification + reasoning_steps + evidence + dedup;     │
│  output is the customer email only:                                      │
│                                                                          │
│  Stage B's system prompt enforces:                                       │
│  • ★ STAGE A AUTHORITY ★: no re-deciding; the judgment must follow       │
│      Stage A                                                             │
│  • ★ INLINE LINK DISCIPLINE ★: public URLs may only be reused from       │
│      Stage A reasoning_steps evidence (Self-Eval catches any             │
│      mismatch → HARD FAIL)                                               │
│  • a customer-facing template for each of the 9 judgments                │
│      (Reject by-design / Accept-Backlog etc.)                            │
│  • no internal jargon in customer-facing text (DCR / S500 / ADO /        │
│      IcM ID / real names of internal PMs)                                │
│                                                                          │
│  Stage B output → post-processed before response_generator.generate()    │
│  returns:                                                                │
│  • _scrub_internal_leakage: second guard against internal-jargon leaks   │
│  • enrich_with_dedup(draft, dedup, classification, dcr.metadata):        │
│    splits customer_text from the PM-Reference appendix                   │
│    → SuggestedResponse:                                                  │
│      draft (customer email only, no PM-Reference)                        │
│      external_references: Tier-A (Doc + Past DCR buckets)                │
│      inferential_references: Tier-B (Product Behavior bucket)            │
│      no_external_note: adaptive wording (blue ℹ with Tier-B /            │
│        yellow ⚠ without Tier-B)                                          │
│      decision_logic[]: Why section (from reasoning_steps[].claim)        │
│      implications[]: Implication section (routing advice + existing      │
│        docs + missing info)                                              │
└──────────────────────────────────────────────────────────────────────────┘
     │
     ▼
┌─ Step 5: Escalation Recommendation ──────────── LLM (并行 Step 4) ──────┐
│  Layer 3: 评估是否需要 PM 判断                                             │
│  输出: auto-close / pm-review / escalate / safety-boundary               │
│  5 维因素打分: impact_scope, cross_team, design_conflict,                 │
│              conflicting_opinions, urgency                                │
│  ★ 每个 factor 带独立 *_evidence: list[EvidenceRef] (v0.11+)              │
│    score ≥ 3 必须有 evidence; score = 5 必须有 catalog-rule + 数据点      │
│                                                                            │
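│  Extremely conservative policy (based on official IcM escalation           │
│  criteria):                                                                │
│  - ~95% are auto-close + pm-review                                         │
│  - escalate ≤5%, requires: multiple PMs must coordinate +                  │
│    unprecedented design conflict + a single PM cannot decide alone         │
│  - safety-boundary only for: Sev0/1 + broad impact / compliance /          │
│    Declared Outage                                                         │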
│                                                                            │
│  MCP guidance: escalate budget ≤3                                          │
└──────────────────────────────────────────────────────────────────────────┘
     │
     ▼
  TriageResult (JSON) → frontend rendering / CLI output
```
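
The Step 5 evidence rule (score ≥ 3 needs evidence; score = 5 needs a catalog-rule reference plus a data point) can be sketched as a small validator. This is only an illustration under stated assumptions, not the project's implementation: `FactorScore` and `factor_evidence_problems` are hypothetical names, and reading "data point" as "any evidence item other than the catalog-rule" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FactorScore:
    """Hypothetical stand-in for one escalation factor plus its *_evidence list."""
    name: str
    score: int
    # source_type of each attached EvidenceRef, e.g. "catalog-rule", "mcp:icm-incident"
    evidence_types: list[str] = field(default_factory=list)


def factor_evidence_problems(factor: FactorScore) -> list[str]:
    """Return rule violations for one factor; an empty list means the factor passes."""
    problems: list[str] = []
    types = factor.evidence_types
    if factor.score >= 3 and not types:
        problems.append(f"{factor.name}: score >= 3 requires evidence")
    if factor.score == 5:
        if "catalog-rule" not in types:
            problems.append(f"{factor.name}: score 5 requires a catalog-rule reference")
        # Assumption: a "data point" is any evidence item that is not the catalog-rule itself.
        if not any(t != "catalog-rule" for t in types):
            problems.append(f"{factor.name}: score 5 requires a data point")
    return problems
```

A factor that scores 5 with both a `catalog-rule` and an `mcp:icm-incident` reference passes; the same score with only one of the two is reported.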

---

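## Reasoning Transparency Architecture (v0.11 → v0.12)

> Core design principle: **a PM should be able to open and verify every agent conclusion within 10 seconds.**
> "reasoning as prose" → "reasoning as an argument chain with evidence pointers" → "every citation is something the PM can find with Ctrl-F".

### Three-Tier Evidence Framework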

```
┌─────────────────────────────────────────────────────────────────────┐
│  Tier A: EXTERNAL (PM can open the source link; strongest evidence) │
│  ├─ icm-history (portal.microsofticm.com)                           │
│  └─ mcp:ms-learn / mcp:bing / mcp:ado-wiki / mcp:bluebird /         │
│     mcp:icm-incident / mcp:ado-workitem / mcp:ado-comments /        │
│     mcp:sharepoint / mcp:workiq-mail / mcp:workiq-teams /           │
│     mcp:workiq-calendar                                             │
│  ★ quote must be character-for-character verbatim from source       │
│  ★ self-eval: 7 anti-paraphrase heuristic guards (any hit → FAIL)   │
│                                                                     │
│  Tier B: INFERENTIAL (agent reasoning; medium; new in v0.12)        │
│  ├─ product-behavior  (e.g. teams.meeting-notes.permission-model)   │
│  └─ design-pattern    (e.g. m365.file-layer-vs-meeting-layer-perm)  │
│  ★ No URL required; quote holds a precise product-architecture      │
│    statement (not a verbatim doc excerpt)                           │
│  ★ Supplements Tier-A, never replaces it; Tier-B-only → DOWNGRADE   │
│                                                                     │
│  INTERNAL anchors (agent reasoning scaffolding; not shown to PMs)   │
│  └─ catalog-rule / catalog-example / pm-pattern / lesson /          │
│     layer3 / safety-rule / dcr-field / doc-retrieved /              │
│     qualification / routing-signal / routing-check /                │
│     agree-stats / recent-judgments                                  │
└─────────────────────────────────────────────────────────────────────┘
```

### EvidenceRef data structure

```python
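from pydantic import BaseModel


class EvidenceRef(BaseModel):
    source_type: EvidenceSourceType  # one of the 25 source types above (type defined elsewhere in the codebase)
    source_id: str                    # e.g. "mcp:ms-learn:loop-storage"
    quote: str = ""                   # literal verbatim text from the source (required for Tier-A)
    summary: str = ""                 # optional summary in the agent's own words (new in v0.12)
    url: str = ""                     # URL the PM can open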
    mcp_tool: str = ""                # MCP-only: canonical tool name
    retrieved_at: str = ""            # MCP-only: ISO 8601 timestamp
```

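**Quote vs Summary (key separation in v0.12):**
- `quote` = the source text, word for word; the PM must be able to find it with Ctrl-F at the URL
- `summary` = a summary in the agent's own words, **optional**, shown in the UI in a separate purple "Agent summary" box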
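
The quote/summary split can be illustrated with a minimal check on one Tier-A reference. This is a sketch, not the project's Self-Eval logic: `check_tier_a_ref` is a hypothetical helper, and it assumes the text behind the URL has already been fetched into `source_text`.

```python
def check_tier_a_ref(quote: str, summary: str, source_text: str) -> list[str]:
    """Return problems for one Tier-A reference; an empty list means it passes."""
    problems: list[str] = []
    if not quote:
        problems.append("quote is required for Tier-A evidence")
    elif quote not in source_text:
        # No normalization on purpose: the rule is character-for-character.
        problems.append("quote is not a verbatim substring of the source")
    # summary is optional and in the agent's own words, so it is never matched
    # against the source text.
    return problems
```

A paraphrased quote fails even when its meaning is right, which is exactly the behavior the Ctrl-F rule asks for; the summary field is where paraphrase belongs.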

### MCP-FIRST HARD CONSTRAINT

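Before writing the final JSON, the LLM **must** call MCP at least once; even when the search finds nothing, it must leave an attempt log in `trust.unknown_refs`.
Skipping MCP and going straight to Tier-B makes Self-Eval return DOWNGRADE/FAIL.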

```
MCP budget: classify ≤12 / escalate ≤3 / dedup ≤3 (MIN 1, MAX is a soft cap)
Backend max_items: external ≤30 / inferential ≤15 (hard safety cap)
```
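
The two kinds of limits behave differently: the MCP budget has a hard minimum of one call and a soft maximum, while the backend `max_items` cap is hard. A minimal sketch of that difference follows; the function names and the status strings are hypothetical, and only the numbers come from the text above.

```python
MCP_BUDGET = {"classify": 12, "escalate": 3, "dedup": 3}  # soft upper bounds per step
MAX_ITEMS = {"external": 30, "inferential": 15}           # hard backend caps


def mcp_call_status(step: str, calls: int) -> str:
    """Classify the number of MCP calls made during one step."""
    if calls < 1:
        return "bypass"          # violates the MIN 1 rule (MCP-FIRST)
    if calls > MCP_BUDGET[step]:
        return "over-soft-cap"   # allowed, but worth flagging
    return "ok"


def cap_items(kind: str, items: list) -> list:
    """Apply the hard safety cap by truncating the evidence list."""
    return items[: MAX_ITEMS[kind]]
```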

### SelfReviewer tier-aware verdict matrix

| Trigger | Verdict |
|---|---|
| hallucinated_quotes non-empty (any of the 7 anti-paraphrase heuristics hit) | **FAIL** |
| assumption_ratio > 0.6 | **FAIL** |
| (unknown_refs + malformed_mcp + weak_mcp) ≥ 3 | **FAIL** |
| external_count == 0 AND inferential_count == 0 (no evidence at all) | **FAIL** |
| MCP-bypass (no mcp + no attempt log + tier-B only) | **DOWNGRADE/FAIL** |
| Tier-B only (no external, but MCP attempted) | **DOWNGRADE** |
| 1-2 issues / assumption 0.4-0.6 | **DOWNGRADE** |
| Otherwise | **PASS** |
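
Read top to bottom, the matrix is a first-match rule list. The sketch below encodes it that way. It is not the actual `_derive_verdict`: the parameter names are assumptions, `issue_count` stands for `unknown_refs + malformed_mcp + weak_mcp`, and because the table leaves MCP-bypass as DOWNGRADE/FAIL, this sketch arbitrarily picks FAIL for that row.

```python
def derive_verdict(
    *,
    hallucinated_quotes: list[str],
    assumption_ratio: float,
    issue_count: int,
    external_count: int,
    inferential_count: int,
    mcp_attempted: bool,
) -> str:
    """First matching row of the verdict matrix wins."""
    if hallucinated_quotes:
        return "FAIL"
    if assumption_ratio > 0.6:
        return "FAIL"
    if issue_count >= 3:
        return "FAIL"
    if external_count == 0 and inferential_count == 0:
        return "FAIL"
    if external_count == 0:
        # Tier-B only. Without an MCP attempt log this is an MCP-bypass;
        # choosing FAIL over DOWNGRADE here is an assumption of this sketch.
        return "DOWNGRADE" if mcp_attempted else "FAIL"
    if 1 <= issue_count <= 2 or 0.4 <= assumption_ratio <= 0.6:
        return "DOWNGRADE"
    return "PASS"
```

Keyword-only arguments keep call sites readable, since several inputs are plain integers that are easy to swap by accident.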

### Frontend rendering (frontend/index.html)

The References Card has three sections:
1. **💡 Why (Decision Logic)**: taken from `reasoning_steps[].claim`
2. **📚 Evidence**: grouped into PM-friendly categories:
   - `📄 Doc` (green verbatim blockquote + optional purple Agent summary box)
   - `🧩 Product Behavior` (blue reasoning + "agent's grounded reasoning" caveat)
   - `📋 Past DCR` (gray IcM link)
3. **⚙️ Implication**: routing advice / pointers to existing docs / questions about missing information

**Adaptive wording when Doc=0:**
- With Tier-B content: blue ℹ "No official doc retrieved. Reasoning is grounded in the Product Behavior evidence below."
- Without Tier-B content: yellow ⚠ "No explicit documentation found. PM should verify against own knowledge."

---

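## LLM call details

### CLI engine (the project's only inference path)

**CLI Mode**: Copilot CLI / Claude Code (deep reasoning, 20-120s per step)

**As of V0.16: pure-CLI architecture.** Every step goes through ``copilot --allow-all --model claude-opus-4.7`` (or ``claude --print``).
- **Core reasoning** (classify/dedup/escalate/self_eval) → CLI LLM
  - Inside the subprocess, the LLM accesses its own MCP toolset (Microsoft Learn / ADO / Bluebird / IcM / WorkIQ / Bing, etc.)
  - It writes mcp:* references directly into the EvidenceRef entries of reasoning_steps[].evidence[]
- **Pre-checks** (qualification / doc-answerable / lesson extraction) → the same CLI LLM (no longer Azure OpenAI)
- **Rules** (safety/routing) → no LLM
- **Classifier retry** (JSON parse failure / empty judgment) → the same CLI LLM, with a focused ~5KB prompt, resampled in a separate subprocess

As of V0.16, `src/shared/llm_config.py` / `copilot_llm.py` / `github_llm.py` have been deleted from the codebase and `requirements.txt` no longer depends on `openai`; v0.17.2 (2026-05-26) additionally removed `azure-identity` (IcM goes through `oauth_token_resolver`, which reuses the copilot/claude CLI's OAuth token, so the AAD SDK is no longer needed). The Azure OpenAI integration section below is kept only as a historical record of V0.13–V0.15; the project has zero Azure OpenAI / Azure Identity dependencies.

Note: since v0.12 the project no longer has an "LLM Mode". The earlier `POST /api/triage` endpoint exists only for compatibility; all paths
converge on `POST /api/triage/cli`.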

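### ~~Azure OpenAI integration~~ (historical; removed in V0.16, fully purged in v0.17.2)

The following is kept as a historical record of V0.13–V0.15. As of V0.16 the web server no longer creates an Azure client; as of v0.17.2 the old LLM adapter layer (`src/shared/llm_config.py` and related files) has been deleted entirely, leaving no Azure OpenAI code in the source tree; at the same time `azure-identity` was dropped from the dependencies (IcM now uses OAuth-from-CLI; see the IcM MCP authentication notes further below).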

```python
from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    azure_endpoint="https://liuhanlin-openai-01.openai.azure.com/",  # Azure OpenAI endpoint
    azure_ad_token=token,                                          # Azure AD auth
)

response = await client.chat.completions.create(
    model="gpt-5.5",                    # deployment name from settings.yaml
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    response_format={"type": "json_object"},
    temperature=0.1,
    max_tokens=2000,
)
```

**~~Authentication chain~~ (historical)**:
```
Azure AD (DefaultAzureCredential) → AzureCliCredential (az login) → fail
Tenant: settings.yaml → azure_openai.tenant_id
```

**LLM calls per triage (V0.16)**: 7-8 (all via the CLI)
- Step 0 / 0c: safety + ownership routing: rules only, 0 LLM calls
- Step 0b: qualification (CLI, ~3-30s): the LLM is called only when no rule matches
- Step 1:  doc-answerable + knowledge retrieval (CLI, ~20-60s)
- Step 2:  **Stage A: classify (CLI, 60-180s)**, produces only the classification JSON
- Step 3:  dedup (CLI, 30-90s, in parallel with classify)
- Step 2b: self-eval (CLI, 20-60s)
- Step 4:  **Stage B: response (CLI, 20-60s, in parallel with escalation)**, produces only the customer email
- Step 5:  escalation (CLI, 20-60s, in parallel with Stage B)

Single-call mode has been retired since v0.13: in testing, the merged 26K-token prompt split the LLM's
attention (10/12 DCRs returned an empty classification alongside a complete response_draft).
With the 2-stage split, each LLM call has a single focus and attention no longer competes. See AGENTS.md §3.8.1.

**Fallback chain** (V0.16): CLI call fails → ``_try_loads_repaired`` repairs broken JSON → focused-prompt retry (same CLI) → ``_safe_build_classification`` L4 fallback (``need_info_reason`` records the fallback path) → a Classification is always returned and no exception is ever raised
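
The shape of that chain, "try, repair, retry with a focused prompt, then fall back without raising", can be sketched as follows. Everything here is a simplified stand-in: `try_loads_repaired` and `classify_with_fallback` are hypothetical functions (the real `_try_loads_repaired` and `_safe_build_classification` are not shown in this document), the repair pass only trims to the outer braces and drops trailing commas, and the `need-info` judgment value in the fallback is an assumption.

```python
import json
import re
from typing import Callable, Optional


def try_loads_repaired(raw: str) -> Optional[dict]:
    """Simplified repair pass: trim to the outer braces and drop trailing commas."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    candidate = re.sub(r",\s*([}\]])", r"\1", raw[start:end + 1])
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def classify_with_fallback(
    call_cli: Callable[[str], str], full_prompt: str, focused_prompt: str
) -> dict:
    """Never raises: returns a parsed classification or a marked fallback."""
    for prompt in (full_prompt, focused_prompt):  # second pass = focused retry, same CLI
        try:
            parsed = try_loads_repaired(call_cli(prompt))
        except Exception:
            parsed = None  # a CLI failure is treated like unusable output
        if parsed and parsed.get("judgment"):
            return parsed
    # Last-resort result; need_info_reason records that the fallback path was taken.
    return {"judgment": "need-info", "need_info_reason": "fallback: classifier output unusable"}
```

Passing the CLI call in as a function keeps the sketch testable without spawning a subprocess.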

### CLI LLM integration (`src/shared/cli_llm.py`)

```python
# CLI Mode: each LLM call spawns one CLI subprocess
# Supports Copilot CLI and Claude Code
_run_cli_sync(prompt)  # pwsh → copilot/claude → read output → strip ANSI / tool lines
```

**Call flow**: prompt.txt → pwsh subprocess → answer.txt → cleanup → JSON extraction
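
The last two stages of the call flow (cleanup and JSON extraction) can be sketched like this. It is a minimal illustration rather than the code in `cli_llm.py`: `extract_first_json` is a hypothetical name, the regex covers only common ANSI CSI escape sequences, and the tool-line filtering mentioned above is left out.

```python
import json
import re

# Matches common ANSI CSI sequences such as "\x1b[32m" (colors, cursor moves).
ANSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def extract_first_json(raw: str) -> dict:
    """Strip ANSI escapes, then return the first JSON object found in the text."""
    text = ANSI_RE.sub("", raw)
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # this "{" did not start valid JSON; keep scanning
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON object found in CLI output")
```

Scanning with `raw_decode` tolerates chatter before and after the object, which matters because CLI output usually wraps the JSON in progress text.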

---

## IcM MCP integration details

**For the full sync / auth path, the self-healing mechanism, and details of the v0.16 → v0.17 OAuth-from-CLI migration**, see **[`docs/icm-sync-mechanism.md`](icm-sync-mechanism.md)** and the top-level docstring of `src/tools/oauth_token_resolver.py`.

### Protocol: Streamable HTTP MCP (JSON-RPC 2.0 over SSE)

```python
# src/tools/icm_client.py

# 1. Get the IcM MCP OAuth token (v0.17+: reuses the local copilot/claude CLI OAuth cache)
from src.tools.oauth_token_resolver import resolve_icm_token
token = resolve_icm_token()   # reads ~/.copilot/mcp-oauth-config/<hash>.tokens.json
                              # or ~/.claude/.credentials.json; refreshes automatically on expiry

# 2. MCP handshake (initialize)
POST https://icm-mcp-prod.azure-api.net/v1/
Headers: { Authorization: Bearer <token>, Content-Type: application/json }
Body: {
    "jsonrpc": "2.0", "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "clientInfo": {"name": "dcr-triage-agent", "version": "0.4.0"}
    }
}
Response: SSE format → "event: message\ndata: {result: {protocolVersion: ...}}\n"

# 3. Send the initialized notification
POST  Body: {"jsonrpc": "2.0", "method": "notifications/initialized"}

# 4. Call a tool
POST  Body: {
    "jsonrpc": "2.0", "id": 2,
    "method": "tools/call",
    "params": {
        "name": "get_incident_details_by_id",
        "arguments": {"incidentId": 740941410}
    }
}
Response: SSE → parse "data:" line → JSON → extract content[0].text → parse as JSON

# 5. On 401 / audience drift, retry once with force_refresh; if the refresh_token
#    is no longer valid, prompt the user to run `python -m scripts.auto_setup --repair-icm-mcp`
```

**Authentication (v0.17+)**: see [`docs/icm-sync-mechanism.md`](icm-sync-mechanism.md) and the module docstring of `src/tools/oauth_token_resolver.py`. In short, it reuses the IcM MCP OAuth session (access + refresh token) already negotiated by the `copilot` / `claude` CLI and renews it with a `refresh_token` grant 5 minutes before expiry. **It no longer depends on `az login` or `DefaultAzureCredential`.**

### IcM MCP tools used

| Tool | Purpose | Pipeline step |
|------|------|---------------|
| `get_incident_details_by_id` | Fetch incident details | Intake |
| `get_ai_summary` | AI-generated incident summary | Intake (enrichment) |
| `get_similar_incidents` | Find similar incidents | Step 3 (Dedup) |
| `get_impacted_s500_customers` | S500 customer impact | Step 0 (Safety) |
| `get_impacted_ace_customers` | ACE customer impact | Step 0 (Safety) |
| `search_incidents_by_owning_team_id` | Team incident search | Data cache |
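
The response-parsing step of a `tools/call` round trip (SSE → `data:` line → JSON → `content[0].text` → JSON) can be sketched as below. This is an illustration, not the code in `icm_client.py`: `parse_mcp_sse` is a hypothetical name, it assumes a single `data:` line per response, and the error handling is an assumption.

```python
import json


def parse_mcp_sse(body: str) -> dict:
    """Parse one SSE-framed JSON-RPC response from a tools/call request."""
    for line in body.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event: message" and blank separator lines
        envelope = json.loads(line[len("data:"):].strip())
        if "error" in envelope:
            raise RuntimeError(f"MCP error: {envelope['error']}")
        # The tool payload is itself JSON, serialized inside content[0].text.
        return json.loads(envelope["result"]["content"][0]["text"])
    raise ValueError("no data: line in SSE response")
```

The double decode is the part that is easy to miss: the outer JSON is the JSON-RPC envelope, the inner JSON is the tool's own result.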

---

## Project file structure

```
dcr-triage-agent/
├── config/
│   ├── categories.yaml          # Safety-boundary rule definitions
│   ├── prompts.yaml             # LLM prompt templates (incl. ★ QUOTE vs SUMMARY ★ +
│   │                              Three-Tier Evidence Framework + MCP-FIRST
│   │                              HARD CONSTRAINT + 7 anti-paraphrase heuristics)
│   ├── mcp_tools.yaml           # ★ MCP tool whitelist (class-B source_type ↔ tool name)
│   ├── settings.yaml            # Runtime configuration (git ignored)
│   ├── settings.example.yaml    # Configuration template
│   └── context/                 # ★ Context engineering (progressive disclosure)
│       ├── layer0_identity.yaml #   Role identity + judgment priority
│       ├── layer1_rules.yaml    #   Change Type enum
│       ├── judgment_catalog.yaml# ★ SSOT: 9 judgments + decision_tree +
│       │                          risk_tier + reject_evidence_bar +
│       │                          rate_caps + seed_examples (with [ref:] tags)
│       ├── layer3_special.yaml  #   Safety / compliance / cross-team deep context
│       ├── mcp_guidance.yaml    # ★ MCP call guidance (classify ≤12 / esc ≤3 /
│       │                          dedup ≤3, MIN 1, MCP-FIRST hard constraint)
│       └── pm_profiles/         #   PM decision patterns (one file per PM)
│           ├── tristan_xia.yaml #   Forms/Polls/Green Room/Notes App
│           ├── melissa_ma.yaml  #   Recording/Transcription/UFD
│           ├── harin_lee.yaml   #   Devices/Audio
│           ├── anqi_chen.yaml   #   Video/Camera/Background
│           ├── weizhong_xue.yaml#   Recap/AI Notes/Loop/Meeting Notes/Copilot
│           ├── fei_zuo.yaml     #   Transcript/Captions/CART
│           └── routing_signals.yaml # Cross-team routing keywords
│
├── src/
│   ├── agents/                  # 6 agents (each owns one pipeline step)
│   │   ├── classifier.py        #   Step 2: multi-dimensional DCR classification
│   │   │                            + _parse_evidence_list (supports product-behavior /
│   │   │                              design-pattern + the summary field)
│   │   │                            + _parse_reasoning_steps / _parse_trust_report
│   │   ├── deduplicator.py      #   Step 3: duplicate / similarity detection + matches.evidence parsing
│   │   ├── response_generator.py#   Step 4: builds the References Card data
│   │   │                            (_collect_external_evidence / _collect_inferential_evidence
│   │   │                             / _derive_decision_logic / _derive_metadata_implications)
│   │   ├── escalation_advisor.py#   Step 5: parses the _evidence of each of the 5 factors
│   │   ├── qualification_checker.py # Step 0b
│   │   └── ownership_router.py  #   Step 0c
│   │
│   ├── context/                 # ★ Context engineering module
│   │   └── assembler.py         #   ContextAssembler: progressive context
│   │                                + get_injection_inventory()
│   │                                + assemble_mcp_guidance(step) (CLI-only)
│   │
│   ├── feedback/                # ★ Feedback loop module
│   │   ├── observability.py     #   LLM call observability + _derive_evidence_metrics
│   │   │                            (external/inferential/mcp counts)
│   │   ├── self_reviewer.py     #   Actor-Critic + tier-aware _derive_verdict
│   │   │                            + _looks_like_mcp_attempt_log (MCP-bypass)
│   │   ├── feedback_store.py    #   PM feedback records
│   │   ├── catalog_loader.py    #   judgment_catalog rendering + valid_ref_ids()
│   │   └── lessons.py           #   Loads learned lessons (with [ref:] tags)
│   │
│   ├── orchestration/           # Orchestration layer
│   │   ├── triage_engine.py     # ★ 7-step pipeline + _enrich_classification_with_dedup
│   │   └── intake.py            #   IcM → NormalizedDCR conversion
│   │
│   ├── shared/                  # Shared components
│   │   ├── models.py            # ★ EvidenceRef (with quote + summary) / ReasoningStep /
│   │   │                            TrustReport / SuggestedResponse / Classification
│   │   │                            (with 5 @field_validator(mode="before") hooks that
│   │   │                             normalize odd LLM output to the schema; v0.14 L3) /
│   │   │                            COERCION_EVENTS global counter
│   │   ├── cli_llm.py           # ★ Copilot CLI / Claude Code client (the only LLM client)
│   │   ├── llm_logging.py       #   LoggingLLMClient wrapper + backend detection (CLI Copilot / CLI Claude)
│   │   └── logging_setup.py     #   Unified log format + noise suppression
│   │
│   └── tools/                   # External tool integrations
│       ├── icm_client.py        # ★ IcM MCP HTTP client (SSE + JSON-RPC)
│       ├── cached_icm.py        #   Local JSON cache for IcM
│       ├── knowledge_retriever.py # ★ Doc retrieval; v0.14 added _search_via_cli_mcp(),
│       │                            where the backend proactively calls the web-search MCP
│       │                            to inject mcp:ms-learn / mcp:bing evidence;
│       │                            load_documents_from_dir() is the local fallback
│       ├── mcp_discovery.py     # ★ New in v0.14: runs `<cli> mcp list` at startup to parse
│       │                            the MCP inventory; render_for_prompt() injects the real
│       │                            tool names at the top of assembler.assemble_mcp_guidance()
│       └── notification_service.py # PM notification service
│
├── data/
│   ├── icm_cache/               # IcM data cache + customer info cache
│   ├── triage_cache/            # Triage result cache (*_cli.json)
│   ├── observability/           # ★ LLM call logs (llm_calls.jsonl, incl. evidence_quality)
│   ├── feedback/                # ★ PM feedback log (overrides.jsonl)
│   └── learning/                # ★ Self-learned lessons (learned_lessons.jsonl)
│
├── frontend/
│   └── index.html               # ★ Web UI (single file, dark theme)
│                                   with References Card (Why/Evidence/Implication)
│                                   + Tier-A green verbatim + Tier-B blue inferential
│                                   + separate purple Agent summary box
│
├── scripts/
│   └── server.py                # FastAPI web service (frontend API)
│
├── tests/
│   ├── unit/
│   │   ├── test_evidence_parsing.py  # 30+ tests (incl. 4 new tests for the summary field)
│   │   └── test_backward_compat.py   # Compatibility with old JSON shapes
│   └── integration/
│       └── test_self_eval_evidence.py # 19+ tests (incl. tier-aware verdict)
│
├── AGENTS.md                    # ★ Staff handbook (the project's single entry-point doc, v0.12)
├── README.md                    # User-facing Quick Start
├── start.ps1 / start.bat        # One-click launch scripts
└── requirements.txt             # Python dependencies
```

---

## Data flow (one complete triage)

```
1. User enters IcM ID: 785438586
         │
2. IcmIntake.fetch_and_normalize()
         ├─ IcmMcpClient.get_incident(785438586)        ← HTTP POST to IcM MCP
         ├─ IcmMcpClient.get_ai_summary("785438586")    ← fetch the AI summary
         ├─ IcmMcpClient.extract_impact_signals()       ← S500 / ACE / compliance detection
         └─→ NormalizedDCR (Pydantic model)
                  │
3. TriageEngine.triage(dcr)
         ├─ _check_safety_boundaries()                  ← rules only, no LLM
         ├─ QualificationChecker.check()                ← CLI (V0.16+, was Azure)
         ├─ OwnershipRouter.suggest_owner()             ← keyword matching
         ├─ KnowledgeRetriever.check_doc_answerable()   ← CLI (V0.16+, was the Azure OpenAI fast path)
         ├─ ContextAssembler.assemble_for_classify()    ← ★ progressive context + injection_inventory
         │     ├─ Layer 0 + 1 + judgment_catalog (with [ref:] tags)
         │     ├─ Layer 2 (PM profile + routing_signals) + Layer 3 (conditional)
         │     ├─ MCP guidance appended at the end (classify ≤12, MCP-FIRST hard constraint)
         │     └─ _last_injection_inventory: set[str]  (used by Self-Eval)
         ├─ Classifier.classify() + Deduplicator.check (★ in parallel)
         │     ├─ LLM calls mcp:* tools inside the CLI subprocess → writes reasoning_steps[].evidence[]
         │     ├─ ★ v0.13: Stage A outputs only the classification JSON (no response_draft)
         │     └─ ObservabilityLogger.log() ← includes evidence_quality metrics
         ├─ _enrich_classification_with_dedup()         ← ★ second pass: folds in uncited dedup matches
         │     └─ dedup match with similarity≥0.6 → trailing pattern step (icm-history)
         ├─ SelfReviewer.evaluate()                     ← ★ Critic LLM strict mode
         │     ├─ receives injection_inventory + mcp_tool_whitelist
         │     ├─ Critic outputs the trust block (unknown/malformed/weak/hallucinated)
         │     ├─ _derive_verdict tier-aware:
         │     │     hallucinated_quotes / assumption>0.6 / ≥3 issues   → FAIL
         │     │     external==0 AND inferential==0                     → FAIL
         │     │     MCP-bypass (no mcp + no attempt log + tier-B only) → DOWNGRADE/FAIL
         │     │     Tier-B only (no external)                          → DOWNGRADE
         │     └─ trust written back to classification.trust
         ├─ ResponseGenerator.generate() (Stage B)      ← ★ v0.13: separate LLM call
         │     ├─ Input: classification + reasoning_steps + evidence + dedup
         │     ├─ Stage B prompt enforces: STAGE A AUTHORITY + INLINE LINK DISCIPLINE
         │     └─ Output: customer-facing draft → enrich_with_dedup reassembles it into
         │                SuggestedResponse{draft / external_references /
         │                inferential_references / no_external_note /
         │                decision_logic / implications}
         └─ EscalationAdvisor.evaluate()                ← ★ LLM call (in parallel with Stage B)
                  │
4. TriageResult → JSON → frontend rendering
         └─ References Card in three sections (Why / Evidence / Implication)
            with Tier-A green verbatim + Tier-B blue reasoning + purple Agent summary box
```
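
The second-pass dedup fold in step 3 (matches with similarity ≥ 0.6 that no reasoning step cited become a trailing `icm-history` step) can be sketched as follows. The dictionary shapes, the field names `incident_id` and `similarity`, the claim text, and the function name are all assumptions made for illustration; only the threshold and the `icm-history` source type come from the diagram.

```python
def fold_unreferenced_dedup(
    steps: list[dict], matches: list[dict], threshold: float = 0.6
) -> list[dict]:
    """Append one trailing step citing strong dedup matches no earlier step referenced."""
    cited = {ev["source_id"] for step in steps for ev in step.get("evidence", [])}
    extra = [
        {"source_type": "icm-history", "source_id": str(match["incident_id"])}
        for match in matches
        if match["similarity"] >= threshold and str(match["incident_id"]) not in cited
    ]
    if not extra:
        return steps
    trailing = {"claim": "Similar past incidents found by dedup", "evidence": extra}
    return steps + [trailing]
```

Returning a new list instead of mutating `steps` keeps the first-pass classifier output available for comparison.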

---

## 2-stage Agent Chain design (v0.13)

> The full description is in `AGENTS.md` §3.8.1. Only the key points are listed here.

**Problem:** v0.11/12 used single-call mode: one classifier LLM call produced both the
classification and the response_draft. In testing, 10/12 DCRs failed: the merged 26K-token prompt
split the LLM's attention (JSON output and a free-form email at the same time), often yielding an empty classification
alongside a complete response_draft.

**Solution:** split into 2 stages so that each LLM call has a single focus:

| Stage | Input | Output | Prompt size |
|---|---|---|---|
| **A: Triage** (classifier.classify) | catalog + classify.user + supplement | classification JSON only (~2K tokens) | ~23K tokens IN |
| **B: Response** (response_generator.generate) | suggest_response.system + Stage A's classification | customer email only (~2K tokens) | **~3K tokens IN** |

**Consistency is enforced by the existing Self-Eval** (no new code layer is introduced):
- Top of the Stage B prompt, **★ STAGE A AUTHORITY ★**: no re-deciding; the judgment must follow Stage A
- Top of the Stage B prompt, **★ INLINE LINK DISCIPLINE ★**: URLs in the customer email may only be reused from Stage A evidence
- Self-Eval inline-link mirroring (present since v0.12): catches Stage B using a URL that Stage A did not cite → HARD FAIL
- Self-Eval judgment ↔ reasoning_steps consistency (present since v0.11): catches mismatches between judgment and reasoning

**Measured improvement** (rerunning the same batch of DCRs):
- Empty-classification failure rate: 83% → expected to approach 0
- Total latency: 170-330s (unstable) → ~130s (Stage A ~100s + Stage B ~30s)
- Doc-count stability: rerunning the same DCR no longer gives fluctuating counts

---

## How to run

```bash
# Web mode (the only supported entry point since v0.17.2)
python -m scripts.server
# Open http://127.0.0.1:8080

# Windows: double-click start.bat (runs auto_setup first, then launches the server)
# PowerShell: ./start.ps1
```

> v0.17.2 removed the dev runners `scripts/run_triage.py` / `scripts/run_eval.py` / `scripts/run_cli_triage_to_md.py` and similar (they were local fallbacks from the Azure OpenAI era). All triage is now triggered through the Web UI or `POST /api/triage/cli`. For batch / automation scenarios, script curl calls against this endpoint.

## Cache architecture

```
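Triage result persistence (three layers):
  1. Browser localStorage (dcr_triage_cache)
     - Survives page refresh
     - Results are keyed by {id}_cli

  2. Server disk (data/triage_cache/)
     - {id}_cli.json: triage result

  3. In-memory triageCache object (runtime)
     - Restored from localStorage first
     - Missing entries are fetched from the server

Customer info cache:
  - Frontend: _customerCache in-memory cache (instant when switching incidents)
  - Server: data/icm_cache/customer_{id}.json (disk cache; the API is not called again after a restart)
  - First access: fetched live from the IcM MCP API → written to the disk cache

Load priority: localStorage → server fetch → IcM API
Display path: the project has been CLI-only since v0.12, with a single `_cli.json` cache (no _eval variant)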
```
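
The server-side customer info cache is a read-through disk cache: serve the file if present, otherwise fetch once and persist. A minimal sketch follows; `get_customer_info` and the injected `fetch` callable are hypothetical, and only the `customer_{id}.json` file naming comes from the layout above.

```python
import json
from pathlib import Path
from typing import Callable


def get_customer_info(
    incident_id: int, cache_dir: Path, fetch: Callable[[int], dict]
) -> dict:
    """Return cached customer info, calling fetch() only on a cache miss."""
    path = cache_dir / f"customer_{incident_id}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    info = fetch(incident_id)  # e.g. a live IcM MCP call in the real system
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(info, ensure_ascii=False), encoding="utf-8")
    return info
```

Because the cache lives on disk, a server restart does not trigger a second API call for the same incident.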

---

## Harness Engineering (introduced in v0.9 / upgraded in v0.12)

### Pillar 1: Context engineering with progressive disclosure

```
┌─ Layer 0 (always injected) ─── role identity + judgment priority ── layer0_identity.yaml ─┐
├─ Layer 1 (always injected) ─── Change Type enum ────────────────────── layer1_rules.yaml ─┤
├─ Layer 1.5 SSOT ─── 9 judgments + decision_tree + risk_tier +                             │
│   reject_evidence_bar + rate_caps + seed_examples (with [ref:] tags)                      │
│   ─── judgment_catalog.yaml                                                               │
├─ Layer 2 (on demand) ─── matching PM profile ──────────────────────── pm_profiles/*.yaml ─┤
│   Injects only the 1-2 PM profiles whose keywords match the current DCR, not all 6        │
├─ Layer 3 (conditional) ─── safety / compliance context ───────────── layer3_special.yaml ─┤
│   Injected only when a safety boundary is triggered                                       │
├─ MCP Guidance (always appended, CLI-only) ──────────────────────────── mcp_guidance.yaml ─┤
│   classify ≤12 / escalate ≤3 / dedup ≤3, MCP-FIRST hard constraint                        │
└─ Short-term Memory ─── last-10 judgment distribution, discuss-rate anomaly alert ─────────┘
```
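
The progressive-disclosure rules in the diagram above (always-on layers, at most 1-2 keyword-matched PM profiles, a safety-gated Layer 3, and unconditional MCP guidance) can be sketched as a selection function. This is not the `ContextAssembler` implementation: `select_layers`, the keyword-count ranking, and the shape of `pm_keywords` are assumptions.

```python
def select_layers(
    dcr_text: str,
    pm_keywords: dict[str, list[str]],
    safety_triggered: bool,
    max_profiles: int = 2,
) -> list[str]:
    """Return the context layers to inject for one DCR, in injection order."""
    layers = ["layer0_identity", "layer1_rules", "judgment_catalog"]  # always injected
    text = dcr_text.lower()
    # Rank PM profiles by how many of their keywords appear in the DCR text.
    hits = sorted(
        ((sum(kw.lower() in text for kw in kws), name) for name, kws in pm_keywords.items()),
        reverse=True,
    )
    layers += [f"pm_profiles/{name}" for count, name in hits[:max_profiles] if count > 0]
    if safety_triggered:
        layers.append("layer3_special")  # only when a safety boundary fires
    layers.append("mcp_guidance")        # appended unconditionally (CLI-only)
    return layers
```

Capping the profile count keeps the prompt small, which is the point of injecting 1-2 matching profiles rather than all six.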

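**Judgment priority**: PRIMARY = independent analysis of the DCR content > SECONDARY = PM historical patterns (tone reference only)

### Pillar 2: Evidence-Driven reasoning (introduced in v0.11 / upgraded in v0.12)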

See the reasoning transparency architecture section above: three evidence tiers (Tier A / Tier B / INTERNAL anchors) + Quote/Summary separation +
MCP-FIRST HARD CONSTRAINT + 7 anti-paraphrase heuristics.

### Pillar 3: Feedback loop

| Component | File | Responsibility |
|------|------|------|
| ObservabilityLogger | `src/feedback/observability.py` | Records metadata for every LLM call + `_derive_evidence_metrics` (external/inferential/mcp counts) |
| SelfReviewer | `src/feedback/self_reviewer.py` | Actor-Critic Self-Eval + tier-aware `_derive_verdict` + MCP-bypass detection |
| FeedbackStore | `src/feedback/feedback_store.py` | Records PM feedback (Agree/Override) |
| CatalogLoader | `src/feedback/catalog_loader.py` | Renders judgment_catalog as prompt text + exposes valid_ref_ids() |
| LessonStore | `src/feedback/lessons.py` | Learned lessons (with [ref:] tags) |

**Self-Eval flow**:
```
CLI Triage → result_v1 → Self-Eval (Critic LLM, strict mode)
                          ├─ Evidence back-check (injection_inventory + mcp_tool_whitelist)
                          ├─ 7 anti-paraphrase heuristics → hallucinated_quotes
                          ├─ MCP-bypass detection → DOWNGRADE/FAIL
                          ├─ tier-aware verdict overrides the Critic's soft verdict:
                          │     ✅ PASS → confirmed
                          │     ⚠️ DOWNGRADE → confidence lowered
                          │     ❌ FAIL → flagged for PM second review + issues listed
                          └─ trust written back to classification.trust
```

### API endpoints

| Endpoint | Method | Purpose |
|------|------|------|
| `/api/triage/cli` | POST | Main CLI triage path |
| `/api/customer-info/{id}` | GET | Customer info (IcM API + cache) |
| `/api/incidents` | GET | IcM incident list |
| `/api/settings/team-mapping` | GET | Team-to-PM mapping |
| `/api/lessons` | GET | List of learned lessons |
| `/api/analytics` | GET | LLM observability statistics (incl. evidence_quality) |
| `/api/feedback` | POST | Record PM feedback |
| `/api/feedback/stats` | GET | Feedback statistics |

## Dependencies

```
pydantic >= 2.5.0    # Data models
pyyaml               # Config loading
rich                 # Formatted CLI output
httpx                # IcM MCP HTTP calls + OAuth refresh
fastapi + uvicorn    # Web API server
numpy                # Simple numeric / similarity computation
watchfiles           # uvicorn --reload
```

> As of v0.17.2 the project **no longer depends on `openai` / `azure-identity`**. All LLM calls go through a local `copilot` / `claude` CLI subprocess; all IcM authentication goes through `oauth_token_resolver` (which reuses the CLI's own OAuth cache). See [`docs/icm-sync-mechanism.md`](icm-sync-mechanism.md) and the module docstring of `src/tools/oauth_token_resolver.py`.
