[New · Release Gate] 2026-05-16 / 04:43 · OPERATOR: anonymous

New release gate

Multi-bot evaluation dossier. Each gate compares the production canary against the production baseline across every bot flagged in_release_eval=true.

Identify

[Gate name]

Visible in the run index and verdict log. Pick a name your future self will recognise.

Mode

[Dispatch mode]

Configure

[Profile · Fleet]

Fans out by the in_release_eval flag to every production bot × {baseline, canary}; the profile determines how many cases are sampled per kind.
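
The fan-out described above can be sketched as a cross product of eligible bots and the two fixed targets. This is a minimal illustration, not the system's implementation: the `Bot` class, its `has_suite` field, and the `fan_out` function are hypothetical names, and per-kind case sampling by the profile is not modelled. Only the target names and the in_release_eval flag come from this page.

```python
from dataclasses import dataclass
from itertools import product

# Target names are taken from the Compare section of this page.
TARGETS = ("production-baseline", "production-canary")


@dataclass(frozen=True)
class Bot:
    # Hypothetical record; only in_release_eval is named on this page.
    name: str
    in_release_eval: bool
    has_suite: bool


def fan_out(bots):
    """Return one (bot name, target) job per dispatchable bot and target."""
    # A bot is dispatchable when it is flagged AND already has an eval suite.
    dispatchable = [b for b in bots if b.in_release_eval and b.has_suite]
    return [(b.name, t) for b, t in product(dispatchable, TARGETS)]


bots = [
    Bot("alpha", in_release_eval=True, has_suite=True),
    Bot("beta", in_release_eval=True, has_suite=False),   # would go to auto-prep
    Bot("gamma", in_release_eval=False, has_suite=True),  # not part of the gate
]
jobs = fan_out(bots)
```

Here only "alpha" yields jobs, one per target; "beta" would wait for its suite to be generated first.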

[Fleet · 0 dispatchable bots] has eval suite AND in_release_eval=true — production bots default-on
[+ Auto-prep · 327 bots] in release eval but missing a suite; cases are generated in the background after submission. ~$73.58 · 545 min · 981 generations
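
The auto-prep estimate can be broken down into per-unit averages. The sketch below only divides the figures shown on this page; the variable names are illustrative, and the results are implied averages rather than confirmed pricing or timing (the 545 min figure may or may not account for parallel generation).

```python
# Figures copied from the auto-prep estimate on this page.
bots = 327
generations = 981
cost_usd = 73.58
minutes = 545

# Implied averages (assumption: the estimate is spread evenly).
generations_per_bot = generations / bots          # 3.0
cost_per_generation = cost_usd / generations      # about 0.075 USD
seconds_per_generation = minutes * 60 / generations  # about 33.3 s
```

Read this way, the estimate corresponds to three generations per bot at roughly 7.5 cents each.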

Compare

[Comparison · fixed]
Baseline · stable version
production-baseline
Candidate · version under test
production-canary
[Verdict rule]

The candidate fleet pass rate must reach ≥85% before the gate suggests SHIP; the operator can override at any time. The target pair is fixed in every gate mode: fleet gates always compare production canary against production baseline.
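
The verdict rule above can be sketched as a threshold check with an operator override. This is an illustration under stated assumptions: the function name, the "HOLD" and "NO_DATA" labels, and the way the pass rate is counted (passed cases over total cases) are all hypothetical; this page only names the 85% threshold, the SHIP suggestion, and the override.

```python
SHIP_THRESHOLD_PERCENT = 85  # threshold stated in the verdict rule


def suggest_verdict(passed, total, override=None):
    """Suggest a verdict from the candidate fleet pass rate.

    `override` models the operator's ability to overrule the suggestion.
    "HOLD" and "NO_DATA" are hypothetical labels, not taken from this page.
    """
    if override is not None:
        return override
    if total == 0:
        return "NO_DATA"
    # Integer comparison avoids floating-point edge cases at exactly 85%.
    if passed * 100 >= total * SHIP_THRESHOLD_PERCENT:
        return "SHIP"
    return "HOLD"
```

For example, `suggest_verdict(85, 100)` suggests SHIP, `suggest_verdict(84, 100)` does not, and passing `override="SHIP"` returns the operator's choice regardless of the rate.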

Dispatch

[Submit]

After submission, the system immediately creates two EvalRuns (one per target) and starts the evaluation.