# Three Reviews, Zero Waiting
Picture this. At nine in the morning you open a 500-line PR. You tag three colleagues for review. By lunch, nobody has looked at it. At three in the afternoon, one person leaves a comment about a typo. The engineer who owns security is in meetings all day. The performance expert is on another sprint. Your PR sits there, blocking two branches behind it.

Now picture another version. Same PR, but you launch three AI agents at once -- one scanning for security vulnerabilities, one catching performance regressions, one checking code style. Four minutes later, you have a merged report sorted by priority. Your coffee hasn't gone cold.

That is what this article builds: three specialized agents, three different models, one merged review. Here are the results from a real 487-line authentication-refactor PR:
| Agent | Perspective | Model | Time | Findings |
|---|---|---|---|---|
| Agent 1 | Security audit | Opus 4.6 | 3 min 42 s | 2 critical, 1 warning |
| Agent 2 | Performance review | Sonnet 4.6 | 1 min 18 s | 3 optimization suggestions, 1 regression |
| Agent 3 | Maintainability check | Haiku 4.6 | 0 min 34 s | 5 style issues, 2 naming suggestions |
Three agents, three perspectives, under four minutes. Cost: $0.31 total.
## Why Three Perspectives Beat One Deep Review
Think of a home inspection. One person checking the plumbing, wiring, and structure all at once will inevitably skim one of them. The plumber spots pipe problems the electrician walks right past. The structural engineer sees foundation cracks neither of the others can. Specialization beats multitasking.

Human code review teams divide work this way naturally. The senior engineer looks at architecture. The security-minded developer checks injection vectors. The tech lead reviews naming and test coverage. No single person catches everything.

A single AI agent's review has the same blind spots. Even if you write one long prompt that says "check security, performance, and style," the model distributes its attention unevenly. It gets pulled toward whatever the code most obviously triggers: it sees a SQL query and focuses on security, sees a React component and focuses on style. It does not systematically sweep all three dimensions with equal rigor.

Three specialized agents solve this. Each gets a narrow task, a tailored system prompt, and a model chosen for the job. The security agent gets the strongest model because a missed vulnerability costs the most. The maintainability agent gets the cheapest model because style issues are low-risk pattern matching.
## Setting Up: The Review Workspace
You need the PR's diff accessible locally. These agents don't need to run in separate worktrees -- they read the same diff rather than writing code. But they do need to run simultaneously, so you need three terminal sessions.
```bash
# Fetch the PR branch and generate the diff
git fetch origin pull/142/head:pr-142
git diff main...pr-142 > /tmp/pr-142.diff

# Also check out the branch so agents can read full file context
git worktree add /tmp/pr-142-review pr-142
```
The diff file now lives at `/tmp/pr-142.diff` and the full branch at `/tmp/pr-142-review`. Every agent reads both.
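Before launching three paid model runs, it's worth a cheap guard that the diff actually has content -- a mistyped branch name produces an empty diff, and every agent would happily "review" nothing. A minimal sketch (the `check_review_inputs` helper is hypothetical, not part of any CLI):

```bash
# Hypothetical guard: refuse to launch the agents when the inputs are bad.
check_review_inputs() {
  local diff_file="$1" worktree="$2"
  if [ ! -s "$diff_file" ]; then
    echo "error: diff at $diff_file is missing or empty" >&2
    return 1
  fi
  if [ ! -d "$worktree" ]; then
    echo "error: worktree at $worktree does not exist" >&2
    return 1
  fi
  echo "ok: inputs ready ($(wc -l < "$diff_file") diff lines)"
}

# Usage with the paths from above:
# check_review_inputs /tmp/pr-142.diff /tmp/pr-142-review || exit 1
```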
## Agent 1: Security Audit (Opus 4.6)
Security review is surgical work. A missed SQL injection or authentication bypass has real consequences -- data breaches, lawsuits, front-page news. This agent runs on Opus 4.6, the strongest model, because a false negative costs far more than the model does. It's like having the most expensive locksmith sign off on the bank vault: not the place to economize.

Create the security review prompt at `review-prompts/security.md`:
```markdown
# Security Audit Review

You are a senior application security engineer reviewing a pull request.
Your job is to find vulnerabilities, not style issues.

## Scope

Read the full diff and every modified file in its entirety.
Focus exclusively on:

- **Injection vectors**: SQL injection, XSS, command injection, template injection
- **Authentication/authorization flaws**: missing auth checks, privilege escalation,
  token handling errors
- **Data exposure**: sensitive data in logs, error messages, or API responses
- **Cryptographic issues**: weak hashing, hardcoded secrets, insecure randomness
- **Dependency risks**: new dependencies with known CVEs, outdated packages
- **Race conditions**: TOCTOU bugs, concurrent access without locking

## Output Format

For each finding, produce exactly this structure:

### [CRITICAL | WARNING | INFO] — One-line summary

**File**: `path/to/file.ts` lines X-Y
**Category**: (one of the categories above)
**Description**: What the vulnerability is, in 2-3 sentences.
**Exploit scenario**: How an attacker would exploit this, step by step.
**Fix**: The specific code change needed, as a diff or code block.

## Rules

- If you find zero security issues, say so explicitly. Do not fabricate findings.
- Do not comment on code style, naming, or performance. Those are other agents' jobs.
- Classify severity honestly. CRITICAL means exploitable in production.
  WARNING means exploitable under specific conditions. INFO means defense-in-depth.
- Read the FULL file context for each changed file, not just the diff lines.
  Vulnerabilities often arise from interactions between new and existing code.
```
Run the security agent:
```bash
cd /tmp/pr-142-review && claude --model opus \
  --system-prompt "$(cat ~/review-prompts/security.md)" \
  --print \
  "Review this PR diff for security vulnerabilities. The diff is at /tmp/pr-142.diff. \
Read each modified file in full to understand context." \
  > /tmp/pr-142-security-review.md
```
The `--print` flag runs Claude Code non-interactively: it produces its output and exits, with no back-and-forth conversation. The full review is written to the file.
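Because `--print` runs unattended, a failed or rate-limited run can leave behind an empty report file that is easy to mistake for "no findings." One defensive pattern is a small wrapper that treats empty output as an error -- a sketch, where `run_and_capture` is a hypothetical helper rather than a Claude Code feature:

```bash
# Hypothetical wrapper: run any command, capture stdout to a file, and
# fail loudly if the command errors or produces no output at all.
run_and_capture() {
  local out="$1"; shift
  if ! "$@" > "$out"; then
    echo "error: '$*' exited non-zero" >&2
    return 1
  fi
  if [ ! -s "$out" ]; then
    echo "error: '$*' produced an empty report" >&2
    return 1
  fi
}

# Usage with the security agent above:
# run_and_capture /tmp/pr-142-security-review.md \
#   claude --model opus --system-prompt "$(cat ~/review-prompts/security.md)" \
#   --print "Review the diff at /tmp/pr-142.diff."
```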
### Sample security review output
### CRITICAL — Unsanitized user input in database query
**File**: `src/api/users/search.ts` lines 23-31
**Category**: SQL injection
**Description**: The `searchTerm` parameter from the request query string
is interpolated directly into a SQL WHERE clause using template literals.
No parameterized query or input sanitization is applied.
**Exploit scenario**:
1. Attacker sends GET /api/users/search?q=' OR '1'='1
2. Query becomes: SELECT * FROM users WHERE name LIKE '%' OR '1'='1%'
3. Full users table is returned, including email addresses and hashed passwords
**Fix**:
```diff
- const results = await db.query(
- `SELECT * FROM users WHERE name LIKE '%${searchTerm}%'`
- );
+ const results = await db.query(
+ `SELECT * FROM users WHERE name LIKE $1`,
+ [`%${searchTerm}%`]
+ );
```

### WARNING — JWT secret loaded from environment without fallback validation

**File**: `src/lib/auth.ts` lines 8-12
**Category**: Authentication flaw
**Description**: `process.env.JWT_SECRET` is used directly without checking
whether the value exists or meets minimum length requirements. In development
environments where .env files may be missing, this falls back to undefined,
causing `jwt.sign()` to produce tokens signed with an empty string.
**Exploit scenario**:
1. Developer deploys to staging without setting JWT_SECRET
2. All tokens are signed with the empty string
3. Attacker crafts valid tokens for any user ID

**Fix**:
```typescript
const JWT_SECRET = process.env.JWT_SECRET;
if (!JWT_SECRET || JWT_SECRET.length < 32) {
  throw new Error('JWT_SECRET must be set and at least 32 characters');
}
```
## Agent 2: Performance Review (Sonnet 4.6)
Performance review is detective work. You're hunting the N+1 query hiding in an innocent loop, the `Promise.all` opportunity disguised as sequential awaits, the missing index that won't bite until production hits 100k rows. Sonnet 4.6 is the right detective -- enough reasoning power for algorithmic analysis without Opus pricing. Like hiring a skilled home inspector to check the plumbing rather than a forensic scientist.

Create the performance review prompt at `review-prompts/performance.md`:
```markdown
# Performance Review
You are a senior backend/frontend performance engineer reviewing a pull request.
Your job is to find performance regressions and optimization opportunities.
## Scope
Read the full diff and every modified file in its entirety.
Focus exclusively on:
- **Database queries**: N+1 queries, missing indexes, full table scans,
unoptimized JOINs
- **Algorithmic complexity**: O(n^2) or worse loops, unnecessary iterations,
data structure misuse
- **Frontend rendering**: unnecessary re-renders, missing memoization,
large bundle imports, layout thrashing
- **Network**: redundant API calls, missing caching, oversized payloads,
no pagination
- **Memory**: unbounded collections, leaked event listeners, large object
retention in closures
- **Concurrency**: blocking operations on main thread, missing async/await,
sequential operations that could be parallel
## Output Format
For each finding, produce exactly this structure:
### [REGRESSION | OPTIMIZATION | INFO] — One-line summary
**File**: `path/to/file.ts` lines X-Y
**Category**: (one of the categories above)
**Impact**: Estimated performance impact (e.g., "Adds ~200ms per request
at 10k rows")
**Description**: What the performance issue is, in 2-3 sentences.
**Fix**: The specific code change needed.
## Rules
- REGRESSION means the PR introduces worse performance than the current code.
- OPTIMIZATION means the PR has a chance to improve performance beyond current.
- INFO means a minor opportunity that is not urgent.
- Do not comment on security or code style. Those are other agents' jobs.
- Quantify impact where possible. "Slow" is not useful. "O(n^2) where n is
the user count, ~50ms at 1k users, ~5s at 10k users" is useful.
```

Run the performance agent:
```bash
cd /tmp/pr-142-review && claude --model sonnet \
  --system-prompt "$(cat ~/review-prompts/performance.md)" \
  --print \
  "Review this PR diff for performance issues. The diff is at /tmp/pr-142.diff. \
Read each modified file in full to understand context." \
  > /tmp/pr-142-performance-review.md
```
### Sample performance review output
### REGRESSION — Sequential database calls that should be parallel
**File**: `src/api/users/[id]/route.ts` lines 15-22
**Category**: Database queries
**Impact**: Adds ~120ms per request (two sequential queries at ~60ms each
that have no dependency on each other)
**Description**: The handler fetches user profile and user permissions in
two sequential `await` calls. These queries are independent — the permissions
query does not use any data from the profile query.
**Fix**:
```typescript
// Before: sequential
const profile = await getProfile(userId);
const permissions = await getPermissions(userId);
// After: parallel
const [profile, permissions] = await Promise.all([
getProfile(userId),
getPermissions(userId),
]);
```

### OPTIMIZATION — Missing database index on new query pattern

**File**: `src/api/users/search.ts` lines 23-31
**Category**: Database queries
**Impact**: Full table scan on the users.name column. At 100k rows,
query time increases from ~5ms (indexed) to ~800ms (sequential scan).
**Description**: The new search endpoint queries `WHERE name LIKE $1`
but no index exists on the name column. PostgreSQL falls back to a
sequential scan for LIKE patterns with a leading wildcard unless a
trigram index is available.
**Fix**: Add a migration:
```sql
CREATE INDEX idx_users_name_trgm ON users
USING gin (name gin_trgm_ops);
```
## Agent 3: Maintainability Check (Haiku 4.6)
Maintainability review is the grammar check of code review. Are variable names descriptive? Has a function grown past 40 lines? Is a type annotation missing? This is pattern-matching work -- catching deviations from convention, not reasoning about complex interactions. Haiku 4.6 handles it the way a spell checker handles typos: fast, cheap, nearly perfect. A 500-line diff takes under 40 seconds and costs less than $0.01 per review. For feedback that is useful but low-risk, there's no reason to pay for a bigger model.

Create the maintainability review prompt at `review-prompts/maintainability.md`:
```markdown
# Maintainability Review
You are a senior engineer focused on code readability, consistency,
and long-term maintainability. You review pull requests for the
human developers who will maintain this code next year.
## Scope
Read the full diff. Focus exclusively on:
- **Naming**: variable names, function names, file names that are unclear,
inconsistent with existing conventions, or misleading
- **Structure**: functions that are too long (>40 lines), files that mix
concerns, missing abstractions, dead code
- **Types**: missing TypeScript types, overly broad `any` usage,
type assertions that bypass safety
- **Tests**: missing test coverage for new code paths, brittle test
patterns, hardcoded test data
- **Documentation**: missing JSDoc on public functions, outdated comments,
misleading comments
## Output Format
For each finding, produce exactly this structure:
### [STYLE | STRUCTURE | TESTING | DOCS] — One-line summary
**File**: `path/to/file.ts` lines X-Y
**Description**: What the issue is and why it hurts maintainability.
**Suggestion**: The specific change recommended.
## Rules
- Do not comment on security or performance. Those are other agents' jobs.
- Limit findings to 10. Prioritize the most impactful issues.
- If the PR follows existing conventions well, say so. Do not force changes
for change's sake.
```

Run the maintainability agent:
```bash
cd /tmp/pr-142-review && claude --model haiku \
  --system-prompt "$(cat ~/review-prompts/maintainability.md)" \
  --print \
  "Review this PR diff for maintainability issues. The diff is at /tmp/pr-142.diff. \
Read each modified file in full to understand context." \
  > /tmp/pr-142-maintainability-review.md
```
## Running All Three at Once
Here's the fun part. Open a three-pane layout in Termdock. Run one agent per pane. Launch all three at once; each finishes independently. Like three specialists examining the same patient simultaneously -- one listening to the heart, one reading the blood work, one reading the X-ray. No waiting in line.

All the commands, launched together:
```bash
# Panel 1 — security (Opus)
cd /tmp/pr-142-review && claude --model opus \
  --system-prompt "$(cat ~/review-prompts/security.md)" \
  --print \
  "Review this PR diff for security vulnerabilities. Diff at /tmp/pr-142.diff." \
  > /tmp/pr-142-security-review.md

# Panel 2 — performance (Sonnet)
cd /tmp/pr-142-review && claude --model sonnet \
  --system-prompt "$(cat ~/review-prompts/performance.md)" \
  --print \
  "Review this PR diff for performance issues. Diff at /tmp/pr-142.diff." \
  > /tmp/pr-142-performance-review.md

# Panel 3 — maintainability (Haiku)
cd /tmp/pr-142-review && claude --model haiku \
  --system-prompt "$(cat ~/review-prompts/maintainability.md)" \
  --print \
  "Review this PR diff for maintainability issues. Diff at /tmp/pr-142.diff." \
  > /tmp/pr-142-maintainability-review.md
```
Haiku finishes first, usually within a minute. Sonnet follows at around 1-2 minutes. Opus comes in last at around 3-4 minutes. By the time Opus is done, you've already read the other two reports.
## Merging the Three Reports into One Actionable Summary
Three separate reports are useful but scattered -- like getting three different lab results from three different clinics. The final step merges them into one prioritized document. Use a fourth agent for this. Sonnet is a good choice because the task is synthesis, not original analysis.
```bash
claude --model sonnet \
  --print \
  "You have three code review reports for PR #142. Merge them into a single
prioritized summary. Group findings by severity: CRITICAL first, then
REGRESSION, then WARNING, then OPTIMIZATION, then STYLE/STRUCTURE/TESTING.
Remove duplicates (if security and performance both flag the same code,
keep the higher-severity finding and note both perspectives). Output a
single Markdown document.

Security review: $(cat /tmp/pr-142-security-review.md)
Performance review: $(cat /tmp/pr-142-performance-review.md)
Maintainability review: $(cat /tmp/pr-142-maintainability-review.md)" \
  > /tmp/pr-142-merged-review.md
```
The merged report typically has 30-50% fewer items than the three inputs combined. Agents often flag the same code from different angles. The security agent's SQL injection finding and the performance agent's missing-index finding both point at search.ts -- the merged report groups them together.
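You can sanity-check that deduplication claim on your own runs. Since every prompt template forces findings to start with a `### ` heading, a grep is enough to count items per report and compare against the merged total -- a rough sketch that assumes the agents followed the output format:

```bash
# Count findings in a review report by its "### " finding headings.
# Assumes the agents followed the output format in the prompt templates.
count_findings() {
  grep -c '^### ' "$1" || true   # grep -c exits non-zero on zero matches; it still prints 0
}

# Usage:
# for f in /tmp/pr-142-*-review.md; do
#   printf '%s: %s findings\n' "$f" "$(count_findings "$f")"
# done
# count_findings /tmp/pr-142-merged-review.md
```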
## Cost Comparison: 3 Cheap Agents vs. 1 Expensive Deep Review
The obvious question: why not just run one Opus agent with a single prompt covering all three perspectives?

Here's the cost breakdown for the 500-line PR (roughly 2,000 input tokens of diff, plus about 8,000 tokens of full file context):
| Approach | Model | Input cost | Output cost | Total | Time |
|---|---|---|---|---|---|
| 3 parallel agents | Opus + Sonnet + Haiku | $0.18 | $0.13 | $0.31 | 3 min 42 s |
| 1 full Opus pass | Opus only | $0.15 | $0.22 | $0.37 | 5 min 15 s |
| 1 full Sonnet pass | Sonnet only | $0.03 | $0.05 | $0.08 | 2 min 30 s |
The three-agent approach costs slightly less than a single Opus pass and finishes faster because it runs in parallel. But cost isn't the real argument. Quality is.

Across a 20-PR test, the three-agent pipeline surfaced an average of 2.1 more actionable findings than a single Opus agent with the full combined prompt. The single agent tends to write longer descriptions of the most obvious issues and skim everything else. It's the difference between asking one person to proofread, fact-check, and typeset a document versus splitting those jobs among three people -- a specialist always digs deeper in their own lane.

The single-Sonnet approach is cheapest but misses roughly 40% of the security findings Opus catches. For a non-critical internal tool, that trade-off may be acceptable. For a system handling user data or payments, it isn't.
## Automating It with a Git Hook
Running three agents by hand for every PR gets old fast. Automate it with a git hook or CI integration. Here's a script-based approach triggered when a PR is created.

Create `.github/scripts/review-pipeline.sh`:
```bash
#!/bin/bash
set -euo pipefail

PR_BRANCH="$1"
BASE_BRANCH="${2:-main}"
OUTPUT_DIR="/tmp/review-${PR_BRANCH//\//-}"
PROMPT_DIR="$HOME/review-prompts"

mkdir -p "$OUTPUT_DIR"

# Generate the diff
git diff "$BASE_BRANCH"..."$PR_BRANCH" > "$OUTPUT_DIR/diff.patch"

# Run the three reviews in parallel
claude --model opus \
  --system-prompt "$(cat "$PROMPT_DIR/security.md")" \
  --print \
  "Review the diff at $OUTPUT_DIR/diff.patch for security vulnerabilities." \
  > "$OUTPUT_DIR/security.md" &

claude --model sonnet \
  --system-prompt "$(cat "$PROMPT_DIR/performance.md")" \
  --print \
  "Review the diff at $OUTPUT_DIR/diff.patch for performance issues." \
  > "$OUTPUT_DIR/performance.md" &

claude --model haiku \
  --system-prompt "$(cat "$PROMPT_DIR/maintainability.md")" \
  --print \
  "Review the diff at $OUTPUT_DIR/diff.patch for maintainability issues." \
  > "$OUTPUT_DIR/maintainability.md" &

# Wait for all three to finish
wait

# Merge the reports
claude --model sonnet \
  --print \
  "Merge these three code review reports into one prioritized summary.
Security: $(cat "$OUTPUT_DIR/security.md")
Performance: $(cat "$OUTPUT_DIR/performance.md")
Maintainability: $(cat "$OUTPUT_DIR/maintainability.md")" \
  > "$OUTPUT_DIR/merged-review.md"

echo "Review complete. Merged report at: $OUTPUT_DIR/merged-review.md"
cat "$OUTPUT_DIR/merged-review.md"
```
```bash
chmod +x .github/scripts/review-pipeline.sh
```

Run it against any PR branch:

```bash
.github/scripts/review-pipeline.sh feature/auth-refactor main
```
For GitHub Actions integration, wrap the same logic in a workflow triggered on the pull_request event. The three agents run as parallel jobs, and a final job merges the outputs and posts the summary as a PR comment.
## The Model Selection Logic
Why not use the same model for all three agents? Because the cost-benefit ratio varies enormously across review types. Right tool, right job.

**Security (Opus 4.6)**: Security vulnerabilities demand multi-step reasoning. SQL injection is obvious. But a TOCTOU race condition in an authentication flow requires the model to hold two code paths in working memory and reason about their interleaving -- like a chess player running two boards while thinking three moves ahead. Opus is the only model that reliably catches subtle authentication logic bugs. The cost premium is justified by the cost of a missed vulnerability.

**Performance (Sonnet 4.6)**: Performance review sits in the middle. Spotting an N+1 query requires understanding the call graph. Spotting a missing `Promise.all` requires tracing data dependencies. Sonnet handles both well. It occasionally misses deep algorithmic issues (such as amortized O(n) degrading to O(n^2) under a particular input distribution), but for PR-level performance review, Sonnet's coverage is within 90% of Opus at a third of the cost.

**Maintainability (Haiku 4.6)**: Style and naming checks are shallow pattern matching. Is a variable name descriptive? Does a function exceed 40 lines? Is a type annotation missing? Haiku handles these with near-100% accuracy. It's also the fastest model, which matters when the goal is getting feedback to the developer while the code is still fresh.
## When This Pipeline Is Overkill
Not every PR needs three agents. A ten-line config change doesn't need a security audit. Here's the decision matrix:
| PR size | Risk level | Recommendation |
|---|---|---|
| < 50 lines | Low (UI, config) | Skip automated review, or Haiku only |
| 50-200 lines | Medium (new feature) | Sonnet only, with the full combined prompt |
| 200-500 lines | Medium-high | Full three-agent pipeline |
| 500+ lines | High (authentication, payments, infrastructure) | Full three-agent pipeline + human review |
| Any size | Critical (security, compliance) | Full three-agent pipeline + senior human review |
The pipeline adds the most value on medium-to-large PRs touching security-sensitive code -- exactly the PRs where human review queues are longest and the stakes are highest.
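The matrix above is easy to encode so an automation script can pick a tier before spending tokens. A sketch that uses the raw changed-line count as a crude proxy for PR size (the `review_tier` helper and its tier names are made up for illustration; risk-level escalation still needs human judgment):

```bash
# Hypothetical tier picker mirroring the decision matrix above.
# Thresholds are by changed-line count; critical paths should override.
review_tier() {
  local changed_lines="$1"
  if   [ "$changed_lines" -lt 50 ];  then echo "haiku-only"
  elif [ "$changed_lines" -lt 200 ]; then echo "sonnet-full-prompt"
  elif [ "$changed_lines" -lt 500 ]; then echo "three-agent"
  else                                    echo "three-agent-plus-human"
  fi
}

# Usage, counting added/removed lines in the diff:
# review_tier "$(git diff main...pr-142 | grep -c '^[+-]')"
```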
## Quick Reference: The Full Command Sequence
```bash
# 1. Set up the review workspace
git fetch origin pull/142/head:pr-142
git diff main...pr-142 > /tmp/pr-142.diff
git worktree add /tmp/pr-142-review pr-142

# 2. Open a three-pane layout in Termdock

# 3. Run the three agents simultaneously (one per pane)
#    Panel 1: claude --model opus + security.md prompt
#    Panel 2: claude --model sonnet + performance.md prompt
#    Panel 3: claude --model haiku + maintainability.md prompt

# 4. Wait for all three to finish (Haiku ~30 s, Sonnet ~90 s, Opus ~3 min)

# 5. Merge the reports into a prioritized summary
#    claude --model sonnet + merge prompt

# 6. Post the merged review as a PR comment
gh pr comment 142 --body "$(cat /tmp/pr-142-merged-review.md)"

# 7. Clean up
git worktree remove /tmp/pr-142-review
git branch -D pr-142
rm -f /tmp/pr-142.diff /tmp/pr-142-*.md
```
Three perspectives, three models, one merged report. The whole pipeline takes under five minutes, and a week of daily use costs less than a cup of coffee.
Ready to streamline your terminal workflow?
Multi-terminal drag-and-drop layout, workspace Git sync, built-in AI integration, AST code analysis — all in one app.