# Three Reviewers, Zero Wait Time
Picture this. You open a 500-line pull request at 9 AM. You tag three teammates for review. By lunch, nobody has looked at it. By 3 PM, one person left a comment about a typo. The security-focused engineer is in meetings all day. The performance expert is on another sprint. Your PR sits in limbo, blocking two other branches.
Now picture the alternative. You open the same PR and launch three AI agents simultaneously -- one scanning for security holes, one hunting performance regressions, one checking code style. Four minutes later, you have a merged report with prioritized findings. Before your coffee cools.
That is what this article builds. Three specialized agents, three different models, one merged review. Here is what it looks like on a real 487-line authentication refactor:
| Agent | Perspective | Model | Time | Findings |
|---|---|---|---|---|
| Agent 1 | Security audit | Opus 4.6 | 3 min 42 sec | 2 critical, 1 warning |
| Agent 2 | Performance review | Sonnet 4.6 | 1 min 18 sec | 3 optimizations, 1 regression |
| Agent 3 | Maintainability check | Haiku 4.6 | 0 min 34 sec | 5 style issues, 2 naming suggestions |
Three agents. Three perspectives. Under four minutes. Cost: $0.31 total.
## Why Three Perspectives Beat One Deep Review
Think of code review like a house inspection. One person checking plumbing, electrical, and structural integrity will inevitably rush through at least one. A plumber catches pipe issues the electrician walks right past. A structural engineer sees foundation cracks invisible to both. Specialization beats multitasking.
Human review teams work the same way. The senior engineer looks at architecture. The security-minded developer checks for injection vectors. The team lead looks at naming and test coverage. No single person catches everything.
Single-agent AI review has the same blind spot. Even with a long prompt that says "check security AND performance AND style," the model spreads its attention unevenly. It gravitates toward whichever concern the code most obviously triggers. A SQL query gets security attention. A React component gets style attention. The model does not systematically sweep all three dimensions with equal rigor.
Three specialized agents fix this. Each one has a narrow mandate, a tailored CLAUDE.md, and a model chosen for the task. The security agent gets the most capable model because missing a vulnerability is the highest-cost failure. The maintainability agent gets the cheapest model because style issues are low-stakes pattern matching.
## Setup: The Review Environment
You need the PR diff available locally. The agents do not need separate worktrees -- they are reading the same diff, not writing code. But they need to run simultaneously, which means three terminal sessions.
```bash
# Fetch the PR branch and generate the diff
git fetch origin pull/142/head:pr-142
git diff main...pr-142 > /tmp/pr-142.diff

# Also check out the branch so agents can read full file context
git worktree add /tmp/pr-142-review pr-142
```
You now have the diff file at `/tmp/pr-142.diff` and the full branch checked out at `/tmp/pr-142-review`. Each agent will read both.
## Agent 1: Security Audit (Opus 4.6)
Security review is surgery. Missing an SQL injection or an authentication bypass has real consequences -- data breaches, lawsuits, front-page headlines. This agent gets Opus 4.6, the most capable model, because the cost of a false negative dwarfs the cost of the model. Think of it as hiring the most expensive lock inspector for your bank vault. You do not cut corners there.
Create the security review prompt at `review-prompts/security.md`:
```markdown
# Security Audit Review

You are a senior application security engineer reviewing a pull request.
Your job is to find vulnerabilities, not style issues.

## Scope

Read the full diff and every modified file in its entirety.
Focus exclusively on:

- **Injection vectors**: SQL injection, XSS, command injection, template injection
- **Authentication/authorization flaws**: missing auth checks, privilege escalation,
  token handling errors
- **Data exposure**: sensitive data in logs, error messages, or API responses
- **Cryptographic issues**: weak hashing, hardcoded secrets, insecure randomness
- **Dependency risks**: new dependencies with known CVEs, outdated packages
- **Race conditions**: TOCTOU bugs, concurrent access without locking

## Output Format

For each finding, produce exactly this structure:

### [CRITICAL | WARNING | INFO] — One-line summary
**File**: `path/to/file.ts` lines X-Y
**Category**: (one of the categories above)
**Description**: What the vulnerability is, in 2-3 sentences.
**Exploit scenario**: How an attacker would exploit this, step by step.
**Fix**: The specific code change needed, as a diff or code block.

## Rules

- If you find zero security issues, say so explicitly. Do not fabricate findings.
- Do not comment on code style, naming, or performance. Those are other agents' jobs.
- Classify severity honestly. CRITICAL means exploitable in production.
  WARNING means exploitable under specific conditions. INFO means defense-in-depth.
- Read the FULL file context for each changed file, not just the diff lines.
  Vulnerabilities often arise from interactions between new and existing code.
```
Run the security agent:
```bash
cd /tmp/pr-142-review && claude --model opus \
  --system-prompt "$(cat ~/review-prompts/security.md)" \
  --print \
  "Review this PR diff for security vulnerabilities. The diff is at /tmp/pr-142.diff. \
Read each modified file in full to understand context." \
  > /tmp/pr-142-security-review.md
```
The `--print` flag runs Claude Code in non-interactive mode: it produces the output and exits. No back-and-forth. The full review lands in a file.
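If you script these non-interactive runs, it is worth guarding against silent failures — an expired API key or a misplaced diff path can leave you with a zero-byte report that you only notice at merge time. Here is a minimal sketch; `run_review` is a hypothetical helper of this article, not a Claude Code feature:

```shell
# run_review: hypothetical wrapper, not part of the Claude Code CLI.
# Runs any command, captures stdout into a file, and fails loudly
# if the review came back empty.
run_review() {
  local out="$1"
  shift
  "$@" > "$out"
  if [ ! -s "$out" ]; then
    echo "Review produced no output: $out" >&2
    return 1
  fi
}
```

You would then launch each agent through it, e.g. `run_review /tmp/pr-142-security-review.md claude --model opus --print "..."`, and a failed run surfaces immediately instead of producing an empty report.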
### Sample Security Review Output
### CRITICAL — Unsanitized user input in database query
**File**: `src/api/users/search.ts` lines 23-31
**Category**: SQL injection
**Description**: The `searchTerm` parameter from the request query string
is interpolated directly into a SQL WHERE clause using template literals.
No parameterized query or input sanitization is applied.
**Exploit scenario**:
1. Attacker sends GET /api/users/search?q=' OR '1'='1
2. Query becomes: SELECT * FROM users WHERE name LIKE '%' OR '1'='1%'
3. Full users table is returned, including email addresses and hashed passwords
**Fix**:
```diff
- const results = await db.query(
- `SELECT * FROM users WHERE name LIKE '%${searchTerm}%'`
- );
+ const results = await db.query(
+ `SELECT * FROM users WHERE name LIKE $1`,
+ [`%${searchTerm}%`]
+ );
```

### WARNING — JWT secret loaded from environment without fallback validation
**File**: `src/lib/auth.ts` lines 8-12
**Category**: Authentication flaw
**Description**: `process.env.JWT_SECRET` is used directly without checking
whether the value exists or meets minimum length requirements. In development
environments where `.env` files may be missing, this falls back to `undefined`,
causing `jwt.sign()` to produce tokens signed with an empty string.
**Exploit scenario**:
1. Developer deploys to staging without setting `JWT_SECRET`
2. All tokens are signed with an empty string
3. Attacker crafts valid tokens for any user ID

**Fix**:
```typescript
const JWT_SECRET = process.env.JWT_SECRET;
if (!JWT_SECRET || JWT_SECRET.length < 32) {
  throw new Error('JWT_SECRET must be set and at least 32 characters');
}
```
## Agent 2: Performance Review (Sonnet 4.6)
Performance review is detective work. You are looking for N+1 queries hiding inside innocent-looking loops, `Promise.all` opportunities masked as sequential awaits, missing database indexes that will not bite until production hits 100k rows. Sonnet 4.6 is the right detective here -- it has enough reasoning capacity for algorithmic analysis without the cost premium of Opus. Like hiring a skilled inspector instead of a forensic scientist when the job is checking wiring, not solving a murder.
Create the performance review prompt at `review-prompts/performance.md`:
```markdown
# Performance Review
You are a senior backend/frontend performance engineer reviewing a pull request.
Your job is to find performance regressions and optimization opportunities.
## Scope
Read the full diff and every modified file in its entirety.
Focus exclusively on:
- **Database queries**: N+1 queries, missing indexes, full table scans,
unoptimized JOINs
- **Algorithmic complexity**: O(n^2) or worse loops, unnecessary iterations,
data structure misuse
- **Frontend rendering**: unnecessary re-renders, missing memoization,
large bundle imports, layout thrashing
- **Network**: redundant API calls, missing caching, oversized payloads,
no pagination
- **Memory**: unbounded collections, leaked event listeners, large object
retention in closures
- **Concurrency**: blocking operations on main thread, missing async/await,
sequential operations that could be parallel
## Output Format
For each finding, produce exactly this structure:
### [REGRESSION | OPTIMIZATION | INFO] — One-line summary
**File**: `path/to/file.ts` lines X-Y
**Category**: (one of the categories above)
**Impact**: Estimated performance impact (e.g., "Adds ~200ms per request
at 10k rows")
**Description**: What the performance issue is, in 2-3 sentences.
**Fix**: The specific code change needed.
## Rules
- REGRESSION means the PR introduces worse performance than the current code.
- OPTIMIZATION means the PR leaves an opportunity to improve performance beyond the current code.
- INFO means a minor opportunity that is not urgent.
- Do not comment on security or code style. Those are other agents' jobs.
- Quantify impact where possible. "Slow" is not useful. "O(n^2) where n is
  the user count, ~50ms at 1k users, ~5s at 10k users" is useful.
```
Run the performance agent:
```bash
cd /tmp/pr-142-review && claude --model sonnet \
  --system-prompt "$(cat ~/review-prompts/performance.md)" \
  --print \
  "Review this PR diff for performance issues. The diff is at /tmp/pr-142.diff. \
Read each modified file in full to understand context." \
  > /tmp/pr-142-performance-review.md
```
### Sample Performance Review Output
### REGRESSION — Sequential database calls that should be parallel
**File**: `src/api/users/[id]/route.ts` lines 15-22
**Category**: Database queries
**Impact**: Adds ~120ms per request (two sequential queries at ~60ms each
that have no dependency on each other)
**Description**: The handler fetches user profile and user permissions in
two sequential `await` calls. These queries are independent — the permissions
query does not use any data from the profile query.
**Fix**:
```typescript
// Before: sequential
const profile = await getProfile(userId);
const permissions = await getPermissions(userId);
// After: parallel
const [profile, permissions] = await Promise.all([
getProfile(userId),
getPermissions(userId),
]);
```

### OPTIMIZATION — Missing database index on new query pattern
**File**: `src/api/users/search.ts` lines 23-31
**Category**: Database queries
**Impact**: Full table scan on the `users.name` column. At 100k rows,
query time increases from ~5ms (indexed) to ~800ms (sequential scan).
**Description**: The new search endpoint queries `WHERE name LIKE $1`
with a leading-wildcard pattern, but no index exists on the `name` column.
PostgreSQL cannot use a plain btree index for leading-wildcard LIKE
queries, so it falls back to a sequential scan; a trigram index
supports them.
**Fix**: Add a migration:
```sql
CREATE INDEX idx_users_name_trgm ON users
USING gin (name gin_trgm_ops);
```
## Agent 3: Maintainability Check (Haiku 4.6)
Maintainability review is the grammar check of code review. Is the variable name descriptive? Does the function exceed 40 lines? Is there a missing type annotation? These are pattern-matching tasks -- spotting deviations from convention, not reasoning about complex interactions. Haiku 4.6 handles this like a spell checker handles typos: fast, cheap, and nearly perfect. It processes a 500-line diff in under 40 seconds and costs less than $0.01 per review. For feedback that is useful but low-stakes, there is no reason to pay for a heavier model.
Create the maintainability review prompt at `review-prompts/maintainability.md`:
```markdown
# Maintainability Review
You are a senior engineer focused on code readability, consistency,
and long-term maintainability. You review pull requests for the
human developers who will maintain this code next year.
## Scope
Read the full diff. Focus exclusively on:
- **Naming**: variable names, function names, file names that are unclear,
inconsistent with existing conventions, or misleading
- **Structure**: functions that are too long (>40 lines), files that mix
concerns, missing abstractions, dead code
- **Types**: missing TypeScript types, overly broad `any` usage,
type assertions that bypass safety
- **Tests**: missing test coverage for new code paths, brittle test
patterns, hardcoded test data
- **Documentation**: missing JSDoc on public functions, outdated comments,
misleading comments
## Output Format
For each finding, produce exactly this structure:
### [STYLE | STRUCTURE | TESTING | DOCS] — One-line summary
**File**: `path/to/file.ts` lines X-Y
**Description**: What the issue is and why it hurts maintainability.
**Suggestion**: The specific change recommended.
## Rules
- Do not comment on security or performance. Those are other agents' jobs.
- Limit findings to 10. Prioritize the most impactful issues.
- If the PR follows existing conventions well, say so. Do not force changes
  for change's sake.
```
Run the maintainability agent:
```bash
cd /tmp/pr-142-review && claude --model haiku \
  --system-prompt "$(cat ~/review-prompts/maintainability.md)" \
  --print \
  "Review this PR diff for maintainability issues. The diff is at /tmp/pr-142.diff. \
Read each modified file in full to understand context." \
  > /tmp/pr-142-maintainability-review.md
```
## Running All Three in Parallel
Here is where the magic happens. In Termdock, open a three-panel layout. Each panel runs one agent. All three start at the same time and finish independently. It is like having three specialists examine the same patient simultaneously -- one listening to the heart, one checking bloodwork, one reading the X-ray. No waiting in sequence.
The commands, all launched simultaneously:
```bash
# Panel 1 — Security (Opus)
cd /tmp/pr-142-review && claude --model opus \
  --system-prompt "$(cat ~/review-prompts/security.md)" \
  --print \
  "Review this PR diff for security vulnerabilities. Diff at /tmp/pr-142.diff." \
  > /tmp/pr-142-security-review.md

# Panel 2 — Performance (Sonnet)
cd /tmp/pr-142-review && claude --model sonnet \
  --system-prompt "$(cat ~/review-prompts/performance.md)" \
  --print \
  "Review this PR diff for performance issues. Diff at /tmp/pr-142.diff." \
  > /tmp/pr-142-performance-review.md

# Panel 3 — Maintainability (Haiku)
cd /tmp/pr-142-review && claude --model haiku \
  --system-prompt "$(cat ~/review-prompts/maintainability.md)" \
  --print \
  "Review this PR diff for maintainability issues. Diff at /tmp/pr-142.diff." \
  > /tmp/pr-142-maintainability-review.md
```
Haiku finishes first, usually in under a minute. Sonnet finishes next, around 1-2 minutes. Opus finishes last, around 3-4 minutes. By the time Opus completes, you have already read the other two reports.
## Merging Three Reports into One Actionable Summary
Three separate reports are useful but scattered -- like getting lab results on three different pieces of paper from three different clinics. The final step is to merge them into a single prioritized document. Use a fourth agent for this. Sonnet is a good choice because the task is synthesis, not original analysis.
```bash
claude --model sonnet \
  --print \
  "You have three code review reports for PR #142. Merge them into a single
prioritized summary. Group findings by severity: CRITICAL first, then
REGRESSION, then WARNING, then OPTIMIZATION, then STYLE/STRUCTURE/TESTING.
Remove duplicates (if security and performance both flag the same code,
keep the higher-severity finding and note both perspectives). Output a
single Markdown document.

Security review: $(cat /tmp/pr-142-security-review.md)
Performance review: $(cat /tmp/pr-142-performance-review.md)
Maintainability review: $(cat /tmp/pr-142-maintainability-review.md)" \
  > /tmp/pr-142-merged-review.md
```
The merged report typically has 30-50% fewer items than the three reports combined. Agents often flag the same code region from different angles. The SQL injection finding from the security agent and the missing index finding from the performance agent both point to `search.ts` -- the merged report groups them together.
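You can preview that overlap yourself before running the merge. A small illustrative helper -- it assumes the findings keep the `**File**:` line format the prompts above enforce -- prints every path flagged by more than one report:

```shell
# flagged_in_multiple: print file paths that appear in more than one
# review report. Assumes findings use the "**File**: `path` lines X-Y"
# format defined in the review prompts.
flagged_in_multiple() {
  grep -h '^\*\*File\*\*' "$@" \
    | sed -E 's/.*`([^`]+)`.*/\1/' \
    | sort | uniq -d
}
```

Run it as `flagged_in_multiple /tmp/pr-142-*-review.md`; each printed path is a candidate for grouping in the merged report.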
## Cost Comparison: 3 Cheap Agents vs. 1 Expensive Deep Review
The obvious question: why not just run one Opus agent with a comprehensive prompt that covers all three perspectives?
Here is the cost breakdown for a 500-line PR (~2,000 input tokens for the diff, ~8,000 tokens for full file context):
| Approach | Model(s) | Input Cost | Output Cost | Total | Time |
|---|---|---|---|---|---|
| 3 parallel agents | Opus + Sonnet + Haiku | $0.18 | $0.13 | $0.31 | 3 min 42 sec |
| 1 comprehensive Opus | Opus only | $0.15 | $0.22 | $0.37 | 5 min 15 sec |
| 1 comprehensive Sonnet | Sonnet only | $0.03 | $0.05 | $0.08 | 2 min 30 sec |
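If you want to sanity-check numbers like these against your own usage, the arithmetic is just token counts times per-token rates. A quick helper -- the function name and the example prices below are placeholders, not published rates, so substitute your provider's current pricing:

```shell
# estimate_cost: back-of-envelope review cost (illustrative helper).
#   $1 = input tokens, $2 = output tokens
#   $3 = price per 1M input tokens (USD), $4 = price per 1M output tokens (USD)
# The rates you pass in are assumptions -- check your provider's pricing page.
estimate_cost() {
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.4f\n", (i * pi + o * po) / 1000000 }'
}
```

For example, `estimate_cost 10000 2000 15 75` estimates a review with 10k input and 2k output tokens at hypothetical rates of $15/$75 per million tokens.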
The three-agent approach costs slightly less than a single Opus pass and finishes faster because the agents run in parallel. But cost is not the real argument. Quality is.
In testing across 20 PRs, the three-agent pipeline found an average of 2.1 more actionable findings per PR than a single Opus agent with a comprehensive prompt. The single agent tended to write longer descriptions of fewer findings -- it went deep on the most obvious issue and gave cursory treatment to the rest. Think of it like asking one person to proofread, fact-check, and typeset a document versus splitting those into three jobs. The specialist always goes deeper on their lane.
The single Sonnet approach is the cheapest but misses roughly 40% of the security findings that Opus catches. For non-critical internal tooling, that tradeoff might be acceptable. For anything handling user data or payments, it is not.
## Automating with a Git Hook
Running three agents manually on every PR gets old fast. Automate it with a git hook or CI integration. Here is a script approach that triggers on PR creation.
Create `.github/scripts/review-pipeline.sh`:
```bash
#!/bin/bash
set -euo pipefail

PR_BRANCH="$1"
BASE_BRANCH="${2:-main}"
OUTPUT_DIR="/tmp/review-${PR_BRANCH//\//-}"
PROMPT_DIR="$HOME/review-prompts"

mkdir -p "$OUTPUT_DIR"

# Generate diff
git diff "$BASE_BRANCH"..."$PR_BRANCH" > "$OUTPUT_DIR/diff.patch"

# Run all three reviews in parallel
claude --model opus \
  --system-prompt "$(cat "$PROMPT_DIR/security.md")" \
  --print \
  "Review the diff at $OUTPUT_DIR/diff.patch for security vulnerabilities." \
  > "$OUTPUT_DIR/security.md" &

claude --model sonnet \
  --system-prompt "$(cat "$PROMPT_DIR/performance.md")" \
  --print \
  "Review the diff at $OUTPUT_DIR/diff.patch for performance issues." \
  > "$OUTPUT_DIR/performance.md" &

claude --model haiku \
  --system-prompt "$(cat "$PROMPT_DIR/maintainability.md")" \
  --print \
  "Review the diff at $OUTPUT_DIR/diff.patch for maintainability issues." \
  > "$OUTPUT_DIR/maintainability.md" &

# Wait for all three to finish
wait

# Merge reports
claude --model sonnet \
  --print \
  "Merge these three code review reports into one prioritized summary.
Security: $(cat "$OUTPUT_DIR/security.md")
Performance: $(cat "$OUTPUT_DIR/performance.md")
Maintainability: $(cat "$OUTPUT_DIR/maintainability.md")" \
  > "$OUTPUT_DIR/merged-review.md"

echo "Review complete. Merged report at: $OUTPUT_DIR/merged-review.md"
cat "$OUTPUT_DIR/merged-review.md"
```
```bash
chmod +x .github/scripts/review-pipeline.sh
```
Run it on any PR branch:
```bash
.github/scripts/review-pipeline.sh feature/auth-refactor main
```
For GitHub Actions integration, wrap the same logic in a workflow that triggers on `pull_request` events. The three agents run as parallel jobs, and a final job merges the outputs and posts the summary as a PR comment.
## Model Selection Rationale
Why not use the same model for all three agents? Because the cost-to-value ratio differs dramatically by review type. Use the right tool for the right job.
**Security (Opus 4.6)**: Security vulnerabilities can require multi-step reasoning. An SQL injection is obvious; a TOCTOU race condition in an authentication flow requires the model to hold two code paths in working memory and reason about their interleaving -- like a chess player thinking three moves ahead while tracking two games simultaneously. Opus is the only model that reliably catches subtle authentication logic bugs. The cost premium is justified by the cost of a missed vulnerability.
**Performance (Sonnet 4.6)**: Performance review sits in the middle. Spotting an N+1 query requires understanding the call graph. Spotting a missing `Promise.all` requires tracking data dependencies. Sonnet handles both well. It occasionally misses deep algorithmic issues (like an amortized O(n) that degrades to O(n^2) under specific input distributions), but for PR-level performance review, Sonnet's coverage is within 90% of Opus at one-third the cost.
**Maintainability (Haiku 4.6)**: Style and naming checks are shallow pattern matching. Is the variable name descriptive? Does the function exceed 40 lines? Is there a missing type annotation? Haiku handles this with near-100% accuracy. It is also the fastest model, which matters when the goal is getting feedback to developers while the code is still fresh in their heads.
## When This Pipeline Is Overkill
Not every PR needs three agents. A 10-line config change does not need a security audit. Here is the decision matrix:
| PR Size | PR Risk | Recommendation |
|---|---|---|
| < 50 lines | Low (UI, config) | Skip automated review or use Haiku only |
| 50-200 lines | Medium (new feature) | Sonnet only with comprehensive prompt |
| 200-500 lines | Medium-High | Full 3-agent pipeline |
| 500+ lines | High (auth, payments, infra) | Full 3-agent pipeline + human review |
| Any size | Critical (security, compliance) | Full 3-agent pipeline + senior human review |
The pipeline adds the most value on medium-to-large PRs in security-sensitive code -- exactly the PRs where human review has the longest wait times and the highest stakes.
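The size rows of the matrix are easy to encode if you want the pipeline script to pick a tier automatically. A sketch -- `pick_review_tier` is a hypothetical helper, and risk-based escalation (auth, payments, compliance) still needs a human or a PR label, so only diff size is modeled here:

```shell
# pick_review_tier: map a PR's changed-line count to a review tier,
# following the size rows of the decision matrix above.
# Risk-based escalation is intentionally not modeled.
pick_review_tier() {
  local lines="$1"
  if [ "$lines" -lt 50 ]; then
    echo "haiku-only"
  elif [ "$lines" -lt 200 ]; then
    echo "sonnet-comprehensive"
  else
    echo "three-agent-pipeline"
  fi
}

# Changed-line count for the current branch vs. main:
#   lines=$(git diff main...HEAD --numstat | awk '{s += $1 + $2} END {print s}')
```

You could call this at the top of `review-pipeline.sh` and skip the Opus agent entirely for the `haiku-only` tier.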
## Quick Reference: The Full Command Sequence
```bash
# 1. Set up the review workspace
git fetch origin pull/142/head:pr-142
git diff main...pr-142 > /tmp/pr-142.diff
git worktree add /tmp/pr-142-review pr-142

# 2. Open Termdock with 3-panel layout

# 3. Run three agents simultaneously (one per panel)
#    Panel 1: claude --model opus + security.md prompt
#    Panel 2: claude --model sonnet + performance.md prompt
#    Panel 3: claude --model haiku + maintainability.md prompt

# 4. Wait for all three (Haiku ~30s, Sonnet ~90s, Opus ~3min)

# 5. Merge reports into prioritized summary
#    claude --model sonnet + merge prompt

# 6. Post merged review as PR comment
gh pr comment 142 --body "$(cat /tmp/pr-142-merged-review.md)"

# 7. Clean up
git worktree remove /tmp/pr-142-review
git branch -D pr-142
rm -rf /tmp/pr-142-*.md
```
Three perspectives. Three models. One merged report. The entire pipeline runs in under five minutes and costs less than a cup of coffee per week of daily use.
Ready to streamline your terminal workflow?
Multi-terminal drag-and-drop layout, workspace Git sync, built-in AI integration, AST code analysis — all in one app.