The $200 Problem
You open your Claude Code session at 9 AM. By 3 PM, the rate limit bar is blinking red. You have been asking Opus 4.6 to rename variables, look up function signatures, and answer "what does this do?" questions. That is like hiring a brain surgeon to apply band-aids.
Claude Code Max 20x costs $200/month. That buys roughly 240-480 hours of Sonnet 4.6 per week and 24-40 hours of Opus 4.6 — more than enough for heavy professional use. The problem is not the budget. The problem is that most developers burn Opus-level reasoning tokens on tasks that do not need Opus-level reasoning.
I tracked my own usage for four weeks. The breakdown: roughly 40% of interactions were simple — file lookups, small edits, one-line fixes, "what does this function do?" questions. Another 25% were moderate — test writing, code reviews, documentation. Only 35% genuinely needed deep multi-step reasoning: complex refactors, architectural decisions, multi-file debugging sessions.
In other words, 65% of my interactions went to tasks that could have been cheaper, faster, or both. Here are five methods I tested, with real before/after numbers.
Method 1: Use /effort to Control Reasoning Depth
Think of /effort as a volume knob on Claude Code's brain. Turn it down for simple tasks. Turn it up when the problem is genuinely hard. Stop running everything at maximum volume.
Four levels: low, medium, high (the default), and max (Opus 4.6 only). Lower effort means fewer thinking tokens consumed per interaction.
How It Works
Run /effort low in your session. Every subsequent response uses minimal reasoning — Claude skips extended thinking and responds directly. Run /effort high to switch back. /effort auto lets Claude decide based on query complexity.
What to Route Where
| Effort Level | Task Type | Examples |
|---|---|---|
| Low | Lookups and simple edits | "What does this hook return?" / "Add a console.log here" / "Rename this variable" |
| Medium | Moderate complexity | Writing tests for existing functions / Code review of a single file / Generating boilerplate |
| High | Complex reasoning (default) | Multi-file refactors / Debugging race conditions / Architecture decisions |
| Max | Maximum depth (Opus only) | System design sessions / Complex algorithm implementation / Cross-service debugging |
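The routing table above can be collapsed into a tiny lookup if you want to script your prompt pipeline. This is a minimal sketch: the task labels, the mapping, and the `pick_effort` name are hypothetical illustrations, not part of Claude Code.

```python
# Hypothetical mapping from task category to /effort level,
# mirroring the routing table above. Not a Claude Code API.
EFFORT_BY_TASK = {
    "lookup": "low",
    "simple_edit": "low",
    "write_tests": "medium",
    "code_review": "medium",
    "boilerplate": "medium",
    "multi_file_refactor": "high",
    "debug_race_condition": "high",
    "system_design": "max",  # max is Opus-only
}

def pick_effort(task_type: str) -> str:
    """Fall back to high, Claude Code's own default, for unknown tasks."""
    return EFFORT_BY_TASK.get(task_type, "high")
```

The useful property is the default: anything you haven't consciously classified stays at high effort, so the shortcut never silently degrades hard tasks.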
Before/After
Before: 100% of interactions at high effort. Hit rate limits by mid-afternoon on Pro.
After: 40% at low, 25% at medium, 35% at high. Same Pro plan lasts through the full workday.
Estimated savings: 30-40% of total token usage.
Method 2: Write a Good CLAUDE.md to Kill Wasted Iterations
Every wrong answer costs you twice. Once for the bad output. Once for the correction prompt. A misunderstood convention, a wrong test framework, code formatted in a style you reject — each is a round trip you pay for in tokens.
A well-crafted CLAUDE.md prevents this. Not documentation. A concise instruction set that heads off the most common misunderstandings before they happen.
What Belongs in CLAUDE.md
```markdown
# Project: my-app

## Architecture
- Next.js 15 App Router, TypeScript strict mode
- Database: PostgreSQL via Drizzle ORM
- Styling: Tailwind CSS v4, no CSS modules

## Conventions
- Components: named exports, no default exports
- Tests: Vitest, co-located in __tests__ directories
- Error handling: Result pattern, never throw in business logic

## Active Context
- Currently refactoring auth flow from NextAuth to custom JWT
- Migration in progress: /src/lib/auth/ is the new path

## Do NOT
- Use default exports
- Add console.log (use project logger at src/lib/logger.ts)
- Create new API routes under /pages/api (deprecated)
```
Why This Saves Money
Without CLAUDE.md, Claude Code scans multiple files per task to infer conventions — often guessing wrong on the first attempt. Each correction is a full round trip: your prompt, Claude's response, your correction, Claude's corrected response.
With a good CLAUDE.md: conventions land on first read. First-try accuracy goes up. File reads go down.
Before/After
Before: Average 2.3 iterations per task. Claude frequently uses wrong patterns.
After: Average 1.4 iterations per task. Corrections only for genuinely ambiguous requirements.
Estimated savings: 25-35% of total token usage. Compounding matters — fewer wasted iterations means fewer tokens per task means more tasks within your rate limit.
Keep CLAUDE.md under 500 lines. Every token in it gets loaded every session. Bloated context files defeat the purpose. The AI CLI Tools Complete Guide covers CLAUDE.md best practices in depth.
Method 3: Offload Simple Tasks to Gemini CLI (Free)
The single highest-impact change. Gemini CLI is free — 1,000 model requests per day, 60 per minute, Gemini 2.5 Pro with a 1 million token context window. No credit card. No trial period.
That 40% of simple tasks? Gemini CLI handles them fine. Not as well as Claude Code on complex work — but for straightforward tasks, the quality difference is negligible. The cost difference is $200 vs. $0.
The Routing Rule
One question before every Claude Code prompt: Does this task require multi-step reasoning across multiple files?
- Yes — Use Claude Code.
- No — Use Gemini CLI.
This single heuristic handles 90% of routing decisions. The dual-tool strategy guide covers the full framework, but the one-question version gets you 80% of the savings.
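The one-question heuristic is simple enough to write down as code. A sketch, nothing more; the function name and return labels are made up for illustration.

```python
def route(needs_multistep_multifile_reasoning: bool) -> str:
    """The one-question router: deep multi-step work across multiple
    files goes to Claude Code; everything else goes to the free
    Gemini CLI. Labels here are illustrative, not tool identifiers."""
    return "claude-code" if needs_multistep_multifile_reasoning else "gemini-cli"
```

The point of reducing it to one boolean is discipline: if you cannot honestly answer yes to the question, the task goes to the free tool by default.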
What Gemini CLI Handles Well
- Explaining unfamiliar code
- Writing unit tests for a single function
- Generating boilerplate (components, API routes, config files)
- Quick code reviews of small changes
- Documentation drafts
- Simple refactors within a single file
- "How do I do X in framework Y?" questions
What Still Needs Claude Code
- Multi-file refactors with cascading dependencies
- Debugging subtle bugs spanning multiple modules
- Architectural decisions requiring deep codebase understanding
- Complex git operations and merge conflict resolution
- Tasks requiring tool use chains (read, edit, test, fix)
Before/After
Before: All tasks through Claude Code. Max 20x at $200/month, still hitting rate limits on heavy days.
After: 40-50% of tasks routed to Gemini CLI. Usage drops enough to consider Max 5x at $100/month — or even Pro at $20/month with discipline.
Estimated savings: $100-180/month (plan downgrade) or 40-50% of token budget (same plan, more headroom).
Method 4: Use Subagents Strategically (Not for Everything)
Subagents are Claude Code's parallel processing system. Powerful for exploration — searching a large codebase, investigating multiple root causes, researching API documentation. But they are not free.
Each subagent is a separate Claude instance with its own context window. A main agent that spawns 3 subagents consumes roughly 4x the tokens of a single session. Spawning subagents for trivial tasks is like hiring four contractors to change a lightbulb.
When Subagents Save Money
They save money when the alternative is worse: manually searching 20 files in a single session (every file read adding to context), or grinding through trial-and-error because you skipped the exploration phase.
Good use cases:
- Searching a large codebase for all usages of a deprecated API
- Investigating 3 potential root causes in parallel
- Gathering context from multiple documentation sources before an architectural decision
- Running tests in a separate context while you continue development
Bad use cases:
- Reading a single file (just read it directly)
- Simple search-and-replace
- Any task with fewer than 3 files to examine
- Tasks where you already know what to do
The 3-File Rule
Fewer than 3 files of exploration? Do it in your main session. Three or more? Consider a subagent. A simple threshold that prevents the most common overuse.
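Both the threshold and the cost intuition fit in a few lines. A sketch under the assumptions stated earlier in this section (each subagent is a separate instance with its own context, so cost scales roughly linearly); the function names are hypothetical.

```python
def use_subagent(files_to_explore: int) -> bool:
    """The 3-file rule: spawn a subagent only when exploration
    spans 3 or more files."""
    return files_to_explore >= 3

def cost_multiplier(num_subagents: int) -> int:
    """Rough model: a main agent plus n subagents consumes about
    (1 + n)x the tokens of a single session, as each subagent
    carries its own context window."""
    return 1 + num_subagents
```

So a main agent spawning 3 subagents runs at roughly 4x cost, which is only worth paying when the alternative is dragging 20 files of exploration through your main context.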
Before/After
Before: Subagents for nearly every task. Token consumption 3-5x higher than needed.
After: Subagents only for genuine exploration. Token consumption drops 40-60% on subagent-heavy workflows.
Estimated savings: 20-30% of total token usage for developers who use subagents.
Method 5: Context Management — /compact and /clear
Claude Code's context window is a running cost meter. Every message, every file read, every tool output stays in context and gets retransmitted with every subsequent prompt. A 2-hour session can accumulate 100k+ tokens, and every new interaction pays for carrying all of it.
/compact — Summarize and Continue
/compact summarizes the conversation into a shorter form, preserving key decisions while discarding verbose intermediate steps. Use it when your context meter hits 60-70%.
Add custom preservation instructions:

```
/compact preserve the list of modified files and the test results
/compact keep only the architectural decisions, drop all debugging attempts
```
This matters because Claude Code's default compaction weighs everything equally. Twenty messages of debugging dead ends are worth zero tokens after compaction, so tell Claude to drop them.
/clear — Start Fresh
/clear wipes context entirely. Use it when switching to an unrelated task. A context window full of auth refactoring is pure noise when you start working on payment integration.
The common mistake: continuing the same session across unrelated tasks. By hour 3, context is bloated with irrelevant history, and every new interaction carries that dead weight.
The Workflow
- Start a task in a fresh session or after `/clear`
- Work until the context meter hits 60-70%
- Run `/compact` with specific preservation instructions
- Continue working
- When the task is done, `/clear` before starting the next one
Before/After
Before: Single continuous sessions running 3-4 hours. Later interactions 3-5x more expensive than early ones.
After: Compact at 70%, clear between tasks. Average context size stays 40-60% lower across a workday.
Estimated savings: 20-35% of total token usage.
Combined Impact: The Full Stack
These five methods stack. Here is the combined impact:
| Method | Savings Estimate | Applies To |
|---|---|---|
| /effort tuning | 30-40% token reduction | All users |
| Good CLAUDE.md | 25-35% fewer wasted iterations | All users |
| Gemini CLI offloading | 40-50% fewer Claude Code tasks | All users |
| Strategic subagents | 20-30% token reduction | Subagent users |
| Context management | 20-35% token reduction | All users |
The savings compound. Gemini CLI handles 40% of tasks. /effort reduces tokens on the remaining 60%. Good CLAUDE.md cuts wasted iterations within those. Context management keeps sessions lean. Combined effect: typically 50-60% reduction in Claude Code usage.
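As a sanity check on that compounding claim, the arithmetic for the two biggest levers alone (illustrative midpoint numbers from this article, not measurements):

```python
# Back-of-the-envelope compounding: offloading removes tasks outright,
# then /effort shrinks tokens on what remains. Midpoint estimates
# from this article, not measurements.
offload_share = 0.40   # share of tasks routed to Gemini CLI
effort_cut = 0.35      # token reduction on remaining tasks via /effort
remaining = (1 - offload_share) * (1 - effort_cut)
print(f"{remaining:.0%} of baseline usage")  # 39%, i.e. roughly a 60% cut
```

That lands in the 50-60% reduction range from these two methods alone, before CLAUDE.md and context management add further headroom.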
For Max 20x at $200/month, that means dropping to Max 5x at $100/month. For Max 5x at $100/month, dropping to Pro at $20/month. The AI CLI cost optimization guide covers even more strategies including free tier stacking and budget templates.
The Bottom Line
Claude Code is the best agentic coding tool available. $200/month is not the problem — using it wastefully is. These five methods are not workarounds. They are how Claude Code is designed to be used: right effort level for each task, clear project context, complementary tools for simple work, disciplined subagent use, and active context management.
Apply all five, track usage for two weeks, then decide whether your current tier is still the right one. Most developers find they can drop at least one tier without losing productivity.