March 17, 2026 · 9 min read · ai-cli-tools

Claude Code Costs $200/mo — Cut It in Half

Five tested methods to slash Claude Code spending by 50% or more. Covers /effort tuning, CLAUDE.md optimization, Gemini CLI offloading, strategic subagent use, and context management — with before/after cost estimates for each.

DH
Danny Huang

The $200 Problem

You open your Claude Code session at 9 AM. By 3 PM, the rate limit bar is blinking red. You have been asking Opus 4.6 to rename variables, look up function signatures, and answer "what does this do?" questions. That is like hiring a brain surgeon to apply band-aids.

Claude Code Max 20x costs $200/month. That buys roughly 240-480 hours of Sonnet 4.6 per week and 24-40 hours of Opus 4.6 — more than enough for heavy professional use. The problem is not the budget. The problem is that most developers burn Opus-level reasoning tokens on tasks that do not need Opus-level reasoning.

I tracked my own usage for four weeks. The breakdown: roughly 40% of interactions were simple — file lookups, small edits, one-line fixes, "what does this function do?" questions. Another 25% were moderate — test writing, code reviews, documentation. Only 35% genuinely needed deep multi-step reasoning: complex refactors, architectural decisions, multi-file debugging sessions.

65% of my token budget went to tasks that could have been cheaper, faster, or both. Here are five methods I tested, with real before/after numbers.
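The arithmetic behind that 65% figure is just the sum of the first two buckets. A quick sketch, using the interaction shares from my four-week log:

```python
# Share of interactions per complexity bucket, from the four-week log.
simple = 0.40    # file lookups, one-line fixes, "what does this do?"
moderate = 0.25  # tests, reviews, documentation
complex_ = 0.35  # refactors, architecture, multi-file debugging

# Everything below "complex" did not need Opus-level reasoning.
overpaid_share = simple + moderate
assert abs(overpaid_share - 0.65) < 1e-9
print(f"{overpaid_share:.0%} of interactions did not need deep reasoning")
```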

Method 1: Use /effort to Control Reasoning Depth

Think of /effort as a volume knob on Claude Code's brain. Turn it down for simple tasks. Turn it up when the problem is genuinely hard. Stop running everything at maximum volume.

Four levels: low, medium, high (the default), and max (Opus 4.6 only). Lower effort means fewer thinking tokens consumed per interaction.

How It Works

Run /effort low in your session. Every subsequent response uses minimal reasoning — Claude skips extended thinking and responds directly. Run /effort high to switch back. /effort auto lets Claude decide based on query complexity.

What to Route Where

| Effort Level | Task Type | Examples |
| --- | --- | --- |
| Low | Lookups and simple edits | "What does this hook return?" / "Add a console.log here" / "Rename this variable" |
| Medium | Moderate complexity | Writing tests for existing functions / Code review of a single file / Generating boilerplate |
| High | Complex reasoning (default) | Multi-file refactors / Debugging race conditions / Architecture decisions |
| Max | Maximum depth (Opus only) | System design sessions / Complex algorithm implementation / Cross-service debugging |

Before/After

Before: 100% of interactions at high effort. Hit rate limits by mid-afternoon on Pro. After: 40% at low, 25% at medium, 35% at high. Same Pro plan lasts through the full workday.

Estimated savings: 30-40% of total token usage.
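Where that estimate comes from: weight each effort level by an assumed relative thinking-token cost and compare the before/after mixes. The cost multipliers below are illustrative assumptions, not published numbers:

```python
# Hypothetical relative thinking-token cost per effort level.
# These multipliers are illustrative assumptions, not measured values.
cost = {"low": 0.3, "medium": 0.6, "high": 1.0}

before = 1.00 * cost["high"]  # everything at high effort
after = 0.40 * cost["low"] + 0.25 * cost["medium"] + 0.35 * cost["high"]

savings = 1 - after / before
print(f"estimated savings: {savings:.0%}")
```

With those assumed weights, the mix shift lands in the article's 30-40% range.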

Method 2: Write a Good CLAUDE.md to Kill Wasted Iterations

Every wrong answer costs you twice. Once for the bad output. Once for the correction prompt. A misunderstood convention, a wrong test framework, code formatted in a style you reject — each is a round trip you pay for in tokens.

A well-crafted CLAUDE.md prevents this. Not documentation. A concise instruction set that heads off the most common misunderstandings before they happen.

What Belongs in CLAUDE.md

# Project: my-app

## Architecture
- Next.js 15 App Router, TypeScript strict mode
- Database: PostgreSQL via Drizzle ORM
- Styling: Tailwind CSS v4, no CSS modules

## Conventions
- Components: named exports, no default exports
- Tests: Vitest, co-located in __tests__ directories
- Error handling: Result pattern, never throw in business logic

## Active Context
- Currently refactoring auth flow from NextAuth to custom JWT
- Migration in progress: /src/lib/auth/ is the new path

## Do NOT
- Use default exports
- Add console.log (use project logger at src/lib/logger.ts)
- Create new API routes under /pages/api (deprecated)

Why This Saves Money

Without CLAUDE.md, Claude Code scans multiple files per task to infer conventions — often guessing wrong on the first attempt. Each correction is a full round trip: your prompt, Claude's response, your correction, Claude's corrected response.

With a good CLAUDE.md: conventions land on first read. First-try accuracy goes up. File reads go down.

Before/After

Before: Average 2.3 iterations per task. Claude frequently uses wrong patterns. After: Average 1.4 iterations per task. Corrections only for genuinely ambiguous requirements.

Estimated savings: 25-35% of total token usage. Compounding matters — fewer wasted iterations means fewer tokens per task means more tasks within your rate limit.
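The iteration drop alone suggests roughly 39% savings (1 − 1.4/2.3), but CLAUDE.md itself is loaded every session, which claws some of that back. A rough net estimate, with the per-session overhead fraction being an assumption:

```python
# Rough per-task cost model: iterations drive round-trip tokens,
# but CLAUDE.md adds a fixed overhead loaded each session.
iters_before, iters_after = 2.3, 1.4  # measured averages from the article
claude_md_overhead = 0.10             # assumed fraction of a task's tokens

before = iters_before
after = iters_after * (1 + claude_md_overhead)

savings = 1 - after / before
print(f"net savings: {savings:.0%}")
```

Under that assumption the net lands around 33%, consistent with the 25-35% estimate.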

Keep CLAUDE.md under 500 lines. Every token in it gets loaded every session. Bloated context files defeat the purpose. The AI CLI Tools Complete Guide covers CLAUDE.md best practices in depth.

Method 3: Offload Simple Tasks to Gemini CLI (Free)

The single highest-impact change. Gemini CLI is free — 1,000 model requests per day, 60 per minute, Gemini 2.5 Pro with a 1 million token context window. No credit card. No trial period.

That 40% of simple tasks? Gemini CLI handles them fine. Not as well as Claude Code on complex work — but for straightforward tasks, the quality difference is negligible. The cost difference is $200 vs. $0.

The Routing Rule

One question before every Claude Code prompt: Does this task require multi-step reasoning across multiple files?

  • Yes — Use Claude Code.
  • No — Use Gemini CLI.

This single heuristic handles 90% of routing decisions. The dual-tool strategy guide covers the full framework, but the one-question version gets you 80% of the savings.
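The one-question router can be written down as a trivial function. Everything here (the function name, the signature, the labels) is illustrative, not an API:

```python
def route(multi_step: bool, multi_file: bool) -> str:
    """One-question router: does the task need multi-step
    reasoning across multiple files? Yes -> Claude Code.
    No -> Gemini CLI. (Names are illustrative.)"""
    return "claude-code" if (multi_step and multi_file) else "gemini-cli"

# "Explain this hook": single file, no multi-step reasoning.
assert route(multi_step=False, multi_file=False) == "gemini-cli"
# Simple refactor within one file still goes to the free tool.
assert route(multi_step=True, multi_file=False) == "gemini-cli"
# Refactor with cascading dependencies across files: Claude Code.
assert route(multi_step=True, multi_file=True) == "claude-code"
```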

What Gemini CLI Handles Well

  • Explaining unfamiliar code
  • Writing unit tests for a single function
  • Generating boilerplate (components, API routes, config files)
  • Quick code reviews of small changes
  • Documentation drafts
  • Simple refactors within a single file
  • "How do I do X in framework Y?" questions

What Still Needs Claude Code

  • Multi-file refactors with cascading dependencies
  • Debugging subtle bugs spanning multiple modules
  • Architectural decisions requiring deep codebase understanding
  • Complex git operations and merge conflict resolution
  • Tasks requiring tool use chains (read, edit, test, fix)

Before/After

Before: All tasks through Claude Code. Max 20x at $200/month, still hitting rate limits on heavy days. After: 40-50% of tasks routed to Gemini CLI. Usage drops enough to consider Max 5x at $100/month — or even Pro at $20/month with discipline.

Estimated savings: $100-180/month (plan downgrade) or 40-50% of token budget (same plan, more headroom).


Method 4: Use Subagents Strategically (Not for Everything)

Subagents are Claude Code's parallel processing system. Powerful for exploration — searching a large codebase, investigating multiple root causes, researching API documentation. But they are not free.

Each subagent is a separate Claude instance with its own context window. A main agent that spawns 3 subagents consumes roughly 4x the tokens of a single session. Spawning subagents for trivial tasks is like hiring four contractors to change a lightbulb.
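The 4x figure follows directly from counting instances, under the assumption that each subagent's context costs roughly as much as the main agent's:

```python
# Assumption: each subagent consumes roughly the same tokens
# as the main agent, since each has its own context window.
main_agents = 1
subagents = 3

relative_cost = main_agents + subagents
assert relative_cost == 4  # ~4x the tokens of a single session
print(f"relative cost vs. single session: {relative_cost}x")
```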

When Subagents Save Money

They save money when the alternative is worse: manually searching 20 files in a single session (every file read adding to context), or grinding through trial-and-error because you skipped the exploration phase.

Good use cases:

  • Searching a large codebase for all usages of a deprecated API
  • Investigating 3 potential root causes in parallel
  • Gathering context from multiple documentation sources before an architectural decision
  • Running tests in a separate context while you continue development

Bad use cases:

  • Reading a single file (just read it directly)
  • Simple search-and-replace
  • Any task with fewer than 3 files to examine
  • Tasks where you already know what to do

The 3-File Rule

Fewer than 3 files of exploration? Do it in your main session. Three or more? Consider a subagent. A simple threshold that prevents the most common overuse.
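As code, the rule is a one-liner (the function name is illustrative, not part of any tool):

```python
def use_subagent(files_to_explore: int) -> bool:
    """The 3-file rule: delegate exploration to a subagent only
    when it spans three or more files."""
    return files_to_explore >= 3

assert use_subagent(2) is False  # stay in the main session
assert use_subagent(3) is True   # consider a subagent
```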

Before/After

Before: Subagents for nearly every task. Token consumption 3-5x higher than needed. After: Subagents only for genuine exploration. Token consumption drops 40-60% on subagent-heavy workflows.

Estimated savings: 20-30% of total token usage for developers who use subagents.

Method 5: Context Management — /compact and /clear

Claude Code's context window is a running cost meter. Every message, every file read, every tool output stays in context and gets retransmitted with every subsequent prompt. A 2-hour session can accumulate 100k+ tokens, and every new interaction pays for carrying all of it.
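This is why long sessions get expensive: each prompt retransmits everything accumulated so far, so total cost grows roughly quadratically with session length. A toy model, where the tokens-per-interaction figure is an assumption:

```python
# Toy model: each interaction adds `chunk` tokens of context, and
# every subsequent prompt pays to retransmit the whole accumulation.
chunk = 2_000        # assumed tokens of new context per interaction
interactions = 50    # a long afternoon session

total = sum(i * chunk for i in range(1, interactions + 1))
print(f"tokens retransmitted over the session: {total:,}")
# Growth is ~quadratic: doubling session length ~quadruples this total,
# which is what /compact and /clear are there to interrupt.
```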

/compact — Summarize and Continue

/compact summarizes the conversation into a shorter form, preserving key decisions while discarding verbose intermediate steps. Use it when your context meter hits 60-70%.

Add custom preservation instructions:

/compact preserve the list of modified files and the test results
/compact keep only the architectural decisions, drop all debugging attempts

This matters because Claude Code's default compaction preserves everything equally. Twenty messages of debugging dead ends are worth zero tokens after compaction — tell Claude to drop them.

/clear — Start Fresh

/clear wipes context entirely. Use it when switching to an unrelated task. A context window full of auth refactoring is pure noise when you start working on payment integration.

The common mistake: continuing the same session across unrelated tasks. By hour 3, context is bloated with irrelevant history, and every new interaction carries that dead weight.

The Workflow

  1. Start a task in a fresh session or after /clear
  2. Work until context meter hits 60-70%
  3. Run /compact with specific preservation instructions
  4. Continue working
  5. When the task is done, /clear before starting the next one

Before/After

Before: Single continuous sessions running 3-4 hours. Later interactions 3-5x more expensive than early ones. After: Compact at 70%, clear between tasks. Average context size stays 40-60% lower across a workday.

Estimated savings: 20-35% of total token usage.

Combined Impact: The Full Stack

These five methods stack. Here is the combined impact:

| Method | Savings Estimate | Applies To |
| --- | --- | --- |
| /effort tuning | 30-40% token reduction | All users |
| Good CLAUDE.md | 25-35% fewer wasted iterations | All users |
| Gemini CLI offloading | 40-50% fewer Claude Code tasks | All users |
| Strategic subagents | 20-30% token reduction | Subagent users |
| Context management | 20-35% token reduction | All users |

The savings compound. Gemini CLI handles 40% of tasks. /effort reduces tokens on the remaining 60%. Good CLAUDE.md cuts wasted iterations within those. Context management keeps sessions lean. Combined effect: typically 50-60% reduction in Claude Code usage.
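The methods overlap, so their percentages multiply rather than add. A rough model with illustrative midpoints (not measured values): offloading removes a share of tasks outright, and the remaining methods together trim the tasks that stay on Claude Code:

```python
# Rough multiplicative model. Both fractions are illustrative
# midpoints of the article's ranges, not measurements.
offloaded = 0.45            # share of tasks moved to Gemini CLI
per_task_reduction = 0.25   # combined effect of /effort, CLAUDE.md,
                            # subagent discipline, and context hygiene
                            # on the tasks that remain

remaining = (1 - offloaded) * (1 - per_task_reduction)
reduction = 1 - remaining
print(f"combined reduction: {reduction:.0%}")
```

With those midpoints the model lands near 59%, consistent with the 50-60% claim.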

For Max 20x at $200/month, that means dropping to Max 5x at $100/month. For Max 5x at $100/month, dropping to Pro at $20/month. The AI CLI cost optimization guide covers even more strategies including free tier stacking and budget templates.

The Bottom Line

Claude Code is the best agentic coding tool available. $200/month is not the problem — using it wastefully is. These five methods are not workarounds. They are how Claude Code is designed to be used: right effort level for each task, clear project context, complementary tools for simple work, disciplined subagent use, and active context management.

Apply all five, track usage for two weeks, then decide whether your current tier is still the right one. Most developers find they can drop at least one tier without losing productivity.

Free Download

Ready to streamline your terminal workflow?

Multi-terminal drag-and-drop layout, workspace Git sync, built-in AI integration, AST code analysis — all in one app.

Download Termdock →
#claude-code#cost-optimization#ai-cli#developer-tools#gemini-cli
