Cost Optimization Guide
Master Claude Code's cost control techniques to accomplish more with fewer tokens
Writing code with Claude Code is enjoyable, but careless usage can produce a surprisingly large bill. This guide explains how costs are generated and how to keep spending under control without sacrificing efficiency.
Understanding Token Billing¶
Before optimizing costs, you need to understand the underlying logic of billing—Tokens.
What is a Token¶
A Token is the smallest unit of text processed by an AI model. You can think of it as the model's "word":
- English: 1 token ≈ 4 characters, or 0.75 words
- Chinese: 1 Chinese character ≈ 1-2 tokens (average ~1.5 tokens)
- Code: Variable names, keywords, and symbols each take different numbers of tokens
Quick estimation reference:
| Content | Approximate Characters/Lines | Approximate Token Count |
|---|---|---|
| A brief Chinese requirement description | 100 characters | ~150 tokens |
| A 200-line TypeScript file | ~5,000 characters | ~1,500 tokens |
| A typical conversational input with context | — | 5,000-20,000 tokens |
| Claude Code system prompt | — | ~8,000 tokens |
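The rule of thumb above (~4 characters per token for English) can be turned into a quick estimator. This is only a heuristic sketch, not the real tokenizer; actual counts from the API will differ, and code or Chinese text tokenizes more densely:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4-characters-per-token rule of thumb.

    This is only a heuristic; the model's actual tokenizer will report
    different counts, especially for code and non-English text.
    """
    return max(1, round(len(text) / 4))

# A ~5,000-character file works out to ~1,250 tokens under this rule;
# the table above estimates ~1,500 for code, which tokenizes more densely.
print(estimate_tokens("x" * 5000))  # 1250
```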
Input vs Output Tokens¶
Each interaction with Claude consists of two parts for billing:
Total Cost = Input tokens × Input rate + Output tokens × Output rate
Key point: Output tokens are typically 5x more expensive than input tokens. Using Sonnet 4.6 as an example:
| Type | Price (USD/million tokens) | Cost per 1,000 tokens |
|---|---|---|
| Input | $3.00 | $0.003 |
| Output | $15.00 | $0.015 |
| Cache Read | $0.30 | $0.0003 |
This means: having Claude generate lengthy content costs more than providing it with additional context yourself.
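The billing formula above can be sketched in a few lines of Python, with the Sonnet 4.6 rates from the table hard-coded (adjust for other models):

```python
# Sonnet 4.6 rates from the table above, in USD per million tokens.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total cost = input tokens x input rate + output tokens x output rate."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# A turn with 8K input and 1.5K output tokens:
print(round(cost_usd(8_000, 1_500), 4))  # 0.0465, i.e. about $0.047
```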
How Many Tokens Does a Typical Conversation Consume¶
Using Sonnet 4.6 as an example, cost estimates for several common scenarios:
| Scenario | Input tokens | Output tokens | Estimated Cost |
|---|---|---|---|
| Simple Q&A (explaining a code snippet) | 3,000 | 500 | $0.017 |
| Modifying a function | 8,000 | 1,500 | $0.047 |
| Creating a new component (including file reading) | 15,000 | 3,000 | $0.090 |
| Complex feature development (multi-turn conversation) | 50,000 | 15,000 | $0.375 |
| Large-scale refactoring (10+ files) | 200,000 | 50,000 | $1.350 |
The above numbers are for reference only. Actual costs depend on your code volume, conversation turns, and context size.
Model Selection Strategy¶
Choosing the right model is the most direct way to save money; the price gap between models is significant.
Three Models' Positioning¶
| Model | Input Price | Output Price | Positioning |
|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | Lightweight and fast, for simple daily tasks |
| Sonnet 4.6 | $3.00 | $15.00 | Best balance, primary model |
| Opus 4.6 | $5.00 | $25.00 | Most powerful, for complex tasks |
Cost Comparison: Completing the same moderately complex task (10K input, 3K output tokens):
- Haiku: $0.025
- Sonnet: $0.075
- Opus: $0.125
For this task, Sonnet costs 3x as much as Haiku, and Opus 5x as much.
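The same comparison can be computed from the rate table. A small sketch (the model names here are just dictionary keys, not API identifiers):

```python
# (input, output) rates in USD per million tokens, from the table above.
MODELS = {
    "haiku":  (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus":   (5.00, 25.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = MODELS[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# The moderately complex task above: 10K input, 3K output.
for name in MODELS:
    print(f"{name}: ${task_cost(name, 10_000, 3_000)}")
```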
Model Selection Principles¶
Use Sonnet as your daily default (80% of tasks can be handled by it)
Only switch to Opus for:
- Complex architecture design and technical decisions
- Large-scale code refactoring
- Cross-module modifications involving multiple systems
- Difficult bugs requiring deep reasoning
Only switch to Haiku for:
- Simple format conversions
- Generating repetitive code (like CRUD endpoints)
- Quick Q&A (questions answerable in one sentence)
- Code comments and documentation generation
Use the /model command to switch anytime:
/model sonnet # Switch to Sonnet
/model opus # Switch to Opus
/model haiku # Switch to Haiku
Context Management Cost-Saving Tips¶
Context management is core to cost optimization. On every conversation turn, Claude re-sends the entire previous history, so the longer the conversation runs, the more input tokens each turn consumes.
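Because the full history is re-sent on every turn, per-turn input grows linearly and total input tokens grow roughly quadratically with conversation length. An illustrative sketch, assuming (for the sake of the example) 2K tokens of new content per turn:

```python
# Assumption for illustration: each turn adds ~2K tokens of new content,
# and every turn re-sends all previous turns' content as history.
PER_TURN = 2_000

def total_input_tokens(turns: int) -> int:
    # Turn n sends the content of turns 1..n, i.e. n * PER_TURN tokens.
    return sum(n * PER_TURN for n in range(1, turns + 1))

print(total_input_tokens(10))  # 110000
print(total_input_tokens(20))  # 420000 -- doubling the turns nearly quadruples the tokens
```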
/compact to Compress History¶
When the conversation exceeds 10 turns, or you feel the response has slowed down, use /compact:
/compact
/compact has Claude condense the earlier conversation into a concise summary that replaces the complete history, significantly reducing input tokens on the next turn.
When to use:
- Conversation exceeds 10 turns
- Completed a sub-task and ready to start the next
- Feeling Claude's responses are degrading (sign of context overload)
/clear to Start Fresh¶
When switching to a completely different topic, use /clear without hesitation:
/clear
When to use:
- Switching from Feature A development to Feature B
- Debugging complete, starting to write new code
- Conversation has deviated from the topic, want to start over
Cost of not using /clear: if your previous conversation history holds 50K tokens, every turn on the new topic pays for those 50K tokens unnecessarily. At Sonnet rates that is an extra $0.15 per turn, or $1.50 wasted over 10 turns.
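The arithmetic behind that estimate, as a quick check:

```python
# 50K tokens of stale history carried into every turn of the new topic,
# billed at Sonnet's input rate of $3.00 per million tokens.
STALE_HISTORY = 50_000
INPUT_RATE = 3.00

waste_per_turn = STALE_HISTORY * INPUT_RATE / 1_000_000
print(waste_per_turn)                  # 0.15 dollars per turn
print(round(waste_per_turn * 10, 2))   # 1.5 dollars over 10 turns
```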
Precise @ Referencing¶
The cost difference between letting Claude search files itself vs you directly referencing files is significant:
# Expensive! Claude may search dozens of files to find relevant code
"Help me modify the email validation part of user registration logic"
# Cheap! Directly tell Claude where to look
"Modify the regex in the validateEmail method in @src/services/auth-service.ts"
When you know which file to modify, always use @ for direct reference. Letting Claude search on its own is not only more expensive but also less accurate.
.claudeignore to Exclude Large Files¶
Ensure the following types of files are in .claudeignore:
# These files consume massive tokens if read by Claude
node_modules/ # Tens of thousands of files
dist/ # Build output
*.min.js # Minified JS
*.sql # Database dump
*.csv # Data files
package-lock.json # Lock file (enormous content)
pnpm-lock.yaml # Lock file
yarn.lock # Lock file
Real case: A user found abnormally high costs and discovered Claude had read a 50MB SQL dump file while searching code, consuming ~15M tokens in one shot (~$45 in input costs).
Leveraging Caching¶
Prompt Caching is a very important cost-saving feature of the Claude API.
What is Prompt Caching¶
In a multi-turn conversation, every turn re-sends all previous content. Without caching, all of it is billed at the full input price. With caching:
- First send: Billed at the cache write price, slightly above the normal input rate
- Subsequent sends: The same content hits the cache and is billed at the cache read price, only 1/10 of the normal input rate
Using Sonnet 4.6 as an example:
| Type | Price/million tokens | Compared to normal input |
|---|---|---|
| Normal input | $3.00 | 100% |
| Cache write | $3.75 | 125% |
| Cache read | $0.30 | 10% |
That means cache hits cost only 1/10 the price.
How to Maximize Cache Benefits¶
Caching is automatic, but you can improve cache hit rate through usage habits:
Keep conversations coherent:
# Good habit: Continue developing the same feature in one conversation
> "Create UserService basic structure"
> "Add getUserById method" # Previous content hits cache
> "Add updateUser method" # Previous content hits cache
> "Add unit tests" # Previous content hits cache
# Bad habit: /clear after every sentence
> "Create UserService basic structure"
> /clear
> "Add getUserById method to UserService" # Cannot leverage cache
> /clear
> "Add updateUser method to UserService" # Cannot leverage cache
Don't modify CLAUDE.md frequently:
CLAUDE.md content is sent every conversation turn. If the content is stable, it will hit cache long-term. Frequent modifications cause cache invalidation.
Reasonable /compact timing:
/compact replaces history, causing cache invalidation. So don't use it too frequently—only when the conversation is truly too long.
How Much Does Caching Help¶
Assuming a 10-turn conversation, each turn has ~20K input tokens (including history):
| | No Cache | With Cache (90% hit rate) |
|---|---|---|
| Total input tokens | 200K | 200K |
| Billed at normal price | 200K | 20K |
| Billed at cache price | 0 | 180K |
| Total input cost (Sonnet) | $0.60 | $0.114 |
| Savings | — | $0.486 (81%) |
Caching can save ~80% on input costs.
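Reproducing the table's arithmetic (Sonnet 4.6 rates; the 90% hit rate is the table's assumption, not a guarantee):

```python
# Sonnet 4.6 rates in USD per million tokens.
INPUT_RATE = 3.00
CACHE_READ_RATE = 0.30

total = 200_000            # 10 turns x ~20K input tokens each
cached = 180_000           # 90% hits the cache
fresh = total - cached     # 20K billed at the normal input price

no_cache = total * INPUT_RATE / 1_000_000
with_cache = (fresh * INPUT_RATE + cached * CACHE_READ_RATE) / 1_000_000

print(no_cache)    # 0.6
print(with_cache)  # 0.114
print(f"saved {1 - with_cache / no_cache:.0%}")  # saved 81%
```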
Monitoring and Budgeting¶
Develop the habit of monitoring costs to avoid overspending unknowingly.
/cost to View Current Session¶
At any time in Claude Code, enter:
/cost
It will show tokens consumed and estimated cost for the current session. Check it after completing each task.
QCode.cc Dashboard for History¶
Log in to the QCode.cc Console; the "Usage Statistics" page shows:
- Model call details: Model, token count, and cost for each call
- Daily/monthly summaries: Cost trend charts
- Plan consumption progress: Current plan quota usage percentage
Develop the habit of checking the Dashboard daily to catch abnormal consumption early.
Cost Control Recommendations¶
| Usage Level | Recommended Model Strategy | Monthly Budget Reference |
|---|---|---|
| Light use (1-2 hours daily) | Primarily Sonnet | $30-80 |
| Moderate use (3-5 hours daily) | Sonnet + occasional Opus | $80-200 |
| Heavy use (full-day development) | Sonnet primary + Opus for architecture decisions | $200-500 |
Common Cost Pitfalls¶
Here are the situations that waste tokens most easily. Check if any apply to you:
Pitfall 1: Pasting Large Files Directly¶
# Wrong approach: Pasting file content into the conversation
> "Here's my code, help me review it:
[pasted 500 lines of code]"
# Correct approach: Use @ reference
> "Review the createOrder method in @src/services/order-service.ts"
Difference: Pasting 500 lines adds roughly 3,000-4,000 input tokens (extrapolating from the estimates above), and that text is re-sent on every conversation turn. With an @ reference, Claude reads the file only when needed, and the read content is cached.
Pitfall 2: Repeatedly Retrying the Same Failed Command¶
# Wrong approach: Asking Claude to run the same failed command repeatedly
> "Run npm run build"
# Failed
> "Try again"
# Still failed
> "One more time"
# Correct approach: Analyze the failure reason and try a different approach
> "Run npm run build"
# Failed
> "Look at the error logs, analyze the failure reason, fix the issue, then build"
Each retry sends the complete conversation history (including all previous failed outputs), and token consumption accumulates rapidly.
Pitfall 3: Forgetting /clear Causing Context Bloat¶
This is the most hidden cost pitfall:
# Morning: Fixed a bug (conversation accumulated 30K tokens of history)
# Afternoon: Started writing a new feature, but no /clear
# Every turn of the new conversation carries the morning's 30K tokens unnecessarily
# Fix: Develop the habit of /clear when switching tasks
/clear
> "Now starting development of the new feature..."
Pitfall 4: Using Opus for Simple Tasks¶
# Wasteful: Using Opus to generate a simple interface
/model opus
> "Help me define a User interface with id, name, email fields"
# Savings: Haiku is sufficient for simple tasks
/model haiku
> "Help me define a User interface with id, name, email fields"
The output from both models is almost identical for this task, but Opus costs 5x more.
Pitfall 5: Letting Claude Search the Entire Project¶
# Expensive: Claude may read dozens of files
> "Find the code that handles payments in the project"
# Cheap: Tell it roughly where to look
> "Find the service handling payments in @src/services/"
# Cheapest: Specify the file directly
> "Check @src/services/payment-service.ts"
Cost Optimization Checklist¶
Use this checklist each time you use Claude Code:
- [ ] Selected the appropriate model (use Sonnet for most tasks)
- [ ] Using @ references instead of pasting file content
- [ ] Executed /clear when switching topics
- [ ] Considering /compact when conversation exceeds 10 turns
- [ ] Configured .claudeignore to exclude large files
- [ ] Requirements description is clear enough (avoid rework due to misunderstandings)
- [ ] Checked /cost to understand current consumption
After developing these habits, you'll find costs can be reduced by 30-50% while development efficiency remains unaffected.