Cost Optimization Guide
Master Claude Code's cost control techniques to accomplish more with fewer tokens
Writing code with Claude Code is enjoyable, but careless usage can produce a surprisingly large bill. This guide explains how costs are generated and how to keep spending under control without sacrificing efficiency.
Understanding Token Billing¶
Before optimizing costs, you need to understand the underlying logic of billing—Tokens.
What is a Token¶
A Token is the smallest unit of text processed by an AI model. You can think of it as the model's "word":
- English: 1 token ≈ 4 characters, or 0.75 words
- Chinese: 1 Chinese character ≈ 1-2 tokens (average ~1.5 tokens)
- Code: Variable names, keywords, and symbols each take different numbers of tokens
Quick estimation reference:
| Content | Approximate Characters/Lines | Approximate Token Count |
|---|---|---|
| A brief Chinese requirement description | 100 characters | ~150 tokens |
| A 200-line TypeScript file | ~5,000 characters | ~1,500 tokens |
| A typical conversational input with context | — | 5,000-20,000 tokens |
| Claude Code system prompt | — | ~8,000 tokens |
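The rule of thumb above (~4 characters per token for English) can be turned into a quick estimator. This is only a heuristic sketch, not the real tokenizer; actual counts from the API will differ, and code or Chinese text tokenizes more densely:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4-characters-per-token rule of thumb.

    This is only a heuristic; the model's actual tokenizer will report
    different counts, especially for code and non-English text.
    """
    return max(1, round(len(text) / 4))

# A ~5,000-character file works out to ~1,250 tokens under this rule;
# the table above estimates ~1,500 for code, which tokenizes more densely.
print(estimate_tokens("x" * 5000))  # 1250
```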
Input vs Output Tokens¶
Each interaction with Claude consists of two parts for billing:
Total Cost = Input tokens × Input rate + Output tokens × Output rate
Key point: Output tokens are typically 5x more expensive than input tokens. Using Sonnet 4.6 as an example:
| Type | Price (USD/million tokens) | Cost per 1,000 tokens |
|---|---|---|
| Input | $3.00 | $0.003 |
| Output | $15.00 | $0.015 |
| Cache Read | $0.30 | $0.0003 |
This means: having Claude generate lengthy content costs more than providing it with additional context yourself.
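The billing formula above can be sketched in a few lines of Python, with the Sonnet 4.6 rates from the table hard-coded (adjust for other models):

```python
# Sonnet 4.6 rates from the table above, in USD per million tokens.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total cost = input tokens x input rate + output tokens x output rate."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# A turn with 8K input and 1.5K output tokens:
print(round(cost_usd(8_000, 1_500), 4))  # 0.0465, i.e. about $0.047
```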
How Many Tokens Does a Typical Conversation Consume¶
Using Sonnet 4.6 as an example, cost estimates for several common scenarios:
| Scenario | Input tokens | Output tokens | Estimated Cost |
|---|---|---|---|
| Simple Q&A (explaining a code snippet) | 3,000 | 500 | $0.017 |
| Modifying a function | 8,000 | 1,500 | $0.047 |
| Creating a new component (including file reading) | 15,000 | 3,000 | $0.090 |
| Complex feature development (multi-turn conversation) | 50,000 | 15,000 | $0.375 |
| Large-scale refactoring (10+ files) | 200,000 | 50,000 | $1.350 |
The above numbers are for reference only. Actual costs depend on your code volume, conversation turns, and context size.
Model Selection Strategy¶
Choosing the right model is the most direct way to save money; the price gap between models is significant.
Three Models' Positioning¶
| Model | Input Price | Output Price | Positioning |
|---|---|---|---|
| Haiku 4.5 | $1.00 | $5.00 | Lightweight and fast, for simple daily tasks |
| Sonnet 4.6 | $3.00 | $15.00 | Best balance, primary model |
| Opus 4.6 | $5.00 | $25.00 | Most powerful, for complex tasks |
Cost Comparison: Completing the same moderately complex task (10K input, 3K output tokens):
- Haiku: $0.025
- Sonnet: $0.075
- Opus: $0.125
For this task, Sonnet costs 3x as much as Haiku, and Opus 5x as much.
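The same comparison can be computed from the rate table. A small sketch (the model names here are just dictionary keys, not API identifiers):

```python
# (input, output) rates in USD per million tokens, from the table above.
MODELS = {
    "haiku":  (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "opus":   (5.00, 25.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = MODELS[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# The moderately complex task above: 10K input, 3K output.
for name in MODELS:
    print(f"{name}: ${task_cost(name, 10_000, 3_000)}")
```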
Model Selection Principles¶
Use Sonnet as your daily default (80% of tasks can be handled by it)
Only switch to Opus for:
- Complex architecture design and technical decisions
- Large-scale code refactoring
- Cross-module modifications involving multiple systems
- Difficult bugs requiring deep reasoning
Only switch to Haiku for:
- Simple format conversions
- Generating repetitive code (like CRUD endpoints)
- Quick Q&A (questions answerable in one sentence)
- Code comments and documentation generation
Use the /model command to switch anytime:
/model sonnet # Switch to Sonnet
/model opus # Switch to Opus
/model haiku # Switch to Haiku
Context Management Cost-Saving Tips¶
Context management is core to cost optimization. On every conversation turn, Claude re-sends the entire previous history, so the longer the conversation runs, the more input tokens each turn consumes.
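Because the full history is re-sent on every turn, per-turn input grows linearly and total input tokens grow roughly quadratically with conversation length. An illustrative sketch, assuming (for the sake of the example) 2K tokens of new content per turn:

```python
# Assumption for illustration: each turn adds ~2K tokens of new content,
# and every turn re-sends all previous turns' content as history.
PER_TURN = 2_000

def total_input_tokens(turns: int) -> int:
    # Turn n sends the content of turns 1..n, i.e. n * PER_TURN tokens.
    return sum(n * PER_TURN for n in range(1, turns + 1))

print(total_input_tokens(10))  # 110000
print(total_input_tokens(20))  # 420000 -- doubling the turns nearly quadruples the tokens
```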
/compact to Compress History¶
When the conversation exceeds 10 turns, or you feel the response has slowed down, use /compact:
/compact
/compact has Claude condense the earlier conversation into a concise summary that replaces the complete history, significantly reducing input tokens on the next turn.
When to use:
- Conversation exceeds 10 turns
- Completed a sub-task and ready to start the next
- Feeling Claude's responses are degrading (sign of context overload)
/clear to Start Fresh¶
When switching to a completely different topic, use /clear without hesitation:
/clear
When to use:
- Switching from Feature A development to Feature B
- Debugging complete, starting to write new code
- Conversation has deviated from the topic, want to start over
Cost of not using /clear: if your previous conversation history holds 50K tokens, every turn on the new topic pays for those 50K tokens unnecessarily. At Sonnet rates that is an extra $0.15 per turn, or $1.50 wasted over 10 turns.
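The arithmetic behind that estimate, as a quick check:

```python
# 50K tokens of stale history carried into every turn of the new topic,
# billed at Sonnet's input rate of $3.00 per million tokens.
STALE_HISTORY = 50_000
INPUT_RATE = 3.00

waste_per_turn = STALE_HISTORY * INPUT_RATE / 1_000_000
print(waste_per_turn)                  # 0.15 dollars per turn
print(round(waste_per_turn * 10, 2))   # 1.5 dollars over 10 turns
```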
Precise @ Referencing¶
The cost difference between letting Claude search files itself vs you directly referencing files is significant:
# Expensive! Claude may search dozens of files to find relevant code
"Help me modify the email validation part of user registration logic"
# Cheap! Directly tell Claude where to look
"Modify the regex in the validateEmail method in @src/services/auth-service.ts"
When you know which file to modify, always use @ for direct reference. Letting Claude search on its own is not only more expensive but also less accurate.
.claudeignore to Exclude Large Files¶
Ensure the following types of files are in .claudeignore:
# These files consume massive tokens if read by Claude
node_modules/ # Tens of thousands of files
dist/ # Build output
*.min.js # Minified JS
*.sql # Database dump
*.csv # Data files
package-lock.json # Lock file (enormous content)
pnpm-lock.yaml # Lock file
yarn.lock # Lock file
Real case: A user found abnormally high costs and discovered Claude had read a 50MB SQL dump file while searching code, consuming ~15M tokens in one shot (~$45 in input costs).
Leveraging Caching¶
Prompt Caching is a very important cost-saving feature of the Claude API.
What is Prompt Caching¶
In a multi-turn conversation, every turn re-sends all previous content. Without caching, all of it is billed at the full input price. With caching:
- First send: Billed at the cache write price, slightly above the normal input rate
- Subsequent sends: The same content hits the cache and is billed at the cache read price, only 1/10 of the normal input rate
Using Sonnet 4.6 as an example:
| Type | Price/million tokens | Compared to normal input |
|---|---|---|
| Normal input | $3.00 | 100% |
| Cache write | $3.75 | 125% |
| Cache read | $0.30 | 10% |
That means cache hits cost only 1/10 the price.
How to Maximize Cache Benefits¶
Caching is automatic, but you can improve cache hit rate through usage habits:
Keep conversations coherent:
# Good habit: Continue developing the same feature in one conversation
> "Create UserService basic structure"
> "Add getUserById method" # Previous content hits cache
> "Add updateUser method" # Previous content hits cache
> "Add unit tests" # Previous content hits cache
# Bad habit: /clear after every sentence
> "Create UserService basic structure"
> /clear
> "Add getUserById method to UserService" # Cannot leverage cache
> /clear
> "Add updateUser method to UserService" # Cannot leverage cache
Don't modify CLAUDE.md frequently:
CLAUDE.md content is sent every conversation turn. If the content is stable, it will hit cache long-term. Frequent modifications cause cache invalidation.
Reasonable /compact timing:
/compact replaces history, causing cache invalidation. So don't use it too frequently—only when the conversation is truly too long.
How Much Does Caching Help¶
Assuming a 10-turn conversation, each turn has ~20K input tokens (including history):
| | No Cache | With Cache (90% hit rate) |
|---|---|---|
| Total input tokens | 200K | 200K |
| Billed at normal price | 200K | 20K |
| Billed at cache price | 0 | 180K |
| Total input cost (Sonnet) | $0.60 | $0.114 |
| Savings | — | $0.486 (81%) |
Caching can save ~80% on input costs.
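Reproducing the table's arithmetic (Sonnet 4.6 rates; the 90% hit rate is the table's assumption, not a guarantee):

```python
# Sonnet 4.6 rates in USD per million tokens.
INPUT_RATE = 3.00
CACHE_READ_RATE = 0.30

total = 200_000            # 10 turns x ~20K input tokens each
cached = 180_000           # 90% hits the cache
fresh = total - cached     # 20K billed at the normal input price

no_cache = total * INPUT_RATE / 1_000_000
with_cache = (fresh * INPUT_RATE + cached * CACHE_READ_RATE) / 1_000_000

print(no_cache)    # 0.6
print(with_cache)  # 0.114
print(f"saved {1 - with_cache / no_cache:.0%}")  # saved 81%
```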
Monitoring and Budgeting¶
Develop the habit of monitoring costs to avoid overspending unknowingly.
/cost to View Current Session¶
At any time in Claude Code, enter:
/cost
It will show tokens consumed and estimated cost for the current session. Check it after completing each task.
QCode.cc Dashboard for History¶
Log in to the QCode.cc Console; the "Usage Statistics" page shows:
- Model call details: Model, token count, and cost for each call
- Daily/monthly summaries: Cost trend charts
- Plan consumption progress: Current plan quota usage percentage
Develop the habit of checking the Dashboard daily to catch abnormal consumption early.
Cost Control Recommendations¶
| Usage Level | Recommended Model Strategy | Monthly Budget Reference |
|---|---|---|
| Light use (1-2 hours daily) | Primarily Sonnet | $30-80 |
| Moderate use (3-5 hours daily) | Sonnet + occasional Opus | $80-200 |
| Heavy use (full-day development) | Sonnet primary + Opus for architecture decisions | $200-500 |
Common Cost Pitfalls¶
Here are the situations that waste tokens most easily. Check if any apply to you:
Pitfall 1: Pasting Large Files Directly¶
# Wrong approach: Pasting file content into the conversation
> "Here's my code, help me review it:
[pasted 500 lines of code]"
# Correct approach: Use @ reference
> "Review the createOrder method in @src/services/order-service.ts"
Difference: Pasting 500 lines adds roughly 3,000-4,000 input tokens (extrapolating from the estimates above), and that text is re-sent on every conversation turn. With an @ reference, Claude reads the file only when needed, and the read content is cached.
Pitfall 2: Repeatedly Retrying the Same Failed Command¶
# Wrong approach: Asking Claude to run the same failed command repeatedly
> "Run npm run build"
# Failed
> "Try again"
# Still failed
> "One more time"
# Correct approach: Analyze the failure reason and try a different approach
> "Run npm run build"
# Failed
> "Look at the error logs, analyze the failure reason, fix the issue, then build"
Each retry sends the complete conversation history (including all previous failed outputs), and token consumption accumulates rapidly.
Pitfall 3: Forgetting /clear Causing Context Bloat¶
This is the most hidden cost pitfall:
# Morning: Fixed a bug (conversation accumulated 30K tokens of history)
# Afternoon: Started writing a new feature, but no /clear
# Every turn of the new conversation carries the morning's 30K tokens unnecessarily
# Fix: Develop the habit of /clear when switching tasks
/clear
> "Now starting development of the new feature..."
Pitfall 4: Using Opus for Simple Tasks¶
# Wasteful: Using Opus to generate a simple interface
/model opus
> "Help me define a User interface with id, name, email fields"
# Savings: Haiku is sufficient for simple tasks
/model haiku
> "Help me define a User interface with id, name, email fields"
The output from both models is almost identical for this task, but Opus costs 5x more.
Pitfall 5: Letting Claude Search the Entire Project¶
# Expensive: Claude may read dozens of files
> "Find the code that handles payments in the project"
# Cheap: Tell it roughly where to look
> "Find the service handling payments in @src/services/"
# Cheapest: Specify the file directly
> "Check @src/services/payment-service.ts"
Cost Optimization Checklist¶
Use this checklist each time you use Claude Code:
- [ ] Selected the appropriate model (use Sonnet for most tasks)
- [ ] Using @ references instead of pasting file content
- [ ] Executed /clear when switching topics
- [ ] Considering /compact when conversation exceeds 10 turns
- [ ] Configured .claudeignore to exclude large files
- [ ] Requirements description is clear enough (avoid rework due to misunderstandings)
- [ ] Checked /cost to understand current consumption
After developing these habits, you'll find costs can be reduced by 30-50% while development efficiency remains unaffected.