Adaptive Thinking Configuration Guide
Adaptive thinking on Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5: thinking + effort parameters, differences from the legacy budget_tokens, and quality vs. cost trade-offs
Adaptive Thinking is the next-generation reasoning mode in the Claude 4.6/4.7 family: the model itself decides when to invest in reasoning and how many tokens to spend, with no fixed budget set up front by the developer. The effort parameter offers coarse-grained control over reasoning depth. This guide covers the related API usage, the model support matrix, and the quality vs. cost trade-offs.
Key Differences vs. Legacy Extended Thinking
| Dimension | Legacy Extended Thinking (4.5 and earlier) | Adaptive Thinking (4.6 / 4.7) |
|---|---|---|
| Trigger parameter | thinking={"type": "enabled", "budget_tokens": N} | thinking={"type": "adaptive"} |
| Budget control | Fixed token cap (budget_tokens must be < max_tokens) | Model self-regulates, optionally tuned by effort |
| Cross-tool-call behavior | Not interleaved by default (requires the interleaved-thinking-2025-05-14 beta header) | Interleaved automatically (no beta header needed) |
| Opus 4.7 compatibility | ❌ 400 error (budget_tokens has been removed) | ✅ The only supported mode |
| Sonnet 4.6 compatibility | ⚠️ Deprecated (still works but not recommended) | ✅ Recommended |
Short rule: use adaptive for the 4.6 / 4.7 model family; only use budget_tokens for 4.5 and earlier.
Model Support Matrix
| Model | Adaptive thinking | budget_tokens | effort parameter |
|---|---|---|---|
| claude-opus-4-7 | ✅ Only way to enable | ❌ 400 error | ✅ low / medium / high / xhigh / max |
| claude-sonnet-4-6 | ✅ Recommended | ⚠️ Still works but deprecated | ✅ low / medium / high |
| claude-opus-4-6 | ✅ Recommended | ⚠️ Still works but deprecated | ✅ low / medium / high / max |
| claude-haiku-4-5 | ✅ Supported | ❌ Not supported | ❌ effort parameter will error |
| claude-sonnet-4-5 and earlier | ❌ Not supported | ✅ Use this | ❌ |
When accessing through the QCode.cc gateway, all of the above models are available (verified empirically).
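The support matrix above can be turned into a pre-flight check that runs before a request is sent. The following is a minimal sketch: the `SUPPORT` table is transcribed by hand from the matrix rather than queried from the API, and `check_request` is a hypothetical helper name, not part of the anthropic SDK.

```python
# Hand-transcribed from the support matrix above; this is an assumption-laden
# snapshot, not something the API reports.
SUPPORT = {
    "claude-opus-4-7": {
        "adaptive": True, "budget_tokens": False,
        "efforts": {"low", "medium", "high", "xhigh", "max"},
    },
    "claude-sonnet-4-6": {
        "adaptive": True, "budget_tokens": True,
        "efforts": {"low", "medium", "high"},
    },
    "claude-opus-4-6": {
        "adaptive": True, "budget_tokens": True,
        "efforts": {"low", "medium", "high", "max"},
    },
    "claude-haiku-4-5": {
        "adaptive": True, "budget_tokens": False, "efforts": set(),
    },
    "claude-sonnet-4-5": {
        "adaptive": False, "budget_tokens": True, "efforts": set(),
    },
}


def check_request(model, thinking, effort=None):
    """Return the problems the matrix predicts for this parameter combination."""
    row = SUPPORT.get(model)
    if row is None:
        return [f"{model}: not listed in the matrix"]
    problems = []
    if thinking.get("type") == "adaptive" and not row["adaptive"]:
        problems.append(f"{model}: adaptive thinking is not supported")
    if "budget_tokens" in thinking and not row["budget_tokens"]:
        problems.append(f"{model}: budget_tokens is rejected")
    if effort is not None and effort not in row["efforts"]:
        problems.append(f"{model}: effort={effort!r} is not supported")
    return problems
```

An empty list means the matrix predicts no problem; for example, `check_request("claude-opus-4-7", {"type": "adaptive"}, "xhigh")` returns `[]`, while passing `budget_tokens` to the same model yields one entry.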
The effort Parameter: A Coarse Knob for Reasoning Depth
effort controls the overall intensity of adaptive thinking (covering thinking tokens, tool-call count, and answer thoroughness).
| effort | Best for | Approximate cost multiplier |
|---|---|---|
| low | Simple Q&A, sub-agents, latency-sensitive paths | 1× |
| medium | Routine coding, doc lookup (most everyday tasks) | 1.5-2× |
| high | Complex reasoning, multi-file refactoring (default) | 2-3× |
| xhigh | Long-horizon agent workflows (Opus 4.7 exclusive, default in Claude Code) | 3-4× |
| max | Maximum quality, cost is secondary (Opus 4.6 / 4.7 only) | 5× |
Rule of thumb: high is the sweet spot for most interactive coding tasks; xhigh offers the best price-performance for long agent loops (e.g., multi-step code review, deep research); max is worth it only when correctness is far more important than cost (e.g., production code generation, critical architecture decisions).
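To make the multipliers in the table concrete, the sketch below scales a baseline cost (what the same request costs at effort low) into a rough range for another effort level. The multipliers are the approximate figures from the table, and `estimate_cost_range` is a hypothetical helper for budgeting, not a pricing API; actual spend depends on the task.

```python
# Approximate multipliers copied from the effort table, as (lower, upper) bounds.
EFFORT_MULTIPLIER = {
    "low": (1.0, 1.0),
    "medium": (1.5, 2.0),
    "high": (2.0, 3.0),
    "xhigh": (3.0, 4.0),
    "max": (5.0, 5.0),
}


def estimate_cost_range(baseline_cost, effort):
    """Scale the cost observed at effort='low' into a rough (min, max) range."""
    lower, upper = EFFORT_MULTIPLIER[effort]
    return (baseline_cost * lower, baseline_cost * upper)
```

With a baseline of 2.0 (in any currency unit), `estimate_cost_range(2.0, "high")` gives `(4.0, 6.0)`, which is a quick way to sanity-check whether moving from medium to high fits a budget.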
API Usage Examples
Python (anthropic SDK)
```python
import anthropic

# Gateway endpoint and placeholder API key
client = anthropic.Anthropic(
    base_url="http://103.236.53.153/api",
    api_key="cr_xxxxxxxx",
)

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Refactor the authentication logic of the user service module..."}],
)

# Print thinking blocks and answer text with separate labels
for block in response.content:
    if block.type == "thinking":
        print(f"[Thinking] {block.thinking}")
    elif block.type == "text":
        print(f"[Answer] {block.text}")
```
TypeScript (@anthropic-ai/sdk)
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://103.236.53.153/api",
  apiKey: "cr_xxxxxxxx",
});

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  output_config: { effort: "xhigh" },
  messages: [{ role: "user", content: "Refactor the user service..." }],
});
```
Claude Code CLI
In the CLI, toggle effort with the Tab key; the current effort level is shown in the status bar. Opus 4.7 defaults to xhigh; drop down to medium manually to save tokens.
Rendering Thinking Content
Opus 4.7 does not display thinking text by default (thinking.display: "omitted"); reasoning is internal-only. To surface the thinking trace in your UI:
```python
thinking={"type": "adaptive", "display": "summarized"}
```
display: "summarized" tells the model to emit a thinking summary (visible thinking blocks); omitted only counts thinking tokens without producing visible text. Sonnet 4.6 / Haiku 4.5 default to summarized.
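Because the display setting determines whether visible thinking text arrives, rendering code is more robust if it tolerates both cases. The sketch below is one way to do that. It assumes that with omitted display a thinking block is either absent or carries empty text (the source does not say which), and it uses `SimpleNamespace` stand-ins instead of real SDK response blocks; `visible_parts` is a hypothetical helper name.

```python
from types import SimpleNamespace


def visible_parts(content):
    """Split response content into (thinking_texts, answer_texts).

    Thinking blocks with no text are skipped, so the same code works whether
    the thinking display is summarized or omitted (assumed behavior).
    """
    thinking, answer = [], []
    for block in content:
        if block.type == "thinking":
            text = getattr(block, "thinking", "") or ""
            if text:
                thinking.append(text)
        elif block.type == "text":
            answer.append(block.text)
    return thinking, answer


# Stand-in blocks that mimic the shape of response.content (illustrative only).
demo = [
    SimpleNamespace(type="thinking", thinking=""),
    SimpleNamespace(type="text", text="Done."),
]
thinking_texts, answer_texts = visible_parts(demo)
```

In a UI, an empty `thinking_texts` list can simply hide the reasoning panel rather than rendering an empty one.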
Task Budgets (Opus 4.7 Beta)
If you are running an agent loop and want a global cap for the whole turn (covering thinking + tools + final output), use task_budget:
```python
# Reuses the client from the Python example above
response = client.beta.messages.create(
    betas=["task-budgets-2026-03-13"],
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[...],
)
```
The model sees a countdown of the remaining budget and self-regulates to avoid overshoot. The minimum total is 20,000.
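Since a total below the stated minimum is presumably rejected, it can be convenient to validate the value client-side before sending. This is a minimal sketch: `build_output_config` is a hypothetical helper, and the 20,000 floor is taken from the sentence above rather than from the API itself.

```python
MIN_TASK_BUDGET = 20_000  # minimum total stated in this guide


def build_output_config(effort, task_budget_total=None):
    """Build the output_config dict, validating the task budget locally."""
    config = {"effort": effort}
    if task_budget_total is not None:
        if task_budget_total < MIN_TASK_BUDGET:
            raise ValueError(
                f"task_budget total must be at least {MIN_TASK_BUDGET}, "
                f"got {task_budget_total}"
            )
        config["task_budget"] = {"type": "tokens", "total": task_budget_total}
    return config
```

The result can be passed as `output_config=build_output_config("high", 128000)` in the request shown above; omitting the second argument yields a config with effort only.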
Cost Observability Tips
Once adaptive thinking is enabled, both input and output tokens trend higher; monitor accordingly:
```python
print(response.usage)
# Watch: input_tokens, output_tokens, cache_read_input_tokens
# Adaptive thinking does not bill thinking tokens separately; they roll into output_tokens
```
For QCode.cc users, only requests sent through the Shenzhen direct endpoint (103.236.53.153/api) are reported to probe.qcode.cc, where you can see the input / output / thinking ratio for each request.
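The usage fields listed above can be condensed into a couple of ratios for logging or dashboards. The sketch below assumes that `input_tokens` does not already include cached reads, so the two are summed to get the total input; `summarize_usage` is a hypothetical helper, and the `SimpleNamespace` object stands in for `response.usage`.

```python
from types import SimpleNamespace


def summarize_usage(usage):
    """Reduce a usage object to a few numbers worth tracking per request."""
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    total_input = usage.input_tokens + cache_read  # assumes the fields are disjoint
    return {
        "total_input_tokens": total_input,
        "output_tokens": usage.output_tokens,  # includes thinking tokens
        "cache_read_share": cache_read / total_input if total_input else 0.0,
        "output_per_input": usage.output_tokens / total_input if total_input else 0.0,
    }


# Stand-in for response.usage (illustrative numbers only).
demo_usage = SimpleNamespace(
    input_tokens=600, output_tokens=500, cache_read_input_tokens=400
)
summary = summarize_usage(demo_usage)
```

Tracking `output_per_input` over time is one simple way to notice when a change of effort level shifts the token mix.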
FAQ
How do I convert from budget_tokens to adaptive thinking?
There is no linear conversion. budget_tokens=8000 does not map to a particular effort level. Start with effort: "medium" and tune based on answer quality and token usage. If you must enforce a hard cap, use task_budget (see above).
My request to Opus 4.7 returns 400 saying budget_tokens is not allowed
Replace every thinking={"type": "enabled", "budget_tokens": N} with thinking={"type": "adaptive"}. The budget_tokens field has been fully removed on Opus 4.7 (any value returns 400).
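When many call sites need the same change, the replacement described above can be applied mechanically to the request arguments. The following is a minimal sketch that assumes requests are assembled as plain keyword-argument dicts; `to_adaptive` is a hypothetical helper, not an SDK function.

```python
def to_adaptive(request_kwargs):
    """Return a copy of the request kwargs with legacy thinking replaced.

    A thinking dict of type "enabled" (with its budget_tokens) becomes
    {"type": "adaptive"}; everything else is passed through unchanged.
    """
    migrated = dict(request_kwargs)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        migrated["thinking"] = {"type": "adaptive"}
    return migrated


legacy = {
    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 8000},
}
migrated = to_adaptive(legacy)
```

The helper returns a new dict and leaves the input untouched, so it can wrap existing call sites, for example `client.messages.create(**to_adaptive(legacy), messages=...)`, without side effects.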
Should I keep using budget_tokens on Sonnet 4.6?
Not recommended. Use adaptive thinking for all new code. budget_tokens still works on 4.6 but is on a deprecated path and will be removed in a future version.
How do I change effort in the Claude Code CLI?
Toggle the effort shown in the CLI status bar with Tab; the persisted default lives under the "effort" field in ~/.claude/settings.json.
Next Steps
- Model Selection Guide — pricing and capability comparison across models
- Best Practices — prompting techniques for large contexts and agent workflows
- Endpoints and API Paths — full reference for QCode.cc's ingress domains
- Claude Code Tutorial — full CLI feature reference