Adaptive Thinking Configuration Guide

Adaptive thinking on Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5: thinking + effort parameters, differences from the legacy budget_tokens, and quality vs. cost trade-offs

Adaptive Thinking Configuration Guide

Adaptive Thinking is the next-generation reasoning mode in the Claude 4.6/4.7 family: the model itself decides when to invest in reasoning and how many tokens to spend, with no fixed budget set up front by the developer. The effort parameter offers coarse-grained control over reasoning depth. This guide covers the related API usage, the model support matrix, and the quality vs. cost trade-offs.

Key Differences vs. Legacy Extended Thinking

Dimension Legacy Extended Thinking (4.5 and earlier) Adaptive Thinking (4.6 / 4.7)
Trigger parameter thinking={"type": "enabled", "budget_tokens": N} thinking={"type": "adaptive"}
Budget control Fixed token cap (budget_tokens must be < max_tokens) Model self-regulates, optionally tuned by effort
Cross-tool-call behavior Not interleaved by default (requires the interleaved-thinking-2025-05-14 beta header) Interleaved automatically (no beta header needed)
Opus 4.7 compatibility 400 error (budget_tokens has been removed) ✅ The only supported mode
Sonnet 4.6 compatibility ⚠️ deprecated (still works but not recommended) ✅ Recommended

Short rule: use adaptive for the 4.6 / 4.7 model family; only use budget_tokens for 4.5 and earlier.

Model Support Matrix

Model Adaptive thinking budget_tokens effort parameter
claude-opus-4-7 ✅ Only way to enable ❌ 400 error ✅ low / medium / high / xhigh / max
claude-sonnet-4-6 ✅ Recommended ⚠️ Still works but deprecated ✅ low / medium / high
claude-opus-4-6 ✅ Recommended ⚠️ Still works but deprecated ✅ low / medium / high / max
claude-haiku-4-5 ✅ Supported ❌ Not supported ❌ effort parameter will error
claude-sonnet-4-5 and earlier ❌ Not supported ✅ Use this

When accessing through the QCode.cc gateway, all of the above models are available (verified empirically).

The effort Parameter: A Coarse Knob for Reasoning Depth

effort controls the overall intensity of adaptive thinking (covering thinking tokens, tool-call count, and answer thoroughness).

effort Best for Approximate cost multiplier
low Simple Q&A, sub-agents, latency-sensitive paths
medium Routine coding, doc lookup (most everyday tasks) 1.5-2×
high Complex reasoning, multi-file refactoring (default) 2-3×
xhigh Long-horizon agent workflows (Opus 4.7 exclusive, default in Claude Code) 3-4×
max Maximum quality, cost is secondary (Opus 4.6 / 4.7 only)

Rule of thumb: high is the sweet spot for most interactive coding tasks; xhigh offers the best price-performance for long agent loops (e.g., multi-step code review, deep research); max is worth it only when correctness is far more important than cost (e.g., production code generation, critical architecture decisions).

API Usage Examples

Python (anthropic SDK)

import anthropic

client = anthropic.Anthropic(
    base_url="http://103.236.53.153/api",
    api_key="cr_xxxxxxxx",
)

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "重构 user service 模块的认证逻辑..."}],
)

for block in response.content:
    if block.type == "thinking":
        print(f"[思考] {block.thinking}")
    elif block.type == "text":
        print(f"[回答] {block.text}")

TypeScript (@anthropic-ai/sdk)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://103.236.53.153/api",
  apiKey: "cr_xxxxxxxx",
});

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  output_config: { effort: "xhigh" },
  messages: [{ role: "user", content: "重构 user service..." }],
});

Claude Code CLI

In the CLI, toggle effort with the Tab key; the current effort level is shown in the status bar. Opus 4.7 defaults to xhigh — drop down to medium manually to save tokens.

Rendering Thinking Content

Opus 4.7 does not display thinking text by default (thinking.display: "omitted"); reasoning is internal-only. To surface the thinking trace in your UI:

thinking={"type": "adaptive", "display": "summarized"}

display: "summarized" tells the model to emit a thinking summary (visible thinking blocks); omitted only counts thinking tokens without producing visible text. Sonnet 4.6 / Haiku 4.5 default to summarized.

Task Budgets (Opus 4.7 Beta)

If you are running an agent loop and want a global cap for the whole turn (covering thinking + tools + final output), use task_budget:

response = client.beta.messages.create(
    betas=["task-budgets-2026-03-13"],
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[...],
)

The model sees a countdown of the remaining budget and self-regulates to avoid overshoot. The minimum total is 20,000.

Cost Observability Tips

Once adaptive thinking is enabled, both input and output tokens trend higher; monitor accordingly:

print(response.usage)
# Watch: input_tokens, output_tokens, cache_read_input_tokens
# Adaptive thinking does not bill thinking tokens separately — they roll into output_tokens

For QCode.cc users, requests are reported to probe.qcode.cc (Shenzhen direct endpoint 103.236.53.153/api only) — you can see the input / output / thinking ratio per request there.

FAQ

How do I convert from budget_tokens to adaptive thinking?

There is no linear conversion. budget_tokens=8000 does not map to a particular effort level. Start with effort: "medium" and tune based on answer quality and token usage. If you must enforce a hard cap, use task_budget (see above).

My request to Opus 4.7 returns 400 saying budget_tokens is not allowed

Replace every thinking={"type": "enabled", "budget_tokens": N} with thinking={"type": "adaptive"}. The budget_tokens field has been fully removed on Opus 4.7 (any value returns 400).

Should I keep using budget_tokens on Sonnet 4.6?

Not recommended. Use adaptive thinking for all new code. budget_tokens still works on 4.6 but is on a deprecated path and will be removed in a future version.

How do I change effort in the Claude Code CLI?

Toggle the effort shown in the CLI status bar with Tab; the persisted default lives under the "effort" field in ~/.claude/settings.json.

Next Steps

🚀
Get Started with QCode — Claude Code & Codex
One plan for both Claude Code and Codex, Asia-Pacific low latency
View Pricing Plans → Create Account
Team of 3+?
Enterprise: dedicated domain + sub-key management + ban protection, from ¥250/person/mo
Learn Enterprise →