Adaptive Thinking Configuration Guide

Adaptive thinking on Claude Opus 4.7 / Sonnet 4.6 / Haiku 4.5: thinking + effort parameters, differences from the legacy budget_tokens, and quality vs. cost trade-offs

Adaptive Thinking Configuration Guide ¶

Adaptive Thinking is the next-generation reasoning mode in the Claude 4.6/4.7 family: the model itself decides when to invest in reasoning and how many tokens to spend, with no fixed budget set up front by the developer. The effort parameter offers coarse-grained control over reasoning depth. This guide covers the related API usage, the model support matrix, and the quality vs. cost trade-offs.

Key Differences vs. Legacy Extended Thinking ¶

Dimension	Legacy Extended Thinking (4.5 and earlier)	Adaptive Thinking (4.6 / 4.7)
Trigger parameter	`thinking={"type": "enabled", "budget_tokens": N}`	`thinking={"type": "adaptive"}`
Budget control	Fixed token cap (`budget_tokens` must be < `max_tokens`)	Model self-regulates, optionally tuned by `effort`
Cross-tool-call behavior	Not interleaved by default (requires the `interleaved-thinking-2025-05-14` beta header)	Interleaved automatically (no beta header needed)
Opus 4.7 compatibility	❌ 400 error (`budget_tokens` has been removed)	✅ The only supported mode
Sonnet 4.6 compatibility	⚠️ deprecated (still works but not recommended)	✅ Recommended

Short rule: use adaptive for the 4.6 / 4.7 model family; only use budget_tokens for 4.5 and earlier.

Model Support Matrix ¶

Model	Adaptive thinking	budget_tokens	effort parameter
`claude-opus-4-7`	✅ Only way to enable	❌ 400 error	✅ low / medium / high / xhigh / max
`claude-sonnet-4-6`	✅ Recommended	⚠️ Still works but deprecated	✅ low / medium / high
`claude-opus-4-6`	✅ Recommended	⚠️ Still works but deprecated	✅ low / medium / high / max
`claude-haiku-4-5`	✅ Supported	❌ Not supported	❌ effort parameter will error
`claude-sonnet-4-5` and earlier	❌ Not supported	✅ Use this	❌

When accessing through the QCode.cc gateway, all of the above models are available (verified empirically).

The effort Parameter: A Coarse Knob for Reasoning Depth ¶

effort controls the overall intensity of adaptive thinking (covering thinking tokens, tool-call count, and answer thoroughness).

effort	Best for	Approximate cost multiplier
`low`	Simple Q&A, sub-agents, latency-sensitive paths	1×
`medium`	Routine coding, doc lookup (most everyday tasks)	1.5-2×
`high`	Complex reasoning, multi-file refactoring (default)	2-3×
`xhigh`	Long-horizon agent workflows (Opus 4.7 exclusive, default in Claude Code)	3-4×
`max`	Maximum quality, cost is secondary (Opus 4.6 / 4.7 only)	5×

Rule of thumb: high is the sweet spot for most interactive coding tasks; xhigh offers the best price-performance for long agent loops (e.g., multi-step code review, deep research); max is worth it only when correctness is far more important than cost (e.g., production code generation, critical architecture decisions).

API Usage Examples ¶

Python (anthropic SDK)¶

import anthropic

client = anthropic.Anthropic(
    base_url="http://103.236.53.153/api",
    api_key="cr_xxxxxxxx",
)

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "重构 user service 模块的认证逻辑..."}],
)

for block in response.content:
    if block.type == "thinking":
        print(f"[思考] {block.thinking}")
    elif block.type == "text":
        print(f"[回答] {block.text}")

TypeScript (@anthropic-ai/sdk)¶

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  baseURL: "http://103.236.53.153/api",
  apiKey: "cr_xxxxxxxx",
});

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  output_config: { effort: "xhigh" },
  messages: [{ role: "user", content: "重构 user service..." }],
});

Claude Code CLI ¶

In the CLI, toggle effort with the Tab key; the current effort level is shown in the status bar. Opus 4.7 defaults to xhigh — drop down to medium manually to save tokens.

Rendering Thinking Content ¶

Opus 4.7 does not display thinking text by default (thinking.display: "omitted"); reasoning is internal-only. To surface the thinking trace in your UI:

thinking={"type": "adaptive", "display": "summarized"}

display: "summarized" tells the model to emit a thinking summary (visible thinking blocks); omitted only counts thinking tokens without producing visible text. Sonnet 4.6 / Haiku 4.5 default to summarized.

Task Budgets (Opus 4.7 Beta)¶

If you are running an agent loop and want a global cap for the whole turn (covering thinking + tools + final output), use task_budget:

response = client.beta.messages.create(
    betas=["task-budgets-2026-03-13"],
    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[...],
)

The model sees a countdown of the remaining budget and self-regulates to avoid overshoot. The minimum total is 20,000.

Cost Observability Tips ¶

Once adaptive thinking is enabled, both input and output tokens trend higher; monitor accordingly:

print(response.usage)
# Watch: input_tokens, output_tokens, cache_read_input_tokens
# Adaptive thinking does not bill thinking tokens separately — they roll into output_tokens

For QCode.cc users, requests are reported to probe.qcode.cc (Shenzhen direct endpoint 103.236.53.153/api only) — you can see the input / output / thinking ratio per request there.

FAQ ¶

Model Selection Guide — pricing and capability comparison across models
Best Practices — prompting techniques for large contexts and agent workflows
Endpoints and API Paths — full reference for QCode.cc's ingress domains
Claude Code Tutorial — full CLI feature reference

← Previous

Permission Configuration

Automation & CI/CD

🚀

Get Started with QCode — Claude Code & Codex

One plan for both Claude Code and Codex, Asia-Pacific low latency

View Pricing Plans → Create Account

Team of 3+?

Enterprise: dedicated domain + sub-key management + ban protection, from ¥250/person/mo

Learn Enterprise →

Adaptive Thinking Configuration Guide