Image Input (Vision)

Feed images to Claude Code: paste, drag-and-drop, or reference a file path so the model can read screenshots, mockups, architecture diagrams, and charts. Powered by QCode.cc vision models — one API Key works across every endpoint.

Image Input (Vision)¶

Claude Code doesn't just read code — it can see images. Hand it a screenshot, a design mockup, an error screen, or an architecture diagram, and a vision-capable model will literally "read" what's in the picture, then act on it together with your prompt — turning a mockup into a UI, locating a bug from an error screenshot, or explaining an architecture diagram or data chart.

This page is about image input: you provide the picture, the model understands it. That is a different thing from "image generation."

⚠️ Don't confuse this with image generation This page is about feeding images to the model so it can understand them (vision input). If you want the AI to generate / edit images (posters, illustrations, UI mockups, cutout/background swaps), that's a separate capability — use gpt-image-2. See gpt-image-2 Image Generation. In one line: reading images = this page; drawing images = image-2.

Three ways to provide an image ¶

Inside a Claude Code session, there are three ways to give an image to the model:

Method	How	Best for
Paste	Copy an image, then press `Ctrl+V` into the input box	Images grabbed by a screenshot tool, anything in your clipboard
Drag-and-drop	Drag an image file straight into the terminal window	Image files you already have in a file manager
Reference a path	Write the image's file path in your prompt	Images already in the project, batch/scripted processing

On macOS, a screenshot goes to the clipboard by default, so Ctrl+V works right away; if the screenshot was saved as a file, drag-and-drop or a path is more reliable. Paste and drag support varies by terminal / OS — if it isn't recognized, fall back to "reference a file path," which is the most reliable option.

Reference-a-path examples ¶

> Look at the mockup ./design/dashboard.png and implement it with React + Tailwind
> What's causing this error screenshot @screenshots/build-error.png, and how do I fix it?
> Explain the system architecture in docs/architecture.png — how do the components interact?

Referencing a path is reproducible: put it in a script or a prompt template and it points to the exact same image every time, with no manual pasting.

Typical use cases ¶

1. Mockup / screenshot → UI code ¶

Give Claude Code a mockup or a screenshot of an existing page and have it reproduce runnable frontend code.

> Here's the mockup @mockups/pricing-page.png. Implement it with Next.js + Tailwind.
  Match the layout, spacing, and colors as closely as possible, with sensible component splits.

The model reads the layout, copy, and colors from the image and generates the matching component code. Fidelity depends on how clear the image is and on the constraints in your prompt (framework, styling system, naming conventions, etc.).

2. Error screenshot → locate and fix ¶

Those red errors in your IDE, browser console, or terminal — just screenshot them instead of retyping a long stack trace.

> @screenshots/ts-error.png How do I fix this TypeScript error? The relevant code is in src/api/client.ts

The model reads the error message and stack from the screenshot and, combined with the code file you point to, suggests a fix.

3. Architecture diagram / data chart → understand and explain ¶

Hand it an architecture diagram, sequence diagram, flowchart, monitoring dashboard, or data chart, and have it explain, extract data, or write code / docs from it.

> Explain the data-flow diagram @diagrams/data-flow.png and draft an API doc based on it
> Read this monitoring screenshot @screenshots/latency.png — what's the p95 latency range? Any abnormal spikes?

Which models can see images ¶

Vision is a property of the model — not every model can read images. Vision-capable models available via QCode.cc:

Model	Notes
`claude-opus-4-8`	Flagship; strong vision + reasoning, first pick for complex mockups / charts
`claude-sonnet-4-6`	Balanced choice; great value for everyday image reading / UI reproduction
GPT-5.x (`gpt-5.5` / `gpt-5.4` / `gpt-5.4-mini`)	OpenAI-family vision models with strong multimodal understanding

Lightweight text- / code-only models may not support image input. Feed an image to a non-vision model and the image is ignored or errors out — pick one of the models above. For switching models, see Model Selection.

QCode.cc's advantage: one cr_ API Key covers every protocol and model. The same key works with Claude vision models and GPT-5.x alike, switching on demand without changing accounts. For endpoints and protocol paths, see Endpoints & API Formats (users in China should prefer asia.qcode.cc).

Practical tips ¶

Keep images clear: too low a resolution or blurred text means the model can't read it accurately. Use originals for mockups and error screenshots; don't over-compress.
Don't pile on too many images at once: more and larger images burn more tokens and make it harder to stay on point. Focus on one or two key images at a time.
Spell out your intent in text: state next to the image what you want it to do ("reproduce in React," "find the cause of the error," "extract the table data") — far better than dropping an image alone.
Redact sensitive info first: mask keys, tokens, internal addresses, user data, etc. in the screenshot before sending it.
Use paths for batch scenarios: to process images in scripts / automation, "reference a file path" is the most controllable approach; combined with the headless mode in Automation & CI/CD you can batch-process.

gpt-image-2 Image Generation — generate / edit images (the opposite of "reading images" here)
Model Selection — pick a vision-capable model
Endpoints & API Formats — endpoints, protocol paths, self-test methods
Automation & CI/CD — batch-feed images in scripts

With a single QCode.cc API Key, enjoy Claude vision models and GPT-5.x multimodal power, with quota shared across every endpoint. See plans & pricing →

← Previous

Adaptive Thinking Configuration Guide

gpt-image-2 Image Generation and Editing

Image Input (Vision)

Related Documents