Blog / Claude Code

9 Ways to Reduce Token Usage in Claude Code

I burn less context per session since fixing these nine habits in Claude Code — from /clear and /compact to swapping screenshots for structured UI context.

Published July 4, 2026

Picture this: you’re twenty minutes into a Claude Code session, deep in a refactor, and it stops mid-thought to tell you the context window is almost full — do you want to compact? Claude Code’s version of “we need to talk,” always timed for the least convenient moment.

I used to think the fix was just… typing less. Shorter messages, fewer questions. It barely moved the needle, because the conversation itself was never the expensive part — it’s what gets loaded alongside it. Full files read “just in case,” a CLAUDE.md that had quietly grown into a wiki, three MCP servers registering tools I hadn’t touched in weeks. The chat history was the smallest line item the whole time.

Once I started treating context like a budget instead of an afterthought, sessions got a lot longer before that compact prompt showed up. Here’s what actually moved the number.

1. Use /clear between unrelated tasks

Every message carries the full conversation history forward — that’s how Claude tracks what it’s already read and changed. Finish a task and jump to something unrelated, and that old history is now just dead weight riding along on every turn.

/clear

I run this the way I’d close a browser tab I’m done with. It doesn’t touch your files, only what Claude’s holding in memory.

2. Use /compact before it’s forced on you

If you’re mid-task and the window’s filling up, don’t wait for Claude Code to compact at the worst possible moment — do it yourself:

/compact

It condenses the conversation into a summary and keeps going from there. Less thorough than a clean /clear, but it preserves the task-specific context you still need.

3. Keep CLAUDE.md scoped, not comprehensive

CLAUDE.md loads into context at the start of every session in that project. It’s tempting to let it grow into full documentation — architecture diagrams, API references, style guides — but every line in there is a tax on every session, forever, whether that session needs it or not.

A good CLAUDE.md is short: build commands, test commands, conventions Claude has gotten wrong before. Anything you’d look up in a wiki instead of memorize probably doesn’t belong there.

4. Read narrowly, not broadly

“Look at the auth module” invites Claude to read several files it didn’t need. Point it at the specific file, ideally the specific range:

Read src/auth/session.ts:42-88 and check whether the refresh logic
handles expired tokens correctly.

This is the single biggest lever in a large repo — file contents dominate context usage far more than the back-and-forth ever does.

5. Delegate exploration to sub-agents

Claude Code can spin up sub-agents to do open-ended searching in an isolated context, then report back a summary. A sprawling “find every place this function is called” search doesn’t fill up your main session with dozens of file reads this way — only the final answer does.

Use it for genuinely exploratory work, not lookups you could do yourself with grep in ten seconds.

6. Audit what your MCP servers are exposing

Every connected MCP server registers its tools’ names, descriptions, and parameter schemas into context — before you’ve called a single one. A few heavyweight servers can quietly cost thousands of tokens per session just sitting idle, like paying rent on a room you never open.

Disconnect what you’re not using for the current project. If your client supports deferred tool loading, prefer that over eagerly loading everything upfront — the Model Context Protocol spec doesn’t require every server to front-load its full tool list, it’s just the common default behavior.

Five separate messages — “now fix the button,” “now fix the spacing,” “now fix the color” — each drag the full session history forward again. If the changes are related, describe them together and let Claude plan the pass once.

8. Watch /cost, don’t guess at it

/cost

I check this before and after anything that touches a lot of files. It’s the difference between noticing a context-heavy habit after one session versus after fifty. All of these commands are covered in Claude Code’s own documentation if you want the full reference rather than just the ones I reach for most.

9. Send structured UI context instead of screenshots

This one’s specific to frontend work, but it adds up fast if that’s your day job. A screenshot forces Claude to visually interpret pixels, guess at the underlying selector, and often ask a follow-up before it can make the right edit — and each round trip re-sends the growing conversation with it.

This is the exact gap that got me building UICuts in the first place: point at any element in the browser, and it copies out the selector, computed styles, and DOM hierarchy as text — ready to paste into Claude Code instead of a screenshot. No guessing, no “which button did you mean,” no second retry just to nail the class name.

Key lessons learned

  • The chat history is rarely the expensive part — what you load into it is.
  • CLAUDE.md is a recurring cost, not a one-time setup file. Treat it like one.
  • Screenshots are a slow, lossy way to describe a UI change to a model that would rather read the actual selector.

None of these is a silver bullet alone. Stack them and a session that used to hit the compact wall in twenty minutes will run a lot longer before Claude asks if you two need to talk.

If you’re not exclusively a Claude Code shop, the same discipline carries over with different names: Cursor has its own context-length settings and a Rules system, Windsurf splits persistent context into Memories vs. Rules, and GitHub Copilot leans on a .github/copilot-instructions.md file for the same job CLAUDE.md does here. If you’re writing your own Claude Skills, the load-on-demand principle in step 6 applies there too — and if you’re calling Claude through the raw API instead of Claude Code, the prompt-engineering-level tactics pick up where this post leaves off.

Install UICuts free if frontend feedback is part of your Claude Code loop — takes about thirty seconds, no account required.

Frequently asked

Does Claude Code have a token limit? +

Claude Code runs on the underlying model's context window (e.g. 200K tokens for most Claude models). Once a session fills up, Claude Code prompts you to compact or clear before continuing.

What does /compact actually do? +

/compact asks Claude to summarize the current conversation and continues the session from that summary instead of the full history — frees up space without starting the task over.

Does /clear delete my code changes? +

No. /clear only resets what Claude remembers about the conversation. Your files on disk don't move.

How can I check my token usage in Claude Code? +

Run /cost in a session for token usage and estimated spend. /context (where available) breaks down exactly what's occupying the window.

Do CLAUDE.md files cost tokens? +

Yes — every line loads on every session in that project. A bloated CLAUDE.md is a recurring tax, not a one-time cost.

Does connecting MCP servers increase token usage? +

Yes. Each connected server registers its tools' names and schemas into context before you've called any of them. Fewer, more relevant servers keep that overhead down.

Keep reading

Less guessing.
Faster fixes!

Stop burning time on vague prompts and AI retries that miss the point.

Start using UICuts

Free plan available · No credit card · 30-second setup