How to Reduce Token Usage in GitHub Copilot

Copilot's seat-based pricing hides the token cost, but it still shows up as slower, shallower answers. Here's what actually fixes it.

Published July 4, 2026

Copilot doesn’t show you a running token counter the way a metered API would, so most people never think about it — until a chat session starts feeling sluggish, or Copilot confidently “forgets” something you told it three messages ago. That’s the token budget showing up sideways: not as a bill, but as worse answers.

I went looking for why one Copilot Chat session felt sharp and another felt like talking to someone half-listening, and it came down to the same thing every time: how much of the material sitting in context had nothing to do with the actual question.

1. Keep .github/copilot-instructions.md short and specific

This is Copilot’s version of an always-loaded system prompt for your repo: it gets pulled into context automatically on every chat request. Treat it like any other persistent context and limit it to build commands, naming conventions, and things Copilot has gotten wrong before, not a full architecture writeup.

# .github/copilot-instructions.md
- Use TypeScript strict mode; no `any` without a comment explaining why.
- Run `npm test` before suggesting a fix is complete.
- Components live in src/components, one per file, PascalCase.

GitHub’s own custom instructions docs cover repo-wide, path-specific, and agent instruction files if you want to split this up further.
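
If you want a quick sanity check on how heavy that file has become, a rough estimate is enough. The sketch below assumes about four characters per token, which is a common rule of thumb rather than Copilot's actual tokenizer, and the 500-token budget is a number I picked for illustration, not a documented limit.

```python
from pathlib import Path

# Rough heuristic only: ~4 characters per token for English text and code.
# Real tokenizers vary by model, so treat the result as an order of magnitude.
CHARS_PER_TOKEN = 4

# Hypothetical budget chosen for illustration; not a Copilot limit.
TOKEN_BUDGET = 500


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a block of text."""
    # Ceiling division so a short non-empty string still counts as one token.
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def check_instructions(path: str = ".github/copilot-instructions.md") -> str:
    """Report the estimated size of the instructions file against the budget."""
    file = Path(path)
    if not file.exists():
        return f"{path}: not found"
    tokens = estimate_tokens(file.read_text(encoding="utf-8"))
    status = "ok" if tokens <= TOKEN_BUDGET else "over budget, consider trimming"
    return f"{path}: ~{tokens} tokens ({status})"


if __name__ == "__main__":
    print(check_instructions())
```

Run it from the repo root. If the estimate climbs well past your budget, that's usually a sign the file has turned into documentation rather than instructions.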

2. Close tabs you’re not working in

Copilot pulls context from open tabs to inform suggestions. Twenty unrelated files open means twenty files’ worth of noise weighed against the one you’re actually editing. Closing tabs outside your current task is free and immediate.

3. Reach for @workspace on purpose, not by habit

@workspace searches the whole repo before answering — genuinely useful when you don’t know where something lives, overkill when you already do.

@workspace Where is session refresh handled?      ← broad search, use when unsure
#file:src/auth/session.ts How does refresh work?   ← direct, cheaper

4. Pick the model for the task, not the biggest one by default

Not every request needs the largest model in the picker. Quick completions and boilerplate are usually fine on faster, smaller models — save the heavy one for multi-file reasoning where the extra context handling actually earns its keep.

5. Scope chat sessions the way you’d scope anything else

A fresh Copilot Chat conversation per unrelated task beats one long thread that’s drifted across three different features. Twenty messages about three unrelated things means most of that history is irrelevant to whatever you ask next, yet it still rides along into every new response. Copilot CLI’s context management docs cover the same idea, including the /context command for checking what’s actually loaded.
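
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the simplest possible model: every request resends the full thread so far, each turn averages a fixed number of tokens, and nothing gets summarized or truncated. Copilot's real context handling is more involved, and the 300-token average is a made-up figure, so read this as the shape of the growth rather than a measurement.

```python
def total_prompt_tokens(message_count: int, tokens_per_turn: int) -> int:
    """Total tokens sent across a thread if every request resends all prior turns.

    Request k carries k turns' worth of tokens, so the total is
    tokens_per_turn * (1 + 2 + ... + message_count).
    """
    return tokens_per_turn * message_count * (message_count + 1) // 2


TOKENS_PER_TURN = 300  # hypothetical average, for illustration only

# One 20-message thread covering three unrelated tasks...
one_long_thread = total_prompt_tokens(20, TOKENS_PER_TURN)

# ...versus the same 20 messages split into three scoped conversations.
three_scoped_threads = sum(
    total_prompt_tokens(n, TOKENS_PER_TURN) for n in (7, 7, 6)
)

print(one_long_thread)       # 63000
print(three_scoped_threads)  # 23100
```

Under those assumptions the single long thread pushes nearly three times as many tokens through the model for the same twenty messages, because history grows with every turn.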

6. Prefer selections over full-file pastes

Highlighting the exact function and asking about that selection beats pasting the whole file and asking Copilot to “find the bug.” The bigger the pasted context, the more it has to sift through to find what actually matters.
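
When you're working outside the editor and can't just highlight a selection, the same idea can be scripted. This sketch pulls a single function out of a Python file using the standard library's ast module. It only handles Python source, and the sample module and function names are invented for the example.

```python
import ast
from typing import Optional


def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source text of the named function, or None if it is absent.

    Decorators are not included; the segment starts at the `def` line.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            return ast.get_source_segment(source, node)
    return None


# Hypothetical file contents, standing in for a much larger module.
SAMPLE = '''\
import math


def area(radius):
    return math.pi * radius ** 2


def perimeter(radius):
    return 2 * math.pi * radius
'''

snippet = extract_function(SAMPLE, "area")
print(snippet)
print(f"{len(snippet)} of {len(SAMPLE)} characters")
```

On a real file the gap between the snippet and the whole file is usually far larger than in this toy sample, which is the point of asking about the selection.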

7. Send structured element data instead of a screenshot for UI fixes

Describing a UI change with a screenshot makes Copilot interpret the image and infer the markup and styles underneath it — often needing a follow-up just to get the selector right. Give it the actual selector and computed styles as text and that step disappears.

That’s the whole premise behind UICuts: point at any element on a live page, and it exports the selector, styles, and hierarchy as text you paste straight into Copilot Chat instead of a screenshot.
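
For a sense of what "element data as text" can look like, here is a minimal sketch of a formatter. The layout, the function name, and the sample element are all made up for illustration. This is not UICuts' actual export format, just one plausible shape for the same information.

```python
def describe_element(selector: str, styles: dict[str, str], ancestors: list[str]) -> str:
    """Format element details as plain text suitable for pasting into a chat prompt."""
    lines = [f"Selector: {selector}"]
    # Outermost ancestor first, ending at the element itself.
    lines.append("Hierarchy: " + " > ".join(ancestors + [selector]))
    lines.append("Computed styles:")
    lines.extend(f"  {prop}: {value}" for prop, value in styles.items())
    return "\n".join(lines)


# Hypothetical element, for illustration only.
prompt = describe_element(
    selector="button.checkout-submit",
    styles={"padding": "4px 8px", "font-size": "12px"},
    ancestors=["main", "form#checkout"],
)
print(prompt)
```

A few lines of text like this hand Copilot the selector and current values directly, so there's nothing left to infer from pixels.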

Key lessons learned

  • Copilot’s cost doesn’t show up as a bill, it shows up as slower and shallower answers — same root cause, different symptom.
  • Open tabs are context whether you meant them to be or not. Close what you’re not using.
  • @workspace is a search, not a default — use it when you actually need one.

Every AI coding tool pays the same tax for irrelevant context, just under different names: Claude Code has CLAUDE.md plus /clear and /compact, Cursor has Rules and @file, Windsurf splits it into Memories and Rules, and OpenCode leans on provider/model choice more than any of them. If you’re building your own Copilot extensions or MCP integrations, the MCP-specific tactics are worth a look too.

Install UICuts free if UI feedback is part of your Copilot workflow.

Frequently asked

Does GitHub Copilot use tokens like the OpenAI API?

Yes. Copilot Chat and agent mode run on LLMs, and every request consumes tokens for the prompt (your message, open files, instructions) plus the response. Billing is seat-based rather than metered, but usage still affects latency and answer quality.

What are copilot-instructions.md files for?

A .github/copilot-instructions.md file gives Copilot persistent, repo-specific context instead of repeating it every chat. Like any always-loaded file, keep it short and specific.

Does having many files open in my editor affect Copilot's context?

Yes. Copilot uses open tabs and nearby code as ambient context. Closing unrelated files cuts the irrelevant material it has to weigh.

Does switching models in Copilot Chat change token usage?

The model picker lets you choose models with different context windows and costs. Smaller/faster models suit simple completions; save the larger model for tasks that need deep reasoning across more context.

Does @workspace cost more tokens than referencing a single file?

Yes. @workspace triggers a broader repo search to gather context, which costs more than pointing at a specific file. Use it when you don't know where the code lives, not as a default habit.
