Context has a real cost in Claude Code: system prompts, enabled tools, your messages, and large inputs (like a repomix bundle) all consume tokens from the context window that you pay for.
/context
You’ll get a breakdown: system/tool overhead, messages so far, and free space remaining.
Prefer precise context (Lesson 03) first; use big bundles only when necessary.
Batch expensive operations: run repomix once, then work within that bundle.
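Before bringing in a large bundle, it can help to ballpark its token cost. The sketch below uses the common "~4 characters per token" rule of thumb; Claude's actual tokenizer will count differently, so treat the result as a rough estimate only (the function names here are illustrative, not part of any tool).

```python
from pathlib import Path

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count using the ~4 chars/token heuristic."""
    return round(len(text) / chars_per_token)

def estimate_file_tokens(path: str) -> int:
    """Estimate the token cost of a file, e.g. a repomix output bundle."""
    return estimate_tokens(Path(path).read_text(encoding="utf-8"))

# Example: a 48,000-character bundle is roughly 12,000 tokens,
# in line with the repomix cost observed in the lesson.
print(estimate_tokens("x" * 48_000))  # → 12000
```

If the estimate is a large fraction of your free space from `/context`, narrow the bundle (fewer directories, fewer file types) before running it.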
Use a larger-window model when you truly need it:
/model sonnet[1m]
Tradeoffs exist: more context can improve answers but may increase cost and cause history to compress.
[00:00] Run /context to see how much space you have left in your conversation before it starts compressing and you lose valuable knowledge. Also, the longer the context gets, the harder it is for Claude Code to zero in on the task you gave it. If we look at the context of an empty session for my current setup, you'll see I'm using the default model on the Claude Max plan with Opus, which has a total context window of 200,000 tokens. Simply by starting a conversation you lose about 17,000 tokens. A token is essentially just a few characters, like "ide".
[00:36] Just think of tokens as chunks of text. Once you start adding MCP tools, which we'll cover later (I have Playwright and DeepWiki installed globally), Playwright alone takes up another 10,000-plus tokens. Add in custom agents, which we'll dig into soon, and we're left with only about 170,000 tokens, because we've already spent 31,000 tokens of our context window. So when I talk about the cost of running a command like this repomix command ("repomix the source directory and summarize these files"), I'll let it run. Once it's done, we can check our context again, and you'll see the cost of that one repomix command was about 12,000 tokens, because we brought in the entire source directory. So keep this in the back of your mind when you bring in a huge bulk of text, whether from documentation or from entire codebases: you still need to put limits on it to fit everything into the context window and still get good results.
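The budget arithmetic from the walkthrough can be sketched directly, using the approximate figures mentioned (a 200k window, ~31k of baseline overhead, ~12k for one repomix run):

```python
# Context-budget arithmetic with the rough figures from the lesson.
WINDOW = 200_000       # Opus context window (tokens)
BASELINE = 31_000      # system prompt + MCP tools + custom agents
REPOMIX_RUN = 12_000   # one repomix bundle of the source directory

remaining = WINDOW - BASELINE - REPOMIX_RUN
print(f"{remaining:,} tokens left")  # → 157,000 tokens left
```

The point of the exercise: a single bulk import can cost as much as the entire tool/agent overhead combined, so every large paste should earn its place.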
[01:37] Now, in the models, you can switch over to Sonnet, which has a 1-million-token context mode. If we check our context again, you'll see we now have a much bigger budget. But be aware: the more context you use, the faster you hit your rate limits and the faster you burn through your context budget. So it will always be a trade-off: more context for better solutions, versus running out of context and incurring cost against rate limits, or against the conversation having to compress itself.
[02:12] There is no silver bullet here, but there are a lot of context-engineering tips and tricks we'll go over in future lessons.