How to Save Thousands of Tokens Per Message in Claude Code
Your CLAUDE.md is probably costing you thousands of messages a month. I cut mine from 40K to 4.8K characters and lost zero functionality. Here is the maths and the method.
Your CLAUDE.md is probably costing you thousands of messages a month. Mine was 40,000 characters of server management instructions, injected into every single message. I cut it to 4,800 characters and lost zero functionality. Here's exactly how, and the maths on what it saved.
A bloated CLAUDE.md file gets injected as context with every message you send, eating your token budget silently. I restructured mine from 948 lines to 89 lines using a 4-tier instruction architecture. The result: ~8,800 fewer input tokens per message, which translates to roughly 2,500-3,000 extra messages per month on a Pro plan. Same functionality, same safety, fraction of the cost.
Why Does Your CLAUDE.md Size Matter?
Every time you send a message in Claude Code, your project's CLAUDE.md gets loaded into the conversation context. Every. Single. Message. It's not a one-time cost. It's not cached between turns. It's fresh input tokens, charged against your usage, on every interaction.
For most people with a small CLAUDE.md, this is negligible. A couple hundred characters of "use TypeScript, prefer functional components" costs basically nothing.
But if you're doing anything serious with Claude Code, your CLAUDE.md grows. Fast.
Mine managed a production VPS with 20+ Docker containers, multiple databases, deployment pipelines, safety hooks, git standards, monitoring systems, and coordination protocols for multi-device access. It had grown organically over weeks into a 948-line, 40,000-character beast.
Key takeaway: If your CLAUDE.md is over 5,000 characters, you're likely burning thousands of tokens per message on instructions that could be loaded on demand instead.
How Much Does a Bloated CLAUDE.md Actually Cost?
Let's do the maths. One token is roughly 4 characters of English text.
| Metric | Before | After | Savings |
|---|---|---|---|
| CLAUDE.md size | 40,175 chars | 4,856 chars | 88% smaller |
| Tokens per message | ~10,000 | ~1,200 | ~8,800 fewer |
| Monthly token burn (1,350 messages) | ~13.6M | ~1.6M | ~12M tokens saved |
| Extra messages per month | baseline | +2,500 to 3,000 | Real conversations, not wasted context |
That last row is the one that matters. Those 12 million tokens I was burning on repeated instructions could instead be actual work. Actual code reviews, actual deployments, actual problem-solving.
On a Pro plan where you're bumping up against usage limits, that's the difference between running out of messages at 3pm and having capacity left for evening work.
Key takeaway: A 40K CLAUDE.md costs roughly 10,000 input tokens per message. Over a month of active use, that's 12 million tokens you're paying for instructions the model already knows.
What Was Actually in Those 948 Lines?
Before I could cut anything, I had to understand what was in there and why. Here's what my CLAUDE.md had accumulated:
Safety rules that hooks already enforce (database protection, destructive command blocking, git safety). The hooks block dangerous operations regardless of what CLAUDE.md says. Having the rules written out in CLAUDE.md was belt-and-suspenders, costing tokens for zero additional safety.
Procedure documentation for deployments, backups, and git workflows. These were step-by-step instructions that only matter when you're actually doing that operation, not on every single message.
Infrastructure reference for every service, every port, every connection string. Useful when connecting to ZeroMemory's database. Completely irrelevant when writing a commit message.
Protocol documents for blackboard coordination, lessons-learned workflows, and code quality standards. Important, but not 200 lines of important-every-message.
SSH commands, MCP configuration, slash command tables, agent role descriptions. All things that either self-document at runtime or only matter during specific operations.
The pattern was clear: about 10% of the content was needed every message (identity, core rules, routing). The other 90% was needed occasionally, during specific operations.
How I Structured the Fix: 4-Tier Instruction Architecture
I built what I'm calling a tiered instruction system. The core idea: only load what you need, when you need it.
Tier 1: Hooks and Settings (Always Active, Zero Tokens)
These are the deterministic safety rails. They run as code before and after every tool call. No CLAUDE.md text needed.
- PreToolUse hooks block dangerous commands (DROP TABLE, rm -rf, force push)
- Settings files define permission boundaries (what's allowed, what requires confirmation)
- PostToolUse hooks log changes automatically
If a hook blocks something, it blocks it. Doesn't matter what CLAUDE.md says. So remove all the safety documentation from CLAUDE.md and let the hooks do their job.
Token cost: 0. Hooks run as shell scripts, not as context tokens.
Tier 2: Skills and Agents (Loaded On Demand)
Skills are markdown files that get loaded only when invoked. My /deploy skill has the full blue-green deployment procedure, SSH commands, health check logic, and rollback steps. But it only enters the context when someone types /deploy.
Same with /backup, /release, /git-workflow, and 20 other skills. Each one is self-contained with all the context it needs.
Token cost: 0 until invoked, then only that skill's tokens.
Tier 3: CLAUDE.md (Every Message, Keep It Tiny)
This is the only tier that costs tokens on every message. So it contains only:
- Identity: What server, what IP, what SSH aliases
- Doctrine: The 12 core rules that actually need to be in every conversation
- Routing: Which skill or document to read for which operation
- Pointers: File paths to Tier 4 docs for detailed reference
That's it. No procedures. No infrastructure detail. No protocol documents. Just enough to route correctly and make good decisions.
Tier 4: Operational Specs (Read When Needed)
Everything else lives in standalone documents that get read on demand:
docs/code-quality-standards.mdfor anti-patterns and testing rulesdocs/zeromemory-session-protocol.mdfor memory managementstate/connections.mdfor every port, database, tunnel command, and API endpointstate/blackboard.protocol.mdfor coordination rulesdocs/config-sync.mdfor repo-to-server synchronisation
The model reads these files when the task requires them. Writing code? It'll read the quality standards. Deploying? It'll read the connection reference and deployment skill. Just answering a question about git? It reads nothing extra.
Token cost: 0 unless the specific document is needed.
[IMAGE: diagram showing the 4 tiers as layers, with token cost on the right side]
- Type: diagram
- Filename: zerotoken-4-tier-architecture.png
- Alt text: Four-tier instruction architecture showing hooks at the base (zero tokens), skills and agents above (on-demand), CLAUDE.md in the middle (every message, kept small), and operational specs at the top (read when needed)
- Caption: The 4-tier instruction architecture. Only Tier 3 costs tokens every message.
How I Actually Did the Migration
This wasn't a weekend project, but it wasn't months of work either. Here's the process:
-
Backed up the original. Copied the full 948-line CLAUDE.md to an archive. Non-negotiable first step.
-
Audited every section. For each block of content, I asked: "Is this enforced by a hook? Is this in a skill? Does this need to be in every message?" If no to all three, it got extracted.
-
Extracted operational specs. Moved code quality standards, current-project workflow rules, and detailed protocols into standalone docs.
-
Added preflight reads to skills. Each skill got explicit "read this file before starting" steps, so they pull their own context.
-
Wrote the new CLAUDE.md. 89 lines. Doctrine, routing, pointers. Nothing else.
-
Ran a full simulation. Spawned 5 test agents to verify every routing path worked, every pointer resolved, every safety hook still blocked what it should.
-
Iterated on gaps. The simulation found issues. Missing pointers, overly aggressive hook patterns, lost operational knowledge. Fixed them all.
The simulation step was crucial. Without it, I would have shipped blind spots. The agents traced 15 common scenarios ("deploy an app", "create a backup", "troubleshoot a failing service") and flagged every dead end.
What Could Go Wrong (and How I Handled It)
The obvious risk: if you remove instructions from CLAUDE.md, the model might not know to read the replacement document.
This is real. My simulation found that the model couldn't discover health monitor configuration because there was no CLAUDE.md pointer to it. It found that ZeroMemory's architecture details were orphaned in a memory file with no route from CLAUDE.md.
The fix is pointers. Every operational spec needs a one-line entry in CLAUDE.md that tells the model what it is and where to find it. Without the pointer, the document might as well not exist.
The other risk: safety rules that were instructional, not enforced. My old CLAUDE.md said "never commit .env files." But no hook actually blocks that. Removing the text from CLAUDE.md would remove the instruction entirely.
The fix is honesty. My new Safety section explicitly says which rules are hook-enforced and which require discipline. No pretending everything is automated when it isn't.
How to Do This for Your Own Project
You probably don't need the full 4-tier architecture. But you can apply the principle immediately:
1. Measure your CLAUDE.md. Run wc -c CLAUDE.md. If it's over 5,000 characters, you have room to optimise.
2. Categorise every section. For each block, label it:
- ALWAYS (identity, core rules, routing decisions)
- SOMETIMES (procedures for specific operations)
- RARELY (reference material, detailed specs)
3. Move SOMETIMES and RARELY content to separate files. Put them in docs/ or wherever makes sense for your project.
4. Replace with pointers. One line in CLAUDE.md: - Database setup guide: docs/database-setup.md
5. Test the routing. Ask Claude to do something that requires the moved content. Does it know to read the file? If not, adjust your pointer text.
That's it. You don't need hooks or skills or a simulation framework. Just measure, categorise, extract, pointer, test.
Frequently Asked Questions
Q: Won't the model lose context if I move instructions out of CLAUDE.md? A: Only if you don't leave a pointer. The model follows file path references. A one-line pointer costs ~20 tokens. The full procedure it replaces might cost 2,000. That's a 99% saving on content that's only needed occasionally.
Q: How do I know which rules need to stay in CLAUDE.md? A: If the rule affects every interaction (like "use TypeScript" or "always read the manifest first"), it stays. If it only matters during specific operations (like "here's the backup procedure"), it moves to a separate file.
Q: Does this work with Claude Pro, not just Claude Code? A: The principle applies anywhere you're injecting system prompts or project instructions. If you're pasting a long system prompt into every conversation, the same categorise-and-extract approach works. The savings are proportional to how bloated your instructions are.
Q: What about the global CLAUDE.md in ~/.claude/? A: Same principle applies. That file also gets loaded every message. If it's large, apply the same tier approach. Keep identity and routing in the global file, move detailed instructions to project-level docs.
Q: Is 89 lines the right target? A: There's no magic number. The target is: only content that genuinely needs to be in every message. For me that was 89 lines. For a simpler project it might be 20. For a complex multi-service platform it might be 150. The principle is the same regardless.
The Numbers
Before and after:
| What | Before | After |
|---|---|---|
| CLAUDE.md | 948 lines, 40K chars | 89 lines, 4.8K chars |
| Token cost per message | ~10,000 | ~1,200 |
| Monthly token savings | — | ~12 million |
| Extra messages per month | — | ~2,500 to 3,000 |
| Functionality lost | — | Zero |
| Safety degraded | — | No (hooks enforce it) |
| New docs created | — | 6 operational specs |
| Connection reference | Scattered across 5 files | Single consolidated file |
88% reduction. Same safety. Same functionality. Thousands more messages per month.
If you're spending real money on Claude usage, or constantly hitting rate limits, this is low-hanging fruit. Your CLAUDE.md is probably the biggest single source of wasted tokens in your workflow, and you can fix it in an afternoon.
Ready to try this yourself? Download the companion agent file below. Drop it into Claude Code and it'll walk you through auditing and restructuring your own CLAUDE.md, step by step.
[Download: jimmy-goode-zerotoken-claude-instructions.md]
Want more practical guides like this? Join the newsletter for weekly content on building with AI tools. No theory, just the stuff that actually saves you time and money.
[Subscribe to the newsletter]