Won't the model lose context if I move instructions out of CLAUDE.md?

Only if you don't leave a pointer. The model follows file path references. A one-line pointer costs roughly 20 tokens. The full procedure it replaces might cost 2,000. That's a 99% saving on content that's only needed occasionally.

How do I know which rules need to stay in CLAUDE.md?

If the rule affects every interaction (like 'use TypeScript' or 'always read the manifest first'), it stays. If it only matters during specific operations (like 'here's the backup procedure'), it moves to a separate file.

Does this work with Claude Pro, not just Claude Code?

The principle applies anywhere you're injecting system prompts or project instructions. If you're pasting a long system prompt into every conversation, the same categorise-and-extract approach works. The savings are proportional to how bloated your instructions are.

What about the global CLAUDE.md in ~/.claude/?

Same principle applies. That file also gets loaded every message. If it's large, apply the same tier approach. Keep identity and routing in the global file, move detailed instructions to project-level docs.

Is 89 lines the right target?

There's no magic number. The target is: only content that genuinely needs to be in every message. For a simpler project it might be 20. For a complex multi-service platform it might be 150. The principle is the same regardless.

Back to VPS & Infra

Jimmy Goode/19 March 2026/Updated 20 April 2026/11 min read/VPS & Infra

How to Save Thousands of Tokens Per Message in Claude Code

Your CLAUDE.md is probably costing you thousands of messages a month. I cut mine from 40K to 4.8K characters. Here is how.

claude-code tokens optimization AI-tools

Your CLAUDE.md is probably costing you thousands of messages a month. Mine was 40,000 characters of server management instructions, injected into every single message. I cut it to 4,800 characters and lost zero functionality. Here's exactly how, and the maths on what it saved.

Last updated: 2026-03-27 · Tested against Claude Code v0.2.29

Why Does Your CLAUDE.md Size Matter?

Every time you send a message in Claude Code, your project's CLAUDE.md gets loaded into the conversation context. Every single message. Fresh input tokens, charged against your usage, on every interaction.

For most people with a small CLAUDE.md, this is negligible. A couple hundred characters of "use TypeScript, prefer functional components" costs basically nothing.

But if you're doing anything serious with Claude Code, your CLAUDE.md grows. Fast.

Mine managed a production VPS with 20+ Docker containers, multiple databases, deployment pipelines, safety hooks, git standards, monitoring systems, and coordination protocols for multi-device access. It had grown organically over weeks into a 948-line, 40,000-character beast.

How Much Does a Bloated CLAUDE.md Actually Cost?

Let's do the maths. One token is roughly 4 characters of English text.

Metric	Before	After	Savings
CLAUDE.md size	40,175 chars	4,856 chars	88% smaller
Tokens per message	~10,000	~1,200	~8,800 fewer
Monthly token burn (1,350 messages)	~13.6M	~1.6M	~12M tokens saved
Extra messages per month	baseline	+2,500 to 3,000	Real conversations, not wasted context

That last row is the one that matters. Those 12 million tokens I was burning on repeated instructions could instead be actual work. Actual code reviews, actual deployments, actual problem-solving.

On a Pro plan where you're bumping up against usage limits, that's the difference between running out of messages at 3pm and having capacity left for evening work.

What Was Actually in Those 948 Lines?

Before I could cut anything, I had to understand what was in there and why. Here's what my CLAUDE.md had accumulated:

Safety rules that hooks already enforce (database protection, destructive command blocking, git safety). The hooks block dangerous operations regardless of what CLAUDE.md says. Having the rules written out in CLAUDE.md was belt-and-suspenders, costing tokens for zero additional safety.

Procedure documentation for deployments, backups, and git workflows. These were step-by-step instructions that only matter when you're actually doing that operation, not on every single message.

Infrastructure reference for every service, every port, every connection string. Useful when connecting to ZeroMemory's database. Completely irrelevant when writing a commit message.

Protocol documents for blackboard coordination, lessons-learned workflows, and code quality standards. Important, but not 200 lines of important-every-message.

SSH commands, MCP configuration, slash command tables, agent role descriptions. All things that either self-document at runtime or only matter during specific operations.

The pattern was clear: about 10% of the content was needed every message (identity, core rules, routing). The other 90% was needed occasionally, during specific operations.

Terminal output showing the ZeroVPS instruction layout split across the core guide, supporting docs, and on-demand skills.

How I Structured the Fix: 4-Tier Instruction Architecture

I built what I'm calling a tiered instruction system. The core idea: only load what you need, when you need it.

Tier 1: Hooks and Settings (Always Active, Zero Tokens)

These are the deterministic safety rails. They run as code before and after every tool call. No CLAUDE.md text needed.

PreToolUse hooks block dangerous commands (DROP TABLE, rm -rf, force push)
Settings files define permission boundaries (what's allowed, what requires confirmation)
PostToolUse hooks log changes automatically

For a deeper look at how hooks fit into a production workflow, see Claude Code hooks in practice: building safety rails that actually work.

If a hook blocks something, it blocks it. Doesn't matter what CLAUDE.md says. So remove all the safety documentation from CLAUDE.md and let the hooks do their job.

Token cost: 0. Hooks run as shell scripts, not as context tokens.

Tier 2: Skills and Agents (Loaded On Demand)

Skills are markdown files that get loaded only when invoked. My /deploy skill has the full blue-green deployment procedure, SSH commands, health check logic, and rollback steps. But it only enters the context when someone types /deploy. If you want to see skills working inside a full multi-agent review pipeline, this breakdown of the AI review agents content pipeline covers exactly that.

Same with /backup, /release, /git-workflow, and 20 other skills. Each one is self-contained with all the context it needs.

Token cost: 0 until invoked, then only that skill's tokens.

Tier 3: CLAUDE.md (Every Message, Keep It Tiny)

This is the only tier that costs tokens on every message. So it contains only:

Identity: What server, what IP, what SSH aliases
Doctrine: The 12 core rules that actually need to be in every conversation
Routing: Which skill or document to read for which operation
Pointers: File paths to Tier 4 docs for detailed reference

That's it. No procedures. No infrastructure detail. No protocol documents. Just enough to route correctly and make good decisions.

Tier 4: Operational Specs (Read When Needed)

Everything else lives in standalone documents that get read on demand:

docs/code-quality-standards.md for anti-patterns and testing rules
docs/zeromemory-session-protocol.md for memory management
state/connections.md for every port, database, tunnel command, and API endpoint
state/blackboard.protocol.md for coordination rules
docs/config-sync.md for repo-to-server synchronisation

The model reads these files when the task requires them. Writing code? It'll read the quality standards. Deploying? It'll read the connection reference and deployment skill. Just answering a question about git? It reads nothing extra.

Token cost: 0 unless the specific document is needed.

How I Actually Did the Migration

This wasn't a weekend project, but it wasn't months of work either. Here's the process:

Backed up the original. Copied the full 948-line CLAUDE.md to an archive. Non-negotiable first step.
Audited every section. For each block of content, I asked: "Is this enforced by a hook? Is this in a skill? Does this need to be in every message?" If no to all three, it got extracted.
Extracted operational specs. Moved code quality standards, current-project workflow rules, and detailed protocols into standalone docs.
Added preflight reads to skills. Each skill got explicit "read this file before starting" steps, so they pull their own context.
Wrote the new CLAUDE.md. 89 lines. Doctrine, routing, pointers. Nothing else.
Ran a full simulation. Spawned 5 test agents to verify every routing path worked, every pointer resolved, every safety hook still blocked what it should.
Iterated on gaps. The simulation found issues. Missing pointers, overly aggressive hook patterns, lost operational knowledge. Fixed them all.

The simulation step was important. Without it, I would have shipped blind spots. The agents traced 15 common scenarios ("deploy an app", "create a backup", "troubleshoot a failing service") and flagged every dead end.

What Could Go Wrong (and How I Handled It)

The obvious risk: if you remove instructions from CLAUDE.md, the model might not know to read the replacement document.

This is real. My simulation found that the model couldn't discover health monitor configuration because there was no CLAUDE.md pointer to it. It found that ZeroMemory's architecture details were orphaned in a memory file with no route from CLAUDE.md.

The fix is pointers. Every operational spec needs a one-line entry in CLAUDE.md that tells the model what it is and where to find it. Without the pointer, the document might as well not exist.

The other risk: safety rules that were instructional, not enforced. My old CLAUDE.md said "never commit .env files." But no hook actually blocks that. Removing the text from CLAUDE.md would remove the instruction entirely.

The fix is honesty. My new Safety section explicitly says which rules are hook-enforced and which require discipline. No pretending everything is automated when it isn't.

How to Do This for Your Own Project

You probably don't need the full 4-tier architecture. But you can apply the principle immediately:

1. Measure your CLAUDE.md. Run wc -c CLAUDE.md. If it's over 5,000 characters, you have room to optimise.

2. Categorise every section. For each block, label it:

ALWAYS (identity, core rules, routing decisions)
SOMETIMES (procedures for specific operations)
RARELY (reference material, detailed specs)

3. Move SOMETIMES and RARELY content to separate files. Put them in docs/ or wherever makes sense for your project. We ended up with six standalone docs covering deployment, connections, quality standards, memory protocol, config sync, and blackboard coordination.

4. Replace with pointers. One line in CLAUDE.md: - Database setup guide: docs/database-setup.md

5. Test the routing. Ask Claude to do something that requires the moved content. Does it know to read the file? If not, adjust your pointer text.

That's it. You don't need hooks or skills or a simulation framework. Just measure, categorise, extract, pointer, test.

Frequently Asked Questions

Won't the model lose context if I move instructions out of CLAUDE.md?: Only if you don't leave a pointer. The model follows file path references. A one-line pointer costs ~20 tokens. The full procedure it replaces might cost 2,000. That's a 99% saving on content that's only needed occasionally.

How do I know which rules need to stay in CLAUDE.md?: If the rule affects every interaction (like "use TypeScript" or "always read the manifest first"), it stays. If it only matters during specific operations (like "here's the backup procedure"), it moves to a separate file.

Does this work with Claude Pro, not just Claude Code?: The principle applies anywhere you're injecting system prompts or project instructions. If you're pasting a long system prompt into every conversation, the same categorise-and-extract approach works. The savings are proportional to how bloated your instructions are.

What about the global CLAUDE.md in ~/.claude/?: Same principle applies. That file also gets loaded every message. If it's large, apply the same tier approach. Keep identity and routing in the global file, move detailed instructions to project-level docs.

Is 89 lines the right target?: There's no magic number. The target is: only content that genuinely needs to be in every message. For me that was 89 lines. For a simpler project it might be 20. For a complex multi-service platform it might be 150. The principle is the same regardless.

The Numbers

Before and after:

What	Before	After
CLAUDE.md	948 lines, 40K chars	89 lines, 4.8K chars
Token cost per message	~10,000	~1,200
Monthly token savings	,	~12 million
Extra messages per month	,	~2,500 to 3,000
Functionality lost	,	Zero
Safety degraded	,	No (hooks enforce it)
New docs created	,	6 operational specs
Connection reference	Scattered across 5 files	Single consolidated file

88% reduction. Same safety. Same functionality. Thousands more messages per month.

If you're spending real money on Claude usage, or constantly hitting rate limits, this is low-hanging fruit. Your CLAUDE.md is probably the biggest single source of wasted tokens in your workflow, and you can fix it in an afternoon.

The migration took me about three hours, including the simulation pass. The savings showed up immediately on the first day of normal use. Not a particularly clever solution: just measuring something that had been invisible, then restructuring it. Most optimisation problems in AI tooling are like that.

Ready to try this yourself? Download the companion agent file below. Drop it into Claude Code and it'll walk you through auditing and restructuring your own CLAUDE.md, step by step.

[Download: jimmy-goode-zerotoken-claude-instructions.md]

Want more practical guides like this? Join the newsletter for weekly content on building with AI tools. No theory, just the stuff that actually saves you time and money.

[Subscribe to the newsletter]

Measure your CLAUDE.md now. Run wc -c CLAUDE.md. If it's over 5,000 characters, the fix is waiting.

Jimmy Goode

AI Systems Architect at ZeroShot Studio

Builds production multi-agent AI systems and automation infrastructure. Previously founded and operated Australia's first communal motorcycle workshop, scaling it to 1,000+ members and $1M+ annual turnover with zero employees. Now applies that operator mindset to AI.

jimmygoode.com →

LinkedIn GitHub Instagram Reddit

How to Save Thousands of Tokens Per Message in Claude Code

Why Does Your CLAUDE.md Size Matter?

How Much Does a Bloated CLAUDE.md Actually Cost?

What Was Actually in Those 948 Lines?

How I Structured the Fix: 4-Tier Instruction Architecture

Tier 1: Hooks and Settings (Always Active, Zero Tokens)

Tier 2: Skills and Agents (Loaded On Demand)

Tier 3: CLAUDE.md (Every Message, Keep It Tiny)

Tier 4: Operational Specs (Read When Needed)

How I Actually Did the Migration

What Could Go Wrong (and How I Handled It)

How to Do This for Your Own Project

Frequently Asked Questions

The Numbers

Related Articles

Setting Up AI Coding Agents: A Practical Guide to Claude Code, Copilot, and Gemini CLI

What Are AI Agents? The Complete Guide to Autonomous AI

AI Is a Prediction Engine, Not a Brain