My CLAUDE.md Hit 500 Lines and Claude Started Ignoring It
I had a rule that said "never create branches without asking." Claude Code followed it for two months. Then my CLAUDE.md crossed 500 lines and Claude created four branches in a single session — while I had three other terminals open. Each branch switch changed the shared working tree. Fourteen phases of newsletter consolidation, gone.
Anthropic's official best practices cover the grammar of CLAUDE.md — what fields to include, how rules work, where to put skills. This article is what happens when you write a novel in that grammar. Six months, 300 instruction files (765 total when you count every reference doc, template, and brand guideline they point to), three AI agents, and a lot of things that broke. Read their docs first if you haven't. I'll show you what comes next.
Where you probably sit right now:
Stage 1: Monolithic. A single CLAUDE.md, 20-100 lines. Works for a single-project repo. Most people are here.
Stage 2: Bloated. Past 200 lines. Claude follows instructions about 80% of the time. You notice it ignoring rules you know are in the file. This is the pain point that drove everything below.
Stage 3: Modular. Rules files, a handful of skills, maybe an AGENTS.md for a second tool. You catch yourself copy-pasting the same instructions into multiple conversations. Worth reaching if you manage three or more workstreams.
Stage 4: Full system. 300 instruction files across three agents. You run multiple AI tools on the same codebase and need them to share conventions without drifting.
The specific trigger for me: running three Claude Code terminals simultaneously on a single repository, each needing different context for different workstreams. My repo serves as a knowledge operating system — eight workstreams, sixteen top-level directories, consulting and content and product work sharing one codebase. "Keep it under 200 lines" doesn't account for that.
Today: 343 lines in CLAUDE.md, 239 in AGENTS.md, 21 domain-scoped rule files, and 46 skills. Every token is load-bearing.
The Architecture — CLAUDE.md as Router, Not Encyclopedia
I stopped trying to put everything in CLAUDE.md. Instead, it became a routing layer — it doesn't contain all instructions, it connects the agent to the right instructions at the right time.
Layer 1 — Root Instruction Files
CLAUDE.md runs about 18KB across 343 lines. It handles Claude-specific features: session recovery protocol, skill chains, keyboard shortcuts, cost optimization, git workflow rules. AGENTS.md holds the shared foundation at 13KB across 239 lines: repository structure, core patterns, workstream routing, content quality principles, the full skill catalog.
Line 34 of my CLAUDE.md: "This file extends AGENTS.md." Two files, clear separation of concerns. Anything that applies to all AI agents goes in AGENTS.md. Anything Claude-specific — hooks, MCP tools, slash commands — stays in CLAUDE.md.
Explicit rules beat verbose explanations. Five clear "NEVER do X" statements in 10 lines outperform a 200-line essay. When I trimmed CLAUDE.md from 500-plus to 343 lines, compliance went up. The agent doesn't need to understand why. It needs to know what.
Layer 2 — Domain-Scoped Rules
Twenty-one rule files in .claude/rules/, each with paths: frontmatter that tells Claude when to auto-load it:
# branch-safety-rules.md
---
paths: []
description: "Single-owner repo safety — never switch branches"
---
## Rule #1: NEVER Switch Branches
Do NOT run git checkout, git switch, or any branch-changing command.
## Rule #2: NEVER Create Branches Without Victor Asking
Do not create branches as a "safety" measure or because you're on main.
paths: [] means this rule loads for every operation — a universal safety constraint, born from losing those 14 phases of work. By contrast, newsletter-rules.md specifies paths: ["14_newsletters/**"] and only activates when I'm editing newsletter files. No reason to burn context on newsletter brand guidelines when editing financial documents.
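For contrast, a scoped rule might look like this. The paths value is the one named above; the rule body and description are illustrative, not the actual file:

```markdown
# newsletter-rules.md
---
paths: ["14_newsletters/**"]
description: 'Newsletter brand voice. Loads only for newsletter edits.'
---
## Rule: Match the Newsletter Brand Voice
Follow the brand guideline doc before drafting or editing any issue.
```

Claude only pays the token cost of this file when a matched path is in play.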
Your CLAUDE.md is the always-on baseline. Rules are domain expertise that appears only when relevant. That's how you scale to eight workstreams without building a 2,000-line monolith.
Layer 3 — Skills
Forty-six canonical skills in .claude/skills/, each in its own directory with a SKILL.md plus reference docs and templates. They load when invoked via slash commands, not at startup.
The range: a push-pr skill that wraps git workflow, a deep-planning skill that runs seven-phase task decomposition with PRD generation, dispatcher skills that route to sub-skills based on specific commands.
Total file count in .claude/skills/ alone: 643 files. Progressive disclosure keeps this manageable — Claude reads the SKILL.md first, then pulls reference files only when the workflow needs them. The official skill documentation covers the spec. My system follows it at a scale the docs don't address.
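Concretely, each skill directory follows roughly this shape. The files under push-pr beyond SKILL.md are illustrative here, not the exact contents of my repo:

```text
.claude/skills/push-pr/
├── SKILL.md              # read at invocation: triggers, steps, links out
├── references/
│   └── pr-checklist.md   # pulled in only if the workflow reaches that step
└── templates/
    └── commit-format.md
```

The agent's first read is cheap; the deeper files cost context only when a step actually needs them.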
Layer 4 — The Redundancy Strategy
One detail that rarely gets discussed: in long sessions, Claude's context window fills. The system compresses earlier conversation to make room — I call this context compaction. Your CLAUDE.md instructions get summarized, sometimes effectively forgotten. A rule you set at the start of a 45-minute session might not survive to minute 40.
The defense is deliberate redundancy. The same critical instruction appears at multiple layers:
- Root CLAUDE.md states "never create branches without asking"
- .claude/rules/branch-safety-rules.md restates it with full context
- Folder-level READMEs reinforce project-specific constraints
- Skills re-state critical constraints in their own SKILL.md when relevant
Each layer is a reinforcement point. If the agent has compacted away the root instruction, the domain rule catches it. I don't have hard data proving exactly how much this helps — what I have is six months of observing that critical rules get violated less when they appear in multiple layers versus only in the root file.
What I Learned Running 3 Agents on One Codebase
The Reference-Not-Copy Pattern
Fifty skill ports in .agents/skills/ — each a thin redirect, 50-150 lines, pointing back to the canonical .claude/skills/ source. Three Gemini stubs in .gemini/skills/ follow the same pattern.
Why not symlinks? Git on Windows defaults to core.symlinks=false, which breaks them across clones.
Why not copy the files? I tried. Within weeks, 50 duplicate skills drifted from their originals. Bug fixes applied to the Claude version never reached the Codex version. I deleted 471 duplicate files during cleanup. One source of truth, or you effectively have none.
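A redirect stub can be as small as this. This is a sketch of the pattern under assumed names, not the exact file in my repo:

```markdown
# .agents/skills/push-pr/SKILL.md
---
description: 'Redirect stub. Canonical source lives under .claude/skills/.'
---
This skill is maintained at `.claude/skills/push-pr/SKILL.md`.
Read and follow that file. Do not duplicate its contents here;
fixes land in the canonical copy only.
```

The stub costs a few lines per agent, and every fix propagates automatically because there is nothing to propagate.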
The 32KB Budget Problem
Codex CLI concatenates all AGENTS.md files in the directory chain up to 32,768 bytes. Hard ceiling.
My budget: root AGENTS.md at 13KB plus four domain AGENTS.md files at about 2.5KB each, roughly 23KB total. That leaves about 9KB of headroom.
Every line in AGENTS.md justifies itself against one question: "Would removing this cause the agent to make a mistake?" If no, cut it. This constraint actually improved the system — Claude's instruction set can be generous with context because it loads progressively. Codex must stay tight because it loads everything at once.
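You can audit the budget mechanically. This is a minimal sketch, not my actual tooling: the function names and the stop-at-repo-root behavior are my assumptions, while the 32,768-byte ceiling and the concatenate-up-the-directory-chain behavior are Codex CLI's, as described above:

```python
from pathlib import Path

CODEX_LIMIT = 32_768  # Codex CLI's hard ceiling for concatenated AGENTS.md files

def agents_budget(leaf_dir, repo_root):
    """Sum the byte sizes of every AGENTS.md from leaf_dir up to repo_root,
    mirroring how Codex CLI concatenates the directory chain."""
    leaf, root = Path(leaf_dir).resolve(), Path(repo_root).resolve()
    total = 0
    for parent in [leaf, *leaf.parents]:
        candidate = parent / "AGENTS.md"
        if candidate.is_file():
            total += candidate.stat().st_size
        if parent == root:  # stop at the repo root, not the filesystem root
            break
    return total

def check_budget(leaf_dir, repo_root):
    """Print usage against the ceiling; True while still under it."""
    used = agents_budget(leaf_dir, repo_root)
    print(f"{used} / {CODEX_LIMIT} bytes ({CODEX_LIMIT - used} headroom)")
    return used <= CODEX_LIMIT
```

Run it against your deepest working directory; if any chain exceeds the limit, you know which domain file to trim.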
Three Agents, One Instruction Set
Claude Code gets the full system: CLAUDE.md + AGENTS.md + 21 rules + 46 skills. Planning, execution, review. The primary workhorse.
Codex CLI reads AGENTS.md + 50 skill redirects. Adversarial review and read-only analysis. I invoke it through a /codex-review skill for cross-model plan validation — a second model poking holes in Claude's plans catches blind spots.
Gemini reads GEMINI.md + three skill stubs. Deep research sweeps where Gemini's larger context window helps process many documents at once.
Each agent reads the shared foundation and adds tool-specific extensions. No agent sees instructions meant for another. For teams thinking about this: AGENTS.md becomes your shared standard — conventions, architecture decisions, workflow skills that every developer's agent reads. Individual developers keep their own CLAUDE.md for personal preferences. The AGENTS.md spec is the cross-tool standard, and SmartScope's cross-tool guide confirms the pattern holds across Copilot, Cursor, and Windsurf.
What Broke — YAML Parsing, Skill Invisibility, and the 37-Skill Incident
The YAML frontmatter crisis. Thirty-seven out of 52 skills were silently broken because of nested double quotes in YAML strings. Claude couldn't discover them — they weren't malfunctioning, they were invisible. No errors, no warnings. Skills that just never loaded.
Root cause: description: "Use when user says "plan this"" — nested quotes broke YAML parsing. Fix: single quotes for trigger phrases inside double-quoted strings. I added validation:
python -c "import yaml; yaml.safe_load(open('SKILL.md').read().split('---',2)[1])"
Test your YAML programmatically. Don't trust that it works because it looks right. Anthropic's skill authoring guidance covers the format — what it doesn't warn you about is how silently it fails.
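Beyond parsing each file, you can lint for the specific failure mode before it lands. This is a stdlib-only heuristic of my own, not Anthropic tooling; it complements the yaml.safe_load check rather than replacing it:

```python
import re

def description_is_safe(line):
    """Return False when a double-quoted description: value contains an
    unescaped inner double quote, the failure mode that silently hid
    37 skills. Returns True for lines it cannot judge."""
    m = re.match(r'\s*description:\s*"(.*)"\s*$', line)
    if m is None:
        return True  # not a double-quoted description; out of scope here
    # Any unescaped '"' inside the quoted value breaks YAML parsing.
    return re.search(r'(?<!\\)"', m.group(1)) is None
```

Wire it into a pre-commit hook over every SKILL.md frontmatter line and the broken-skill class above can't silently reappear.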
Instruction conflicts across agents. Branch safety rules designed for Claude's multi-terminal environment confused Codex, which runs in single-shot mode and has no concept of persistent branches. Fix: agent-specific rules go in agent-specific files. Branch safety lives in .claude/rules/, not in AGENTS.md.
Context overload. Past 500 lines, Claude's adherence to specific instructions dropped to about 80%. Enough to feel like the system worked, unreliable enough that critical rules got violated under load. The modular architecture was the fix — CLAUDE.md dropped from 500-plus to 343 lines by moving domain rules into .claude/rules/ and shared instructions into AGENTS.md.
At 200 lines, a CLAUDE.md matters a lot. At 500, less — signal drowns in noise.
PRD skill invocation decay. When deep-planning generates multi-phase plans, later phases — past Phase 5 — start dropping skill invocations. Instructions sit in the PRD, but the agent forgets to call them. Same phenomenon as CLAUDE.md instruction decay, at a smaller scale.
CLAUDE.md as Bootloader
Claude Code is stateless between sessions. Every conversation starts fresh. Your CLAUDE.md is the only persistent memory the agent has. But it's unreliable within sessions too — long conversations compact context, and instructions from the start get summarized or dropped.
Before I built session recovery, the first 10-15 minutes of every conversation went to re-establishing context. What was I working on? What branch am I on? What's the current state of the kanban board?
Now, my session recovery protocol fires at conversation start: check for active plan files, read the current git branch, load workstream briefs, check the kanban board, read the learning rules index. All before the agent does anything else. Claude summarizes where we left off and asks what to do next.
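Sketched as a CLAUDE.md section, that protocol reads roughly like this; the ordering and wording are illustrative, not my exact file:

```markdown
## Session Recovery Protocol (run before any task)
1. Check for active plan files in the current workstream
2. Read the current git branch: `git branch --show-current`
3. Load the relevant workstream brief
4. Check the kanban board state
5. Read the learning rules index
6. Summarize where we left off, then ask what to do next
```

Numbered steps matter here: the agent executes them in order instead of cherry-picking.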
That's the real framing for CLAUDE.md — it's not project configuration, it's a bootloader for a stateless agent. Startup sequence, context loading, recovery procedures, safety constraints. And like any bootloader, its critical functions need to survive partial failures. Here, the partial failure is context compaction eating your instructions mid-session.
I tweak this system weekly. The session recovery protocol has been rewritten three times. The redundancy strategy evolved from frustration, not theory. If someone tells you they have the definitive CLAUDE.md approach, they either have a simpler use case or they're not paying close attention.
Should You Build This? Honestly.
Single project, one developer, one AI agent, fewer than three domains of work — a well-written 150-line CLAUDE.md gets you most of the way there. That covers 90% of users. The gap it can't close: multi-agent sharing, session recovery across workstreams, and the forgetting problem in long sessions.
The migration path. Start with CLAUDE.md. When it hits 200 lines, move domain rules into .claude/rules/. When you repeat the same workflow three times, extract it into a skill. When you add a second AI agent, create AGENTS.md as the shared layer. Each step solves a specific pain. None of this was designed upfront — it grew from running into walls.
Time investment. Six months of organic evolution through daily use. But each piece solved an immediate problem.
The real ROI isn't faster coding. It's context continuity — the agent retains your project's patterns, constraints, and decisions across sessions. That compounds. But it's fragile too. Long sessions degrade it, context compaction erodes it, and one bad YAML parse makes 37 skills invisible.
Last week Claude ignored a rule that's been in the system since month one. But it followed 46 skill invocations and 21 domain rules that would have been invisible inside a monolith. Six months ago, one bad session destroyed 14 phases of work. Now the system catches most of its own failures before I notice them. Not perfection — a system that fails less and recovers faster.
Victor Sowers builds AI-native GTM systems at STEEPWORKS.
