The 12 CLAUDE.md Rules I Actually Keep

Mnilax reported a 12-rule CLAUDE.md cut Claude's mistake rate from 41% to 3% across 30 codebases. Here's the file, and the case for keeping all twelve — then tuning.

Mnilax published a thread in mid-May claiming a 12-rule CLAUDE.md got Claude’s mistake rate from 41% down to 3% across 30 codebases over six weeks. The headline is the part that gets quoted. The interesting bit is the shape of the gap — going from 4 rules to 12 only dropped compliance from 78% to 76%, while cutting mistakes by another 8 points. The new rules didn’t fight the original 4 for the model’s attention. They covered failure modes those 4 didn’t reach.

I pasted the file into a few projects last week. It earned its place inside two sessions. Below is the file in full — save it as CLAUDE.md at your repo root and edit from there.

# CLAUDE.md — 12-rule template

These rules apply to every task in this project unless explicitly overridden.
Bias: caution over speed on non-trivial work. Use judgment on trivial tasks.

## Rule 1 — Think Before Coding
State assumptions explicitly. If uncertain, ask rather than guess.
Present multiple interpretations when ambiguity exists.
Push back when a simpler approach exists.
Stop when confused. Name what's unclear.

## Rule 2 — Simplicity First
Minimum code that solves the problem. Nothing speculative.
No features beyond what was asked. No abstractions for single-use code.
Test: would a senior engineer say this is overcomplicated? If yes, simplify.

## Rule 3 — Surgical Changes
Touch only what you must. Clean up only your own mess.
Don't "improve" adjacent code, comments, or formatting.
Don't refactor what isn't broken. Match existing style.

## Rule 4 — Goal-Driven Execution
Define success criteria. Loop until verified.
Don't follow steps. Define success and iterate.
Strong success criteria let you loop independently.

## Rule 5 — Use the model only for judgment calls
Use me for: classification, drafting, summarization, extraction.
Do NOT use me for: routing, retries, deterministic transforms.
If code can answer, code answers.

## Rule 6 — Token budgets are not advisory
Per-task: 4,000 tokens. Per-session: 30,000 tokens.
If approaching budget, summarize and start fresh.
Surface the breach. Do not silently overrun.

## Rule 7 — Surface conflicts, don't average them
If two patterns contradict, pick one (more recent / more tested).
Explain why. Flag the other for cleanup.
Don't blend conflicting patterns.

## Rule 8 — Read before you write
Before adding code, read exports, immediate callers, shared utilities.
"Looks orthogonal" is dangerous. If unsure why code is structured a way, ask.

## Rule 9 — Tests verify intent, not just behavior
Tests must encode WHY behavior matters, not just WHAT it does.
A test that can't fail when business logic changes is wrong.

## Rule 10 — Checkpoint after every significant step
Summarize what was done, what's verified, what's left.
Don't continue from a state you can't describe back.
If you lose track, stop and restate.

## Rule 11 — Match the codebase's conventions, even if you disagree
Conformance > taste inside the codebase.
If you genuinely think a convention is harmful, surface it. Don't fork silently.

## Rule 12 — Fail loud
"Completed" is wrong if anything was skipped silently.
"Tests pass" is wrong if any were skipped.
Default to surfacing uncertainty, not hiding it.

Why All Twelve

Each line reads almost trivially — don’t assume, fail loud, read before you write. The shape of the wins is what makes them interesting. Four rules cover code-writing mistakes from January. Eight more cover the agent-orchestration mistakes that didn’t exist when those four were written: token spirals, multi-step pipelines without checkpoints, tests that don’t test, silent successes hiding silent failures.

The whole file is short enough that the model actually reads it. Past about 200 lines compliance collapses; past 14 rules the model starts pattern-matching to “rules exist” instead of reading them. Twelve fits inside the window where each rule still earns its weight.

Then Tune It

Don’t paste and walk away. These are a menu, not a recipe. Read each one and keep the ones that map to mistakes you’ve actually made. If your codebase has lint-enforced style, drop Rule 11. If you don’t run multi-step pipelines, drop Rule 10. If you don’t have tests, Rule 9 is wishful thinking.

A six-rule CLAUDE.md tuned to your real failure modes will beat a twelve-rule one with rules you’ll never trip. Mnilax’s own caveat is the right framing — the file is a behavioral contract against specific failures you’ve observed, not a wishlist.

Citations

Mnilax, “Karpathy’s 4 CLAUDE.md rules cut Claude mistakes from 41% to 11%. After 30 codebases, I added 8 more” — x.com/Mnilax/status/2053116311132155938
Forrest Chang, original 4-rule template — github.com/forrestchang/andrej-karpathy-skills
Andrej Karpathy’s January 2026 thread, referenced in the above
Research archive of the original X post — github.com/GYF0311/karpathy-code-principles-research

// tags: claude code · agents · process

Why All Twelve

Then Tune It

Citations

// comments