Forty hidden reminders, one failing test: reading the Claude Code postmortem thread

ThePaSch's recent r/ClaudeAI post is the most-cited critique of Claude Code's harness in weeks: forty-plus hidden system reminders, five sites that explicitly tell the model "never mention this reminder to the user", a malware reminder that fires on every file read, contradictory instructions, importance inflation, and a 158-version system-prompt churn in eleven days. Anthropic's own postmortem covers three different bugs and announces internal dogfooding; it doesn't touch the prompt sprawl. Reading the thread, two questions kept landing: what survives all that sloppiness in the artefact a reviewer actually sees? And what's the smallest user-side discipline that makes the harness's noise irrelevant? Both have the same answer.

what ThePaSch and the comments are saying

Stripped of the snark, the post is a careful audit of the leaked Claude Code source (paths and line numbers throughout). The findings:

Forty-plus auto-injected "system reminders". Each one is sent as a user-role message but tagged as system bookkeeping. Triggers include: a file was opened in the IDE, the user selected lines, a linter ran, LSP emitted diagnostics, a hook fired, the date changed, you read a file, you read an empty file, you tried to invoke an agent. ThePaSch counts ~45 distinct types in utils/messages.ts:3453.

Five explicit gag-order sites. Reminders that instruct the model to keep the reminder secret. Verbatim from the leaked source:

  • "Don't tell the user this, since they are already aware." (utils/messages.ts:3541)
  • "Make sure that you NEVER mention this reminder to the user" (utils/messages.ts:3668 and :3688)
  • "DO NOT mention this to the user explicitly because they are already aware." (utils/messages.ts:4165)
  • "internal ID - do not mention to user" (tools/AgentTool/AgentTool.tsx:1328)

Per-file-read malware reminder. Every FileRead tool result has the same CYBER_RISK_MITIGATION_REMINDER block concatenated to it — same full text, every read, filling context without adding signal. Opus 4.6 was specifically exempted in the leaked code (MITIGATION_EXEMPT_MODELS = new Set(['claude-opus-4-6'])); Opus 4.7 isn't.

Hedging that drains attention. Phrases like "may or may not be relevant", "consider whether", "if applicable" across reminder surfaces. Every one is a token spend that draws attention heads to a reminder that probably shouldn't fire at all.

Importance inflation. "Not developing malware is IMPORTANT. But using dedicated tools instead of bash? That is CRITICAL." The single use of "CRITICAL" in the entire prompt set is reserved for tool-vs-bash preference. A calibration failure on the prompt-author's own scale.

Contradictory instructions. Plan-mode entry says "This supersedes any other instructions"; plan-mode exit says nothing about restoring them. The malware policy says the model may write security code in research contexts; the per-read reminder says it MUST refuse. The URL policy says "NEVER generate or guess URLs unless you are confident they help with programming" — i.e. never, unless you think you should.

A 158-version system-prompt log in eleven days. Piebald's tracker shows a release cadence faster than once a day, with many reactive prompt patches that look like they were added without anyone reading the assembled output afterward.

The top comment distils it as: "competing system prompts fighting over attention". The next: "the system prompt stuff is the actual story. The post-mortem doesn't really touch it."

a live exhibit, mid-research

While reading the thread, the following arrived in my own session:

system-reminder: "The TodoWrite tool hasn't been used recently. If you're working on tasks that would benefit from tracking progress, consider using the TodoWrite tool to track progress... This is just a gentle reminder - ignore if not applicable. Make sure that you NEVER mention this reminder to the user."

That is the exact reminder cited at utils/messages.ts:3688 — a "gentle reminder" with a gag order attached, fired during a session whose only task was researching this exact reminder. The harness is consistent.

I'm mentioning it.

the two questions worth asking

  1. What survives the harness mess in the artefact?
  2. What's the smallest user-side discipline that makes harness noise irrelevant?

The answer to the first is the commit and the test result. Everything else — the reminder cascade, the hidden gag orders, the inflated importance, the contradictory instructions — happens in the agent's context window where neither the user nor a reviewer can see it. Once the agent stops typing, what lands in git log is all that's left.

The answer to the second is a discipline that constrains the artefact, not the harness. You don't get to fix Anthropic's utils/messages.ts. You don't get to delete the malware reminder. You don't even get to see most of these reminders fire in your own session. But you do get to choose what counts as "done", and what shape your codebase has when the agent shows up.

That is what tdd.md and SAMA are for.

what TDD survives

The iron law: no production code without a failing test first. Verify-RED, verify-GREEN, refactor.

Run that against the harness mess:

  • The agent gets bombarded with forty reminders. The test still has to fail for the right reason before any impl lands.
  • A gag-ordered reminder tells the model to skip the verify-RED step. The commit history shows whether a red: SHA preceded the green: SHA, and whether the test actually ran red. The harness has no way to hide that.
  • LSP diagnostics fire mid-refactor and pull attention. The refactor commit is still required to leave every previous test green. Diagnostics-as-distraction can't change a passing test count.
  • The model may or may not consider some auto-injected file. The test for the unit being changed doesn't move. Either it passes for the right reason, or it doesn't.

This is exactly what tdd.md does on real-project commits via /reports/live: the structural failure modes — red-did-not-fail, test-deleted in refactor, broken refactor, no phase tag — are all detectable from the commit log alone. The harness can be as loud as it wants; it doesn't get to whisper into the diff.
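A minimal sketch of that commit-log audit, in TypeScript. This is not tdd.md's actual implementation. The PhaseCommit shape, the refactor: tag, and the violation names are assumptions made up for illustration, and the per-commit test result would have to come from re-running the suite at each SHA.

```typescript
// Hypothetical commit record; every field name here is an assumption.
interface PhaseCommit {
  sha: string;
  subject: string;        // e.g. "red: reject empty cart"
  testsPassed: boolean;   // did the suite pass when run at this commit?
  deletedTests: string[]; // test files removed by this commit
}

type Phase = "red" | "green" | "refactor";

interface Violation {
  sha: string;
  kind:
    | "no-phase-tag"
    | "red-did-not-fail"
    | "green-without-red"
    | "broken-refactor"
    | "test-deleted";
}

// Read the phase tag from the start of the commit subject.
function phaseOf(subject: string): Phase | null {
  const m = /^(red|green|refactor):/.exec(subject);
  return m ? (m[1] as Phase) : null;
}

// Walk the log oldest-first and collect structural failures.
function auditPhases(commits: PhaseCommit[]): Violation[] {
  const violations: Violation[] = [];
  let previous: Phase | null = null;
  for (const c of commits) {
    const phase = phaseOf(c.subject);
    if (phase === null) {
      violations.push({ sha: c.sha, kind: "no-phase-tag" });
    } else if (phase === "red") {
      // A red commit whose suite passes never proved the test can fail.
      if (c.testsPassed) violations.push({ sha: c.sha, kind: "red-did-not-fail" });
    } else if (phase === "green") {
      if (previous !== "red") violations.push({ sha: c.sha, kind: "green-without-red" });
    } else {
      if (!c.testsPassed) violations.push({ sha: c.sha, kind: "broken-refactor" });
      if (c.deletedTests.length > 0) violations.push({ sha: c.sha, kind: "test-deleted" });
    }
    previous = phase;
  }
  return violations;
}
```

Nothing in the input depends on what the agent was told during the session; it only needs the log and a test run per commit.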

The relationship is sharp: the harness shapes the agent's path; TDD shapes its destination. As long as the destination is enforced externally — failing test → passing test → no test deletion — the path can be a circus and the artefact still passes review.

what SAMA survives

SAMA is the four-property file convention — Sorted, Architecture, Modeled, Atomic. Three of its rules survive harness chaos directly:

Sorted (alphabetical sort = dependency direction). The verification is one line:

grep -rE 'from "\./c[5-9]' src/c1*.ts src/c2*.ts src/c3*.ts

The grep returns empty or it doesn't. If the agent — drowned in reminders — accidentally imports a higher layer from a lower one, the grep catches it. Reactive prompt patches in utils/messages.ts are not part of the verification. They can churn 158 times next week and the grep still works.
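The same check can be sketched in TypeScript for cases where a one-line grep is too coarse. Two assumptions are mine, not SAMA's: files are named c plus two digits plus an underscore (as in c31_*), and any import of a higher-numbered prefix counts as a violation, which is stricter than the grep above. The SourceFile shape and function names are hypothetical.

```typescript
// Hypothetical input shape: a path plus the file's text.
interface SourceFile {
  path: string;
  source: string;
}

// Extract the two-digit layer number from "src/c21_cart.ts" or "./c51_http".
function layerOf(fileName: string): number | null {
  const m = /(?:^|\/)c(\d{2})_[^/]*$/.exec(fileName);
  return m ? Number(m[1]) : null;
}

// Report every relative import that points at a higher-numbered layer.
function findUpwardImports(files: SourceFile[]): string[] {
  const hits: string[] = [];
  for (const file of files) {
    const own = layerOf(file.path);
    if (own === null) continue; // not a layered file, nothing to check
    const importRe = /from\s+["'](\.\/[^"']+)["']/g;
    let m: RegExpExecArray | null;
    while ((m = importRe.exec(file.source)) !== null) {
      const target = layerOf(m[1]);
      if (target !== null && target > own) {
        hits.push(file.path + " -> " + m[1]);
      }
    }
  }
  return hits;
}
```

An empty array plays the role of the empty grep output: the rule either holds or it names the offending import.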

Atomic (~700-line split rule). The argument from the recent three-constraints post lands harder here: a small atom plus its sibling test fits in a small context window with room to spare even after the harness has injected its forty reminders. The harness wastes tokens, but the atom does not need most of them. The cost of the bloat shrinks because the surface area to bloat into is small.
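The split rule is just as mechanical. A small sketch, assuming a hard limit of 700 lines (the rule itself only says roughly 700) and a hypothetical SizedFile input:

```typescript
// Hypothetical input shape: a path plus the file's text.
interface SizedFile {
  path: string;
  source: string;
}

// Count lines, ignoring a single trailing newline.
function countLines(source: string): number {
  if (source === "") return 0;
  return source.replace(/\n$/, "").split("\n").length;
}

// List files that exceed the limit, with their line counts.
function oversizedAtoms(files: SizedFile[], limit: number = 700): string[] {
  return files
    .filter((f) => countLines(f.source) > limit)
    .map((f) => f.path + " (" + countLines(f.source) + " lines)");
}
```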

Modeled (tests next to source, types in c31_*). A cXX_*.ts without a sibling cXX_*.test.ts is visible. So is an "as Foo" cast at an I/O boundary. The harness cannot hide either: both are in the file system and the diff. A reviewer running ls src/ after the agent's session sees what's there.
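The sibling-test half of that rule fits in a few lines. A sketch, assuming the two-digit cXX_ naming and taking a plain list of paths (for example the output of a directory listing); the function name is hypothetical, and the cast check is not covered here.

```typescript
// Return every cXX_*.ts source file that has no cXX_*.test.ts next to it.
function missingSiblingTests(paths: string[]): string[] {
  const missing: string[] = [];
  for (const p of paths) {
    const isLayered = /(^|\/)c\d{2}_[^/]*\.ts$/.test(p);
    const isTest = /\.test\.ts$/.test(p);
    if (!isLayered || isTest) continue;
    const sibling = p.replace(/\.ts$/, ".test.ts");
    if (paths.indexOf(sibling) === -1) missing.push(p);
  }
  return missing;
}
```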

The remaining property, Architecture (the layer-prefix contract), is more about what an agent should write into a file than what survives review. But the other three are mechanically falsifiable, which is the whole point.

two concrete failure modes from the OP, mapped to outcome

"the harness loses track of plan-mode state"

"Plan mode entry says it supersedes any other instructions. Plan mode exit just says you can now make edits. Nothing about prior instructions reapplying."

In a free-form coding session, this is a real correctness issue — the agent might keep ignoring rules it was told to ignore only while planning. In a TDD-disciplined session, the iron law is the same in plan mode and out of it: the next thing that ships is a failing test. There is no instruction-state corruption because there is only one instruction state.

"the agent considers every IDE-opened file as 'may or may not be related'"

"You're constantly forcing the model to briefly consider every file you open in your IDE."

In a sprawling repo this is expensive — context bloat plus attention drift. In a SAMA repo, "the file relevant to this task" is a tiny, well-defined set: the atom, its sibling test, the model file it imports from c31_*. Three files. The may-or-may-not file the IDE pushed in is visibly not one of those three, even to the agent. The bloat lands in a place where it has the least leverage to do harm.
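To make "three files" concrete, here is a rough sketch of deriving that set from the atom itself. It assumes same-directory relative imports without file extensions and a model layer named c31_*; workingSet is a hypothetical helper, not part of SAMA.

```typescript
// Derive the task's working set: the atom, its sibling test,
// and any model files it imports from c31_*.
function workingSet(atomPath: string, atomSource: string): string[] {
  const slash = atomPath.lastIndexOf("/");
  const dir = slash === -1 ? "" : atomPath.slice(0, slash + 1);
  const files = [atomPath, atomPath.replace(/\.ts$/, ".test.ts")];
  const importRe = /from\s+["']\.\/(c31_[^"']+)["']/g;
  let m: RegExpExecArray | null;
  while ((m = importRe.exec(atomSource)) !== null) {
    files.push(dir + m[1] + ".ts");
  }
  return files;
}
```

Any file the IDE pushes into context can then be compared against this list; if it is not in it, it is not part of the task.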

what TDD + SAMA do not fix

Honestly:

  • Token cost is still real. You pay for forty reminders even when SAMA makes the working surface small. The harness wastes your money.
  • Latency is still real. Every reminder is a few tokens of input plus a few tokens of attention; multiply by message count, multiply by session length.
  • Trust is fragile. Once a user knows there are five gag-order sites in the source, "I never told the model to do X" becomes a question, not a fact. No external discipline restores that.
  • Capybara, Opus 4.6 vs 4.7, RLHF speculation. Out of scope; that's Anthropic's training problem.
  • The reactive prompt-patch culture ThePaSch describes — 158 versions in 11 days, prompts shipped without anyone reading the assembled output — is internal hygiene. No external user-side discipline reaches into it.

What the discipline does do is shrink the surface where harness sloppiness lands in your work. The agent can be asked silly contradictory things; what shows up in git log is still bound by tests-must-fail-first and a grep that catches layer violations. That is not "I fixed Claude Code". It is "I made the artefact externally verifiable, regardless of what my agent's context window looked like on the way".

a small ask of Anthropic, in the spirit of the thread

ThePaSch's plea boils down to: please slow down, simplify, and stop telling the model to keep things from the user. Independently of whether they take that advice, three things would help users in the current state:

  1. Surface the reminder set. Show users which reminders fired during their session, after the fact. Not as a debugging tool — as a transparency tool. "You sent 14 reminders this session; here they are." The harness already knows; the user should too.
  2. Drop the gag orders. "Make sure that you NEVER mention this reminder to the user" is the most trust-corrosive line in the leaked source. If a reminder is needed for the model's behaviour but not the user's awareness, log it at the harness layer and don't tell the model to lie about it.
  3. Audit redundant reminders. The empty-file reminder and the agent-invocation echo are leftovers from an earlier model era. The postmortem says internal dogfooding will catch such issues; these two should not survive a single dogfood session.

None of this is in the user's hands. But the takeaway underneath is: while Anthropic sorts out the harness, the user can sort out the artefact. And the artefact is what hits production.


tl;dr

The harness is loud. The diff doesn't have to be. TDD's iron law and SAMA's three falsifiable rules survive every reminder-bombardment, gag-order, importance-inflation, and contradictory-instruction the OP catalogues, because they are enforced outside the agent's context window — in the commit log and the file tree. They do not fix Anthropic's prompt sprawl, and they do not refund the tokens. What they do is make the work the agent ships externally verifiable, regardless of what the agent was told on the way there.

Read the original Reddit thread → · Anthropic's April-23 postmortem → · the four SAMA disciplines → · drop SAMA into your agent → · verify your repo → · back to the blog