Red, tokens, atoms: three constraints that compound

Three pieces landed in the same week — Jesse Vincent's superpowers TDD skill, Harsh Mishra's 23 token-saving tips for Claude Code, and the rebrand of the file-naming convention this site is built on (SAMA — Sorted, Architecture, Modeled, Atomic). Each is useful on its own. The interesting thing is that they multiply: stacked, they make agentic coding cheap, correct, and reviewable in a way no single one of them delivers. Here is why they fit together.

#Red: write the failing test first

The superpowers TDD skill is a tight 200-line distillation of the discipline, written for an agent that has been told what TDD is and forgets every two prompts. Its claims:

The iron law: "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST." Code before test? Delete it. Start over.
Watch the test fail. "If you didn't watch the test fail, you don't know if it tests the right thing." The verify-RED step is mandatory, not decorative.
No keeping it as reference. Don't "adapt" pre-test code while writing the test. "Delete means delete."
Tests-after answer the wrong question. Tests written after impl pass immediately, and passing immediately proves nothing.

The skill calls out the rationalizations explicitly — "too simple to test", "I'll test after", "TDD will slow me down", "tests-after achieve the same goals" — and gives each one a single line of pushback. It is a manifesto for forcing the agent to behave like a junior dev who has the discipline rather than one who has only heard about it.

"Violating the letter of the rules is violating the spirit of the rules."

That sentence is the whole point. AI agents are very good at finding the spirit-but-not-letter shortcut. The iron law removes that path.
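A minimal sketch of the red-then-green order in TypeScript may make the sequence concrete. The `slugify` function, the `runTest` helper, and the expected value are hypothetical and do not come from the skill; a real project would use its own test runner rather than this stand-in.

```typescript
// Hypothetical example: names and behaviour are illustrative, not from the skill.
type Slugify = (title: string) => string;

// Tiny stand-in for a test runner: "RED" if the test throws, "GREEN" otherwise.
function runTest(test: () => void): "RED" | "GREEN" {
  try {
    test();
    return "GREEN";
  } catch {
    return "RED";
  }
}

// Step 1: the test exists before any production code.
function slugifyTest(slugifyUnderTest: Slugify): void {
  const actual = slugifyUnderTest("Red, Tokens, Atoms");
  if (actual !== "red-tokens-atoms") {
    throw new Error(`expected "red-tokens-atoms", got "${actual}"`);
  }
}

// Step 2: verify RED against a stub, so we know the test is able to fail.
const notImplemented: Slugify = () => {
  throw new Error("not implemented");
};
const before = runTest(() => slugifyTest(notImplemented));

// Step 3: only now write the minimal implementation and verify GREEN.
const slugify: Slugify = (title) =>
  title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
const after = runTest(() => slugifyTest(slugify));

console.log(before, after); // RED GREEN
```

The point of the stub in step 2 is that the RED result is observed, not assumed: if `before` came back GREEN, the test would be checking nothing.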

#Tokens: clear, scope, cap, model

Mishra's piece (Analytics Vidhya, 8 May 2026) is the practitioner's complement: how to keep the agent cheap while it does the disciplined work above. The shape of his 23 tips:

Trim the context window. /clear between tasks. /compact when continuity matters. CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=70 to compact sooner than default. /context and /usage before large tasks. A status line in the terminal so you can see what you are spending in real time.

Trim the global instructions. CLAUDE.md under 200 lines. Path-scoped rules in .claude/rules/. Skills that load on demand instead of globally. Prefer CLI tools to MCP servers.

Cap the noise. MAX_MCP_OUTPUT_TOKENS=8000, BASH_MAX_OUTPUT_LENGTH=20000. Filter logs through grep before feeding them to the model. Subagents for verbose research that return a one-paragraph summary. Deny noisy paths in .claude/settings.json.
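The filter-then-cap idea can be sketched as a small TypeScript helper. `filterAndCap` is a hypothetical function, not part of Claude Code, and keeping the tail of the output when truncating is an assumption about which lines matter; the 20000 simply echoes the BASH_MAX_OUTPUT_LENGTH value quoted above.

```typescript
// Hypothetical helper: keep only matching log lines, then cap the total length.
// This illustrates the idea; it is not how Claude Code applies its own limits.
// Pass a pattern without the "g" flag, since RegExp.test is stateful with "g".
function filterAndCap(log: string, pattern: RegExp, maxChars: number): string {
  const kept = log
    .split("\n")
    .filter((line) => pattern.test(line))
    .join("\n");
  if (kept.length <= maxChars) return kept;
  // Assumption: the most recent lines are the most relevant, so keep the tail.
  return "[truncated]\n" + kept.slice(kept.length - maxChars);
}

const rawLog = [
  "INFO  server started",
  "DEBUG cache warm",
  "ERROR db timeout after 5000ms",
  "INFO  request ok",
  "ERROR db timeout after 5000ms",
].join("\n");

// Only the two ERROR lines would be handed to the model.
const forModel = filterAndCap(rawLog, /ERROR/, 20000);
console.log(forModel);
```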

Pick the cheapest model that works. Sonnet for daily, Opus for hard reasoning. /effort low for simple tasks. CLAUDE_CODE_DISABLE_THINKING=1 if you do not need extended reasoning.

Be specific. Exact filenames over "scan the repo". Verification targets stated up front so the agent does not loop on corrections. Course-correct early when it reads irrelevant files.

"By setting strict boundaries from the outset, teams can reduce costs without compromising code quality."

Read the list and a pattern emerges: smaller surface area, sharper instructions, fewer choices. The agent stops drifting because there is nowhere to drift to.

#Atoms: SAMA — sorted, architecture, modeled, atomic

The third piece is structural. SAMA (Sorted, Architecture, Modeled, Atomic) is the file-naming convention this site is built on, shared with two other projects in my workspace. Every source file is named cXX_<name>.ts, where XX is the number of its layer:

c11_server.ts          entry (Bun.serve)
c13_database.ts        SQLite
c14_github.ts          HTTP I/O
c21_app.ts             route dispatcher
c21_handlers_*.ts      route handlers per domain
c31_*.ts               models (pure types + data)
c32_*.ts               business logic (pure)
c51_render_*.ts        HTML rendering per domain

Four properties, one acronym:

Sorted. Alphabetical sort = dependency direction. ls src/ is the architecture diagram. Lower-numbered layers never import from higher ones — verifiable with one grep.
Architecture. The number is the layer; the layer is the contract. A c31_* file is a model — no I/O. A c21_* file composes lower layers — no SQL of its own. The contract is in the prefix.
Modeled. Tests live next to source: c32_session.test.ts next to c32_session.ts. Types and parse-functions live in c31_*. The shape comes before the logic.
Atomic. One responsibility per module. When a layer file passes ~700 lines, split per UI/data domain using the same prefix — c51_render_layout.ts + c51_render_reports.ts + c51_render_projects.ts. No barrel re-exports; consumers import directly from the atom.

What you get for free: a file tree where every file is small, predictable, and unambiguously placed. There is exactly one right place for any given function — and ls src/ | sort will show you where.
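As a rough illustration of why the prefix is machine-readable, here is a sketch of a parser for the naming convention. `parseSamaFileName` and the sample listing are hypothetical; the regular expression encodes one reading of cXX_<name>.ts (two digits, lowercase name, optional .test suffix), which is an assumption and not a published specification.

```typescript
// Hypothetical parser for the cXX_<name>.ts convention described above.
interface SamaFile {
  layer: number; // the two-digit number after "c"
  name: string; // the part between the prefix and the extension
  isTest: boolean; // true for sibling *.test.ts files
}

function parseSamaFileName(fileName: string): SamaFile | null {
  const match = /^c(\d{2})_([a-z0-9_]+)(\.test)?\.ts$/.exec(fileName);
  if (match === null) return null;
  return {
    layer: Number(match[1]),
    name: match[2],
    isTest: match[3] !== undefined,
  };
}

// A made-up listing with one file that does not follow the convention.
const sampleListing = [
  "c51_render_reports.ts",
  "c11_server.ts",
  "c32_session.test.ts",
  "c32_session.ts",
  "notes.ts",
];

const sortedNames = sampleListing.slice().sort();

// Files the convention cannot place in any layer.
const misplaced = sortedNames.filter((f) => parseSamaFileName(f) === null);

// Layer numbers in sorted order: a plain name sort already groups by layer.
const layerOrder = sortedNames
  .map((f) => parseSamaFileName(f))
  .filter((p): p is SamaFile => p !== null)
  .map((p) => p.layer);

console.log(layerOrder, misplaced); // [ 11, 32, 32, 51 ] [ 'notes.ts' ]
```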

#Why the three compound

Each constraint helps on its own. The interesting claim is that stacked, they multiply — and not by adding up benefits. They multiply by removing the failure modes the others cannot see.

SAMA makes tokens cheap. A SAMA file is bounded above by the per-domain split rule (~700 lines, one atom). When you tell an agent to work on c32_session.ts, the relevant context is exactly that file plus its tests. You do not need /clear discipline to keep the window small — the file system already enforced it. Mishra's tip 19 ("avoid broad scans, specify exact filenames") becomes the natural way to work, not a discipline to remember.

TDD makes SAMA self-policing. The iron law forbids writing code before the test. In a SAMA codebase that means: the failing test goes in c32_session.test.ts, and the impl that makes it pass goes in c32_session.ts. The atom and its proof of correctness sit on adjacent lines in ls. There is no path where the impl drifts into a higher layer or into another atom: the test imports from c32_session.ts and stays red if the code lands anywhere else.
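The pairing of atom and test is mechanical enough to compute. A small sketch, assuming the sibling naming rule described above; `siblingTestPath`, `workingSet`, and the sample listing are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper: derive the test file that sits next to a source atom.
function siblingTestPath(sourcePath: string): string {
  if (!/\.ts$/.test(sourcePath) || /\.test\.ts$/.test(sourcePath)) {
    throw new Error(`not a source file: ${sourcePath}`);
  }
  return sourcePath.slice(0, -".ts".length) + ".test.ts";
}

// The working set for a change to one atom: its test and the atom itself.
function workingSet(sourcePath: string): string[] {
  return [siblingTestPath(sourcePath), sourcePath];
}

// In a sorted listing the atom and its test end up next to each other.
const sortedListing = [
  "c32_session.ts",
  "c31_user.ts",
  "c32_session.test.ts",
  "c51_render_layout.ts",
].sort();
const atomIndex = sortedListing.indexOf("c32_session.ts");
const testIndex = sortedListing.indexOf(siblingTestPath("c32_session.ts"));

console.log(workingSet("c32_session.ts"), Math.abs(atomIndex - testIndex));
```

Handing the agent `workingSet(...)` rather than the whole directory is one way to turn "specify exact filenames" into a default instead of a habit.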

Tokens make TDD honest. The biggest TDD failure mode for agents, and the one the rationalizations table in the superpowers skill spends most of its weight on, is quietly skipping the verify-RED step: "I'll add the test in a sec." The structural fix the skill recommends is delete-and-start-over. The structural fix Mishra's tips imply is different but compatible: keep the agent's context window small enough that it cannot both quietly write the impl and pretend to be writing the test. Combined, you get an agent with very little room to get away with a tests-after move, because its context cannot comfortably hold both halves at once.

The compounding works the other way too. Without SAMA, "specify exact filenames" is hard — there is no obvious right file. Without TDD, "verification targets stated up front" lacks a place to land — the agent invents what up front means. Without token discipline, the iron law gets quietly violated as the agent's context bloats and it loses track of which step it is on.

#What this looks like in practice

This site (tdd.md, the one you are reading) runs all three:

/reports/live/tests is built TDD-first: every body-builder in c51_render_reports.ts has a sibling *.test.ts that landed before the impl.
The project's CLAUDE.md is short. Path-scoped rules live in .claude/rules/ (gitignored — they are the agent's local instructions, not project source).
Every source file is cXX_<name>.ts. There are 21 files in src/, none over 700 lines, none importing upward. Run grep -rE 'from "\./c[5-9]' src/c1*.ts src/c2*.ts src/c3*.ts and you get an empty result: no file in layers 1x to 3x imports from layers 5x to 9x, so the layer rule is mechanically verifiable.
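The same check as the grep can be sketched in TypeScript, which is handy if it should run inside a test suite. `findUpwardImports` and the sample sources are hypothetical; file contents are passed in as strings so the sketch does not depend on any file-system API, and the c21_app.ts entry is a deliberate violation included only to show what a hit looks like.

```typescript
// Hypothetical in-memory version of the grep: report imports of c5-c9 modules
// from files whose names start with c1, c2, or c3.
interface Violation {
  file: string;
  imported: string;
}

function findUpwardImports(sources: Record<string, string>): Violation[] {
  const violations: Violation[] = [];
  for (const file of Object.keys(sources)) {
    if (!/^c[1-3]/.test(file)) continue; // same file set as src/c1*.ts src/c2*.ts src/c3*.ts
    const importPattern = /from "\.\/(c[5-9][^"]*)"/g;
    let match = importPattern.exec(sources[file]);
    while (match !== null) {
      violations.push({ file, imported: match[1] });
      match = importPattern.exec(sources[file]);
    }
  }
  return violations;
}

// Made-up file contents for illustration only.
const sampleSources: Record<string, string> = {
  "c21_app.ts": 'import { renderLayout } from "./c51_render_layout";\n', // deliberate violation
  "c32_session.ts": 'import type { Session } from "./c31_session";\n',
  "c51_render_layout.ts": 'import { escapeHtml } from "./c51_render_escape";\n',
};

console.log(findUpwardImports(sampleSources));
```

Like the grep, this only covers imports written as `from "./c5..."` with double quotes; a codebase that mixes quote styles or uses dynamic imports would need a broader pattern.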

The cost of running an agent against this codebase is unusually low — Sonnet for full sessions is enough — and the diff produced is unusually reviewable. Both come from the same place: the agent has nowhere to be sloppy.

#The takeaway

The three pieces do not compete for the same slot. They occupy different ones:

constraint	what it constrains	mechanism
red	the agent's behaviour during a change	the iron law: failing test first
tokens	the agent's context per turn	trim, scope, cap, pick the cheap model
atoms	the agent's file tree to navigate	layer-prefixed, one responsibility, sorted

Pick only one and it is harder to keep up, because the other two are not there to support it. Pick all three and they stop being disciplines you maintain: they become the path of least resistance. That is the goal, and that is why these three pieces showing up in the same week feels like the same idea told three ways.

Read the obra/superpowers TDD skill → · Read Mishra's 23 token-saving tips →