Add "principles" section: what TDD in agentic coding requires
The homepage now spells out the seven things that must hold for a run to count as TDD in agentic coding, each one mapping directly to a judge check:

- Test first (judge rejects red commits whose tests already pass)
- Honest green (judge rejects green commits whose tests still fail)
- Authoritative verification (hidden tests catch tautologies)
- No test deletion (test count must hold red→green)
- Refactor without regression (refactor commits keep tests green)
- Machine-tagged phases (red:/green:/refactor: in commit messages)
- Public, replayable verdicts (permanent URL per run)

Closes the loop between the broader "TDD for agentic coding" positioning and what the judge actually enforces.

The scoring block is also synced with the real verdict numbers (+20 verified / +5 refactor / 0 tautology / -5 fake red or broken green or broken refactor / -20 test deletion). It was carrying stale +10/+5/-5/-∞ copy from the original spec brainstorm.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
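As a rough illustration of how those verdict numbers combine, here is a minimal Python sketch. It is not tdd.md's judge: the `Step` record and its fields are hypothetical, and the order in which the checks are applied (deletion first, then red/green honesty, then hidden tests) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical summary of one red→green step, as a judge might record it."""
    red_tests_failed: bool      # did the suite fail at the red commit?
    green_tests_passed: bool    # did the suite pass at the green commit?
    hidden_tests_passed: bool   # did the hidden kata tests pass against green?
    tests_at_red: int           # number of tests at the red commit
    tests_at_green: int         # number of tests at the green commit


def score_step(step: Step) -> int:
    """Score a red→green step with the verdict numbers from this commit.

    Check order is an assumption: deletion first, then phase honesty,
    then the hidden tests.
    """
    if step.tests_at_green < step.tests_at_red:
        return -20  # test count dropped between red and green (deletion)
    if not step.red_tests_failed or not step.green_tests_passed:
        return -5   # red already passed, or green still fails
    if not step.hidden_tests_passed:
        return 0    # hidden tests caught a tautological green
    return 20       # step verified


def score_refactor(tests_stay_green: bool) -> int:
    """+5 if a refactor commit keeps the suite green, -5 if it breaks it."""
    return 5 if tests_stay_green else -5
```

For example, `score_step(Step(True, True, True, 3, 4))` returns 20: a test was added, red failed, green passed, and the hidden tests agreed. Checking deletion first means a step that both drops tests and fakes its red scores -20 rather than -5; the real judge may order these differently.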
1 file changed · +20 −4
content/home.md (+20 −4)
| @@ -8,6 +8,20 @@ | ||
| 8 | 8 | |
| 9 | 9 | Agentic coding is here. The question is whether your agent can do it *well* — and TDD is the cleanest measure we have. tdd.md doesn't just check whether the code works. It verifies your agent got there the right way: failing test first, simplest passing impl second, refactor without regression. |
| 10 | 10 | |
| 11 | +## principles | |
| 12 | + | |
| 13 | +What "TDD in agentic coding" actually requires — and what tdd.md grades on: | |
| 14 | + | |
| 15 | +1. **Test first.** No code without a failing test driving it. Red commits whose tests already pass (meaning the implementation was written earlier) are rejected. | |
| 16 | +2. **Honest green.** The simplest code that passes. Green commits whose tests still fail are rejected. | |
| 17 | +3. **Authoritative verification.** Your own tests aren't enough, because they could be tautological. tdd.md keeps hidden tests for each kata step and runs them against your implementation after green. A tautological green scores 0. | |
| 18 | +4. **Tests don't disappear.** Once written, they stay. The judge counts tests across red→green and rejects any step where the count drops. | |
| 19 | +5. **Refactor without regression.** Refactor commits run against the existing tests. Green-stays-green or the commit costs points. | |
| 20 | +6. **Phases machine-tagged.** Commit messages start with `red:`, `green:`, or `refactor:` (optionally with `(step)`). The judge replays your work from the git log alone — no reading the code by hand. | |
| 21 | +7. **Public, replayable verdicts.** Every run gets a permanent URL at `tdd.md/<your-name>/<kata>`. Anyone can audit your trace; nothing is hidden. | |
| 22 | + | |
| 23 | +Pass all seven and you're doing TDD in agentic coding. Skip any one and the score reflects it. | |
| 24 | + | |
| 11 | 25 | ## the cycle |
| 12 | 26 | |
| 13 | 27 | | phase | rule | |
| @@ -19,10 +33,12 @@ Agentic coding is here. The question is whether your agent can do it *well* — | ||
| 19 | 33 | ## scoring |
| 20 | 34 | |
| 21 | 35 | ``` |
| 22 | -+10 test fails first, then passes | |
| 23 | - +5 refactor improves the code without changing behaviour | |
| 24 | - -5 code is written before a test | |
| 25 | - -∞ tests are deleted to make them pass | |
| 36 | ++20 step verified — red fails, green passes, hidden tests pass | |
| 37 | + +5 refactor commit, tests stay green | |
| 38 | + 0 hidden tests catch a tautological green | |
| 39 | + -5 red passes already (impl was earlier) or green still fails | |
| 40 | + -5 refactor breaks tests | |
| 41 | +-20 test count drops between red and green (deletion) | |
| 26 | 42 | ``` |
| 27 | 43 | |
| 28 | 44 | ## play |
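Principle 6 is what lets the judge replay a run from the git log alone. Below is a minimal Python sketch of that parsing step, assuming `(step)` is written as a conventional-commit-style scope such as `red(fizz):`; the diff only says the step is optional, not where it goes. `parse_phase` and `red_green_pairs` are hypothetical helpers, not part of tdd.md.

```python
import re
from typing import NamedTuple, Optional

# Assumed reading of "optionally with `(step)`": a scope such as
# "red(fizz): ...". Plain "red: ..." is also accepted.
PHASE_RE = re.compile(r"^(red|green|refactor)(?:\(([^)]+)\))?:\s*(.*)$")


class Phase(NamedTuple):
    phase: str            # "red", "green", or "refactor"
    step: Optional[str]   # kata step from the optional scope, if any
    summary: str          # rest of the commit subject


def parse_phase(subject: str) -> Optional[Phase]:
    """Return the phase tag of a commit subject, or None if it is untagged."""
    match = PHASE_RE.match(subject)
    if match is None:
        return None
    phase, step, summary = match.groups()
    return Phase(phase, step, summary)


def red_green_pairs(subjects: list[str]) -> list[tuple[str, str]]:
    """Pair each red commit with the next green commit (log oldest first)."""
    pairs = []
    pending_red = None
    for subject in subjects:
        tag = parse_phase(subject)
        if tag is None:
            continue  # untagged commits are skipped in this sketch
        if tag.phase == "red":
            pending_red = subject
        elif tag.phase == "green" and pending_red is not None:
            pairs.append((pending_red, subject))
            pending_red = None
    return pairs
```

Untagged commits are simply skipped here; whether the real judge ignores or penalizes them isn't stated in this commit. Each red/green pair is the unit that the per-step checks (test count, red fails, green passes, hidden tests) would then run against.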