syntaxai/tdd.md · commit f86b478

Add "principles" section: what TDD in agentic coding requires

The homepage now spells out the seven things that must hold for a run
to count as TDD on agentic coding, each one mapping directly to a
judge check:

- Test first (judge rejects red commits whose tests already pass)
- Honest green (judge rejects green commits whose tests still fail)
- Authoritative verification (hidden tests catch tautologies)
- No test deletion (test count must hold red→green)
- Refactor without regression (refactor commits keep tests green)
- Machine-tagged phases (red:/green:/refactor: in commit messages)
- Public, replayable verdicts (permanent URL per run)
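
To make the first four checks concrete, here is a minimal Python sketch of how one red→green step might be classified. It is an illustration under stated assumptions, not the tdd.md judge: `StepResult`, its fields, `judge_step`, the returned labels, and the order in which the checks run are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical summary of one red→green step (field names are illustrative)."""
    red_tests_pass: bool     # did the tests already pass at the red commit?
    green_tests_pass: bool   # do the tests pass at the green commit?
    hidden_tests_pass: bool  # do the judge's hidden tests pass after green?
    red_test_count: int      # number of tests at the red commit
    green_test_count: int    # number of tests at the green commit


def judge_step(step: StepResult) -> str:
    """Return a hypothetical verdict label for one step.

    The check order is an assumption; the source only lists the checks.
    """
    if step.green_test_count < step.red_test_count:
        return "test_deletion"   # tests went missing between red and green
    if step.red_tests_pass:
        return "fake_red"        # the red commit was not actually red
    if not step.green_tests_pass:
        return "broken_green"    # the green commit still fails
    if not step.hidden_tests_pass:
        return "tautology"       # own tests pass, hidden tests do not
    return "verified"
```

A step only reaches `"verified"` in this sketch when red failed, green passed, no tests were removed, and the hidden tests agree.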

Closes the loop between the broader "TDD for agentic coding"
positioning and what the judge actually enforces. The scoring block
is also synced with the real verdict numbers (+20 verified / +5
refactor / 0 tautology / -5 fake red or broken green or broken
refactor / -20 test deletion); it was carrying stale +10/+5/-5/-∞
copy from the original spec brainstorm.
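
As a rough illustration of the verdict numbers listed above, a run's score could be tallied like this. The point values come from this commit message; the key names, the `score_run` helper, and the idea that a run's score is a plain sum of per-commit verdicts are assumptions made for the sketch.

```python
# Point values as stated in the commit message; the key names are hypothetical.
POINTS = {
    "verified": 20,         # red fails, green passes, hidden tests pass
    "refactor": 5,          # refactor commit, tests stay green
    "tautology": 0,         # hidden tests catch a tautological green
    "fake_red": -5,         # red commit whose tests already pass
    "broken_green": -5,     # green commit whose tests still fail
    "broken_refactor": -5,  # refactor commit that breaks tests
    "test_deletion": -20,   # test count drops between red and green
}


def score_run(verdicts):
    """Sum the points for a sequence of verdict names (assumed additive)."""
    return sum(POINTS[v] for v in verdicts)
```

Under these assumptions, a run with one verified step, one clean refactor, and one broken green totals 20 + 5 - 5 = 20.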

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
author: syntaxai <[email protected]>
date: 2026-05-03 20:22:51 +01:00
parent: ac7aee5
commit: f86b478095f04456f2aabc9bc926dd80d7a4d8a5

1 file changed · +20 −4

modified content/home.md +20 −4
@@ -8,6 +8,20 @@
 
 Agentic coding is here. The question is whether your agent can do it *well* — and TDD is the cleanest measure we have. tdd.md doesn't just check whether the code works. It verifies your agent got there the right way: failing test first, simplest passing impl second, refactor without regression.
 
+## principles
+
+What "TDD in agentic coding" actually requires — and what tdd.md grades on:
+
+1. **Test first.** No code without a failing test driving it. Red commits whose tests already pass — meaning the impl was earlier — are rejected.
+2. **Honest green.** The simplest code that passes. Green commits whose tests still fail are rejected.
+3. **Authoritative verification.** Your own tests aren't enough — they could be tautological. tdd.md owns hidden tests per kata step and runs them against your impl after green. Tautologies score 0.
+4. **Tests don't disappear.** Once written, they stay. The judge counts tests across red→green and refuses any step where tests went missing.
+5. **Refactor without regression.** Refactor commits run against the existing tests. Green-stays-green or the commit costs points.
+6. **Phases machine-tagged.** Commit messages start with `red:`, `green:`, or `refactor:` (optionally with `(step)`). The judge replays your work from the git log alone — no reading the code by hand.
+7. **Public, replayable verdicts.** Every run is a permanent URL at `tdd.md/<your-name>/<kata>`. Anyone can audit your trace; nothing is hidden.
+
+Pass all seven and you're doing TDD on agentic coding. Skip any one and the score reflects it.
+
 ## the cycle
 
 | phase | rule |
@@ -19,10 +33,12 @@ Agentic coding is here. The question is whether your agent can do it *well* —
 ## scoring
 
 ```
-+10 test fails first, then passes
- +5 refactor improves the code without changing behaviour
- -5 code is written before a test
- -∞ tests are deleted to make them pass
++20 step verified — red fails, green passes, hidden tests pass
+ +5 refactor commit, tests stay green
+ 0 hidden tests catch a tautological green
+ -5 red passes already (impl was earlier) or green still fails
+ -5 refactor breaks tests
+-20 test count drops between red and green (deletion)
 ```
 
 ## play
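
The machine-tagged phases described in the principles section of the diff above could be read from a commit message roughly as follows. This is a sketch, not the judge's parser: `Phase`, `parse_phase`, and `PHASE_RE` are hypothetical names, and the placement of the optional step between the phase and the colon (as in `red(step-1): ...`) is an assumption, since the source only says "optionally with `(step)`".

```python
import re
from typing import NamedTuple, Optional

# Assumption: the optional step sits between the phase and the colon,
# e.g. "green(step-2): make it pass". The source does not spell this out.
PHASE_RE = re.compile(r"^(red|green|refactor)(?:\(([^)]+)\))?:\s*(.*)$")


class Phase(NamedTuple):
    phase: str            # "red", "green", or "refactor"
    step: Optional[str]   # step label if present, else None
    summary: str          # rest of the subject line


def parse_phase(message: str) -> Optional[Phase]:
    """Parse the phase tag from the first line of a commit message.

    Returns None when the subject line carries no recognised tag.
    """
    subject = message.split("\n", 1)[0]
    match = PHASE_RE.match(subject)
    if match is None:
        return None
    return Phase(match.group(1), match.group(2), match.group(3))
```

Only the subject line is inspected here, so a tag appearing later in the message body would not count as a phase marker in this sketch.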