syntaxai/tdd.md · commit f86b478

Add "principles" section: what TDD in agentic coding requires

The homepage now spells out the seven things that must hold for a run
to count as TDD on agentic coding, each one mapping directly to a
judge check:

- Test first (judge rejects red commits whose tests already pass)
- Honest green (judge rejects green commits whose tests still fail)
- Authoritative verification (hidden tests catch tautologies)
- No test deletion (test count must hold red→green)
- Refactor without regression (refactor commits keep tests green)
- Machine-tagged phases (red:/green:/refactor: in commit messages)
- Public, replayable verdicts (permanent URL per run)

Closes the loop between the broader "TDD for agentic coding"
positioning and what the judge actually enforces. The scoring block
is also synced with the real verdict numbers (+20 verified / +5
refactor / 0 tautology / -5 fake red or broken green or broken
refactor / -20 test deletion) — it was carrying stale +10/+10/+5/-5/-∞
copy from the original spec brainstorm.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

author: syntaxai <[email protected]>
date: 2026-05-03 20:22:51 +01:00
parent: ac7aee5
commit: f86b478095f04456f2aabc9bc926dd80d7a4d8a5

1 file changed · +20 −4

modified content/home.md +20 −4

@@ -8,6 +8,20 @@
8	8
9	9	Agentic coding is here. The question is whether your agent can do it well — and TDD is the cleanest measure we have. tdd.md doesn't just check whether the code works. It verifies your agent got there the right way: failing test first, simplest passing impl second, refactor without regression.
10	10
	11	+## principles
	12	+
	13	+What "TDD in agentic coding" actually requires — and what tdd.md grades on:
	14	+
	15	+1. Test first. No code without a failing test driving it. Red commits whose tests already pass — meaning the impl was earlier — are rejected.
	16	+2. Honest green. The simplest code that passes. Green commits whose tests still fail are rejected.
	17	+3. Authoritative verification. Your own tests aren't enough — they could be tautological. tdd.md owns hidden tests per kata step and runs them against your impl after green. Tautologies score 0.
	18	+4. Tests don't disappear. Once written, they stay. The judge counts tests across red→green and refuses any step where tests went missing.
	19	+5. Refactor without regression. Refactor commits run against the existing tests. Green-stays-green or the commit costs points.
	20	+6. Phases machine-tagged. Commit messages start with `red:`, `green:`, or `refactor:` (optionally with `(step)`). The judge replays your work from the git log alone — no reading the code by hand.
	21	+7. Public, replayable verdicts. Every run is a permanent URL at `tdd.md/<your-name>/<kata>`. Anyone can audit your trace; nothing is hidden.
	22	+
	23	+Pass all seven and you're doing TDD on agentic coding. Skip any one and the score reflects it.
	24	+
11	25	## the cycle
12	26
13	27	\| phase \| rule \|
@@ -19,10 +33,12 @@ Agentic coding is here. The question is whether your agent can do it well —
19	33	## scoring
20	34
21	35	```
22		-+10 test fails first, then passes
23		- +5 refactor improves the code without changing behaviour
24		- -5 code is written before a test
25		- -∞ tests are deleted to make them pass
	36	++20 step verified — red fails, green passes, hidden tests pass
	37	+ +5 refactor commit, tests stay green
	38	+ 0 hidden tests catch a tautological green
	39	+ -5 red passes already (impl was earlier) or green still fails
	40	+ -5 refactor breaks tests
	41	+-20 test count drops between red and green (deletion)
26	42	```
27	43
28	44	## play

raw .diff