f86b478095f04456f2aabc9bc926dd80d7a4d8a5
diff --git a/content/home.md b/content/home.md
index 040f853cabebe6b60ba6e8a4f328df8d86aea400..918b6fcf4a4a0699a31442499566e85041ae8f4f 100644
--- a/content/home.md
+++ b/content/home.md
@@ -8,6 +8,20 @@
 
 Agentic coding is here. The question is whether your agent can do it *well* — and TDD is the cleanest measure we have. tdd.md doesn't just check whether the code works. It verifies your agent got there the right way: failing test first, simplest passing impl second, refactor without regression.
 
+## principles
+
+What "TDD in agentic coding" actually requires — and what tdd.md grades on:
+
+1. **Test first.** No code without a failing test driving it. Red commits whose tests already pass — meaning the impl was earlier — are rejected.
+2. **Honest green.** The simplest code that passes. Green commits whose tests still fail are rejected.
+3. **Authoritative verification.** Your own tests aren't enough — they could be tautological. tdd.md owns hidden tests per kata step and runs them against your impl after green. Tautologies score 0.
+4. **Tests don't disappear.** Once written, they stay. The judge counts tests across red→green and refuses any step where tests went missing.
+5. **Refactor without regression.** Refactor commits run against the existing tests. Green-stays-green or the commit costs points.
+6. **Phases machine-tagged.** Commit messages start with `red:`, `green:`, or `refactor:` (optionally with `(step)`). The judge replays your work from the git log alone — no reading the code by hand.
+7. **Public, replayable verdicts.** Every run is a permanent URL at `tdd.md//`. Anyone can audit your trace; nothing is hidden.
+
+Pass all seven and you're doing TDD on agentic coding. Skip any one and the score reflects it.
+
 ## the cycle
 
 | phase | rule |
@@ -19,10 +33,12 @@ Agentic coding is here. The question is whether your agent can do it *well* —
 ## scoring
 
 ```
-+10 test fails first, then passes
- +5 refactor improves the code without changing behaviour
- -5 code is written before a test
- -∞ tests are deleted to make them pass
++20 step verified — red fails, green passes, hidden tests pass
+ +5 refactor commit, tests stay green
+  0 hidden tests catch a tautological green
+ -5 red passes already (impl was earlier) or green still fails
+ -5 refactor breaks tests
+-20 test count drops between red and green (deletion)
 ```
 
 ## play
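
Review note: the commit-tag rule and the new scoring table in this diff can be read as two small functions. The sketch below is an illustration, not tdd.md's actual judge. `parse_phase`, `StepResult`, `score_step`, and `score_refactor` are hypothetical names. It assumes the optional step is written as `red(step):`, and that only the most severe applicable penalty counts for a step; the diff specifies neither.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed grammar for principle 6: "red:", "green:", or "refactor:",
# with an optional "(step)" before the colon. The diff does not pin this down.
PHASE_RE = re.compile(r"^(red|green|refactor)(?:\(([^)]*)\))?:")


def parse_phase(message: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (phase, step) from a commit message, or None if it is untagged."""
    match = PHASE_RE.match(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


@dataclass
class StepResult:
    """Hypothetical summary of one red -> green step."""
    red_failed: bool      # tests failed at the red commit
    green_passed: bool    # the agent's own tests passed at the green commit
    hidden_passed: bool   # hidden tests passed against the green impl
    tests_at_red: int     # test count at the red commit
    tests_at_green: int   # test count at the green commit


def score_step(result: StepResult) -> int:
    """Score one step using the values from the new scoring table."""
    # Assumption: penalties do not stack; the most severe one wins.
    if result.tests_at_green < result.tests_at_red:
        return -20  # test count dropped between red and green (deletion)
    if not result.red_failed or not result.green_passed:
        return -5   # red passed already, or green still fails
    if not result.hidden_passed:
        return 0    # hidden tests caught a tautological green
    return 20       # step verified


def score_refactor(tests_stay_green: bool) -> int:
    """Score one refactor commit: +5 if tests stay green, -5 if they break."""
    return 5 if tests_stay_green else -5
```

For example, `parse_phase("green: simplest impl")` yields `("green", None)`, and a step whose own tests pass but whose hidden tests fail scores 0 under these assumptions.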