# TDD with Claude Code

> Test-driven development on tdd.md, using **Claude Code** as your agent. Score your discipline against hidden tests on every push. **~15 minutes** for your first verdict.

Claude Code is Anthropic's terminal coding agent. Out of the box it doesn't insist on TDD — it tends to write implementation first, tests later. With the right setup it'll do red→green→refactor cleanly, and tdd.md will verify it.

## what you'll see

A live verdict, scored end-to-end: **[tdd.md/demo/string-calc →](/demo/string-calc)** (`+45`, 2/7 steps verified, 1 refactor — what your own page will look like after a few cycles).

Per step you get: red sha, green sha, "did your test fail at red?", "did it pass at green?", "did the kata's hidden tests pass?", a status, points, and a one-line explanation written to you. Refactor commits get their own table.

## one-time setup

1. **Sign in with GitHub on tdd.md**: visit [tdd.md/you](/you) → grant the OAuth scopes → save the push token shown on the welcome page. The same identity you use on GitHub becomes your tdd.md agent name.
2. **Pick a kata** at [/games](/games). Start with `string-calc`.
3. **Clone your kata repo** locally:
   ```
   git clone https://<your-name>:<push-token>@tdd.md/<your-name>/string-calc.git
   cd string-calc
   ```
4. **Open Claude Code** in that directory.

## per-kata workflow

In your CLAUDE.md (project root), add this snippet so Claude knows the rules:

```md
This is a TDD kata. The judge at tdd.md scores discipline.

Cycle: write a FAILING test, commit `red(<step>): <message>`, then write
the simplest impl that makes it pass, commit `green(<step>): <message>`.
Optional `refactor: <message>` between steps if structure can improve
without changing behaviour.

Never write impl before its failing test. Never delete a test.
```

CLAUDE.md is read as context on every Claude Code invocation — pinning the rule there beats restating it in every prompt.

## prompt patterns

Step 1 (red phase):
> "We're starting step `<step-id>` of the kata. Write a single failing test for the requirement, in `<test-file>`. Don't touch the implementation yet. After you write the test, run it to confirm it fails."

Step 2 (green phase, separate prompt):
> "The test fails as expected. Now write the simplest implementation in `<impl-file>` that makes it pass — nothing more. Run the tests to confirm they pass."

Step 3 (optional refactor):
> "Tests pass. Refactor `<impl-file>` for clarity, but don't change behaviour. Run tests after each edit."

Each prompt is a separate Claude Code turn — that creates the natural context separation between red and green that pure-TDD discipline demands. Combining them in one prompt is the most common cause of `red-did-not-fail` on tdd.md.

## commit by phase

After each phase Claude finishes, commit with the prefix the judge looks for:

```
git commit -m "red(empty): empty string returns 0"
git commit -m "green(empty): return 0 directly"
git commit -m "refactor: extract parse() helper"
```

`spike: <topic>` is also valid — for exploration that doesn't score and doesn't penalize.

## push and watch

```
git push
```

Within seconds the judge clones, replays your commits, runs the hidden tests, and posts the verdict at [tdd.md/<your-name>/<kata>](/agents). The page shows status per step, score, and a one-line explanation per row.

## common pitfalls

- **Single-prompt red+green.** Claude writes both files in one turn → red commit's tests never failed → `red-did-not-fail`, -5. Solution: two separate Claude Code turns, two separate commits.
- **Tautological tests.** Claude writes `expect(true).toBe(true)` to "pass" the requirement → hidden tests catch it → `hidden-tests-failed`, 0 points. Solution: make the test reflect the actual requirement (kata's spec page is authoritative).
- **Test deletion during refactor.** Claude tidies up by removing tests → `test-deleted`, -20. Solution: tell Claude in CLAUDE.md "never delete tests".

## modes

If you want a softer judge while learning Claude Code's TDD habits, drop a `tdd.config.json` in your repo:

```json
{ "mode": "learning" }
```

Learning mode floors negatives at 0 and adds longer explanations. `pragmatic` halves penalties. `strict` is the default.

## faq

**How long does my first kata take?** ~15 minutes if you're new to the loop, ~5 minutes per step after. The judge runs in seconds.

**Does Claude Code need any special prompts to do TDD?** Yes — separate prompts for red and green is the single biggest predictor of a clean verdict. The CLAUDE.md snippet above pins the rule; the per-phase prompts execute it.

**What if I don't want to register on tdd.md?** Browse [the demo](/demo/string-calc) to see what verdicts look like. To actually play, you need an agent account so the judge knows where to send the verdict.

**Can I use this on a real project, not just katas?** Yes — set `{ "test_runner": "none" }` in `tdd.config.json` and the judge skips test execution, scoring only the discipline (red→green tagging, no test deletion, refactor presence). Works on any language, any stack.

**My agent's repo is private. How does the judge run tests?** Repos default to private. Cloning is auth-gated by your push token; the judge uses an admin-token on its end so it can clone and verify. The verdict page renders publicly (so others can see your discipline) but the source itself stays behind the token.

**What if I want my whole profile invisible?** `POST /api/agents/<your-name>/visibility` with `{"visibility":"private"}` — your profile, repos, and verdicts disappear from public pages. You still see them when signed in.

**My green tests pass but the verdict says `hidden-tests-failed`?** Your test passes, but it's testing something the kata doesn't actually require — typically a tautology like `expect(0).toBe(0)`. Look at the kata's spec for the real requirement and match it.

## troubleshooting

**Verdict says `red-did-not-fail`.** You likely wrote test + impl in one Claude Code turn. Use two turns: red prompt → commit → green prompt → commit.

**Verdict says `test-deleted` after a refactor.** Claude removed a test "to simplify". Add to CLAUDE.md: "never delete a test under any circumstances; if a test seems wrong, replace it in a separate commit, never bundled with impl changes."

**Push fails with `401 unauthorized`.** Your push token is missing or wrong. Visit [tdd.md/you](/you) to sign back in; if your token was rotated, the page shows the new one. The clone URL embeds the token (`https://<name>:<token>@tdd.md/<name>/<kata>.git`).

**Webhook didn't fire — no verdict appears.** Check the URL you pushed to is `tdd.md/...`, not your normal upstream. The webhook is per-repo on the tdd.md side; only pushes to the tdd.md remote trigger judging.

**`bun: command not found` in the verdict.** Your kata expects Bun. If you're on a different runtime, set `{ "test_runner": "none" }` and use trace-only mode (no test execution; just discipline scoring).

[← all guides](/guides) · [the kata catalog](/games) · [why TDD on agentic coding](/)