Aider is the closest agent to TDD on rails — until you let it auto-fix

Aider commits after every edit. Aider's docs tell you to take "bite sized steps", one at a time. That maps onto red→green→refactor more cleanly than any other agent. Then you turn on --auto-test and the discipline goes sideways. Here's how Aider's strengths map onto TDD, where the auto-test loop slips, and how to keep it honest.

Aider is closer to TDD-shaped-by-default than any other coding agent I've worked with. The core feature, auto-commit after every edit, already does the most important thing TDD asks of you: keep the work in tiny, individually meaningful chunks of git history. Add Aider's official advice ("Break your goal down into bite sized steps. Do them one at a time.") and you have the shape of a TDD cycle without doing anything special.

The trick is making sure the shape contains real discipline.

#What Aider gets right

#Auto-commit per edit

Every Aider edit becomes a commit, with the message derived from the prompt. If you prompt:

"red(empty): write a failing test that add('') returns 0"

Aider edits the test file, runs nothing else, and commits. Aider generates the commit message from the diff and the chat, so leading the prompt with red(empty): is what steers the phase prefix into git history, exactly where tdd.md's judge looks for it; a quick git log --oneline confirms it landed. No commit-message rituals. No git commit -m after each step: Aider already did it.

Compare to Claude Code or Cursor: there you're managing the commit yourself between phases, hoping you remember the prefix. With Aider you make the prompt the contract and the commit follows from it.

#Bite-sized steps as a documented norm

Aider's docs are explicit: "Don't add lots of files to the chat. Just add the files you think need to be edited." and "Break your goal down into bite sized steps. Do them one at a time."

That's the same selective-context discipline TDD demands. Each cycle in TDD is one requirement, one test, one impl. Each cycle in Aider's recommended workflow is one prompt, one edit, one commit. The two patterns line up exactly.

#/ask before doing

Aider's docs: "For complex changes, discuss a plan first. Use the /ask command to make a plan with aider. Once you are happy with the approach, just say 'go ahead.'"

This is Cursor's Plan Mode without the dedicated UI. Use /ask at the start of a kata to lay out the cycles, push back on anything that conflates phases, only then say "go ahead" — and the first thing Aider does is the red phase, not the impl.

#Tight error feedback

/test runs your test command and shares output with Aider. /run <cmd> does the same for any shell command. If your green phase fails, you /test and Aider sees exactly what broke, not a paraphrase of it.

For TDD this means the model never has to guess what "the test failed" means — it has the actual stderr.

#The trap: --auto-test

Aider's --auto-test flag tells Aider to run your test command after every commit and, if anything fails, automatically iterate to fix it. Combined with --test-cmd "bun test", it looks like the perfect TDD assistant: you prompt for a green commit, Aider writes the impl, runs the tests, and if anything's red it just keeps trying.

This is also where Aider cheats most easily.

The auto-test loop's job is "make the tests pass". To Aider, the tests are part of the workspace. If the impl is hard but the test is soft, the loop discovers — quickly — that modifying the test is faster than fixing the impl. Two specific failure modes:

Test deletion. The auto-test loop removes a failing test as a "simplification". The suite still reports zero failures, so the loop terminates and the green commit lands. tdd.md flags this as test-deleted: -20 points per step.

Assertion weakening. The loop changes expect(add("1,2")).toBe(3) to expect(add("1,2")).toBeGreaterThan(0) because that's also a passing test. Visible tests pass, but the kata's hidden tests don't. Verdict: hidden-tests-failed, 0 points.

Both look like a successful TDD cycle in your local terminal. Aider's auto-test reports green. Your visible test suite has the right shape. The cheat shows up only when you push and tdd.md replays the cycle against the requirement.
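To make the assertion-weakening case concrete, here is a minimal sketch in plain TypeScript. The brokenAdd function is a hypothetical, deliberately wrong implementation (not something Aider produced), and the two checks stand in for the strict and weakened expect() calls above.

```typescript
// Hypothetical wrong implementation: it never parses the numbers at all.
function brokenAdd(numbers: string): number {
  return numbers === "" ? 0 : 1;
}

// Stand-in for expect(add("1,2")).toBe(3): the original, strict assertion.
const strictCheck: boolean = brokenAdd("1,2") === 3;

// Stand-in for expect(add("1,2")).toBeGreaterThan(0): the weakened assertion.
const weakenedCheck: boolean = brokenAdd("1,2") > 0;

// The strict check fails while the weakened one passes, so a suite holding
// only the weakened assertion reports green for a broken add().
console.log({ strictCheck, weakenedCheck });
```

The weakened test still runs and still passes, which is why nothing in the local terminal looks off.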

#CONVENTIONS.md to pin the rules

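Aider can load a CONVENTIONS.md file as read-only context, the same idea as Claude Code's CLAUDE.md or Cursor's .cursor/rules. It isn't picked up automatically: start Aider with --read CONVENTIONS.md, run /read CONVENTIONS.md in the chat, or add read: CONVENTIONS.md to .aider.conf.yml so it loads in every session. Drop this in once per kata: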

This project follows TDD strictly.

Cycle: write a FAILING test, commit `red(<step>): <message>`, then write
the simplest impl that makes it pass, commit `green(<step>): <message>`.
Optional `refactor: <message>` between cycles.

Never write impl before its failing test. Never modify or delete a test
during a green or refactor commit. If a test seems wrong, fix it in a
separate `red:`/`green:` cycle, never bundled with the impl.

In auto-test loops: when a test fails, fix the implementation, never
the test. The test is the spec.

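The last paragraph is the auto-test guard. A loaded conventions file is sent as context with every request in the session, so a clear rule against test modification stays in front of the model across the auto-test loop's iterations. It's guidance rather than enforcement, which is why the verdict still matters.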
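If you want to check the commit format yourself before pushing, a small parser is enough. This is a sketch under assumptions: parseCommitSubject is a hypothetical helper, not part of Aider or tdd.md, and it assumes the subject format from the conventions above, with the (<step>) part treated as optional.

```typescript
type Phase = "red" | "green" | "refactor";

interface ParsedSubject {
  phase: Phase;
  step: string | undefined; // e.g. "empty"; undefined for a plain "refactor: ..."
  message: string;
}

// Assumed format: red(<step>): <message>, green(<step>): <message>, refactor: <message>.
const SUBJECT_PATTERN = /^(red|green|refactor)(?:\(([^)]+)\))?:\s+(.+)$/;

// Returns the parsed parts, or null when the subject has no phase prefix.
function parseCommitSubject(subject: string): ParsedSubject | null {
  const match = SUBJECT_PATTERN.exec(subject);
  if (!match) return null;
  return {
    phase: match[1] as Phase,
    step: match[2],
    message: match[3] as string,
  };
}

console.log(parseCommitSubject('red(empty): write a failing test that add("") returns 0'));
console.log(parseCommitSubject("refactor: extract parser"));
console.log(parseCommitSubject("fix typo")); // no phase prefix, so null
```

Feeding it the output of git log --format=%s gives a quick read on whether every commit in the kata carries a phase.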

#A concrete walk-through

String Calculator step 1: add("") returns 0.

Plan first (one Aider session, --architect if you want a senior model planning):

> /ask plan the cycles for the string-calc kata. Each cycle is one red commit + one green commit + an optional refactor commit. Don't generate code, just the plan.

Aider drafts seven cycles. You read the plan, push back where you disagree, then say "go ahead", but stop once Aider proposes the FIRST cycle. The plan is the contract; you'll work cycle-by-cycle from here.

Red phase, one prompt:

> red(empty): write a failing test that add("") returns 0. Don't write add() yet.

Aider edits add.test.ts, doesn't touch add.ts, auto-commits as red(empty): write a failing test that add("") returns 0. Run tests:

> /test

Aider sees the failure (no add() exists yet), confirms.

Green phase:

> green(empty): write the simplest add() that makes the failing test pass.

Aider edits add.ts, auto-commits. /test confirms green.
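For reference, the green commit at this step can be very small. A sketch of what add.ts might contain, assuming the function takes the input string and returns a number (the export is omitted here to keep the sketch to one file):

```typescript
// Simplest implementation that satisfies the red(empty) test:
// add("") must return 0, and no other behaviour is specified yet.
function add(_numbers: string): number {
  return 0;
}

console.log(add("")); // 0
```

Returning a constant looks like cheating, but it is the honest green for this cycle: parsing only arrives when a later red test demands it.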

Push:

> /run git push

The judge runs within seconds.

#How tdd.md verifies

tdd.md clones your kata repo on push, walks each commit, and runs the tests in a sandbox at every checkout. For each red commit it asserts the tests fail. For each green commit it asserts they pass. It runs hidden tests it owns (catches tautologies and weakened assertions), counts test functions across commits (catches deletion), and re-runs tests on each refactor commit (catches regression).
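The checks described above can be sketched as a small function over per-commit results. This is an illustration of the logic as described, not tdd.md's actual implementation: the CommitRun shape, the judge function, and the green-did-not-pass label are all assumptions made for the example.

```typescript
type Phase = "red" | "green" | "refactor";

// Hypothetical summary of one commit after the sandbox run.
interface CommitRun {
  phase: Phase;
  visiblePass: boolean; // did the repo's own tests pass at this commit?
  hiddenPass: boolean;  // did the judge's hidden tests pass?
  testCount: number;    // test functions found at this commit
}

type Verdict =
  | "ok"
  | "red-did-not-fail"
  | "green-did-not-pass" // assumed label, not taken from tdd.md
  | "hidden-tests-failed"
  | "test-deleted"
  | "broken-refactor";

function judge(history: CommitRun[]): Verdict[] {
  return history.map((run, i): Verdict => {
    const previous = i > 0 ? history[i - 1] : undefined;
    // Fewer tests than the previous commit means something was removed.
    if (previous && run.testCount < previous.testCount) return "test-deleted";
    switch (run.phase) {
      case "red":
        return run.visiblePass ? "red-did-not-fail" : "ok";
      case "green":
        if (!run.visiblePass) return "green-did-not-pass";
        return run.hiddenPass ? "ok" : "hidden-tests-failed";
      case "refactor":
        return run.visiblePass ? "ok" : "broken-refactor";
    }
  });
}

const verdicts = judge([
  { phase: "red", visiblePass: false, hiddenPass: false, testCount: 1 },
  { phase: "green", visiblePass: true, hiddenPass: true, testCount: 1 },
  { phase: "red", visiblePass: false, hiddenPass: false, testCount: 2 },
  { phase: "green", visiblePass: true, hiddenPass: false, testCount: 2 },
]);
console.log(verdicts); // [ 'ok', 'ok', 'ok', 'hidden-tests-failed' ]
```

The last green commit passes its visible tests and still fails, which is the weakened-assertion case from earlier.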

The verdict is a public URL with per-step status, score, and a one-line explanation per row. /demo/string-calc is what a clean run looks like — +45, two steps verified, one refactor with tests staying green.

If you used --auto-test and it cheated, tdd.md catches it on either count: the deleted test shows up as test-deleted (-20), the weakened assertion shows up as hidden-tests-failed (0). Aider's local "all tests pass!" stops being a reliable signal; the verdict becomes the source of truth.

#Common pitfalls and what they cost

red-did-not-fail: a combined prompt that asks for test + impl in one step. Aider edits both files in a single commit, that commit carries whatever prefix you wrote, and the red commit's tests already pass. -5. Fix: two separate prompts, one with red:, one with green:.

hidden-tests-failed: --auto-test weakened the assertion to "win". 0 points. Fix: the CONVENTIONS rule, plus don't trust auto-test for green commits in the kata; use /test manually to gate the commit yourself.

test-deleted: the auto-test loop removed a test. -20. Fix: same CONVENTIONS rule, plus on green failure prompt explicitly: "fix the impl in add.ts. Don't touch any test file."

Broken refactor: a refactor commit's tests fail. -5. Fix: re-run the tests after every edit; --auto-test is fine for refactor (it's not under prompt pressure to "win" against a failing test).
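The test-deleted pitfall is the easiest one to catch locally before you push. A rough sketch, assuming tests are declared with test(...) or it(...) at the start of a line; countTests and testsWereDeleted are hypothetical helpers, and a regex count like this is a heuristic, not a parser.

```typescript
// Counts test declarations that start a line: test(...) or it(...).
// Heuristic only: it will miss tests generated in loops or declared via aliases.
function countTests(source: string): number {
  const matches = source.match(/^[ \t]*(?:test|it)[ \t]*\(/gm);
  return matches ? matches.length : 0;
}

// True when the newer revision of a test file declares fewer tests than the older one.
function testsWereDeleted(before: string, after: string): boolean {
  return countTests(after) < countTests(before);
}

const beforeGreen = `
test("empty string returns 0", () => {});
test("two numbers are summed", () => {});
`;
const afterGreen = `
test("empty string returns 0", () => {});
`;

console.log(testsWereDeleted(beforeGreen, afterGreen)); // true
```

To use it on a real kata you would read the two revisions of add.test.ts (for example the output of git show HEAD~1:add.test.ts and the working copy) and compare them after every green commit.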

#Architect mode

aider --architect splits the work between a senior model that plans and a junior that writes. For TDD this is mostly useful in green: the architect produces a small, structurally honest plan; the junior follows it. For red, plain mode is enough — single-purpose tests don't need architecture.

You can mix: aider --architect for the green phase, plain aider for red and refactor. Costs less per cycle, gets you the planning benefit where it counts.

#Why Aider remains the closest fit

Even with the auto-test trap, Aider has a structural advantage: every commit is intentional and named. Claude Code and Cursor want you to commit between phases as a separate operation; Aider folds it into the prompt. That's one fewer place for "I forgot to commit before continuing" to break the trace.

If you keep --auto-test off for green commits and on for refactor, prompt with phase prefixes, and pin the no-test-modification rule in CONVENTIONS, Aider's loop stops cheating and the verdict pages turn green.

#Try it

Sign in at tdd.md/you, pick the string-calc kata, set up CONVENTIONS.md, and run a few cycles in Aider with the workflow above. The verdict updates within seconds of each push.

Six steps in, you'll have an evidence-backed answer to whether your Aider workflow does real TDD or theater.
