syntaxai/tdd.md · commit 779579a

Blog post: Cursor knows how to do TDD; users skip Plan Mode + fresh chats

Companion to the Claude Code post, framed differently: Cursor's own
agent best practices already document a 5-step TDD pattern. The gap
isn't the loop — it's the supporting features (Plan Mode, fresh
conversations between phases, minimalist .cursor/rules) that make
the loop survive contact with a real workflow.

Quotes Cursor's docs explicitly on:
- Planning forces clear thinking, gives the agent concrete goals
- Long conversations accumulate noise; start fresh per task
- Project rules: minimalist, only when agent makes the same mistake

Walk-through follows their pattern: Plan Mode for kata setup → fresh
chat per phase → red commit → fresh chat → green commit → optional
refactor. Pitfalls map directly to verdict statuses, with Cursor-
specific causes (Composer auto-applying across files, agent mode
pushing through fixes, "let me update the test" suggestions on
green failures).

Both posts reachable from /blog and from the homepage CTA strip;
both have BlogPosting JSON-LD; both in the sitemap.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
author: syntaxai <[email protected]>
date: 2026-05-07 12:39:06 +01:00
parent: 532b094
commit: 779579adf9bf03c939e18ad7fc6e514bd5ad9b21

3 files changed · +146 −1

added content/blog/cursor-tdd.md +139 −0
@@ -0,0 +1,139 @@
1+# Cursor knows how to do TDD. Most users skip the parts that matter.
2+
3+> Cursor's own [agent best practices](https://cursor.com/blog/agent-best-practices) document a clean TDD workflow: plan first, write tests, confirm they fail, *then* implement. The sequence is right. The features that make it work — Plan Mode, fresh conversations, project rules — are exactly what most users skip. Here's how to put the pieces together, and how to verify you didn't skip any.
4+
5+If you spend an afternoon reading Cursor's docs, you find an unusually disciplined TDD recommendation. Their five-step pattern, paraphrased:
6+
7+1. Ask the agent to write tests based on input/output pairs. Be explicit that you're doing TDD so the agent doesn't write mock implementations.
8+2. Have the agent run the tests and confirm they fail — without writing implementation code.
9+3. Commit at this point.
10+4. Request implementation that passes the tests without modifying them.
11+5. Commit.
12+
13+That sequence is correct. It's the same loop Kent Beck wrote about twenty-three years ago. The gap between Cursor's documented practice and what most users actually do isn't the loop itself — it's the supporting features that make the loop survive contact with a real workflow.
14+
15+## The three features most users skip
16+
17+### Plan Mode (`Shift+Tab`)
18+
19+Cursor's docs are direct about this: "Planning forces clear thinking about what you're building and gives the agent concrete goals to work toward." Plan Mode is where the agent researches your codebase, asks clarifying questions, and writes a plan *before* touching any file.
20+
21+Most users skip it because chat-first is faster for small tweaks. For TDD that's a mistake. Plan Mode is where the kata's requirements turn into a concrete sequence of red→green→refactor cycles. Without it, the agent improvises — and improvisation is what collapses the test-first/impl-second boundary.
22+
23+When a session goes sideways, Cursor recommends reverting and refining the plan rather than iterating through failed attempts. That advice applies just as cleanly to TDD: when the agent starts writing impl during the red phase, the right move is "stop, rewind to the plan", not "let me prompt my way out".
24+
25+### Fresh conversations between phases
26+
27+Cursor's own warning: *"Long conversations can cause the agent to lose focus. After many turns and summarizations, the context accumulates noise and the agent can get distracted or switch to unrelated tasks."*
28+
29+This is the structural fix for combined-phase TDD. If you write the red test in the same chat that's about to write the impl, the model has the impl plan in its working memory. The test ends up shaped to the impl rather than to the requirement.
30+
31+Fresh chat per phase breaks that. Your "write the failing test" prompt enters a conversation with no future. The model writes the test as a contract for someone else to satisfy — not as a step in its own plan.
32+
33+Cursor's docs recommend fresh conversations "when you're moving to a different task or feature" or "when the agent seems confused or keeps making the same mistakes." For TDD: treat each phase as a different task. Red is one task. Green is another. Refactor is a third. Three tasks, three conversations.
34+
35+### `.cursor/rules/` (minimalist, pinned)
36+
37+Project rules are persistent context — read on every agent invocation. Cursor's docs are emphatic about keeping them minimal: *"Start simple. Add rules only when you notice the agent making the same mistake repeatedly."*
38+
39+For TDD, one rule covers it. Drop this in `.cursor/rules/tdd.md`:
40+
41+```md
42+This project follows TDD strictly.
43+
44+Cycle: write a FAILING test, commit `red(<step>): <message>`, then write
45+the simplest impl that makes it pass, commit `green(<step>): <message>`.
46+Optional `refactor: <message>` between cycles.
47+
48+Never write impl before its failing test. Never write test + impl in the
49+same response. Never delete a test under any circumstances.
50+
51+If a test seems wrong, replace it in a separate commit, never bundled
52+with impl changes.
53+```
54+
55+That's it. Cursor's docs warn against copying entire style guides into rules: the linter should catch style, and rules should encode workflow conventions the linter can't enforce. TDD discipline is exactly that kind of convention.
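
The commit convention in the rule is mechanical enough to check in code. Here is a minimal sketch, assuming the `red(<step>): <message>`, `green(<step>): <message>` and `refactor: <message>` subjects from the rule file. `parsePhaseCommit` and `PhaseCommit` are hypothetical names used for illustration, not part of Cursor or tdd.md.

```typescript
type Phase = "red" | "green" | "refactor";

// Hypothetical shape for a parsed commit subject.
interface PhaseCommit {
  phase: Phase;
  step?: string; // present for red/green, absent for refactor
  message: string;
}

// Matches `red(<step>): <message>`, `green(<step>): <message>`
// and `refactor: <message>`. Anything else is rejected.
const PHASE_SUBJECT = /^(?:(red|green)\(([^)]+)\)|(refactor)): (.+)$/;

export function parsePhaseCommit(subject: string): PhaseCommit | null {
  const match = PHASE_SUBJECT.exec(subject.trim());
  if (match === null) return null;

  // Groups 1-2 are set for red/green, group 3 for refactor, group 4 always.
  const [, cyclePhase, step, refactor, message = ""] = match;
  if (refactor) return { phase: "refactor", message };
  return { phase: cyclePhase as Phase, step, message };
}
```

A function like this could back a `commit-msg` hook that rejects any subject for which it returns `null`, which keeps the phase tags reliable even when the agent writes the commit message.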
56+
57+## The full workflow
58+
59+Putting the pieces together, a single TDD cycle in Cursor looks like this:
60+
61+**Setup, once per kata.** Open the kata folder, drop the `.cursor/rules/tdd.md` from above, then `Shift+Tab` into Plan Mode and prompt:
62+
63+> "We're starting the `<kata-name>` kata. The requirements are at `<spec-url>`. Plan the cycles: one red→green pair per requirement, one optional refactor at the end. Don't write code yet."
64+
65+The plan is a markdown artifact. You read it, push back where you disagree (Cursor explicitly recommends pushback), and commit it as a planning note if you like.
66+
67+**Red phase, fresh chat.** Exit Plan Mode (or open a new chat with `Cmd+L` if you've already drifted). Prompt:
68+
69+> "Implement the first cycle of the plan: a failing test for `<requirement>` in `<test-file>`. Don't touch the implementation file. Run the test and confirm it fails for the right reason."
70+
71+Cursor edits, runs, reports the failure. Commit:
72+
73+```
74+git commit -m "red(<step>): <one-line summary>"
75+```
76+
77+**Green phase, fresh chat.** New conversation. Prompt:
78+
79+> "The test in `<test-file>` is failing. Write the simplest impl in `<impl-file>` that makes it pass — no more. Run tests to confirm green."
80+
81+Commit:
82+
83+```
84+git commit -m "green(<step>): <one-line summary>"
85+```
86+
87+**Refactor (optional), fresh chat.** Prompt:
88+
89+> "Tests pass. Refactor `<impl-file>` for clarity without changing behaviour. Run tests after each edit."
90+
91+Commit:
92+
93+```
94+git commit -m "refactor: <what changed>"
95+```
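
Each prompt above also implies which files a phase may touch: red edits tests only, green edits the implementation only. As a rough sketch, that boundary could be checked against the file list printed by `git diff --name-only`. The `.test.` naming convention and the function names are assumptions for illustration. Treating test edits during refactor as out of bounds is also an assumption drawn from the refactor prompt, not a rule Cursor states.

```typescript
type Phase = "red" | "green" | "refactor";

// Assumption: test files are recognisable by name, e.g. add.test.ts.
const isTestFile = (path: string): boolean =>
  /\.test\.[cm]?[jt]sx?$/.test(path);

// Returns the changed files that a commit of the given phase
// should not have touched. An empty array means the boundary held.
export function filesOutsidePhase(phase: Phase, changedFiles: string[]): string[] {
  switch (phase) {
    case "red":
      // Red writes the failing test and nothing else.
      return changedFiles.filter((file) => !isTestFile(file));
    case "green":
    case "refactor":
      // Green and refactor leave the tests alone.
      return changedFiles.filter((file) => isTestFile(file));
  }
}
```

If a red commit lists `add.ts` among its changes, the function returns it, which is the signal that test and implementation were written in the same turn.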
96+
97+## A concrete walk-through
98+
99+Take the canonical String Calculator kata. Step 1: `add("")` returns `0`.
100+
101+**Plan Mode** drafts a plan that says "seven cycles, starting with empty-string returns zero". You commit it.
102+
103+**Red phase, fresh chat.** Cursor writes `add.test.ts` with one failing test, runs `bun test`, reports the failure. You commit `red(empty): empty string returns 0`.
104+
105+**Green phase, fresh chat.** Cursor opens `add.ts`, writes `export const add = (_numbers: string) => 0`, runs tests, reports green. You commit `green(empty): hardcoded 0`.
106+
107+The "hardcoded 0" is intentional: Beck's "fake it 'til you make it". The next step (`add("42")` returns `42`) will force a real implementation, because returning 0 will fail the new test. The discipline emerges step-by-step; you can't shortcut it without inheriting design debt you didn't choose.
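
Sketched in TypeScript, the first two green commits might look like the following. The names `addStep1` and `addStep2` are hypothetical labels for the same `add` function at two points in history. The parameter is declared even where it is ignored, so that a call like `add("")` also passes a type check.

```typescript
// green(empty): the simplest implementation that satisfies add("") === 0.
// The argument is accepted but deliberately ignored.
export const addStep1 = (_numbers: string): number => 0;

// Green for the second step: add("42") === 42 fails against addStep1,
// so the constant has to give way to real parsing.
export const addStep2 = (numbers: string): number =>
  numbers === "" ? 0 : Number(numbers);

// The red check for step two: the new requirement must fail against
// the previous implementation before any new code is written.
export const secondTestFails = (impl: (numbers: string) => number): boolean =>
  impl("42") !== 42;
```

Here `secondTestFails(addStep1)` is `true`, which is what a valid red commit looks like, and `secondTestFails(addStep2)` is `false` once the green commit lands.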
108+
109+## Common pitfalls and what they cost
110+
111+**`red-did-not-fail`** — combined Composer turn writes test + impl in one apply. Test passes immediately. -5. Fix: fresh chat per phase.
112+
113+**`hidden-tests-failed`** — your test passes, but the kata's hidden tests don't. The test was tautological or shaped around the wrong impl. 0 points. Fix: anchor the test to the requirement, not to the implementation you're about to write.
114+
115+**`test-deleted`** — Cursor offers to "fix" a failing green by removing the test instead of fixing the impl. -20. Fix: the rule above ("never delete a test under any circumstances") + push back when Cursor proposes it. Cursor's docs explicitly recommend treating the agent as a "capable collaborator" — pushback is part of the relationship.
116+
117+**Broken refactor** — refactor commit's tests fail. -5. Fix: re-run after every edit; revert if anything breaks.
118+
119+## How tdd.md verifies you actually followed the pattern
120+
121+[tdd.md](/) clones your kata repo on push, walks each commit, and runs the tests in a sandbox at every checkout. For the red commit, it asserts the tests *fail*. For the green commit, it asserts they pass. It also runs hidden tests it owns (catches tautologies), counts test functions across commits (catches deletion), and re-runs tests on each refactor (catches regression).
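
The checks described above reduce to a small decision per commit. The sketch below is not tdd.md's implementation; it only illustrates the logic under assumed types. `red-did-not-fail`, `hidden-tests-failed` and `test-deleted` are the statuses named in this post, while `green-did-not-pass`, `broken-refactor`, `CommitRun` and the function names are made up for the example.

```typescript
type Phase = "red" | "green" | "refactor";

// Hypothetical record of what a judge observed at one checkout.
interface CommitRun {
  phase: Phase;
  testsPassed: boolean;       // the repo's own tests
  hiddenTestsPassed: boolean; // judge-owned tests, consulted on green only
  testCount: number;          // number of test functions at this commit
}

type Status =
  | "ok"
  | "red-did-not-fail"
  | "green-did-not-pass"   // hypothetical label
  | "hidden-tests-failed"
  | "test-deleted"
  | "broken-refactor";     // hypothetical label

function judge(run: CommitRun, previousTestCount: number): Status {
  // A shrinking test count is treated as deletion, whatever the phase.
  if (run.testCount < previousTestCount) return "test-deleted";

  switch (run.phase) {
    case "red":
      // A red commit must fail; passing means impl arrived with the test.
      return run.testsPassed ? "red-did-not-fail" : "ok";
    case "green":
      if (!run.testsPassed) return "green-did-not-pass";
      // Hidden tests catch tautological or impl-shaped tests.
      return run.hiddenTestsPassed ? "ok" : "hidden-tests-failed";
    case "refactor":
      return run.testsPassed ? "ok" : "broken-refactor";
  }
}

// Walks the history in commit order, carrying the test count forward.
export function judgeHistory(runs: CommitRun[]): Status[] {
  let previousTestCount = 0;
  return runs.map((run) => {
    const status = judge(run, previousTestCount);
    previousTestCount = run.testCount;
    return status;
  });
}
```

For a history whose red commit already passes its tests, `judgeHistory` reports `red-did-not-fail` for that commit, which is the combined-turn pitfall from the previous section.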
122+
123+The verdict is a public URL with per-step status, score, and a one-line explanation per row. [/demo/string-calc](/demo/string-calc) is what a clean run looks like — `+45`, two steps verified, one refactor with tests staying green.
124+
125+If you ran the kata in Cursor with everything documented above, your verdict matches the demo's pattern. If you slipped — combined Composer, long-running session that bled context, agent mode that auto-applied across the boundary — the verdict shows exactly which step it caught you on, and why.
126+
127+## Why Plan Mode is the secret weapon
128+
129+The structural insight Cursor's docs nudge at but don't quite spell out: Plan Mode is where the agent's plan and the kata's requirements line up *before* any code touches disk. The plan is a markdown artifact. It survives across session resets. It's what makes fresh-conversation-per-phase coherent — each phase enters a fresh context, but with the same plan visible.
130+
131+Without a plan, fresh conversations are amnesia. With a plan, they're focus. That's the whole shift between "TDD-shaped output" and "actual TDD discipline".
132+
133+## Try it
134+
135+Sign in at [tdd.md/you](/you), pick the [string-calc kata](/games/string-calc), and run it through Cursor with the rules and the workflow above. The verdict updates within seconds of each push. The phase log shows what the judge saw, the score column shows what each commit earned, and the explanation column tells you why.
136+
137+Six steps in, you'll have an evidence-backed answer to: "Is my Cursor workflow doing real TDD, or just looking like it?"
138+
139+[← all guides](/guides) · [Cursor reference guide](/guides/cursor) · [the kata catalog](/games) · [the Claude Code post →](/blog/claude-code-tdd)
modified content/home.md +1 −1
@@ -8,7 +8,7 @@
8 8 - [TDD with **Cursor** →](/guides/cursor) · Composer-per-phase, project rules, agent-mode caveats
9 9 - [TDD with **Aider** →](/guides/aider) · auto-commit phase tags, --auto-test gotchas
10 10
11-See what a real verdict looks like: [tdd.md/demo/string-calc →](/demo/string-calc). Read the post: [Claude Code does not do TDD by default →](/blog/claude-code-tdd).
11+See what a real verdict looks like: [tdd.md/demo/string-calc →](/demo/string-calc). Posts: [Claude Code does not do TDD by default →](/blog/claude-code-tdd) · [Cursor knows how to do TDD →](/blog/cursor-tdd).
12 12
13 13 ---
14 14
modified src/server.ts +6 −0
@@ -50,6 +50,12 @@ interface BlogEntry {
50 50 }
51 51
52 52 const ALL_POSTS: BlogEntry[] = [
53+ {
54+ slug: "cursor-tdd",
55+ title: "Cursor knows how to do TDD. Most users skip the parts that matter.",
56+ description: "Cursor's own agent best practices document a clean TDD workflow — but most users skip the features (Plan Mode, fresh conversations, .cursor/rules) that actually make it work. Here's how to put the pieces together, with a kata you can run end-to-end.",
57+ date: "2026-05-04",
58+ },
53 59 {
54 60 slug: "claude-code-tdd",
55 61 title: "Claude Code does not do TDD by default — here's how to make it",