syntaxai/tdd.md · commit 7b55b52

Blog post: greening our own dogfood — Modeled flipped to ✓ on /sama/verify

The receipt for the live verifier moving from 3/4 to 4/4 pillars on
syntaxai/tdd.md. Same URL anyone can hit, same answer the local CLI
gives. Cites the 4 missing-sibling violations as the named artifact,
the 4 new test files as the fix, and the public dogfood URL as the
proof.

- content/blog/sama-empirical-modeled-green.md — the case-study post
- src/c31_blog.ts — registry entry dated 2026-05-22

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
author: syntaxai <[email protected]>
date: 2026-05-22 13:06:30 +01:00
parent: 44fca64
commit: 7b55b52e6fe4d3924d61d56545dfa234c5923459

2 files changed · +149 −0

added content/blog/sama-empirical-modeled-green.md +143 −0
@@ -0,0 +1,143 @@
1+# Greening our own dogfood: four sibling tests, the live verifier flipped from 3/4 to 4/4
2+
3+The strongest claim a coding standard can make is "I follow my own rules
4+on my own codebase, in public, on the page where I document the rules."
5+The weakest version of that claim is the version you'd make about your
6+own work where nobody can check. Halfway is a `/sama/verify?repo=mine`
7+URL that anyone can hit — and that says **three of four** pillars green.
8+
9+That's where this site was yesterday. Today it's 4/4. Here's the receipt.
10+
11+## The dogfood URL
12+
13+`/sama/verify` is the public verifier on tdd.md. Drop in any GitHub
14+`owner/name` and it runs the four SAMA checks against the default
15+branch — same checks, same code, as the local `sama check` CLI. The
16+*"verify any public repo"* link on the SAMA landing page seeds it with
17+this site's own slug as an example:
18+
19+```
20+https://tdd.md/sama/verify?repo=syntaxai/tdd.md
21+```
22+
23+Yesterday that page showed:
24+
25+```
26+Sorted ✓ pass
27+Architecture ✓ pass
28+Modeled ✗ 4 violations
29+Atomic ✓ pass
30+```
31+
32+The four Modeled violations were real. SAMA's Modeled rule is strict:
33+every `c32_*.ts` file must have a sibling `.test.ts`. Four files were
34+missing one:
35+
36+```
37+c32_judge.ts — no sibling test file at c32_judge.test.ts
38+c32_session.ts — no sibling test file at c32_session.test.ts
39+c32_real_reports.ts — no sibling test file at c32_real_reports.test.ts
40+c32_real_tests.ts — no sibling test file at c32_real_tests.test.ts
41+```
42+
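A minimal sketch of the sibling rule described above, assuming it only needs a flat list of repo-relative paths. `findMissingSiblings` and `MissingSibling` are hypothetical names for illustration, not the verifier's actual code.

```typescript
// Hypothetical sketch of a Modeled-style sibling check; not the real SAMA verifier.
interface MissingSibling {
  file: string;         // the c32_*.ts source file
  expectedTest: string; // where its sibling test was expected
}

// Takes a flat list of repo-relative paths so it runs without touching disk.
function findMissingSiblings(paths: string[]): MissingSibling[] {
  const missing: MissingSibling[] = [];
  for (const path of paths) {
    const name = path.split("/").pop() ?? path;
    // The rule covers c32_*.ts sources; test files themselves are skipped.
    if (!/^c32_.+\.ts$/.test(name) || /\.test\.ts$/.test(name)) continue;
    const expectedTest = path.replace(/\.ts$/, ".test.ts");
    if (paths.indexOf(expectedTest) === -1) {
      missing.push({ file: path, expectedTest });
    }
  }
  return missing;
}
```

Each entry pairs the offending file with the path where its test was expected, which is the same information the violation list above reports.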
43+They'd been flagged on every check for weeks. Long enough that the
44+"we follow our own rules" narrative was starting to wear thin every
45+time someone pasted the dogfood URL into a conversation.
46+
47+## What I did about it
48+
49+Four sibling test files. One per violation. 55 new tests, all real
50+unit tests with real `expect()` calls — the verifier flags placeholder
51+tests too (zero `expect()` calls in a `test()` body) under the same
52+Atomic pass, so cheating with empty bodies isn't an option.
53+
54+| File | Tests | What's actually covered |
55+|---|---:|---|
56+| `c32_session.test.ts` | 24 | The pure helpers: `parseCookies`, `timingSafeEqual`, `hmacSha256Hex`, `sessionCookieHeader`, `randomHex`, plus the full `signSession` ↔ `verifySession` round-trip including forged-signature and expired-cookie rejection paths |
57+| `c32_judge.test.ts` | 9 | `applyMode` — the strict / pragmatic / learning penalty math (positive deltas pass through; pragmatic halves negatives; learning zeroes them); `explainRefactor` — the two-branch explanation strings |
58+| `c32_real_reports.test.ts` | 12 | `detectAgent` (Claude / Cursor / Aider attribution from commit footers, case-insensitive); `buildTrend` (30-day daily commit sparkline — out-of-window drop, same-day stacking, empty input flat-lines) |
59+| `c32_real_tests.test.ts` | 10 | `detectAgent` again (this one returns `null` instead of `"unknown"` — documented difference); `shortenTestLabel` |
60+
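To make the `buildTrend` cases in the table concrete, here is one way a 30-day daily sparkline could be bucketed. The signature, the UTC day boundaries, and the oldest-first ordering are assumptions; the post names the function but does not show it.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical signature: the post names buildTrend but does not show it.
// Returns one count per UTC day, oldest first, ending on the day containing `now`.
function buildTrend(commitTimes: Date[], now: Date, days: number = 30): number[] {
  const counts: number[] = [];
  for (let i = 0; i < days; i++) counts.push(0); // empty input stays a flat line
  const today = Math.floor(now.getTime() / DAY_MS);
  for (const t of commitTimes) {
    const age = today - Math.floor(t.getTime() / DAY_MS);
    if (age < 0 || age >= days) continue; // out-of-window commits are dropped
    const slot = days - 1 - age;
    counts[slot] = (counts[slot] ?? 0) + 1; // same-day commits stack
  }
  return counts;
}
```

Passing `now` in as a parameter, rather than reading the clock inside the function, is what keeps the three edge cases testable with fixed dates.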
61+`c32_session.ts` was the easy one: every helper it ships is already
62+exported. The other three files needed a small visibility move: their
63+`const` helpers (`applyMode`, `explainRefactor`, `detectAgent`, `buildTrend`,
64+`shortenTestLabel`) had to become `export const` so each sibling could
65+import them. No behaviour changed. The exports are the test surface
66+SAMA's Modeled rule expects c32 files to have anyway; burying pure
67+helpers as private `const` defeats the whole point of putting them in
68+a c32 file in the first place.
69+
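As an illustration of the kind of pure helper this visibility move exposes, here is a sketch of the `applyMode` penalty math as the table describes it. The signature and the `Mode` type are hypothetical, and the strict branch (negatives kept unchanged) is an assumption, since the post only spells out the pragmatic and learning behaviour.

```typescript
type Mode = "strict" | "pragmatic" | "learning";

// Hypothetical signature: the post names applyMode but does not show it.
// In the repo layout described above this would be an `export const` in
// c32_judge.ts so that c32_judge.test.ts can import it.
const applyMode = (delta: number, mode: Mode): number => {
  if (delta >= 0) return delta; // positive deltas pass through in every mode
  switch (mode) {
    case "strict":
      return delta; // assumption: strict keeps the full penalty
    case "pragmatic":
      return delta / 2; // pragmatic halves negatives
    case "learning":
      return 0; // learning zeroes them
  }
};
```

Because the helper takes plain values and returns a plain value, a sibling test can cover every branch with a handful of `expect()` calls and no setup.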
70+## What the verifier sees now
71+
72+Local run:
73+
74+```
75+$ bun scripts/sama-cli.ts check
76+SAMA verify · (local)/src · (working tree)
77+ examined 71 SAMA files / 16 tests / 74 src files
78+
79+ S — Sorted: ✓ pass (71 files)
80+ A — Architecture: ✓ pass (71 files)
81+ M — Modeled: ✓ pass (55 files)
82+ A — Atomic: ✓ pass (71 files)
83+
84+✓ all four checks passed
85+```
86+
87+Live run — same code, just over HTTP:
88+
89+```
90+$ curl -s 'https://tdd.md/sama/verify?repo=syntaxai/tdd.md' \
91+ | grep -oE '(Sorted|Architecture|Modeled|Atomic)[^<]*<'
92+Architecture · ✓ pass<
93+Atomic · ✓ pass<
94+Modeled · ✓ pass<
95+Sorted · ✓ pass<
96+```
97+
98+Same answer, two delivery surfaces, one source of truth.
99+
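The `curl | grep` check above can also be scripted. This sketch reuses the same pattern as the `grep -oE` call; `extractPillars` and `allGreen` are hypothetical helper names, and the scrape is as fragile as the grep: it assumes each pillar name is followed by its status text up to the next `<`, and that a later match for the same name should win.

```typescript
const PILLARS = ["Sorted", "Architecture", "Modeled", "Atomic"];

// Same pattern as the grep above, with the status text captured separately.
function extractPillars(html: string): Record<string, string> {
  const re = /(Sorted|Architecture|Modeled|Atomic)([^<]*)</g;
  const result: Record<string, string> = {};
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const name = m[1] ?? "";
    const rest = m[2] ?? "";
    // Drop the separator ("·" and whitespace) between the name and the status.
    result[name] = rest.replace(/^[\s·]+/, "").trim();
  }
  return result;
}

// True only when all four pillars are present and each status starts with "✓".
function allGreen(statuses: Record<string, string>): boolean {
  for (const pillar of PILLARS) {
    const status = statuses[pillar];
    if (status === undefined || status.indexOf("✓") !== 0) return false;
  }
  return true;
}
```

In a runtime with a global `fetch`, the page body from `https://tdd.md/sama/verify?repo=syntaxai/tdd.md` could be passed straight to `extractPillars`; parsing is kept separate from the network call so it can be tested on a fixed string.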
100+## Why this matters as evidence
101+
102+Two things are true that weren't true yesterday:
103+
104+1. **The publicly-hosted verifier shows this codebase clean.** Anyone
105+ can open `https://tdd.md/sama/verify?repo=syntaxai/tdd.md` and see
106+ 4/4 ✓. Skeptics don't have to trust me; they can hit the URL.
107+2. **The fix shape matched the verifier's instruction exactly.**
108+ The verifier said "no sibling test file at `<path>`." The fix was
109+ four sibling test files at those four paths. No metaprogramming, no
110+ reorganisation, no carve-out. The instruction *was* the diff.
111+
112+That's the empirical loop SAMA's pitch turns on: the verifier names
113+the missing artifact, the operator (human or agent) produces it, the
114+verifier flips green. No judgement calls, no taste, no escalation.
115+On a working codebase that you're shipping to users, you can run the
116+loop in an afternoon.
117+
118+## What this *doesn't* prove
119+
120+It doesn't prove the standard scales. It doesn't prove the standard
121+works on Python or Rust. It doesn't prove AI agents using SAMA write
122+better code than agents not using SAMA (that's [issue #1](https://github.com/syntaxai/tdd.md/issues/1)
123+— 360 measured data points, still to come).
124+
125+It proves *one* thing: on this codebase, the rules described in
126+`/sama/*` are the rules running in `bun scripts/sama-cli.ts check`,
127+which are the rules running on `https://tdd.md/sama/verify`, and the
128+codebase passes all four of them. The website is the spec is the
129+verifier is the test suite, and they agree on the same answer.
130+
131+That's the smallest unit of "this standard isn't just a blog post."
132+The blog post is the receipt; the URL is the proof.
133+
134+---
135+
136+**See for yourself:**
137+
138+- Live dogfood: <https://tdd.md/sama/verify?repo=syntaxai/tdd.md>
139+- The four checks documented:
140+ [Sorted](/sama/sorted) · [Architecture](/sama/architecture) ·
141+ [Modeled](/sama/modeled) · [Atomic](/sama/atomic)
142+- Previous post in this series:
143+ [When the verifier said "split this"](/blog/sama-empirical-c21-split)
modified src/c31_blog.ts +6 −0
@@ -12,6 +12,12 @@ export interface BlogEntry {
1212 }
1313
1414 export const ALL_POSTS: BlogEntry[] = [
15+ {
16+ slug: "sama-empirical-modeled-green",
17+ title: "Greening our own dogfood: four sibling tests, the live verifier flipped from 3/4 to 4/4",
18+ description: "/sama/verify?repo=syntaxai/tdd.md is the public verifier on tdd.md. Yesterday it showed three of four SAMA pillars green for this codebase: Modeled was flagging four c32_* files without sibling tests. Today it shows 4/4. Receipt for the round-trip: four new test files (55 unit tests), const → export const visibility lifts on pure helpers in three files, no behaviour changes, and the same URL anyone in the world can hit now reports the same answer the local CLI does. The website is the spec is the verifier is the test suite.",
19+ date: "2026-05-22",
20+ },
1521 {
1622 slug: "sama-empirical-c21-split",
1723 title: "When the verifier said 'split this': one Atomic-700 hit, four handler files, the build stayed green",