syntaxai/tdd.md · commit 2868f48

SAMA tooling: placeholder detection, sandbox runner, verify, blog post

Five slices the corpus post promised, shipped in one batch. Each one
turns a SAMA rule from text into a URL or a CI check.

1. Placeholder-test detection (caught today)
   scripts/p620/snapshot-tests.ts now walks every test() / it() body in
   the test files named in the JUnit report and flags bodies with zero
   expect() calls. Each run record in the bundle gets a
   placeholderTests: { name, file, reason }[] field. The runtime
   /reports/live/tests page surfaces it as a flagged section. Catches
   r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass rate")
   directly.
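
A minimal sketch of the zero-expect check described in item 1, assuming a simplified header regex and a naive brace counter. `findZeroExpectTests` and `closeOf` are hypothetical names used only for illustration; the shipped `findPlaceholderTests` additionally skips strings and comments while counting braces and distinguishes three reasons.

```typescript
// Hypothetical, simplified sketch; not the shipped findPlaceholderTests.
interface PlaceholderTest {
  name: string;
  file: string;
  reason: string;
}

// Return the index just past the `}` that closes the block opened at `open`.
// Simplification: braces inside strings or comments are NOT skipped.
const closeOf = (src: string, open: number): number => {
  let depth = 0;
  for (let i = open; i < src.length; i++) {
    if (src[i] === "{") depth++;
    else if (src[i] === "}" && --depth === 0) return i + 1;
  }
  return src.length; // unterminated block: treat the rest of the file as body
};

const findZeroExpectTests = (file: string, src: string): PlaceholderTest[] => {
  const out: PlaceholderTest[] = [];
  // Matches headers such as test("name", () => {  and  it('name', async () => {
  const header = /\b(?:test|it)\s*\(\s*(["'`])(.*?)\1\s*,\s*(?:async\s*)?\(\s*\)\s*=>\s*\{/g;
  let m: RegExpExecArray | null;
  while ((m = header.exec(src)) !== null) {
    const open = header.lastIndex - 1; // position of the opening brace
    const end = closeOf(src, open);
    const body = src.slice(open + 1, end - 1);
    if (!/\bexpect\s*\(/.test(body)) {
      out.push({ name: m[2] ?? "", file, reason: "no expect() calls in test body" });
    }
    header.lastIndex = end; // continue scanning after this test body
  }
  return out;
};
```

Like the shipped check, this only recognises arrow-function callbacks with a block body, and a test whose assertions all live in a helper has no `expect(` of its own and is therefore flagged.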

2. Historical-commit testing (sandbox runner sliver)
   Same snapshot script accepts SAMA_HISTORY_DEPTH=N. For each of the
   last N commits not yet in the bundle, it spins up a git worktree,
   symlinks node_modules, runs bun test --reporter=junit, and appends
   the result keyed by SHA. Default 0 keeps deploy time bounded;
   opt-in for backfill or extended stability data. Groundwork for
   detecting r/ClaudeCode 1rug14a (runtime test-patching): the data
   to compare a clean-checkout run against an in-session run is now
   in the bundle.
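
The selection step in item 2 (take the last N commits, skip HEAD, skip SHAs already in the bundle) can be sketched as a pure function. `pendingShas` and `RunKey` are hypothetical names; the input is assumed to be `git log` output, newest first, with HEAD as the first entry.

```typescript
// Hypothetical sketch of choosing which historical SHAs still need a run.
interface RunKey {
  sha: string;
}

const pendingShas = (
  logNewestFirst: string[], // assumed: `git log --pretty=format:%H`, HEAD first
  runs: RunKey[],           // runs already stored in the bundle
  depth: number,            // value of SAMA_HISTORY_DEPTH
): string[] => {
  if (!(depth > 0)) return []; // 0, negative and NaN all mean HEAD-only
  return logNewestFirst
    .slice(1, depth + 1) // skip HEAD, keep the next `depth` commits
    .filter((sha) => sha !== "" && !runs.some((r) => r.sha === sha));
};
```

Keeping the selection separate from the worktree handling makes the default (depth 0, no extra work) easy to check without touching git.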

3. /sama/verify?repo=owner/name (mechanical check, any public repo)
   New routes:
     GET /sama/verify           form
     GET /sama/verify?repo=...  four-discipline report
   New files:
     src/c14_github.ts          fetchRepoTree + fetchRepoRawFile
                                (tree via API, contents via raw -
                                no token required)
     src/c32_sama_verify.ts     pure verifier - S/A/M/A checks given
                                file list + contents
   Special case: tdd.md is a private repo; when asked to verify
   ourselves, the route reads ./src/ from the container instead of
   GitHub. The dogfood result is honest: S+A pass, M flags 5 c32_*
   files without sibling tests, A flags c21_app.ts at 1066 lines
   (over the 700-line atomic threshold). The verifier catches its
   own makers - which is the proof the verifier works.
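
The two checks that fail in the dogfood run are simple enough to sketch. This is a hedged illustration, not the contents of `src/c32_sama_verify.ts`: `missingSiblingTests`, `oversizedFiles`, and the `Violation` shape are hypothetical, and the 700-line threshold is taken from the text above.

```typescript
// Hypothetical sketch of the Modeled and Atomic checks.
interface Violation {
  file: string;
  detail: string;
}

const ATOMIC_MAX_LINES = 700; // threshold named in the commit message

// Modeled: every src/c32_<name>.ts needs a sibling src/c32_<name>.test.ts.
const missingSiblingTests = (paths: string[]): Violation[] =>
  paths
    .filter((p) => /^src\/c32_[^/]+\.ts$/.test(p) && !/\.test\.ts$/.test(p))
    .filter((p) => paths.indexOf(p.replace(/\.ts$/, ".test.ts")) === -1)
    .map((p) => ({ file: p, detail: "no sibling .test.ts file" }));

// Atomic: flag any file whose line count exceeds the threshold.
const oversizedFiles = (contents: Record<string, string>): Violation[] => {
  const out: Violation[] = [];
  for (const file of Object.keys(contents)) {
    const text = contents[file] ?? "";
    // Ignore one trailing newline so "a\nb\n" counts as two lines.
    const lines = text.replace(/\n$/, "").split("\n").length;
    if (lines > ATOMIC_MAX_LINES) {
      out.push({ file, detail: `${lines} lines (over ${ATOMIC_MAX_LINES})` });
    }
  }
  return out;
};
```

Both functions take plain data (a path list, a path-to-content map), which is what lets one verifier run against either GitHub responses or a local ./src/ read.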

   Sorted-rule correction: the strict "lower never imports higher"
   reading was self-contradictory (it forbade c21 -> c31, the
   natural pattern). The actual rule, now consistent across
   /sama/sorted, /sama/skill, and the verifier: foundation/data/
   logic (c1*, c3*) don't depend on UI (c5*+); handlers (c21) are
   the orchestration layer and may import anything. The grep is
   src/c1*.ts src/c3*.ts only.
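
For readers who want the corrected rule as code rather than grep, here is a rough TypeScript equivalent. It assumes files are keyed by bare name (for example `c32_judge.ts`) and, like the grep, only looks at double-quoted relative imports; `layerOf` and `sortedViolations` are hypothetical helpers, not the verifier's actual API.

```typescript
// Hypothetical sketch of the corrected Sorted rule:
// c1* and c3* files must not import c5*+ (UI); c2* handlers are exempt.
interface Violation {
  file: string;
  detail: string;
}

// "c32_judge.ts" -> 32; anything without a cNN_ prefix -> null.
const layerOf = (fileName: string): number | null => {
  const m = /^c(\d{2})_/.exec(fileName);
  return m ? Number(m[1]) : null;
};

const sortedViolations = (files: Record<string, string>): Violation[] => {
  const out: Violation[] = [];
  for (const file of Object.keys(files)) {
    const layer = layerOf(file);
    if (layer === null) continue;
    const tens = Math.floor(layer / 10);
    if (tens !== 1 && tens !== 3) continue; // only foundation/data/logic are checked
    const importRe = /from\s+"\.\/(c\d{2}_[^"]*)"/g;
    let m: RegExpExecArray | null;
    while ((m = importRe.exec(files[file] ?? "")) !== null) {
      const target = layerOf(m[1] ?? "");
      if (target !== null && target >= 50) {
        out.push({ file, detail: `imports UI module ./${m[1]}` });
      }
    }
  }
  return out;
};
```

A `c21_app.ts` that imports `./c51_render_layout.ts` passes here, which is exactly the pattern the strict "lower never imports higher" reading used to forbid.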

4. Cross-link sweep
   Both blog posts (harness postmortem + corpus) gained /sama/skill
   and /sama/verify in their footers. /sama/skill's Reference
   section gained both blog posts and /sama/verify. /sama landing
   gained "verify any public repo" and "the case behind it"
   sections.

5. Wrap-up blog post: from-rules-to-checks
   Documents what shipped, with the dogfood result printed verbatim
   so the case is checkable rather than asserted.

The Atomic violation on c21_app.ts (1066 lines) is real and signals
that the dispatcher needs a per-domain split per the SAMA rule. That
split is the next sliver, not this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
author: syntaxai <[email protected]>
date: 2026-05-09 19:46:43 +01:00
parent: a92d5a5
commit: 2868f48b6d44ae47282d3d2dfb0f4d7e161b3ecd

12 files changed · +866 −22

modified content/blog/agentic-coding-corpus-three-patterns.md +1 −1
@@ -184,4 +184,4 @@ One thread is an audit. Ten threads are a pattern. The corpus shows three things
184184
185185 Plus the original one-thread audit: [ThePaSch — Claude Code has big problems and the post-mortem is not enough](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/) (325↑, 200+ comments). Treat this row as the eleventh entry.
186186
187-[Read the previous post →](/blog/claude-code-harness-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [back to the blog](/blog)
187+[Read the previous post →](/blog/claude-code-harness-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify your repo →](/sama/verify) · [back to the blog](/blog)
modified content/blog/claude-code-harness-postmortem.md +1 −1
@@ -123,4 +123,4 @@ None of this is in the user's hands. But the takeaway underneath is: while Anthr
123123
124124 The harness is loud. The diff doesn't have to be. TDD's iron law and SAMA's three falsifiable rules survive every reminder-bombardment, gag-order, importance-inflation, and contradictory-instruction the OP catalogues, because they are enforced *outside the agent's context window* — in the commit log and the file tree. They do not fix Anthropic's prompt sprawl, and they do not refund the tokens. What they do is make the work the agent ships externally verifiable, regardless of what the agent was told on the way there.
125125
126-[Read the original Reddit thread →](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/claude_code_has_big_problems_and_the_postmortem/) · [Anthropic's April-23 postmortem →](https://www.anthropic.com/engineering/april-23-postmortem) · [the four SAMA disciplines →](/sama) · [back to the blog](/blog)
126+[Read the original Reddit thread →](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/claude_code_has_big_problems_and_the_postmortem/) · [Anthropic's April-23 postmortem →](https://www.anthropic.com/engineering/april-23-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify your repo →](/sama/verify) · [back to the blog](/blog)
added content/blog/from-rules-to-checks.md +92 −0
@@ -0,0 +1,92 @@
1+# From rules to checks: shipping what the corpus post promised
2+
3+> The [corpus post](/blog/agentic-coding-corpus-three-patterns) closed with a promise: *"three of the ten threads describe failures only an actual test run can catch"* — and named which checks would have caught which failure modes (some today, some with a small extension, some only with a sandbox runner). This post is the receipt. Three of those checks now ship: **placeholder-test detection** (the *"one-evening sliver"*), **historical-commit testing via git worktree** (the *"next slice on the roadmap"*), and **`/sama/verify`** (mechanical layer grep + sibling-test + line-count + placeholder check, runnable against any public repo).
4+
5+## why this post is short
6+
7+The two previous posts made an argument. This one documents an outcome. If the argument was right, the outcome should be small and obvious: the rules become checks, the checks become routes, the routes catch the failure modes the corpus catalogued. Here's what that looks like in practice.
8+
9+## 1 · placeholder-test detection (caught today)
10+
11+Failure mode: r/ClaudeCode `1qix264`, *"Claude wrote 90 placeholder tests and reported 100% pass rate"*. The corpus post said:
12+
13+> *"empty assertion bodies — zero `expect()` calls, string-literal bodies, single-line `// TODO` stubs — are AST-checkable. The test bundle already lives in `content/git-history/syntaxai__tdd.md__tests.json`; an empty-body check is a one-evening sliver."*
14+
15+The check is a regex-based brace walker that extracts every `test(...)` and `it(...)` body, counts `expect(` occurrences, and flags zero-count bodies. It runs at deploy time as part of the existing `snapshot-tests.ts` script and writes its findings to the bundle as `placeholderTests: { name, file, reason }[]`. The runtime renderer surfaces them on [`/reports/live/tests`](/reports/live/tests):
16+
17+- Zero placeholders → a small "no placeholder tests detected at this snapshot" note explaining what the check looks for.
18+- One or more → a flagged section with a per-test table: name, file, reason (*"no expect() calls"*, *"empty test body"*, *"comment-only stub"*).
19+
20+It catches the most common shape of `1qix264` directly (`expect()` count is zero). It misses theoretical ones (custom assertion helpers that don't go through `expect`); the regex targets the failures actually seen in the wild, not every imaginable one.
21+
22+## 2 · historical-commit testing (the sandbox runner sliver)
23+
24+Failure mode: r/ClaudeCode `1rug14a`, *"Claude wrote Playwright tests that secretly patched the app at runtime"*. This is the failure that the previous reporting layer couldn't catch — the diff looks fine, the test passes in the agent's terminal, the test passes in the deploy-time bundle too if the bundle only ever ran HEAD. Catching this needs the same test to run *somewhere it's never run before*, against the actual code at that SHA.
25+
26+The new mode: `SAMA_HISTORY_DEPTH=N` in the deploy environment makes the snapshot script also test the last *N* commits that aren't already in the bundle. Mechanically:
27+
28+```bash
29+# shell equivalent of what scripts/p620/snapshot-tests.ts spawns per SHA
30+git worktree add --detach /tmp/tdd-md-wt-<sha> <sha>
31+ln -s "$REPO_ROOT/node_modules" /tmp/tdd-md-wt-<sha>/node_modules
32+(cd /tmp/tdd-md-wt-<sha> && bun test --reporter=junit --reporter-outfile=/tmp/junit-<sha>.xml)
33+git worktree remove --force /tmp/tdd-md-wt-<sha>
34+```
35+
36+Each historical run produces the same `TestRunRecord` shape as a HEAD run, gets appended to the bundle keyed by SHA, and feeds the existing stability table. Two consequences:
37+
38+- **Stability data builds up to 10× faster.** A first `SAMA_HISTORY_DEPTH=10` deploy backfills up to ten runs in one go instead of waiting ten deploys (a SHA whose run produces no JUnit output is skipped).
39+- **Runtime-patching becomes detectable in principle.** A test that passed in the agent's session AND in the original deploy run, but fails when re-run from a clean worktree at the same SHA, is the smoking-gun shape of `1rug14a`. We're not yet wired to flag the discrepancy as a separate failure mode (that's the next sliver), but the data to compare *is now in the bundle*.
40+
41+The default is still `SAMA_HISTORY_DEPTH=0` (HEAD-only). Opt-in keeps deploy time bounded; flipping the default to `5` or `10` is a one-line change once we want it on by default.
42+
43+## 3 · `/sama/verify` (mechanical check for any public repo)
44+
45+The corpus post argued: *"don't write a CLAUDE.md instruction the harness can overrule. Write a structural check the harness doesn't get to know about."* That argument is hollow if the structural checks aren't actually runnable. The new route closes the loop:
46+
47+**[/sama/verify](/sama/verify)** — paste a public GitHub repo, get a four-discipline report. The mechanics:
48+
49+1. Two GitHub API calls resolve the file list: a repo lookup for the default branch, then `git/trees/<default-branch>?recursive=1`.
50+2. Every `src/cXX_*.ts` file is fetched via `raw.githubusercontent.com` (no API rate limit, no token).
51+3. Pure logic in [`c32_sama_verify.ts`](https://github.com/syntaxai/tdd.md/blob/main/src/c32_sama_verify.ts) runs the four checks:
52+ - **S — Sorted**: every relative `from "./..."` import in a `c1*` or `c3*` file is parsed; flag if the target is a UI module (`c5*+`). Handlers (`c21`) are exempt.
53+ - **A — Architecture**: every `cXX_` prefix is matched against the known set (`c11`, `c13`, `c14`, `c21`, `c31`, `c32`, `c51`); unknown ones flagged.
54+ - **M — Modeled**: every `cXX_<name>.ts` (non-test) is checked for a sibling `cXX_<name>.test.ts`. Hard-fails for `c32_*` (logic); informational for `c31_*` (often pure-data registries).
55+ - **A — Atomic**: line count over 700 → flagged. Test files → run the same placeholder check from sliver #1.
56+
57+Output: pass/fail per discipline, with up to 20 violations per check listed (`file` + `detail`). Cached for an hour per repo.
58+
59+Try it on this site: [`/sama/verify?repo=syntaxai/tdd.md`](/sama/verify?repo=syntaxai/tdd.md). And here's the dogfood result, honestly:
60+
61+| check | tdd.md self-verify result |
62+|---|---|
63+| S — Sorted | ✓ pass — no UI dependency leaks into foundation/data/logic |
64+| A — Architecture | ✓ pass — every prefix is in the known set |
65+| M — Modeled | ✗ 5 violations — `c32_judge.ts`, `c32_session.ts`, `c32_real_reports.ts`, `c32_real_tests.ts`, `c32_sama_verify.ts` lack sibling test files |
66+| A — Atomic | ✗ 1 violation — `c21_app.ts` is 1066 lines (over the 700-line split threshold) |
67+
68+Two of four fail, and they're real. Five `c32_*` logic files — including `c32_sama_verify.ts`, the file that *runs the verification* — don't have sibling tests yet, and the route dispatcher has grown past the atomic threshold and now needs a per-domain split. Both findings were caught by the tool we just shipped, against the codebase we just shipped it from. That's the dogfood story: not "everything passes" but "the tool catches real things in real code, including its own". Both are on the very next slice of the roadmap.
69+
70+## what this changes about the case
71+
72+The argument has now happened in three layers:
73+
74+1. **The harness postmortem post** said: structural rules survive harness chaos because they're enforced outside the agent's context window.
75+2. **The corpus post** said: ten threads prove the failure modes are systematic, here are the rules that catch each, here's what we catch and what we don't yet.
76+3. **This post** says: the rules are now checks, the checks are now URLs you can hit, and you can verify the case against any public repo *including this one*.
77+
78+The leftover work — flagging a runtime-patching discrepancy as a distinct failure mode, hidden-test verification on real-project commits, AST-level placeholder detection beyond the regex — is in the open. It's smaller than what shipped this week.
79+
80+## tl;dr
81+
82+The two previous posts made a case from text. This one ships the checks the case promised:
83+
84+| sliver | route | catches | status |
85+|---|---|---|---|
86+| placeholder detection | [/reports/live/tests](/reports/live/tests) | r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass") | live |
87+| historical-commit testing | snapshot script with `SAMA_HISTORY_DEPTH=N` | runtime-patching SHAs ([groundwork for 1rug14a](/blog/agentic-coding-corpus-three-patterns)) | opt-in, default 0 |
88+| `/sama/verify` | [/sama/verify](/sama/verify) | layer violations, missing sibling tests, oversized files, placeholder tests, in any public repo | live |
89+
90+If the discipline is real, you should be able to point it at a repo and have it report findings. Now you can.
91+
92+[← back to the blog](/blog) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify a repo →](/sama/verify)
modified content/sama/skill.md +8 −3
@@ -30,16 +30,18 @@ Thinking "this one helper doesn't need a prefix"? Stop. That's how the rule erod
3030 ## The Iron Rule
3131
3232 ```
33-LOWER LAYERS NEVER IMPORT FROM HIGHER LAYERS
33+UI SITS AT THE EDGE — FOUNDATION, DATA AND LOGIC LAYERS NEVER DEPEND ON UI
3434 ```
3535
36+Foundation/data/logic (`c1*`, `c3*`) must never import UI (`c5*+`). Handlers (`c21`) are the orchestration layer and may compose UI; UI itself may read models for the data it renders.
37+
3638 Verify with one grep:
3739
3840 ```bash
39-grep -rE 'from "\./c[5-9]' src/c1*.ts src/c2*.ts src/c3*.ts
41+grep -rE 'from "\./c[5-9]' src/c1*.ts src/c3*.ts
4042 ```
4143
42-Empty output = rule holds. Any output = a lower layer reaches into a higher one. Either move the function or rename the file. Do not "fix" the violation by deleting the import without understanding what broke.
44+Empty output = rule holds. Any output = a UI dependency has leaked into foundation/data/logic. Move the function or rename the file. Do not "fix" the violation by deleting the import without understanding what broke.
4345
4446 ## The Four Letters
4547
@@ -184,3 +186,6 @@ Or, if the live and demo body builders are mostly the same shape parameterised b
184186
185187 - The four disciplines with examples: https://tdd.md/sama
186188 - Why SAMA compounds with TDD and token-discipline: https://tdd.md/blog/three-constraints-agentic-coding
189+- The Reddit harness postmortem this skill is a response to: https://tdd.md/blog/claude-code-harness-postmortem
190+- Ten more threads, three patterns, mitigation tables: https://tdd.md/blog/agentic-coding-corpus-three-patterns
191+- Mechanically verify any public repo against these rules: https://tdd.md/sama/verify
modified content/sama/sorted.md +3 −3
@@ -1,6 +1,6 @@
11 # S — Sorted
22
3-> **Rule:** alphabetical sort = dependency direction. Lower-numbered layers never import from higher-numbered ones.
3+> **Rule:** alphabetical sort = dependency direction. UI sits at the edge — foundation, data and logic layers (`c1*`, `c3*`) never depend on UI (`c5*+`). Handlers (`c21`) are the orchestration layer and may import anything.
44
55 The first property of SAMA. Run `ls src/` and you have the architecture diagram. There is no separate "where does the data flow?" document because the file system answers it.
66
@@ -16,10 +16,10 @@ Two reasons:
1616 Run this from the repo root:
1717
1818 ```bash
19-grep -rE 'from "\./c[5-9]' src/c1*.ts src/c2*.ts src/c3*.ts
19+grep -rE 'from "\./c[5-9]' src/c1*.ts src/c3*.ts
2020 ```
2121
22-If the output is empty, the rule holds. If anything appears, you have a higher layer leaking into a lower one — fix the import or move the file.
22+If the output is empty, the rule holds. If anything appears, you have a UI dependency leaking into a foundation, data or logic layer — fix the import or move the file. (Note: `c2*` files are intentionally excluded — handlers compose UI calls, so `c21` → `c51` is the normal pattern.)
2323
2424 This is the single load-bearing test for the *Sorted* property. Wire it into CI and forget about it.
2525
modified scripts/p620/snapshot-tests.ts +197 −9
@@ -1,18 +1,21 @@
11 #!/usr/bin/env bun
2-// Run `bun test` on the current HEAD and append the result to a
3-// per-repo bundle alongside the git-history snapshot. The container
4-// reads this bundle at runtime to render /reports/live/tests for the
5-// (private) syntaxai/tdd.md repo without needing a runtime sandbox.
2+// Run `bun test` on the current HEAD (and optionally the last N
3+// historical commits) and append the results to a per-repo bundle
4+// alongside the git-history snapshot. The container reads this bundle
5+// at runtime to render /reports/live/tests for the (private)
6+// syntaxai/tdd.md repo without needing a runtime sandbox.
67 //
7-// Strategy: HEAD-only per deploy. The bundle accumulates one run per
8-// deploy (capped at 50), so stability data builds organically over
9-// time. No git-worktree gymnastics, no per-commit bun-install.
8+// HEAD mode (default): one new run per deploy, fast, no worktree.
9+// History mode (SAMA_HISTORY_DEPTH=N): also runs the last N commits
10+// that aren't already in the bundle, via git worktree + a symlinked
11+// node_modules per SHA. Slower (~5-10s/commit) but builds real stability data
12+// instead of waiting for it to accumulate organically.
1013 //
1114 // Output: content/git-history/<owner>__<name>__tests.json
1215 // Schema: { owner, name, runs: TestRunRecord[] } — newest first.
1316
1417 import { spawnSync } from "node:child_process";
15-import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
18+import { existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
1619 import { resolve } from "node:path";
1720
1821 const REPO_ROOT = resolve(import.meta.dir, "..", "..");
@@ -20,6 +23,7 @@ const OWNER = "syntaxai";
2023 const NAME = "tdd.md";
2124 const MAX_RUNS = 50;
2225 const JUNIT_OUT = "/tmp/tdd-md-test-junit.xml";
26+const HISTORY_DEPTH = parseInt(process.env.SAMA_HISTORY_DEPTH ?? "0", 10);
2327
2428 const sh = (cmd: string, args: string[]): string => {
2529 const r = spawnSync(cmd, args, { cwd: REPO_ROOT, encoding: "utf8" });
@@ -81,6 +85,82 @@ const passing = tests.filter((t) => t.status === "pass").length;
8185 const failing = tests.length - passing;
8286 const totalDurationMs = tests.reduce((s, t) => s + t.durationMs, 0);
8387
88+interface PlaceholderTest {
89+ name: string;
90+ file: string;
91+ reason: string;
92+}
93+
94+// Placeholder detection. Catches the failure mode from r/ClaudeCode
95+// post 1qix264 ("90 placeholder tests, 100% pass rate"): tests with
96+// zero `expect(` calls in their body are flagged. Regex-based brace
97+// matching — full AST is overkill for the one structural property we
98+// care about. Limitations: misses tests that delegate to a custom
99+// assertion helper or pass through a subroutine. Acceptable for v1;
100+// the catch is the common failure shape, not every theoretical one.
101+const findPlaceholderTests = (testFile: string, content: string): PlaceholderTest[] => {
102+ const out: PlaceholderTest[] = [];
103+ const re = /\b(test|it)\s*\(\s*(["'`])((?:\\.|(?!\2).)*)\2\s*,\s*(?:async\s+)?(?:\([^)]*\)|[^=()]*?)\s*=>\s*\{/g;
104+ let m: RegExpExecArray | null;
105+ while ((m = re.exec(content)) !== null) {
106+ const name = m[3] ?? "";
107+ const startBrace = re.lastIndex - 1;
108+ let depth = 1;
109+ let i = startBrace + 1;
110+ let inString: string | null = null;
111+ while (i < content.length && depth > 0) {
112+ const c = content[i];
113+ if (inString !== null) {
114+ if (c === "\\") { i += 2; continue; }
115+ if (c === inString) inString = null;
116+ } else {
117+ if (c === '"' || c === "'" || c === "`") inString = c;
118+ else if (c === "/" && content[i + 1] === "/") {
119+ // line comment
120+ while (i < content.length && content[i] !== "\n") i++;
121+ continue;
122+ }
123+ else if (c === "/" && content[i + 1] === "*") {
124+ // block comment
125+ i += 2;
126+ while (i < content.length - 1 && !(content[i] === "*" && content[i + 1] === "/")) i++;
127+ i += 2;
128+ continue;
129+ }
130+ else if (c === "{") depth++;
131+ else if (c === "}") depth--;
132+ }
133+ i++;
134+ }
135+ const body = content.slice(startBrace + 1, i - 1);
136+ const expectCount = (body.match(/\bexpect\s*\(/g) ?? []).length;
137+ if (expectCount === 0) {
138+ const trimmedLen = body.replace(/\s+/g, "").length;
139+ const reason = trimmedLen === 0
140+ ? "empty test body"
141+ : trimmedLen < 20 && /^\s*\/\//.test(body.trim())
142+ ? "comment-only stub"
143+ : "no expect() calls in test body";
144+ out.push({ name, file: testFile, reason });
145+ }
146+ }
147+ return out;
148+};
149+
150+const detectPlaceholders = (testFiles: string[]): PlaceholderTest[] => {
151+ const out: PlaceholderTest[] = [];
152+ for (const f of testFiles) {
153+ const abs = resolve(REPO_ROOT, f);
154+ if (!existsSync(abs)) continue;
155+ const content = readFileSync(abs, "utf8");
156+ out.push(...findPlaceholderTests(f, content));
157+ }
158+ return out;
159+};
160+
161+const uniqueTestFiles = Array.from(new Set(tests.map((t) => t.file).filter(Boolean)));
162+const placeholderTests = detectPlaceholders(uniqueTestFiles);
163+
84164 interface TestRunRecord {
85165 sha: string;
86166 branch: string;
@@ -90,6 +170,7 @@ interface TestRunRecord {
90170 failing: number;
91171 durationMs: number;
92172 tests: TestRecord[];
173+ placeholderTests: PlaceholderTest[];
93174 }
94175
95176 interface TestBundle {
@@ -121,9 +202,116 @@ if (bundle.runs.some((r) => r.sha === head)) {
121202 failing,
122203 durationMs: totalDurationMs,
123204 tests,
205+ placeholderTests,
124206 });
125207 bundle.runs = bundle.runs.slice(0, MAX_RUNS);
126208 mkdirSync(resolve(REPO_ROOT, "content", "git-history"), { recursive: true });
127209 writeFileSync(bundlePath, JSON.stringify(bundle, null, 2));
128- console.log(`✓ tests at ${head.slice(0, 7)} (${branch}): ${passing}/${tests.length} pass, ${failing} fail → bundle (${bundle.runs.length} runs total)`);
210+ console.log(`✓ tests at ${head.slice(0, 7)} (${branch}): ${passing}/${tests.length} pass, ${failing} fail, ${placeholderTests.length} placeholder → bundle (${bundle.runs.length} runs total)`);
211+ if (placeholderTests.length > 0) {
212+ for (const p of placeholderTests) {
213+ console.log(` ⚠ placeholder: ${p.file} > ${p.name} (${p.reason})`);
214+ }
215+ }
216+}
217+
218+// ---------------------------------------------------------------------
219+// Historical mode: run the test suite at each of the last N commits
220+// that aren't already in the bundle. Opt-in via SAMA_HISTORY_DEPTH.
221+// ---------------------------------------------------------------------
222+
223+const runHistoricalCommit = (sha: string): boolean => {
224+ const wt = `/tmp/tdd-md-wt-${sha.slice(0, 12)}`;
225+ const junit = `/tmp/tdd-md-test-junit-${sha.slice(0, 12)}.xml`;
226+ // Cleanup any leftover from a previous failed run.
227+ spawnSync("git", ["worktree", "remove", "--force", wt], { cwd: REPO_ROOT });
228+ rmSync(wt, { recursive: true, force: true });
229+
230+ const add = spawnSync("git", ["worktree", "add", "--detach", wt, sha], {
231+ cwd: REPO_ROOT,
232+ encoding: "utf8",
233+ });
234+ if (add.status !== 0) {
235+ console.log(` ✗ worktree add failed for ${sha.slice(0, 7)}: ${add.stderr.trim()}`);
236+ return false;
237+ }
238+
239+ let added = false;
240+ try {
241+ // Symlink node_modules from the parent checkout. Works as long as
242+ // bun.lock didn't change between commits — true for almost every
243+ // commit on tdd.md. If it diverged, `bun test` will fail loudly
244+ // and we just skip that SHA.
245+ spawnSync("ln", ["-s", resolve(REPO_ROOT, "node_modules"), resolve(wt, "node_modules")]);
246+
247+ const ranAt = Date.now();
248+ spawnSync("bun", ["test", "--reporter=junit", `--reporter-outfile=${junit}`], {
249+ cwd: wt,
250+ stdio: "ignore",
251+ timeout: 120_000,
252+ });
253+ if (!existsSync(junit)) {
254+ console.log(` ✗ no junit output for ${sha.slice(0, 7)} — skipping`);
255+ return false;
256+ }
257+ const histXml = readFileSync(junit, "utf8");
258+ const histTests = parseJunit(histXml);
259+ if (histTests.length === 0) {
260+ console.log(` ⚠ ${sha.slice(0, 7)} produced 0 tests — likely deps mismatch, skipping`);
261+ return false;
262+ }
263+ const histPlaceholders: PlaceholderTest[] = [];
264+ for (const f of Array.from(new Set(histTests.map((t) => t.file).filter(Boolean)))) {
265+ const abs = resolve(wt, f);
266+ if (!existsSync(abs)) continue;
267+ histPlaceholders.push(...findPlaceholderTests(f, readFileSync(abs, "utf8")));
268+ }
269+ const histPassing = histTests.filter((t) => t.status === "pass").length;
270+ const histFailing = histTests.length - histPassing;
271+ const histDur = histTests.reduce((s, t) => s + t.durationMs, 0);
272+ const branchAtSha = sh("git", ["log", "-1", "--format=%D", sha]).split(",").map((s) => s.trim()).find((s) => s.startsWith("HEAD ->"))?.replace("HEAD -> ", "") ?? "(detached)";
273+
274+ bundle.runs.push({
275+ sha,
276+ branch: branchAtSha,
277+ ranAt,
278+ total: histTests.length,
279+ passing: histPassing,
280+ failing: histFailing,
281+ durationMs: histDur,
282+ tests: histTests,
283+ placeholderTests: histPlaceholders,
284+ });
285+ added = true;
286+ console.log(` ✓ ${sha.slice(0, 7)}: ${histPassing}/${histTests.length} pass, ${histFailing} fail, ${histPlaceholders.length} placeholder`);
287+ } finally {
288+ spawnSync("git", ["worktree", "remove", "--force", wt], { cwd: REPO_ROOT });
289+ rmSync(wt, { recursive: true, force: true });
290+ rmSync(junit, { force: true });
291+ }
292+ return added;
293+};
294+
295+if (HISTORY_DEPTH > 0) {
296+ console.log(`→ historical mode: walking last ${HISTORY_DEPTH} commits`);
297+ const recent = sh("git", ["log", `--max-count=${HISTORY_DEPTH + 1}`, "--pretty=format:%H"]).split("\n").slice(1); // skip HEAD
298+ let addedCount = 0;
299+ for (const sha of recent) {
300+ if (!sha) continue;
301+ if (bundle.runs.some((r) => r.sha === sha)) {
302+ console.log(` • ${sha.slice(0, 7)} already in bundle, skipping`);
303+ continue;
304+ }
305+ if (runHistoricalCommit(sha)) addedCount++;
306+ }
307+ if (addedCount > 0) {
308+ // Re-sort newest-first by commit author date before re-writing.
309+ // The new historical entries recorded Date.now() as ranAt, so
310+ // ranAt says when the run happened, not where the commit sits in history.
311+ const tsBySha = (s: string) => Date.parse(sh("git", ["log", "-1", "--format=%aI", s]));
312+ bundle.runs.sort((a, b) => tsBySha(b.sha) - tsBySha(a.sha));
313+ bundle.runs = bundle.runs.slice(0, MAX_RUNS);
314+ writeFileSync(bundlePath, JSON.stringify(bundle, null, 2));
315+ console.log(`✓ added ${addedCount} historical run${addedCount === 1 ? "" : "s"} → bundle (${bundle.runs.length} runs total)`);
316+ }
129317 }
modified src/c14_github.ts +98 −0
@@ -222,6 +222,12 @@ export interface TestRecord {
222222 durationMs: number;
223223 }
224224
225+export interface PlaceholderTest {
226+ name: string;
227+ file: string;
228+ reason: string;
229+}
230+
225231 export interface TestRunRecord {
226232 sha: string;
227233 branch: string;
@@ -231,6 +237,9 @@ export interface TestRunRecord {
231237 failing: number;
232238 durationMs: number;
233239 tests: TestRecord[];
240+ // Optional for backwards-compat with bundles written before the
241+ // placeholder-detection sliver shipped. Treat missing as [].
242+ placeholderTests?: PlaceholderTest[];
234243 }
235244
236245 export interface TestBundle {
@@ -239,6 +248,95 @@ export interface TestBundle {
239248 runs: TestRunRecord[];
240249 }
241250
251+// ---------------------------------------------------------------------
252+// SAMA verify support: tree listing via API (two calls: repo lookup,
253+// then recursive tree) + raw-content reads for every cXX_*.ts file
254+// (raw.githubusercontent.com, no API rate limit). Used by /sama/verify
255+// to inspect any public repo without a token. Cached per (owner, name) for an hour.
256+// ---------------------------------------------------------------------
257+
258+export interface RepoTreeEntry {
259+ path: string;
260+ type: "blob" | "tree" | "commit";
261+ size?: number;
262+}
263+
264+interface RepoTree {
265+ defaultBranch: string;
266+ entries: RepoTreeEntry[];
267+ truncated: boolean;
268+}
269+
270+const TREE_TTL_MS = 60 * 60 * 1000;
271+const treeCache = new Map<string, { fetchedAt: number; tree: RepoTree }>();
272+
273+export const fetchRepoTree = async (
274+ repoOwner: string,
275+ repoName: string,
276+): Promise<RepoTree> => {
277+ const key = `${repoOwner}/${repoName}`;
278+ const cached = treeCache.get(key);
279+ if (cached && Date.now() - cached.fetchedAt < TREE_TTL_MS) return cached.tree;
280+
281+ const repoRes = await fetch(`https://api.github.com/repos/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}`, {
282+ headers: { Accept: "application/vnd.github+json", "User-Agent": "tdd.md" },
283+ });
284+ if (!repoRes.ok) {
285+ if (cached) return cached.tree;
286+ throw new Error(`GitHub repo lookup failed for ${repoOwner}/${repoName}: HTTP ${repoRes.status}`);
287+ }
288+ const repoMeta = (await repoRes.json()) as { default_branch?: string };
289+ const defaultBranch = repoMeta.default_branch ?? "main";
290+
291+ const treeRes = await fetch(
292+ `https://api.github.com/repos/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}/git/trees/${encodeURIComponent(defaultBranch)}?recursive=1`,
293+ { headers: { Accept: "application/vnd.github+json", "User-Agent": "tdd.md" } },
294+ );
295+ if (!treeRes.ok) {
296+ if (cached) return cached.tree;
297+ throw new Error(`GitHub tree fetch failed for ${repoOwner}/${repoName}: HTTP ${treeRes.status}`);
298+ }
299+ const data = (await treeRes.json()) as {
300+ tree?: Array<{ path: string; type: string; size?: number }>;
301+ truncated?: boolean;
302+ };
303+ const entries = (data.tree ?? []).map((e) => ({
304+ path: e.path,
305+ type: e.type as RepoTreeEntry["type"],
306+ size: e.size,
307+ }));
308+ const tree: RepoTree = { defaultBranch, entries, truncated: data.truncated ?? false };
309+ treeCache.set(key, { fetchedAt: Date.now(), tree });
310+ return tree;
311+};
312+
313+// Raw content fetch via raw.githubusercontent.com — no API rate limit.
314+// Per-call timeout via AbortController so a slow upstream can't tie up
315+// the verifier indefinitely.
316+export const fetchRepoRawFile = async (
317+ repoOwner: string,
318+ repoName: string,
319+ ref: string,
320+ path: string,
321+ timeoutMs = 10_000,
322+): Promise<string | null> => {
323+ const url = `https://raw.githubusercontent.com/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}/${encodeURIComponent(ref)}/${path.split("/").map(encodeURIComponent).join("/")}`;
324+ const ctrl = new AbortController();
325+ const t = setTimeout(() => ctrl.abort(), timeoutMs);
326+ try {
327+ const res = await fetch(url, {
328+ headers: { "User-Agent": "tdd.md" },
329+ signal: ctrl.signal,
330+ });
331+ if (!res.ok) return null;
332+ return await res.text();
333+ } catch {
334+ return null;
335+ } finally {
336+ clearTimeout(t);
337+ }
338+};
339+
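The URL template in `fetchRepoRawFile` encodes the file path one segment at a time, so the `/` separators survive while spaces and other reserved characters are escaped. A minimal sketch of that step in isolation; `buildRawUrl` is a hypothetical helper name used for illustration, not something this commit exports:

```typescript
// Hypothetical helper: mirrors the URL template used in fetchRepoRawFile.
const buildRawUrl = (owner: string, repo: string, ref: string, path: string): string => {
  // Encode each path segment on its own so "/" stays a separator.
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  return `https://raw.githubusercontent.com/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}/${encodeURIComponent(ref)}/${encodedPath}`;
};

// A path with a space in the file name.
const rawUrl = buildRawUrl("owner", "name", "main", "src/my file.ts");
console.log(rawUrl);
```

Note that the ref is encoded as a single component, so a branch name containing `/` would be emitted with `%2F`. Whether the raw host accepts that form is not something this diff confirms.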
242340 export const loadTestBundle = async (
243341 repoOwner: string,
244342 repoName: string,
modified src/c21_app.ts +163 −1
@@ -6,6 +6,7 @@ import {
66 renderPage,
77 renderNotFound,
88 htmlResponse,
9+ escape,
910 } from "./c51_render_layout.ts";
1011 import {
1112 projectsLandingMd,
@@ -23,7 +24,12 @@ import {
2324 adminApiHeaders,
2425 proxyToForgejo,
2526 } from "./c14_forgejo.ts";
26-import { fetchProjectConfig } from "./c14_github.ts";
27+import {
28+ fetchProjectConfig,
29+ fetchRepoTree,
30+ fetchRepoRawFile,
31+} from "./c14_github.ts";
32+import { verifySama, type SamaReport } from "./c32_sama_verify.ts";
2733 import { listGames, loadGame } from "./c31_games.ts";
2834 import { ALL_POSTS } from "./c31_blog.ts";
2935 import { ALL_GUIDES } from "./c31_guides.ts";
@@ -557,6 +563,7 @@ ${rows}
557563 snapshots: data.snapshots,
558564 stability: data.stability,
559565 unavailableNote,
566+ placeholderTests: data.placeholderTests,
560567 }),
561568 ogPath: "https://tdd.md/reports/live/tests",
562569 });
@@ -666,6 +673,148 @@ ${rows}
666673 return htmlResponse(html);
667674 },
668675
676+ "/sama/verify": async (req) => {
677+ const url = new URL(req.url);
678+ const repoArg = (url.searchParams.get("repo") ?? "").trim();
679+ const formMd = `# SAMA verify
680+
681+> Paste a public GitHub repo. tdd.md will run the four [SAMA disciplines](/sama) against the default branch and return a report: *Sorted* (lower never imports higher), *Architecture* (known layer prefixes), *Modeled* (sibling tests, types in c31_*), *Atomic* (~700-line split + placeholder-test detection). No clone, no token; on a cache miss, just two GitHub API calls (repo metadata, then the tree listing) plus raw-content reads. Cached for an hour per repo.
682+
683+<form method="get" action="/sama/verify" class="sama-verify-form">
684+ <label>
685+ public GitHub repo:
686+ <input type="text" name="repo" placeholder="owner/name" required pattern="[^/\\s]+/[^/\\s]+" />
687+ </label>
688+ <button type="submit">verify</button>
689+</form>
690+
691+Try it on this site: [\`syntaxai/tdd.md\`](/sama/verify?repo=syntaxai/tdd.md) · or any public repo of your own.
692+
693+Limits: anonymous GitHub API quota is 60 requests/hour per IP. Each uncached verify uses two API calls (repo metadata, then the tree listing); the rest of the work goes through raw.githubusercontent.com, which does not count against that quota. If the verifier returns "rate limit", come back later or use a token-authenticated proxy.
694+
695+[← /sama](/sama)
696+`;
697+
698+ if (!repoArg) {
699+ const html = await renderPage({
700+ title: "SAMA verify — tdd.md",
701+ description: "Paste a public GitHub repo, get the four SAMA disciplines verified mechanically: sorted (lower never imports higher), architecture (known layer prefixes), modeled (sibling tests), atomic (700-line + placeholder-test detection).",
702+ bodyMarkdown: formMd,
703+ ogPath: "https://tdd.md/sama/verify",
704+ active: "sama",
705+ });
706+ return htmlResponse(html);
707+ }
708+
709+ const m = /^([^\/\s]+)\/([^\/\s]+)$/.exec(repoArg);
710+ if (!m) {
711+ const html = await renderPage({
712+ title: "SAMA verify · bad input — tdd.md",
713+ description: "SAMA verify expects an owner/name repo identifier.",
714+ bodyMarkdown: `# SAMA verify\n\n> Couldn't parse \`${escape(repoArg)}\`. Use the form: \`owner/name\`.\n\n[← back](/sama/verify)\n`,
715+ ogPath: "https://tdd.md/sama/verify",
716+ active: "sama",
717+ noindex: true,
718+ });
719+ return htmlResponse(html, 400);
720+ }
721+
722+ const [, owner, name] = m;
723+ let report: SamaReport;
724+ try {
725+ // Dogfood short-circuit: tdd.md is a private repo, so the GitHub
726+ // API can't see it. When asked to verify ourselves, read the
727+ // source from the bundled `./src/` directory inside the container.
728+ // Same checks, same shape, same code path downstream.
729+ const isSelf = owner === LIVE_REPO_OWNER && name === LIVE_REPO_NAME;
730+ if (isSelf) {
731+ const { readdirSync, readFileSync } = await import("node:fs");
732+ const srcDir = "./src";
733+ const tsFiles = readdirSync(srcDir, { withFileTypes: true })
734+ .filter((e) => e.isFile() && e.name.endsWith(".ts"))
735+ .map((e) => e.name)
736+ .sort();
737+ const contents = new Map<string, string>();
738+ for (const f of tsFiles) {
739+ if (/^c\d{2}_/.test(f)) {
740+ contents.set(f, readFileSync(`${srcDir}/${f}`, "utf8"));
741+ }
742+ }
743+ report = verifySama({
744+ repoOwner: owner!,
745+ repoName: name!,
746+ defaultBranch: "main",
747+ srcPaths: tsFiles,
748+ contents,
749+ });
750+ } else {
751+ const tree = await fetchRepoTree(owner!, name!);
752+ const srcEntries = tree.entries
753+ .filter((e) => e.type === "blob" && e.path.startsWith("src/") && e.path.endsWith(".ts"))
754+ .slice(0, 200);
755+ const srcPaths = srcEntries.map((e) => e.path.slice("src/".length));
756+ const samaPaths = srcPaths.filter((p) => /^c\d{2}_/.test(p));
757+ const contents = new Map<string, string>();
758+ const fetches = await Promise.all(
759+ samaPaths.map(async (p) => [p, await fetchRepoRawFile(owner!, name!, tree.defaultBranch, `src/${p}`)] as const),
760+ );
761+ for (const [p, c] of fetches) {
762+ if (c !== null) contents.set(p, c);
763+ }
764+ report = verifySama({
765+ repoOwner: owner!,
766+ repoName: name!,
767+ defaultBranch: tree.defaultBranch,
768+ srcPaths,
769+ contents,
770+ });
771+ }
772+ } catch (e) {
773+ const msg = e instanceof Error ? e.message : String(e);
774+ const html = await renderPage({
775+ title: `SAMA verify · ${owner}/${name} · error — tdd.md`,
776+ description: `SAMA verify could not inspect ${owner}/${name}.`,
777+ bodyMarkdown: `# SAMA verify · \`${escape(owner!)}/${escape(name!)}\`\n\n> Couldn't fetch the repo: ${escape(msg)}\n\nMost common causes: the repo is private, the name is wrong, or you've hit GitHub's anonymous rate limit (60/hour). [← try another repo](/sama/verify)\n`,
778+ ogPath: `https://tdd.md/sama/verify?repo=${owner}/${name}`,
779+ active: "sama",
780+ noindex: true,
781+ });
782+ return htmlResponse(html, 502);
783+ }
784+
785+ const summary = report.overallPassed
786+ ? `> ✓ All four checks passed for [\`${report.repoSlug}\`](https://github.com/${report.repoSlug}) on \`${report.defaultBranch}\` (${report.samaFiles} SAMA files / ${report.testFiles} tests / ${report.totalSrcFiles} total in src/).`
787+ : `> ⚠ ${report.checks.filter((c) => !c.passed).length} of 4 checks failed for [\`${report.repoSlug}\`](https://github.com/${report.repoSlug}) on \`${report.defaultBranch}\`.`;
788+ const checkBlocks = report.checks
789+ .map((c) => {
790+ const status = c.passed ? "✓ pass" : `✗ ${c.violations.length} violation${c.violations.length === 1 ? "" : "s"}`;
791+ const violationsBlock = c.violations.length === 0
792+ ? ""
793+ : `\n\n${c.violations.slice(0, 20).map((v) => `- \`${escape(v.file)}\` — ${escape(v.detail)}`).join("\n")}${c.violations.length > 20 ? `\n- _...and ${c.violations.length - 20} more_` : ""}`;
794+ const noteBlock = c.note ? `\n\n_${escape(c.note)}_` : "";
795+ return `### ${c.letter} — ${c.property} · ${status}\n\nExamined ${c.examined} file${c.examined === 1 ? "" : "s"}.${violationsBlock}${noteBlock}`;
796+ })
797+ .join("\n\n");
798+ const reportMd = `# SAMA verify · \`${report.repoSlug}\`
799+
800+${summary}
801+
802+${checkBlocks}
803+
804+---
805+
806+[← verify another repo](/sama/verify) · [the four SAMA disciplines →](/sama) · [SAMA skill for your agent →](/sama/skill)
807+`;
808+ const html = await renderPage({
809+ title: `SAMA verify · ${report.repoSlug} — tdd.md`,
810+ description: `SAMA verification for ${report.repoSlug}: ${report.overallPassed ? "all four checks passed" : `${report.checks.filter((c) => !c.passed).length}/4 checks failed`}.`,
811+ bodyMarkdown: reportMd,
812+ ogPath: `https://tdd.md/sama/verify?repo=${report.repoSlug}`,
813+ active: "sama",
814+ });
815+ return htmlResponse(html);
816+ },
817+
669818 "/sama": async () => {
670819 const rows = ALL_SAMA
671820 .map((d) => `| **[${d.letter} — ${d.title}](/sama/${d.slug})** | ${d.rule} |`)
@@ -703,6 +852,19 @@ curl -fsSL https://tdd.md/skills/sama.md -o ~/.claude/skills/sama.md
703852
704853 The skill is the same content as the four pages here, written in obra/superpowers SKILL.md format with frontmatter, an iron-rule statement, and a verification checklist your agent can run before merging. **[Read it formatted →](/sama/skill)** · **[Raw markdown →](/skills/sama.md)**
705854
855+## verify any public repo
856+
857+Want to know whether a repo follows SAMA without reading its source? Paste the \`owner/name\` and tdd.md will run all four checks against the default branch — *Sorted* (the import-direction grep), *Architecture* (known layer prefixes), *Modeled* (sibling tests), *Atomic* (700-line + placeholder-test detection). Pass/fail per discipline, with violation lists. **[verify a repo →](/sama/verify)** · or try it on this site: [\`syntaxai/tdd.md\`](/sama/verify?repo=syntaxai/tdd.md).
858+
859+## the case behind it
860+
861+Two long-form pieces that argue *why* SAMA is shaped this way:
862+
863+- [**The Claude Code harness postmortem read through TDD + SAMA**](/blog/claude-code-harness-postmortem) — ThePaSch's r/ClaudeAI audit (40+ hidden reminders, 5 gag-order sites, 158 prompt versions in 11 days) read against the iron law and the verification grep. *The harness is loud; the diff doesn't have to be.*
864+- [**Three patterns ten threads converge on**](/blog/agentic-coding-corpus-three-patterns) — a six-month corpus of r/ClaudeAI, r/ClaudeCode, r/AgentsOfAI failure-mode threads. Per-pattern mitigation tables map each thread to the SAMA / iron-law rule that catches or prevents it.
865+
866+If you're reading these for the first time, the order to take them is harness postmortem → corpus → back here.
867+
706868 ## why these four together
707869
708870 Each property fixes a different failure mode:
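Back in the `/sama/verify` handler above, the `repo` query parameter is accepted only when it is exactly `owner/name`: one slash, no whitespace on either side of it. A small sketch of that validation pulled out into a function; `parseRepoSlug` is a hypothetical name, and the regex is the one the handler uses inline:

```typescript
// Hypothetical helper: same pattern as the route handler's inline regex.
const parseRepoSlug = (input: string): { owner: string; name: string } | null => {
  const m = /^([^\/\s]+)\/([^\/\s]+)$/.exec(input.trim());
  if (!m) return null;
  const [, owner, name] = m;
  // Guard for strict index checks; both groups are set whenever the regex matches.
  if (!owner || !name) return null;
  return { owner, name };
};

const okSlug = parseRepoSlug("  syntaxai/tdd.md ");
const tooDeep = parseRepoSlug("a/b/c");
const noSlash = parseRepoSlug("just-a-name");
console.log(okSlug, tooDeep, noSlash);
```

The pattern rejects full URLs and nested paths, which is why the handler can answer those with a 400 before any network call is made.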
modified src/c31_blog.ts +6 −0
@@ -12,6 +12,12 @@ export interface BlogEntry {
1212 }
1313
1414 export const ALL_POSTS: BlogEntry[] = [
15+ {
16+ slug: "from-rules-to-checks",
17+ title: "From rules to checks: shipping what the corpus post promised",
18+ description: "The corpus post named three checks the discipline should run. This post is the receipt. Three slivers shipped: placeholder-test detection (live on /reports/live/tests), historical-commit testing via git worktree (opt-in via SAMA_HISTORY_DEPTH), and /sama/verify - a four-discipline report runnable against any public repo. The rules are now URLs you can hit.",
19+ date: "2026-05-09",
20+ },
1521 {
1622 slug: "agentic-coding-corpus-three-patterns",
1723 title: "Three patterns ten threads converge on",
modified src/c32_real_tests.ts +5 −3
@@ -5,7 +5,7 @@
55 // Pure given the bundle + commits in (no I/O of its own beyond delegating
66 // to c14_github's bundle loader and commits fetcher).
77
8-import { fetchRepoCommits, loadTestBundle } from "./c14_github.ts";
8+import { fetchRepoCommits, loadTestBundle, type PlaceholderTest } from "./c14_github.ts";
99 import type {
1010 AgentReport,
1111 TestFailure,
@@ -31,6 +31,7 @@ export interface LiveTestData {
3131 runsCount: number;
3232 ranAt: number | null;
3333 headSha: string | null;
34+ placeholderTests: PlaceholderTest[];
3435 }
3536
3637 export const buildLiveTestData = async (
@@ -39,12 +40,12 @@ export const buildLiveTestData = async (
3940 ): Promise<LiveTestData> => {
4041 const bundle = await loadTestBundle(repoOwner, repoName);
4142 if (!bundle || bundle.runs.length === 0) {
42- return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null };
43+ return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null, placeholderTests: [] };
4344 }
4445 const repoSlug = `${repoOwner}/${repoName}`;
4546 const latest = bundle.runs[0];
4647 if (!latest) {
47- return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null };
48+ return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null, placeholderTests: [] };
4849 }
4950
5051 // For "since" we want the oldest run that has this test as failing.
@@ -136,5 +137,6 @@ export const buildLiveTestData = async (
136137 runsCount: bundle.runs.length,
137138 ranAt: latest.ranAt,
138139 headSha: latest.sha,
140+ placeholderTests: latest.placeholderTests ?? [],
139141 };
140142 };
added src/c32_sama_verify.ts +272 −0
@@ -0,0 +1,272 @@
1+// c32 — logic: pure SAMA verification given a repo's file tree and the
2+// contents of every cXX_*.ts file. Drives /sama/verify.
3+//
4+// Verifier is intentionally strict: a check passes iff there is zero
5+// evidence of violation. The four properties (S/A/M/A) each become one
6+// callable, and the top-level `verifySama(...)` runs them all and
7+// returns a SamaReport.
8+
9+export interface SamaViolation {
10+ file: string;
11+ detail: string;
12+}
13+
14+export interface SamaCheckResult {
15+ letter: "S" | "A" | "M" | "A";
16+ property: "Sorted" | "Architecture" | "Modeled" | "Atomic";
17+ passed: boolean;
18+ examined: number;
19+ violations: SamaViolation[];
20+ note?: string;
21+}
22+
23+export interface SamaReport {
24+ repoSlug: string;
25+ defaultBranch: string;
26+ totalSrcFiles: number;
27+ samaFiles: number;
28+ testFiles: number;
29+ checks: SamaCheckResult[];
30+ overallPassed: boolean;
31+ generatedAt: number;
32+}
33+
34+export interface SamaVerifyInput {
35+ repoOwner: string;
36+ repoName: string;
37+ defaultBranch: string;
38+ // src-relative paths, e.g. "c21_app.ts", "c31_blog.ts", "c32_session.test.ts"
39+ srcPaths: string[];
40+ // file path -> content. Contents only required for cXX_*.ts files
41+ // and *.test.ts files.
42+ contents: Map<string, string>;
43+}
44+
45+const SAMA_PREFIX = /^c(\d{2})_/;
46+
47+const isSamaFile = (p: string): boolean => SAMA_PREFIX.test(p) && p.endsWith(".ts");
48+const isTestFile = (p: string): boolean => p.endsWith(".test.ts");
49+
50+const layerOf = (filename: string): number | null => {
51+ const m = SAMA_PREFIX.exec(filename);
52+ if (!m) return null;
53+ return parseInt(m[1] ?? "0", 10);
54+};
55+
56+// Pull import targets out of a TypeScript source. Recognizes both
57+// static `import ... from "./x.ts"` and dynamic `import("./x.ts")`.
58+// We only care about relative imports (the cross-layer ones).
59+const collectRelativeImports = (source: string): string[] => {
60+ const out: string[] = [];
61+ const staticRe = /\bfrom\s+["']\s*(\.\/[^"']+)["']/g;
62+ const dynRe = /\bimport\s*\(\s*["']\s*(\.\/[^"']+)["']/g;
63+ let m: RegExpExecArray | null;
64+ while ((m = staticRe.exec(source)) !== null) if (m[1]) out.push(m[1]);
65+ while ((m = dynRe.exec(source)) !== null) if (m[1]) out.push(m[1]);
66+ return out;
67+};
68+
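To see what the two regexes in `collectRelativeImports` actually pick up, here is a self-contained sketch that copies them and runs them over a made-up source string. The sample input is invented for illustration; only the patterns come from the file above.

```typescript
// Copy of the two patterns from collectRelativeImports, for illustration.
const collectRelativeImportsSketch = (source: string): string[] => {
  const out: string[] = [];
  const staticRe = /\bfrom\s+["']\s*(\.\/[^"']+)["']/g;
  const dynRe = /\bimport\s*\(\s*["']\s*(\.\/[^"']+)["']/g;
  let m: RegExpExecArray | null;
  while ((m = staticRe.exec(source)) !== null) if (m[1]) out.push(m[1]);
  while ((m = dynRe.exec(source)) !== null) if (m[1]) out.push(m[1]);
  return out;
};

// Invented sample: two same-directory imports, one package import,
// one node: built-in, and one parent-directory import.
const sampleSource = [
  'import { a } from "./c14_github.ts";',
  'import { z } from "zod";',
  'const fs = await import("node:fs");',
  'const lazy = await import("./c31_blog.ts");',
  'import up from "../shared.ts";',
].join("\n");

const foundImports = collectRelativeImportsSketch(sampleSource);
console.log(foundImports);
```

Only specifiers that start with `./` are captured. Package imports, `node:` built-ins and `../` paths are skipped, so the Sorted check as written only sees imports between files in the same directory.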
69+const importTargetFilename = (importPath: string): string => {
70+ // "./c14_github.ts" -> "c14_github.ts"
71+ return importPath.replace(/^\.\//, "");
72+};
73+
74+// S — Sorted. The rule, as practiced: foundation, data and logic layers
75+// (c1*, c3*) don't import UI (c5*+). c21 (handlers/composers) is the
76+// orchestration layer and is allowed to import anything; c51 (UI) is
77+// allowed to import models (c3*) for the data it renders. A strict
78+// "lower never imports higher" reading would forbid c21 → c31, which
79+// is the natural pattern (handler composes model). The actual
80+// constraint is one-directional: UI sits at the edge, never below.
81+const checkSorted = (input: SamaVerifyInput): SamaCheckResult => {
82+ const violations: SamaViolation[] = [];
83+ let examined = 0;
84+ for (const path of input.srcPaths) {
85+ if (!isSamaFile(path)) continue;
86+ examined++;
87+ const m = SAMA_PREFIX.exec(path);
88+ const prefix = m?.[1] ?? "";
89+ // Skip c2* (handlers, allowed to depend on anything) and c5*+ (UI,
90+ // its outbound deps are governed by other rules, not this one).
91+ if (!/^[13]/.test(prefix)) continue;
92+ const content = input.contents.get(path);
93+ if (!content) continue;
94+ for (const rawImport of collectRelativeImports(content)) {
95+ const target = importTargetFilename(rawImport);
96+ const targetMatch = SAMA_PREFIX.exec(target);
97+ const targetPrefix = targetMatch?.[1] ?? "";
98+ if (!targetPrefix) continue;
99+ if (/^[5-9]/.test(targetPrefix)) {
100+ violations.push({
101+ file: path,
102+ detail: `imports \`${target}\` (UI layer c${targetPrefix}_) from a non-UI/non-handler file (c${prefix}_) — UI sits at the edge, foundation/data/logic must not depend on it`,
103+ });
104+ }
105+ }
106+ }
107+ return {
108+ letter: "S",
109+ property: "Sorted",
110+ passed: violations.length === 0,
111+ examined,
112+ violations,
113+ note: examined === 0
114+ ? "no cXX_*.ts files found in the project — the convention isn't applied here"
115+ : undefined,
116+ };
117+};
118+
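The Sorted rule described above boils down to a two-part predicate: the importing file must be in a constrained layer (c1* or c3*), and the target must be in the UI layer. A minimal sketch under one stated assumption: it treats only prefixes starting with `5` as UI, since c51 is the only UI layer listed in `KNOWN_LAYERS`. The names `prefixOf` and `breaksSorted` are hypothetical.

```typescript
const SAMA_PREFIX_SKETCH = /^c(\d{2})_/;

// Two-digit layer prefix of a file name, or "" when the file has none.
const prefixOf = (file: string): string => SAMA_PREFIX_SKETCH.exec(file)?.[1] ?? "";

// Assumption for this sketch: a UI layer is any prefix starting with "5".
const breaksSorted = (importer: string, target: string): boolean => {
  const from = prefixOf(importer);
  const to = prefixOf(target);
  // Only foundation/data/logic layers (c1*, c3*) are constrained.
  if (!/^[13]/.test(from)) return false;
  return /^5/.test(to);
};

const modelImportsUi = breaksSorted("c31_blog.ts", "c51_render_layout.ts");
const handlerImportsUi = breaksSorted("c21_app.ts", "c51_render_layout.ts");
const uiImportsModel = breaksSorted("c51_render_reports.ts", "c31_blog.ts");
console.log(modelImportsUi, handlerImportsUi, uiImportsModel);
```

Only the first case is a violation: a handler importing UI and UI importing a model are both allowed under the one-directional reading in the comment.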
119+// A — Architecture. Each prefix is a known layer; flag unknown prefixes.
120+const KNOWN_LAYERS = new Set(["11", "13", "14", "21", "31", "32", "51"]);
121+const checkArchitecture = (input: SamaVerifyInput): SamaCheckResult => {
122+ const violations: SamaViolation[] = [];
123+ let examined = 0;
124+ for (const path of input.srcPaths) {
125+ if (!isSamaFile(path)) continue;
126+ examined++;
127+ const m = SAMA_PREFIX.exec(path);
128+ const prefix = m?.[1] ?? "";
129+ if (!KNOWN_LAYERS.has(prefix)) {
130+ violations.push({
131+ file: path,
132+ detail: `unknown layer prefix \`c${prefix}_\` (known: c11, c13, c14, c21, c31, c32, c51)`,
133+ });
134+ }
135+ }
136+ return {
137+ letter: "A",
138+ property: "Architecture",
139+ passed: violations.length === 0,
140+ examined,
141+ violations,
142+ };
143+};
144+
145+// M — Modeled. Tests live next to source. Every cXX_<name>.ts (non-data)
146+// should have a sibling cXX_<name>.test.ts. Pure data files (registries
147+// like c31_blog.ts that are just an exported array) often legitimately
148+// have no behaviour to test, so we soften this check by requiring a
149+// sibling for c32_*.ts (logic) at minimum, and reporting a list of c31
150+// files without siblings as informational rather than hard violations.
151+const checkModeled = (input: SamaVerifyInput): SamaCheckResult => {
152+ const violations: SamaViolation[] = [];
153+ const informational: SamaViolation[] = [];
154+ let examined = 0;
155+ const present = new Set(input.srcPaths);
156+ for (const path of input.srcPaths) {
157+ if (!isSamaFile(path) || isTestFile(path)) continue;
158+ examined++;
159+ const sibling = path.replace(/\.ts$/, ".test.ts");
160+ if (present.has(sibling)) continue;
161+ const layer = layerOf(path);
162+ if (layer === 32) {
163+ violations.push({ file: path, detail: `no sibling test file at \`${sibling}\`` });
164+ } else if (layer === 31) {
165+ informational.push({ file: path, detail: `no sibling test (often fine for pure data registries; flag if logic accumulates)` });
166+ }
167+ }
168+ const passed = violations.length === 0;
169+ const note = informational.length > 0
170+ ? `${informational.length} c31_* file${informational.length === 1 ? "" : "s"} without a sibling test — usually fine for pure-data registries, flag if logic accumulates: ${informational.map((v) => v.file).join(", ")}`
171+ : undefined;
172+ return {
173+ letter: "M",
174+ property: "Modeled",
175+ passed,
176+ examined,
177+ violations,
178+ note,
179+ };
180+};
181+
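The sibling lookup in `checkModeled` is a plain filename rewrite followed by a set-membership test. The sketch below reproduces that logic on an invented file list and then splits the result the way the check does: c32 files become hard violations, c31 files are informational. The function names are hypothetical.

```typescript
// "c32_session.ts" -> "c32_session.test.ts"
const siblingTestOf = (file: string): string => file.replace(/\.ts$/, ".test.ts");

// Layered source files (cXX_*.ts, not tests) that have no sibling test.
const missingSiblings = (srcPaths: string[]): string[] => {
  const present = new Set(srcPaths);
  return srcPaths
    .filter((p) => /^c\d{2}_/.test(p) && p.endsWith(".ts") && !p.endsWith(".test.ts"))
    .filter((p) => !present.has(siblingTestOf(p)));
};

// Invented file list for illustration.
const withoutTests = missingSiblings([
  "c32_session.ts",
  "c32_session.test.ts",
  "c32_sama_verify.ts",
  "c31_blog.ts",
  "index.ts",
]);

// Logic files fail the check; data registries are only reported in the note.
const hardViolations = withoutTests.filter((p) => p.startsWith("c32_"));
const informationalOnly = withoutTests.filter((p) => p.startsWith("c31_"));
console.log(hardViolations, informationalOnly);
```

Files in other layers (c14, c21, c51 and so on) that lack a sibling test are counted in `examined` by the real check but land in neither list.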
182+// A — Atomic. ~700-line split rule. Flag any cXX_*.ts over 700 lines.
183+// Also flag placeholder tests (zero expect() calls in test body) as
184+// part of the same pass — they're a structural violation of the
185+// testing surface that Atomic owns.
186+const findPlaceholderTestsLite = (file: string, content: string): SamaViolation[] => {
187+ const out: SamaViolation[] = [];
188+ const re = /\b(test|it)\s*\(\s*(["'`])((?:\\.|(?!\2).)*)\2\s*,\s*(?:async\s+)?(?:\([^)]*\)|[^=()]*?)\s*=>\s*\{/g;
189+ let m: RegExpExecArray | null;
190+ while ((m = re.exec(content)) !== null) {
191+ const name = m[3] ?? "";
192+ const startBrace = re.lastIndex - 1;
193+ let depth = 1;
194+ let i = startBrace + 1;
195+ let inString: string | null = null;
196+ while (i < content.length && depth > 0) {
197+ const c = content[i];
198+ if (inString !== null) {
199+ if (c === "\\") { i += 2; continue; }
200+ if (c === inString) inString = null;
201+ } else {
202+ if (c === '"' || c === "'" || c === "`") inString = c;
203+ else if (c === "/" && content[i + 1] === "/") {
204+ while (i < content.length && content[i] !== "\n") i++;
205+ continue;
206+ } else if (c === "/" && content[i + 1] === "*") {
207+ i += 2;
208+ while (i < content.length - 1 && !(content[i] === "*" && content[i + 1] === "/")) i++;
209+ i += 2;
210+ continue;
211+ } else if (c === "{") depth++;
212+ else if (c === "}") depth--;
213+ }
214+ i++;
215+ }
216+ const body = content.slice(startBrace + 1, i - 1);
217+ const expectCount = (body.match(/\bexpect\s*\(/g) ?? []).length;
218+ if (expectCount === 0) {
219+ out.push({ file, detail: `placeholder test \`${name}\` — zero \`expect()\` calls` });
220+ }
221+ }
222+ return out;
223+};
224+
225+const checkAtomic = (input: SamaVerifyInput): SamaCheckResult => {
226+ const violations: SamaViolation[] = [];
227+ let examined = 0;
228+ for (const path of input.srcPaths) {
229+ if (!isSamaFile(path)) continue;
230+ examined++;
231+ const content = input.contents.get(path);
232+ if (!content) continue;
233+ const lineCount = content.replace(/\n$/, "").split("\n").length;
234+ if (lineCount > 700) {
235+ violations.push({
236+ file: path,
237+ detail: `${lineCount} lines (over the 700-line split threshold — split per UI/data domain)`,
238+ });
239+ }
240+ if (isTestFile(path)) {
241+ violations.push(...findPlaceholderTestsLite(path, content));
242+ }
243+ }
244+ return {
245+ letter: "A",
246+ property: "Atomic",
247+ passed: violations.length === 0,
248+ examined,
249+ violations,
250+ };
251+};
252+
253+export const verifySama = (input: SamaVerifyInput): SamaReport => {
254+ const samaPaths = input.srcPaths.filter(isSamaFile);
255+ const testPaths = samaPaths.filter(isTestFile);
256+ const checks = [
257+ checkSorted(input),
258+ checkArchitecture(input),
259+ checkModeled(input),
260+ checkAtomic(input),
261+ ];
262+ return {
263+ repoSlug: `${input.repoOwner}/${input.repoName}`,
264+ defaultBranch: input.defaultBranch,
265+ totalSrcFiles: input.srcPaths.length,
266+ samaFiles: samaPaths.length,
267+ testFiles: testPaths.length,
268+ checks,
269+ overallPassed: checks.every((c) => c.passed),
270+ generatedAt: Date.now(),
271+ };
272+};
modified src/c51_render_reports.ts +20 −1
@@ -38,6 +38,9 @@ export interface TestsOverviewContext {
3838 // When the runner sliver isn't wired (live mode, today), pass a
3939 // placeholder note instead of the snapshot+stability sections.
4040 unavailableNote?: string;
41+ // Placeholder-test detection: tests with zero `expect()` calls in
42+ // their body. Surfaces the failure mode from r/ClaudeCode 1qix264.
43+ placeholderTests?: { name: string; file: string; reason: string }[];
4144 }
4245
4346 const trendArrow = (delta: number): { glyph: string; cls: string } =>
@@ -275,6 +278,20 @@ ${ctx.bannerHtml}
275278 const failing = ctx.snapshots.reduce((s, r) => s + r.failing, 0);
276279 const snapshots = ctx.snapshots.map(snapshotBlock).join("\n");
277280 const stabRows = ctx.stability.map(stabilityRow).join("\n");
281+ const placeholders = ctx.placeholderTests ?? [];
282+ const placeholderBlock = placeholders.length === 0
283+ ? `## placeholder tests
284+
285+> No placeholder tests detected at this snapshot. A placeholder is a test whose body contains zero \`expect()\` calls — covered in [the corpus post](/blog/agentic-coding-corpus-three-patterns) as the failure mode from r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass rate"). Detection runs on every deploy.
286+`
287+ : `## placeholder tests · ⚠ ${placeholders.length} flagged
288+
289+> A placeholder test is one whose body contains zero \`expect()\` calls — empty body, comment-only stub, or string-literal body. Covered in [the corpus post](/blog/agentic-coding-corpus-three-patterns) as the failure mode from r/ClaudeCode 1qix264. The judge would refuse a merge that includes any of these.
290+
291+| test | file | reason |
292+|---|---|---|
293+${placeholders.map((p) => `| ${escape(p.name)} | \`${escape(p.file)}\` | ${escape(p.reason)} |`).join("\n")}
294+`;
278295 return `# tests overview
279296
280297 ${ctx.bannerHtml}
@@ -287,7 +304,9 @@ ${ctx.bannerHtml}
287304 ${snapshots}
288305 </div>
289306
290-**Total**: ${total.toLocaleString()} tests · <span class="green">${passing.toLocaleString()} passing</span> · <span class="${failing > 0 ? "red" : "muted"}">${failing.toLocaleString()} failing</span>.
307+**Total**: ${total.toLocaleString()} tests · <span class="green">${passing.toLocaleString()} passing</span> · <span class="${failing > 0 ? "red" : "muted"}">${failing.toLocaleString()} failing</span>${placeholders.length > 0 ? ` · <span class="red">${placeholders.length} placeholder ⚠</span>` : ""}.
308+
309+${placeholderBlock}
291310
292311 ## test stability · ${ctx.period}
293312