syntaxai/tdd.md · commit 2868f48

SAMA tooling: placeholder detection, sandbox runner, verify, blog post

Five slices the corpus post promised, shipped in one batch. Each one
turns a SAMA rule from text into a URL or a CI check.

1. Placeholder-test detection (caught today)
   scripts/p620/snapshot-tests.ts now walks every test() / it() body in
   the test files named in the JUnit report and flags bodies with zero
   expect() calls. The bundle gets a
   placeholderTests: { name, file, reason }[] field. The runtime
   /reports/live/tests page surfaces it as a flagged section. Catches
   r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass rate")
   directly.
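As a rough illustration of the zero-expect() idea (a simplified sketch, not the code in scripts/p620/snapshot-tests.ts): the hypothetical `flagPlaceholders` below assumes each test body has already been extracted as a string and only counts `expect(` occurrences. The `ExtractedTest` shape and the sample data are invented for the example.

```typescript
// Hypothetical input shape: the real script extracts bodies itself.
interface ExtractedTest { name: string; file: string; body: string }
interface PlaceholderTest { name: string; file: string; reason: string }

// Flag every test whose body contains no `expect(` call at all.
const flagPlaceholders = (tests: ExtractedTest[]): PlaceholderTest[] =>
  tests
    .filter((t) => (t.body.match(/\bexpect\s*\(/g) ?? []).length === 0)
    .map((t) => ({
      name: t.name,
      file: t.file,
      reason: t.body.trim() === "" ? "empty test body" : "no expect() calls in test body",
    }));

// Invented sample data for illustration only.
const sample: ExtractedTest[] = [
  { name: "adds", file: "math.test.ts", body: "expect(1 + 1).toBe(2);" },
  { name: "todo", file: "math.test.ts", body: "// TODO" },
  { name: "empty", file: "math.test.ts", body: "" },
];
const flagged = flagPlaceholders(sample);
console.log(flagged.map((p) => `${p.name}: ${p.reason}`));
```

Like the shipped check, this counts text rather than parsing an AST, so a test that asserts through a custom helper would still be flagged.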

2. Historical-commit testing (sandbox runner sliver)
   Same snapshot script accepts SAMA_HISTORY_DEPTH=N. For each of the
   last N commits not yet in the bundle, it spins up a git worktree,
   symlinks node_modules, runs bun test --reporter=junit, and appends
   the result keyed by SHA. Default 0 keeps deploy time bounded;
   opt-in for backfill or extended stability data. Groundwork for
   detecting r/ClaudeCode 1rug14a (runtime test-patching): the data
   to compare a clean-checkout run against an in-session run is now
   in the bundle.
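The commit selection described above can be sketched as a pure function. This is an illustration under assumptions: `shasToBackfill` is a hypothetical name, and the input is assumed to be a newest-first list of SHAs with HEAD at index 0, as `git log --pretty=format:%H` would print it.

```typescript
// Hypothetical helper: pick which historical SHAs still need a test run.
// `log` is newest-first and includes HEAD at index 0.
const shasToBackfill = (log: string[], inBundle: Set<string>, depth: number): string[] => {
  if (!Number.isFinite(depth) || depth <= 0) return []; // default 0 means HEAD-only
  return log
    .slice(1, depth + 1) // skip HEAD, keep the next `depth` commits
    .filter((sha) => sha !== "" && !inBundle.has(sha));
};

// Invented SHAs for illustration only.
const recentLog = ["head", "a1", "b2", "c3", "d4"];
const pending = shasToBackfill(recentLog, new Set(["b2"]), 3);
console.log(pending);
```

With a depth of 3 and `b2` already in the bundle, only `a1` and `c3` are left to run; `d4` is outside the window.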

3. /sama/verify?repo=owner/name (mechanical check, any public repo)
   New routes:
     GET /sama/verify           form
     GET /sama/verify?repo=...  four-discipline report
   New files:
     src/c14_github.ts          fetchRepoTree + fetchRepoRawFile
                                (tree via API, contents via raw -
                                no token required)
     src/c32_sama_verify.ts     pure verifier - S/A/M/A checks given
                                file list + contents
   Special case: tdd.md is a private repo; when asked to verify
   ourselves, the route reads ./src/ from the container instead of
   GitHub. The dogfood result is honest: S+A pass, M flags 5 c32_*
   files without sibling tests, A flags c21_app.ts at 1066 lines
   (over the 700-line atomic threshold). The verifier catches its
   own makers - which is the proof the verifier works.
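A minimal sketch of the two failing checks, assuming the verifier receives a list of paths with line counts. `checkModeledAndAtomic`, `SourceFile`, and the `c32_example` / `c32_other` file names are hypothetical; the 700-line threshold and the c21_app.ts line count come from this commit message.

```typescript
// Hypothetical input and output shapes for illustration.
interface SourceFile { path: string; lines: number }
interface Violation { check: "Modeled" | "Atomic"; file: string; detail: string }

const ATOMIC_MAX_LINES = 700; // threshold named in the commit message

const checkModeledAndAtomic = (files: SourceFile[]): Violation[] => {
  const paths = new Set(files.map((f) => f.path));
  const out: Violation[] = [];
  for (const f of files) {
    if (f.path.endsWith(".test.ts")) continue; // test files are not checked here
    if (f.lines > ATOMIC_MAX_LINES) {
      out.push({ check: "Atomic", file: f.path, detail: `${f.lines} lines (over ${ATOMIC_MAX_LINES})` });
    }
    // Modeled: a c32_* logic file needs a sibling c32_*.test.ts.
    const isLogic = /(^|\/)c32_[^/]+\.ts$/.test(f.path);
    const sibling = f.path.replace(/\.ts$/, ".test.ts");
    if (isLogic && !paths.has(sibling)) {
      out.push({ check: "Modeled", file: f.path, detail: "no sibling test file" });
    }
  }
  return out;
};

const report = checkModeledAndAtomic([
  { path: "src/c21_app.ts", lines: 1066 },
  { path: "src/c32_example.ts", lines: 120 },
  { path: "src/c32_other.ts", lines: 80 },
  { path: "src/c32_other.test.ts", lines: 40 },
]);
console.log(report);
```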

   Sorted-rule correction: the strict "lower never imports higher"
   reading was self-contradictory (it forbade c21 -> c31, the
   natural pattern). The actual rule, now consistent across
   /sama/sorted, /sama/skill, and the verifier: foundation/data/
   logic (c1*, c3*) don't depend on UI (c5*+); handlers (c21) are
   the orchestration layer and may import anything. The grep is
   src/c1*.ts src/c3*.ts only.
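The corrected rule can be sketched in code as well as in grep. This is an assumption-laden illustration, not the contents of c32_sama_verify.ts: `checkSorted` and `layerOf` are hypothetical names, file names are given without the `src/` prefix, and only double-quoted relative imports are matched, mirroring the grep pattern.

```typescript
interface SortedViolation { file: string; imports: string }

// First digit of the cXX_ prefix, or null for files outside the scheme.
const layerOf = (fileName: string): number | null => {
  const m = /^c(\d)\d_/.exec(fileName);
  return m ? Number(m[1]) : null;
};

// sources: file name -> file contents (hypothetical input shape).
const checkSorted = (sources: Record<string, string>): SortedViolation[] => {
  const out: SortedViolation[] = [];
  for (const [file, text] of Object.entries(sources)) {
    const layer = layerOf(file);
    if (layer !== 1 && layer !== 3) continue; // handlers (c2*) and UI are not checked
    const importRe = /from\s+"\.\/(c\d\d_[^"]+)"/g;
    let m: RegExpExecArray | null;
    while ((m = importRe.exec(text)) !== null) {
      const target = m[1] ?? "";
      const targetLayer = layerOf(target);
      if (targetLayer !== null && targetLayer >= 5) out.push({ file, imports: target });
    }
  }
  return out;
};

// Invented sources for illustration only.
const leaks = checkSorted({
  "c14_github.ts": 'import { x } from "./c11_util.ts";',
  "c21_app.ts": 'import { renderPage } from "./c51_render_layout.ts";',
  "c32_logic.ts": 'import { renderPage } from "./c51_render_layout.ts";',
});
console.log(leaks);
```

Here c21 importing c51 passes, while the same import from a c32 file is reported as a leak.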

4. Cross-link sweep
   Both blog posts (harness postmortem + corpus) gained /sama/skill
   and /sama/verify in their footers. /sama/skill's Reference
   section gained both blog posts and /sama/verify. /sama landing
   gained "verify any public repo" and "the case behind it"
   sections.

5. Wrap-up blog post: from-rules-to-checks
   Documents what shipped, with the dogfood result printed verbatim
   so the case is checkable rather than asserted.

The Atomic violation on c21_app.ts (1066 lines) is real and signals
that the dispatcher needs a per-domain split per the SAMA rule. That
split is the next sliver, not this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

author: syntaxai <[email protected]>
date: 2026-05-09 19:46:43 +01:00
parent: a92d5a5
commit: 2868f48b6d44ae47282d3d2dfb0f4d7e161b3ecd

12 files changed · +866 −22

modified content/blog/agentic-coding-corpus-three-patterns.md +1 −1

@@ -184,4 +184,4 @@ One thread is an audit. Ten threads are a pattern. The corpus shows three things
184	184
185	185	Plus the original one-thread audit: [ThePaSch — Claude Code has big problems and the post-mortem is not enough](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/) (325↑, 200+ comments). Treat this row as the eleventh entry.
186	186
187		-[Read the previous post →](/blog/claude-code-harness-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [back to the blog](/blog)
	187	+[Read the previous post →](/blog/claude-code-harness-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify your repo →](/sama/verify) · [back to the blog](/blog)

modified content/blog/claude-code-harness-postmortem.md +1 −1

@@ -123,4 +123,4 @@ None of this is in the user's hands. But the takeaway underneath is: while Anthr
123	123
124	124	The harness is loud. The diff doesn't have to be. TDD's iron law and SAMA's three falsifiable rules survive every reminder-bombardment, gag-order, importance-inflation, and contradictory-instruction the OP catalogues, because they are enforced outside the agent's context window — in the commit log and the file tree. They do not fix Anthropic's prompt sprawl, and they do not refund the tokens. What they do is make the work the agent ships externally verifiable, regardless of what the agent was told on the way there.
125	125
126		-[Read the original Reddit thread →](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/claude_code_has_big_problems_and_the_postmortem/) · [Anthropic's April-23 postmortem →](https://www.anthropic.com/engineering/april-23-postmortem) · [the four SAMA disciplines →](/sama) · [back to the blog](/blog)
	126	+[Read the original Reddit thread →](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/claude_code_has_big_problems_and_the_postmortem/) · [Anthropic's April-23 postmortem →](https://www.anthropic.com/engineering/april-23-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify your repo →](/sama/verify) · [back to the blog](/blog)

added content/blog/from-rules-to-checks.md +92 −0

@@ -0,0 +1,92 @@
	1	+# From rules to checks: shipping what the corpus post promised
	2	+
	3	+> The [corpus post](/blog/agentic-coding-corpus-three-patterns) closed with a promise: "three of the ten threads describe failures only an actual test run can catch" — and named which checks would have caught which failure modes (some today, some with a small extension, some only with a sandbox runner). This post is the receipt. Three of those checks now ship: placeholder-test detection (the "one-evening sliver"), historical-commit testing via git worktree (the "next slice on the roadmap"), and `/sama/verify` (mechanical layer grep + sibling-test + line-count + placeholder check, runnable against any public repo).
	4	+
	5	+## why this post is short
	6	+
	7	+The two previous posts made an argument. This one documents an outcome. If the argument was right, the outcome should be small and obvious: the rules become checks, the checks become routes, the routes catch the failure modes the corpus catalogued. Here's what that looks like in practice.
	8	+
	9	+## 1 · placeholder-test detection (caught today)
	10	+
	11	+Failure mode: r/ClaudeCode `1qix264`, "Claude wrote 90 placeholder tests and reported 100% pass rate". The corpus post said:
	12	+
	13	+> "empty assertion bodies — zero `expect()` calls, string-literal bodies, single-line `// TODO` stubs — are AST-checkable. The test bundle already lives in `content/git-history/syntaxai__tdd.md__tests.json`; an empty-body check is a one-evening sliver."
	14	+
	15	+The check is a regex-based brace walker that extracts every `test(...)` and `it(...)` body, counts `expect(` occurrences, and flags zero-count bodies. It runs at deploy time as part of the existing `snapshot-tests.ts` script and writes its findings to the bundle as `placeholderTests: { name, file, reason }[]`. The runtime renderer surfaces them on [`/reports/live/tests`](/reports/live/tests):
	16	+
	17	+- Zero placeholders → a small "no placeholder tests detected at this snapshot" note explaining what the check looks for.
	18	+- One or more → a flagged section with a per-test table: name, file, reason ("no expect() calls", "empty test body", "comment-only stub").
	19	+
	20	+It catches the most common shape of `1qix264` directly (`expect()` count is zero). It misses theoretical ones (custom assertion helpers that don't go through `expect`); the regex's blast radius is the real failures, not every imaginable one.
	21	+
	22	+## 2 · historical-commit testing (the sandbox runner sliver)
	23	+
	24	+Failure mode: r/ClaudeCode `1rug14a`, "Claude wrote Playwright tests that secretly patched the app at runtime". This is the failure that the previous reporting layer couldn't catch — the diff looks fine, the test passes in the agent's terminal, the test passes in the deploy-time bundle too if the bundle only ever ran HEAD. Catching this needs the same test to run somewhere it's never run before, against the actual code at that SHA.
	25	+
	26	+The new mode: `SAMA_HISTORY_DEPTH=N` in the deploy environment makes the snapshot script also test the last N commits that aren't already in the bundle. Mechanically:
	27	+
	28	+```bash
	29	+# scripts/p620/snapshot-tests.ts (excerpt, shown as the shell commands it spawns)
	30	+git worktree add --detach /tmp/tdd-md-wt-<sha> <sha>
	31	+ln -s "$REPO_ROOT/node_modules" "$WORKTREE/node_modules"
	32	+bun test --reporter=junit --reporter-outfile=/tmp/junit-<sha>.xml
	33	+git worktree remove --force /tmp/tdd-md-wt-<sha>
	34	+```
	35	+
	36	+Each historical run produces the same `TestRunRecord` shape as a HEAD run, gets appended to the bundle keyed by SHA, and feeds the existing stability table. Two consequences:
	37	+
	38	+- Stability data builds 10× faster. A first `SAMA_HISTORY_DEPTH=10` deploy backfills ten runs in one go instead of waiting ten deploys.
	39	+- Runtime-patching becomes detectable in principle. A test that passed in the agent's session AND in the original deploy run, but fails when re-run from a clean worktree at the same SHA, is the smoking-gun shape of `1rug14a`. We're not yet wired to flag the discrepancy as a separate failure mode (that's the next sliver), but the data to compare is now in the bundle.
	40	+
	41	+The default is still `SAMA_HISTORY_DEPTH=0` (HEAD-only). Opt-in keeps deploy time bounded; flipping the default to `5` or `10` is a one-line change once we want it on by default.
	42	+
	43	+## 3 · `/sama/verify` (mechanical check for any public repo)
	44	+
	45	+The corpus post argued: "don't write a CLAUDE.md instruction the harness can overrule. Write a structural check the harness doesn't get to know about." That argument is hollow if the structural checks aren't actually runnable. The new route closes the loop:
	46	+
	47	+[/sama/verify](/sama/verify) — paste a public GitHub repo, get a four-discipline report. The mechanics:
	48	+
	49	+1. One GitHub API call to `git/trees/<default-branch>?recursive=1` resolves the file list.
	50	+2. Every `src/cXX_*.ts` file is fetched via `raw.githubusercontent.com` (no API rate limit, no token).
	51	+3. Pure logic in [`c32_sama_verify.ts`](https://github.com/syntaxai/tdd.md/blob/main/src/c32_sama_verify.ts) runs the four checks:
	52	+ - S — Sorted: every relative `from "./..."` import in a `cXX_*.ts` is parsed; flag if the target's prefix is higher than the source's.
	53	+ - A — Architecture: every `cXX_` prefix is matched against the known set (`c11`, `c13`, `c14`, `c21`, `c31`, `c32`, `c51`); unknown ones flagged.
	54	+ - M — Modeled: every `cXX_<name>.ts` (non-test) is checked for a sibling `cXX_<name>.test.ts`. Hard-fails for `c32_` (logic); informational for `c31_` (often pure-data registries).
	55	+ - A — Atomic: line count over 700 → flagged. Test files → run the same placeholder check from sliver #1.
	56	+
	57	+Output: pass/fail per discipline, with up to 20 violations per check listed (`file` + `detail`). Cached for an hour per repo.
	58	+
	59	+Try it on this site: [`/sama/verify?repo=syntaxai/tdd.md`](/sama/verify?repo=syntaxai/tdd.md). And here's the dogfood result, honestly:
	60	+
	61	+| check | tdd.md self-verify result |
	62	+|---|---|
	63	+| S — Sorted | ✓ pass — no UI dependency leaks into foundation/data/logic |
	64	+| A — Architecture | ✓ pass — every prefix is in the known set |
	65	+| M — Modeled | ✗ 5 violations — `c32_judge.ts`, `c32_session.ts`, `c32_real_reports.ts`, `c32_real_tests.ts`, `c32_sama_verify.ts` lack sibling test files |
	66	+| A — Atomic | ✗ 1 violation — `c21_app.ts` is 1066 lines (over the 700-line split threshold) |
	67	+
	68	+Two of four fail, and they're real. Five `c32_` logic files (including `c32_sama_verify.ts`, the file that runs the verification) don't have sibling tests yet, and the route dispatcher has grown past the atomic threshold and now needs a per-domain split. Both findings were caught by the tool we just shipped, against the codebase we just shipped it from. That's the dogfood story: not "everything passes" but "the tool catches real things in real code, including its own". Both are on the very next slice of the roadmap.
	69	+
	70	+## what this changes about the case
	71	+
	72	+The argument has now happened in three layers:
	73	+
	74	+1. The harness postmortem post said: structural rules survive harness chaos because they're enforced outside the agent's context window.
	75	+2. The corpus post said: ten threads prove the failure modes are systematic, here are the rules that catch each, here's what we catch and what we don't yet.
	76	+3. This post says: the rules are now checks, the checks are now URLs you can hit, and you can verify the case against any public repo including this one.
	77	+
	78	+The leftover work — flagging a runtime-patching discrepancy as a distinct failure mode, hidden-test verification on real-project commits, AST-level placeholder detection beyond the regex — is in the open. It's smaller than what shipped this week.
	79	+
	80	+## tl;dr
	81	+
	82	+The two previous posts made a case from text. This one ships the checks the case promised:
	83	+
	84	+| sliver | route | catches | status |
	85	+|---|---|---|---|
	86	+| placeholder detection | [/reports/live/tests](/reports/live/tests) | r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass") | live |
	87	+| historical-commit testing | snapshot script with `SAMA_HISTORY_DEPTH=N` | runtime-patching SHAs ([groundwork for 1rug14a](/blog/agentic-coding-corpus-three-patterns)) | opt-in, default 0 |
	88	+| `/sama/verify` | [/sama/verify](/sama/verify) | layer violations, missing sibling tests, oversized files, placeholder tests, in any public repo | live |
	89	+
	90	+If the discipline is real, you should be able to point it at a repo and have it report findings. Now you can.
	91	+
	92	+[← back to the blog](/blog) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify a repo →](/sama/verify)

modified content/sama/skill.md +8 −3

@@ -30,16 +30,18 @@ Thinking "this one helper doesn't need a prefix"? Stop. That's how the rule erod
30	30	## The Iron Rule
31	31
32	32	```
33		-LOWER LAYERS NEVER IMPORT FROM HIGHER LAYERS
	33	+UI SITS AT THE EDGE — FOUNDATION, DATA AND LOGIC LAYERS NEVER DEPEND ON UI
34	34	```
35	35
	36	+Foundation/data/logic (`c1*`, `c3*`) must never import UI (`c5*+`). Handlers (`c21`) are the orchestration layer and may compose UI; UI itself may read models for the data it renders.
	37	+
36	38	Verify with one grep:
37	39
38	40	```bash
39		-grep -rE 'from "\./c[5-9]' src/c1*.ts src/c2*.ts src/c3*.ts
	41	+grep -rE 'from "\./c[5-9]' src/c1*.ts src/c3*.ts
40	42	```
41	43
42		-Empty output = rule holds. Any output = a lower layer reaches into a higher one. Either move the function or rename the file. Do not "fix" the violation by deleting the import without understanding what broke.
	44	+Empty output = rule holds. Any output = a UI dependency has leaked into foundation/data/logic. Move the function or rename the file. Do not "fix" the violation by deleting the import without understanding what broke.
43	45
44	46	## The Four Letters
45	47
@@ -184,3 +186,6 @@ Or, if the live and demo body builders are mostly the same shape parameterised b
184	186
185	187	- The four disciplines with examples: https://tdd.md/sama
186	188	- Why SAMA compounds with TDD and token-discipline: https://tdd.md/blog/three-constraints-agentic-coding
	189	+- The Reddit harness postmortem this skill is a response to: https://tdd.md/blog/claude-code-harness-postmortem
	190	+- Ten more threads, three patterns, mitigation tables: https://tdd.md/blog/agentic-coding-corpus-three-patterns
	191	+- Mechanically verify any public repo against these rules: https://tdd.md/sama/verify

modified content/sama/sorted.md +3 −3

@@ -1,6 +1,6 @@
1	1	# S — Sorted
2	2
3		-> Rule: alphabetical sort = dependency direction. Lower-numbered layers never import from higher-numbered ones.
	3	+> Rule: alphabetical sort = dependency direction. UI sits at the edge — foundation, data and logic layers (`c1*`, `c3*`) never depend on UI (`c5*+`). Handlers (`c21`) are the orchestration layer and may import anything.
4	4
5	5	The first property of SAMA. Run `ls src/` and you have the architecture diagram. There is no separate "where does the data flow?" document because the file system answers it.
6	6
@@ -16,10 +16,10 @@ Two reasons:
16	16	Run this from the repo root:
17	17
18	18	```bash
19		-grep -rE 'from "\./c[5-9]' src/c1*.ts src/c2*.ts src/c3*.ts
	19	+grep -rE 'from "\./c[5-9]' src/c1*.ts src/c3*.ts
20	20	```
21	21
22		-If the output is empty, the rule holds. If anything appears, you have a higher layer leaking into a lower one — fix the import or move the file.
	22	+If the output is empty, the rule holds. If anything appears, you have a UI dependency leaking into a foundation, data or logic layer — fix the import or move the file. (Note: `c2*` files are intentionally excluded — handlers compose UI calls, so `c21` → `c51` is the normal pattern.)
23	23
24	24	This is the single load-bearing test for the Sorted property. Wire it into CI and forget about it.
25	25

modified scripts/p620/snapshot-tests.ts +197 −9

@@ -1,18 +1,21 @@
1	1	#!/usr/bin/env bun
2		-// Run `bun test` on the current HEAD and append the result to a
3		-// per-repo bundle alongside the git-history snapshot. The container
4		-// reads this bundle at runtime to render /reports/live/tests for the
5		-// (private) syntaxai/tdd.md repo without needing a runtime sandbox.
	2	+// Run `bun test` on the current HEAD (and optionally the last N
	3	+// historical commits) and append the results to a per-repo bundle
	4	+// alongside the git-history snapshot. The container reads this bundle
	5	+// at runtime to render /reports/live/tests for the (private)
	6	+// syntaxai/tdd.md repo without needing a runtime sandbox.
6	7	//
7		-// Strategy: HEAD-only per deploy. The bundle accumulates one run per
8		-// deploy (capped at 50), so stability data builds organically over
9		-// time. No git-worktree gymnastics, no per-commit bun-install.
	8	+// HEAD mode (default): one new run per deploy, fast, no worktree.
	9	+// History mode (SAMA_HISTORY_DEPTH=N): also runs the last N commits
	10	+// that aren't already in the bundle, via git worktree + a symlinked
	11	+// node_modules per SHA. Slower (~5-10s/commit) but builds real
	12	+// stability data instead of waiting for it to accumulate organically.
10	13	//
11	14	// Output: content/git-history/<owner>__<name>__tests.json
12	15	// Schema: { owner, name, runs: TestRunRecord[] } — newest first.
13	16
14	17	import { spawnSync } from "node:child_process";
15		-import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
	18	+import { existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
16	19	import { resolve } from "node:path";
17	20
18	21	const REPO_ROOT = resolve(import.meta.dir, "..", "..");
@@ -20,6 +23,7 @@ const OWNER = "syntaxai";
20	23	const NAME = "tdd.md";
21	24	const MAX_RUNS = 50;
22	25	const JUNIT_OUT = "/tmp/tdd-md-test-junit.xml";
	26	+const HISTORY_DEPTH = parseInt(process.env.SAMA_HISTORY_DEPTH ?? "0", 10);
23	27
24	28	const sh = (cmd: string, args: string[]): string => {
25	29	const r = spawnSync(cmd, args, { cwd: REPO_ROOT, encoding: "utf8" });
@@ -81,6 +85,82 @@ const passing = tests.filter((t) => t.status === "pass").length;
81	85	const failing = tests.length - passing;
82	86	const totalDurationMs = tests.reduce((s, t) => s + t.durationMs, 0);
83	87
	88	+interface PlaceholderTest {
	89	+ name: string;
	90	+ file: string;
	91	+ reason: string;
	92	+}
	93	+
	94	+// Placeholder detection. Catches the failure mode from r/ClaudeCode
	95	+// post 1qix264 ("90 placeholder tests, 100% pass rate"): tests with
	96	+// zero `expect(` calls in their body are flagged. Regex-based brace
	97	+// matching — full AST is overkill for the one structural property we
	98	+// care about. Limitations: misses tests that delegate to a custom
	99	+// assertion helper or pass through a subroutine. Acceptable for v1;
	100	+// the catch is the common failure shape, not every theoretical one.
	101	+const findPlaceholderTests = (testFile: string, content: string): PlaceholderTest[] => {
	102	+ const out: PlaceholderTest[] = [];
	103	+ const re = /\b(test|it)\s*\(\s*(["'`])((?:\\.|(?!\2).)*)\2\s*,\s*(?:async\s+)?(?:\([^)]*\)|[^=()]*?)\s*=>\s*\{/g;
	104	+ let m: RegExpExecArray | null;
	105	+ while ((m = re.exec(content)) !== null) {
	106	+ const name = m[3] ?? "";
	107	+ const startBrace = re.lastIndex - 1;
	108	+ let depth = 1;
	109	+ let i = startBrace + 1;
	110	+ let inString: string | null = null;
	111	+ while (i < content.length && depth > 0) {
	112	+ const c = content[i];
	113	+ if (inString !== null) {
	114	+ if (c === "\\") { i += 2; continue; }
	115	+ if (c === inString) inString = null;
	116	+ } else {
	117	+ if (c === '"' || c === "'" || c === "`") inString = c;
	118	+ else if (c === "/" && content[i + 1] === "/") {
	119	+ // line comment
	120	+ while (i < content.length && content[i] !== "\n") i++;
	121	+ continue;
	122	+ }
	123	+ else if (c === "/" && content[i + 1] === "*") {
	124	+ // block comment
	125	+ i += 2;
	126	+ while (i < content.length - 1 && !(content[i] === "*" && content[i + 1] === "/")) i++;
	127	+ i += 2;
	128	+ continue;
	129	+ }
	130	+ else if (c === "{") depth++;
	131	+ else if (c === "}") depth--;
	132	+ }
	133	+ i++;
	134	+ }
	135	+ const body = content.slice(startBrace + 1, i - 1);
	136	+ const expectCount = (body.match(/\bexpect\s*\(/g) ?? []).length;
	137	+ if (expectCount === 0) {
	138	+ const trimmedLen = body.replace(/\s+/g, "").length;
	139	+ const reason = trimmedLen === 0
	140	+ ? "empty test body"
	141	+ : trimmedLen < 20 && /^\s*\/\//.test(body.trim())
	142	+ ? "comment-only stub"
	143	+ : "no expect() calls in test body";
	144	+ out.push({ name, file: testFile, reason });
	145	+ }
	146	+ }
	147	+ return out;
	148	+};
	149	+
	150	+const detectPlaceholders = (testFiles: string[]): PlaceholderTest[] => {
	151	+ const out: PlaceholderTest[] = [];
	152	+ for (const f of testFiles) {
	153	+ const abs = resolve(REPO_ROOT, f);
	154	+ if (!existsSync(abs)) continue;
	155	+ const content = readFileSync(abs, "utf8");
	156	+ out.push(...findPlaceholderTests(f, content));
	157	+ }
	158	+ return out;
	159	+};
	160	+
	161	+const uniqueTestFiles = Array.from(new Set(tests.map((t) => t.file).filter(Boolean)));
	162	+const placeholderTests = detectPlaceholders(uniqueTestFiles);
	163	+
84	164	interface TestRunRecord {
85	165	sha: string;
86	166	branch: string;
@@ -90,6 +170,7 @@ interface TestRunRecord {
90	170	failing: number;
91	171	durationMs: number;
92	172	tests: TestRecord[];
	173	+ placeholderTests: PlaceholderTest[];
93	174	}
94	175
95	176	interface TestBundle {
@@ -121,9 +202,116 @@ if (bundle.runs.some((r) => r.sha === head)) {
121	202	failing,
122	203	durationMs: totalDurationMs,
123	204	tests,
	205	+ placeholderTests,
124	206	});
125	207	bundle.runs = bundle.runs.slice(0, MAX_RUNS);
126	208	mkdirSync(resolve(REPO_ROOT, "content", "git-history"), { recursive: true });
127	209	writeFileSync(bundlePath, JSON.stringify(bundle, null, 2));
128		- console.log(`✓ tests at ${head.slice(0, 7)} (${branch}): ${passing}/${tests.length} pass, ${failing} fail → bundle (${bundle.runs.length} runs total)`);
	210	+ console.log(`✓ tests at ${head.slice(0, 7)} (${branch}): ${passing}/${tests.length} pass, ${failing} fail, ${placeholderTests.length} placeholder → bundle (${bundle.runs.length} runs total)`);
	211	+ if (placeholderTests.length > 0) {
	212	+ for (const p of placeholderTests) {
	213	+ console.log(` ⚠ placeholder: ${p.file} > ${p.name} (${p.reason})`);
	214	+ }
	215	+ }
	216	+}
	217	+
	218	+// ---------------------------------------------------------------------
	219	+// Historical mode: run the test suite at each of the last N commits
	220	+// that aren't already in the bundle. Opt-in via SAMA_HISTORY_DEPTH.
	221	+// ---------------------------------------------------------------------
	222	+
	223	+const runHistoricalCommit = (sha: string): boolean => {
	224	+ const wt = `/tmp/tdd-md-wt-${sha.slice(0, 12)}`;
	225	+ const junit = `/tmp/tdd-md-test-junit-${sha.slice(0, 12)}.xml`;
	226	+ // Cleanup any leftover from a previous failed run.
	227	+ spawnSync("git", ["worktree", "remove", "--force", wt], { cwd: REPO_ROOT });
	228	+ rmSync(wt, { recursive: true, force: true });
	229	+
	230	+ const add = spawnSync("git", ["worktree", "add", "--detach", wt, sha], {
	231	+ cwd: REPO_ROOT,
	232	+ encoding: "utf8",
	233	+ });
	234	+ if (add.status !== 0) {
	235	+ console.log(` ✗ worktree add failed for ${sha.slice(0, 7)}: ${add.stderr.trim()}`);
	236	+ return false;
	237	+ }
	238	+
	239	+ let added = false;
	240	+ try {
	241	+ // Symlink node_modules from the parent checkout. Works as long as
	242	+ // bun.lock didn't change between commits — true for almost every
	243	+ // commit on tdd.md. If it diverged, `bun test` will fail loudly
	244	+ // and we just skip that SHA.
	245	+ spawnSync("ln", ["-s", resolve(REPO_ROOT, "node_modules"), resolve(wt, "node_modules")]);
	246	+
	247	+ const ranAt = Date.now();
	248	+ spawnSync("bun", ["test", "--reporter=junit", `--reporter-outfile=${junit}`], {
	249	+ cwd: wt,
	250	+ stdio: "ignore",
	251	+ timeout: 120_000,
	252	+ });
	253	+ if (!existsSync(junit)) {
	254	+ console.log(` ✗ no junit output for ${sha.slice(0, 7)} — skipping`);
	255	+ return false;
	256	+ }
	257	+ const histXml = readFileSync(junit, "utf8");
	258	+ const histTests = parseJunit(histXml);
	259	+ if (histTests.length === 0) {
	260	+ console.log(` ⚠ ${sha.slice(0, 7)} produced 0 tests — likely deps mismatch, skipping`);
	261	+ return false;
	262	+ }
	263	+ const histPlaceholders: PlaceholderTest[] = [];
	264	+ for (const f of Array.from(new Set(histTests.map((t) => t.file).filter(Boolean)))) {
	265	+ const abs = resolve(wt, f);
	266	+ if (!existsSync(abs)) continue;
	267	+ histPlaceholders.push(...findPlaceholderTests(f, readFileSync(abs, "utf8")));
	268	+ }
	269	+ const histPassing = histTests.filter((t) => t.status === "pass").length;
	270	+ const histFailing = histTests.length - histPassing;
	271	+ const histDur = histTests.reduce((s, t) => s + t.durationMs, 0);
	272	+ const branchAtSha = sh("git", ["log", "-1", "--format=%D", sha]).split(",").map((s) => s.trim()).find((s) => s.startsWith("HEAD ->"))?.replace("HEAD -> ", "") ?? "(detached)";
	273	+
	274	+ bundle.runs.push({
	275	+ sha,
	276	+ branch: branchAtSha,
	277	+ ranAt,
	278	+ total: histTests.length,
	279	+ passing: histPassing,
	280	+ failing: histFailing,
	281	+ durationMs: histDur,
	282	+ tests: histTests,
	283	+ placeholderTests: histPlaceholders,
	284	+ });
	285	+ added = true;
	286	+ console.log(` ✓ ${sha.slice(0, 7)}: ${histPassing}/${histTests.length} pass, ${histFailing} fail, ${histPlaceholders.length} placeholder`);
	287	+ } finally {
	288	+ spawnSync("git", ["worktree", "remove", "--force", wt], { cwd: REPO_ROOT });
	289	+ rmSync(wt, { recursive: true, force: true });
	290	+ rmSync(junit, { force: true });
	291	+ }
	292	+ return added;
	293	+};
	294	+
	295	+if (HISTORY_DEPTH > 0) {
	296	+ console.log(`→ historical mode: walking last ${HISTORY_DEPTH} commits`);
	297	+ const recent = sh("git", ["log", `--max-count=${HISTORY_DEPTH + 1}`, "--pretty=format:%H"]).split("\n").slice(1); // skip HEAD
	298	+ let addedCount = 0;
	299	+ for (const sha of recent) {
	300	+ if (!sha) continue;
	301	+ if (bundle.runs.some((r) => r.sha === sha)) {
	302	+ console.log(` • ${sha.slice(0, 7)} already in bundle, skipping`);
	303	+ continue;
	304	+ }
	305	+ if (runHistoricalCommit(sha)) addedCount++;
	306	+ }
	307	+ if (addedCount > 0) {
	308	+ // Re-sort newest-first by commit author date before re-writing.
	309	+ // The new historical entries carry ranAt = Date.now() from the
	310	+ // moment they ran, so ranAt cannot be used for chronology.
	311	+ const tsBySha = (s: string) => Date.parse(sh("git", ["log", "-1", "--format=%aI", s]));
	312	+ bundle.runs.sort((a, b) => tsBySha(b.sha) - tsBySha(a.sha));
	313	+ bundle.runs = bundle.runs.slice(0, MAX_RUNS);
	314	+ writeFileSync(bundlePath, JSON.stringify(bundle, null, 2));
	315	+ console.log(`✓ added ${addedCount} historical run${addedCount === 1 ? "" : "s"} → bundle (${bundle.runs.length} runs total)`);
	316	+ }
129	317	}

modified src/c14_github.ts +98 −0

@@ -222,6 +222,12 @@ export interface TestRecord {
222	222	durationMs: number;
223	223	}
224	224
	225	+export interface PlaceholderTest {
	226	+ name: string;
	227	+ file: string;
	228	+ reason: string;
	229	+}
	230	+
225	231	export interface TestRunRecord {
226	232	sha: string;
227	233	branch: string;
@@ -231,6 +237,9 @@ export interface TestRunRecord {
231	237	failing: number;
232	238	durationMs: number;
233	239	tests: TestRecord[];
	240	+ // Optional for backwards-compat with bundles written before the
	241	+ // placeholder-detection sliver shipped. Treat missing as [].
	242	+ placeholderTests?: PlaceholderTest[];
234	243	}
235	244
236	245	export interface TestBundle {
@@ -239,6 +248,95 @@ export interface TestBundle {
239	248	runs: TestRunRecord[];
240	249	}
241	250
	251	+// ---------------------------------------------------------------------
	252	+// SAMA verify support: tree listing via API (one call) + raw-content
	253	+// reads for every cXX_*.ts file (raw.githubusercontent.com, no rate
	254	+// limit). Used by /sama/verify to inspect any public repo without a
	255	+// token. Cached per (owner, name) for an hour.
	256	+// ---------------------------------------------------------------------
	257	+
	258	+export interface RepoTreeEntry {
	259	+ path: string;
	260	+ type: "blob" | "tree" | "commit";
	261	+ size?: number;
	262	+}
	263	+
	264	+interface RepoTree {
	265	+ defaultBranch: string;
	266	+ entries: RepoTreeEntry[];
	267	+ truncated: boolean;
	268	+}
	269	+
	270	+const TREE_TTL_MS = 60 * 60 * 1000;
	271	+const treeCache = new Map<string, { fetchedAt: number; tree: RepoTree }>();
	272	+
	273	+export const fetchRepoTree = async (
	274	+ repoOwner: string,
	275	+ repoName: string,
	276	+): Promise<RepoTree> => {
	277	+ const key = `${repoOwner}/${repoName}`;
	278	+ const cached = treeCache.get(key);
	279	+ if (cached && Date.now() - cached.fetchedAt < TREE_TTL_MS) return cached.tree;
	280	+
	281	+ const repoRes = await fetch(`https://api.github.com/repos/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}`, {
	282	+ headers: { Accept: "application/vnd.github+json", "User-Agent": "tdd.md" },
	283	+ });
	284	+ if (!repoRes.ok) {
	285	+ if (cached) return cached.tree;
	286	+ throw new Error(`GitHub repo lookup failed for ${repoOwner}/${repoName}: HTTP ${repoRes.status}`);
	287	+ }
	288	+ const repoMeta = (await repoRes.json()) as { default_branch?: string };
	289	+ const defaultBranch = repoMeta.default_branch ?? "main";
	290	+
	291	+ const treeRes = await fetch(
	292	+ `https://api.github.com/repos/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}/git/trees/${encodeURIComponent(defaultBranch)}?recursive=1`,
	293	+ { headers: { Accept: "application/vnd.github+json", "User-Agent": "tdd.md" } },
	294	+ );
	295	+ if (!treeRes.ok) {
	296	+ if (cached) return cached.tree;
	297	+ throw new Error(`GitHub tree fetch failed for ${repoOwner}/${repoName}: HTTP ${treeRes.status}`);
	298	+ }
	299	+ const data = (await treeRes.json()) as {
	300	+ tree?: Array<{ path: string; type: string; size?: number }>;
	301	+ truncated?: boolean;
	302	+ };
	303	+ const entries = (data.tree ?? []).map((e) => ({
	304	+ path: e.path,
	305	+ type: e.type as RepoTreeEntry["type"],
	306	+ size: e.size,
	307	+ }));
	308	+ const tree: RepoTree = { defaultBranch, entries, truncated: data.truncated ?? false };
	309	+ treeCache.set(key, { fetchedAt: Date.now(), tree });
	310	+ return tree;
	311	+};
	312	+
	313	+// Raw content fetch via raw.githubusercontent.com — no API rate limit.
	314	+// Per-call timeout via AbortController so a slow upstream can't tie up
	315	+// the verifier indefinitely.
	316	+export const fetchRepoRawFile = async (
	317	+ repoOwner: string,
	318	+ repoName: string,
	319	+ ref: string,
	320	+ path: string,
	321	+ timeoutMs = 10_000,
	322	+): Promise<string | null> => {
	323	+ const url = `https://raw.githubusercontent.com/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}/${encodeURIComponent(ref)}/${path.split("/").map(encodeURIComponent).join("/")}`;
	324	+ const ctrl = new AbortController();
	325	+ const t = setTimeout(() => ctrl.abort(), timeoutMs);
	326	+ try {
	327	+ const res = await fetch(url, {
	328	+ headers: { "User-Agent": "tdd.md" },
	329	+ signal: ctrl.signal,
	330	+ });
	331	+ if (!res.ok) return null;
	332	+ return await res.text();
	333	+ } catch {
	334	+ return null;
	335	+ } finally {
	336	+ clearTimeout(t);
	337	+ }
	338	+};
	339	+
242	340	export const loadTestBundle = async (
243	341	repoOwner: string,
244	342	repoName: string,
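
The caching behaviour of `fetchRepoTree` above (serve a fresh entry inside the TTL, fall back to a stale entry when the upstream call fails) can be sketched in isolation. This is a minimal sketch, not the code in the commit: `cachedWithStaleFallback`, `load`, and the injectable `now` clock are hypothetical names introduced so the pattern runs without a network.

```typescript
// Hypothetical helper: a TTL cache that serves a stale entry when the loader
// throws, mirroring the `if (cached) return cached.tree` branches above.
type Entry<T> = { fetchedAt: number; value: T };

const cachedWithStaleFallback = <T>(
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, an assumption for testability
) => {
  const cache = new Map<string, Entry<T>>();
  return async (key: string, load: () => Promise<T>): Promise<T> => {
    const hit = cache.get(key);
    // Fresh entry: skip the upstream call entirely.
    if (hit && now() - hit.fetchedAt < ttlMs) return hit.value;
    try {
      const value = await load();
      cache.set(key, { fetchedAt: now(), value });
      return value;
    } catch (err) {
      // Upstream failed: prefer stale data over an error page.
      if (hit) return hit.value;
      throw err;
    }
  };
};
```

As in the diff, a stale entry is not re-stamped when it is served, so every request after expiry retries upstream until one succeeds.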

modified src/c21_app.ts +163 −1

@@ -6,6 +6,7 @@ import {
6	6	renderPage,
7	7	renderNotFound,
8	8	htmlResponse,
	9	+ escape,
9	10	} from "./c51_render_layout.ts";
10	11	import {
11	12	projectsLandingMd,
@@ -23,7 +24,12 @@ import {
23	24	adminApiHeaders,
24	25	proxyToForgejo,
25	26	} from "./c14_forgejo.ts";
26		-import { fetchProjectConfig } from "./c14_github.ts";
	27	+import {
	28	+ fetchProjectConfig,
	29	+ fetchRepoTree,
	30	+ fetchRepoRawFile,
	31	+} from "./c14_github.ts";
	32	+import { verifySama, type SamaReport } from "./c32_sama_verify.ts";
27	33	import { listGames, loadGame } from "./c31_games.ts";
28	34	import { ALL_POSTS } from "./c31_blog.ts";
29	35	import { ALL_GUIDES } from "./c31_guides.ts";
@@ -557,6 +563,7 @@ ${rows}
557	563	snapshots: data.snapshots,
558	564	stability: data.stability,
559	565	unavailableNote,
	566	+ placeholderTests: data.placeholderTests,
560	567	}),
561	568	ogPath: "https://tdd.md/reports/live/tests",
562	569	});
@@ -666,6 +673,148 @@ ${rows}
666	673	return htmlResponse(html);
667	674	},
668	675
	676	+ "/sama/verify": async (req) => {
	677	+ const url = new URL(req.url);
	678	+ const repoArg = (url.searchParams.get("repo") ?? "").trim();
	679	+ const formMd = `# SAMA verify
	680	+
	681	+> Paste a public GitHub repo. tdd.md will run the four [SAMA disciplines](/sama) against the default branch — Sorted (lower never imports higher), Architecture (known layer prefixes), Modeled (sibling tests, types in c31_), Atomic (~700-line split + placeholder-test detection) — and return a report. No clone, no token; just two GitHub API calls (repo lookup, then tree listing) plus raw-content reads. Cached for an hour per repo.
	682	+
	683	+<form method="get" action="/sama/verify" class="sama-verify-form">
	684	+ <label>
	685	+ public GitHub repo:
	686	+ <input type="text" name="repo" placeholder="owner/name" required pattern="[^/\\s]+/[^/\\s]+" />
	687	+ </label>
	688	+ <button type="submit">verify</button>
	689	+</form>
	690	+
	691	+Try it on this site: [\`syntaxai/tdd.md\`](/sama/verify?repo=syntaxai/tdd.md) · or any public repo of your own.
	692	+
	693	+Limits: anonymous GitHub API quota is 60 requests/hour per IP. Each uncached verify uses two API calls (a repo lookup for the default branch, then the tree listing); the rest of the work goes through raw.githubusercontent.com, which is outside that API quota. If the verifier returns "rate limit", come back later or use a token-authenticated proxy.
	694	+
	695	+[← /sama](/sama)
	696	+`;
	697	+
	698	+ if (!repoArg) {
	699	+ const html = await renderPage({
	700	+ title: "SAMA verify — tdd.md",
	701	+ description: "Paste a public GitHub repo, get the four SAMA disciplines verified mechanically: sorted (lower never imports higher), architecture (known layer prefixes), modeled (sibling tests), atomic (700-line + placeholder-test detection).",
	702	+ bodyMarkdown: formMd,
	703	+ ogPath: "https://tdd.md/sama/verify",
	704	+ active: "sama",
	705	+ });
	706	+ return htmlResponse(html);
	707	+ }
	708	+
	709	+ const m = /^([^\/\s]+)\/([^\/\s]+)$/.exec(repoArg);
	710	+ if (!m) {
	711	+ const html = await renderPage({
	712	+ title: "SAMA verify · bad input — tdd.md",
	713	+ description: "SAMA verify expects an owner/name repo identifier.",
	714	+ bodyMarkdown: `# SAMA verify\n\n> Couldn't parse \`${escape(repoArg)}\`. Use the form: \`owner/name\`.\n\n[← back](/sama/verify)\n`,
	715	+ ogPath: "https://tdd.md/sama/verify",
	716	+ active: "sama",
	717	+ noindex: true,
	718	+ });
	719	+ return htmlResponse(html, 400);
	720	+ }
	721	+
	722	+ const [, owner, name] = m;
	723	+ let report: SamaReport;
	724	+ try {
	725	+ // Dogfood short-circuit: tdd.md is a private repo, so the GitHub
	726	+ // API can't see it. When asked to verify ourselves, read the
	727	+ // source from the bundled `./src/` directory inside the container.
	728	+ // Same checks, same shape, same code path downstream.
	729	+ const isSelf = owner === LIVE_REPO_OWNER && name === LIVE_REPO_NAME;
	730	+ if (isSelf) {
	731	+ const { readdirSync, readFileSync } = await import("node:fs");
	732	+ const srcDir = "./src";
	733	+ const tsFiles = readdirSync(srcDir, { withFileTypes: true })
	734	+ .filter((e) => e.isFile() && e.name.endsWith(".ts"))
	735	+ .map((e) => e.name)
	736	+ .sort();
	737	+ const contents = new Map<string, string>();
	738	+ for (const f of tsFiles) {
	739	+ if (/^c\d{2}_/.test(f)) {
	740	+ contents.set(f, readFileSync(`${srcDir}/${f}`, "utf8"));
	741	+ }
	742	+ }
	743	+ report = verifySama({
	744	+ repoOwner: owner!,
	745	+ repoName: name!,
	746	+ defaultBranch: "main",
	747	+ srcPaths: tsFiles,
	748	+ contents,
	749	+ });
	750	+ } else {
	751	+ const tree = await fetchRepoTree(owner!, name!);
	752	+ const srcEntries = tree.entries
	753	+ .filter((e) => e.type === "blob" && e.path.startsWith("src/") && e.path.endsWith(".ts"))
	754	+ .slice(0, 200);
	755	+ const srcPaths = srcEntries.map((e) => e.path.slice("src/".length));
	756	+ const samaPaths = srcPaths.filter((p) => /^c\d{2}_/.test(p));
	757	+ const contents = new Map<string, string>();
	758	+ const fetches = await Promise.all(
	759	+ samaPaths.map(async (p) => [p, await fetchRepoRawFile(owner!, name!, tree.defaultBranch, `src/${p}`)] as const),
	760	+ );
	761	+ for (const [p, c] of fetches) {
	762	+ if (c !== null) contents.set(p, c);
	763	+ }
	764	+ report = verifySama({
	765	+ repoOwner: owner!,
	766	+ repoName: name!,
	767	+ defaultBranch: tree.defaultBranch,
	768	+ srcPaths,
	769	+ contents,
	770	+ });
	771	+ }
	772	+ } catch (e) {
	773	+ const msg = e instanceof Error ? e.message : String(e);
	774	+ const html = await renderPage({
	775	+ title: `SAMA verify · ${owner}/${name} · error — tdd.md`,
	776	+ description: `SAMA verify could not inspect ${owner}/${name}.`,
	777	+ bodyMarkdown: `# SAMA verify · \`${owner}/${name}\`\n\n> Couldn't fetch the repo: ${escape(msg)}\n\nMost common causes: the repo is private, the name is wrong, or you've hit GitHub's anonymous rate limit (60/hour). [← try another repo](/sama/verify)\n`,
	778	+ ogPath: `https://tdd.md/sama/verify?repo=${owner}/${name}`,
	779	+ active: "sama",
	780	+ noindex: true,
	781	+ });
	782	+ return htmlResponse(html, 502);
	783	+ }
	784	+
	785	+ const summary = report.overallPassed
	786	+ ? `> ✓ All four checks passed for [\`${report.repoSlug}\`](https://github.com/${report.repoSlug}) on \`${report.defaultBranch}\` (${report.samaFiles} SAMA files / ${report.testFiles} tests / ${report.totalSrcFiles} total in src/).`
	787	+ : `> ⚠ ${report.checks.filter((c) => !c.passed).length} of 4 checks failed for [\`${report.repoSlug}\`](https://github.com/${report.repoSlug}) on \`${report.defaultBranch}\`.`;
	788	+ const checkBlocks = report.checks
	789	+ .map((c) => {
	790	+ const status = c.passed ? "✓ pass" : `✗ ${c.violations.length} violation${c.violations.length === 1 ? "" : "s"}`;
	791	+ const violationsBlock = c.violations.length === 0
	792	+ ? ""
	793	+ : `\n\n${c.violations.slice(0, 20).map((v) => `- \`${escape(v.file)}\` — ${escape(v.detail)}`).join("\n")}${c.violations.length > 20 ? `\n- _...and ${c.violations.length - 20} more_` : ""}`;
	794	+ const noteBlock = c.note ? `\n\n_${escape(c.note)}_` : "";
	795	+ return `### ${c.letter} — ${c.property} · ${status}\n\nExamined ${c.examined} file${c.examined === 1 ? "" : "s"}.${violationsBlock}${noteBlock}`;
	796	+ })
	797	+ .join("\n\n");
	798	+ const reportMd = `# SAMA verify · \`${report.repoSlug}\`
	799	+
	800	+${summary}
	801	+
	802	+${checkBlocks}
	803	+
	804	+---
	805	+
	806	+[← verify another repo](/sama/verify) · [the four SAMA disciplines →](/sama) · [SAMA skill for your agent →](/sama/skill)
	807	+`;
	808	+ const html = await renderPage({
	809	+ title: `SAMA verify · ${report.repoSlug} — tdd.md`,
	810	+ description: `SAMA verification for ${report.repoSlug}: ${report.overallPassed ? "all four checks passed" : `${report.checks.filter((c) => !c.passed).length}/4 checks failed`}.`,
	811	+ bodyMarkdown: reportMd,
	812	+ ogPath: `https://tdd.md/sama/verify?repo=${report.repoSlug}`,
	813	+ active: "sama",
	814	+ });
	815	+ return htmlResponse(html);
	816	+ },
	817	+
669	818	"/sama": async () => {
670	819	const rows = ALL_SAMA
671	820	.map((d) => `| [${d.letter} — ${d.title}](/sama/${d.slug}) | ${d.rule} |`)
@@ -703,6 +852,19 @@ curl -fsSL https://tdd.md/skills/sama.md -o ~/.claude/skills/sama.md
703	852
704	853	The skill is the same content as the four pages here, written in obra/superpowers SKILL.md format with frontmatter, an iron-rule statement, and a verification checklist your agent can run before merging. [Read it formatted →](/sama/skill) · [Raw markdown →](/skills/sama.md)
705	854
	855	+## verify any public repo
	856	+
	857	+Want to know whether a repo follows SAMA without reading its source? Paste the \`owner/name\` and tdd.md will run all four checks against the default branch — Sorted (the import-direction grep), Architecture (known layer prefixes), Modeled (sibling tests), Atomic (700-line + placeholder-test detection). Pass/fail per discipline, with violation lists. [verify a repo →](/sama/verify) · or try it on this site: [\`syntaxai/tdd.md\`](/sama/verify?repo=syntaxai/tdd.md).
	858	+
	859	+## the case behind it
	860	+
	861	+Two long-form pieces that argue why SAMA is shaped this way:
	862	+
	863	+- [The Claude Code harness postmortem read through TDD + SAMA](/blog/claude-code-harness-postmortem) — ThePaSch's r/ClaudeAI audit (40+ hidden reminders, 5 gag-order sites, 158 prompt versions in 11 days) read against the iron law and the verification grep. The harness is loud; the diff doesn't have to be.
	864	+- [Three patterns ten threads converge on](/blog/agentic-coding-corpus-three-patterns) — a six-month corpus of r/ClaudeAI, r/ClaudeCode, r/AgentsOfAI failure-mode threads. Per-pattern mitigation tables map each thread to the SAMA / iron-law rule that catches or prevents it.
	865	+
	866	+If you're reading these for the first time, the order to take them is harness postmortem → corpus → back here.
	867	+
706	868	## why these four together
707	869
708	870	Each property fixes a different failure mode:
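
The `repo` query handling in the `/sama/verify` route above reduces to one anchored regex over the trimmed input. A minimal sketch of that step as a standalone function follows; `parseRepoArg` and `RepoRef` are hypothetical names, not part of the commit. Returning a typed object also removes the need for the `owner!` / `name!` non-null assertions used in the handler.

```typescript
// Hypothetical shape for a parsed "owner/name" identifier.
interface RepoRef {
  owner: string;
  name: string;
}

// Same pattern as the handler: exactly one slash, no whitespace on either side.
const REPO_ARG = /^([^\/\s]+)\/([^\/\s]+)$/;

const parseRepoArg = (raw: string | null): RepoRef | null => {
  const m = REPO_ARG.exec((raw ?? "").trim());
  if (!m) return null;
  const [, owner, name] = m;
  // Both groups are non-empty when the regex matches; the guard narrows the types.
  if (!owner || !name) return null;
  return { owner, name };
};

// Usage: a null result corresponds to the 400 "bad input" page in the route.
const ref = parseRepoArg(" syntaxai/tdd.md ");
console.log(ref ? `${ref.owner}/${ref.name}` : "bad input");
```

The regex only checks shape. It still admits characters such as `<` or a backtick, so anything echoed back into the page needs escaping separately.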

modified src/c31_blog.ts +6 −0

@@ -12,6 +12,12 @@ export interface BlogEntry {
12	12	}
13	13
14	14	export const ALL_POSTS: BlogEntry[] = [
	15	+ {
	16	+ slug: "from-rules-to-checks",
	17	+ title: "From rules to checks: shipping what the corpus post promised",
	18	+ description: "The corpus post named three checks the discipline should run. This post is the receipt. Three slivers shipped: placeholder-test detection (live on /reports/live/tests), historical-commit testing via git worktree (opt-in via SAMA_HISTORY_DEPTH), and /sama/verify - a four-discipline report runnable against any public repo. The rules are now URLs you can hit.",
	19	+ date: "2026-05-09",
	20	+ },
15	21	{
16	22	slug: "agentic-coding-corpus-three-patterns",
17	23	title: "Three patterns ten threads converge on",

modified src/c32_real_tests.ts +5 −3

@@ -5,7 +5,7 @@
5	5	// Pure given the bundle + commits in (no I/O of its own beyond delegating
6	6	// to c14_github's bundle loader and commits fetcher).
7	7
8		-import { fetchRepoCommits, loadTestBundle } from "./c14_github.ts";
	8	+import { fetchRepoCommits, loadTestBundle, type PlaceholderTest } from "./c14_github.ts";
9	9	import type {
10	10	AgentReport,
11	11	TestFailure,
@@ -31,6 +31,7 @@ export interface LiveTestData {
31	31	runsCount: number;
32	33	ranAt: number | null;
33	34	headSha: string | null;
	34	+ placeholderTests: PlaceholderTest[];
34	35	}
35	36
36	37	export const buildLiveTestData = async (
@@ -39,12 +40,12 @@ export const buildLiveTestData = async (
39	40	): Promise<LiveTestData> => {
40	41	const bundle = await loadTestBundle(repoOwner, repoName);
41	42	if (!bundle || bundle.runs.length === 0) {
42		- return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null };
	43	+ return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null, placeholderTests: [] };
43	44	}
44	45	const repoSlug = `${repoOwner}/${repoName}`;
45	46	const latest = bundle.runs[0];
46	47	if (!latest) {
47		- return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null };
	48	+ return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null, placeholderTests: [] };
48	49	}
49	50
50	51	// For "since" we want the oldest run that has this test as failing.
@@ -136,5 +137,6 @@ export const buildLiveTestData = async (
136	137	runsCount: bundle.runs.length,
137	138	ranAt: latest.ranAt,
138	139	headSha: latest.sha,
	140	+ placeholderTests: latest.placeholderTests ?? [],
139	141	};
140	142	};

added src/c32_sama_verify.ts +272 −0

@@ -0,0 +1,272 @@
	1	+// c32 — logic: pure SAMA verification given a repo's file tree and the
	2	+// contents of every cXX_*.ts file. Drives /sama/verify.
	3	+//
	4	+// Verifier is intentionally strict: a check passes iff there is zero
	5	+// evidence of violation. The four properties (S/A/M/A) each become one
	6	+// callable, and the top-level `verifySama(...)` runs them all and
	7	+// returns a SamaReport.
	8	+
	9	+export interface SamaViolation {
	10	+ file: string;
	11	+ detail: string;
	12	+}
	13	+
	14	+export interface SamaCheckResult {
	15	+ letter: "S" | "A" | "M" | "A";
	16	+ property: "Sorted" | "Architecture" | "Modeled" | "Atomic";
	17	+ passed: boolean;
	18	+ examined: number;
	19	+ violations: SamaViolation[];
	20	+ note?: string;
	21	+}
	22	+
	23	+export interface SamaReport {
	24	+ repoSlug: string;
	25	+ defaultBranch: string;
	26	+ totalSrcFiles: number;
	27	+ samaFiles: number;
	28	+ testFiles: number;
	29	+ checks: SamaCheckResult[];
	30	+ overallPassed: boolean;
	31	+ generatedAt: number;
	32	+}
	33	+
	34	+export interface SamaVerifyInput {
	35	+ repoOwner: string;
	36	+ repoName: string;
	37	+ defaultBranch: string;
	38	+ // src-relative paths, e.g. "c21_app.ts", "c31_blog.ts", "c32_session.test.ts"
	39	+ srcPaths: string[];
	40	+ // file path -> content. Contents only required for cXX_*.ts files
	41	+ // and *.test.ts files.
	42	+ contents: Map<string, string>;
	43	+}
	44	+
	45	+const SAMA_PREFIX = /^c(\d{2})_/;
	46	+
	47	+const isSamaFile = (p: string): boolean => SAMA_PREFIX.test(p) && p.endsWith(".ts");
	48	+const isTestFile = (p: string): boolean => p.endsWith(".test.ts");
	49	+
	50	+const layerOf = (filename: string): number | null => {
	51	+ const m = SAMA_PREFIX.exec(filename);
	52	+ if (!m) return null;
	53	+ return parseInt(m[1] ?? "0", 10);
	54	+};
	55	+
	56	+// Pull import targets out of a TypeScript source. Recognizes both
	57	+// static `import ... from "./x.ts"` and dynamic `import("./x.ts")`.
	58	+// We only care about relative imports (the cross-layer ones).
	59	+const collectRelativeImports = (source: string): string[] => {
	60	+ const out: string[] = [];
	61	+ const staticRe = /\bfrom\s+["']\s*(\.\/[^"']+)["']/g;
	62	+ const dynRe = /\bimport\s*\(\s*["']\s*(\.\/[^"']+)["']/g;
	63	+ let m: RegExpExecArray | null;
	64	+ while ((m = staticRe.exec(source)) !== null) if (m[1]) out.push(m[1]);
	65	+ while ((m = dynRe.exec(source)) !== null) if (m[1]) out.push(m[1]);
	66	+ return out;
	67	+};
	68	+
	69	+const importTargetFilename = (importPath: string): string => {
	70	+ // "./c14_github.ts" -> "c14_github.ts"
	71	+ return importPath.replace(/^\.\//, "");
	72	+};
	73	+
	74	+// S — Sorted. The rule, as practiced: foundation, data and logic layers
	75	+// (c1, c3) don't import UI (c5*+). c21 (handlers/composers) is the
	76	+// orchestration layer and is allowed to import anything; c51 (UI) is
	77	+// allowed to import models (c3*) for the data it renders. A strict
	78	+// "lower never imports higher" reading would forbid c21 → c31, which
	79	+// is the natural pattern (handler composes model). The actual
	80	+// constraint is one-directional: UI sits at the edge, never below.
	81	+const checkSorted = (input: SamaVerifyInput): SamaCheckResult => {
	82	+ const violations: SamaViolation[] = [];
	83	+ let examined = 0;
	84	+ for (const path of input.srcPaths) {
	85	+ if (!isSamaFile(path)) continue;
	86	+ examined++;
	87	+ const m = SAMA_PREFIX.exec(path);
	88	+ const prefix = m?.[1] ?? "";
	89	+ // Skip c2* (handlers, allowed to depend on anything) and c5*+ (UI,
	90	+ // its outbound deps are governed by other rules, not this one).
	91	+ if (!/^[13]/.test(prefix)) continue;
	92	+ const content = input.contents.get(path);
	93	+ if (!content) continue;
	94	+ for (const rawImport of collectRelativeImports(content)) {
	95	+ const target = importTargetFilename(rawImport);
	96	+ const targetMatch = SAMA_PREFIX.exec(target);
	97	+ const targetPrefix = targetMatch?.[1] ?? "";
	98	+ if (!targetPrefix) continue;
	99	+ if (/^[59]/.test(targetPrefix)) {
	100	+ violations.push({
	101	+ file: path,
	102	+ detail: `imports \`${target}\` (UI layer c${targetPrefix}_) from a non-UI/non-handler file (c${prefix}_) — UI sits at the edge, foundation/data/logic must not depend on it`,
	103	+ });
	104	+ }
	105	+ }
	106	+ }
	107	+ return {
	108	+ letter: "S",
	109	+ property: "Sorted",
	110	+ passed: violations.length === 0,
	111	+ examined,
	112	+ violations,
	113	+ note: examined === 0
	114	+ ? "no cXX_*.ts files found in the project — the convention isn't applied here"
	115	+ : undefined,
	116	+ };
	117	+};
	118	+
	119	+// A — Architecture. Each prefix is a known layer; flag unknown prefixes.
	120	+const KNOWN_LAYERS = new Set(["11", "13", "14", "21", "31", "32", "51"]);
	121	+const checkArchitecture = (input: SamaVerifyInput): SamaCheckResult => {
	122	+ const violations: SamaViolation[] = [];
	123	+ let examined = 0;
	124	+ for (const path of input.srcPaths) {
	125	+ if (!isSamaFile(path)) continue;
	126	+ examined++;
	127	+ const m = SAMA_PREFIX.exec(path);
	128	+ const prefix = m?.[1] ?? "";
	129	+ if (!KNOWN_LAYERS.has(prefix)) {
	130	+ violations.push({
	131	+ file: path,
	132	+ detail: `unknown layer prefix \`c${prefix}_\` (known: c11, c13, c14, c21, c31, c32, c51)`,
	133	+ });
	134	+ }
	135	+ }
	136	+ return {
	137	+ letter: "A",
	138	+ property: "Architecture",
	139	+ passed: violations.length === 0,
	140	+ examined,
	141	+ violations,
	142	+ };
	143	+};
	144	+
	145	+// M — Modeled. Tests live next to source. Every cXX_<name>.ts (non-data)
	146	+// should have a sibling cXX_<name>.test.ts. Pure data files (registries
	147	+// like c31_blog.ts that are just an exported array) often legitimately
	148	+// have no behaviour to test, so we soften this check by requiring a
	149	+// sibling for c32_*.ts (logic) at minimum, and reporting a list of c31
	150	+// files without siblings as informational rather than hard violations.
	151	+const checkModeled = (input: SamaVerifyInput): SamaCheckResult => {
	152	+ const violations: SamaViolation[] = [];
	153	+ const informational: SamaViolation[] = [];
	154	+ let examined = 0;
	155	+ const present = new Set(input.srcPaths);
	156	+ for (const path of input.srcPaths) {
	157	+ if (!isSamaFile(path) || isTestFile(path)) continue;
	158	+ examined++;
	159	+ const sibling = path.replace(/\.ts$/, ".test.ts");
	160	+ if (present.has(sibling)) continue;
	161	+ const layer = layerOf(path);
	162	+ if (layer === 32) {
	163	+ violations.push({ file: path, detail: `no sibling test file at \`${sibling}\`` });
	164	+ } else if (layer === 31) {
	165	+ informational.push({ file: path, detail: `no sibling test (often fine for pure data registries; flag if logic accumulates)` });
	166	+ }
	167	+ }
	168	+ const passed = violations.length === 0;
	169	+ const note = informational.length > 0
	170	+ ? `${informational.length} c31_* file${informational.length === 1 ? "" : "s"} without a sibling test — usually fine for pure-data registries, flag if logic accumulates: ${informational.map((v) => v.file).join(", ")}`
	171	+ : undefined;
	172	+ return {
	173	+ letter: "M",
	174	+ property: "Modeled",
	175	+ passed,
	176	+ examined,
	177	+ violations,
	178	+ note,
	179	+ };
	180	+};
	181	+
	182	+// A — Atomic. ~700-line split rule. Flag any cXX_*.ts over 700 lines.
	183	+// Also flag placeholder tests (zero expect() calls in test body) as
	184	+// part of the same pass — they're a structural violation of the
	185	+// testing surface that Atomic owns.
	186	+const findPlaceholderTestsLite = (file: string, content: string): SamaViolation[] => {
	187	+ const out: SamaViolation[] = [];
	188	+ const re = /\b(test|it)\s*\(\s*(["'`])((?:\\.|(?!\2).)*)\2\s*,\s*(?:async\s+)?(?:\([^)]*\)|[^=()]*?)\s*=>\s*\{/g;
	189	+ let m: RegExpExecArray | null;
	190	+ while ((m = re.exec(content)) !== null) {
	191	+ const name = m[3] ?? "";
	192	+ const startBrace = re.lastIndex - 1;
	193	+ let depth = 1;
	194	+ let i = startBrace + 1;
	195	+ let inString: string | null = null;
	196	+ while (i < content.length && depth > 0) {
	197	+ const c = content[i];
	198	+ if (inString !== null) {
	199	+ if (c === "\\") { i += 2; continue; }
	200	+ if (c === inString) inString = null;
	201	+ } else {
	202	+ if (c === '"' || c === "'" || c === "`") inString = c;
	203	+ else if (c === "/" && content[i + 1] === "/") {
	204	+ while (i < content.length && content[i] !== "\n") i++;
	205	+ continue;
	206	+ } else if (c === "/" && content[i + 1] === "*") {
	207	+ i += 2;
	208	+ while (i < content.length - 1 && !(content[i] === "*" && content[i + 1] === "/")) i++;
	209	+ i += 2;
	210	+ continue;
	211	+ } else if (c === "{") depth++;
	212	+ else if (c === "}") depth--;
	213	+ }
	214	+ i++;
	215	+ }
	216	+ const body = content.slice(startBrace + 1, i - 1);
	217	+ const expectCount = (body.match(/\bexpect\s*\(/g) ?? []).length;
	218	+ if (expectCount === 0) {
	219	+ out.push({ file, detail: `placeholder test \`${name}\` — zero \`expect()\` calls` });
	220	+ }
	221	+ }
	222	+ return out;
	223	+};
	224	+
	225	+const checkAtomic = (input: SamaVerifyInput): SamaCheckResult => {
	226	+ const violations: SamaViolation[] = [];
	227	+ let examined = 0;
	228	+ for (const path of input.srcPaths) {
	229	+ if (!isSamaFile(path)) continue;
	230	+ examined++;
	231	+ const content = input.contents.get(path);
	232	+ if (!content) continue;
	233	+ const lineCount = content.split("\n").length;
	234	+ if (lineCount > 700) {
	235	+ violations.push({
	236	+ file: path,
	237	+ detail: `${lineCount} lines (over the 700-line split threshold — split per UI/data domain)`,
	238	+ });
	239	+ }
	240	+ if (isTestFile(path)) {
	241	+ violations.push(...findPlaceholderTestsLite(path, content));
	242	+ }
	243	+ }
	244	+ return {
	245	+ letter: "A",
	246	+ property: "Atomic",
	247	+ passed: violations.length === 0,
	248	+ examined,
	249	+ violations,
	250	+ };
	251	+};
	252	+
	253	+export const verifySama = (input: SamaVerifyInput): SamaReport => {
	254	+ const samaPaths = input.srcPaths.filter(isSamaFile);
	255	+ const testPaths = samaPaths.filter(isTestFile);
	256	+ const checks = [
	257	+ checkSorted(input),
	258	+ checkArchitecture(input),
	259	+ checkModeled(input),
	260	+ checkAtomic(input),
	261	+ ];
	262	+ return {
	263	+ repoSlug: `${input.repoOwner}/${input.repoName}`,
	264	+ defaultBranch: input.defaultBranch,
	265	+ totalSrcFiles: input.srcPaths.length,
	266	+ samaFiles: samaPaths.length,
	267	+ testFiles: testPaths.length,
	268	+ checks,
	269	+ overallPassed: checks.every((c) => c.passed),
	270	+ generatedAt: Date.now(),
	271	+ };
	272	+};
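
The Sorted check above comes down to two steps: pull relative import targets out of the source text, then compare layer prefixes. A minimal standalone sketch of that idea follows. `relativeImports`, `layerPrefix`, and `uiImportsFrom` are hypothetical names, and the regexes are simplified from the ones in the diff (no whitespace allowed after the opening quote).

```typescript
// Two-digit layer prefix such as "c14_" or "c51_".
const LAYER_PREFIX = /^c(\d{2})_/;

// Collect "./x.ts" targets from static `from "./x.ts"` and dynamic `import("./x.ts")`.
const relativeImports = (source: string): string[] => {
  const out: string[] = [];
  const patterns = [
    /\bfrom\s+["'](\.\/[^"']+)["']/g,
    /\bimport\s*\(\s*["'](\.\/[^"']+)["']/g,
  ];
  for (const re of patterns) {
    let m: RegExpExecArray | null;
    while ((m = re.exec(source)) !== null) {
      if (m[1]) out.push(m[1].replace(/^\.\//, ""));
    }
  }
  return out;
};

const layerPrefix = (file: string): string => LAYER_PREFIX.exec(file)?.[1] ?? "";

// Only c1* and c3* files are checked; a target counts as UI when its prefix starts with 5 or 9.
const uiImportsFrom = (file: string, source: string): string[] => {
  if (!/^[13]/.test(layerPrefix(file))) return [];
  return relativeImports(source).filter((t) => /^[59]/.test(layerPrefix(t)));
};
```

Because the scan runs on raw text rather than a parsed AST, an import-shaped string inside a comment or string literal is also collected. That keeps the verifier dependency-free at the cost of occasional false positives.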

modified src/c51_render_reports.ts +20 −1

@@ -38,6 +38,9 @@ export interface TestsOverviewContext {
38	38	// When the runner sliver isn't wired (live mode, today), pass a
39	39	// placeholder note instead of the snapshot+stability sections.
40	40	unavailableNote?: string;
	41	+ // Placeholder-test detection: tests with zero `expect()` calls in
	42	+ // their body. Surfaces the failure mode from r/ClaudeCode 1qix264.
	43	+ placeholderTests?: { name: string; file: string; reason: string }[];
41	44	}
42	45
43	46	const trendArrow = (delta: number): { glyph: string; cls: string } =>
@@ -275,6 +278,20 @@ ${ctx.bannerHtml}
275	278	const failing = ctx.snapshots.reduce((s, r) => s + r.failing, 0);
276	279	const snapshots = ctx.snapshots.map(snapshotBlock).join("\n");
277	280	const stabRows = ctx.stability.map(stabilityRow).join("\n");
	281	+ const placeholders = ctx.placeholderTests ?? [];
	282	+ const placeholderBlock = placeholders.length === 0
	283	+ ? `## placeholder tests
	284	+
	285	+> No placeholder tests detected at this snapshot. A placeholder is a test whose body contains zero \`expect()\` calls — covered in [the corpus post](/blog/agentic-coding-corpus-three-patterns) as the failure mode from r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass rate"). Detection runs on every deploy.
	286	+`
	287	+ : `## placeholder tests · ⚠ ${placeholders.length} flagged
	288	+
	289	+> A placeholder test is one whose body contains zero \`expect()\` calls — empty body, comment-only stub, or string-literal body. Covered in [the corpus post](/blog/agentic-coding-corpus-three-patterns) as the failure mode from r/ClaudeCode 1qix264. The judge would refuse a merge that includes any of these.
	290	+
	291	+| test | file | reason |
	292	+|---|---|---|
	293	+${placeholders.map((p) => `| ${escape(p.name)} | \`${escape(p.file)}\` | ${escape(p.reason)} |`).join("\n")}
	294	+`;
278	295	return `# tests overview
279	296
280	297	${ctx.bannerHtml}
@@ -287,7 +304,9 @@ ${ctx.bannerHtml}
287	304	${snapshots}
288	305	</div>
289	306
290		-Total: ${total.toLocaleString()} tests · <span class="green">${passing.toLocaleString()} passing</span> · <span class="${failing > 0 ? "red" : "muted"}">${failing.toLocaleString()} failing</span>.
	307	+Total: ${total.toLocaleString()} tests · <span class="green">${passing.toLocaleString()} passing</span> · <span class="${failing > 0 ? "red" : "muted"}">${failing.toLocaleString()} failing</span>${placeholders.length > 0 ? ` · <span class="red">${placeholders.length} placeholder ⚠</span>` : ""}.
	308	+
	309	+${placeholderBlock}
291	310
292	311	## test stability · ${ctx.period}
293	312
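
The placeholder definition rendered above (zero `expect()` calls: empty body, comment-only stub, or string-literal body) can be illustrated with a small classifier over an already-extracted test body. This is a sketch under stated assumptions: `placeholderReason` and its reason strings are hypothetical, and the actual `reason` values written into the bundle by the snapshot script are not shown in this diff.

```typescript
// Hypothetical classifier: returns a reason when the body has no expect() call,
// or null when at least one is present.
const placeholderReason = (body: string): string | null => {
  // Like the verifier in this commit, the test runs on the raw body, so a
  // commented-out `expect(` still counts as an assertion here.
  if (/\bexpect\s*\(/.test(body)) return null;
  if (body.trim() === "") return "empty body";
  // Naive comment stripping, good enough for a sketch; it would also eat
  // `//` sequences that appear inside string literals.
  const withoutComments = body
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/\/\/.*$/gm, "");
  if (withoutComments.trim() === "") return "comment-only body";
  return "no expect() call";
};

// Usage: one body per failure shape named in the report text.
for (const body of ["", "\n  // TODO\n", ' "todo"; ', " expect(1).toBe(1); "]) {
  console.log(JSON.stringify(body), "->", placeholderReason(body));
}
```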
