2868f48b6d44ae47282d3d2dfb0f4d7e161b3ecd diff --git a/content/blog/agentic-coding-corpus-three-patterns.md b/content/blog/agentic-coding-corpus-three-patterns.md index e3db387285d54a2a259f9280c92378b8a08e95ca..b7a0159bf8f33e3a224217100a9abab730e9c0c3 100644 --- a/content/blog/agentic-coding-corpus-three-patterns.md +++ b/content/blog/agentic-coding-corpus-three-patterns.md @@ -184,4 +184,4 @@ One thread is an audit. Ten threads are a pattern. The corpus shows three things Plus the original one-thread audit: [ThePaSch — Claude Code has big problems and the post-mortem is not enough](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/) (325↑, 200+ comments). Treat this row as the eleventh entry. -[Read the previous post →](/blog/claude-code-harness-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [back to the blog](/blog) +[Read the previous post →](/blog/claude-code-harness-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify your repo →](/sama/verify) · [back to the blog](/blog) diff --git a/content/blog/claude-code-harness-postmortem.md b/content/blog/claude-code-harness-postmortem.md index 0f98ff5c173f8cbf97ab733718bfd97315108275..ef57669c8ba8e3764039e8e78f7331c6ab1487f1 100644 --- a/content/blog/claude-code-harness-postmortem.md +++ b/content/blog/claude-code-harness-postmortem.md @@ -123,4 +123,4 @@ None of this is in the user's hands. But the takeaway underneath is: while Anthr The harness is loud. The diff doesn't have to be. TDD's iron law and SAMA's three falsifiable rules survive every reminder-bombardment, gag-order, importance-inflation, and contradictory-instruction the OP catalogues, because they are enforced *outside the agent's context window* — in the commit log and the file tree. They do not fix Anthropic's prompt sprawl, and they do not refund the tokens. 
What they do is make the work the agent ships externally verifiable, regardless of what the agent was told on the way there. -[Read the original Reddit thread →](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/claude_code_has_big_problems_and_the_postmortem/) · [Anthropic's April-23 postmortem →](https://www.anthropic.com/engineering/april-23-postmortem) · [the four SAMA disciplines →](/sama) · [back to the blog](/blog) +[Read the original Reddit thread →](https://www.reddit.com/r/ClaudeAI/comments/1strcoa/claude_code_has_big_problems_and_the_postmortem/) · [Anthropic's April-23 postmortem →](https://www.anthropic.com/engineering/april-23-postmortem) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify your repo →](/sama/verify) · [back to the blog](/blog) diff --git a/content/blog/from-rules-to-checks.md b/content/blog/from-rules-to-checks.md new file mode 100644 index 0000000000000000000000000000000000000000..5f8fec660e6d7e85641e692a8cfc28104e75a7d9 --- /dev/null +++ b/content/blog/from-rules-to-checks.md @@ -0,0 +1,92 @@ +# From rules to checks: shipping what the corpus post promised + +> The [corpus post](/blog/agentic-coding-corpus-three-patterns) closed with a promise: *"three of the ten threads describe failures only an actual test run can catch"* — and named which checks would have caught which failure modes (some today, some with a small extension, some only with a sandbox runner). This post is the receipt. Three of those checks now ship: **placeholder-test detection** (the *"one-evening sliver"*), **historical-commit testing via git worktree** (the *"next slice on the roadmap"*), and **`/sama/verify`** (mechanical layer grep + sibling-test + line-count + placeholder check, runnable against any public repo). + +## why this post is short + +The two previous posts made an argument. This one documents an outcome. 
If the argument was right, the outcome should be small and obvious: the rules become checks, the checks become routes, the routes catch the failure modes the corpus catalogued. Here's what that looks like in practice. + +## 1 · placeholder-test detection (caught today) + +Failure mode: r/ClaudeCode `1qix264`, *"Claude wrote 90 placeholder tests and reported 100% pass rate"*. The corpus post said: + +> *"empty assertion bodies — zero `expect()` calls, string-literal bodies, single-line `// TODO` stubs — are AST-checkable. The test bundle already lives in `content/git-history/syntaxai__tdd.md__tests.json`; an empty-body check is a one-evening sliver."* + +The check is a regex-based brace walker that extracts every `test(...)` and `it(...)` body, counts `expect(` occurrences, and flags zero-count bodies. It runs at deploy time as part of the existing `snapshot-tests.ts` script and writes its findings to the bundle as `placeholderTests: { name, file, reason }[]`. The runtime renderer surfaces them on [`/reports/live/tests`](/reports/live/tests): + +- Zero placeholders → a small "no placeholder tests detected at this snapshot" note explaining what the check looks for. +- One or more → a flagged section with a per-test table: name, file, reason (*"no expect() calls in test body"*, *"empty test body"*, *"comment-only stub"*). + +It catches the most common shape of `1qix264` directly (`expect()` count is zero). It misses less common shapes: custom assertion helpers that don't go through `expect`, and tests whose callback is not an arrow function, since the regex only matches `=> {`. The regex targets the failures seen in practice, not every imaginable one. + +## 2 · historical-commit testing (the sandbox runner sliver) + +Failure mode: r/ClaudeCode `1rug14a`, *"Claude wrote Playwright tests that secretly patched the app at runtime"*. This is the failure that the previous reporting layer couldn't catch — the diff looks fine, the test passes in the agent's terminal, the test passes in the deploy-time bundle too if the bundle only ever ran HEAD. 
Catching this needs the same test to run *somewhere it's never run before*, against the actual code at that SHA. + +The new mode: `SAMA_HISTORY_DEPTH=N` in the deploy environment makes the snapshot script also test the last *N* commits that aren't already in the bundle. Mechanically: + +```bash +# shell equivalent of what scripts/p620/snapshot-tests.ts runs per commit +git worktree add --detach /tmp/tdd-md-wt- +ln -s "$REPO_ROOT/node_modules" "$WORKTREE/node_modules" +bun test --reporter=junit --reporter-outfile=/tmp/junit-.xml +git worktree remove --force /tmp/tdd-md-wt- +``` + +Each historical run produces the same `TestRunRecord` shape as a HEAD run, gets appended to the bundle keyed by SHA, and feeds the existing stability table. Two consequences: + +- **Stability data builds 10× faster.** A first `SAMA_HISTORY_DEPTH=10` deploy backfills ten runs in one go instead of waiting ten deploys. +- **Runtime-patching becomes detectable in principle.** A test that passed in the agent's session AND in the original deploy run, but fails when re-run from a clean worktree at the same SHA, is the smoking-gun shape of `1rug14a`. We're not yet wired to flag the discrepancy as a separate failure mode (that's the next sliver), but the data to compare *is now in the bundle*. + +The default is still `SAMA_HISTORY_DEPTH=0` (HEAD-only). Opt-in keeps deploy time bounded; flipping the default to `5` or `10` is a one-line change once we want it on by default. + +## 3 · `/sama/verify` (mechanical check for any public repo) + +The corpus post argued: *"don't write a CLAUDE.md instruction the harness can overrule. Write a structural check the harness doesn't get to know about."* That argument is hollow if the structural checks aren't actually runnable. The new route closes the loop: + +**[/sama/verify](/sama/verify)** — paste a public GitHub repo, get a four-discipline report. The mechanics: + +1. Two GitHub API calls: a repo lookup for the default branch, then `git/trees/?recursive=1` to resolve the file list. +2. 
Every `src/cXX_*.ts` file is fetched via `raw.githubusercontent.com` (no API rate limit, no token). +3. Pure logic in [`c32_sama_verify.ts`](https://github.com/syntaxai/tdd.md/blob/main/src/c32_sama_verify.ts) runs the four checks: + - **S — Sorted**: every relative `from "./..."` import in a `cXX_*.ts` is parsed; flag if the target's prefix is higher than the source's. + - **A — Architecture**: every `cXX_` prefix is matched against the known set (`c11`, `c13`, `c14`, `c21`, `c31`, `c32`, `c51`); unknown ones flagged. + - **M — Modeled**: every `cXX_.ts` (non-test) is checked for a sibling `cXX_.test.ts`. Hard-fails for `c32_*` (logic); informational for `c31_*` (often pure-data registries). + - **A — Atomic**: line count over 700 → flagged. Test files → run the same placeholder check from sliver #1. + +Output: pass/fail per discipline, with up to 20 violations per check listed (`file` + `detail`). Cached for an hour per repo. + +Try it on this site: [`/sama/verify?repo=syntaxai/tdd.md`](/sama/verify?repo=syntaxai/tdd.md). And here's the dogfood result, honestly: + +| check | tdd.md self-verify result | +|---|---| +| S — Sorted | ✓ pass — no UI dependency leaks into foundation/data/logic | +| A — Architecture | ✓ pass — every prefix is in the known set | +| M — Modeled | ✗ 5 violations — `c32_judge.ts`, `c32_session.ts`, `c32_real_reports.ts`, `c32_real_tests.ts`, `c32_sama_verify.ts` lack sibling test files | +| A — Atomic | ✗ 1 violation — `c21_app.ts` is 1066 lines (over the 700-line split threshold) | + +Two of four fail, and they're real. Five `c32_*` logic files — including `c32_sama_verify.ts`, the file that *runs the verification* — don't have sibling tests yet, and the route dispatcher has grown past the atomic threshold and now needs a per-domain split. Both findings were caught by the tool we just shipped, against the codebase we just shipped it from. 
That's the dogfood story: not "everything passes" but "the tool catches real things in real code, including its own". Both are on the very next slice of the roadmap. + +## what this changes about the case + +The argument has now happened in three layers: + +1. **The harness postmortem post** said: structural rules survive harness chaos because they're enforced outside the agent's context window. +2. **The corpus post** said: ten threads prove the failure modes are systematic, here are the rules that catch each, here's what we catch and what we don't yet. +3. **This post** says: the rules are now checks, the checks are now URLs you can hit, and you can verify the case against any public repo *including this one*. + +The leftover work — flagging a runtime-patching discrepancy as a distinct failure mode, hidden-test verification on real-project commits, AST-level placeholder detection beyond the regex — is in the open. It's smaller than what shipped this week. + +## tl;dr + +The two previous posts made a case from text. This one ships the checks the case promised: + +| sliver | route | catches | status | +|---|---|---|---| +| placeholder detection | [/reports/live/tests](/reports/live/tests) | r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass") | live | +| historical-commit testing | snapshot script with `SAMA_HISTORY_DEPTH=N` | runtime-patching SHAs ([groundwork for 1rug14a](/blog/agentic-coding-corpus-three-patterns)) | opt-in, default 0 | +| `/sama/verify` | [/sama/verify](/sama/verify) | layer violations, missing sibling tests, oversized files, placeholder tests, in any public repo | live | + +If the discipline is real, you should be able to point it at a repo and have it report findings. Now you can. 
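
The *Modeled* and *Atomic* checks from section 3 can be sketched in a few lines. This is a minimal illustration only, not the code in `c32_sama_verify.ts`: the names `Violation`, `missingSiblingTests`, `oversizedFiles` and `ATOMIC_LINE_LIMIT` are hypothetical, the sketch covers only the hard-fail `c32_*` case, and line counting by newline split is an assumption.

```typescript
// Hypothetical sketch of two of the four checks; names are illustrative.
interface Violation {
  file: string;
  detail: string;
}

// Assumed threshold, taken from the 700-line figure quoted in the post.
const ATOMIC_LINE_LIMIT = 700;

// Modeled: every non-test c32_* file should have a sibling .test.ts file.
const missingSiblingTests = (paths: string[]): Violation[] => {
  const present = new Set(paths);
  return paths
    .filter((p) => /^c32_.*\.ts$/.test(p) && !p.endsWith(".test.ts"))
    .filter((p) => !present.has(p.replace(/\.ts$/, ".test.ts")))
    .map((p) => ({ file: p, detail: "no sibling test file" }));
};

// Atomic: flag files whose (approximate) line count exceeds the limit.
const oversizedFiles = (contents: Map<string, string>): Violation[] => {
  const out: Violation[] = [];
  contents.forEach((source, file) => {
    const lines = source.split("\n").length;
    if (lines > ATOMIC_LINE_LIMIT) {
      out.push({ file, detail: `${lines} lines (over ${ATOMIC_LINE_LIMIT})` });
    }
  });
  return out;
};
```

Both functions take plain paths and strings, so the same logic could run against a local `src/` listing or against content fetched over HTTP.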
+ +[← back to the blog](/blog) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify a repo →](/sama/verify) diff --git a/content/sama/skill.md b/content/sama/skill.md index 3d8bfedcd436cf78f34669033f082f9dccdd3ffd..542fe8288efa5d58496a5bb90c2dc3c12554e275 100644 --- a/content/sama/skill.md +++ b/content/sama/skill.md @@ -30,16 +30,18 @@ Thinking "this one helper doesn't need a prefix"? Stop. That's how the rule erod ## The Iron Rule ``` -LOWER LAYERS NEVER IMPORT FROM HIGHER LAYERS +UI SITS AT THE EDGE — FOUNDATION, DATA AND LOGIC LAYERS NEVER DEPEND ON UI ``` +Foundation/data/logic (`c1*`, `c3*`) must never import UI (`c5*+`). Handlers (`c21`) are the orchestration layer and may compose UI; UI itself may read models for the data it renders. + Verify with one grep: ```bash -grep -rE 'from "\./c[5-9]' src/c1*.ts src/c2*.ts src/c3*.ts +grep -rE 'from "\./c[5-9]' src/c1*.ts src/c3*.ts ``` -Empty output = rule holds. Any output = a lower layer reaches into a higher one. Either move the function or rename the file. Do not "fix" the violation by deleting the import without understanding what broke. +Empty output = rule holds. Any output = a UI dependency has leaked into foundation/data/logic. Move the function or rename the file. Do not "fix" the violation by deleting the import without understanding what broke. 
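
For repos where shelling out to `grep` is inconvenient, the same rule can be expressed over in-memory file contents. The sketch below is an assumption-laden equivalent of the grep above, not part of the skill: `findUiLeaks` and `UiLeak` are hypothetical names, and the file-name filter approximates the `src/c1*.ts src/c3*.ts` globs with a two-digit prefix pattern.

```typescript
// Hypothetical programmatic equivalent of the grep above.
// `files` maps a file name inside src/ (e.g. "c32_judge.ts") to its source text.
interface UiLeak {
  file: string;
  importTarget: string;
}

const findUiLeaks = (files: Map<string, string>): UiLeak[] => {
  const leaks: UiLeak[] = [];
  files.forEach((source, file) => {
    // Only c1* and c3* files are checked; c2* handlers may compose UI,
    // so they are skipped on purpose.
    if (!/^c[13]\d_/.test(file)) return;
    // Same pattern as the grep: a relative import whose target starts with c5..c9.
    const importRe = /from "\.\/(c[5-9][^"]*)"/g;
    let m: RegExpExecArray | null;
    while ((m = importRe.exec(source)) !== null) {
      leaks.push({ file, importTarget: m[1] ?? "" });
    }
  });
  return leaks;
};
```

An empty array corresponds to empty grep output: the rule holds. Any entry names the offending file and the UI module it reached into.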
## The Four Letters @@ -184,3 +186,6 @@ Or, if the live and demo body builders are mostly the same shape parameterised b - The four disciplines with examples: https://tdd.md/sama - Why SAMA compounds with TDD and token-discipline: https://tdd.md/blog/three-constraints-agentic-coding +- The Reddit harness postmortem this skill is a response to: https://tdd.md/blog/claude-code-harness-postmortem +- Ten more threads, three patterns, mitigation tables: https://tdd.md/blog/agentic-coding-corpus-three-patterns +- Mechanically verify any public repo against these rules: https://tdd.md/sama/verify diff --git a/content/sama/sorted.md b/content/sama/sorted.md index 2fb58ad10fd627d56c79c45ad68dbf11fa204284..f6bdc362842c148f4e451b0e89149ab6340332b2 100644 --- a/content/sama/sorted.md +++ b/content/sama/sorted.md @@ -1,6 +1,6 @@ # S — Sorted -> **Rule:** alphabetical sort = dependency direction. Lower-numbered layers never import from higher-numbered ones. +> **Rule:** alphabetical sort = dependency direction. UI sits at the edge — foundation, data and logic layers (`c1*`, `c3*`) never depend on UI (`c5*+`). Handlers (`c21`) are the orchestration layer and may import anything. The first property of SAMA. Run `ls src/` and you have the architecture diagram. There is no separate "where does the data flow?" document because the file system answers it. @@ -16,10 +16,10 @@ Two reasons: Run this from the repo root: ```bash -grep -rE 'from "\./c[5-9]' src/c1*.ts src/c2*.ts src/c3*.ts +grep -rE 'from "\./c[5-9]' src/c1*.ts src/c3*.ts ``` -If the output is empty, the rule holds. If anything appears, you have a higher layer leaking into a lower one — fix the import or move the file. +If the output is empty, the rule holds. If anything appears, you have a UI dependency leaking into a foundation, data or logic layer — fix the import or move the file. (Note: `c2*` files are intentionally excluded — handlers compose UI calls, so `c21` → `c51` is the normal pattern.) 
This is the single load-bearing test for the *Sorted* property. Wire it into CI and forget about it. diff --git a/scripts/p620/snapshot-tests.ts b/scripts/p620/snapshot-tests.ts index 099933d5f6ab31beaaf9545cd4ec2ac9cb5e6727..588f22994aa53b2ace497861d0ad52164d8f5327 100644 --- a/scripts/p620/snapshot-tests.ts +++ b/scripts/p620/snapshot-tests.ts @@ -1,18 +1,21 @@ #!/usr/bin/env bun -// Run `bun test` on the current HEAD and append the result to a -// per-repo bundle alongside the git-history snapshot. The container -// reads this bundle at runtime to render /reports/live/tests for the -// (private) syntaxai/tdd.md repo without needing a runtime sandbox. +// Run `bun test` on the current HEAD (and optionally the last N +// historical commits) and append the results to a per-repo bundle +// alongside the git-history snapshot. The container reads this bundle +// at runtime to render /reports/live/tests for the (private) +// syntaxai/tdd.md repo without needing a runtime sandbox. // -// Strategy: HEAD-only per deploy. The bundle accumulates one run per -// deploy (capped at 50), so stability data builds organically over -// time. No git-worktree gymnastics, no per-commit bun-install. +// HEAD mode (default): one new run per deploy, fast, no worktree. +// History mode (SAMA_HISTORY_DEPTH=N): also runs the last N commits +// that aren't already in the bundle, via git worktree with +// node_modules symlinked from the parent checkout (no per-commit +// bun install). Slower (~5-10s/commit) but builds real stability data +// instead of waiting for it to accumulate organically. // // Output: content/git-history/____tests.json // Schema: { owner, name, runs: TestRunRecord[] } — newest first. 
import { spawnSync } from "node:child_process"; -import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs"; +import { existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs"; import { resolve } from "node:path"; const REPO_ROOT = resolve(import.meta.dir, "..", ".."); @@ -20,6 +23,7 @@ const OWNER = "syntaxai"; const NAME = "tdd.md"; const MAX_RUNS = 50; const JUNIT_OUT = "/tmp/tdd-md-test-junit.xml"; +const HISTORY_DEPTH = parseInt(process.env.SAMA_HISTORY_DEPTH ?? "0", 10); const sh = (cmd: string, args: string[]): string => { const r = spawnSync(cmd, args, { cwd: REPO_ROOT, encoding: "utf8" }); @@ -81,6 +85,82 @@ const passing = tests.filter((t) => t.status === "pass").length; const failing = tests.length - passing; const totalDurationMs = tests.reduce((s, t) => s + t.durationMs, 0); +interface PlaceholderTest { + name: string; + file: string; + reason: string; +} + +// Placeholder detection. Catches the failure mode from r/ClaudeCode +// post 1qix264 ("90 placeholder tests, 100% pass rate"): tests with +// zero `expect(` calls in their body are flagged. Regex-based brace +// matching — full AST is overkill for the one structural property we +// care about. Limitations: misses tests that delegate to a custom +// assertion helper or pass through a subroutine. Acceptable for v1; +// the catch is the common failure shape, not every theoretical one. +const findPlaceholderTests = (testFile: string, content: string): PlaceholderTest[] => { + const out: PlaceholderTest[] = []; + const re = /\b(test|it)\s*\(\s*(["'`])((?:\\.|(?!\2).)*)\2\s*,\s*(?:async\s+)?(?:\([^)]*\)|[^=()]*?)\s*=>\s*\{/g; + let m: RegExpExecArray | null; + while ((m = re.exec(content)) !== null) { + const name = m[3] ?? 
""; + const startBrace = re.lastIndex - 1; + let depth = 1; + let i = startBrace + 1; + let inString: string | null = null; + while (i < content.length && depth > 0) { + const c = content[i]; + if (inString !== null) { + if (c === "\\") { i += 2; continue; } + if (c === inString) inString = null; + } else { + if (c === '"' || c === "'" || c === "`") inString = c; + else if (c === "/" && content[i + 1] === "/") { + // line comment + while (i < content.length && content[i] !== "\n") i++; + continue; + } + else if (c === "/" && content[i + 1] === "*") { + // block comment + i += 2; + while (i < content.length - 1 && !(content[i] === "*" && content[i + 1] === "/")) i++; + i += 2; + continue; + } + else if (c === "{") depth++; + else if (c === "}") depth--; + } + i++; + } + const body = content.slice(startBrace + 1, i - 1); + const expectCount = (body.match(/\bexpect\s*\(/g) ?? []).length; + if (expectCount === 0) { + const trimmedLen = body.replace(/\s+/g, "").length; + const reason = trimmedLen === 0 + ? "empty test body" + : trimmedLen < 20 && /^\s*\/\//.test(body.trim()) + ? 
"comment-only stub" + : "no expect() calls in test body"; + out.push({ name, file: testFile, reason }); + } + } + return out; +}; + +const detectPlaceholders = (testFiles: string[]): PlaceholderTest[] => { + const out: PlaceholderTest[] = []; + for (const f of testFiles) { + const abs = resolve(REPO_ROOT, f); + if (!existsSync(abs)) continue; + const content = readFileSync(abs, "utf8"); + out.push(...findPlaceholderTests(f, content)); + } + return out; +}; + +const uniqueTestFiles = Array.from(new Set(tests.map((t) => t.file).filter(Boolean))); +const placeholderTests = detectPlaceholders(uniqueTestFiles); + interface TestRunRecord { sha: string; branch: string; @@ -90,6 +170,7 @@ interface TestRunRecord { failing: number; durationMs: number; tests: TestRecord[]; + placeholderTests: PlaceholderTest[]; } interface TestBundle { @@ -121,9 +202,116 @@ if (bundle.runs.some((r) => r.sha === head)) { failing, durationMs: totalDurationMs, tests, + placeholderTests, }); bundle.runs = bundle.runs.slice(0, MAX_RUNS); mkdirSync(resolve(REPO_ROOT, "content", "git-history"), { recursive: true }); writeFileSync(bundlePath, JSON.stringify(bundle, null, 2)); - console.log(`✓ tests at ${head.slice(0, 7)} (${branch}): ${passing}/${tests.length} pass, ${failing} fail → bundle (${bundle.runs.length} runs total)`); + console.log(`✓ tests at ${head.slice(0, 7)} (${branch}): ${passing}/${tests.length} pass, ${failing} fail, ${placeholderTests.length} placeholder → bundle (${bundle.runs.length} runs total)`); + if (placeholderTests.length > 0) { + for (const p of placeholderTests) { + console.log(` ⚠ placeholder: ${p.file} > ${p.name} (${p.reason})`); + } + } +} + +// --------------------------------------------------------------------- +// Historical mode: run the test suite at each of the last N commits +// that aren't already in the bundle. Opt-in via SAMA_HISTORY_DEPTH. 
+// --------------------------------------------------------------------- + +const runHistoricalCommit = (sha: string): boolean => { + const wt = `/tmp/tdd-md-wt-${sha.slice(0, 12)}`; + const junit = `/tmp/tdd-md-test-junit-${sha.slice(0, 12)}.xml`; + // Cleanup any leftover from a previous failed run. + spawnSync("git", ["worktree", "remove", "--force", wt], { cwd: REPO_ROOT }); + rmSync(wt, { recursive: true, force: true }); + + const add = spawnSync("git", ["worktree", "add", "--detach", wt, sha], { + cwd: REPO_ROOT, + encoding: "utf8", + }); + if (add.status !== 0) { + console.log(` ✗ worktree add failed for ${sha.slice(0, 7)}: ${add.stderr.trim()}`); + return false; + } + + let added = false; + try { + // Symlink node_modules from the parent checkout. Works as long as + // bun.lock didn't change between commits — true for almost every + // commit on tdd.md. If it diverged, `bun test` will fail loudly + // and we just skip that SHA. + spawnSync("ln", ["-s", resolve(REPO_ROOT, "node_modules"), resolve(wt, "node_modules")]); + + const ranAt = Date.now(); + spawnSync("bun", ["test", "--reporter=junit", `--reporter-outfile=${junit}`], { + cwd: wt, + stdio: "ignore", + timeout: 120_000, + }); + if (!existsSync(junit)) { + console.log(` ✗ no junit output for ${sha.slice(0, 7)} — skipping`); + return false; + } + const histXml = readFileSync(junit, "utf8"); + const histTests = parseJunit(histXml); + if (histTests.length === 0) { + console.log(` ⚠ ${sha.slice(0, 7)} produced 0 tests — likely deps mismatch, skipping`); + return false; + } + const histPlaceholders: PlaceholderTest[] = []; + for (const f of Array.from(new Set(histTests.map((t) => t.file).filter(Boolean)))) { + const abs = resolve(wt, f); + if (!existsSync(abs)) continue; + histPlaceholders.push(...findPlaceholderTests(f, readFileSync(abs, "utf8"))); + } + const histPassing = histTests.filter((t) => t.status === "pass").length; + const histFailing = histTests.length - histPassing; + const histDur = 
histTests.reduce((s, t) => s + t.durationMs, 0); + const branchAtSha = sh("git", ["log", "-1", "--format=%D", sha]).split(",").map((s) => s.trim()).find((s) => s.startsWith("HEAD ->"))?.replace("HEAD -> ", "") ?? "(detached)"; + + bundle.runs.push({ + sha, + branch: branchAtSha, + ranAt, + total: histTests.length, + passing: histPassing, + failing: histFailing, + durationMs: histDur, + tests: histTests, + placeholderTests: histPlaceholders, + }); + added = true; + console.log(` ✓ ${sha.slice(0, 7)}: ${histPassing}/${histTests.length} pass, ${histFailing} fail, ${histPlaceholders.length} placeholder`); + } finally { + spawnSync("git", ["worktree", "remove", "--force", wt], { cwd: REPO_ROOT }); + rmSync(wt, { recursive: true, force: true }); + rmSync(junit, { force: true }); + } + return added; +}; + +if (HISTORY_DEPTH > 0) { + console.log(`→ historical mode: walking last ${HISTORY_DEPTH} commits`); + const recent = sh("git", ["log", `--max-count=${HISTORY_DEPTH + 1}`, "--pretty=format:%H"]).split("\n").slice(1); // skip HEAD + let addedCount = 0; + for (const sha of recent) { + if (!sha) continue; + if (bundle.runs.some((r) => r.sha === sha)) { + console.log(` • ${sha.slice(0, 7)} already in bundle, skipping`); + continue; + } + if (runHistoricalCommit(sha)) addedCount++; + } + if (addedCount > 0) { + // Re-sort newest-first by commit author date before re-writing. + // The new historical entries recorded Date.now() as ranAt at the + // moment they ran, so ranAt does not reflect commit chronology. + const authorTsBySha = (s: string) => Date.parse(sh("git", ["log", "-1", "--format=%aI", s])); + bundle.runs.sort((a, b) => authorTsBySha(b.sha) - authorTsBySha(a.sha)); + bundle.runs = bundle.runs.slice(0, MAX_RUNS); + writeFileSync(bundlePath, JSON.stringify(bundle, null, 2)); + console.log(`✓ added ${addedCount} historical run${addedCount === 1 ? 
"" : "s"} → bundle (${bundle.runs.length} runs total)`); + } } diff --git a/src/c14_github.ts b/src/c14_github.ts index 9ebc43f3683fba8a7009c27a277974f0fda86552..bb3192826de952b2b06a95c9954cf9146ccf88be 100644 --- a/src/c14_github.ts +++ b/src/c14_github.ts @@ -222,6 +222,12 @@ export interface TestRecord { durationMs: number; } +export interface PlaceholderTest { + name: string; + file: string; + reason: string; +} + export interface TestRunRecord { sha: string; branch: string; @@ -231,6 +237,9 @@ export interface TestRunRecord { failing: number; durationMs: number; tests: TestRecord[]; + // Optional for backwards-compat with bundles written before the + // placeholder-detection sliver shipped. Treat missing as []. + placeholderTests?: PlaceholderTest[]; } export interface TestBundle { @@ -239,6 +248,95 @@ export interface TestBundle { runs: TestRunRecord[]; } +// --------------------------------------------------------------------- +// SAMA verify support: tree listing via API (one call) + raw-content +// reads for every cXX_*.ts file (raw.githubusercontent.com, no rate +// limit). Used by /sama/verify to inspect any public repo without a +// token. Cached per (owner, name) for an hour. 
+// --------------------------------------------------------------------- + +export interface RepoTreeEntry { + path: string; + type: "blob" | "tree" | "commit"; + size?: number; +} + +interface RepoTree { + defaultBranch: string; + entries: RepoTreeEntry[]; + truncated: boolean; +} + +const TREE_TTL_MS = 60 * 60 * 1000; +const treeCache = new Map<string, { fetchedAt: number; tree: RepoTree }>(); + +export const fetchRepoTree = async ( + repoOwner: string, + repoName: string, +): Promise<RepoTree> => { + const key = `${repoOwner}/${repoName}`; + const cached = treeCache.get(key); + if (cached && Date.now() - cached.fetchedAt < TREE_TTL_MS) return cached.tree; + + const repoRes = await fetch(`https://api.github.com/repos/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}`, { + headers: { Accept: "application/vnd.github+json", "User-Agent": "tdd.md" }, + }); + if (!repoRes.ok) { + if (cached) return cached.tree; + throw new Error(`GitHub repo lookup failed for ${repoOwner}/${repoName}: HTTP ${repoRes.status}`); + } + const repoMeta = (await repoRes.json()) as { default_branch?: string }; + const defaultBranch = repoMeta.default_branch ?? "main"; + + const treeRes = await fetch( + `https://api.github.com/repos/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}/git/trees/${encodeURIComponent(defaultBranch)}?recursive=1`, + { headers: { Accept: "application/vnd.github+json", "User-Agent": "tdd.md" } }, + ); + if (!treeRes.ok) { + if (cached) return cached.tree; + throw new Error(`GitHub tree fetch failed for ${repoOwner}/${repoName}: HTTP ${treeRes.status}`); + } + const data = (await treeRes.json()) as { + tree?: Array<{ path: string; type: string; size?: number }>; + truncated?: boolean; + }; + const entries = (data.tree ?? []).map((e) => ({ + path: e.path, + type: e.type as RepoTreeEntry["type"], + size: e.size, + })); + const tree: RepoTree = { defaultBranch, entries, truncated: data.truncated ?? 
false }; + treeCache.set(key, { fetchedAt: Date.now(), tree }); + return tree; +}; + +// Raw content fetch via raw.githubusercontent.com — no API rate limit. +// Per-call timeout via AbortController so a slow upstream can't tie up +// the verifier indefinitely. +export const fetchRepoRawFile = async ( + repoOwner: string, + repoName: string, + ref: string, + path: string, + timeoutMs = 10_000, +): Promise<string | null> => { + const url = `https://raw.githubusercontent.com/${encodeURIComponent(repoOwner)}/${encodeURIComponent(repoName)}/${encodeURIComponent(ref)}/${path.split("/").map(encodeURIComponent).join("/")}`; + const ctrl = new AbortController(); + const t = setTimeout(() => ctrl.abort(), timeoutMs); + try { + const res = await fetch(url, { + headers: { "User-Agent": "tdd.md" }, + signal: ctrl.signal, + }); + if (!res.ok) return null; + return await res.text(); + } catch { + return null; + } finally { + clearTimeout(t); + } +}; + export const loadTestBundle = async ( repoOwner: string, repoName: string, diff --git a/src/c21_app.ts b/src/c21_app.ts index e5976ea03fb8cf48233a9891bae0702945e6fedf..beed1c594500c287da12f173b62c65ea1c51bdce 100644 --- a/src/c21_app.ts +++ b/src/c21_app.ts @@ -6,6 +6,7 @@ import { renderPage, renderNotFound, htmlResponse, + escape, } from "./c51_render_layout.ts"; import { projectsLandingMd, @@ -23,7 +24,12 @@ import { adminApiHeaders, proxyToForgejo, } from "./c14_forgejo.ts"; -import { fetchProjectConfig } from "./c14_github.ts"; +import { + fetchProjectConfig, + fetchRepoTree, + fetchRepoRawFile, +} from "./c14_github.ts"; +import { verifySama, type SamaReport } from "./c32_sama_verify.ts"; import { listGames, loadGame } from "./c31_games.ts"; import { ALL_POSTS } from "./c31_blog.ts"; import { ALL_GUIDES } from "./c31_guides.ts"; @@ -557,6 +563,7 @@ ${rows} snapshots: data.snapshots, stability: data.stability, unavailableNote, + placeholderTests: data.placeholderTests, }), ogPath: "https://tdd.md/reports/live/tests", }); @@ -666,6 +673,148 
@@ ${rows} return htmlResponse(html); }, + "/sama/verify": async (req) => { + const url = new URL(req.url); + const repoArg = (url.searchParams.get("repo") ?? "").trim(); + const formMd = `# SAMA verify + +> Paste a public GitHub repo. tdd.md will run the four [SAMA disciplines](/sama) against the default branch — *Sorted* (lower never imports higher), *Architecture* (known layer prefixes), *Modeled* (sibling tests, types in c31_*), *Atomic* (~700-line split + placeholder-test detection) — and return a report. No clone, no token; just one tree-listing API call plus raw-content reads. Cached for an hour per repo. + +
+ + +
+ +Try it on this site: [\`syntaxai/tdd.md\`](/sama/verify?repo=syntaxai/tdd.md) · or any public repo of your own. + +Limits: anonymous GitHub API quota is 60 requests/hour per IP. Each verify uses one tree-listing call; the rest of the work goes through raw.githubusercontent.com (uncapped). If the verifier returns "rate limit", come back later or use a token-authenticated proxy. + +[← /sama](/sama) +`; + + if (!repoArg) { + const html = await renderPage({ + title: "SAMA verify — tdd.md", + description: "Paste a public GitHub repo, get the four SAMA disciplines verified mechanically: sorted (lower never imports higher), architecture (known layer prefixes), modeled (sibling tests), atomic (700-line + placeholder-test detection).", + bodyMarkdown: formMd, + ogPath: "https://tdd.md/sama/verify", + active: "sama", + }); + return htmlResponse(html); + } + + const m = /^([^\/\s]+)\/([^\/\s]+)$/.exec(repoArg); + if (!m) { + const html = await renderPage({ + title: "SAMA verify · bad input — tdd.md", + description: "SAMA verify expects an owner/name repo identifier.", + bodyMarkdown: `# SAMA verify\n\n> Couldn't parse \`${repoArg}\`. Use the form: \`owner/name\`.\n\n[← back](/sama/verify)\n`, + ogPath: "https://tdd.md/sama/verify", + active: "sama", + noindex: true, + }); + return htmlResponse(html, 400); + } + + const [, owner, name] = m; + let report: SamaReport; + try { + // Dogfood short-circuit: tdd.md is a private repo, so the GitHub + // API can't see it. When asked to verify ourselves, read the + // source from the bundled `./src/` directory inside the container. + // Same checks, same shape, same code path downstream. 
+ const isSelf = owner === LIVE_REPO_OWNER && name === LIVE_REPO_NAME; + if (isSelf) { + const { readdirSync, readFileSync } = await import("node:fs"); + const srcDir = "./src"; + const tsFiles = readdirSync(srcDir, { withFileTypes: true }) + .filter((e) => e.isFile() && e.name.endsWith(".ts")) + .map((e) => e.name) + .sort(); + const contents = new Map<string, string>(); + for (const f of tsFiles) { + if (/^c\d{2}_/.test(f)) { + contents.set(f, readFileSync(`${srcDir}/${f}`, "utf8")); + } + } + report = verifySama({ + repoOwner: owner!, + repoName: name!, + defaultBranch: "main", + srcPaths: tsFiles, + contents, + }); + } else { + const tree = await fetchRepoTree(owner!, name!); + const srcEntries = tree.entries + .filter((e) => e.type === "blob" && e.path.startsWith("src/") && e.path.endsWith(".ts")) + .slice(0, 200); + const srcPaths = srcEntries.map((e) => e.path.slice("src/".length)); + const samaPaths = srcPaths.filter((p) => /^c\d{2}_/.test(p)); + const contents = new Map<string, string>(); + const fetches = await Promise.all( + samaPaths.map(async (p) => [p, await fetchRepoRawFile(owner!, name!, tree.defaultBranch, `src/${p}`)] as const), + ); + for (const [p, c] of fetches) { + if (c !== null) contents.set(p, c); + } + report = verifySama({ + repoOwner: owner!, + repoName: name!, + defaultBranch: tree.defaultBranch, + srcPaths, + contents, + }); + } + } catch (e) { + const msg = e instanceof Error ? e.message : String(e); + const html = await renderPage({ + title: `SAMA verify · ${owner}/${name} · error — tdd.md`, + description: `SAMA verify could not inspect ${owner}/${name}.`, + bodyMarkdown: `# SAMA verify · \`${owner}/${name}\`\n\n> Couldn't fetch the repo: ${escape(msg)}\n\nMost common causes: the repo is private, the name is wrong, or you've hit GitHub's anonymous rate limit (60/hour). 
[← try another repo](/sama/verify)\n`, + ogPath: `https://tdd.md/sama/verify?repo=${owner}/${name}`, + active: "sama", + noindex: true, + }); + return htmlResponse(html, 502); + } + + const summary = report.overallPassed + ? `> ✓ All four checks passed for [\`${report.repoSlug}\`](https://github.com/${report.repoSlug}) on \`${report.defaultBranch}\` (${report.samaFiles} SAMA files / ${report.testFiles} tests / ${report.totalSrcFiles} total in src/).` + : `> ⚠ ${report.checks.filter((c) => !c.passed).length} of 4 checks failed for [\`${report.repoSlug}\`](https://github.com/${report.repoSlug}) on \`${report.defaultBranch}\`.`; + const checkBlocks = report.checks + .map((c) => { + const status = c.passed ? "✓ pass" : `✗ ${c.violations.length} violation${c.violations.length === 1 ? "" : "s"}`; + const violationsBlock = c.violations.length === 0 + ? "" + : `\n\n${c.violations.slice(0, 20).map((v) => `- \`${escape(v.file)}\` — ${escape(v.detail)}`).join("\n")}${c.violations.length > 20 ? `\n- _...and ${c.violations.length - 20} more_` : ""}`; + const noteBlock = c.note ? `\n\n_${escape(c.note)}_` : ""; + return `### ${c.letter} — ${c.property} · ${status}\n\nExamined ${c.examined} file${c.examined === 1 ? "" : "s"}.${violationsBlock}${noteBlock}`; + }) + .join("\n\n"); + const reportMd = `# SAMA verify · \`${report.repoSlug}\` + +${summary} + +${checkBlocks} + +--- + +[← verify another repo](/sama/verify) · [the four SAMA disciplines →](/sama) · [SAMA skill for your agent →](/sama/skill) +`; + const html = await renderPage({ + title: `SAMA verify · ${report.repoSlug} — tdd.md`, + description: `SAMA verification for ${report.repoSlug}: ${report.overallPassed ? 
"all four checks passed" : `${report.checks.filter((c) => !c.passed).length}/4 checks failed`}.`, + bodyMarkdown: reportMd, + ogPath: `https://tdd.md/sama/verify?repo=${report.repoSlug}`, + active: "sama", + }); + return htmlResponse(html); + }, + "/sama": async () => { const rows = ALL_SAMA .map((d) => `| **[${d.letter} — ${d.title}](/sama/${d.slug})** | ${d.rule} |`) @@ -703,6 +852,19 @@ curl -fsSL https://tdd.md/skills/sama.md -o ~/.claude/skills/sama.md The skill is the same content as the four pages here, written in obra/superpowers SKILL.md format with frontmatter, an iron-rule statement, and a verification checklist your agent can run before merging. **[Read it formatted →](/sama/skill)** · **[Raw markdown →](/skills/sama.md)** +## verify any public repo + +Want to know whether a repo follows SAMA without reading its source? Paste the \`owner/name\` and tdd.md will run all four checks against the default branch — *Sorted* (the import-direction grep), *Architecture* (known layer prefixes), *Modeled* (sibling tests), *Atomic* (700-line + placeholder-test detection). Pass/fail per discipline, with violation lists. **[verify a repo →](/sama/verify)** · or try it on this site: [\`syntaxai/tdd.md\`](/sama/verify?repo=syntaxai/tdd.md). + +## the case behind it + +Two long-form pieces that argue *why* SAMA is shaped this way: + +- [**The Claude Code harness postmortem read through TDD + SAMA**](/blog/claude-code-harness-postmortem) — ThePaSch's r/ClaudeAI audit (40+ hidden reminders, 5 gag-order sites, 158 prompt versions in 11 days) read against the iron law and the verification grep. *The harness is loud; the diff doesn't have to be.* +- [**Three patterns ten threads converge on**](/blog/agentic-coding-corpus-three-patterns) — a six-month corpus of r/ClaudeAI, r/ClaudeCode, r/AgentsOfAI failure-mode threads. Per-pattern mitigation tables map each thread to the SAMA / iron-law rule that catches or prevents it. 
+ +If you're reading these for the first time, the order to take them is harness postmortem → corpus → back here. + ## why these four together Each property fixes a different failure mode: diff --git a/src/c31_blog.ts b/src/c31_blog.ts index 20af3ed5f2131f9aef0b3f146f1fdd22983a6efc..3574fe33c37e9839ff9f39190be846a1db4743ed 100644 --- a/src/c31_blog.ts +++ b/src/c31_blog.ts @@ -12,6 +12,12 @@ export interface BlogEntry { } export const ALL_POSTS: BlogEntry[] = [ + { + slug: "from-rules-to-checks", + title: "From rules to checks: shipping what the corpus post promised", + description: "The corpus post named three checks the discipline should run. This post is the receipt. Three slivers shipped: placeholder-test detection (live on /reports/live/tests), historical-commit testing via git worktree (opt-in via SAMA_HISTORY_DEPTH), and /sama/verify - a four-discipline report runnable against any public repo. The rules are now URLs you can hit.", + date: "2026-05-09", + }, { slug: "agentic-coding-corpus-three-patterns", title: "Three patterns ten threads converge on", diff --git a/src/c32_real_tests.ts b/src/c32_real_tests.ts index e123a9035ac476aa8006094116bc335d824e9cff..67b9fc6c8c749496867fc4aa3e2715b04dd3a787 100644 --- a/src/c32_real_tests.ts +++ b/src/c32_real_tests.ts @@ -5,7 +5,7 @@ // Pure given the bundle + commits in (no I/O of its own beyond delegating // to c14_github's bundle loader and commits fetcher). 
-import { fetchRepoCommits, loadTestBundle } from "./c14_github.ts"; +import { fetchRepoCommits, loadTestBundle, type PlaceholderTest } from "./c14_github.ts"; import type { AgentReport, TestFailure, @@ -31,6 +31,7 @@ export interface LiveTestData { runsCount: number; ranAt: number | null; headSha: string | null; + placeholderTests: PlaceholderTest[]; } export const buildLiveTestData = async ( @@ -39,12 +40,12 @@ export const buildLiveTestData = async ( ): Promise<LiveTestData> => { const bundle = await loadTestBundle(repoOwner, repoName); if (!bundle || bundle.runs.length === 0) { - return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null }; + return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null, placeholderTests: [] }; } const repoSlug = `${repoOwner}/${repoName}`; const latest = bundle.runs[0]; if (!latest) { - return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null }; + return { snapshots: [], stability: [], runsCount: 0, ranAt: null, headSha: null, placeholderTests: [] }; } // For "since" we want the oldest run that has this test as failing. @@ -136,5 +137,6 @@ export const buildLiveTestData = async ( runsCount: bundle.runs.length, ranAt: latest.ranAt, headSha: latest.sha, + placeholderTests: latest.placeholderTests ?? [], }; }; diff --git a/src/c32_sama_verify.ts b/src/c32_sama_verify.ts new file mode 100644 index 0000000000000000000000000000000000000000..223a85a6a3141107a750ddde6c9cd0ee3bd9c2e0 --- /dev/null +++ b/src/c32_sama_verify.ts @@ -0,0 +1,272 @@ +// c32 — logic: pure SAMA verification given a repo's file tree and the +// contents of every cXX_*.ts file. Drives /sama/verify. +// +// Verifier is intentionally strict: a check passes iff there is zero +// evidence of violation. The four properties (S/A/M/A) each become one +// callable, and the top-level `verifySama(...)` runs them all and +// returns a SamaReport. 
+ +export interface SamaViolation { + file: string; + detail: string; +} + +export interface SamaCheckResult { + letter: "S" | "A" | "M" | "A"; + property: "Sorted" | "Architecture" | "Modeled" | "Atomic"; + passed: boolean; + examined: number; + violations: SamaViolation[]; + note?: string; +} + +export interface SamaReport { + repoSlug: string; + defaultBranch: string; + totalSrcFiles: number; + samaFiles: number; + testFiles: number; + checks: SamaCheckResult[]; + overallPassed: boolean; + generatedAt: number; +} + +export interface SamaVerifyInput { + repoOwner: string; + repoName: string; + defaultBranch: string; + // src-relative paths, e.g. "c21_app.ts", "c31_blog.ts", "c32_session.test.ts" + srcPaths: string[]; + // file path -> content. Contents only required for cXX_*.ts files + // and *.test.ts files. + contents: Map<string, string>; +} + +const SAMA_PREFIX = /^c(\d{2})_/; + +const isSamaFile = (p: string): boolean => SAMA_PREFIX.test(p) && p.endsWith(".ts"); +const isTestFile = (p: string): boolean => p.endsWith(".test.ts"); + +const layerOf = (filename: string): number | null => { + const m = SAMA_PREFIX.exec(filename); + if (!m) return null; + return parseInt(m[1] ?? "0", 10); +}; + +// Pull import targets out of a TypeScript source. Recognizes both +// static `import ... from "./x.ts"` and dynamic `import("./x.ts")`. +// We only care about relative imports (the cross-layer ones). +const collectRelativeImports = (source: string): string[] => { + const out: string[] = []; + const staticRe = /\bfrom\s+["']\s*(\.\/[^"']+)["']/g; + const dynRe = /\bimport\s*\(\s*["']\s*(\.\/[^"']+)["']/g; + let m: RegExpExecArray | null; + while ((m = staticRe.exec(source)) !== null) if (m[1]) out.push(m[1]); + while ((m = dynRe.exec(source)) !== null) if (m[1]) out.push(m[1]); + return out; +}; + +const importTargetFilename = (importPath: string): string => { + // "./c14_github.ts" -> "c14_github.ts" + return importPath.replace(/^\.\//, ""); +}; + +// S — Sorted. 
The rule, as practiced: foundation, data and logic layers +// (c1*, c3*) don't import UI (c5*+). c21 (handlers/composers) is the +// orchestration layer and is allowed to import anything; c51 (UI) is +// allowed to import models (c3*) for the data it renders. A strict +// "lower never imports higher" reading would forbid c21 → c31, which +// is the natural pattern (handler composes model). The actual +// constraint is one-directional: UI sits at the edge, never below. +const checkSorted = (input: SamaVerifyInput): SamaCheckResult => { + const violations: SamaViolation[] = []; + let examined = 0; + for (const path of input.srcPaths) { + if (!isSamaFile(path)) continue; + examined++; + const m = SAMA_PREFIX.exec(path); + const prefix = m?.[1] ?? ""; + // Skip c2* (handlers, allowed to depend on anything) and c5*+ (UI, + // its outbound deps are governed by other rules, not this one). + if (!/^[13]/.test(prefix)) continue; + const content = input.contents.get(path); + if (!content) continue; + for (const rawImport of collectRelativeImports(content)) { + const target = importTargetFilename(rawImport); + const targetMatch = SAMA_PREFIX.exec(target); + const targetPrefix = targetMatch?.[1] ?? ""; + if (!targetPrefix) continue; + if (/^[59]/.test(targetPrefix)) { + violations.push({ + file: path, + detail: `imports \`${target}\` (UI layer c${targetPrefix}_) from a non-UI/non-handler file (c${prefix}_) — UI sits at the edge, foundation/data/logic must not depend on it`, + }); + } + } + } + return { + letter: "S", + property: "Sorted", + passed: violations.length === 0, + examined, + violations, + note: examined === 0 + ? "no cXX_*.ts files found in the project — the convention isn't applied here" + : undefined, + }; +}; + +// A — Architecture. Each prefix is a known layer; flag unknown prefixes. 
+const KNOWN_LAYERS = new Set(["11", "13", "14", "21", "31", "32", "51"]); +const checkArchitecture = (input: SamaVerifyInput): SamaCheckResult => { + const violations: SamaViolation[] = []; + let examined = 0; + for (const path of input.srcPaths) { + if (!isSamaFile(path)) continue; + examined++; + const m = SAMA_PREFIX.exec(path); + const prefix = m?.[1] ?? ""; + if (!KNOWN_LAYERS.has(prefix)) { + violations.push({ + file: path, + detail: `unknown layer prefix \`c${prefix}_\` (known: c11, c13, c14, c21, c31, c32, c51)`, + }); + } + } + return { + letter: "A", + property: "Architecture", + passed: violations.length === 0, + examined, + violations, + }; +}; + +// M — Modeled. Tests live next to source. Every cXX_.ts (non-data) +// should have a sibling cXX_.test.ts. Pure data files (registries +// like c31_blog.ts that are just an exported array) often legitimately +// have no behaviour to test, so we soften this check by requiring a +// sibling for c32_*.ts (logic) at minimum, and reporting a list of c31 +// files without siblings as informational rather than hard violations. +const checkModeled = (input: SamaVerifyInput): SamaCheckResult => { + const violations: SamaViolation[] = []; + const informational: SamaViolation[] = []; + let examined = 0; + const present = new Set(input.srcPaths); + for (const path of input.srcPaths) { + if (!isSamaFile(path) || isTestFile(path)) continue; + examined++; + const sibling = path.replace(/\.ts$/, ".test.ts"); + if (present.has(sibling)) continue; + const layer = layerOf(path); + if (layer === 32) { + violations.push({ file: path, detail: `no sibling test file at \`${sibling}\`` }); + } else if (layer === 31) { + informational.push({ file: path, detail: `no sibling test (often fine for pure data registries; flag if logic accumulates)` }); + } + } + const passed = violations.length === 0; + const note = informational.length > 0 + ? `${informational.length} c31_* file${informational.length === 1 ? 
"" : "s"} without a sibling test — usually fine for pure-data registries, flag if logic accumulates: ${informational.map((v) => v.file).join(", ")}` + : undefined; + return { + letter: "M", + property: "Modeled", + passed, + examined, + violations, + note, + }; +}; + +// A — Atomic. ~700-line split rule. Flag any cXX_*.ts over 700 lines. +// Also flag placeholder tests (zero expect() calls in test body) as +// part of the same pass — they're a structural violation of the +// testing surface that Atomic owns. +const findPlaceholderTestsLite = (file: string, content: string): SamaViolation[] => { + const out: SamaViolation[] = []; + const re = /\b(test|it)\s*\(\s*(["'`])((?:\\.|(?!\2).)*)\2\s*,\s*(?:async\s+)?(?:\([^)]*\)|[^=()]*?)\s*=>\s*\{/g; + let m: RegExpExecArray | null; + while ((m = re.exec(content)) !== null) { + const name = m[3] ?? ""; + const startBrace = re.lastIndex - 1; + let depth = 1; + let i = startBrace + 1; + let inString: string | null = null; + while (i < content.length && depth > 0) { + const c = content[i]; + if (inString !== null) { + if (c === "\\") { i += 2; continue; } + if (c === inString) inString = null; + } else { + if (c === '"' || c === "'" || c === "`") inString = c; + else if (c === "/" && content[i + 1] === "/") { + while (i < content.length && content[i] !== "\n") i++; + continue; + } else if (c === "/" && content[i + 1] === "*") { + i += 2; + while (i < content.length - 1 && !(content[i] === "*" && content[i + 1] === "/")) i++; + i += 2; + continue; + } else if (c === "{") depth++; + else if (c === "}") depth--; + } + i++; + } + const body = content.slice(startBrace + 1, i - 1); + const expectCount = (body.match(/\bexpect\s*\(/g) ?? 
[]).length; + if (expectCount === 0) { + out.push({ file, detail: `placeholder test \`${name}\` — zero \`expect()\` calls` }); + } + } + return out; +}; + +const checkAtomic = (input: SamaVerifyInput): SamaCheckResult => { + const violations: SamaViolation[] = []; + let examined = 0; + for (const path of input.srcPaths) { + if (!isSamaFile(path)) continue; + examined++; + const content = input.contents.get(path); + if (!content) continue; + const lineCount = content.split("\n").length; + if (lineCount > 700) { + violations.push({ + file: path, + detail: `${lineCount} lines (over the 700-line split threshold — split per UI/data domain)`, + }); + } + if (isTestFile(path)) { + violations.push(...findPlaceholderTestsLite(path, content)); + } + } + return { + letter: "A", + property: "Atomic", + passed: violations.length === 0, + examined, + violations, + }; +}; + +export const verifySama = (input: SamaVerifyInput): SamaReport => { + const samaPaths = input.srcPaths.filter(isSamaFile); + const testPaths = samaPaths.filter(isTestFile); + const checks = [ + checkSorted(input), + checkArchitecture(input), + checkModeled(input), + checkAtomic(input), + ]; + return { + repoSlug: `${input.repoOwner}/${input.repoName}`, + defaultBranch: input.defaultBranch, + totalSrcFiles: input.srcPaths.length, + samaFiles: samaPaths.length, + testFiles: testPaths.length, + checks, + overallPassed: checks.every((c) => c.passed), + generatedAt: Date.now(), + }; +}; diff --git a/src/c51_render_reports.ts b/src/c51_render_reports.ts index 7f1be09ab39a6fc45fe3dc4e69964b53f76b0f16..06f0fa5c0d7c03ef1958fc228290c8b0d5755ae3 100644 --- a/src/c51_render_reports.ts +++ b/src/c51_render_reports.ts @@ -38,6 +38,9 @@ export interface TestsOverviewContext { // When the runner sliver isn't wired (live mode, today), pass a // placeholder note instead of the snapshot+stability sections. unavailableNote?: string; + // Placeholder-test detection: tests with zero `expect()` calls in + // their body. 
Surfaces the failure mode from r/ClaudeCode 1qix264. + placeholderTests?: { name: string; file: string; reason: string }[]; } const trendArrow = (delta: number): { glyph: string; cls: string } => @@ -275,6 +278,20 @@ ${ctx.bannerHtml} const failing = ctx.snapshots.reduce((s, r) => s + r.failing, 0); const snapshots = ctx.snapshots.map(snapshotBlock).join("\n"); const stabRows = ctx.stability.map(stabilityRow).join("\n"); + const placeholders = ctx.placeholderTests ?? []; + const placeholderBlock = placeholders.length === 0 + ? `## placeholder tests + +> No placeholder tests detected at this snapshot. A placeholder is a test whose body contains zero \`expect()\` calls — covered in [the corpus post](/blog/agentic-coding-corpus-three-patterns) as the failure mode from r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass rate"). Detection runs on every deploy. +` + : `## placeholder tests · ⚠ ${placeholders.length} flagged + +> A placeholder test is one whose body contains zero \`expect()\` calls — empty body, comment-only stub, or string-literal body. Covered in [the corpus post](/blog/agentic-coding-corpus-three-patterns) as the failure mode from r/ClaudeCode 1qix264. The judge would refuse a merge that includes any of these. + +| test | file | reason | +|---|---|---| +${placeholders.map((p) => `| ${escape(p.name)} | \`${escape(p.file)}\` | ${escape(p.reason)} |`).join("\n")} +`; return `# tests overview ${ctx.bannerHtml} @@ -287,7 +304,9 @@ ${ctx.bannerHtml} ${snapshots} -**Total**: ${total.toLocaleString()} tests · ${passing.toLocaleString()} passing · ${failing.toLocaleString()} failing. +**Total**: ${total.toLocaleString()} tests · ${passing.toLocaleString()} passing · ${failing.toLocaleString()} failing${placeholders.length > 0 ? ` · ${placeholders.length} placeholder ⚠` : ""}. + +${placeholderBlock} ## test stability · ${ctx.period}