# From rules to checks: shipping what the corpus post promised

> The [corpus post](/blog/2026-05/agentic-coding-corpus-three-patterns) closed with a promise: *"three of the ten threads describe failures only an actual test run can catch"* — and named which checks would have caught which failure modes (some today, some with a small extension, some only with a sandbox runner). This post is the receipt. Three of those checks now ship: **placeholder-test detection** (the *"one-evening sliver"*), **historical-commit testing via git worktree** (the *"next slice on the roadmap"*), and **`/sama/verify`** (mechanical layer grep + sibling-test + line-count + placeholder check, runnable against any public repo).

## why this post is short

The two previous posts made an argument. This one documents an outcome. If the argument was right, the outcome should be small and obvious: the rules become checks, the checks become routes, the routes catch the failure modes the corpus catalogued. Here's what that looks like in practice.

## 1 · placeholder-test detection (caught today)

Failure mode: r/ClaudeCode `1qix264`, *"Claude wrote 90 placeholder tests and reported 100% pass rate"*. The corpus post said:

> *"empty assertion bodies — zero `expect()` calls, string-literal bodies, single-line `// TODO` stubs — are AST-checkable. The test bundle already lives in `content/git-history/syntaxai__tdd.md__tests.json`; an empty-body check is a one-evening sliver."*

The check is a regex-based brace walker that extracts every `test(...)` and `it(...)` body, counts `expect(` occurrences, and flags zero-count bodies. It runs at deploy time as part of the existing `snapshot-tests.ts` script and writes its findings to the bundle as `placeholderTests: { name, file, reason }[]`. The runtime renderer surfaces them on [`/reports/live/tests`](/reports/live/tests):

- Zero placeholders → a small "no placeholder tests detected at this snapshot" note explaining what the check looks for.
- One or more → a flagged section with a per-test table: name, file, reason (*"no expect() calls"*, *"empty test body"*, *"comment-only stub"*).

It catches the most common shape of `1qix264` directly: the `expect()` count is zero. It handles less common shapes poorly; a test that asserts through a custom helper rather than `expect` is counted as having zero assertions, for example. The regex is aimed at the failures the corpus actually recorded, not every imaginable one.
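
To make the brace-walking idea concrete, here is a minimal sketch of such a check. It is not the code in `snapshot-tests.ts`: the function name `findPlaceholderTests`, the regex, and the comment stripping are all assumptions made for illustration, and only the `{ name, file, reason }` shape and the three reason strings come from the description above.

```typescript
// Hypothetical sketch; names and regex are illustrative, not the shipped code.
type PlaceholderTest = { name: string; file: string; reason: string };

// Matches the opening of test("name" ... or it('name' ...
const TEST_OPEN = /\b(?:test|it)\(\s*(["'`])(.*?)\1/g;

function findPlaceholderTests(source: string, file: string): PlaceholderTest[] {
  const found: PlaceholderTest[] = [];
  for (const match of source.matchAll(TEST_OPEN)) {
    // First "{" after the test name is assumed to open the callback body.
    const open = source.indexOf("{", match.index! + match[0].length);
    if (open === -1) continue;

    // Walk braces until the one that closes the body.
    let depth = 0;
    let close = -1;
    for (let i = open; i < source.length; i++) {
      if (source[i] === "{") depth++;
      else if (source[i] === "}" && --depth === 0) {
        close = i;
        break;
      }
    }
    if (close === -1) continue;

    const body = source.slice(open + 1, close);
    const withoutComments = body
      .replace(/\/\*[\s\S]*?\*\//g, "")
      .replace(/\/\/.*$/gm, "");

    let reason: string | null = null;
    if (body.trim() === "") reason = "empty test body";
    else if (withoutComments.trim() === "") reason = "comment-only stub";
    else if (!withoutComments.includes("expect(")) reason = "no expect() calls";

    if (reason !== null) found.push({ name: match[2], file, reason });
  }
  return found;
}
```

A walker like this is easy to fool: braces inside string literals, or arrow callbacks with no braces at all, would throw off the body extraction. That is the trade-off of using a regex rather than a real AST.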

## 2 · historical-commit testing (the sandbox runner sliver)

Failure mode: r/ClaudeCode `1rug14a`, *"Claude wrote Playwright tests that secretly patched the app at runtime"*. This is the failure the previous reporting layer couldn't catch: the diff looks fine, the test passes in the agent's terminal, and the test passes in the deploy-time bundle too if the bundle only ever ran HEAD. Catching it needs the same test to run *somewhere it has never run before*, against the actual code at that SHA.

The new mode: `SAMA_HISTORY_DEPTH=N` in the deploy environment makes the snapshot script also test the last *N* commits that aren't already in the bundle. Mechanically:

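```sh
# Commands scripts/p620/snapshot-tests.ts runs per historical commit (excerpt).
# $SHA is the commit under test; $REPO_ROOT is the main checkout.
WORKTREE="/tmp/tdd-md-wt-$SHA"
git worktree add --detach "$WORKTREE" "$SHA"
# Reuse the already-installed dependencies instead of reinstalling per commit.
ln -s "$REPO_ROOT/node_modules" "$WORKTREE/node_modules"
# Run the tests from inside the worktree so they see the code at $SHA.
(cd "$WORKTREE" && bun test --reporter=junit --reporter-outfile="/tmp/junit-$SHA.xml")
git worktree remove --force "$WORKTREE"
```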

Each historical run produces the same `TestRunRecord` shape as a HEAD run, gets appended to the bundle keyed by SHA, and feeds the existing stability table. Two consequences:

- **Stability data builds 10× faster.** A first `SAMA_HISTORY_DEPTH=10` deploy backfills ten runs in one go instead of waiting ten deploys.
- **Runtime-patching becomes detectable in principle.** A test that passed in the agent's session and in the original deploy run, but fails when re-run from a clean worktree at the same SHA, is the smoking-gun shape of `1rug14a`. The script doesn't yet flag that discrepancy as a separate failure mode (that's the next sliver), but the data to compare *is now in the bundle*.
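
Since the discrepancy check isn't wired up yet, the following is only a guess at what the comparison could look like. The post doesn't show the real `TestRunRecord` shape, so the `RunRecord` type here, its `source` field, and the function name `findDiscrepancies` are all hypothetical. The sketch also assumes the bundle holds both a deploy-time record and a clean-worktree record for the same SHA.

```typescript
// Hypothetical record shape; the real TestRunRecord is not shown in this post.
type Outcome = "pass" | "fail";
interface RunRecord {
  sha: string;
  source: "deploy" | "worktree"; // assumed field: where the run happened
  results: Record<string, Outcome>; // test name -> outcome
}
type Discrepancy = { sha: string; test: string };

// Flag tests that passed in the deploy run but fail in a clean worktree at the same SHA.
function findDiscrepancies(records: RunRecord[]): Discrepancy[] {
  const found: Discrepancy[] = [];
  const deployRuns = records.filter((r) => r.source === "deploy");
  for (const deployRun of deployRuns) {
    const reruns = records.filter(
      (r) => r.source === "worktree" && r.sha === deployRun.sha,
    );
    for (const rerun of reruns) {
      for (const [test, outcome] of Object.entries(deployRun.results)) {
        if (outcome === "pass" && rerun.results[test] === "fail") {
          found.push({ sha: deployRun.sha, test });
        }
      }
    }
  }
  return found;
}
```

A flaky test would trip the same comparison, so a real version would probably want to lean on the stability table before calling anything a runtime patch.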

The default is still `SAMA_HISTORY_DEPTH=0` (HEAD-only). Opt-in keeps deploy time bounded; flipping the default to `5` or `10` is a one-line change once we want it on by default.
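
The selection step (the last *N* commits not already in the bundle) can be sketched as below. Both function names are hypothetical, and the sketch assumes the caller passes SHAs newest-first with HEAD already excluded, for instance from `git log --format=%H`. It reads "last N" as the N most recent commits that are still missing from the bundle, which is one plausible interpretation.

```typescript
// Hypothetical helpers; not taken from snapshot-tests.ts.

// Parse SAMA_HISTORY_DEPTH, treating unset, non-numeric, or negative values as 0 (HEAD-only).
function parseHistoryDepth(raw: string | undefined): number {
  const n = Number.parseInt(raw ?? "0", 10);
  return Number.isFinite(n) && n > 0 ? n : 0;
}

// Pick up to `depth` recent commits that have no run in the bundle yet.
function pickShasToBackfill(
  recentShas: string[], // assumed newest-first, HEAD excluded
  bundledShas: Set<string>,
  depth: number,
): string[] {
  if (depth <= 0) return [];
  return recentShas.filter((sha) => !bundledShas.has(sha)).slice(0, depth);
}
```

With a depth of 2, history `["c3", "c2", "c1"]`, and `c2` already bundled, this picks `["c3", "c1"]`.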

## 3 · `/sama/verify` (mechanical check for any public repo)

The corpus post argued: *"don't write a CLAUDE.md instruction the harness can overrule. Write a structural check the harness doesn't get to know about."* That argument is hollow if the structural checks aren't actually runnable. The new route closes the loop:

**[/sama/verify](/sama/verify)** — paste a public GitHub repo, get a four-discipline report. The mechanics:

1. One GitHub API call to `git/trees/<default-branch>?recursive=1` resolves the file list.
2. Every `src/cXX_*.ts` file is fetched via `raw.githubusercontent.com` (no API rate limit, no token).
3. Pure logic in [`c32_sama_verify.ts`](https://github.com/syntaxai/tdd.md/blob/main/src/c32_sama_verify.ts) runs the four checks:
   - **S — Sorted**: every relative `from "./..."` import in a `cXX_*.ts` is parsed; flag if the target's prefix is higher than the source's.
   - **A — Architecture**: every `cXX_` prefix is matched against the known set (`c11`, `c13`, `c14`, `c21`, `c31`, `c32`, `c51`); unknown ones flagged.
   - **M — Modeled**: every `cXX_<name>.ts` (non-test) is checked for a sibling `cXX_<name>.test.ts`. Hard-fails for `c32_*` (logic); informational for `c31_*` (often pure-data registries).
   - **A — Atomic**: line count over 700 → flagged. Test files → run the same placeholder check from sliver #1.
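
As a rough illustration of how small these checks can be, here is a sketch of the Sorted and Modeled rules. It is not the contents of `c32_sama_verify.ts`: the helper names (`layerOf`, `checkSorted`, `checkModeled`) and the `detail` wording are assumptions, file names are taken as bare names without the `src/` prefix, and the `c32` versus `c31` severity split is left out.

```typescript
// Hypothetical sketch of two of the four checks; names are illustrative.
type Violation = { file: string; detail: string };

// "c32_judge.ts" -> 32; null when the name does not follow the cXX_ convention.
function layerOf(fileName: string): number | null {
  const m = /^c(\d{2})_/.exec(fileName);
  return m ? Number(m[1]) : null;
}

// S: flag relative imports whose target prefix is higher than the source's.
function checkSorted(file: string, source: string): Violation[] {
  const from = layerOf(file);
  if (from === null) return [];
  const violations: Violation[] = [];
  for (const m of source.matchAll(/from\s+["']\.\/([^"']+)["']/g)) {
    const target = layerOf(m[1]);
    if (target !== null && target > from) {
      violations.push({ file, detail: `imports ./${m[1]} (c${target} > c${from})` });
    }
  }
  return violations;
}

// M: every non-test cXX_<name>.ts should have a sibling cXX_<name>.test.ts.
function checkModeled(files: string[]): Violation[] {
  const all = new Set(files);
  return files
    .filter((f) => layerOf(f) !== null && f.endsWith(".ts") && !f.endsWith(".test.ts"))
    .filter((f) => !all.has(f.replace(/\.ts$/, ".test.ts")))
    .map((f) => ({ file: f, detail: "no sibling test file" }));
}
```

Both functions work on plain strings, so they can be exercised without touching the network or the file system.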

Output: pass/fail per discipline, with up to 20 violations per check listed (`file` + `detail`). Cached for an hour per repo.
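
For steps 1 and 2 of the mechanics, a sketch of how the tree listing might be narrowed to layer files and turned into raw URLs. The function names are hypothetical, `TreeEntry` lists only the two fields of GitHub's tree response that the sketch uses, and it assumes `src/cXX_*.ts` means files directly under `src/`.

```typescript
// Hypothetical helpers; only the URL patterns follow GitHub's public endpoints.
type TreeEntry = { path: string; type: string }; // subset of a tree response entry

// Step 1: the single API call that lists every path in the repo.
function treeUrl(owner: string, repo: string, branch: string): string {
  return `https://api.github.com/repos/${owner}/${repo}/git/trees/${branch}?recursive=1`;
}

// Step 2: keep only src/cXX_*.ts blobs and map them to raw.githubusercontent.com URLs.
function rawUrlsForLayerFiles(
  owner: string,
  repo: string,
  branch: string,
  tree: TreeEntry[],
): string[] {
  return tree
    .filter((e) => e.type === "blob" && /^src\/c\d{2}_[^/]+\.ts$/.test(e.path))
    .map((e) => `https://raw.githubusercontent.com/${owner}/${repo}/${branch}/${e.path}`);
}
```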

Try it on this site: [`/sama/verify?repo=syntaxai/tdd.md`](/sama/verify?repo=syntaxai/tdd.md). And here's the dogfood result, honestly:

| check | tdd.md self-verify result |
|---|---|
| S — Sorted | ✓ pass — no UI dependency leaks into foundation/data/logic |
| A — Architecture | ✓ pass — every prefix is in the known set |
| M — Modeled | ✗ 5 violations — `c32_judge.ts`, `c32_session.ts`, `c32_real_reports.ts`, `c32_real_tests.ts`, `c32_sama_verify.ts` lack sibling test files |
| A — Atomic | ✗ 1 violation — `c21_app.ts` is 1066 lines (over the 700-line split threshold) |

Two of four fail, and they're real. Five `c32_*` logic files — including `c32_sama_verify.ts`, the file that *runs the verification* — don't have sibling tests yet, and the route dispatcher has grown past the atomic threshold and now needs a per-domain split. Both findings were caught by the tool we just shipped, against the codebase we just shipped it from. That's the dogfood story: not "everything passes" but "the tool catches real things in real code, including its own". Both are on the very next slice of the roadmap.

## what this changes about the case

The argument has now happened in three layers:

1. **The harness postmortem post** said: structural rules survive harness chaos because they're enforced outside the agent's context window.
2. **The corpus post** said: ten threads prove the failure modes are systematic, here are the rules that catch each, here's what we catch and what we don't yet.
3. **This post** says: the rules are now checks, the checks are now URLs you can hit, and you can verify the case against any public repo *including this one*.

The leftover work — flagging a runtime-patching discrepancy as a distinct failure mode, hidden-test verification on real-project commits, AST-level placeholder detection beyond the regex — is in the open. It's smaller than what shipped this week.

## tl;dr

The two previous posts made a case from text. This one ships the checks the case promised:

| sliver | route | catches | status |
|---|---|---|---|
| placeholder detection | [/reports/live/tests](/reports/live/tests) | r/ClaudeCode 1qix264 ("90 placeholder tests, 100% pass") | live |
| historical-commit testing | snapshot script with `SAMA_HISTORY_DEPTH=N` | runtime-patching SHAs ([groundwork for 1rug14a](/blog/2026-05/agentic-coding-corpus-three-patterns)) | opt-in, default 0 |
| `/sama/verify` | [/sama/verify](/sama/verify) | layer violations, missing sibling tests, oversized files, placeholder tests, in any public repo | live |

If the discipline is real, you should be able to point it at a repo and have it report findings. Now you can.

[← back to the blog](/blog) · [the four SAMA disciplines →](/sama) · [drop SAMA into your agent →](/sama/skill) · [verify a repo →](/sama/verify)