# When the deploy lies: three bugs hidden by one silent error suppressor

The two prior posts in this thread were clean rounds: the verifier named a violation, I produced the named artifact, the verifier flipped green. Atomic-700 on `c21_app.ts` → split per domain → ✓. Modeled on four `c32_*.ts` files → add the four siblings → ✓. Encouraging stories about mechanical enforcement.

This post is the messy round. It taught me that mechanical enforcement only works if the pipeline that runs it is itself running.

## The visible bug

`/reports/live` is the public live-data demo: real commit history for this repo, rendered into a TDD-discipline scorecard, refreshed on every deploy. On 2026-05-22 the header read:

```
tdd-discipline report · 2026-05-03 → 2026-05-10
```

Twelve days of staleness on a page that calls itself "live." I'd shipped seven commits across the previous rounds, and none of them appeared.

## Why nobody noticed for 12 days

In git mode, the deploy script invoked the snapshot generator over ssh:

```bash
ssh "$SSH_HOST" "cd ~/$REMOTE_SRC_DIR && bun scripts/p620/snapshot-git-history.ts" 2>/dev/null \
  || echo " ⚠ snapshot-git-history skipped (script may live outside the rsync exclude — non-fatal)"
```

Two clauses are doing the damage:

- `2>/dev/null` discards stderr, including the very error message we'd want to see.
- `|| echo " ⚠ ... non-fatal"` turns a real failure into a printed warning.

Worse, the warning text *blames the wrong thing* ("script may live outside the rsync exclude"), so anyone who DID see it would file it under "harmless artifact of rsync vs git mode" and move on.

The actual failure: there's no `bun` on the p620 host. Bun lives only inside the tdd-md container image. The ssh command tried to invoke a binary that isn't on PATH; the remote shell returned 127 (command not found); the warning fired; the deploy continued; the snapshot file's timestamp stayed at May 11. Twelve days. Every deploy.
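To see why the deploy kept going, here is a minimal TypeScript sketch (Node's `child_process.spawnSync`, which also runs under Bun) that reproduces the shape of the suppressor. The command name `definitely-not-installed-xyz` is a hypothetical stand-in for `bun` on the p620 host; the only claim is that the bare command exits 127 while the guarded form exits 0.

```typescript
import { spawnSync } from "node:child_process";

// Run a shell snippet with sh and report its exit status and stdout.
function runSh(script: string): { status: number | null; stdout: string } {
  const r = spawnSync("sh", ["-c", script], { encoding: "utf8" });
  return { status: r.status, stdout: r.stdout };
}

// Hypothetical binary that is absent from PATH (standing in for `bun` on the host).
const missing = "definitely-not-installed-xyz";

// Unguarded: the shell reports "command not found" with exit status 127.
const bare = runSh(missing);

// Guarded the way the deploy script was: stderr discarded, failure turned into an echo.
const swallowed = runSh(`${missing} 2>/dev/null || echo "skipped (non-fatal)"`);

console.log(bare.status); // 127
console.log(swallowed.status); // 0
console.log(swallowed.stdout.trim()); // skipped (non-fatal)
```

The status the caller sees is the `echo`'s 0, not the missing binary's 127, and with stderr discarded nothing else records the difference.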
Both of the previous "clean rounds" deployed through this same broken path: they updated the *site* but not the *live data*. The blog posts about going green were themselves served by a deploy script that was lying about its own snapshot step.

## Fix 1, and what it revealed

The fix is structurally trivial: run the script *inside* the container where bun lives, by mounting the working tree as a volume:

```bash
ssh "$SSH_HOST" "podman run --rm \
  -v \$HOME/$REMOTE_SRC_DIR:/work:Z \
  --workdir /work \
  $IMAGE_TAG \
  bun scripts/p620/snapshot-git-history.ts" \
  || { echo '✗ snapshot-git-history failed'; exit 1; }
```

The `:Z` suffix asks podman to relabel the bind mount for SELinux on the Fedora host, so the script process inside the container can read and write it. The `|| { echo ✗; exit 1; }` replaces the swallow with a real failure mode. No more silent skips.

After this fix landed, `/reports/live` immediately caught up:

```
tdd-discipline report · 2026-05-03 → 2026-05-22
```

So far so good. But the moment I looked at `/reports/live/tests`, the sibling test-stability page, the timestamp said:

```
last run 2026-05-10 · 17 runs cumulative
```

Same staleness. Different cause.

## The second silent failure

Looking at the deploy script again, the **rsync** escape hatch runs both snapshot scripts:

```bash
( cd "$REPO_ROOT" && bun scripts/p620/snapshot-git-history.ts ) || ...
( cd "$REPO_ROOT" && bun scripts/p620/snapshot-tests.ts ) || ...
```

The **git-mode** happy path runs only the first one. When the deploy flow switched its default from rsync to git a while back, the test-snapshot step was dropped on the floor, and nobody noticed: the test-stability page had sat at 17 cumulative runs for so long that nobody questioned the number, and "old enough that nobody questions it" is a failure mode a verifier can't detect.

Fix 2: add the second podman-run step, with one wrinkle.
Unlike `snapshot-git-history` (which is pure git + filesystem), `snapshot-tests` calls `bun test`, which needs `node_modules` to resolve `marked` and `node-html-parser`. The bind-mounted host directory has no `node_modules` (the host has no Bun), but the image already ships them at `/app/node_modules`. So:

```bash
podman run --rm -v $HOME/src/tdd.md:/work:Z --workdir /work $IMAGE_TAG \
  sh -c 'ln -sfn /app/node_modules node_modules && bun scripts/p620/snapshot-tests.ts'
```

Symlink the container's `node_modules` into the work directory, then let the script use it. The symlink persists on the host between deploys, but it points at a path that exists only inside the container: a harmless dead link on the host, valid inside the next `podman run`.

## Two more bugs, surfaced by the snapshot actually running

When the next deploy ran with both snapshots wired in, the live page read:

```
Total: 193 tests · 192 passing · 1 failing · 1 placeholder ⚠
```

All 193 pass locally, every time I run them. In the container: 192 passing, 1 failing, 1 placeholder. Two bugs had been hiding behind "the test suite never actually ran in the deploy pipeline."

### Bug A: a 1-in-16 flaky test

The failing test was one I wrote in the prior round:

```ts
test("verifySession rejects a cookie with a forged signature", async () => {
  const cookie = await signSession("eve");
  const tampered = cookie.replace(/.$/, "0");
  const result = await verifySession(tampered);
  expect(result).toBeNull();
});
```

`replace(/.$/, "0")` replaces the last character with "0". When the HMAC signature's last hex digit *is already* "0" (probability 1/16, since SHA-256 hex output is uniformly distributed), the "tampered" string is identical to the original, the signature verifies, the function returns `"eve"`, and the assertion fails.

Local runs masked this because the varying input (the timestamp going into the signed payload) happened never to produce a `0`-ending signature. The first run that actually happened in CI hit the unlucky draw and exposed it.
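The 1-in-16 figure is easy to check empirically. A minimal sketch, assuming the signature is a hex-encoded HMAC-SHA256 as described; `randomSignature` is a hypothetical stand-in for whatever `signSession` actually emits, with random payload bytes standing in for the timestamp, and `naiveTamper` is the original `replace(/.$/, "0")`.

```typescript
import { createHmac, randomBytes } from "node:crypto";

// The original tamper: overwrite the last character with "0".
const naiveTamper = (s: string): string => s.replace(/.$/, "0");

// Hypothetical signature: hex HMAC-SHA256 over a random payload
// (the real cookie format is not shown; hex output is assumed).
function randomSignature(key: Buffer): string {
  return createHmac("sha256", key).update(randomBytes(16)).digest("hex");
}

const key = randomBytes(32);
const trials = 16_000;
let noOps = 0;
for (let i = 0; i < trials; i++) {
  const sig = randomSignature(key);
  // When the last hex digit is already "0", the "tampered" string is unchanged.
  if (naiveTamper(sig) === sig) noOps++;
}

// Expect a rate close to 1/16 = 0.0625; the exact value varies per run.
console.log((noOps / trials).toFixed(4));
```

Every one of those no-op cases is a run where the "forged" cookie is the genuine cookie, which is exactly the flake.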
Fix: read the last character and flip it to a hex digit it definitely isn't:

```ts
// Replace the last character with a digit guaranteed to differ from it.
const lastChar = cookie.slice(-1);
const tampered = cookie.slice(0, -1) + (lastChar === "f" ? "0" : "f");
expect(tampered).not.toBe(cookie); // loudly fail if a future regression collides
```

The tampered cookie now differs from the original by construction, so the outcome no longer depends on the draw. Five runs in a row, every one passes.

### Bug B: the verifier's own test, flagged by its own check

The placeholder warning pointed at:

```
src/c32_sama_verify.test.ts > does nothing
```

`c32_sama_verify.ts` is the verifier itself. Its test file holds a fixture:

```ts
test("Atomic: placeholder test (zero expect calls) is flagged", () => {
  const placeholderFixture = `test("does nothing", () => { /* TODO */ })`;
  // ... feed it to the verifier, assert the verifier flags it
});
```

The string `test("does nothing", () => { /* TODO */ })` is a *fixture*: a literal example of what a placeholder test looks like, fed to the verifier so we can assert that the verifier catches it. It's not a real test.

The verifier itself handles this correctly. It uses a `stripStringsAndComments` helper to mask out string literals and comments before running its `test()`-finder regex over the source. So when the verifier scans `c32_sama_verify.test.ts`, the fixture reads as whitespace, isn't picked up, and the file reports zero placeholders.

But `snapshot-tests.ts`, the deploy-time generator that feeds `/reports/live/tests`, duplicated the regex *without* the strip-strings step. So it grepped the raw source, found the fixture inside the backtick string, treated it as a real `test()` call, walked its TODO-only body, counted zero `expect()` calls, and flagged it. The deploy-time detector was flagging the very test that proves the verifier's detector works.

Fix: export `stripStringsAndComments` from `c32_sama_verify.ts` and use the same mask-index pattern in the snapshot script:

```ts
import { stripStringsAndComments } from "../../src/c32_sama_verify.ts";
// ...
const mask = stripStringsAndComments(content);
while ((m = re.exec(content)) !== null) {
  // If the match position is whitespace in the mask, the original
  // was inside a string or comment: skip it.
  if (mask[m.index] === " " || mask[m.index] === "\n") continue;
  // ... rest of the body-walking logic
}
```

This DRYs the helper across the two places that need the same string-aware behaviour, and the snapshot now agrees with the verifier.

## What the cascade was actually telling me

The bug count for round 4 looks bad: 12 days of staleness, a flaky test, a false positive in the deploy-time detector. Three independent problems. But the *order* is the part worth looking at. Each fix made the next one visible:

1. The deploy script ran the snapshot step → the file's timestamp moved → `/reports/live` started reporting current commits.
2. The deploy script ran the test snapshot → the tests actually ran in the deploy pipeline → the flaky test surfaced (because previously it never ran in CI), and so did the false positive (because previously the snapshot was 12 days old, and that particular fixture had been added since then).
3. Each fix's success was the precondition for the next bug becoming visible.

The cascade isn't proof that the system is fragile. It's proof that the system was *blind*: a layer of silent error suppression had hidden every downstream failure, so they accumulated undetected. The fix was less "patch three things" than "remove the lie and watch what falls out."

This is the same shape as TDD's iron rule, applied to *infrastructure* rather than to source: you can't trust a pass you didn't run. The deploy pipeline checks that `bun test` exits zero, but that check means something only if `bun test` *ran*. If the call returns 127 (command not found) and the deploy script swallows it, every later assertion is hollow. `/reports/live` showing all-green for 12 days was perfectly compatible with the test suite being completely broken. The only way to know is to delete the swallowing.
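The distinction between a check that failed and a check that never ran can be made explicit in whatever wraps the deploy steps. A minimal TypeScript sketch, assuming each step is a shell snippet run through Node's `spawnSync`; `classify` and `requirePass` are hypothetical helpers, not part of the author's (bash) deploy script. Exit statuses 126 and 127 are shell conventions, so a script that deliberately exits 127 would be misreported as not having run.

```typescript
import { spawnSync } from "node:child_process";

type StepOutcome = "passed" | "failed" | "did-not-run";

// Map a shell exit status to an outcome. 126 ("found but not executable") and
// 127 ("command not found") are shell conventions meaning the check never executed.
function classify(status: number | null): StepOutcome {
  if (status === 0) return "passed";
  if (status === 126 || status === 127) return "did-not-run";
  return "failed"; // any other non-zero status, or null (killed by a signal)
}

// Run a required check through sh; anything short of a real pass throws,
// so the pipeline stops instead of printing a warning and continuing.
function requirePass(name: string, script: string): void {
  const r = spawnSync("sh", ["-c", script], { stdio: "inherit" });
  const outcome = classify(r.status);
  if (outcome !== "passed") {
    throw new Error(`✗ ${name}: ${outcome} (exit status ${r.status})`);
  }
}

console.log(classify(0), classify(1), classify(127)); // passed failed did-not-run
```

Reporting "did-not-run" separately from "failed" keeps the 127 case from being misread as an ordinary test failure, and it can never be misread as a pass.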
## Why this is the empirical case for SAMA, not against it

A naive reading is "the codebase had three bugs you didn't catch." The fairer reading is that the codebase had *one* bug (silent error suppression in a deploy script), and the other two were latent consequences that the verifier *would have* caught the moment they ran. Removing the silence took ~15 minutes. Once it was gone, both hidden bugs surfaced *on the very next deploy*, with line numbers and file paths, in two cells of a public web page.

That's the empirical pattern SAMA's pitch turns on, scaled to the infrastructure layer:

- **Verification has to be observable.** A check whose errors vanish into `2>/dev/null`, behind a `||` that swallows its exit status, is indistinguishable from a check that passes.
- **The cost of removing silence is low.** Turning a `||` swallow into `|| { echo ✗; exit 1; }` is a one-line change. Turning `2>/dev/null` into `2>&1` is a one-word change.
- **Removing silence pays compounding returns.** Three bugs were hidden by one suppressor, and each would have been instantly diagnosable if the surface had been honest.

## What this still doesn't prove

It doesn't prove that exposing every failure produces a useful signal. Some failures *should* be tolerated (best-effort cleanup, optional caches), and over-strict failure handling can break production for trivial reasons. The judgement is about *which* failures. In this case, `snapshot-git-history` running was load-bearing for the public claim that `/reports/live` reflects the current repo, so treating its failure as "non-fatal" was a category error.

The general principle the cascade demonstrates: in a system whose value proposition is *the artefacts a reviewer can replay*, the pipeline that produces those artefacts has the same audit requirements as the source code. Silent failures in the pipeline violate that standard just as silent failures in the source would.
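The judgement about *which* failures to tolerate can be declared per step rather than left to each `||`. A minimal sketch, assuming a TypeScript wrapper around the deploy steps; the `Step` shape, `runPipeline`, and the `prune-old-images` step name are all hypothetical. Required steps abort with the real error; best-effort steps record a warning that carries the actual error text instead of a guessed cause like "script may live outside the rsync exclude".

```typescript
// Hypothetical step descriptor: `required` marks load-bearing steps whose failure
// must abort the deploy; best-effort steps (cleanup, optional caches) may fail.
interface Step {
  name: string;
  required: boolean;
  run: () => void; // throws on failure
}

function runPipeline(steps: Step[]): string[] {
  const warnings: string[] = [];
  for (const step of steps) {
    try {
      step.run();
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      if (step.required) {
        // Load-bearing step: surface the real error and stop.
        throw new Error(`✗ ${step.name} failed: ${detail}`);
      }
      // Tolerated failure: keep the real error text rather than a guess.
      warnings.push(`⚠ ${step.name} failed (best-effort): ${detail}`);
    }
  }
  return warnings;
}

const warnings = runPipeline([
  { name: "prune-old-images", required: false, run: () => { throw new Error("podman busy"); } },
  { name: "snapshot-git-history", required: true, run: () => {} },
]);
console.log(warnings.join("\n")); // ⚠ prune-old-images failed (best-effort): podman busy
```

Declaring the policy next to the step makes the "non-fatal" decision reviewable, which a bare `|| echo` buried in a shell line is not.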

---

**See for yourself:**

- Live: (date window is now current)
- Live: ("193 passing · 0 placeholder")
- The PR that landed the three fixes:
- Previous posts in this thread: [the c21 Atomic-700 split](/blog/2026-05/sama-empirical-c21-split) · [greening the Modeled dogfood](/blog/2026-05/sama-empirical-modeled-green)