
When the deploy lies: three bugs hidden by one silent error suppressor

The two prior posts in this thread were clean rounds: the verifier named a violation, I produced the named artifact, the verifier flipped green. Atomic-700 on c21_app.ts → split per domain → ✓. Modeled on four c32_*.ts files → add the four siblings → ✓. Encouraging stories about mechanical enforcement.

This post is the messy round. It's the one that taught me that mechanical enforcement only works if the pipeline that runs it is itself running.

The visible bug

/reports/live is the public live-data demo: real commit history for this repo, rendered into a TDD-discipline scorecard, refreshed on every deploy. On 2026-05-22 the header read:

tdd-discipline report · 2026-05-03 → 2026-05-10

Twelve days of staleness on a page that calls itself "live." I'd shipped seven commits across the previous rounds and none of them appeared.
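Twelve days is simply the gap between the header's end date and the day I looked, and that gap is cheap to check mechanically. A minimal sketch, assuming the report exposes its end date as an ISO `YYYY-MM-DD` string; `daysStale`, `assertFresh`, and the two-day threshold are hypothetical, not something the repo does:

```ts
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between the report's last covered date and "now".
// Date-only ISO strings such as "2026-05-10" are parsed as UTC midnight.
function daysStale(reportEnd: string, now: Date): number {
  const end = Date.parse(reportEnd);
  if (Number.isNaN(end)) throw new Error(`bad date: ${reportEnd}`);
  return Math.floor((now.getTime() - end) / MS_PER_DAY);
}

// Hypothetical deploy-time guard: fail loudly instead of serving a stale "live" page.
// maxDays is an assumed threshold, not a value taken from the repo.
function assertFresh(reportEnd: string, now: Date, maxDays = 2): void {
  const stale = daysStale(reportEnd, now);
  if (stale > maxDays) {
    throw new Error(`report ends ${reportEnd}, ${stale} days stale (max ${maxDays})`);
  }
}
```

`daysStale("2026-05-10", new Date("2026-05-22"))` returns 12, the staleness in the header above.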

Why nobody noticed for 12 days

The deploy script in git-mode invoked the snapshot generator over ssh:

```sh
ssh "$SSH_HOST" "cd ~/$REMOTE_SRC_DIR && bun scripts/p620/snapshot-git-history.ts" 2>/dev/null \
    || echo "  ⚠ snapshot-git-history skipped (script may live outside the rsync exclude — non-fatal)"
```

Two clauses are doing the damage:

  • 2>/dev/null discards stderr — including the error message we'd want.
  • || echo " ⚠ ... non-fatal" turns a real failure into a printed warning. Worse, the warning text blames the wrong thing ("script may live outside the rsync exclude") so anyone who DID see the warning would file it under "harmless artifact of rsync vs git mode" and move on.

The actual failure: there's no bun on the p620 host. Bun lives only inside the tdd-md container image. The ssh tried to invoke a binary that doesn't exist on PATH; the shell returned 127; the warning fired; the deploy continued; the snapshot file's timestamp stayed at May 11.
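Exit status 127 is the shell's conventional code for "command not found" (126 means found but not executable; a status above 128 usually means the command was killed by signal 128+n). The `|| echo` branch fires on any non-zero status, so the deploy printed the same misleading warning whatever the cause. A small sketch of those conventions; `describeExitStatus` is a hypothetical helper, not part of the repo:

```ts
// Conventional meanings of shell exit statuses (POSIX sh and bash).
function describeExitStatus(status: number): string {
  if (status === 0) return "success";
  if (status === 126) return "command found but not executable";
  if (status === 127) return "command not found";
  // Shells report a command killed by signal n as 128 + n.
  if (status > 128 && status < 256) return `killed by signal ${status - 128}`;
  return `failed with status ${status}`;
}
```

Fed the status the missing-bun run returned, `describeExitStatus(127)` gives "command not found": the diagnosis that `2>/dev/null` threw away and the warning text misattributed.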

Twelve days. Every deploy. Both of the previous "clean rounds" deployed through this same broken path and updated the site but not the live data. The blog posts about going green were themselves served by a deploy script that was lying about its own snapshot step.

Fix 1, and what it revealed

The fix is structurally trivial: run the script inside the container where bun lives, by mounting the working tree as a volume:

```sh
ssh "$SSH_HOST" "podman run --rm \
  -v \$HOME/$REMOTE_SRC_DIR:/work:Z \
  --workdir /work \
  $IMAGE_TAG \
  bun scripts/p620/snapshot-git-history.ts" \
    || { echo '✗ snapshot-git-history failed'; exit 1; }
```

The :Z option asks podman to relabel the bind mount for SELinux (enforcing by default on Fedora), so the process inside the container can read and write it. The || { echo ✗; exit 1; } replaces the swallow with a real failure mode. No more silent skips.
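The same fail-loud policy can be written in TypeScript for deploy tooling that lives in a script rather than in shell. A sketch, assuming Node's (or Bun's) `node:child_process`; `runRequired` is a hypothetical helper, not from the repo. Because no shell is involved, a binary missing from PATH shows up as `result.error` (ENOENT) rather than as exit status 127:

```ts
import { spawnSync } from "node:child_process";

// Run one required deploy step. stderr is inherited (never discarded), and any
// failure throws: the analogue of `cmd || { echo '✗ ...'; exit 1; }`.
function runRequired(label: string, cmd: string, args: string[]): void {
  const result = spawnSync(cmd, args, { stdio: ["ignore", "inherit", "inherit"] });
  if (result.error) {
    // The spawn itself failed, e.g. ENOENT when the binary is not on PATH.
    throw new Error(`✗ ${label}: ${result.error.message}`);
  }
  if (result.status !== 0) {
    // status is null when the child was terminated by a signal.
    const why = result.status === null ? `signal ${result.signal}` : `exit ${result.status}`;
    throw new Error(`✗ ${label} failed (${why})`);
  }
}
```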

After this fix landed, /reports/live immediately caught up:

tdd-discipline report · 2026-05-03 → 2026-05-22

So far so good. But the moment I looked at /reports/live/tests, the sibling test-stability page, the timestamp said:

last run 2026-05-10 · 17 runs cumulative

Same staleness. Different cause.

The second silent failure

Looking at the deploy script again, I found that the rsync escape hatch runs both snapshot scripts:

```sh
( cd "$REPO_ROOT" && bun scripts/p620/snapshot-git-history.ts ) || ...
( cd "$REPO_ROOT" && bun scripts/p620/snapshot-tests.ts )         || ...
```

The git-mode happy path runs only the first one. When the deploy flow switched from rsync to git as the default a while back, the test-snapshot step got dropped on the floor and nobody noticed, because the test-stability page had read "17 runs cumulative" for so long that nobody questioned the number. "Old enough that nobody questions it" is one of the failure modes a verifier can't detect.
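One way to keep the two paths from drifting apart again is to declare each mode's steps as data and assert that they match. A sketch only: the step names mirror the scripts above, but `missingSteps` and the idea of a step table are hypothetical, not how the repo's deploy script is structured:

```ts
type ModeSteps = Record<string, string[]>;

// For each deploy mode, list the steps that some other mode runs but this one doesn't.
// An empty result means every mode runs the same set of steps.
function missingSteps(modes: ModeSteps): ModeSteps {
  const union = new Set<string>();
  for (const steps of Object.values(modes)) steps.forEach((s) => union.add(s));
  const missing: ModeSteps = {};
  for (const [mode, steps] of Object.entries(modes)) {
    const gaps = Array.from(union).filter((s) => !steps.includes(s));
    if (gaps.length > 0) missing[mode] = gaps;
  }
  return missing;
}
```

Run against the situation described above, `{ rsync: [both snapshots], git: [git-history only] }` reports `{ git: ["snapshot-tests"] }`, which a deploy-time check could turn into a hard failure.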

Fix 2: add the second podman-run step, with one wrinkle. Unlike snapshot-git-history (which is pure git + filesystem), snapshot-tests calls bun test, which needs node_modules to resolve marked and node-html-parser. The bind-mounted host directory has no node_modules (the host has no Bun). But the image already ships them at /app/node_modules. So:

```sh
podman run --rm -v $HOME/src/tdd.md:/work:Z --workdir /work $IMAGE_TAG \
  sh -c 'ln -sfn /app/node_modules node_modules && bun scripts/p620/snapshot-tests.ts'
```

Symlink the image's node_modules into the work directory, then run the script. Because /work is a bind mount, the symlink persists on the host between deploys, but it points at a path that only exists inside the container: a harmless dangling link on the host, valid inside each podman run.
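Both containerised steps share one shape and differ only in whether the script needs the image's node_modules. A sketch of that shape as an argv builder, assuming the host directory and image tag are passed in; `podmanSnapshotArgs` is a hypothetical name. Building the argument list in one place keeps the `:Z` mount and workdir from drifting between steps (sent over ssh, the array would still need shell-quoting into one remote command string):

```ts
// Hypothetical helper: argv for one containerised snapshot step, mirroring
// the podman commands above. srcDir is the host checkout, imageTag the image.
function podmanSnapshotArgs(
  srcDir: string,
  imageTag: string,
  script: string,
  needsDeps: boolean,
): string[] {
  // Scripts that call `bun test` need the image's node_modules, so link them
  // into the bind-mounted work tree first; pure git scripts run bun directly.
  const cmd = needsDeps
    ? ["sh", "-c", `ln -sfn /app/node_modules node_modules && bun ${script}`]
    : ["bun", script];
  return [
    "podman", "run", "--rm",
    "-v", `${srcDir}:/work:Z`, // :Z asks podman to relabel the mount for SELinux
    "--workdir", "/work",
    imageTag,
    ...cmd,
  ];
}
```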

Two more bugs, surfaced by the snapshot actually running

When the next deploy ran with both snapshots wired in, the live page now read:

Total: 193 tests · 192 passing · 1 failing · 1 placeholder ⚠

193 pass locally, every time I run them. In the container: 192 pass, 1 fails, 1 is flagged as a placeholder. Two bugs that had been hiding behind "the test suite never actually ran in the deploy pipeline."

Bug A: a 1-in-16 flaky test

The failing test was one I wrote in the prior round:

```ts
test("verifySession rejects a cookie with a forged signature", async () => {
  const cookie = await signSession("eve");
  const tampered = cookie.replace(/.$/, "0");
  const result = await verifySession(tampered);
  expect(result).toBeNull();
});
```

replace(/.$/, "0") replaces the last character with "0". When the HMAC signature's last hex digit is already "0" — which happens with probability 1/16, since SHA-256 hex output is uniform — the "tampered" string is identical to the original, the signature verifies, the function returns "eve", and the assertion fails.

Local runs masked this because the random draws (the timestamp going into the signed payload) happened never to produce a signature ending in 0. The first time the suite ran in the deploy pipeline, it hit the unlucky draw and exposed the bug.
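The arithmetic behind "masked locally" is worth spelling out. Assuming each run signs a fresh payload, so the last hex digit is an independent uniform draw, the chance that n runs all pass is (15/16)^n. A sketch; `pAllPass` is just an illustrative name:

```ts
// Chance that the last hex digit of a uniformly random signature is "0".
const P_COLLIDE = 1 / 16;

// Probability that n independent runs all avoid the collision,
// i.e. that the flaky test passes every single time.
function pAllPass(n: number): number {
  return Math.pow(1 - P_COLLIDE, n);
}
```

Ten clean local runs happen about half the time (`pAllPass(10)` is roughly 0.52), so a handful of green local runs says little; fifty clean runs in a row would be under a 5% chance.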

Fix: read the last character and substitute a hex digit it definitely isn't:

```ts
const lastChar = cookie.slice(-1);
const tampered = cookie.slice(0, -1) + (lastChar === "f" ? "0" : "f");
expect(tampered).not.toBe(cookie);  // loudly fail if a future regression collides
```

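The tampered cookie now differs from the original by construction, so the result no longer depends on the draw. Five runs in a row, every one passes. Determinism restored.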
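A self-contained reproduction of both the bug and the fix, assuming a hypothetical `<user>.<hex signature>` cookie format built on Node's `createHmac`. The post's real `signSession`/`verifySession` are async and may encode the cookie differently; `sign`, `verify`, and `tamperLastChar` are illustrative names:

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical cookie format: "<user>.<hex HMAC-SHA256 of user>".
function sign(user: string, secret: string): string {
  const sig = createHmac("sha256", secret).update(user).digest("hex");
  return `${user}.${sig}`;
}

// Returns the user if the signature matches, otherwise null.
function verify(cookie: string, secret: string): string | null {
  const dot = cookie.lastIndexOf(".");
  if (dot < 0) return null;
  const user = cookie.slice(0, dot);
  const given = Buffer.from(cookie.slice(dot + 1), "utf8");
  const expected = Buffer.from(createHmac("sha256", secret).update(user).digest("hex"), "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  return user;
}

// The fix: always substitute a character that differs from the original last char.
function tamperLastChar(cookie: string): string {
  const last = cookie.slice(-1);
  return cookie.slice(0, -1) + (last === "f" ? "0" : "f");
}
```

Across a few hundred users, the old `replace(/.$/, "0")` leaves some cookies unchanged (those whose signature already ends in "0"), while `tamperLastChar` never does and `verify` rejects every tampered cookie.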

Bug B: the verifier's own test, flagged by its own check

The placeholder warning pointed at:

src/c32_sama_verify.test.ts > does nothing

c32_sama_verify.ts is the verifier itself. Its test file holds a fixture:

```ts
test("Atomic: placeholder test (zero expect calls) is flagged", () => {
  const placeholderFixture = `test("does nothing", () => { /* TODO */ })`;
  // ... feed it to the verifier, assert the verifier flags it
});
```

The string test("does nothing", () => { /* TODO */ }) is a fixture — a literal example of what a placeholder test looks like, fed to the verifier so we can assert the verifier catches it. It's not a real test.

The verifier itself handles this correctly. It uses a stripStringsAndComments helper to mask out string literals before running its test()-finder regex over the source. So when the verifier scans c32_sama_verify.test.ts, it sees the fixture as whitespace, doesn't pick it up, and reports zero placeholders in that file.

But snapshot-tests.ts — the deploy-time generator that feeds /reports/live/tests — duplicated the regex without the strip-strings step. So it grepped the raw source, found the fixture inside the backtick string, treated it as a real test() call, walked its (TODO-only) body, counted zero expect() calls, and flagged it.
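What "mask out string literals" means in practice: build a copy of the source with the same length, where everything inside strings and comments is blanked (newlines kept), so a match index in the original can be looked up in the mask. The sketch below is a simplified, hypothetical stand-in for `stripStringsAndComments` (it ignores `${...}` interpolation and regex literals); the real helper in c32_sama_verify.ts may differ:

```ts
// Same-length copy of src with comment and string-literal characters
// replaced by spaces; newlines are preserved so indices stay aligned.
function maskStringsAndComments(src: string): string {
  const out = src.split("");
  const blank = (from: number, to: number) => {
    for (let k = from; k < to; k++) if (out[k] !== "\n") out[k] = " ";
  };
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    const next = src[i + 1];
    let stop = i + 1;
    if (c === "/" && next === "/") {
      const end = src.indexOf("\n", i);
      stop = end === -1 ? src.length : end; // line comment runs to end of line
      blank(i, stop);
    } else if (c === "/" && next === "*") {
      const end = src.indexOf("*/", i + 2);
      stop = end === -1 ? src.length : end + 2; // block comment
      blank(i, stop);
    } else if (c === '"' || c === "'" || c === "`") {
      let j = i + 1;
      while (j < src.length && src[j] !== c) {
        if (src[j] === "\\") j++; // skip the escaped character
        j++;
      }
      stop = Math.min(j + 1, src.length); // include the closing quote
      blank(i, stop);
    }
    i = stop;
  }
  return out.join("");
}

// Count test( calls that start in real code, not inside strings or comments.
function countRealTestCalls(src: string): number {
  const mask = maskStringsAndComments(src);
  const re = /\btest\s*\(/g;
  let m: RegExpExecArray | null;
  let count = 0;
  while ((m = re.exec(src)) !== null) {
    if (mask[m.index] === " " || mask[m.index] === "\n") continue;
    count++;
  }
  return count;
}
```

On a file holding one real test, the placeholder fixture inside a template literal, and a commented-out test, the naive regex finds three `test(` calls while `countRealTestCalls` finds one.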

The deploy-time detector was flagging the very test that proves the runtime detector works.

Fix: export stripStringsAndComments from c32_sama_verify.ts and use the same mask-index pattern in the snapshot script:

```ts
import { stripStringsAndComments } from "../../src/c32_sama_verify.ts";
// ...
// `re` is the snapshot script's existing global test()-finder regex.
const mask = stripStringsAndComments(content);
let m: RegExpExecArray | null;
while ((m = re.exec(content)) !== null) {
  // If the match position is whitespace in the mask, the original
  // was inside a string or comment: skip.
  if (mask[m.index] === " " || mask[m.index] === "\n") continue;
  // ... rest of the body-walking logic
}
```

This DRYs the helper across the two places that need the same string-aware behaviour. Now the snapshot agrees with the verifier.

What the cascade was actually telling me

The bug count for round 4 looks bad: 12 days of staleness, a flaky test, and a false positive in the deploy-time detector. Three independent problems.

But the order is the part worth looking at. Each fix made the next one visible:

  1. Deploy script ran the snapshot step → file's timestamp moved → /reports/live started reporting current commits.
  2. Deploy script ran the test snapshot → tests actually ran in the deploy pipeline → the flaky test surfaced (because previously it never ran in CI), and the false-positive surfaced (because previously the snapshot was 12 days old and that particular fixture had been added since then).
  3. Each fix's success was the precondition for the next bug to be visible.

The cascade isn't proof the system is fragile. It's proof that the system was blind — a layer of silent error suppression had hidden every downstream failure, so they accumulated without being detected. The fix was less "patch three things" than "remove the lie and watch what falls out."

This is the same shape as TDD's iron rule applied to infrastructure rather than to source: you can't trust a pass you didn't run. The deploy pipeline checks that bun test exits zero, but that check only means something if bun test actually ran. If the call returns 127 (command not found) and the deploy script swallows it, every later assertion is hollow.

/reports/live showing all-green for 12 days was perfectly compatible with the test suite being completely broken. The only way to know is to delete the swallowing.
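The distinction between a pass and a check that never ran can be made explicit. A sketch, assuming the pipeline can read both the runner's exit code and how many tests it reported; `classifyTestStep` and its verdict names are hypothetical, and only `"pass"` should let a deploy proceed:

```ts
type Verdict = "pass" | "fail" | "not-run" | "empty";

// A zero exit only counts as a pass if the runner reported executing tests.
function classifyTestStep(exitCode: number, testsReported: number): Verdict {
  if (exitCode === 126 || exitCode === 127) return "not-run"; // runner missing or not executable
  if (exitCode !== 0) return "fail";
  return testsReported > 0 ? "pass" : "empty";
}
```

A failure swallowed by `|| echo` looks like exit 0 with zero tests reported, which lands in `"empty"` rather than `"pass"`.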

Why this is the empirical case for SAMA, not against it

A naive reading is "the codebase had three bugs you didn't catch." The fairer reading is: the codebase had one bug — silent error suppression in a deploy script — and the other two were latent consequences that the verifier would have caught the moment they ran. Removing the silence took ~15 minutes. Once silence was gone, both hidden bugs surfaced on the very next deploy, with line numbers and file paths, in two cells of a public web page.

That's the empirical pattern SAMA's pitch turns on, scaled to the infrastructure layer:

  • Verification has to be observable. A check that runs into 2>/dev/null is indistinguishable from a check that passes.
  • The cost of removing silence is low. A || swallow → || { echo ✗; exit 1; } is a one-line change. A 2>/dev/null → 2>&1 is one word.
  • Removing silence pays compounding returns. Three bugs hidden by one suppressor — each one would have been instantly diagnosable if the surface had been honest.

What this still doesn't prove

It doesn't prove that exposing every failure produces a useful signal. Some failures should be tolerated (best-effort cleanup, optional caches), and over-strict failure handling can break production for trivial reasons. The judgement call is which failures to treat as fatal: in this case, snapshot-git-history running was load-bearing for the public claim that /reports/live reflects the current repo. Treating its failure as "non-fatal" was a category error.

The general principle the cascade demonstrates: in a system whose value proposition is the artefacts a reviewer can replay, the pipeline that produces those artefacts has the same audit requirements as the source code does. Silent failures in the pipeline are violations of the standard the same way silent failures in the source would be.


See for yourself: