# When the deploy lies: three bugs hidden by one silent error suppressor

The two prior posts in this thread were clean rounds: the verifier
named a violation, I produced the named artifact, the verifier flipped
green. Atomic-700 on `c21_app.ts` → split per domain → ✓. Modeled on
four `c32_*.ts` files → add the four siblings → ✓. Encouraging stories
about mechanical enforcement.

This post is the messy round. It's the one that taught me that
mechanical enforcement only works if the pipeline that runs it is
itself running.

## The visible bug

`/reports/live` is the public live-data demo: real commit history for
this repo, rendered into a TDD-discipline scorecard, refreshed on every
deploy. On 2026-05-22 the header read:

```
tdd-discipline report · 2026-05-03 → 2026-05-10
```

Twelve days of staleness on a page that calls itself "live." I'd
shipped seven commits across the previous rounds and none of them
appeared.
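The staleness figure is the gap between the end of the report window and the day I looked. Here is a small sketch of that arithmetic, plus a hypothetical `isStale` guard with an assumed two-day threshold. Nothing like this guard existed in the deploy at the time.

```ts
// Whole days between two ISO dates (YYYY-MM-DD), computed in UTC so the
// local timezone and DST cannot shift the result.
function daysBetween(fromIso: string, toIso: string): number {
  const toUtcMs = (iso: string): number => {
    const [y, m, d] = iso.split("-").map(Number);
    return Date.UTC(y, m - 1, d);
  };
  return Math.round((toUtcMs(toIso) - toUtcMs(fromIso)) / 86_400_000);
}

// Hypothetical guard (threshold is an assumption): a "live" page whose data
// window ended more than `maxDays` ago should count as broken, not just old.
function isStale(windowEndIso: string, todayIso: string, maxDays = 2): boolean {
  return daysBetween(windowEndIso, todayIso) > maxDays;
}

// The header's window end vs. the day the staleness was noticed.
const gap = daysBetween("2026-05-10", "2026-05-22");
console.log(`window ended ${gap} days ago; stale: ${isStale("2026-05-10", "2026-05-22")}`);
```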

## Why nobody noticed for 12 days

The deploy script in git-mode invoked the snapshot generator over ssh:

```bash
ssh "$SSH_HOST" "cd ~/$REMOTE_SRC_DIR && bun scripts/p620/snapshot-git-history.ts" 2>/dev/null \
    || echo "  ⚠ snapshot-git-history skipped (script may live outside the rsync exclude — non-fatal)"
```

Two clauses are doing the damage:

- `2>/dev/null` discards stderr — including the error message we'd want.
- `|| echo "  ⚠ ... non-fatal"` turns a real failure into a printed
  warning. Worse, the warning text *blames the wrong thing*
  ("script may live outside the rsync exclude") so anyone who DID see
  the warning would file it under "harmless artifact of rsync vs git
  mode" and move on.

The actual failure: there's no `bun` on the p620 host. Bun lives only
inside the tdd-md container image. The remote shell was asked to run a
binary that isn't on PATH, so it exited 127 ("command not found"); ssh
passed that status back; the warning fired; the deploy continued; the
snapshot file's timestamp stayed at May 11.
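Here is what that one line threw away. Run through a shell, a binary missing from PATH yields exit status 127 and a "not found" message on stderr. ssh hands the remote status back to the caller. The sketch below uses Node's `child_process.spawnSync` and keeps both the status and the message. `runStep` is a hypothetical helper, not part of the deploy script, and the missing command name is made up.

```ts
import { spawnSync } from "node:child_process";

interface StepResult {
  status: number | null; // exit status; null if the process died from a signal
  stderr: string;        // kept, where `2>/dev/null` would have discarded it
}

// Run a command string through `sh -c`, roughly how ssh hands a command to
// the remote shell, and keep both the exit status and stderr.
function runStep(command: string): StepResult {
  const r = spawnSync("sh", ["-c", command], { encoding: "utf8" });
  return { status: r.status, stderr: r.stderr ?? "" };
}

// A binary missing from PATH: status 127 plus a "not found" message.
const missing = runStep("definitely-not-installed-xyz --version");
console.log(missing.status, JSON.stringify(missing.stderr.trim()));
```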

Twelve days. Every deploy. Both of the previous "clean rounds" deployed
through this same broken path and updated the *site* but not the
*live data*. The blog posts about going green were themselves served by
a deploy script that was lying about its own snapshot step.

## Fix 1, and what it revealed

The fix is structurally trivial: run the script *inside* the container
where bun lives, by mounting the working tree as a volume:

```bash
ssh "$SSH_HOST" "podman run --rm \
  -v \$HOME/$REMOTE_SRC_DIR:/work:Z \
  --workdir /work \
  $IMAGE_TAG \
  bun scripts/p620/snapshot-git-history.ts" \
    || { echo '✗ snapshot-git-history failed'; exit 1; }
```

The `:Z` suffix tells Podman to relabel the bind mount with a private
SELinux context. On an SELinux-enforcing host like Fedora, the process
inside the container can't read or write the mount without it. The
`|| { echo ✗; exit 1; }` replaces the swallow with a real failure mode.
No more silent skips.
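The same `podman run` invocation is easier to read as an argument vector, one flag per element. This is a sketch: `podmanSnapshotArgs` is a hypothetical helper, and the paths and image tag in the usage line are placeholders, not the real deploy values. `--rm`, `-v host:container:Z` and `--workdir` are standard `podman run` options.

```ts
// Build `podman run` arguments that execute a Bun script inside the image,
// against the host checkout bind-mounted at /work.
function podmanSnapshotArgs(hostDir: string, imageTag: string, script: string): string[] {
  return [
    "run",
    "--rm",                     // remove the container when it exits
    "-v", `${hostDir}:/work:Z`, // bind mount; :Z relabels it for SELinux
    "--workdir", "/work",       // run from the mounted checkout
    imageTag,                   // options must come before the image
    "bun", script,              // bun exists only inside the image
  ];
}

// Placeholder paths and tag, for illustration only.
const args = podmanSnapshotArgs("/srv/checkout", "example/image:tag", "scripts/p620/snapshot-git-history.ts");
console.log(["podman", ...args].join(" "));
```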

After this fix landed, `/reports/live` immediately caught up:

```
tdd-discipline report · 2026-05-03 → 2026-05-22
```

So far so good. But the moment I looked at `/reports/live/tests`, the
sibling test-stability page, the timestamp said:

```
last run 2026-05-10 · 17 runs cumulative
```

Same staleness. Different cause.

## The second silent failure

Looking at the deploy script again, the **rsync** escape hatch runs
both snapshot scripts:

```bash
( cd "$REPO_ROOT" && bun scripts/p620/snapshot-git-history.ts ) || ...
( cd "$REPO_ROOT" && bun scripts/p620/snapshot-tests.ts )         || ...
```

The **git-mode** happy path runs only the first one. When the deploy
flow switched its default from rsync to git a while back, the
test-snapshot step got dropped on the floor. Nobody noticed, because
the test-stability page kept showing the same 17 cumulative runs, and
"old enough that nobody questioned the number" is a failure mode that
a verifier can't detect.
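One way to stop the two modes drifting apart again is to derive both from a single list of snapshot scripts. This is a hypothetical sketch, not how the deploy script is structured. `SNAPSHOT_SCRIPTS`, `Runner` and `snapshotCommands` are illustrative names. Only the two script paths come from the script itself.

```ts
// Single source of truth: every deploy mode runs every snapshot script.
const SNAPSHOT_SCRIPTS: readonly string[] = [
  "scripts/p620/snapshot-git-history.ts",
  "scripts/p620/snapshot-tests.ts",
];

// A mode decides *how* to run a script (locally, in a container, over ssh),
// never *which* scripts to run.
type Runner = (script: string) => string;

function snapshotCommands(run: Runner): string[] {
  return SNAPSHOT_SCRIPTS.map((script) => run(script));
}

// Hypothetical runners for the two modes; the strings are illustrative.
const rsyncMode: Runner = (s) => `bun ${s}`;
const gitMode: Runner = (s) => `in-container: bun ${s}`;

console.log(snapshotCommands(rsyncMode));
console.log(snapshotCommands(gitMode));
```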

Fix 2: add the second podman-run step, with one wrinkle. Unlike
`snapshot-git-history` (which is pure git + filesystem), `snapshot-tests`
calls `bun test`, which needs `node_modules` to resolve `marked` and
`node-html-parser`. The bind-mounted host directory has no
`node_modules`, because the host has no Bun to install them with. But
the image already ships them at `/app/node_modules`. So:

```bash
podman run --rm -v $HOME/src/tdd.md:/work:Z --workdir /work $IMAGE_TAG \
  sh -c 'ln -sfn /app/node_modules node_modules && bun scripts/p620/snapshot-tests.ts'
```

Symlink the container's `node_modules` into the work directory, then
let the script use it. The symlink persists on the host between
deploys, but it points at a path that exists only inside the
container. On the host it is a harmless dangling link; inside the next
`podman run` it resolves.
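The dangling-on-the-host behaviour is easy to check. The sketch below uses Node's `fs` module to approximate `ln -sfn` for this case: it replaces an existing symlink and refuses to touch anything else. It then shows that the link is present but unresolvable outside the container. `forceSymlink` is a hypothetical helper. The temp directory stands in for the host checkout, and `/app/node_modules` is the image path from above, assumed absent on the machine running the sketch.

```ts
import { existsSync, lstatSync, mkdtempSync, readlinkSync, symlinkSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Roughly `ln -sfn target linkPath` for this use: replace an existing symlink.
// Unlike ln, refuse to remove anything that is not already a symlink.
function forceSymlink(target: string, linkPath: string): void {
  const existing = lstatSync(linkPath, { throwIfNoEntry: false });
  if (existing) {
    if (!existing.isSymbolicLink()) {
      throw new Error(`${linkPath} exists and is not a symlink`);
    }
    unlinkSync(linkPath);
  }
  symlinkSync(target, linkPath); // the target does not need to exist
}

// Simulate the host side: a checkout with no node_modules of its own.
const workDir = mkdtempSync(join(tmpdir(), "work-"));
const link = join(workDir, "node_modules");
forceSymlink("/app/node_modules", link);
forceSymlink("/app/node_modules", link); // a second deploy replaces it, no EEXIST

// lstat sees the link itself; existsSync follows it, so on a host without
// /app/node_modules it reports false (a dangling link).
console.log(lstatSync(link).isSymbolicLink(), readlinkSync(link), existsSync(link));
```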

## Two more bugs, surfaced by the snapshot actually running

When the next deploy ran with both snapshots wired in, the live page
now read:

```
Total: 193 tests · 192 passing · 1 failing · 1 placeholder ⚠
```

193 pass locally, every time I run them. 192 pass + 1 fail + 1
placeholder on the container. Two bugs that had been hiding behind
"the test suite never actually ran in the deploy pipeline."

### Bug A: a 1-in-16 flaky test

The failing test was one I wrote in the prior round:

```ts
test("verifySession rejects a cookie with a forged signature", async () => {
  const cookie = await signSession("eve");
  const tampered = cookie.replace(/.$/, "0");
  const result = await verifySession(tampered);
  expect(result).toBeNull();
});
```

`replace(/.$/, "0")` replaces the last character with "0". When the
HMAC signature's last hex digit *is already* "0", which happens with
probability 1/16 because HMAC-SHA256 output is effectively uniform, the
"tampered" string is identical to the original. The signature
verifies, the function returns `"eve"`, and the assertion fails.

Local runs masked this because the varying input (the timestamp in
the signed payload) happened never to produce a `0`-ending signature.
The first run inside the deploy pipeline hit the unlucky draw and
exposed it.
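Both numbers can be checked directly. The sketch below uses Node's `crypto.createHmac` with a made-up key and payload format. The real `signSession` secret and format aren't shown here, and the cookie is assumed to end in the hex digest. It measures how often the digest already ends in `0`, which is when the regex "tamper" is a no-op. It also computes how likely a streak of clean local runs is: (15/16)^10 is roughly 0.52, so ten clean runs in a row happen about half the time.

```ts
import { createHmac } from "node:crypto";

// Assumed stand-in for the real signer: hex-encoded HMAC-SHA256 of a payload.
function hexSignature(payload: string, key = "example-key"): string {
  return createHmac("sha256", key).update(payload).digest("hex");
}

// The original tampering step: force the last character to "0".
const tamperToZero = (s: string): string => s.replace(/.$/, "0");

// How often is that "tampering" a no-op? Vary the payload like a timestamp would.
const N = 16_000;
let noOps = 0;
for (let i = 0; i < N; i++) {
  const sig = hexSignature(`eve|${1_700_000_000 + i}`); // hypothetical payload format
  if (tamperToZero(sig) === sig) noOps++;
}
const rate = noOps / N;

// Chance that k consecutive runs all avoid the collision.
const allClean = (k: number): number => (15 / 16) ** k;

console.log(rate.toFixed(4), allClean(10).toFixed(2));
```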

Fix: read the last char, flip to a digit it definitely isn't:

```ts
const lastChar = cookie.slice(-1);
const tampered = cookie.slice(0, -1) + (lastChar === "f" ? "0" : "f");
expect(tampered).not.toBe(cookie);  // loudly fail if a future regression collides
```

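Five runs in a row pass. That alone would be weak evidence for a
1-in-16 bug. The real guarantee is structural: the tampered cookie now
differs from the original by construction, so the test no longer
depends on the draw at all.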
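The by-construction claim can be checked exhaustively. The sketch below wraps the fix's one-liner in a hypothetical `tamperLastChar` helper and tries every possible final hex digit. The guarantee holds for any last character, not just hex: "f" maps to "0" and everything else maps to "f".

```ts
// The fix as a function: "f" becomes "0", anything else becomes "f".
function tamperLastChar(cookie: string): string {
  const lastChar = cookie.slice(-1);
  return cookie.slice(0, -1) + (lastChar === "f" ? "0" : "f");
}

// Exhaustive check over every possible last hex digit.
const HEX = "0123456789abcdef";
const allDiffer = [...HEX].every((c) => tamperLastChar(`sig${c}`) !== `sig${c}`);
console.log(allDiffer);
```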

### Bug B: the verifier's own test, flagged by its own check

The placeholder warning pointed at:

```
src/c32_sama_verify.test.ts > does nothing
```

`c32_sama_verify.ts` is the verifier itself. Its test file holds a
fixture:

```ts
test("Atomic: placeholder test (zero expect calls) is flagged", () => {
  const placeholderFixture = `test("does nothing", () => { /* TODO */ })`;
  // ... feed it to the verifier, assert the verifier flags it
});
```

The string `test("does nothing", () => { /* TODO */ })` is a *fixture*
— a literal example of what a placeholder test looks like, fed to the
verifier so we can assert the verifier catches it. It's not a real
test.

The verifier itself handles this correctly. It uses a
`stripStringsAndComments` helper to blank out string literals and
comments before running its `test()`-finder regex over the source. So when the
verifier scans `c32_sama_verify.test.ts`, it sees the fixture as
whitespace, doesn't pick it up, and reports zero placeholders in that
file.

But `snapshot-tests.ts` — the deploy-time generator that feeds
`/reports/live/tests` — duplicated the regex *without* the
strip-strings step. So it grepped the raw source, found the fixture
inside the backtick string, treated it as a real `test()` call, walked
its (TODO-only) body, counted zero `expect()` calls, and flagged it.

The deploy-time detector was flagging the very test that proves the
runtime detector works.

Fix: export `stripStringsAndComments` from `c32_sama_verify.ts` and
use the same mask-index pattern in the snapshot script:

```ts
import { stripStringsAndComments } from "../../src/c32_sama_verify.ts";
// ...
const mask = stripStringsAndComments(content);
// `re` is the test()-finder regex. It must carry the /g flag, or exec()
// restarts at index 0 on every call and the loop never ends.
let m: RegExpExecArray | null;
while ((m = re.exec(content)) !== null) {
  // If the match position is whitespace in the mask, the original
  // was inside a string or comment, so skip it.
  if (mask[m.index] === " " || mask[m.index] === "\n") continue;
  // ... rest of the body-walking logic
}
```

That DRYs the helper across the two places that need the same
string-aware behaviour, and the snapshot now agrees with the verifier.
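The mask-index idea is easier to trust with a toy version in front of you. The sketch below is not the repo's `stripStringsAndComments`. `maskStringsAndComments` and `findRealTests` are simplified stand-ins written for illustration. They handle `"`, `'` and backtick strings plus `//` and `/* */` comments, but not regex literals or `${}` inside template strings. The property that matters: the mask has the same length as the source and keeps its newlines, so a match index in the raw text can be looked up directly in the mask.

```ts
// Simplified stand-in: blank out strings and comments, keeping length and newlines.
function maskStringsAndComments(src: string): string {
  const out = src.split("");
  const blank = (from: number, to: number): void => {
    for (let k = from; k < to; k++) if (out[k] !== "\n") out[k] = " ";
  };
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    let stop = i + 1;
    if (c === "/" && src[i + 1] === "/") {
      const end = src.indexOf("\n", i);          // line comment runs to end of line
      stop = end === -1 ? src.length : end;
      blank(i, stop);
    } else if (c === "/" && src[i + 1] === "*") {
      const end = src.indexOf("*/", i + 2);      // block comment
      stop = end === -1 ? src.length : end + 2;
      blank(i, stop);
    } else if (c === '"' || c === "'" || c === "`") {
      let j = i + 1;                             // scan to the matching quote
      while (j < src.length && src[j] !== c) j += src[j] === "\\" ? 2 : 1;
      stop = Math.min(j + 1, src.length);
      blank(i, stop);
    }
    i = stop;
  }
  return out.join("");
}

// Names of test() calls that are real code, not text inside a string or comment.
function findRealTests(src: string): string[] {
  const mask = maskStringsAndComments(src);
  const re = /\btest\(\s*["'`]([^"'`]*)["'`]/g; // /g is required for the exec() loop
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(src)) !== null) {
    if (mask[m.index] === " ") continue; // match starts inside a string or comment
    names.push(m[1]);
  }
  return names;
}
```

Run on the verifier's own test, the real `test(` call has a visible mask character at its index. The fixture's `test(` sits inside a backtick string and lands on a blank, so only the real test is reported.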

## What the cascade was actually telling me

The bug count for round 4 looks bad: a 12-day staleness, a flaky test,
a false positive in the deploy-time detector. Three independent
problems.

But the *order* is the part worth looking at. Each fix made the next
one visible:

1. Deploy script ran the snapshot step → file's timestamp moved →
   `/reports/live` started reporting current commits.
2. Deploy script ran the test snapshot → tests actually ran in the
   deploy pipeline → the flaky test surfaced (because previously it
   never ran there), and the false positive surfaced (because
   previously the snapshot was 12 days old and that particular
   fixture had been added since then).
3. Each fix's success was the precondition for the next bug to be
   visible.

The cascade isn't proof the system is fragile. It's proof that the
system was *blind* — a layer of silent error suppression had hidden
every downstream failure, so they accumulated without being detected.
The fix was less "patch three things" than "remove the lie and watch
what falls out."

This is the same shape as TDD's iron rule, applied to *infrastructure*
rather than to source: you can't trust a pass you didn't run. The
deploy pipeline checks that `bun test` exits zero, but that check means
something only if `bun test` actually *ran*. If the call exits with 127
(command not found) and the deploy script swallows it, every later
assertion is hollow.
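The swallow erased the difference between "ran and passed", "ran and failed" and "never ran". `cmd || echo ...` exits 0 whatever `cmd` did, so all three looked like success. The sketch below keeps those three outcomes apart when reading an exit status. `classifyRun` and `RunOutcome` are hypothetical names. The 126/127 mapping follows the POSIX shell convention: 126 means the command was found but not executable, and 127 means it was not found.

```ts
type RunOutcome = "passed" | "failed" | "not-run";

// Interpret a shell exit status without collapsing "didn't run" into "passed".
function classifyRun(status: number | null): RunOutcome {
  if (status === 0) return "passed";
  if (status === 126 || status === 127) return "not-run"; // not executable / not found
  return "failed"; // ordinary failures, and null (killed by a signal)
}

// What `cmd || echo ...` effectively reports: success, whatever happened.
const swallowed = (_status: number | null): RunOutcome => "passed";

for (const status of [0, 1, 127]) {
  console.log(status, classifyRun(status), swallowed(status));
}
```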

`/reports/live` showing all-green for 12 days was perfectly compatible
with the test suite being completely broken. The only way to know is
to delete the swallowing.

## Why this is the empirical case for SAMA, not against it

A naive reading is "the codebase had three bugs you didn't catch."
The fairer reading is: the codebase had *one* bug — silent error
suppression in a deploy script — and the other two were latent
consequences that the verifier *would have* caught the moment they
ran. Removing the silence took ~15 minutes. Once silence was gone, both
hidden bugs surfaced *on the very next deploy*, with line numbers and
file paths, in two cells of a public web page.

That's the empirical pattern SAMA's pitch turns on, scaled to the
infrastructure layer:

- **Verification has to be observable.** A check that runs into
  `2>/dev/null` is indistinguishable from a check that passes.
- **The cost of removing silence is low.** A `||` swallow → `||
  { echo ✗; exit 1; }` is a one-line change. A `2>/dev/null` →
  `2>&1` is one word.
- **Removing silence pays compounding returns.** Three bugs hidden by
  one suppressor — each one would have been instantly diagnosable if
  the surface had been honest.

## What this still doesn't prove

It doesn't prove that exposing every failure produces a useful signal.
Some failures *should* be tolerated (best-effort cleanup, optional
caches), and over-strict failure handling can break production for
trivial reasons. The judgement is *which* failures: in this case,
`snapshot-git-history` running was load-bearing for the public claim
that `/reports/live` reflects the current repo. Treating its failure
as "non-fatal" was a category error.
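One way to make that judgement explicit is to put it in the step definition rather than in an ad hoc `||`. The sketch below uses assumed names (`Step`, `required`, `settle`), and the optional-cache step is a hypothetical example of a best-effort step. Required steps abort on any non-zero status. Best-effort steps may continue, but their warning carries the real exit status and stderr instead of a guessed cause.

```ts
interface Step {
  name: string;
  required: boolean; // load-bearing for a public claim? then failure is fatal
}

// Decide what a step's result means for the deploy.
function settle(step: Step, status: number, stderr: string): string {
  if (status === 0) return `✓ ${step.name}`;
  const detail = `exit ${status}${stderr ? `: ${stderr.trim()}` : ""}`;
  if (step.required) throw new Error(`✗ ${step.name} failed (${detail})`);
  return `⚠ ${step.name} skipped (${detail})`; // tolerated, but not silent
}

const snapshot: Step = { name: "snapshot-git-history", required: true };
const cacheWarm: Step = { name: "warm-optional-cache", required: false }; // hypothetical

console.log(settle(snapshot, 0, ""));
console.log(settle(cacheWarm, 1, "cache dir missing"));
```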

The general principle the cascade demonstrates: in a system whose value
proposition is *the artefacts a reviewer can replay*, the pipeline
that produces those artefacts has the same audit requirements as the
source code does. Silent failures in the pipeline are violations of
the standard the same way silent failures in the source would be.

---

**See for yourself:**

- Live: <https://tdd.md/reports/live> (date window is now current)
- Live: <https://tdd.md/reports/live/tests> ("193 passing · 0 placeholder")
- The PR that landed the three fixes:
  <https://github.com/syntaxai/tdd.md/pull/14>
- Previous posts in this thread:
  [the c21 Atomic-700 split](/blog/2026-05/sama-empirical-c21-split) ·
  [greening the Modeled dogfood](/blog/2026-05/sama-empirical-modeled-green)