a629228a8bb97895aeb744c21929e416fd2dbb02
diff --git a/content/blog/sama-v2-verifier-second-opinion-gap.md b/content/blog/sama-v2-verifier-second-opinion-gap.md
new file mode 100644
index 0000000000000000000000000000000000000000..0d38b23cc368f1d748290c2fa837b250841aa533
--- /dev/null
+++ b/content/blog/sama-v2-verifier-second-opinion-gap.md
@@ -0,0 +1,120 @@
+# The verifier has no second opinion
+
+Every load-bearing claim on this site has an independent oracle that confirms it.
+
+- The §5 workingSetFit measurements are pinned to external repos at specific SHAs — anyone can clone `BurntSushi/ripgrep@4519153e` and recount the files.
+- The URL refactor wall-clock measurements are timestamps in git history — `git log --format=%ct` is the second opinion.
+- The /goal contracts are in `goals/` AND in PR bodies AND in conversation transcripts — three redundant captures.
+- The deploy succeeded? `curl https://tdd.md/healthz` is the oracle, independent of the deploy script.
+- The sitemap is correct? Compare it to `ALL_POSTS` in the registry — two different views of the same data.
+- Every blog post claim links back to its driving /goal AND its merge commit.
+
+The chain holds. Every other artifact passes its own audit. There's one exception, and it's at the heart of the entire structural claim:
+
+**`/sama/v2/verify` reports `7 / 7 ✓`. The only oracle that confirms it is the program that emitted it.**
+
+![Every claim has an oracle, except the verifier's](/images/verifier-no-oracle-gap.png?v=1)
+
+## The §0 fine print
+
+The SAMA v2 spec at /sama/v2 §0 says:
+
+> *"The verifier is a deterministic program; that claim is only auditable if a human can reproduce it from the data."*
+
+Reread that closely. A human reproducing the verdict means running [`src/b32_sama_v2_verify.ts`](/GIT/tdd.md/blob/main/src/b32_sama_v2_verify.ts) on the same source tree. The program is in git, the source tree is in git, both are deterministic — so a human gets the same `7/7 ✓` answer. That's **reproducibility**.
+
+Reproducibility is not the same as **independent validation**. A buggy verifier that was specifically written to pass the codebase it was designed against would emit `7/7 ✓` deterministically forever. Every human who ran it would reproduce that result. The verdict would be reproducible *and wrong* at the same time.
+
+The site's entire empirical claim rests on the verifier being right. Not just deterministic — *right*. And "right" means *agreement with an independent reading of the spec*. There has been no independent reading. There has been one TypeScript program, written by the same person who wrote the spec it verifies, run against the same codebase it was designed for. The chain's final link is unsecured.
+
+## The concrete demonstration this week
+
+Tonight, a draft for a second verifier landed at the repo root: [`cli.md`](/GIT/tdd.md/blob/main/cli.md). It sketches a shell-native SAMA v2 verifier across three phases of an email thread and ends with a "100% SAMA v2 compliant" file structure:
+
+```
+src/
+├── a0_main.sh        # Layer 3 - Entry    ← wrong
+├── b1_checks.sh      # Layer 1 - Core
+├── b2_graph.sh       # Layer 2 - Adapter  ← wrong
+├── c1_utils.sh       # Layer 1 - Core     ← wrong
+├── c2_constants.sh   # Layer 0 - Pure     ← wrong
+```
+
+The mapping is **backwards**. This repo's canonical convention is `a*_` = Layer 0, `b*_` = Layer 1, `c*_` = Layer 2, `d*_` = Layer 3 — the SAMA §1.1 layer order matches lex-sort. Under the cli.md mapping, lex-sort gives `a0, b1, b2, c1, c2` with layer order `3, 1, 2, 1, 0`. That's **not sorted at all** — it would fail §4.1 of its own checks.
+
+The person who drafted the email knows the spec, sees this codebase every day, and still got the prefix-to-layer mapping inverted. Not as a typo — as a confident description of "100% SAMA v2 compliant" structure. The spec is hard to read correctly *even for someone who wrote it*.
+
+If the spec is this easy to misread, what would catch a similar misreading in the TS verifier? Only a second independent implementation that reads the spec and disagrees. That's the missing oracle.
+
+## The fix
+
+![Two verifiers, one spec, one verdict — §6 evolution mechanism in action](/images/verifier-two-implementations.png?v=1)
+
+Build a second verifier in a fundamentally different language, on different runtime primitives, then make them agree.
+
+- **Different language**: TypeScript vs POSIX shell. No shared parser, no shared regex library, no shared filesystem API.
+- **Different runtime**: Bun's JavaScript engine vs `bash` + `find` + `grep` + `awk`.
+- **Different primitives**: `Bun.file` + `Glob` vs `find -type f -name`. Both read the same bytes; both interpret them through completely separate code paths.
+- **Same spec read independently**: each implementer reads the /sama/v2 prose alone and writes their checks; only then are the two cross-verified.
+
+The agreement mechanism is one shell script — call it `cross-verify.sh`:
+
+```bash
+ts_verdict=$(bun run src/b32_sama_v2_verify.ts)   # stdout of the existing TS verifier
+shell_verdict=$(tools/sama-cli/sama check)        # stdout of the proposed shell verifier
+if [ "$ts_verdict" = "$shell_verdict" ]; then
+  echo "empirical 7/7 ✓ — two implementations agree"
+  exit 0
+else
+  echo "spec pressure point: implementations disagree"
+  diff <(printf '%s\n' "$ts_verdict") <(printf '%s\n' "$shell_verdict")   # bash process substitution: show where the verdicts differ
+  exit 1
+fi
+```
+
+When both agree on `7/7 ✓`, the verdict is empirical. When they disagree on a specific check, the §6 evolution-policy machinery activates: the disagreement is **the spec's pressure point** — the place where the prose admits multiple readings, and where the ambiguity has to be resolved or the spec amended.
+
+This is exactly the empirical-chain pattern the rest of the site is built around. [/blog/2026-05/sama-v2-workingset-cross-repo-baseline](/blog/2026-05/sama-v2-workingset-cross-repo-baseline) turned `workingSetFit` from "one number for one repo" into "eight numbers across eight repos, all from the same emitter." Going from N=1 to N=8 *measured* turns a property claim into a data claim. The same shape applies to verifier verdicts: N=1 implementation is a program; N=2 independent implementations producing the same verdict is data.
+
+## Why this is a SAMA v2 self-violation (and how)
+
+This post parallels two prior drama posts:
+
+- [/blog/2026-05/sama-v2-goal-chain-gap](/blog/2026-05/sama-v2-goal-chain-gap) said: every artifact is in git, except the /goal. Now the /goal is in git.
+- [/blog/2026-05/sama-v2-on-ramp-gap](/blog/2026-05/sama-v2-on-ramp-gap) said: every artifact has a URL, except the on-ramp. Now there's a `CONTRIBUTING.md` at `/contributing`.
+
+This post says: **every claim has an oracle, except the verifier's verdict itself.** The fix is a second oracle. Same structural shape as the previous two — find a load-bearing artifact that's missing, build it under SAMA v2 discipline, watch the chain ratchet.
+
+The pattern that emerges across the three:
+
+| drama post | missing artifact | fix |
+|---|---|---|
+| goal-chain-gap | the /goal contract that drove each PR | `goals/.md` archive + workflow lock-in |
+| on-ramp-gap | the on-ramp document for new contributors | `CONTRIBUTING.md` + `/contributing` route |
+| verifier-second-opinion-gap | the independent oracle for `/sama/v2/verify` | `tools/sama-cli/` shell verifier + `cross-verify.sh` |
+
+Three load-bearing audits, three independent fixes, all under the same discipline. The thing that makes SAMA v2 self-coherent is exactly this: when an audit surfaces a gap in *the discipline itself*, the discipline absorbs the gap as a new artifact, mechanically. Not philosophically — by writing a file in a specific layer with a specific name and a specific sibling test.
+
+## What lands when the second verifier ships
+
+The `/goal` for this work is already on-site as a pending entry: [/goals/sama-cli-shell-verifier](/goals/sama-cli-shell-verifier). When it fires:
+
+- `tools/sama-cli/` directory exists with the canonical layer mapping (a=Pure, b=Core, c=Adapter, d=Entry — explicitly correcting the cli.md mistake).
+- Each of the seven §4 checks implemented twice — once in TS (existing), once in shell (new) — reading the same spec prose.
+- `cross-verify.sh` runs both and asserts identical verdicts. CI fails if they disagree.
+- Self-conformance: `tools/sama-cli/sama check` against `tools/sama-cli/src/` returns `7/7 ✓`. The shell verifier verifies itself under the same rules.
+- /sama/v2/verify still reports `7/7 ✓` — same number, but now it's `7/7 ✓ × 2`, agreed upon by two implementations.
+
+The blog post that follows the /goal's merge will document which checks the two verifiers agreed on byte-for-byte versus which required the spec prose to disambiguate. That's the load-bearing data — not "they both said 7/7," but "here are the specific places where the spec was ambiguous enough that two careful readers got different answers, and here's how the prose resolved each."
+
+## The next empirical knowledge
+
+After both verifiers ship and agree on this codebase, the next falsifiable claim is straightforward:
+
+> *"The two-verifier agreement holds across the §5 cross-repo measurement corpus. Each of the eight external repos (`ripgrep`, `dive`, `bat`, `fd`, `eza`, `lazygit`, `cli/gh`, `WordPress Open Graph plugin`) produces an identical multi-check verdict from both verifiers."*
+
+That's eight more datapoints. If they all agree, the spec is genuinely reproducible from prose alone. If even one repo causes a disagreement, the spec has an ambiguity that's now *located* — and §6 evolution-policy says: resolve it in the spec, update both verifiers, re-run. Each disagreement is one bit of structural learning about where the spec is fragile.
+
+The TS verifier has been telling us "this codebase scores 7/7 ✓" for forty PRs. After PR #58 fires and the shell verifier lands, that claim becomes "two independent implementations of the spec, in different languages, on different runtimes, both read it as 7/7 ✓." Same number; entirely different epistemic status.
+
+The chain ratchets one final time. The verifier finally has its second opinion.
diff --git a/public/images/verifier-no-oracle-gap.png b/public/images/verifier-no-oracle-gap.png
new file mode 100644
index 0000000000000000000000000000000000000000..f2482c1a24c17e4cee67f3f2e9157af50db7cb00
Binary files /dev/null and b/public/images/verifier-no-oracle-gap.png differ
diff --git a/public/images/verifier-no-oracle-gap.svg b/public/images/verifier-no-oracle-gap.svg
new file mode 100644
index 0000000000000000000000000000000000000000..b3678025a0ddcee1eb90ae81da7d08b4b2bcd5a9
--- /dev/null
+++ b/public/images/verifier-no-oracle-gap.svg
@@ -0,0 +1,73 @@
+
+
+
+
+
+  The empirical chain — every claim has a second opinion, except one
+  Every claim has an oracle. Except the verifier's.
+  /sama/v2/verify reports 7/7 ✓ — and the only oracle that confirms it is the program that emitted it.
+
+
+
+
+  EMPIRICAL CLAIM
+  INDEPENDENT ORACLE
+  VERDICT
+
+
+
+
+
+
+  Source code correctness
+  CI tests (independent of impl)
+  ✓ has oracle
+
+  §5 workingSetFit (n=8 cross-repo)
+  external repos, pinned SHAs
+  ✓ has oracle
+
+  URL refactor wall-clock cost
+  timestamps in git history
+  ✓ has oracle
+
+  Blog post claims
+  /goal contract + commit history
+  ✓ has oracle
+
+  /goal contract authenticity
+  PR body + goals/ verbatim
+  ✓ has oracle
+
+  Deploy actually shipped
+  curl on live URL
+  ✓ has oracle
+
+  Sitemap correctness
+  registry comparison
+  ✓ has oracle
+
+  Frontmatter parsing
+  sibling test fixtures
+  ✓ has oracle
+
+
+
+  /sama/v2/verify says "7/7 ✓"
+  — only the program itself —
+  ✗ NO ORACLE
+
+
+
+
+
+  The self-violation:
+  §0 says "the verifier is a deterministic program; that claim is only auditable if a human can reproduce it from the data."
+  Yes — by running the same program. That's reproducibility, not independent validation. A buggy verifier that's biased toward
+  passing the codebase it was written against would still emit 7/7 ✓ deterministically. Forty PRs preaching auditability — and
+  the gate at the heart of every merge has had exactly one implementation reading it.
+
+
+
+  https://tdd.md
+
diff --git a/public/images/verifier-two-implementations.png b/public/images/verifier-two-implementations.png
new file mode 100644
index 0000000000000000000000000000000000000000..7790662a501751b6e335ac5dd344beb9bac098fa
Binary files /dev/null and b/public/images/verifier-two-implementations.png differ
diff --git a/public/images/verifier-two-implementations.svg b/public/images/verifier-two-implementations.svg
new file mode 100644
index 0000000000000000000000000000000000000000..e629d29c83c9b489600a541891b8dbc1ac98541c
--- /dev/null
+++ b/public/images/verifier-two-implementations.svg
@@ -0,0 +1,95 @@
+
+
+
+
+
+  The fix — two independent implementations of the same spec
+  If both agree on 7/7 ✓, the verdict is empirical.
+  Different language, different runtime, same spec read independently. Disagreement is the spec's pressure point.
+
+
+
+
+
+
+
+  TS verifier (existing)
+  — the current canonical —
+
+  FILE
+  src/b32_sama_v2_verify.ts
+
+  RUNTIME
+  Bun (TypeScript)
+
+  PRIMITIVES
+  Bun.file · Glob · readdir
+
+  SURFACE
+  /sama/v2/verify (live)
+
+  VERDICT
+  7 / 7 ✓
+
+
+
+  cross-verify.sh
+  — the empirical gate —
+
+  INPUT
+  both verdicts
+
+  CHECK
+  byte-for-byte equality
+
+  IF AGREE
+  → empirical 7/7 ✓
+
+  IF DISAGREE
+  → §6 pressure point
+  resolve via spec prose
+
+  EXIT
+  0 = agree · 1 = disagree
+
+
+
+  Shell verifier (NEW)
+  — the independent oracle —
+
+  FILE
+  tools/sama-cli/sama check
+
+  RUNTIME
+  POSIX shell (bash)
+
+  PRIMITIVES
+  find · grep · awk · wc
+
+  SURFACE
+  CLI + cross-verify hook
+
+  VERDICT
+  7 / 7 ✓
+
+
+
+
+
+
+
+
+
+
+
+
+
+  Falsifiable claim — the §6 evolution mechanism in action:
+  "Two independent implementations of the SAMA v2 §4 spec, in different languages on different runtimes, will produce
+  identical verdicts against any spec-conforming codebase. If they disagree on this repo's 7/7 ✓, one is wrong — and per §0
+  the disagreement is resolvable from the spec prose alone. Disagreements ARE the spec's pressure points."
+
+
+
+  https://tdd.md
+
diff --git a/src/a31_blog.ts b/src/a31_blog.ts
index 0a69cff0b55dd2e6dd3928c06d9f77f298509af5..6adfc6a6e310367109011bd4330d894fbcd8673c 100644
--- a/src/a31_blog.ts
+++ b/src/a31_blog.ts
@@ -12,6 +12,12 @@ export interface BlogEntry {
 }
 
 export const ALL_POSTS: BlogEntry[] = [
+  {
+    slug: "sama-v2-verifier-second-opinion-gap",
+    title: "The verifier has no second opinion",
+    description: "Every load-bearing claim on tdd.md has an independent oracle that confirms it. §5 workingSetFit numbers are pinned to external repos at specific SHAs (anyone can clone and recount). URL refactor wall-clock measurements are timestamps in git history. /goal contracts live in goals/ AND PR bodies AND conversation transcripts. The deploy succeeded? curl on /healthz is the oracle, independent of the deploy script. The sitemap is correct? Compare it to ALL_POSTS. Every blog post claim links back to its driving /goal and merge commit. The chain holds — every artifact passes its own audit. There's one exception, and it's at the heart of the entire structural claim: /sama/v2/verify reports 7/7 ✓, and the only oracle that confirms it is the program that emitted it. The §0 spec says 'the verifier is a deterministic program; that claim is only auditable if a human can reproduce it from the data' — but reproducibility means running the same program; that's not independent validation. A buggy verifier biased toward passing the codebase it was written against would emit 7/7 ✓ deterministically forever. This week the gap got a concrete demonstration: a draft of a second verifier landed at the repo root (cli.md), and its 'SAMA v2 compliant' file structure had the prefix-to-layer mapping inverted (a=Layer 3, c=Layer 0/1 — backwards from the canonical a=Pure, b=Core, c=Adapter, d=Entry). Someone who reads the spec daily still got the structure wrong. If the spec is this easy to misread, what catches a similar misreading in the TS verifier? Only a second independent implementation that disagrees. The fix: build the shell verifier proposed in cli.md (with the layer mapping corrected) at tools/sama-cli/. Different language (POSIX shell vs TS), different runtime (bash vs Bun), different primitives (find/grep/awk vs Bun.file/Glob), same spec read independently. A cross-verify.sh script runs both and asserts identical verdicts — agreement is empirical 7/7 ✓; disagreement is a §6 spec-prose pressure point. Third drama post in the structural-self-audit series: goal-chain-gap (every artifact in git except the /goal — fixed), on-ramp-gap (every artifact has a URL except the on-ramp — fixed), verifier-second-opinion-gap (every claim has an oracle except the verifier's — proposed as /goals/sama-cli-shell-verifier). Falsifiable next: two-verifier agreement holds across the n=8 §5 cross-repo measurement corpus. If even one repo causes disagreement, the spec has a located ambiguity; §6 evolution-policy resolves it in the prose.",
+    date: "2026-05-25",
+  },
   {
     slug: "sama-v2-portability-boundary-found",
     title: "21 minutes 23 seconds — the portability boundary is empirically located",