syntaxai/tdd.md · commit a629228

Blog post: the verifier has no second opinion (third drama in the self-audit series)

Third drama post parallel to chain-gap + on-ramp-gap. Argues that /sama/v2/verify's 7/7 ✓ verdict has no independent oracle — only the program that emitted it. §0 reproducibility (anyone can run it) is not the same as independent validation (a second implementation agrees).

Concrete demonstration: this week's cli.md draft at the repo root had the layer-prefix mapping inverted (a=Layer 3, c=Layer 0/1) while claiming '100% SAMA v2 compliant'. The spec is hard to read correctly even for someone who reads it daily. The only thing that would catch such a misreading in the TS verifier is a second implementation that reads the same spec independently.

Two watermarked images: artifact-vs-oracle table (with the dramatic 'NO ORACLE' row), and a three-column TS-verifier-cross-verify-shell-verifier diagram showing how byte-for-byte agreement becomes the empirical gate.

Plants the next-step bridge: the /goal at /goals/sama-cli-shell-verifier is already on-site as pending; this drama-post argues why firing it matters.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
author
syntaxai <[email protected]>
date
2026-05-26 06:36:16 +01:00
parent
ad637d0
commit
a629228a8bb97895aeb744c21929e416fd2dbb02

6 files changed · +294 −0

added content/blog/sama-v2-verifier-second-opinion-gap.md +120 −0
@@ -0,0 +1,120 @@
1+# The verifier has no second opinion
2+
3+Every load-bearing claim on this site has an independent oracle that confirms it.
4+
5+- The §5 workingSetFit measurements are pinned to external repos at specific SHAs — anyone can clone `BurntSushi/ripgrep@4519153e` and recount the files.
6+- The URL refactor wall-clock measurements are timestamps in git history — `git log --format=%ct` is the second opinion.
7+- The /goal contracts are in `goals/` AND in PR bodies AND in conversation transcripts — three redundant captures.
8+- The deploy succeeded? `curl https://tdd.md/healthz` is the oracle, independent of the deploy script.
9+- The sitemap is correct? Compare it to `ALL_POSTS` in the registry — two different views of the same data.
10+- Every blog post claim links back to its driving /goal AND its merge commit.
11+
12+The chain holds. Every other artifact passes its own audit. There's one exception, and it's at the heart of the entire structural claim:
13+
14+**`/sama/v2/verify` reports `7 / 7 ✓`. The only oracle that confirms it is the program that emitted it.**
15+
16+![Every claim has an oracle, except the verifier's](/images/verifier-no-oracle-gap.png?v=1)
17+
18+## The §0 fine print
19+
20+The SAMA v2 spec at /sama/v2 §0 says:
21+
22+> *"The verifier is a deterministic program; that claim is only auditable if a human can reproduce it from the data."*
23+
24+Reread that closely. A human reproducing the verdict means running [`src/b32_sama_v2_verify.ts`](/GIT/tdd.md/blob/main/src/b32_sama_v2_verify.ts) on the same source tree. The program is in git, the source tree is in git, both are deterministic — so a human gets the same `7/7 ✓` answer. That's **reproducibility**.
25+
26+Reproducibility is not the same as **independent validation**. A buggy verifier, biased toward passing the codebase it was written against, would emit `7/7 ✓` deterministically forever. Every human who ran it would reproduce that result. The verdict would be reproducible *and wrong* at the same time.
27+
28+The site's entire empirical claim rests on the verifier being right. Not just deterministic — *right*. And "right" means *agreement with an independent reading of the spec*. There has been no independent reading. There has been one TypeScript program, written by the same person who wrote the spec it verifies, run against the same codebase it was designed for. The chain has its final link unsecured.
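
The gap between reproducible and right can be shown with a minimal sketch in POSIX shell. The rule it checks is invented for illustration and is not taken from the SAMA v2 spec; `verify_a` and `verify_b` are hypothetical names standing in for two separate readings of the same sentence.

```shell
#!/bin/sh
export LC_ALL=C   # keep the [a-z] and [0-9] ranges byte-based

# Hypothetical rule (NOT from the SAMA v2 spec): every file name starts
# with one lowercase letter followed by one digit, e.g. "a0_main.sh".

# Reading A: too permissive. It only checks the leading letter.
verify_a() {
  for name in "$@"; do
    case "$name" in
      [a-z]*) ;;
      *) echo "fail"; return 1 ;;
    esac
  done
  echo "pass"
}

# Reading B: written separately. It checks the digit as well.
verify_b() {
  for name in "$@"; do
    case "$name" in
      [a-z][0-9]*) ;;
      *) echo "fail"; return 1 ;;
    esac
  done
  echo "pass"
}

# On a tree that happens to conform, both readings pass on every run.
verify_a a0_main.sh b1_checks.sh           # prints "pass"
verify_b a0_main.sh b1_checks.sh           # prints "pass"

# Only a non-conforming name separates them.
verify_a a0_main.sh bx_checks.sh           # prints "pass": the bug stays invisible
verify_b a0_main.sh bx_checks.sh || true   # prints "fail"
```

Rerunning `verify_a` any number of times reproduces its answer; only the disagreement with `verify_b` shows that one of the two readings is wrong.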
29+
30+## The concrete demonstration this week
31+
32+Tonight, a draft for a second verifier landed at the repo root: [`cli.md`](/GIT/tdd.md/blob/main/cli.md). It sketches a shell-native SAMA v2 verifier across three phases of an email thread and ends with a "100% SAMA v2 compliant" file structure:
33+
34+```
35+src/
36+├── a0_main.sh # Layer 3 - Entry ← wrong
37+├── b1_checks.sh # Layer 1 - Core
38+├── b2_graph.sh # Layer 2 - Adapter
39+├── c1_utils.sh # Layer 1 - Core ← wrong
40+├── c2_constants.sh # Layer 0 - Pure ← wrong
41+```
42+
43+The mapping is **backwards**. This repo's canonical convention is `a*_` = Layer 0, `b*_` = Layer 1, `c*_` = Layer 2, `d*_` = Layer 3 — the SAMA §1.1 layer order matches lex-sort. Under the cli.md mapping, lex-sort gives `a0, b1, b2, c1, c2` with layer order `3, 1, 2, 1, 0`. That's **not sorted at all** — it would fail §4.1 of its own checks.
44+
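The ordering argument above can be checked mechanically. The following is a rough sketch in POSIX shell, not the verifier's actual §4.1 implementation (which may be defined differently): `layers_follow_lex_order` is a hypothetical helper, and the `<file-name> <layer-number>` input format is an assumption made for this example.

```shell
#!/bin/sh
# Assumed input (hypothetical format): one "<file-name> <layer-number>" pair per line.
# Exit status 0 means layer numbers never decrease when files are sorted by name.
layers_follow_lex_order() {
  LC_ALL=C sort -k1,1 | awk '
    NR > 1 && $2 < prev { bad = 1 }   # this layer is lower than its predecessor
    { prev = $2 }
    END { exit (bad ? 1 : 0) }
  '
}

# The structure proposed in cli.md, with the layers it claims for each file:
printf '%s\n' \
  'a0_main.sh 3' 'b1_checks.sh 1' 'b2_graph.sh 2' \
  'c1_utils.sh 1' 'c2_constants.sh 0' |
  layers_follow_lex_order || echo "layer order does not follow lex-sort"
```

Fed the canonical mapping instead (an `a*_` file at layer 0, `b*_` at 1, `c*_` at 2, `d*_` at 3), the same function exits 0.
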
45+The person who drafted the email knows the spec, sees this codebase every day, and still got the prefix-to-layer mapping inverted. This was not a typo; it was a confident description of a "100% SAMA v2 compliant" structure. The spec is hard to read correctly *even for someone who wrote it*.
46+
47+If the spec is this easy to misread, what would catch a similar misreading in the TS verifier? Only a second independent implementation that reads the spec and disagrees. That's the missing oracle.
48+
49+## The fix
50+
51+![Two verifiers, one spec, one verdict — §6 evolution mechanism in action](/images/verifier-two-implementations.png?v=1)
52+
53+Build a second verifier in a fundamentally different language, on different runtime primitives, then make them agree.
54+
55+- **Different language**: TypeScript vs POSIX shell. No shared parser, no shared regex library, no shared filesystem API.
56+- **Different runtime**: Bun's JavaScript engine vs `bash` + `find` + `grep` + `awk`.
57+- **Different primitives**: `Bun.file` + `Glob` vs `find -type f -name`. Both read the same bytes; both interpret them through completely separate code paths.
58+- **Same spec read independently**: each implementer reads /sama/v2 prose alone, writes their checks, then they're cross-verified.
59+
60+The agreement mechanism is one short `bash` script (it uses process substitution, so plain `sh` is not enough). Call it `cross-verify.sh`:
61+
62+```bash
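63+ts_verdict=$(bun run src/b32_sama_v2_verify.ts)   # verdict text from the TS verifier
64+shell_verdict=$(tools/sama-cli/sama check)        # verdict text from the shell verifier
65+if [ "$ts_verdict" = "$shell_verdict" ]; then      # byte-for-byte comparison
66+  printf 'empirical verdict, two implementations agree:\n%s\n' "$ts_verdict"
67+  exit 0
68+else
69+  echo "spec pressure point: implementations disagree"
70+  diff <(printf '%s\n' "$ts_verdict") <(printf '%s\n' "$shell_verdict")   # bash-only: process substitution
71+  exit 1
72+fi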
73+```
74+
75+When both agree on `7/7 ✓`, the verdict is empirical. When they disagree on a specific check, the §6 evolution-policy machinery activates: the disagreement is **the spec's pressure point**, the place where the prose admits multiple readings, and either the ambiguity is resolved from the existing prose or the spec is amended.
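
Locating the specific check that two verdicts disagree on could look roughly like this. The sketch assumes a verdict format of one `<check-id> <result>` line per check, which is a guess made for illustration and not the verifiers' documented output; `disagreeing_checks` is a hypothetical helper.

```shell
#!/bin/sh
# Assumed verdict format (hypothetical): one "<check-id> <result>" line per check.
# Prints, sorted, the ids of checks on which the two verdicts differ,
# including checks that appear in only one of the two verdicts.
disagreeing_checks() {
  {
    printf '%s\n' "$1" | sed 's/^/A /'   # tag lines from the first verdict
    printf '%s\n' "$2" | sed 's/^/B /'   # tag lines from the second verdict
  } | awk '
    $1 == "A" { a[$2] = $3 }
    $1 == "B" { b[$2] = $3 }
    END {
      for (id in a) if (!(id in b) || a[id] != b[id]) print id
      for (id in b) if (!(id in a)) print id
    }
  ' | LC_ALL=C sort
}

ts_verdict=$(printf '%s\n' '4.1 pass' '4.2 pass' '4.3 pass')
shell_verdict=$(printf '%s\n' '4.1 pass' '4.2 fail' '4.3 pass')
disagreeing_checks "$ts_verdict" "$shell_verdict"   # prints "4.2"
```

An empty result means the two verdicts agree check by check; each printed id points at one paragraph of spec prose to reread.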
76+
77+This is exactly the empirical-chain pattern the rest of the site is built around. /blog/2026-05/sama-v2-workingset-cross-repo-baseline turned `workingSetFit` from "one number for one repo" into "eight numbers across eight repos, all from the same emitter." Going from N=1 to N=8 *measured* turns a property claim into a data claim. The same shape applies to verifier verdicts: N=1 implementation is a program; N=2 independent implementations producing the same verdict is data.
78+
79+## Why this is a SAMA v2 self-violation (and how)
80+
81+This post parallels two prior drama posts:
82+
83+- [/blog/2026-05/sama-v2-goal-chain-gap](/blog/2026-05/sama-v2-goal-chain-gap) said: every artifact is in git, except the /goal. Now the /goal is in git.
84+- [/blog/2026-05/sama-v2-on-ramp-gap](/blog/2026-05/sama-v2-on-ramp-gap) said: every artifact has a URL, except the on-ramp. Now there's a `CONTRIBUTING.md` at `/contributing`.
85+
86+This post says: **every claim has an oracle, except the verifier's verdict itself.** The fix is a second oracle. Same structural shape as the previous two — find a load-bearing artifact that's missing, build it under SAMA v2 discipline, watch the chain ratchet.
87+
88+The pattern that emerges across the three:
89+
90+| drama post | missing artifact | fix |
91+|---|---|---|
92+| goal-chain-gap | the /goal contract that drove each PR | `goals/<slug>.md` archive + workflow lock-in |
93+| on-ramp-gap | the on-ramp document for new contributors | `CONTRIBUTING.md` + `/contributing` route |
94+| verifier-second-opinion-gap | the independent oracle for `/sama/v2/verify` | `tools/sama-cli/` shell verifier + `cross-verify.sh` |
95+
96+Three load-bearing audits, three independent fixes, all under the same discipline. The thing that makes SAMA v2 self-coherent is exactly this: when an audit surfaces a gap in *the discipline itself*, the discipline absorbs the gap as a new artifact, mechanically. Not philosophically — by writing a file in a specific layer with a specific name and a specific sibling test.
97+
98+## What lands when the second verifier ships
99+
100+The `/goal` for this work is already on-site as a pending entry: [/goals/sama-cli-shell-verifier](/goals/sama-cli-shell-verifier). When it fires:
101+
102+- `tools/sama-cli/` directory exists with the canonical layer mapping (a=Pure, b=Core, c=Adapter, d=Entry — explicitly correcting the cli.md mistake).
103+- Each of the seven §4 checks implemented twice — once in TS (existing), once in shell (new) — reading the same spec prose.
104+- `cross-verify.sh` runs both, asserts identical verdicts. CI fails if they disagree.
105+- Self-conformance: `tools/sama-cli/sama check` against `tools/sama-cli/src/` returns `7/7 ✓`. The shell verifier verifies itself under the same rules.
106+- /sama/v2/verify still reports `7/7 ✓` — same number, but now it's `7/7 ✓ × 2`, agreed-upon by two implementations.
107+
108+The blog post that follows the /goal's merge documents which checks the two verifiers agreed on byte-for-byte versus which required the spec prose to disambiguate. That's the load-bearing data — not "they both said 7/7," but "here are the specific places where the spec was ambiguous enough that two careful readers got different answers, and here's how the prose resolved each."
109+
110+## The next empirical knowledge
111+
112+After both verifiers ship and agree on this codebase, the next falsifiable claim is straightforward:
113+
114+> *"The two-verifier agreement holds across the §5 cross-repo measurement corpus. Each of the eight external repos (`ripgrep`, `dive`, `bat`, `fd`, `eza`, `lazygit`, `cli/gh`, `WordPress Open Graph plugin`) produces an identical multi-check verdict from both verifiers."*
115+
116+That's eight more datapoints. If they all agree, the spec is genuinely reproducible from prose alone. If even one repo causes a disagreement, the spec has an ambiguity that's now *located* — and §6 evolution-policy says: resolve it in the spec, update both verifiers, re-run. Each disagreement is one bit of structural learning about where the spec is fragile.
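
The corpus run described above could be driven by a loop like the one below. This is only a sketch: it assumes each verifier can be invoked with a repo path as its only argument and prints its verdict on stdout, which the post does not specify, and `corpus_agreement` is a hypothetical name.

```shell
#!/bin/sh
# Usage: corpus_agreement <verifier-a> <verifier-b> <repo-dir>...
# Assumption (hypothetical interface): each verifier takes a repo path as its
# only argument and prints its verdict on stdout.
corpus_agreement() {
  verifier_a=$1
  verifier_b=$2
  shift 2
  disagreements=0
  for repo in "$@"; do
    if [ "$("$verifier_a" "$repo")" = "$("$verifier_b" "$repo")" ]; then
      echo "agree    $repo"
    else
      echo "DISAGREE $repo"
      disagreements=$((disagreements + 1))
    fi
  done
  [ "$disagreements" -eq 0 ]   # exit status 0 only if every repo agreed
}
```

Every `DISAGREE` line names a repo where the two readings of the spec diverge, which is the located ambiguity the paragraph above describes.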
117+
118+The TS verifier has been telling us "this codebase scores 7/7 ✓" for forty PRs. After PR #58 fires and the shell verifier lands, that claim becomes "two independent implementations of the spec, in different languages, on different runtimes, both read it as 7/7 ✓." Same number; entirely different epistemic status.
119+
120+The chain ratchets one final time. The verifier finally has its second opinion.
added public/images/verifier-no-oracle-gap.png +0 −0
added public/images/verifier-no-oracle-gap.svg +73 −0
@@ -0,0 +1,73 @@
1+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1200 720" width="1200" height="720">
2+ <rect width="1200" height="720" fill="#0a0a0a"/>
3+
4+ <!-- Header -->
5+ <g font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace">
6+ <text x="80" y="46" font-size="20" font-weight="600" fill="#909090">The empirical chain — every claim has a second opinion, except one</text>
7+ <text x="80" y="92" font-size="30" font-weight="700" fill="#e8e8e8">Every claim has an oracle. Except the verifier's.</text>
8+ <text x="80" y="120" font-size="14" fill="#7a7a7a">/sama/v2/verify reports 7/7 ✓ — and the only oracle that confirms it is the program that emitted it.</text>
9+ </g>
10+
11+ <!-- Column headers -->
12+ <g font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace" font-size="13" font-weight="600" letter-spacing="2">
13+ <text x="100" y="172" fill="#909090">EMPIRICAL CLAIM</text>
14+ <text x="610" y="172" fill="#909090">INDEPENDENT ORACLE</text>
15+ <text x="1000" y="172" fill="#909090">VERDICT</text>
16+ </g>
17+ <line x1="80" y1="184" x2="1120" y2="184" stroke="#2a2a2a" stroke-width="1"/>
18+
19+ <!-- Rows -->
20+ <g font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace" font-size="15">
21+
22+ <text x="100" y="216" fill="#c8c8c8">Source code correctness</text>
23+ <text x="610" y="216" fill="#8a8a8a">CI tests (independent of impl)</text>
24+ <text x="1000" y="216" fill="#7ec77e">✓ has oracle</text>
25+
26+ <text x="100" y="246" fill="#c8c8c8">§5 workingSetFit (n=8 cross-repo)</text>
27+ <text x="610" y="246" fill="#8a8a8a">external repos, pinned SHAs</text>
28+ <text x="1000" y="246" fill="#7ec77e">✓ has oracle</text>
29+
30+ <text x="100" y="276" fill="#c8c8c8">URL refactor wall-clock cost</text>
31+ <text x="610" y="276" fill="#8a8a8a">timestamps in git history</text>
32+ <text x="1000" y="276" fill="#7ec77e">✓ has oracle</text>
33+
34+ <text x="100" y="306" fill="#c8c8c8">Blog post claims</text>
35+ <text x="610" y="306" fill="#8a8a8a">/goal contract + commit history</text>
36+ <text x="1000" y="306" fill="#7ec77e">✓ has oracle</text>
37+
38+ <text x="100" y="336" fill="#c8c8c8">/goal contract authenticity</text>
39+ <text x="610" y="336" fill="#8a8a8a">PR body + goals/ verbatim</text>
40+ <text x="1000" y="336" fill="#7ec77e">✓ has oracle</text>
41+
42+ <text x="100" y="366" fill="#c8c8c8">Deploy actually shipped</text>
43+ <text x="610" y="366" fill="#8a8a8a">curl on live URL</text>
44+ <text x="1000" y="366" fill="#7ec77e">✓ has oracle</text>
45+
46+ <text x="100" y="396" fill="#c8c8c8">Sitemap correctness</text>
47+ <text x="610" y="396" fill="#8a8a8a">registry comparison</text>
48+ <text x="1000" y="396" fill="#7ec77e">✓ has oracle</text>
49+
50+ <text x="100" y="426" fill="#c8c8c8">Frontmatter parsing</text>
51+ <text x="610" y="426" fill="#8a8a8a">sibling test fixtures</text>
52+ <text x="1000" y="426" fill="#7ec77e">✓ has oracle</text>
53+
54+ <!-- The dramatic row -->
55+ <rect x="80" y="446" width="1040" height="40" fill="#2a1010" stroke="#7c2020" stroke-width="1.5" rx="4"/>
56+ <text x="100" y="472" fill="#e88080" font-weight="700">/sama/v2/verify says "7/7 ✓"</text>
57+ <text x="610" y="472" fill="#c87070">— only the program itself —</text>
58+ <text x="1000" y="472" fill="#e85050" font-weight="700">✗ NO ORACLE</text>
59+ </g>
60+
61+ <!-- Bottom callout -->
62+ <rect x="80" y="522" width="1040" height="140" fill="#1a1010" stroke="#3a2020" stroke-width="1" rx="6"/>
63+ <g font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace">
64+ <text x="100" y="552" font-size="16" font-weight="600" fill="#e88080">The self-violation:</text>
65+ <text x="100" y="578" font-size="14" fill="#c8c8c8">§0 says "the verifier is a deterministic program; that claim is only auditable if a human can reproduce it from the data."</text>
66+ <text x="100" y="602" font-size="14" fill="#c8c8c8">Yes — by running the same program. That's reproducibility, not independent validation. A buggy verifier that's biased toward</text>
67+ <text x="100" y="626" font-size="14" fill="#c8c8c8">passing the codebase it was written against would still emit 7/7 ✓ deterministically. Forty PRs preaching auditability — and</text>
68+ <text x="100" y="650" font-size="14" fill="#c8c8c8">the gate at the heart of every merge has had exactly one implementation reading it.</text>
69+ </g>
70+
71+ <!-- Watermark -->
72+ <text x="1120" y="704" text-anchor="end" font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace" font-size="12" fill="#5a5a5a">https://tdd.md</text>
73+</svg>
added public/images/verifier-two-implementations.png +0 −0
added public/images/verifier-two-implementations.svg +95 −0
@@ -0,0 +1,95 @@
1+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1200 700" width="1200" height="700">
2+ <rect width="1200" height="700" fill="#0a0a0a"/>
3+
4+ <!-- Header -->
5+ <g font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace">
6+ <text x="80" y="46" font-size="20" font-weight="600" fill="#909090">The fix — two independent implementations of the same spec</text>
7+ <text x="80" y="92" font-size="32" font-weight="700" fill="#e8e8e8">If both agree on 7/7 ✓, the verdict is empirical.</text>
8+ <text x="80" y="120" font-size="14" fill="#7a7a7a">Different language, different runtime, same spec read independently. Disagreement is the spec's pressure point.</text>
9+ </g>
10+
11+ <!-- Three columns -->
12+ <g font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace">
13+
14+ <!-- TS verifier column (existing) -->
15+ <rect x="60" y="156" width="340" height="380" fill="#1a1a1a" stroke="#4a8a8a" stroke-width="1.5" rx="6"/>
16+ <text x="80" y="186" font-size="18" font-weight="600" fill="#4a8a8a">TS verifier (existing)</text>
17+ <text x="80" y="210" font-size="13" fill="#8a8a8a">— the current canonical —</text>
18+
19+ <text x="80" y="252" font-size="13" fill="#909090" letter-spacing="1">FILE</text>
20+ <text x="80" y="276" font-size="13" fill="#c8c8c8">src/b32_sama_v2_verify.ts</text>
21+
22+ <text x="80" y="312" font-size="13" fill="#909090" letter-spacing="1">RUNTIME</text>
23+ <text x="80" y="336" font-size="13" fill="#c8c8c8">Bun (TypeScript)</text>
24+
25+ <text x="80" y="372" font-size="13" fill="#909090" letter-spacing="1">PRIMITIVES</text>
26+ <text x="80" y="396" font-size="13" fill="#c8c8c8">Bun.file · Glob · readdir</text>
27+
28+ <text x="80" y="432" font-size="13" fill="#909090" letter-spacing="1">SURFACE</text>
29+ <text x="80" y="456" font-size="13" fill="#c8c8c8">/sama/v2/verify (live)</text>
30+
31+ <text x="80" y="492" font-size="13" fill="#909090" letter-spacing="1">VERDICT</text>
32+ <text x="80" y="516" font-size="20" fill="#4a8a8a" font-weight="700">7 / 7 ✓</text>
33+
34+ <!-- Agreement column (middle) -->
35+ <rect x="430" y="156" width="340" height="380" fill="#1a1a1a" stroke="#c89a3a" stroke-width="2" rx="6"/>
36+ <text x="450" y="186" font-size="18" font-weight="600" fill="#c89a3a">cross-verify.sh</text>
37+ <text x="450" y="210" font-size="13" fill="#8a8a8a">— the empirical gate —</text>
38+
39+ <text x="450" y="252" font-size="13" fill="#909090" letter-spacing="1">INPUT</text>
40+ <text x="450" y="276" font-size="13" fill="#c8c8c8">both verdicts</text>
41+
42+ <text x="450" y="312" font-size="13" fill="#909090" letter-spacing="1">CHECK</text>
43+ <text x="450" y="336" font-size="13" fill="#c8c8c8">byte-for-byte equality</text>
44+
45+ <text x="450" y="372" font-size="13" fill="#909090" letter-spacing="1">IF AGREE</text>
46+ <text x="450" y="396" font-size="13" fill="#7ec77e">→ empirical 7/7 ✓</text>
47+
48+ <text x="450" y="432" font-size="13" fill="#909090" letter-spacing="1">IF DISAGREE</text>
49+ <text x="450" y="456" font-size="13" fill="#e8c89a">→ §6 pressure point</text>
50+ <text x="450" y="476" font-size="12" fill="#8a8a8a">resolve via spec prose</text>
51+
52+ <text x="450" y="510" font-size="13" fill="#909090" letter-spacing="1">EXIT</text>
53+ <text x="450" y="534" font-size="13" fill="#c8c8c8">0 = agree · 1 = disagree</text>
54+
55+ <!-- Shell verifier column (proposed) -->
56+ <rect x="800" y="156" width="340" height="380" fill="#1a1a1a" stroke="#4a8a8a" stroke-width="1.5" rx="6"/>
57+ <text x="820" y="186" font-size="18" font-weight="600" fill="#4a8a8a">Shell verifier (NEW)</text>
58+ <text x="820" y="210" font-size="13" fill="#8a8a8a">— the independent oracle —</text>
59+
60+ <text x="820" y="252" font-size="13" fill="#909090" letter-spacing="1">FILE</text>
61+ <text x="820" y="276" font-size="13" fill="#c8c8c8">tools/sama-cli/sama check</text>
62+
63+ <text x="820" y="312" font-size="13" fill="#909090" letter-spacing="1">RUNTIME</text>
64+ <text x="820" y="336" font-size="13" fill="#c8c8c8">POSIX shell (bash)</text>
65+
66+ <text x="820" y="372" font-size="13" fill="#909090" letter-spacing="1">PRIMITIVES</text>
67+ <text x="820" y="396" font-size="13" fill="#c8c8c8">find · grep · awk · wc</text>
68+
69+ <text x="820" y="432" font-size="13" fill="#909090" letter-spacing="1">SURFACE</text>
70+ <text x="820" y="456" font-size="13" fill="#c8c8c8">CLI + cross-verify hook</text>
71+
72+ <text x="820" y="492" font-size="13" fill="#909090" letter-spacing="1">VERDICT</text>
73+ <text x="820" y="516" font-size="20" fill="#4a8a8a" font-weight="700">7 / 7 ✓</text>
74+ </g>
75+
76+ <!-- Arrows connecting -->
77+ <g stroke="#8a8a8a" stroke-width="1.5" fill="none">
78+ <line x1="400" y1="346" x2="430" y2="346"/>
79+ <line x1="800" y1="346" x2="770" y2="346"/>
80+ </g>
81+ <polygon points="426,341 432,346 426,351" fill="#8a8a8a"/>
82+ <polygon points="774,341 768,346 774,351" fill="#8a8a8a"/>
83+
84+ <!-- Bottom callout -->
85+ <rect x="80" y="556" width="1040" height="104" fill="#101a10" stroke="#1f3f1f" stroke-width="1" rx="6"/>
86+ <g font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace">
87+ <text x="100" y="586" font-size="16" font-weight="600" fill="#7ec77e">Falsifiable claim — the §6 evolution mechanism in action:</text>
88+ <text x="100" y="612" font-size="14" fill="#c8c8c8">"Two independent implementations of the SAMA v2 §4 spec, in different languages on different runtimes, will produce</text>
89+ <text x="100" y="634" font-size="14" fill="#c8c8c8">identical verdicts against any spec-conforming codebase. If they disagree on this repo's 7/7 ✓, one is wrong — and per §0</text>
90+ <text x="100" y="654" font-size="14" fill="#c8c8c8">the disagreement is resolvable from the spec prose alone. Disagreements ARE the spec's pressure points."</text>
91+ </g>
92+
93+ <!-- Watermark -->
94+ <text x="1120" y="684" text-anchor="end" font-family="ui-monospace, 'SF Mono', 'JetBrains Mono', Menlo, Consolas, monospace" font-size="12" fill="#5a5a5a">https://tdd.md</text>
95+</svg>
modified src/a31_blog.ts +6 −0
@@ -12,6 +12,12 @@ export interface BlogEntry {
1212 }
1313
1414 export const ALL_POSTS: BlogEntry[] = [
15+ {
16+ slug: "sama-v2-verifier-second-opinion-gap",
17+ title: "The verifier has no second opinion",
18+ description: "Every load-bearing claim on tdd.md has an independent oracle that confirms it. §5 workingSetFit numbers are pinned to external repos at specific SHAs (anyone can clone and recount). URL refactor wall-clock measurements are timestamps in git history. /goal contracts live in goals/ AND PR bodies AND conversation transcripts. The deploy succeeded? curl on /healthz is the oracle, independent of the deploy script. The sitemap is correct? Compare it to ALL_POSTS. Every blog post claim links back to its driving /goal and merge commit. The chain holds — every artifact passes its own audit. There's one exception, and it's at the heart of the entire structural claim: /sama/v2/verify reports 7/7 ✓, and the only oracle that confirms it is the program that emitted it. The §0 spec says 'the verifier is a deterministic program; that claim is only auditable if a human can reproduce it from the data' — but reproducibility means running the same program; that's not independent validation. A buggy verifier biased toward passing the codebase it was written against would emit 7/7 ✓ deterministically forever. This week the gap got a concrete demonstration: a draft of a second verifier landed at the repo root (cli.md), and its 'SAMA v2 compliant' file structure had the prefix-to-layer mapping inverted (a=Layer 3, c=Layer 0/1 — backwards from the canonical a=Pure, b=Core, c=Adapter, d=Entry). Someone who reads the spec daily still got the structure wrong. If the spec is this easy to misread, what catches a similar misreading in the TS verifier? Only a second independent implementation that disagrees. The fix: build the shell verifier proposed in cli.md (with the layer mapping corrected) at tools/sama-cli/. Different language (POSIX shell vs TS), different runtime (bash vs Bun), different primitives (find/grep/awk vs Bun.file/Glob), same spec read independently. A cross-verify.sh script runs both and asserts identical verdicts — agreement is empirical 7/7 ✓; disagreement is a §6 spec-prose pressure point. Third drama post in the structural-self-audit series: chain-gap (every artifact in git except the /goal — fixed), on-ramp-gap (every artifact has a URL except the on-ramp — fixed), verifier-second-opinion-gap (every claim has an oracle except the verifier's — /goal /goals/sama-cli-shell-verifier proposed). Falsifiable next: two-verifier agreement holds across the n=8 §5 cross-repo measurement corpus. If even one repo causes disagreement, the spec has a located ambiguity; §6 evolution-policy resolves it in the prose.",
19+ date: "2026-05-25",
20+ },
1521 {
1622 slug: "sama-v2-portability-boundary-found",
1723 title: "21 minutes 23 seconds — the portability boundary is empirically located",