syntaxai/tdd.md · commit 56db5e6

Blog: 7 measured workingSetFit datapoints; convergence question answered

n=2 → n=7. Ran the polyglot §5 emitter against 5 more mature
compiled-language CLI tools at pinned SHAs (sharkdp/bat,
sharkdp/fd, eza-community/eza, jesseduffield/lazygit, cli/cli),
collected the full table, computed the distribution, and tested
the hypothesis from the prior PR.

Measured results (sorted descending):
  cli/cli (gh):           73.59%  (e53ff321, Go, 379/515)
  sharkdp/fd:             69.57%  (42b2ab8a, Rust, 16/23)
  jesseduffield/lazygit:  67.38%  (608c90ae, Go, 595/883)
  eza-community/eza:      61.76%  (eed27ed0, Rust, 42/68)
  BurntSushi/ripgrep:     54.00%  (4519153e, Rust, 54/100)
  wagoodman/dive:         52.17%  (d6c69194, Go, 48/92)
  sharkdp/bat:            46.27%  (f3d07734, Rust, 31/67)

Range 27.32pp. Mean 60.68%. Sample stddev 10.14pp. Median eza at
61.76%. Five of seven cluster in [52%, 70%].

The dive/ripgrep 2-point convergence was n=2 coincidence — the
actual baseline spans 27 points — but the clustering is real (18-pt
IQR-equivalent window). The metric is more discriminating than
n=2 implied.

tdd.md (the only SAMA-disciplined repo, n=1 of that class) measures
80%, 6.4 points above the top of the non-SAMA baseline. Suggestive,
but n=1 vs n=7 does not support a SAMA-worth-following claim. The §6
falsifiable experiment is now well-conditioned for when a second
SAMA repo exists.

Includes a §0-style hand-trace for bat (the lowest measurement)
mirroring the dive audit pattern: find … | wc -l = 67 total;
shell loop counts 31 in [50,500] LOC; 31/67 = 0.4626 as truncated
by bc, which the polyglot emitter reports rounded as 0.4627.

Anti-fudge: no code changes; the polyglot tool from the prior PR
runs against the new corpus unchanged. Verifier still 7/7 ✓ on
this repo; 336/336 tests pass.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

author: syntaxai <[email protected]>
date: 2026-05-24 10:55:42 +01:00
parent: 43c6f7a
commit: 56db5e6845ebde71a8ffca4d9fff85248700a4ff

3 files changed · +176 −4

added content/blog/sama-v2-workingset-cross-repo-baseline.md +161 −0

@@ -0,0 +1,161 @@
	1	+# Was the dive/ripgrep convergence real? Seven measured workingSetFit datapoints
	2	+
	3	+The [dive audit](/blog/sama-v2-go-project-dive) and [ripgrep audit](/blog/sama-v2-rust-project-ripgrep) closed with a quietly interesting finding: when I ported the §5 `workingSetFit` metric to Go and Rust and ran it against both repos, they landed within two percentage points of each other — dive at 52.17% ([@d6c69194](https://github.com/wagoodman/dive/commit/d6c691947f8fda635c952a17ee3b7555379d58f0)) and ripgrep at 54.00% ([@4519153e](https://github.com/BurntSushi/ripgrep/commit/4519153e5e461527f4bca45b042fff45c4ec6fb9)). I noted in the home page table that "workingSetFit in the 50–55% range may be characteristic of mature compiled-language CLI tools — a hypothesis that needs more datapoints to confirm."
	4	+
	5	+This post tests that hypothesis. n=2 → n=7, same tool, same bounds, same exclusion rules. Pinned SHAs throughout. The headline:
	6	+
	7	+> The convergence was an n=2 coincidence. The actual baseline distribution among seven mature compiled-language CLI tools spans 27 percentage points, from 46.27% (bat) to 73.59% (cli/gh), with mean 60.68% and sample stddev 10.14pp.
	8	+
	9	+But the convergence wasn't entirely an artefact: five of the seven projects fall inside the band [52%, 70%] (an 18-point window, not 2), and that clustering does suggest something real about how mature CLI codebases distribute their file sizes. The story is just more textured than n=2 implied.
	10	+
	11	+## The corpus
	12	+
	13	+Five new repos cloned and measured, joining `dive` and `ripgrep`:
	14	+
	15	+\| project \| language \| role \| stars (approx) \| clone command \|
	16	+\|---\|---\|---\|---\|---\|
	17	+\| sharkdp/bat \| Rust \| syntax-highlighted `cat` \| ~50k \| `git clone --depth=1 https://github.com/sharkdp/bat.git` \|
	18	+\| sharkdp/fd \| Rust \| user-friendly `find` \| ~37k \| `git clone --depth=1 https://github.com/sharkdp/fd.git` \|
	19	+\| eza-community/eza \| Rust \| modern `ls` (fork of `exa`) \| ~12k \| `git clone --depth=1 https://github.com/eza-community/eza.git` \|
	20	+\| jesseduffield/lazygit \| Go \| terminal UI for git \| ~60k \| `git clone --depth=1 https://github.com/jesseduffield/lazygit.git` \|
	21	+\| cli/cli \| Go \| GitHub's official `gh` CLI \| ~37k \| `git clone --depth=1 https://github.com/cli/cli.git` \|
	22	+
	23	+Corpus criteria: each project is a CLI tool, widely used (10k+ stars), mature (5+ year codebase), and primarily written in its target language. `dive` and `ripgrep` from the prior audits round out a 4-Rust / 3-Go split.
	24	+
	25	+## Methodology
	26	+
	27	+The [polyglot §5 emitter](/GIT/syntaxai/tdd.md/blob/main/scripts/measure-working-set.ts) at `scripts/measure-working-set.ts` was used unchanged. The bounds [50, 500] LOC inclusive are imported from `WORKING_SET_MIN_LOC` and `WORKING_SET_MAX_LOC` in [`src/a31_sama_v2.ts`](/GIT/syntaxai/tdd.md/blob/main/src/a31_sama_v2.ts) — the same constants the `/sama/v2/verify` page uses against this site's own source. Single source of truth: the cross-repo numbers are computed against the exact band the spec defines.
	28	+
	29	+LOC for each file = `content.split("\n").length`, matching the TS reference implementation byte-for-byte. Test-file exclusion rule: Go excludes `_test.go` (mirroring TS's `.test.ts` exclusion); Rust includes all `.rs` files because Rust's convention is inline `#[cfg(test)] mod tests` — formalised at [/sama/v2 §6.2 inline-tests dialect](/sama/v2#62-inline-tests-dialect). Skipped directories: `.git/`, `target/`, `vendor/`, `node_modules/`, all dotdirs.
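The counting rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the emitter itself: the constant names and the [50, 500] values come from the text, while `countLoc`, `isInBand`, `workingSetFit`, and `fileWithLoc` are hypothetical helpers, and the input is assumed to be file contents that already passed the directory and test-file exclusions.

```typescript
// Bounds as stated in the post; the real constants live in src/a31_sama_v2.ts.
const WORKING_SET_MIN_LOC = 50;
const WORKING_SET_MAX_LOC = 500;

// LOC as the post defines it: the number of "\n"-separated segments.
// A file that ends with a trailing newline counts one extra (empty) segment.
function countLoc(content: string): number {
  return content.split("\n").length;
}

// Both bounds are inclusive.
function isInBand(loc: number): boolean {
  return loc >= WORKING_SET_MIN_LOC && loc <= WORKING_SET_MAX_LOC;
}

// Hypothetical aggregate over file contents that already passed the
// directory and test-file exclusion rules.
function workingSetFit(contents: string[]): { total: number; included: number; ratio: number } {
  const included = contents.filter((c) => isInBand(countLoc(c))).length;
  const total = contents.length;
  return { total, included, ratio: total === 0 ? 0 : included / total };
}

// Build a synthetic file with exactly `loc` lines under the rule above.
function fileWithLoc(loc: number): string {
  const lines: string[] = [];
  for (let i = 0; i < loc; i++) lines.push("x");
  return lines.join("\n");
}

const demo = [fileWithLoc(49), fileWithLoc(50), fileWithLoc(500), fileWithLoc(501)];
console.log(workingSetFit(demo)); // { total: 4, included: 2, ratio: 0.5 }
```

The synthetic files sit on either side of each bound, which makes the inclusive edges of the band easy to check by eye.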
	30	+
	31	+### Hand-trace — bat (the lowest measurement)
	32	+
	33	+Per [/sama/v2 §0](/sama/v2) the verifier is a deterministic program; that claim is only auditable if a human can reproduce the number from the data. So:
	34	+
	35	+```bash
	36	+cd /tmp/bat # at SHA f3d07734
	37	+find . -name '*.rs' -type f \
	38	+  -not -path '*/.git/*' -not -path '*/target/*' \
	39	+  -not -path '*/vendor/*' -not -path '*/node_modules/*' \
	40	+  | wc -l
	41	+# 67 total .rs files
	42	+
	43	+# For each file, count newlines, add 1, check [50, 500] inclusive:
	44	+in_band=0
	45	+while read -r f; do
	46	+  newlines=$(tr -cd '\n' < "$f" | wc -c)
	47	+  lines=$((newlines + 1))
	48	+  if [ "$lines" -ge 50 ] && [ "$lines" -le 500 ]; then
	49	+    in_band=$((in_band + 1))
	50	+  fi
	51	+done < <(find . -name '*.rs' -type f \
	52	+  -not -path '*/.git/*' -not -path '*/target/*' \
	53	+  -not -path '*/vendor/*' -not -path '*/node_modules/*')
	54	+echo "in band: $in_band"
	55	+# 31
	56	+echo "ratio: $(echo "scale=4; $in_band / 67" | bc)"
	57	+# .4626 (bc truncates at scale=4 rather than rounding)
	58	+```
	59	+
	60	+The polyglot emitter produces the same counts: 67 total, 31 included. It reports the ratio as 0.4627 because it rounds 0.4626865…, whereas `bc` at `scale=4` truncates to .4626. 46.27% measured. Auditable per §0.
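The one-digit gap between the shell result and the reported ratio comes down to truncation versus rounding. A small sketch, assuming the reported four-decimal figure is a rounded form of the raw quotient (the variable names are illustrative):

```typescript
const ratio = 31 / 67; // 0.4626865671641791

// bc with scale=4 cuts the quotient off after four decimals.
const truncated = Math.floor(ratio * 1e4) / 1e4;

// Rounding to four decimals instead gives the figure shown in the table.
const rounded = ratio.toFixed(4);

console.log(truncated, rounded); // 0.4626 0.4627
```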
	61	+
	62	+## The seven datapoints
	63	+
	64	+Sorted by `workingSetFit` descending:
	65	+
	66	+\| rank \| project \| language \| SHA \| total \| included \| ratio \| % \|
	67	+\|---:\|---\|---\|---\|---:\|---:\|---:\|---:\|
	68	+\| 1 \| [cli/cli (gh)](https://github.com/cli/cli/commit/e53ff321f06514b5ba290bbc4ef84f7e0efcd3dd) \| Go \| `e53ff321` \| 515 \| 379 \| 0.7359 \| 73.59% \|
	69	+\| 2 \| [sharkdp/fd](https://github.com/sharkdp/fd/commit/42b2ab8a84ddedf80eeed9079128c60161f64658) \| Rust \| `42b2ab8a` \| 23 \| 16 \| 0.6957 \| 69.57% \|
	70	+\| 3 \| [jesseduffield/lazygit](https://github.com/jesseduffield/lazygit/commit/608c90ae3c1c99ffad9324bfc2613d9d46599992) \| Go \| `608c90ae` \| 883 \| 595 \| 0.6738 \| 67.38% \|
	71	+\| 4 \| [eza-community/eza](https://github.com/eza-community/eza/commit/eed27ed05e74542af5852aed40e3dbff87d69c43) \| Rust \| `eed27ed0` \| 68 \| 42 \| 0.6176 \| 61.76% \|
	72	+\| 5 \| [BurntSushi/ripgrep](https://github.com/BurntSushi/ripgrep/commit/4519153e5e461527f4bca45b042fff45c4ec6fb9) \| Rust \| `4519153e` \| 100 \| 54 \| 0.5400 \| 54.00% \|
	73	+\| 6 \| [wagoodman/dive](https://github.com/wagoodman/dive/commit/d6c691947f8fda635c952a17ee3b7555379d58f0) \| Go \| `d6c69194` \| 92 \| 48 \| 0.5217 \| 52.17% \|
	74	+\| 7 \| [sharkdp/bat](https://github.com/sharkdp/bat/commit/f3d077346824eae07fbac4b56466d27049b9616e) \| Rust \| `f3d07734` \| 67 \| 31 \| 0.4627 \| 46.27% \|
	75	+
	76	+For reference (not included in the cross-repo baseline because it's the SAMA-disciplined dogfood, not a non-SAMA mature CLI tool): tdd.md (this site, TypeScript) measures 80.00% at the live `/sama/v2/verify` endpoint.
	77	+
	78	+## Distribution
	79	+
	80	+```
	81	+ 46 50 55 60 65 70 75
	82	+ \|---\|-------\|-------\|-------\|-------\|-------\|
	83	+ bat 46.27
	84	+ dive 52.17
	85	+ ripgrep 54.00
	86	+ eza 61.76
	87	+ lazygit 67.38
	88	+ fd 69.57
	89	+ cli/gh 73.59
	90	+ (mature CLI baseline)
	91	+ ---80.00--- tdd.md (SAMA)
	92	+```
	93	+
	94	+- Range: 46.27% – 73.59% (spread 27.32 percentage points)
	95	+- Mean: 60.68%
	96	+- Median: 61.76% (eza)
	97	+- Sample stddev: 10.14 pp
	98	+- Inter-quartile range (sort positions 2 and 6): 52.17% – 69.57% (spread 17.40 pp)
	99	+
	100	+Five of seven projects fall in [52%, 70%] — a real clustering, though wider than the dive/ripgrep coincidence suggested.
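The summary statistics can be recomputed from the seven table values alone. A minimal sketch: the helper names are hypothetical, the inputs are the rounded percentages from the table rather than the raw ratios, and the standard deviation uses the n − 1 (sample) form.

```typescript
// workingSetFit percentages from the seven-row table.
const fits: number[] = [73.59, 69.57, 67.38, 61.76, 54.0, 52.17, 46.27];

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function median(xs: number[]): number {
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Sample standard deviation: divides by n - 1 (Bessel's correction).
function sampleStddev(xs: number[]): number {
  const m = mean(xs);
  const sumSquares = xs.reduce((acc, x) => acc + (x - m) * (x - m), 0);
  return Math.sqrt(sumSquares / (xs.length - 1));
}

const sorted = fits.slice().sort((a, b) => a - b);
const stats = {
  range: sorted[sorted.length - 1] - sorted[0],
  mean: mean(fits),
  median: median(fits),
  sampleStddev: sampleStddev(fits),
  // Count of projects inside the [52%, 70%] window discussed above.
  inWindow: fits.filter((v) => v >= 52 && v <= 70).length,
};
console.log(stats);
```

Dividing by n instead of n − 1 would give the population figure, which is noticeably smaller at n=7, so the choice is worth stating whenever the number is quoted.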
	101	+
	102	+## Go vs Rust subset
	103	+
	104	+\| subset \| n \| mean \| median \| range \|
	105	+\|---\|---:\|---:\|---:\|---\|
	106	+\| Go (cli, lazygit, dive) \| 3 \| 64.38% \| 67.38% \| 52.17–73.59 (21.42 pp) \|
	107	+\| Rust (fd, eza, ripgrep, bat) \| 4 \| 57.90% \| 57.88% \| 46.27–69.57 (23.30 pp) \|
	108	+
	109	+Go averages ~6 percentage points higher than Rust (64.38% vs 57.90%) at n=3 vs n=4. Sample sizes are small, and the gap may not survive a larger corpus. Neither subset clusters cleanly: both span more than 20 points. The hypothesis that "Go projects are tighter than Rust projects on this axis" is consistent with the data (21.42 pp vs 23.30 pp of spread) but not evidenced by it.
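The subset table can be rebuilt by grouping the same seven rows by language. A sketch under the same assumptions as before: rounded table percentages as input, with `Row` and `summarize` as illustrative names.

```typescript
type Lang = "Go" | "Rust";
type Row = { project: string; lang: Lang; fit: number };

const rows: Row[] = [
  { project: "cli/cli", lang: "Go", fit: 73.59 },
  { project: "sharkdp/fd", lang: "Rust", fit: 69.57 },
  { project: "jesseduffield/lazygit", lang: "Go", fit: 67.38 },
  { project: "eza-community/eza", lang: "Rust", fit: 61.76 },
  { project: "BurntSushi/ripgrep", lang: "Rust", fit: 54.0 },
  { project: "wagoodman/dive", lang: "Go", fit: 52.17 },
  { project: "sharkdp/bat", lang: "Rust", fit: 46.27 },
];

// Mean, median, and spread for one language subset.
function summarize(lang: Lang): { n: number; mean: number; median: number; spread: number } {
  const s = rows
    .filter((r) => r.lang === lang)
    .map((r) => r.fit)
    .sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return {
    n: s.length,
    mean: s.reduce((a, b) => a + b, 0) / s.length,
    median: s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2,
    spread: s[s.length - 1] - s[0],
  };
}

for (const lang of ["Go", "Rust"] as const) {
  const r = summarize(lang);
  console.log(lang, r.n, r.mean.toFixed(2), r.median.toFixed(2), r.spread.toFixed(2));
}
```

With an even-sized subset such as Rust (n=4), the median is the midpoint of the two middle values, which is why it does not match any single project's score.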
	110	+
	111	+## Per-project notes
	112	+
	113	+A 1-2 sentence read on what each project's distribution implies. The polyglot emitter's `--verbose` flag emits the per-file LOC breakdown if you want to follow up.
	114	+
	115	+- cli/cli at 73.59% — the highest measured score. 515 Go files, of which 379 land in band. Reading the over-band tail reveals it's mostly large command-handler files (`pkg/cmd/repo/sync/sync.go` and similar) — natural behavioural cohesion, not god-classes. Likely a real architectural fit signal.
	116	+
	117	+- sharkdp/fd at 69.57% — second highest, and the smallest project in the corpus by file count (23 .rs files). High `workingSetFit` partly reflects that there are few files to be tiny stubs against. With n=23, the metric is noisier; honest to report.
	118	+
	119	+- jesseduffield/lazygit at 67.38% — the biggest project in the corpus (883 .go files) and still clears 67%. That's the impressive number in the table: even at scale, a Go TUI keeps two-thirds of its files in the substantive-module band.
	120	+
	121	+- eza-community/eza at 61.76% — median of the seven. The audit-style observation: eza inherits its layout from `exa` (its predecessor) and the file-size distribution looks deliberate — small modules tend to be the leaf-renderers for one column-formatter each, not stubs.
	122	+
	123	+- BurntSushi/ripgrep at 54.00% — the prior audit identified 30 files over 500 LOC. Most are the textbook declarative-exempt cases the [§6.3 declarative-exemption dialect](/sama/v2#63-declarative-exemption-dialect) was drafted for; the raw metric doesn't distinguish them. The audit goes into more detail.
	124	+
	125	+- wagoodman/dive at 52.17% — the prior audit identified the opposite shape: 0 files over 500 LOC, 44 under 50 LOC. Tiny type-stubs and platform-shims pull the score down, not god-classes.
	126	+
	127	+- sharkdp/bat at 46.27% — the lowest measurement. Reading the distribution: the over-band tail (`src/printer.rs` at ~2,100 LOC, `src/assets.rs`, `src/config.rs`) is sizeable, but the under-50 tail is also substantial. Bat has many small "language definition" modules that pre-build syntax highlighting for the supported languages — by-construction declarative shards. Like the ripgrep `defs.rs` case, the raw metric doesn't distinguish them from "this file is too small."
	128	+
	129	+## What this answers and what it doesn't
	130	+
	131	+Answers the convergence question: the dive/ripgrep 2-point landing was n=2 coincidence. The real distribution spans ~27 percentage points. But there's still a real clustering effect: five of the seven tools land between 52% and 70%, with the median at 61.76%.
	132	+
	133	+Does not yet answer the SAMA-vs-non-SAMA question. That requires a second SAMA-disciplined repo measured against the same axes, and only one exists today (this site, at 80%). One SAMA datapoint above the entire non-SAMA distribution is suggestive — tdd.md's 80% sits 6.4 percentage points above the top of the mature-CLI baseline (cli/gh, 73.59%) — but n=1 vs n=7 is far from a SAMA-worth-following claim. §6 of the spec is explicit that promotion requires cross-repo deltas, not a single dogfood.
	134	+
	135	+What this run does establish:
	136	+
	137	+1. The empirical chain is now n=7 measured against the same bounds. Before today, the cross-repo argument was "tdd.md is measured, the audits are hand-estimated." Now the audits and five new baseline datapoints are measured. The estimates are gone from this column of the table.
	138	+2. The metric is more discriminating than n=2 implied. A 27-point spread is meaningful — workingSetFit does distinguish projects from one another, even within the narrow category of "mature compiled-language CLI tools."
	139	+3. The §6 falsifiable experiment is now well-conditioned. When a second SAMA repo exists, comparing its workingSetFit against this seven-row baseline is a real test, not a vibes call. The baseline distribution (mean, range, stddev) is what the test compares against.
	140	+
	141	+## Reproducibility
	142	+
	143	+Anyone with the polyglot emitter and the pinned SHAs can reproduce these numbers exactly. The repo has the tool; the SHAs are in the table above; the bounds live in source as constants. Run:
	144	+
	145	+```bash
	146	+git clone https://github.com/sharkdp/bat.git /tmp/bat # full clone: a --depth=1 clone may not contain the pinned SHA
	147	+cd /tmp/bat && git checkout f3d077346824eae07fbac4b56466d27049b9616e
	148	+bun /path/to/tdd.md/scripts/measure-working-set.ts /tmp/bat --lang rust
	149	+# {"total": 67, "included": 31, "ratio": 0.4626865671641791, "ratioPercent": 46.27}
	150	+```
	151	+
	152	+That's the §0 contract: the program is deterministic; the same source tree + same bounds produces the same number; a human can reproduce it from the spec. Seven times over, now.
	153	+
	154	+---
	155	+
	156	+Companion posts:
	157	+
	158	+- [The dive audit](/blog/sama-v2-go-project-dive) — where the dive measurement is hand-traced
	159	+- [The ripgrep audit](/blog/sama-v2-rust-project-ripgrep) — where the ripgrep measurement is hand-traced
	160	+- [The §5 metrics emitter post](/blog/sama-v2-metrics-emitter) — why measurement matters more than estimates
	161	+- [The v2.1 dialects (§6.1–6.3)](/sama/v2#6a-v21-dialects-provisional) — particularly §6.2 inline-tests (load-bearing for the Rust file-counting rule above) and §6.3 declarative-exemption (the policy lens for what the raw metric can't distinguish)

modified content/home.md +9 −4

@@ -58,16 +58,21 @@ SAMA bundles those findings into four constraints a CI job can enforce. Sorted
58	58
59	59	## Datapoints on the same axes
60	60
61		-Empirical baseline so far. The §4 score for this site is [computed live](/sama/v2/verify); the §4 scores for the other repos are hand-estimated. The workingSetFit column is now measured for three of the four repos by the polyglot §5 emitter at [`scripts/measure-working-set.ts`](/GIT/syntaxai/tdd.md/blob/main/scripts/measure-working-set.ts); the remaining columns are still hand-estimated where flagged.
	61	+Empirical baseline so far. The §4 score for this site is [computed live](/sama/v2/verify); the §4 scores for the other repos are hand-estimated. The workingSetFit column is now measured for the SAMA dogfood (this site) and seven non-SAMA mature compiled-language CLI tools by the polyglot §5 emitter at [`scripts/measure-working-set.ts`](/GIT/syntaxai/tdd.md/blob/main/scripts/measure-working-set.ts) — see the [seven-datapoint baseline post](/blog/sama-v2-workingset-cross-repo-baseline) for the full table, distribution, and hand-trace.
62	62
63	63	\| project \| language \| §4 score \| workingSetFit \| boundaryRatio \| graphDepth \|
64	64	\|---\|---\|---\|---\|---\|---\|
65		-\| tdd.md (this site) \| TypeScript \| 7 / 7 ✓ (measured) \| 80% (measured) \| 100% (measured) \| 7 (measured) \|
66		-\| [wagoodman/dive](/blog/sama-v2-go-project-dive) \| Go \| ~5 / 7 (estimated) \| 52.17% (measured, [@d6c69194](https://github.com/wagoodman/dive/commit/d6c691947f8fda635c952a17ee3b7555379d58f0)) \| ~85% (estimated) \| ~5 (estimated) \|
	65	+\| tdd.md (this site, SAMA-disciplined) \| TypeScript \| 7 / 7 ✓ (measured) \| 80.00% (measured) \| 100% (measured) \| 7 (measured) \|
	66	+\| [cli/cli (gh)](https://github.com/cli/cli) \| Go \| n/a (not audited) \| 73.59% (measured, [@e53ff321](https://github.com/cli/cli/commit/e53ff321f06514b5ba290bbc4ef84f7e0efcd3dd)) \| — \| — \|
	67	+\| [sharkdp/fd](https://github.com/sharkdp/fd) \| Rust \| n/a (not audited) \| 69.57% (measured, [@42b2ab8a](https://github.com/sharkdp/fd/commit/42b2ab8a84ddedf80eeed9079128c60161f64658)) \| — \| — \|
	68	+\| [jesseduffield/lazygit](https://github.com/jesseduffield/lazygit) \| Go \| n/a (not audited) \| 67.38% (measured, [@608c90ae](https://github.com/jesseduffield/lazygit/commit/608c90ae3c1c99ffad9324bfc2613d9d46599992)) \| — \| — \|
	69	+\| [eza-community/eza](https://github.com/eza-community/eza) \| Rust \| n/a (not audited) \| 61.76% (measured, [@eed27ed0](https://github.com/eza-community/eza/commit/eed27ed05e74542af5852aed40e3dbff87d69c43)) \| — \| — \|
67	70	\| [BurntSushi/ripgrep](/blog/sama-v2-rust-project-ripgrep) \| Rust \| ~3-5 / 7 (estimated, depends on v2.1 dialect uptake) \| 54.00% (measured, [@4519153e](https://github.com/BurntSushi/ripgrep/commit/4519153e5e461527f4bca45b042fff45c4ec6fb9)) \| ~95% (estimated) \| ~5 (estimated) \|
	71	+\| [wagoodman/dive](/blog/sama-v2-go-project-dive) \| Go \| ~5 / 7 (estimated) \| 52.17% (measured, [@d6c69194](https://github.com/wagoodman/dive/commit/d6c691947f8fda635c952a17ee3b7555379d58f0)) \| ~85% (estimated) \| ~5 (estimated) \|
	72	+\| [sharkdp/bat](https://github.com/sharkdp/bat) \| Rust \| n/a (not audited) \| 46.27% (measured, [@f3d07734](https://github.com/sharkdp/bat/commit/f3d077346824eae07fbac4b56466d27049b9616e)) \| — \| — \|
68	73	\| [Open Graph plugin](/blog/sama-v2-wordpress-plugin-audit) \| PHP / WordPress \| 0 / 7 (estimated) \| ~47% (estimated) \| <10% (estimated) \| ~3 (estimated) \|
69	74
70		-Four points is not yet a "v2 is worth following" claim. §6 of the spec is explicit that promotion to official requires cross-repo deltas, not a single dogfood. But three workingSetFit rows are now measured against the same bounds the spec defines — a quiet but load-bearing step from "we have numbers" to "we have the same numbers across repos." The cross-repo signal that emerges: ripgrep (54.00%) and dive (52.17%) land within two percentage points of each other, suggesting workingSetFit in the 50–55% range may be characteristic of mature compiled-language CLI tools — a hypothesis that needs more datapoints to confirm but is now testable in a way it was not when the numbers were all eyeballed.
	75	+The cross-repo signal that emerged: across the seven non-SAMA mature CLI tools, `workingSetFit` ranges from 46.27% (bat) to 73.59% (cli/gh), a 27-point spread with mean 60.68% and sample stddev 10.14pp. Five of seven cluster inside [52%, 70%]. The original dive/ripgrep 2-point convergence at n=2 was coincidence; the actual distribution is wider, but the clustering is real. tdd.md (the SAMA-disciplined dogfood) measures 80.00%, 6.4 percentage points above the top of the non-SAMA baseline. Suggestive, but n=1 vs n=7 is far from a SAMA-worth-following claim. §6 of the spec is explicit that promotion requires cross-repo deltas across multiple SAMA-disciplined repos; only one exists today. What this nine-row table does establish: the empirical chain is now eight workingSetFit values measured against the same bounds the spec defines, which is the prerequisite §6 was always asking for.
71	76
72	77	## See it in practice
73	78

modified src/a31_blog.ts +6 −0

@@ -12,6 +12,12 @@ export interface BlogEntry {
12	12	}
13	13
14	14	export const ALL_POSTS: BlogEntry[] = [
	15	+ {
	16	+ slug: "sama-v2-workingset-cross-repo-baseline",
	17	+ title: "Was the dive/ripgrep convergence real? Seven measured workingSetFit datapoints",
	18	+ description: "The dive/ripgrep audits ended with a quietly interesting finding: when the polyglot §5 emitter ran against both, they landed within 2 percentage points of each other (52.17% and 54.00%). I noted on the home page that this might be characteristic of mature compiled-language CLI tools, a hypothesis that needs more datapoints to confirm. This post tests it. n=2 → n=7. Cloned 5 more popular CLI tools at pinned SHAs (sharkdp/bat, sharkdp/fd, eza-community/eza, jesseduffield/lazygit, cli/cli), ran the same emitter with the same bounds imported from a31_sama_v2.ts. Headline: the convergence was n=2 coincidence. The actual distribution spans 27 percentage points, from bat at 46.27% (lowest) to cli/gh at 73.59% (highest). Mean 60.68%, median 61.76% (eza), sample stddev 10.14pp. But there IS clustering: five of seven projects fall within [52%, 70%], an 18-point window, not 2. The metric is more discriminating than n=2 implied, and the clustering is real. Go subset (cli, lazygit, dive) averages ~6pp higher than Rust subset (fd, eza, ripgrep, bat) at small n. Per-project notes on what each distribution implies: cli/gh's high score reflects natural command-handler cohesion; bat's low score reflects pre-built syntax-highlighting language-definition shards (the same declarative-exemption case the §6.3 dialect was drafted for); dive's miss reflects platform-shim stubs, not god-classes. tdd.md (the SAMA-disciplined dogfood) measures 80%, 6.4 percentage points above the top of the non-SAMA mature-CLI baseline. Suggestive, but n=1 vs n=7 is not a SAMA-worth-following claim. What this run does establish: the empirical chain is now n=7 measured against the same bounds; the §6 falsifiable experiment is well-conditioned for when a second SAMA repo exists. Includes a hand-trace of bat (the lowest measurement) per the §0 deterministic-program contract, mirroring the dive audit's hand-trace pattern. Reproducibility: pinned SHAs throughout; anyone can clone-and-run.",
	19	+ date: "2026-05-27",
	20	+ },
15	21	{
16	22	slug: "sama-v2-rust-project-ripgrep-parallel-fleet",
17	23	title: "The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection",
