56db5e6845ebde71a8ffca4d9fff85248700a4ff
diff --git a/content/blog/sama-v2-workingset-cross-repo-baseline.md b/content/blog/sama-v2-workingset-cross-repo-baseline.md
new file mode 100644
index 0000000000000000000000000000000000000000..65dcbe5faef0b87d73652bcc6b62a75b54691b41
--- /dev/null
+++ b/content/blog/sama-v2-workingset-cross-repo-baseline.md
@@ -0,0 +1,161 @@
+# Was the dive/ripgrep convergence real? Seven measured workingSetFit datapoints
+
+The [dive audit](/blog/sama-v2-go-project-dive) and [ripgrep audit](/blog/sama-v2-rust-project-ripgrep) closed with a quietly interesting finding: when I ported the §5 `workingSetFit` metric to Go and Rust and ran it against both repos, they landed within two percentage points of each other — **dive at 52.17%** ([@d6c69194](https://github.com/wagoodman/dive/commit/d6c691947f8fda635c952a17ee3b7555379d58f0)) and **ripgrep at 54.00%** ([@4519153e](https://github.com/BurntSushi/ripgrep/commit/4519153e5e461527f4bca45b042fff45c4ec6fb9)). I noted in the home page table that *"workingSetFit in the 50–55% range may be characteristic of mature compiled-language CLI tools — a hypothesis that needs more datapoints to confirm."*
+
+This post tests that hypothesis. n=2 → n=7, same tool, same bounds, same exclusion rules. Pinned SHAs throughout. The headline:
+
+> **The convergence was an n=2 coincidence.** The actual baseline distribution among seven mature compiled-language CLI tools spans **27 percentage points** — from 46.27% (bat) to 73.59% (cli/gh) — with mean **60.68%** and sample stddev **10.14pp**.
+
+But the convergence wasn't *entirely* an artefact: five of the seven projects fall inside the band **[52%, 70%]** (an 18-point window, not 2), and that clustering does suggest something real about how mature CLI codebases distribute their file sizes. The story is just more textured than n=2 implied.
+
+## The corpus
+
+Five new repos cloned and measured, joining `dive` and `ripgrep`:
+
+| project | language | role | stars (approx) | clone command |
+|---|---|---|---|---|
+| **sharkdp/bat** | Rust | syntax-highlighted `cat` | ~50k | `git clone --depth=1 https://github.com/sharkdp/bat.git` |
+| **sharkdp/fd** | Rust | user-friendly `find` | ~37k | `git clone --depth=1 https://github.com/sharkdp/fd.git` |
+| **eza-community/eza** | Rust | modern `ls` (fork of `exa`) | ~12k | `git clone --depth=1 https://github.com/eza-community/eza.git` |
+| **jesseduffield/lazygit** | Go | terminal UI for git | ~60k | `git clone --depth=1 https://github.com/jesseduffield/lazygit.git` |
+| **cli/cli** | Go | GitHub's official `gh` CLI | ~37k | `git clone --depth=1 https://github.com/cli/cli.git` |
+
+Corpus criteria: each project is a CLI tool, widely used (10k+ stars), mature (5+ year codebase), and primarily written in its target language. `dive` and `ripgrep` from the prior audits round out a 4-Rust / 3-Go split.
+
+## Methodology
+
+The [polyglot §5 emitter](/GIT/syntaxai/tdd.md/blob/main/scripts/measure-working-set.ts) at `scripts/measure-working-set.ts` was used unchanged. The bounds **[50, 500] LOC inclusive** are imported from `WORKING_SET_MIN_LOC` and `WORKING_SET_MAX_LOC` in [`src/a31_sama_v2.ts`](/GIT/syntaxai/tdd.md/blob/main/src/a31_sama_v2.ts) — the same constants the `/sama/v2/verify` page uses against this site's own source. Single source of truth: the cross-repo numbers are computed against the *exact* band the spec defines.
+
+LOC for each file = `content.split("\n").length`, matching the TS reference implementation byte-for-byte. Test-file exclusion rule: Go excludes `*_test.go` (mirroring TS's `*.test.ts` exclusion); Rust includes all `.rs` files because Rust's convention is inline `#[cfg(test)] mod tests` — formalised at [/sama/v2 §6.2 inline-tests dialect](/sama/v2#62-inline-tests-dialect). Skipped directories: `.git/`, `target/`, `vendor/`, `node_modules/`, all dotdirs.
+
+### Hand-trace — bat (the lowest measurement)
+
+Per [/sama/v2 §0](/sama/v2) the verifier is a deterministic program; that claim is only auditable if a human can reproduce the number from the data. So:
+
+```bash
+cd /tmp/bat    # at SHA f3d07734
+find . -name '*.rs' -type f \
+  -not -path '*/.git/*' -not -path '*/target/*' \
+  -not -path '*/vendor/*' -not -path '*/node_modules/*' \
+  | wc -l
+# 67 total .rs files
+
+# For each file, count newlines, add 1, check [50, 500] inclusive:
+in_band=0
+while read -r f; do
+  newlines=$(tr -cd '\n' < "$f" | wc -c)
+  lines=$((newlines + 1))
+  if [ "$lines" -ge 50 ] && [ "$lines" -le 500 ]; then
+    in_band=$((in_band + 1))
+  fi
+done < <(find . -name '*.rs' -type f \
+           -not -path '*/.git/*' -not -path '*/target/*' \
+           -not -path '*/vendor/*' -not -path '*/node_modules/*')
+echo "in band: $in_band"
+# 31
+echo "ratio: $(echo "scale=4; $in_band / 67" | bc)"
+# .4626
+```
+
+The polyglot emitter produces the same counts: 67 total, 31 included. It reports the ratio as 0.4627 where `bc` printed .4626 because `scale=4` truncates 0.462687 to four places instead of rounding it, so the two differ only in the fourth decimal. 46.27% measured. Auditable per §0.
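The counting rule in the hand-trace can also be written as a minimal TypeScript sketch, for readers who would rather check it without a shell. The constant names and the [50, 500] bounds come from the methodology above; `countLoc`, `inBand`, and `workingSetFit` are hypothetical helper names chosen for illustration, not the emitter's actual API. The last lines show why `bc` printed `.4626` while the table reports 0.4627.

```typescript
// Bounds as stated in the post; in the repo they are imported from src/a31_sama_v2.ts.
const WORKING_SET_MIN_LOC = 50;
const WORKING_SET_MAX_LOC = 500;

// LOC rule from the post: number of "\n"-separated segments (newlines + 1).
function countLoc(content: string): number {
  return content.split("\n").length;
}

// Inclusive band check.
function inBand(loc: number): boolean {
  return loc >= WORKING_SET_MIN_LOC && loc <= WORKING_SET_MAX_LOC;
}

// Share of files whose LOC falls inside the band (hypothetical helper).
function workingSetFit(contents: string[]): { total: number; included: number; ratio: number } {
  const included = contents.filter((c) => inBand(countLoc(c))).length;
  const total = contents.length;
  return { total, included, ratio: total === 0 ? 0 : included / total };
}

// Boundary behaviour: N newlines count as N + 1 LOC.
const sample = [
  "x\n".repeat(49), // 50 LOC, in band
  "x\n".repeat(48), // 49 LOC, below band
  "x\n".repeat(499), // 500 LOC, in band
  "x\n".repeat(500), // 501 LOC, above band
];
const fit = workingSetFit(sample);
console.log(fit.included, "of", fit.total);

// bat: 31 of 67. Truncating to four places (as bc does with scale=4) vs rounding.
const batRatio = 31 / 67;
const truncated = Math.trunc(batRatio * 1e4) / 1e4;
const rounded = Math.round(batRatio * 1e4) / 1e4;
console.log(truncated, rounded);
```

The functions take file contents as strings, so the directory walk and the exclusion rules stay outside the sketch; only the per-file decision and the ratio are modelled here.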
+
+## The seven datapoints
+
+Sorted by `workingSetFit` descending:
+
+| rank | project | language | SHA | total | included | ratio | % |
+|---:|---|---|---|---:|---:|---:|---:|
+| 1 | [cli/cli (gh)](https://github.com/cli/cli/commit/e53ff321f06514b5ba290bbc4ef84f7e0efcd3dd) | Go | `e53ff321` | 515 | 379 | 0.7359 | **73.59%** |
+| 2 | [sharkdp/fd](https://github.com/sharkdp/fd/commit/42b2ab8a84ddedf80eeed9079128c60161f64658) | Rust | `42b2ab8a` | 23 | 16 | 0.6957 | **69.57%** |
+| 3 | [jesseduffield/lazygit](https://github.com/jesseduffield/lazygit/commit/608c90ae3c1c99ffad9324bfc2613d9d46599992) | Go | `608c90ae` | 883 | 595 | 0.6738 | **67.38%** |
+| 4 | [eza-community/eza](https://github.com/eza-community/eza/commit/eed27ed05e74542af5852aed40e3dbff87d69c43) | Rust | `eed27ed0` | 68 | 42 | 0.6176 | **61.76%** |
+| 5 | [BurntSushi/ripgrep](https://github.com/BurntSushi/ripgrep/commit/4519153e5e461527f4bca45b042fff45c4ec6fb9) | Rust | `4519153e` | 100 | 54 | 0.5400 | **54.00%** |
+| 6 | [wagoodman/dive](https://github.com/wagoodman/dive/commit/d6c691947f8fda635c952a17ee3b7555379d58f0) | Go | `d6c69194` | 92 | 48 | 0.5217 | **52.17%** |
+| 7 | [sharkdp/bat](https://github.com/sharkdp/bat/commit/f3d077346824eae07fbac4b56466d27049b9616e) | Rust | `f3d07734` | 67 | 31 | 0.4627 | **46.27%** |
+
+For reference (not included in the cross-repo baseline because it's the *SAMA-disciplined dogfood*, not a non-SAMA mature CLI tool): **tdd.md** (this site, TypeScript) measures **80.00%** at the live `/sama/v2/verify` endpoint.
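The seven rows reduce to a handful of summary statistics. The sketch below recomputes them from the `total` and `included` columns rather than from the rounded percentages; the helper names are illustrative and not taken from the emitter. The unrounded sample standard deviation is about 10.136, so it reads 10.14 when rounded to two places and 10.13 when truncated.

```typescript
// (total, included) pairs copied from the table above.
const rows: Array<{ project: string; total: number; included: number }> = [
  { project: "cli/cli", total: 515, included: 379 },
  { project: "sharkdp/fd", total: 23, included: 16 },
  { project: "jesseduffield/lazygit", total: 883, included: 595 },
  { project: "eza-community/eza", total: 68, included: 42 },
  { project: "BurntSushi/ripgrep", total: 100, included: 54 },
  { project: "wagoodman/dive", total: 92, included: 48 },
  { project: "sharkdp/bat", total: 67, included: 31 },
];

// workingSetFit per project, in percent.
const pct = rows.map((r) => (100 * r.included) / r.total);

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function median(xs: number[]): number {
  const s = xs.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Sample standard deviation (divides by n - 1, not n).
function sampleStddev(xs: number[]): number {
  const m = mean(xs);
  const sumSq = xs.reduce((acc, x) => acc + (x - m) * (x - m), 0);
  return Math.sqrt(sumSq / (xs.length - 1));
}

const stats = {
  mean: mean(pct),
  median: median(pct),
  stddev: sampleStddev(pct),
  spread: Math.max(...pct) - Math.min(...pct),
};

console.log("mean  ", stats.mean.toFixed(2));
console.log("median", stats.median.toFixed(2));
console.log("stddev", stats.stddev.toFixed(2));
console.log("spread", stats.spread.toFixed(2));
```

Working from the raw counts avoids compounding the two-decimal rounding already applied in the `%` column, though at n=7 the difference is far below the second decimal.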
+
+## Distribution
+
+```
+            46  50      55      60      65      70      75
+            |---|-------|-------|-------|-------|-------|
+bat         46.27
+dive               52.17
+ripgrep               54.00
+eza                                61.76
+lazygit                                     67.38
+fd                                             69.57
+cli/gh                                                73.59
+            (mature CLI baseline)
+                                                          ---80.00--- tdd.md (SAMA)
+```
+
+- **Range**: 46.27% – 73.59% (spread 27.32 percentage points)
+- **Mean**: 60.68%
+- **Median**: 61.76% (eza)
+- **Sample stddev**: 10.14 pp
+- **Inter-quartile range** (sort positions 2 and 6): 52.17% – 69.57% (spread 17.40 pp)
+
+Five of seven projects fall in **[52%, 70%]** — a real clustering, though wider than the dive/ripgrep coincidence suggested.
+
+## Go vs Rust subset
+
+| subset | n | mean | median | range |
+|---|---:|---:|---:|---|
+| Go (cli, lazygit, dive) | 3 | 64.38% | 67.38% | 52.17–73.59 (21.42 pp) |
+| Rust (fd, eza, ripgrep, bat) | 4 | 57.90% | 57.88% | 46.27–69.57 (23.30 pp) |
+
+Go averages ~6 percentage points higher than Rust at n=3 vs n=4. Sample sizes are small; the gap may not survive a larger corpus. But: nothing in either subset cleanly clusters; both span ~20+ points. The hypothesis that "Go projects are tighter than Rust projects on this axis" is *consistent* with the data but not *evidenced* by it.
+
+## Per-project notes
+
+A 1-2 sentence read on what each project's distribution implies. The polyglot emitter's `--verbose` flag emits the per-file LOC breakdown if you want to follow up.
+
+- **cli/cli at 73.59%** — the highest measured score. 515 Go files, of which 379 land in band. Reading the over-band tail reveals it's mostly large command-handler files (`pkg/cmd/repo/sync/sync.go` and similar) — natural behavioural cohesion, not god-classes. Likely a real architectural fit signal.
+
+- **sharkdp/fd at 69.57%** — second highest, and the smallest project in the corpus by file count (23 .rs files). With so few files there is little room for a long tail of tiny stubs, which partly explains the high `workingSetFit`. The metric is also noisier at n=23: a single file crossing a bound moves the score by about 4.3 points (1/23). Honest to report.
+
+- **jesseduffield/lazygit at 67.38%** — the biggest project in the corpus (883 .go files) and still clears 67%. That's the impressive number in the table: even at scale, a Go TUI keeps two-thirds of its files in the substantive-module band.
+
+- **eza-community/eza at 61.76%** — median of the seven. The audit-style observation: eza inherits its layout from `exa` (its predecessor) and the file-size distribution looks deliberate — small modules tend to be the leaf-renderers for one column-formatter each, not stubs.
+
+- **BurntSushi/ripgrep at 54.00%** — the prior audit identified 30 files over 500 LOC. Most are the textbook declarative-exempt cases the [§6.3 declarative-exemption dialect](/sama/v2#63-declarative-exemption-dialect) was drafted for; the raw metric doesn't distinguish them. The audit goes into more detail.
+
+- **wagoodman/dive at 52.17%** — the prior audit identified the opposite shape: **0 files over 500 LOC, 44 under 50 LOC**. Tiny type-stubs and platform-shims pull the score down, not god-classes.
+
+- **sharkdp/bat at 46.27%** — the lowest measurement. Reading the distribution: the over-band tail (`src/printer.rs` at ~2,100 LOC, `src/assets.rs`, `src/config.rs`) is sizeable, but the under-50 tail is also substantial. Bat has many small "language definition" modules that pre-build syntax highlighting for the supported languages — by-construction declarative shards. Like the ripgrep `defs.rs` case, the raw metric doesn't distinguish them from "this file is too small."
+
+## What this answers and what it doesn't
+
+**Answers the convergence question**: the dive/ripgrep 2-point landing was n=2 coincidence. The real distribution spans ~27 percentage points. But there's still a real clustering effect: most mature CLI tools land between 50% and 70%, with the median at 61.76% and the mean at 60.68%.
+
+**Does not yet answer the SAMA-vs-non-SAMA question**. That requires a *second* SAMA-disciplined repo measured against the same axes, and only one exists today (this site, at 80%). One SAMA datapoint above the entire non-SAMA distribution is *suggestive* — tdd.md's 80% sits 6.4 percentage points above the top of the mature-CLI baseline (cli/gh, 73.59%) — but n=1 vs n=7 is far from a SAMA-worth-following claim. §6 of the spec is explicit that promotion requires cross-repo *deltas*, not a single dogfood.
+
+What this run *does* establish:
+
+1. **The empirical chain is now n=7 measured against the same bounds.** Before today, the cross-repo argument was "tdd.md is measured, the audits are hand-estimated." Now the audits *and* five new baseline datapoints are measured. The estimates are gone from this column of the table.
+2. **The metric is more discriminating than n=2 implied.** A 27-point spread is meaningful — workingSetFit *does* distinguish projects from one another, even within the narrow category of "mature compiled-language CLI tools."
+3. **The §6 falsifiable experiment is now well-conditioned.** When a second SAMA repo exists, comparing its workingSetFit against this seven-row baseline is a real test, not a vibes call. The baseline distribution (mean, range, stddev) is what the test compares against.
+
+## Reproducibility
+
+Anyone with the polyglot emitter and the pinned SHAs can reproduce these numbers exactly. The repo has the tool; the SHAs are in the table above; the bounds live in source as constants. Run:
+
+```bash
+git clone https://github.com/sharkdp/bat.git /tmp/bat   # full clone: a --depth=1 clone only contains the current tip, not the pinned SHA
+cd /tmp/bat && git checkout f3d077346824eae07fbac4b56466d27049b9616e
+bun /path/to/tdd.md/scripts/measure-working-set.ts /tmp/bat --lang rust
+# {"total": 67, "included": 31, "ratio": 0.4626865671641791, "ratioPercent": 46.27}
+```
+
+That's the §0 contract: the program is deterministic; the same source tree + same bounds produces the same number; a human can reproduce it from the spec. Seven times over, now.
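A reproduction run can be checked mechanically as well as by eye. The sketch below parses one line of emitter output and compares it with the pinned counts for bat. The field names come from the example output above; the `reproduces` helper, and the assumption that `ratioPercent` is `ratio` expressed as a percentage rounded to two decimals, are illustrative guesses rather than documented emitter behaviour.

```typescript
// Shape of the emitter's JSON line, as shown in the example output above.
interface EmitterResult {
  total: number;
  included: number;
  ratio: number;
  ratioPercent: number;
}

// Expected counts for bat at f3d07734, taken from the table in this post.
const expectedBat = { total: 67, included: 31 };

// Hypothetical helper: true when a fresh run reproduces the pinned counts
// and its ratio fields are internally consistent.
function reproduces(line: string, expected: { total: number; included: number }): boolean {
  const r = JSON.parse(line) as EmitterResult;
  const ratioOk = Math.abs(r.ratio - r.included / r.total) < 1e-12;
  // Assumption: ratioPercent is the ratio as a percentage, rounded to two decimals.
  const percentOk = r.ratioPercent === Math.round(r.ratio * 10000) / 100;
  return r.total === expected.total && r.included === expected.included && ratioOk && percentOk;
}

const pinnedLine =
  '{"total": 67, "included": 31, "ratio": 0.4626865671641791, "ratioPercent": 46.27}';
const driftedLine =
  '{"total": 68, "included": 31, "ratio": 0.45588235294117646, "ratioPercent": 45.59}';

console.log(reproduces(pinnedLine, expectedBat));
console.log(reproduces(driftedLine, expectedBat));
```

Comparing the integer counts rather than the percentage keeps the check exact: a run at a different commit shows up as a changed `total` or `included` even when the rounded percentage happens to agree.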
+
+---
+
+**Companion posts:**
+
+- [The dive audit](/blog/sama-v2-go-project-dive) — where the dive measurement is hand-traced
+- [The ripgrep audit](/blog/sama-v2-rust-project-ripgrep) — where the ripgrep measurement is hand-traced
+- [The §5 metrics emitter post](/blog/sama-v2-metrics-emitter) — why measurement matters more than estimates
+- [The v2.1 dialects (§6.1–6.3)](/sama/v2#6a-v21-dialects-provisional) — particularly §6.2 inline-tests (load-bearing for the Rust file-counting rule above) and §6.3 declarative-exemption (the policy lens for what the raw metric can't distinguish)
diff --git a/content/home.md b/content/home.md
index e83005853f554c6ffecbec74ddc34944431b05b6..f4692e2c1ee001a1d57431b387359d9396d5eef6 100644
--- a/content/home.md
+++ b/content/home.md
@@ -58,16 +58,21 @@ SAMA bundles those findings into four constraints a CI job can enforce. *Sorted*
 
 ## Datapoints on the same axes
 
-Empirical baseline so far. The §4 score for this site is [computed live](/sama/v2/verify); the §4 scores for the other repos are hand-estimated. The **workingSetFit** column is now measured for three of the four repos by the polyglot §5 emitter at [`scripts/measure-working-set.ts`](/GIT/syntaxai/tdd.md/blob/main/scripts/measure-working-set.ts); the remaining columns are still hand-estimated where flagged.
+Empirical baseline so far. The §4 score for this site is [computed live](/sama/v2/verify); the §4 scores for the other repos are hand-estimated. The **workingSetFit** column is now measured for the SAMA dogfood (this site) and seven non-SAMA mature compiled-language CLI tools by the polyglot §5 emitter at [`scripts/measure-working-set.ts`](/GIT/syntaxai/tdd.md/blob/main/scripts/measure-working-set.ts) — see the [seven-datapoint baseline post](/blog/sama-v2-workingset-cross-repo-baseline) for the full table, distribution, and hand-trace.
 
 | project | language | §4 score | workingSetFit | boundaryRatio | graphDepth |
 |---|---|---|---|---|---|
-| **tdd.md** (this site) | TypeScript | **7 / 7 ✓** (measured) | 80% (measured) | 100% (measured) | 7 (measured) |
-| [**wagoodman/dive**](/blog/sama-v2-go-project-dive) | Go | ~5 / 7 (estimated) | **52.17%** (measured, [@d6c69194](https://github.com/wagoodman/dive/commit/d6c691947f8fda635c952a17ee3b7555379d58f0)) | ~85% (estimated) | ~5 (estimated) |
+| **tdd.md** (this site, SAMA-disciplined) | TypeScript | **7 / 7 ✓** (measured) | **80.00%** (measured) | 100% (measured) | 7 (measured) |
+| [**cli/cli (gh)**](https://github.com/cli/cli) | Go | n/a (not audited) | **73.59%** (measured, [@e53ff321](https://github.com/cli/cli/commit/e53ff321f06514b5ba290bbc4ef84f7e0efcd3dd)) | — | — |
+| [**sharkdp/fd**](https://github.com/sharkdp/fd) | Rust | n/a (not audited) | **69.57%** (measured, [@42b2ab8a](https://github.com/sharkdp/fd/commit/42b2ab8a84ddedf80eeed9079128c60161f64658)) | — | — |
+| [**jesseduffield/lazygit**](https://github.com/jesseduffield/lazygit) | Go | n/a (not audited) | **67.38%** (measured, [@608c90ae](https://github.com/jesseduffield/lazygit/commit/608c90ae3c1c99ffad9324bfc2613d9d46599992)) | — | — |
+| [**eza-community/eza**](https://github.com/eza-community/eza) | Rust | n/a (not audited) | **61.76%** (measured, [@eed27ed0](https://github.com/eza-community/eza/commit/eed27ed05e74542af5852aed40e3dbff87d69c43)) | — | — |
 | [**BurntSushi/ripgrep**](/blog/sama-v2-rust-project-ripgrep) | Rust | ~3-5 / 7 (estimated, depends on v2.1 dialect uptake) | **54.00%** (measured, [@4519153e](https://github.com/BurntSushi/ripgrep/commit/4519153e5e461527f4bca45b042fff45c4ec6fb9)) | ~95% (estimated) | ~5 (estimated) |
+| [**wagoodman/dive**](/blog/sama-v2-go-project-dive) | Go | ~5 / 7 (estimated) | **52.17%** (measured, [@d6c69194](https://github.com/wagoodman/dive/commit/d6c691947f8fda635c952a17ee3b7555379d58f0)) | ~85% (estimated) | ~5 (estimated) |
+| [**sharkdp/bat**](https://github.com/sharkdp/bat) | Rust | n/a (not audited) | **46.27%** (measured, [@f3d07734](https://github.com/sharkdp/bat/commit/f3d077346824eae07fbac4b56466d27049b9616e)) | — | — |
 | [**Open Graph plugin**](/blog/sama-v2-wordpress-plugin-audit) | PHP / WordPress | 0 / 7 (estimated) | ~47% (estimated) | <10% (estimated) | ~3 (estimated) |
 
-Four points is not yet a "v2 is worth following" claim. §6 of the spec is explicit that promotion to official requires cross-repo *deltas*, not a single dogfood. But three workingSetFit rows are now *measured* against the same bounds the spec defines — a quiet but load-bearing step from "we have numbers" to "we have *the same* numbers across repos." The cross-repo signal that emerges: ripgrep (54.00%) and dive (52.17%) land within two percentage points of each other, suggesting workingSetFit in the 50–55% range may be characteristic of mature compiled-language CLI tools — a hypothesis that needs more datapoints to confirm but is now *testable* in a way it was not when the numbers were all eyeballed.
+**The cross-repo signal that emerged**: across the seven non-SAMA mature CLI tools, `workingSetFit` ranges from 46.27% (bat) to 73.59% (cli/gh) — a 27-point spread, mean **60.68%**, sample stddev **10.14pp**. Five of seven cluster inside [52%, 70%]. The original dive/ripgrep 2-point convergence at n=2 was coincidence; the actual distribution is wider, but the clustering is real. **tdd.md** (the SAMA-disciplined dogfood) measures 80.00% — 6.4 percentage points above the top of the non-SAMA baseline. Suggestive but n=1 vs n=7 is far from a SAMA-worth-following claim. §6 of the spec is explicit that promotion requires cross-repo *deltas* across multiple SAMA-disciplined repos; only one exists today. What this nine-row table *does* establish: the empirical chain is now eight workingSetFit values measured against the same bounds the spec defines, which is the prerequisite §6 was always asking for.
 
 ## See it in practice
 
diff --git a/src/a31_blog.ts b/src/a31_blog.ts
index 3614fc2d52734edb50616e44f24a2ca0b601075f..5f086fac3ac56a7eee272fb88a437cd7fa1d2773 100644
--- a/src/a31_blog.ts
+++ b/src/a31_blog.ts
@@ -12,6 +12,12 @@ export interface BlogEntry {
 }
 
 export const ALL_POSTS: BlogEntry[] = [
+  {
+    slug: "sama-v2-workingset-cross-repo-baseline",
+    title: "Was the dive/ripgrep convergence real? Seven measured workingSetFit datapoints",
+    description: "The dive/ripgrep audits ended with a quietly interesting finding: when the polyglot §5 emitter ran against both, they landed within 2 percentage points of each other (52.17% and 54.00%). I noted on the home page that this *might* be characteristic of mature compiled-language CLI tools — a hypothesis that needs more datapoints to confirm. This post tests it. n=2 → n=7. Cloned 5 more popular CLI tools at pinned SHAs (sharkdp/bat, sharkdp/fd, eza-community/eza, jesseduffield/lazygit, cli/cli), ran the same emitter with the same bounds imported from a31_sama_v2.ts. Headline: the convergence was n=2 coincidence. The actual distribution spans 27 percentage points — bat at 46.27% (lowest) to cli/gh at 73.59% (highest). Mean 60.68%, median 61.76% (eza), sample stddev 10.14pp. But there IS clustering: five of seven projects fall within [52%, 70%] — an 18-point window, not 2. The metric is more discriminating than n=2 implied, and the clustering is real. Go subset (cli, lazygit, dive) averages ~6pp higher than Rust subset (fd, eza, ripgrep, bat) at small n. Per-project notes on what each distribution implies — cli/gh's high score reflects natural command-handler cohesion; bat's low score reflects pre-built syntax-highlighting language-definition shards (the same declarative-exemption case the §6.3 dialect was drafted for); dive's miss reflects platform-shim stubs not god-classes. tdd.md (the SAMA-disciplined dogfood) measures 80% — 6.4 percentage points above the top of the non-SAMA mature-CLI baseline. Suggestive but n=1 vs n=7 is not a SAMA-worth-following claim. What this run does establish: the empirical chain is now n=7 measured against the same bounds; the §6 falsifiable experiment is well-conditioned for when a second SAMA repo exists. Includes a hand-trace of bat (the lowest measurement) per the §0 deterministic-program contract, mirroring the dive audit's hand-trace pattern. Reproducibility: pinned SHAs throughout; anyone can clone-and-run.",
+    date: "2026-05-27",
+  },
   {
     slug: "sama-v2-rust-project-ripgrep-parallel-fleet",
     title: "The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection",