Was the dive/ripgrep convergence real? Seven measured workingSetFit datapoints
The dive audit and ripgrep audit closed with a quietly interesting finding: when I ported the §5 workingSetFit metric to Go and Rust and ran it against both repos, they landed within two percentage points of each other — dive at 52.17% (@d6c69194) and ripgrep at 54.00% (@4519153e). I noted in the home page table that "workingSetFit in the 50–55% range may be characteristic of mature compiled-language CLI tools — a hypothesis that needs more datapoints to confirm."
This post tests that hypothesis. n=2 → n=7, same tool, same bounds, same exclusion rules. Pinned SHAs throughout. The headline:
The convergence was an n=2 coincidence. Across seven mature compiled-language CLI tools, the baseline distribution spans 27 percentage points, from 46.27% (bat) to 73.59% (cli/gh), with mean 60.68% and sample stddev 10.14 pp.
But the convergence wasn't entirely an artefact: five of the seven projects fall inside the band [52%, 70%] (an 18-point window, not 2), and that clustering does suggest something real about how mature CLI codebases distribute their file sizes. The story is just more textured than n=2 implied.
#The corpus
Five new repos cloned and measured, joining dive and ripgrep:
| project | language | role | stars (approx) | clone command |
|---|---|---|---|---|
| sharkdp/bat | Rust | syntax-highlighted cat | ~50k | `git clone --depth=1 https://github.com/sharkdp/bat.git` |
| sharkdp/fd | Rust | user-friendly find | ~37k | `git clone --depth=1 https://github.com/sharkdp/fd.git` |
| eza-community/eza | Rust | modern ls (fork of exa) | ~12k | `git clone --depth=1 https://github.com/eza-community/eza.git` |
| jesseduffield/lazygit | Go | terminal UI for git | ~60k | `git clone --depth=1 https://github.com/jesseduffield/lazygit.git` |
| cli/cli | Go | GitHub's official gh CLI | ~37k | `git clone --depth=1 https://github.com/cli/cli.git` |
Corpus criteria: each project is a CLI tool, widely used (10k+ stars), mature (a codebase at least 5 years old), and primarily written in Go or Rust. dive and ripgrep from the prior audits round out a 4-Rust / 3-Go split.
#Methodology
The polyglot §5 emitter at scripts/measure-working-set.ts was used unchanged. The bounds [50, 500] LOC inclusive are imported from WORKING_SET_MIN_LOC and WORKING_SET_MAX_LOC in src/a31_sama_v2.ts — the same constants the /sama/v2/verify page uses against this site's own source. Single source of truth: the cross-repo numbers are computed against the exact band the spec defines.
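A minimal TypeScript sketch of the inclusive band check, assuming the two constants hold 50 and 500 as stated above. The constant names mirror the ones in src/a31_sama_v2.ts, but this is not that file's code, and `inWorkingSetBand` and `workingSetFit` are hypothetical helper names.

```typescript
// Bounds as stated in the post; assumed to equal the values of the
// constants in src/a31_sama_v2.ts (this sketch does not import them).
const WORKING_SET_MIN_LOC = 50;
const WORKING_SET_MAX_LOC = 500;

// Hypothetical helper: true when a file's LOC lies in [50, 500], both ends inclusive.
function inWorkingSetBand(loc: number): boolean {
  return loc >= WORKING_SET_MIN_LOC && loc <= WORKING_SET_MAX_LOC;
}

// Hypothetical helper: fraction of counted files whose LOC is in band.
function workingSetFit(locs: readonly number[]): number {
  if (locs.length === 0) return 0; // assumption: an empty tree scores 0
  const included = locs.filter(inWorkingSetBand).length;
  return included / locs.length;
}

// Boundary behaviour: 50 and 500 count, 49 and 501 do not.
console.log([49, 50, 500, 501].map(inWorkingSetBand)); // expected values: false, true, true, false
console.log(workingSetFit([10, 50, 120, 500, 900])); // 0.6 (3 of 5 in band)
```

Treating an empty file list as 0 is an assumption of the sketch; the post does not say how the emitter handles that case.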
LOC for each file is content.split("\n").length (the newline count plus one, which is exactly what the hand-trace below computes), matching the TS reference implementation byte-for-byte. Test-file exclusion rule: Go excludes *_test.go (mirroring TS's *.test.ts exclusion); Rust includes all .rs files because Rust's convention is inline #[cfg(test)] mod tests, formalised as the §6.2 inline-tests dialect at /sama/v2. Skipped directories: .git/, target/, vendor/, node_modules/, and all dotdirs.
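The counting and exclusion rules above can be sketched as two small pure functions. This is an illustration under assumptions, not the emitter's code: `countLoc` and `isCountedFile` are hypothetical names, and paths are assumed to be POSIX-style and relative to the repo root.

```typescript
type Lang = "go" | "rust";

// LOC as described: content.split("\n").length, i.e. newline count + 1.
function countLoc(content: string): number {
  return content.split("\n").length;
}

// Skipped directories: every dot-directory (covers .git/) plus the named ones.
function isSkippedDir(name: string): boolean {
  return name.charAt(0) === "." || ["target", "vendor", "node_modules"].indexOf(name) !== -1;
}

// Go: *.go minus *_test.go. Rust: every *.rs, since tests conventionally live
// inline under #[cfg(test)] rather than in separate files.
function isCountedFile(relPath: string, lang: Lang): boolean {
  // Drop empty segments and a leading "./" so "./src/x.rs" and "src/x.rs" agree.
  const parts = relPath.split("/").filter((p) => p !== "" && p !== ".");
  const file = parts.pop() ?? "";
  if (parts.some(isSkippedDir)) return false; // remaining parts are directories
  if (lang === "go") return /\.go$/.test(file) && !/_test\.go$/.test(file);
  return /\.rs$/.test(file);
}

console.log(countLoc("fn main() {}\n")); // 2: the trailing newline adds a line
console.log(isCountedFile("pkg/cmd/root_test.go", "go")); // false
console.log(isCountedFile("./src/printer.rs", "rust")); // true
```

One consequence of the split-based count worth keeping in mind: an empty file counts as 1 line, and a file ending in a trailing newline counts one more line than its visible content.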
#Hand-trace — bat (the lowest measurement)
Per /sama/v2 §0 the verifier is a deterministic program; that claim is only auditable if a human can reproduce the number from the data. So:
```bash
cd /tmp/bat   # at SHA f3d07734

# Count every .rs file, skipping VCS, build and vendored directories
find . -name '*.rs' -type f \
  -not -path '*/.git/*' -not -path '*/target/*' \
  -not -path '*/vendor/*' -not -path '*/node_modules/*' \
  | wc -l
# 67 total .rs files

# For each file, count newlines, add 1 (same as content.split("\n").length),
# and check [50, 500] inclusive. Needs bash for the < <(...) process substitution.
in_band=0
while IFS= read -r f; do
  newlines=$(tr -cd '\n' < "$f" | wc -c)
  lines=$((newlines + 1))
  if [ "$lines" -ge 50 ] && [ "$lines" -le 500 ]; then
    in_band=$((in_band + 1))
  fi
done < <(find . -name '*.rs' -type f \
  -not -path '*/.git/*' -not -path '*/target/*' \
  -not -path '*/vendor/*' -not -path '*/node_modules/*')
echo "in band: $in_band"
# 31
echo "ratio: $(echo "scale=4; $in_band / 67" | bc)"
# .4626
```
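The polyglot emitter produces the same counts: 67 total, 31 included. The table's 0.4627 is 31/67 = 0.46268… rounded to four decimals, whereas bc with scale=4 truncates it to .4626; the difference is truncation vs rounding at the fourth decimal, not a rounding bit at the fifth. 46.27% measured. Auditable per §0.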
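The .4626 vs 0.4627 gap is easy to check directly. A small TypeScript sketch, assuming the emitter's four-decimal figure and its `ratioPercent` are ordinary rounding (the table and the JSON output further down are consistent with that):

```typescript
// Counts from the bat hand-trace above.
const includedFiles = 31;
const totalFiles = 67;
const ratio = includedFiles / totalFiles; // 0.46268656...

// bc's scale=4 cuts off after the 4th decimal (truncation toward zero;
// Math.floor is equivalent here because the ratio is positive).
const truncated = Math.floor(ratio * 10000) / 10000;
// A rounded 4-decimal display carries the 5th digit (8) into the 4th.
const rounded = Number(ratio.toFixed(4));
// Two-decimal percent, assuming ratioPercent is also rounded.
const percent = Math.round(ratio * 10000) / 100;

console.log(truncated, rounded, percent); // 0.4626 0.4627 46.27
```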
#The seven datapoints
Sorted by workingSetFit descending:
| rank | project | language | SHA | total | included | ratio | % |
|---|---|---|---|---|---|---|---|
| 1 | cli/cli (gh) | Go | e53ff321 | 515 | 379 | 0.7359 | 73.59% |
| 2 | sharkdp/fd | Rust | 42b2ab8a | 23 | 16 | 0.6957 | 69.57% |
| 3 | jesseduffield/lazygit | Go | 608c90ae | 883 | 595 | 0.6738 | 67.38% |
| 4 | eza-community/eza | Rust | eed27ed0 | 68 | 42 | 0.6176 | 61.76% |
| 5 | BurntSushi/ripgrep | Rust | 4519153e | 100 | 54 | 0.5400 | 54.00% |
| 6 | wagoodman/dive | Go | d6c69194 | 92 | 48 | 0.5217 | 52.17% |
| 7 | sharkdp/bat | Rust | f3d07734 | 67 | 31 | 0.4627 | 46.27% |
For reference (not included in the cross-repo baseline because it's the SAMA-disciplined dogfood, not a non-SAMA mature CLI tool): tdd.md (this site, TypeScript) measures 80.00% at the live /sama/v2/verify endpoint.
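The ratio and percent columns follow mechanically from the total and included columns. A short TypeScript sketch that recomputes them from the raw counts and re-sorts, so the table can be checked without rerunning the emitter (row labels copied from the table; the `Row` type is illustrative):

```typescript
interface Row { project: string; total: number; included: number }

// total / included counts copied from the table above.
const rows: Row[] = [
  { project: "cli/cli (gh)", total: 515, included: 379 },
  { project: "sharkdp/fd", total: 23, included: 16 },
  { project: "jesseduffield/lazygit", total: 883, included: 595 },
  { project: "eza-community/eza", total: 68, included: 42 },
  { project: "BurntSushi/ripgrep", total: 100, included: 54 },
  { project: "wagoodman/dive", total: 92, included: 48 },
  { project: "sharkdp/bat", total: 67, included: 31 },
];

// Recompute ratio (4 d.p.) and percent (2 d.p.) from the raw counts.
const recomputed = rows.map((r) => {
  const ratio = r.included / r.total;
  return { project: r.project, ratio: ratio.toFixed(4), percent: (ratio * 100).toFixed(2) };
});

// Sort descending by ratio, matching the table's ranking.
recomputed.sort((a, b) => Number(b.ratio) - Number(a.ratio));
for (const r of recomputed) console.log(`${r.project}\t${r.ratio}\t${r.percent}%`);
```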
#Distribution
[Figure: strip plot of workingSetFit (%) on an axis from roughly 46 to 80. Mature CLI baseline: bat 46.27, dive 52.17, ripgrep 54.00, eza 61.76, lazygit 67.38, fd 69.57, cli/gh 73.59. Reference marker: tdd.md (SAMA) at 80.00.]
- Range: 46.27% – 73.59% (spread 27.32 percentage points)
- Mean: 60.68%
- Median: 61.76% (eza)
- Sample stddev: 10.14 pp
- Inter-quartile range (sort positions 2 and 6): 52.17% – 69.57% (spread 17.40 pp)
Five of seven projects fall in [52%, 70%] — a real clustering, though wider than the dive/ripgrep coincidence suggested.
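The summary statistics above can be recomputed from the seven percentages. A TypeScript sketch; it uses the rounded two-decimal percentages rather than the exact included/total ratios, which changes nothing at two decimals:

```typescript
// workingSetFit percentages for the seven mature CLI tools (from the table).
const fits: Array<[string, number]> = [
  ["cli/gh", 73.59], ["fd", 69.57], ["lazygit", 67.38], ["eza", 61.76],
  ["ripgrep", 54.0], ["dive", 52.17], ["bat", 46.27],
];

const values = fits.map(([, v]) => v).sort((a, b) => a - b); // ascending
const n = values.length;
const mean = values.reduce((s, v) => s + v, 0) / n;
// Sample standard deviation: divide by n - 1, not n.
const sampleSd = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1));
// n = 7 is odd, so the median is the middle (4th) sorted value.
const median = values[(n - 1) / 2];
// "IQR" as defined in the list above: 1-based sort positions 2 and 6.
const q1 = values[1];
const q3 = values[5];
// How many projects land inside the [52, 70] window.
const inWindow = values.filter((v) => v >= 52 && v <= 70).length;

console.log(mean.toFixed(2), median, sampleSd.toFixed(2), q1, q3, inWindow);
```

It reports a mean of 60.68, median 61.76, sample stddev of about 10.136 (10.14 at two decimals), IQR endpoints 52.17 and 69.57, and 5 projects inside [52, 70].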
#Go vs Rust subset
| subset | n | mean | median | range |
|---|---|---|---|---|
| Go (cli, lazygit, dive) | 3 | 64.38% | 67.38% | 52.17–73.59 (21.42 pp) |
| Rust (fd, eza, ripgrep, bat) | 4 | 57.90% | 57.88% | 46.27–69.57 (23.30 pp) |
Go averages about 6.5 percentage points higher than Rust (64.38% vs 57.90%), at n=3 vs n=4. Sample sizes are small; the gap may not survive a larger corpus. And neither subset clusters cleanly: both span more than 20 points. The hypothesis that Go projects score higher than Rust projects on this axis is consistent with the data but not evidenced by it.
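One way to put a number on "consistent with but not evidenced by" is Welch's t-test, which does not assume equal variances. That choice of test belongs to this sketch, not to the §5/§6 method:

```typescript
// Subset values (percent) from the table above.
const goFits = [73.59, 67.38, 52.17]; // cli, lazygit, dive
const rustFits = [69.57, 61.76, 54.0, 46.27]; // fd, eza, ripgrep, bat

const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
// Sample variance (n - 1 denominator).
const sampleVar = (xs: number[]) => {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
};

// Welch's t-statistic plus Welch-Satterthwaite degrees of freedom:
// no equal-variance assumption, which suits two tiny, unequal samples.
function welch(a: number[], b: number[]): { t: number; df: number } {
  const va = sampleVar(a) / a.length;
  const vb = sampleVar(b) / b.length;
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 / (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}

const { t, df } = welch(goFits, rustFits);
console.log(`gap ${(mean(goFits) - mean(rustFits)).toFixed(2)} pp, t = ${t.toFixed(2)}, df ~ ${df.toFixed(1)}`);
```

With these inputs the gap is 6.48 pp and t comes out at about 0.80 with roughly 4.2 degrees of freedom; a two-sided test at the 5% level would need |t| of roughly 2.7 to 2.8 at that df, so the Go/Rust difference sits well within noise.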
#Per-project notes
A 1-2 sentence read on what each project's distribution implies. The polyglot emitter's --verbose flag emits the per-file LOC breakdown if you want to follow up.
cli/cli at 73.59%: the highest measured score. 515 Go files, of which 379 land in band. Reading the over-band tail shows it is mostly large command-handler files (`pkg/cmd/repo/sync/sync.go` and similar): natural behavioural cohesion, not god-classes. Likely a real architectural-fit signal.
sharkdp/fd at 69.57%: second highest, and the smallest project in the corpus by file count (23 .rs files). The high `workingSetFit` partly reflects that small file count: there are simply few files that could be tiny stubs. With only 23 files, each file moves the ratio by about 4.3 points, so the metric is noisier here; worth saying so.
jesseduffield/lazygit at 67.38%: the biggest project in the corpus (883 .go files), and it still clears 67%. That is the impressive number in the table: even at scale, a Go TUI keeps two-thirds of its files in the substantive-module band.
eza-community/eza at 61.76%: the median of the seven. An audit-style observation: eza inherits its layout from `exa` (its predecessor), and the file-size distribution looks deliberate. Small modules tend to be leaf renderers, one per column formatter, not stubs.
BurntSushi/ripgrep at 54.00%: the prior audit identified 30 files over 500 LOC. Most are the textbook declarative-exempt cases the §6.3 declarative-exemption dialect was drafted for; the raw metric doesn't distinguish them. The ripgrep audit goes into more detail.
wagoodman/dive at 52.17%: the prior audit identified the opposite shape, with 0 files over 500 LOC and 44 under 50 LOC. Tiny type-stubs and platform shims pull the score down, not god-classes.
sharkdp/bat at 46.27%: the lowest measurement. The over-band tail (`src/printer.rs` at ~2,100 LOC, `src/assets.rs`, `src/config.rs`) is sizeable, but the under-50 tail is also substantial. Bat has many small "language definition" modules that pre-build syntax highlighting for the supported languages: declarative shards by construction. As with ripgrep's `defs.rs`, the raw metric doesn't distinguish them from "this file is too small."
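The per-project notes all reason about the same three regions: under the band, in it, and over it. A small sketch of that split, with hypothetical helper names (`tailBreakdown`, `impliedUnder`); it is not the format of the emitter's --verbose output:

```typescript
interface TailBreakdown { under: number; inBand: number; over: number }

// Split per-file LOC into under / in / over the inclusive [min, max] band.
function tailBreakdown(locs: readonly number[], min = 50, max = 500): TailBreakdown {
  const out: TailBreakdown = { under: 0, inBand: 0, over: 0 };
  for (const loc of locs) {
    if (loc < min) out.under++;
    else if (loc > max) out.over++;
    else out.inBand++;
  }
  return out;
}

// Given a project's total plus its in-band and over-band counts,
// the under-band count follows by subtraction.
function impliedUnder(total: number, inBand: number, over: number): number {
  return total - inBand - over;
}

console.log(tailBreakdown([12, 49, 50, 230, 500, 501, 2100])); // under 2, inBand 3, over 2
console.log(impliedUnder(92, 48, 0)); // 44: dive, matching the audit's figure
console.log(impliedUnder(100, 54, 30)); // 16: ripgrep, derived
```

For ripgrep, the 16 under-band files are derived by subtraction from the figures above, not taken from the audit.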
#What this answers and what it doesn't
Answers the convergence question: the dive/ripgrep 2-point landing was an n=2 coincidence. The real distribution spans ~27 percentage points. But there is still a real clustering effect: five of the seven mature CLI tools land between 52% and 70%, with the median at 61.76% and the mean at 60.68%.
Does not yet answer the SAMA-vs-non-SAMA question. That requires a second SAMA-disciplined repo measured against the same axes, and only one exists today (this site, at 80%). One SAMA datapoint above the entire non-SAMA distribution is suggestive: tdd.md's 80% sits 6.41 percentage points above the top of the mature-CLI baseline (cli/gh, 73.59%). But n=1 vs n=7 is far too little to support a claim that SAMA is worth following. §6 of the spec is explicit that promotion requires cross-repo deltas, not a single dogfood.
What this run does establish:
- The empirical chain is now n=7 measured against the same bounds. Before today, the cross-repo argument was "tdd.md is measured, the audits are hand-estimated." Now the audits and five new baseline datapoints are measured. The estimates are gone from this column of the table.
- The metric is more discriminating than n=2 implied. A 27-point spread is meaningful — workingSetFit does distinguish projects from one another, even within the narrow category of "mature compiled-language CLI tools."
- The §6 falsifiable experiment is now well-conditioned. When a second SAMA repo exists, comparing its workingSetFit against this seven-row baseline is a real test, not a vibes call. The baseline distribution (mean, range, stddev) is what the test compares against.
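What "comparing against the baseline" might look like in code, as a hedged sketch: the spec's §6 test is not defined in this post, so measuring the distance above the baseline maximum plus a z-score against the sample mean and stddev is an illustrative assumption, and `compareToBaseline` is a hypothetical name.

```typescript
// The seven-row mature-CLI baseline (percent), from the table above.
const baseline = [73.59, 69.57, 67.38, 61.76, 54.0, 52.17, 46.27];

interface BaselineComparison { aboveMax: number; z: number }

// Distance above the baseline's maximum, and z-score against the
// baseline's mean and sample (n - 1) standard deviation.
function compareToBaseline(candidate: number, base: readonly number[]): BaselineComparison {
  const n = base.length;
  const mean = base.reduce((s, v) => s + v, 0) / n;
  const sd = Math.sqrt(base.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1));
  return { aboveMax: candidate - Math.max(...base), z: (candidate - mean) / sd };
}

// tdd.md's dogfood number as the example candidate.
const { aboveMax, z } = compareToBaseline(80.0, baseline);
console.log(aboveMax.toFixed(2), z.toFixed(2)); // 6.41 1.91
```

Fed tdd.md's 80.00%, it reports 6.41 pp above cli/gh and z of about 1.91. With n=1 on the SAMA side, that describes one point; it is not a test result.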
#Reproducibility
Anyone with the polyglot emitter and the pinned SHAs can reproduce these numbers exactly. The repo has the tool; the SHAs are in the table above; the bounds live in source as constants. Run:
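```bash
git clone --depth=1 https://github.com/sharkdp/bat.git /tmp/bat
# A --depth=1 clone only holds the current tip, so fetch the pinned commit explicitly
git -C /tmp/bat fetch --depth=1 origin f3d077346824eae07fbac4b56466d27049b9616e
cd /tmp/bat && git checkout f3d077346824eae07fbac4b56466d27049b9616e
bun /path/to/tdd.md/scripts/measure-working-set.ts /tmp/bat --lang rust
# {"total": 67, "included": 31, "ratio": 0.4626865671641791, "ratioPercent": 46.27}
```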
That's the §0 contract: the program is deterministic; the same source tree + same bounds produces the same number; a human can reproduce it from the spec. Seven times over, now.
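The §0 contract lends itself to a mechanical check: parse the emitter's JSON, compare it with pinned counts, and verify the output is internally consistent. A sketch assuming the emitter prints a single JSON object shaped like the line above; `checkReport` and the `pinned` map are hypothetical, and only bat is filled in because only bat's output appears here.

```typescript
// Shape of the JSON line shown above.
interface WorkingSetReport { total: number; included: number; ratio: number; ratioPercent: number }

// Hypothetical pinned expectations, copied from the seven-datapoint table.
const pinned: Record<string, { total: number; included: number }> = {
  bat: { total: 67, included: 31 },
};

// Returns a list of problems; an empty list means the run reproduced the pinned numbers.
function checkReport(name: string, json: string): string[] {
  const want = pinned[name];
  if (!want) return [`no pinned numbers for ${name}`];
  const r = JSON.parse(json) as WorkingSetReport;
  const problems: string[] = [];
  if (r.total !== want.total) problems.push(`total ${r.total} != ${want.total}`);
  if (r.included !== want.included) problems.push(`included ${r.included} != ${want.included}`);
  // Internal consistency: ratio = included/total, percent = ratio rounded to 2 d.p. (assumed).
  if (Math.abs(r.ratio - r.included / r.total) > 1e-12) problems.push("ratio != included/total");
  if (Math.abs(r.ratioPercent - Math.round(r.ratio * 10000) / 100) > 1e-9) problems.push("ratioPercent mismatch");
  return problems;
}

const output = '{"total": 67, "included": 31, "ratio": 0.4626865671641791, "ratioPercent": 46.27}';
console.log(checkReport("bat", output)); // expected: an empty array
```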
Companion posts:
- The dive audit — where the dive measurement is hand-traced
- The ripgrep audit — where the ripgrep measurement is hand-traced
- The §5 metrics emitter post — why measurement matters more than estimates
- The v2.1 dialects (§6.1–6.3) — particularly §6.2 inline-tests (load-bearing for the Rust file-counting rule above) and §6.3 declarative-exemption (the policy lens for what the raw metric can't distinguish)