syntaxai/tdd.md · commit 1067fc9

Blog: pointing SAMA v2 at ripgrep — three spec findings from BurntSushi's exemplar

Fourth real-world audit after this site (TS, 7/7), dive (Go, ~5/7),
and the WP plugin (PHP, 0/7). Picked BurntSushi/ripgrep — 64k stars,
ten years of refinement, ten crates, 45,927 lines of Rust. The
carefully-architected exemplar of Rust CLI design.

Cloned, walked the workspace, scored against §4. The crate dependency
graph already reads like a SAMA v2 layer chart: matcher trait at the
bottom, core binary at the top, searcher/printer/globset/ignore in
between. BurntSushi did the layering work; he just didn't call it that.

Three spec-evolution findings the audit surfaces:
1. Sorted needs the directory-based dialect (Go finding, confirmed here)
2. Modeled-tests needs an inline-tests mode — Rust convention is
   #[test] inside the source file, not sibling files (NEW)
3. Atomic-700 needs a declarative-file exemption — the 7,779-line
   flags/defs.rs is a CLI-flag catalog with near-zero cyclomatic
   complexity per line; splitting it would scatter what's naturally
   one long table (NEW)

§5 metric estimate that surprises: workingSetFit ~60% for ripgrep,
LOWER than tdd.md or dive — but the over-cap files are appropriate to
their content. That's the §5 intent: the metric surfaces a property;
whether it's good or bad depends on what the file SHOULD be.
Compliance scores conflate; metrics separate.

Strict score: ~3/7. Under proposed v2.1 dialects: ~5-6/7. ripgrep is
so close to compliant that the work isn't on ripgrep — it's on v2.

n=4 datapoints now. Pattern: strongly-typed compiled-language projects
(Go, Rust) cluster near the dogfood; the WP codebase is the outlier on
every axis. Whether that's "the language enforces architecture for
free" or "people who pick Go/Rust care more about architecture" is the
experiment §6 hasn't run yet.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

author: syntaxai <[email protected]>
date: 2026-05-24 09:39:00 +01:00
parent: 7fb3cd8
commit: 1067fc93b3068f45ef87c12a7747177e0c5f27e0

2 files changed · +204 −0

added content/blog/sama-v2-rust-project-ripgrep.md +198 −0

@@ -0,0 +1,198 @@
	1	+# Pointing SAMA v2 at `ripgrep`: BurntSushi's exemplar surfaces three findings about the spec
	2	+
	3	+The [WordPress plugin audit](/blog/sama-v2-wordpress-plugin-audit) scored 0/7 because the plugin was written under no architectural discipline at all. The [Go `dive` audit](/blog/sama-v2-go-project-dive) scored ~5/7 because Go's standard layout enforces a lot of v2's rules for free. Both audits taught us something about v2 by what they failed against.
	4	+
	5	+Today's question: what does v2 see when pointed at code that is, by reputation, exemplary?
	6	+
	7	+For the test, `BurntSushi/ripgrep` — 64k stars, ten years of refinement, ten crates, 45,927 lines of Rust. Andrew Gallant's code is widely studied and the workspace's architectural choices have been imitated across the Rust ecosystem. If v2 has problems, ripgrep is where they'll surface — because the obvious failures (god-classes, scattered I/O, untyped data) just aren't there.
	8	+
	9	+Three findings did surface. All three are about v2, not about ripgrep.
	10	+
	11	+## What's in the box
	12	+
	13	+A clean Cargo workspace:
	14	+
	15	+```
	16	+ripgrep/
	17	+├── Cargo.toml # workspace manifest
	18	+├── crates/
	19	+│ ├── core/ 13,128 LOC, 20 files # the binary itself
	20	+│ ├── printer/ 9,320 LOC, 13 files # output formatting (color, JSON, hyperlinks)
	21	+│ ├── ignore/ 6,639 LOC, 9 files # gitignore + directory walking
	22	+│ ├── searcher/ 6,511 LOC, 11 files # the search loop, line buffering
	23	+│ ├── globset/ 3,229 LOC, 6 files # high-performance glob matching
	24	+│ ├── regex/ 2,782 LOC, 9 files # Rust regex adapter
	25	+│ ├── cli/ 1,866 LOC, 8 files # terminal detection, escapes, stdout helpers
	26	+│ ├── matcher/ 1,710 LOC, 2 files # the central `Matcher` trait
	27	+│ ├── pcre2/ 569 LOC, 3 files # alternative PCRE2 backend
	28	+│ └── grep/ 90 LOC, 2 files # meta-crate (re-exports)
	29	+└── tests/ # integration tests
	30	+```
	31	+
	32	+Tests: 38 source files contain `#[test]` blocks inline (the Rust convention — tests live in the same module they test, behind a `#[cfg(test)]` gate). There are also 15 separate integration test files under `tests/`. By Rust standards this is well-tested.
	33	+
	34	+The crate dependency graph is strictly acyclic and roughly:
	35	+
	36	+```
	37	+matcher (the trait)
	38	+ ↑
	39	+ ├── regex, pcre2, globset (matcher implementations)
	40	+ ↑
	41	+ ├── searcher (uses any Matcher)
	42	+ ↑
	43	+ ├── printer (formats searcher results)
	44	+ ↑
	45	+ └── core (the binary; uses ignore + cli + printer + searcher)
	46	+```
	47	+
	48	+That graph reads exactly like a SAMA v2 layer chart: `matcher` is the Pure abstraction at the bottom, `core` is the Entry at the top, everything between is layered by dependency direction. BurntSushi did the layering work; he just didn't call it that.
	49	+
	50	+## §4 conformance — what the verifier would report
	51	+
	52	+Walking the seven checks against this workspace:
	53	+
	54	+### #1 Sorted — would fail under v2.0
	55	+
	56	+Same finding as the [Go `dive` audit](/blog/sama-v2-go-project-dive): Rust organizes by crate + module directory, not by filename prefix. Files inside `crates/searcher/src/` are `glue.rs`, `mod.rs`, `searcher.rs`, etc. — descriptive, not layer-marking. The v2.0 lex-sort-the-filename-prefixes rule does not translate.
	57	+
	58	+Under the hypothetical v2.1 directory-based dialect (the [`dive` rebuild post](/blog/sama-v2-go-project-dive-rebuilt) proposes it formally), Sorted becomes "crate directories declared in layer order in the profile; no import edge violates that order." ripgrep would pass cleanly because the crate dependency graph already runs that direction.
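
A minimal sketch of what that directory-dialect check could look like, assuming the verifier already has a crate-to-layer map and a list of import edges. The names `profile_layers` and `order_violations` are hypothetical, the layer numbers are copied from the profile in the Architecture section with sublayers flattened, and the edges in the usage note are only the ones this post describes, not a full dependency listing.

```rust
use std::collections::HashMap;

/// Layer index per crate, taken from the profile in this post.
/// Sublayers are flattened to keep the example short (an assumption).
pub fn profile_layers() -> HashMap<&'static str, usize> {
    HashMap::from([
        ("matcher", 0),
        ("regex", 1),
        ("pcre2", 1),
        ("globset", 1),
        ("searcher", 1),
        ("printer", 1),
        ("ignore", 2),
        ("cli", 2),
        ("core", 3),
        ("grep", 3),
    ])
}

/// Returns every edge `(from, to)` where `from` imports a crate that sits
/// in a higher layer than itself. Edges touching unmapped crates are
/// skipped here; flagging those is left to the Architecture check.
pub fn order_violations<'a>(
    layers: &HashMap<&str, usize>,
    edges: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    edges
        .iter()
        .copied()
        .filter(|&(from, to)| match (layers.get(from), layers.get(to)) {
            (Some(from_layer), Some(to_layer)) => to_layer > from_layer,
            _ => false,
        })
        .collect()
}
```

Feeding it `("core", "printer")`, `("printer", "searcher")` and `("searcher", "matcher")` should return an empty list, while a hypothetical `("searcher", "core")` edge would be reported.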
	59	+
	60	+### #2 Architecture — would pass under the directory dialect
	61	+
	62	+A natural profile maps every file unambiguously:
	63	+
	64	+```toml
	65	+sama_version = "2.0"
	66	+profile = "ripgrep"
	67	+layout = "directory"
	68	+
	69	+[layers.0] # the trait abstraction every other crate depends on
	70	+crates = ["matcher"]
	71	+
	72	+[layers.1] # pure algorithms, no I/O
	73	+sublayers = [
	74	+ { name = "engine", crates = ["regex", "pcre2", "globset"] },
	75	+ { name = "algorithm", crates = ["searcher", "printer"] },
	76	+]
	77	+
	78	+[layers.2] # the filesystem-touching adapter
	79	+crates = ["ignore", "cli"]
	80	+
	81	+[layers.3] # the binary entry
	82	+crates = ["core", "grep"]
	83	+```
	84	+
	85	+This passes Architecture. Every file is mapped, no ambiguity.
	86	+
	87	+### #3 Modeled (tests) — the second spec-evolution finding
	88	+
	89	+The verifier looks for sibling test files (`foo.ts` + `foo.test.ts`, or for PHP `foo.php` + `foo.test.php`). Rust's convention is fundamentally different: tests live in the same file as the code they test, gated by `#[cfg(test)]` and `#[test]`. ripgrep has 38 source files that contain inline `#[test]` blocks. By Rust standards they're tested. By v2.0 sibling-file standards they're not.
	90	+
	91	+This is a real spec gap, parallel to Sorted. v2.1 needs an `inline-tests` mode where the verifier checks for `#[test]` annotations inside the source file rather than for a sibling test file. Under that mode, ripgrep's 38 inline-tested files would count as tested; the ones without `#[test]` blocks would still be flagged.
	92	+
	93	+Without that mode, ripgrep "fails" Modeled-tests not because it's untested but because v2.0 doesn't recognize how Rust tests.
	94	+
	95	+### #4 Modeled (boundary) — would mostly pass
	96	+
	97	+Boundary patterns (`std::fs::read`, `std::env::var`, `serde_json::from_str`, `std::process::`, raw byte parsing from stdin):
	98	+
	99	+- `crates/ignore/src/walk.rs` + `crates/ignore/src/dir.rs` + `crates/ignore/src/gitignore.rs` — filesystem reads, gitignore parsing. Layer 2 (`ignore` is mapped to L2). ✓
	100	+- `crates/core/main.rs` + `crates/core/messages.rs` + `crates/core/flags/config.rs` — env::var, stderr writes, config file reads. Layer 3 (`core` is L3). ✓ (Layer 3 may use Layer 2's facilities, but should it parse external input directly?)
	101	+- `crates/globset/src/serde_impl.rs` — `serde_json` deserialization for glob patterns. Layer 1 (`globset` is L1 engine). ✗ — this is a boundary call in Layer 1.
	102	+- `crates/searcher/src/line_buffer.rs` — raw byte slicing and decoding. Layer 1 algorithm. Borderline: it's parsing bytes-to-lines, which is a kind of boundary work, but it's also the core search algorithm's hot loop.
	103	+
	104	+Two borderline cases (`globset/serde_impl.rs` and `searcher/line_buffer.rs`), neither egregious. Under a strict reading: fail; under a profile that explicitly declares "serde derives are not boundary parsing in the §4.4 sense": pass. Either way, closer to 95% than to 100%.
	105	+
	106	+### #5 Atomic — fails hard, and surfaces the third finding
	107	+
	108	+The 700-LOC cap is violated by 19 files. The top:
	109	+
	110	+| file | LOC | what's in it |
	111	+|---|---|---|
	112	+| `crates/core/flags/defs.rs` | 7,779 | the catalog of every CLI flag, one `impl Flag` per flag struct |
	113	+| `crates/printer/src/standard.rs` | 3,987 | the default output formatter (color, line-by-line, multi-line, --vimgrep mode, etc.) |
	114	+| `crates/ignore/src/walk.rs` | 2,494 | the parallel filesystem walker |
	115	+| `crates/globset/src/glob.rs` | 1,686 | glob → regex translation |
	116	+| `crates/searcher/src/searcher/glue.rs` | 1,549 | the search loop assembly |
	117	+| `crates/core/flags/hiargs.rs` | 1,480 | high-level argument struct |
	118	+| `crates/matcher/src/lib.rs` | 1,379 | the `Matcher` trait definition + helpers |
	119	+| `crates/ignore/src/dir.rs` | 1,305 | gitignore directory state |
	120	+| ... | ... | eleven more between 700 and 1,200 |
	121	+
	122	+That's a lot. And ripgrep is by reputation careful code. Are these god-classes that should be split?
	123	+
	124	+Looking at the largest:
	125	+
	126	+`crates/core/flags/defs.rs` (7,779 LOC) is — quoting its own docstring — "Defines all of the flags available in ripgrep. Each flag corresponds to a unit struct with a corresponding implementation of `Flag`." It's a long-form catalog: ~150 flag definitions, each ~30-50 lines, each a small `struct` + a small `impl`. The file has near-zero cyclomatic complexity per line. It's a data table written in Rust syntax.
	127	+
	128	+Splitting it into 19 files of 400 lines each would scatter the flag definitions across many files when the natural reading order is "all flags in one place, in display order, with the deprecated ones at the end." The current single-file layout is the right shape for the content. Atomic-700 was designed to catch behavioral god-classes, not declaration catalogs.
	129	+
	130	+This is a real v2 spec-evolution finding. Atomic's 700-LOC cap should have an exemption — or a separate, higher cap — for files whose content is overwhelmingly declarative (data structures, const tables, enum variants, trait implementations with trivial bodies). The verifier could detect this heuristically: a file is "declarative" if its cyclomatic complexity per LOC drops below some threshold, or if its body is mostly `impl X for Y` / `const FOO: T = ...` / `pub struct ...`.
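
One way the heuristic could be approximated, as a rough sketch only: count branching keywords per non-blank line and treat a file as declarative when that density falls under a threshold. `branch_density`, `is_declarative` and any threshold value are assumptions made for illustration; keyword counting is a crude stand-in for cyclomatic complexity and will miscount keywords that appear inside strings or comments.

```rust
/// Crude proxy for cyclomatic complexity per line: branching keywords
/// divided by non-blank lines. Not a real complexity analysis.
pub fn branch_density(source: &str) -> f64 {
    let mut lines = 0usize;
    let mut branches = 0usize;
    for line in source.lines().map(str::trim).filter(|l| !l.is_empty()) {
        lines += 1;
        // Count `for` only as a loop head, so `impl Flag for X` is not
        // mistaken for a branch.
        if line.starts_with("for ") {
            branches += 1;
        }
        // Split into identifier-like tokens and count branching keywords.
        branches += line
            .split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|tok| matches!(*tok, "if" | "match" | "while" | "loop"))
            .count();
    }
    if lines == 0 {
        0.0
    } else {
        branches as f64 / lines as f64
    }
}

/// A file counts as declarative when its branch density stays below
/// `threshold` (a hypothetical, profile-supplied value).
pub fn is_declarative(source: &str, threshold: f64) -> bool {
    branch_density(source) < threshold
}
```

A unit struct plus a trivial `impl Flag for ...` body scores 0.0 under this measure, which is the property the exemption would key on. Where the threshold should sit is exactly the kind of thing that would need measuring across several repos.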
	131	+
	132	+`crates/printer/src/standard.rs` (3,987 LOC) is the opposite case: real behavioral complexity. That one would benefit from splitting per output mode. Same for `crates/ignore/src/walk.rs` (2,494 LOC) — the parallel filesystem walker is genuinely doing a lot. Those two are honest Atomic failures.
	133	+
	134	+So of the 19 over-cap files, roughly: two or three are catalog files that the spec should learn to recognize, and the rest are real-but-defensible behavioral complexity. The current binary verdict ("19 violations, fail") doesn't capture that nuance. v2.1 needs Atomic-with-categories.
	135	+
	136	+### #6 The Law (§1.2) — would pass
	137	+
	138	+Cargo rejects cyclic crate dependencies: the workspace won't build if `searcher` depends on `core` while `core` already depends on `searcher`. The proposed layer mapping above respects every direction the build already enforces. PASS.
	139	+
	140	+### #7 Consistency — would pass
	141	+
	142	+Derives from Law on the same edge set.
	143	+
	144	+Tally: 3 of 7 strict-pass (Architecture, Law, Consistency). With the proposed v2.1 dialects (directory mode, inline-tests mode, declarative-Atomic exemption), the score rises to 5-6 of 7. Without them, ripgrep "fails" v2 mostly because v2 doesn't yet understand Rust.
	145	+
	146	+## §5 metric estimates
	147	+
	148	+| metric | ripgrep (estimated) | dive (Go) | tdd.md (TS, measured) | WP plugin (PHP) |
	149	+|---|---|---|---|---|
	150	+| §4 checks passing | ~3/7 strict, ~5/7 under v2.1 dialects | ~5/7 | 7/7 ✓ | 0/7 |
	151	+| graphDepth | ~5 (matcher → engine → searcher → printer → core) | ~5 | 7 | ~3 |
	152	+| boundaryRatio | ~95% | ~85% | 100% | <10% |
	153	+| workingSetFit (50–500 LOC) | ~60% (those 19 big files drag it down) | ~80% | 80% | ~47% |
	154	+| violationCounts (sum) | ~50 (19 Atomic + ~30 Modeled-tests under sibling-rule) | ~30 | 0 | 17+ |
	155	+
	156	+ripgrep's `workingSetFit` is the metric that surprises here: ~60%, lower than dive and lower than this site. That's the 19 big files pulling the distribution down. And yet most of those files are appropriate to their content. It's a useful signal: workingSetFit is not by itself a quality measure — a project full of declaration catalogs will score lower than a project full of small handlers without being architecturally worse.
	157	+
	158	+This is exactly the §5 intent. The metric surfaces a property; whether that property is good or bad depends on what the file content should be. Compliance scores conflate the two; metrics keep them separate.
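
To make the metric concrete, here is a sketch that assumes `workingSetFit` is simply the share of files whose line count falls inside the 50 to 500 LOC band named in the table. The exact §5 definition is not reproduced in this post, so the inclusive bounds and the function name `working_set_fit` are assumptions.

```rust
/// Share of files whose LOC falls inside the 50..=500 band.
/// Assumed definition; returns 0.0 for an empty file list.
pub fn working_set_fit(file_locs: &[usize]) -> f64 {
    if file_locs.is_empty() {
        return 0.0;
    }
    let fitting = file_locs
        .iter()
        .filter(|&&loc| (50..=500).contains(&loc))
        .count();
    fitting as f64 / file_locs.len() as f64
}
```

With an illustrative list such as `[7779, 3987, 300, 90, 450]`, three of five files fit the band. The two large files lower the number without the function knowing, or needing to know, whether they are catalogs or god-files.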
	159	+
	160	+## What a rebuilt ripgrep would look like — the small version
	161	+
	162	+The audit makes the rebuild sketch short, because BurntSushi's crate split already maps to v2 layers under the directory dialect. The lift to make it pass under v2.1 with the proposed dialects:
	163	+
	164	+1. Add `sama.profile.toml` declaring the layer mapping (see profile above). 50 lines, zero code change.
	165	+2. Resolve the two boundary leaks. `globset/src/serde_impl.rs` is fine if `serde` derives are exempted, and the byte parsing in `searcher/line_buffer.rs` is fundamentally part of the search algorithm, not a boundary. Likely a profile note, not a code change.
	166	+3. Split the two genuine god-files. `crates/printer/src/standard.rs` (3,987 LOC) splits per output mode (standard, vimgrep, multi-line, color-only) into 4 files of ~1,000 LOC each. `crates/ignore/src/walk.rs` (2,494 LOC) splits walker-config / walker-loop / walker-results into 3 files of ~800 LOC each. Both results still sit above the 700-LOC cap, so a strict Atomic pass would need a finer split. ~2 weeks of focused work.
	167	+4. The catalog files stay. `defs.rs` (7,779 LOC) is correct shape; the spec needs the exemption, not the file.
	168	+
	169	+That's it. ripgrep is so close to v2-compliant that the work isn't on ripgrep — it's on v2.
	170	+
	171	+## Three findings, restated
	172	+
	173	+1. Sorted needs a directory-based dialect. Already surfaced by the Go audit; ripgrep confirms.
	174	+2. Modeled-tests needs an `inline-tests` mode that recognizes `#[test]` annotations inside the source file rather than requiring a sibling file. Rust's convention is fundamentally not sibling-based.
	175	+3. Atomic-700 needs a declarative-file exemption. The 7,779-line `defs.rs` is the textbook case: a flag-definition catalog that's structurally correct as one file. The spec was written with behavioral complexity in mind; it doesn't yet distinguish "long file because complex" from "long file because catalog."
	176	+
	177	+All three are §6 evolution-policy moves: falsifiable extensions admitted provisionally, measured against §5 metrics across multiple repos. ripgrep is one of those repos.
	178	+
	179	+## Four datapoints on the same axes
	180	+
	181	+| project | language | §4 score | workingSetFit | boundaryRatio |
	182	+|---|---|---|---|---|
	183	+| tdd.md | TypeScript | 7/7 ✓ (measured) | 80% | 100% |
	184	+| wagoodman/dive | Go | ~5/7 (estimated) | ~80% | ~85% |
	185	+| BurntSushi/ripgrep | Rust | ~3-5/7 (estimated) | ~60% | ~95% |
	186	+| Open Graph plugin | PHP/WordPress | 0/7 (estimated) | ~47% | <10% |
	187	+
	188	+n=4, three of them hand-estimated, still far from a "v2 is worth following" claim. But the pattern is clearer now: the strongly-typed compiled-language projects (Go, Rust) cluster near the dogfood; the WordPress codebase is the outlier on every axis. Whether that's "the language enforces architecture for free" or "people who choose Go/Rust care more about architecture" is the experiment §6 hasn't run yet.
	189	+
	190	+---
	191	+
	192	+See for yourself:
	193	+
	194	+- The project: <https://github.com/BurntSushi/ripgrep>
	195	+- The Go audit (companion): [Pointing SAMA v2 at `dive`](/blog/sama-v2-go-project-dive)
	196	+- The WP audit + rebuild: [WordPress plugin audit](/blog/sama-v2-wordpress-plugin-audit) · [rebuilt](/blog/sama-v2-wordpress-plugin-rebuilt)
	197	+- The §5 metrics emitter: [Compliance proves the rules followed. Delta proves they were worth following.](/blog/sama-v2-metrics-emitter)
	198	+- The spec being audited against: [/sama/v2](/sama/v2)

modified src/a31_blog.ts +6 −0

@@ -12,6 +12,12 @@ export interface BlogEntry {
12	12	}
13	13
14	14	export const ALL_POSTS: BlogEntry[] = [
	15	+ {
	16	+ slug: "sama-v2-rust-project-ripgrep",
	17	+ title: "Pointing SAMA v2 at `ripgrep`: BurntSushi's exemplar surfaces three findings about the spec",
	18	+ description: "Fourth real-world audit after this site (TS, 7/7), dive (Go, ~5/7), and the WP plugin (PHP, 0/7). Picked BurntSushi/ripgrep — 64k stars, ten years of refinement, ten crates, 45,927 lines of Rust. The carefully-architected exemplar of Rust CLI design. If v2 has problems, ripgrep is where they surface, because the obvious failures (god-classes, scattered I/O, untyped data) just aren't there. The crate dep graph already reads like a SAMA v2 layer chart — matcher trait at the bottom, core binary at the top, searcher/printer/globset/ignore between. BurntSushi did the layering; he just didn't call it that. Walking the seven §4 checks surfaces three real spec-evolution findings: (1) Sorted needs the directory-based dialect (Go finding, confirmed here); (2) Modeled-tests needs an inline-tests mode because Rust convention is #[test] inside the source file, not sibling files (NEW finding); (3) Atomic-700 needs a declarative-file exemption because the 7,779-line flags/defs.rs is a CLI-flag catalog with near-zero cyclomatic complexity per line — splitting it would scatter what's naturally one long table (NEW finding). The §5 metrics surface their own insight: workingSetFit drops to ~60% for ripgrep, lower than tdd.md or dive, yet most of the over-cap files are appropriate to their content. That's exactly the §5 intent — the metric surfaces a property; whether the property is good or bad depends on what the file SHOULD be. Compliance scores conflate; metrics separate. Strict score ~3/7; under the proposed v2.1 dialects ~5-6/7. ripgrep is so close to compliant that the work isn't on ripgrep — it's on v2. n=4 datapoints now: strongly-typed compiled-language projects cluster near the dogfood; the WP codebase is the outlier on every axis. Whether that's 'language enforces architecture' or 'people who choose Go/Rust care more about architecture' is the experiment §6 hasn't run yet.",
	19	+ date: "2026-05-24",
	20	+ },
15	21	{
16	22	slug: "sama-v2-go-project-dive-prefix-scheme",
17	23	title: "`dive`, the prefix-scheme variant — what `ls`-readable layer order costs in Go",
