syntaxai/tdd.md · commit 0189c2d

Blog: ripgrep rebuilt under SAMA v2 — companion sketch + audit cross-link

Adds the parallel-architecture sketch to the ripgrep audit pair: same
crate layout, every file mapped to a layer, two internal file splits
(printer/standard.rs → 6-file submodule per output mode; ignore/walk.rs
→ 4-file submodule per walker concern), one sama.profile.toml under the
three proposed v2.1 dialects (layout=directory, tests=inline,
atomic_exemption=declarative). Predicts §5 deltas vs the audit's
estimates and adds the link from the audit's rebuild-section so AI
agents reading the audit can navigate to the full sketch.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

author: syntaxai <[email protected]>
date: 2026-05-24 09:55:52 +01:00
parent: 1e94d49
commit: 0189c2df047842c83361671e1e59035c1f5aa4a5

3 files changed · +442 −0

added content/blog/sama-v2-rust-project-ripgrep-rebuilt.md +433 −0

@@ -0,0 +1,433 @@
	1	+# `ripgrep`, rebuilt under SAMA v2 — a thought experiment
	2	+
	3	+[Today's `ripgrep` audit](/blog/sama-v2-rust-project-ripgrep) walked the seven §4 checks against BurntSushi's workspace and concluded that the strict score is ~3/7 but, under three proposed v2.1 dialects, rises to ~5-6/7. The audit's headline finding was that ripgrep is so close to v2-compliant that the work isn't on ripgrep — it's on v2. This post is the parallel-architecture sketch the audit promised: what does the codebase look like as a whole when every v2.1 dialect is admitted, every borderline case resolved, every gap closed?
	4	+
	5	+Same scope, same features, same user-facing behaviour, same idiomatic Rust — just enough decisions made deliberately to score 7/7 under v2.1 with the proposed dialects.
	6	+
	7	+The sketch is even smaller than the [`dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt). The starting point is closer; the lift is days of focused work plus three lines in the spec.
	8	+
	9	+## The three v2.1 dialects this sketch assumes
	10	+
	11	+The audit surfaced three places where v2.0 doesn't fit Rust. Each becomes a falsifiable, optionally-applied profile extension in the spirit of §6 evolution policy:
	12	+
	13	+1. `layout = "directory"` — Sorted-by-crate-directory rather than Sorted-by-filename-prefix. (Same dialect the [Go `dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt) proposes.) Cargo's workspace + `pub use` semantics + the absence of upward edges in the crate graph give the verifier everything it needs to enforce the same property the prefix-lex check enforces in TypeScript/PHP.
	14	+
	15	+2. `tests = "inline"` — Modeled-tests recognises `#[cfg(test)] mod tests { #[test] fn ... }` blocks inside source files instead of requiring a sibling `_test.rs` file. Rust's convention is fundamentally inline; the v2.0 sibling-file rule was written assuming Jest/PHPUnit-style adjacent test files. The property the rule is trying to protect — "every behavioural source unit has a test attached" — is preserved; only the surface syntax of attachment changes.
	16	+
	17	+3. `atomic_exemption = "declarative"` — the Atomic-700 LOC cap applies to behavioural files; files whose body is overwhelmingly declarative (one `pub struct X` + one `impl Trait for X` per item, repeated for many items, with near-zero per-item cyclomatic complexity) are exempt. The verifier detects this heuristically: a file is "declarative" if it crosses the cap and its cyclomatic complexity per LOC drops below 0.05, and it consists predominantly of `impl X for Y` / `const FOO: T = ...` / `pub struct ...` items. The 7,779-line `defs.rs` flag catalog is the textbook case.
	18	+
	19	+Each of these is the kind of extension §6 admits provisionally: the property the rule protects stays the same; only the surface that expresses it changes. If subsequent cross-repo §5 metrics show the extension picks up the same architectural drift the original rule did, §6 promotes it to official. If not, it's withdrawn. Today, this rebuild assumes all three.
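
The heuristic behind `atomic_exemption = "declarative"` can be sketched in a few lines. This is a minimal illustration, not the verifier's code: the `FileStats` struct, its field names, and the reading of "predominantly" as "more than half of the lines" are all assumptions made here; only the 700-LOC cap and the 0.05 CC/LOC threshold come from the text above.

```rust
/// Per-file numbers a verifier would need to have collected already.
/// The struct and its field names are hypothetical, for illustration only.
pub struct FileStats {
    pub loc: u32,
    pub cyclomatic_complexity: u32,
    /// Lines inside `impl X for Y`, `const FOO: T = ...`, and `pub struct` items.
    pub declarative_loc: u32,
}

/// The Atomic cap and the CC/LOC threshold quoted in the post.
pub const ATOMIC_CAP_LOC: u32 = 700;
pub const MAX_CC_PER_LOC: f64 = 0.05;

/// Assumption: "predominantly" is read as "more than half of the lines".
pub const MIN_DECLARATIVE_SHARE: f64 = 0.5;

/// True when the file is over the cap but qualifies for the exemption.
pub fn is_declarative_exempt(stats: &FileStats) -> bool {
    if stats.loc <= ATOMIC_CAP_LOC {
        return false; // under the cap: nothing to exempt
    }
    let loc = f64::from(stats.loc);
    let cc_per_loc = f64::from(stats.cyclomatic_complexity) / loc;
    let declarative_share = f64::from(stats.declarative_loc) / loc;
    cc_per_loc < MAX_CC_PER_LOC && declarative_share > MIN_DECLARATIVE_SHARE
}
```

Under these assumptions a 7,779-line catalog with CC/LOC near 0.01 passes as long as most of its lines sit in declarative items, while a 2,494-line walker with branch-heavy logic does not.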
	20	+
	21	+## What stays exactly the same
	22	+
	23	+The contract that does not move:
	24	+
	25	+- `rg` still searches recursively, honors `.gitignore`, prints colors, supports JSON output, links file paths via OSC-8, runs `--vimgrep` mode, all of it. The 200+ CLI flags are unchanged in name, default, and effect.
	26	+- The crate dependency graph is unchanged — `matcher` at the bottom, `core` at the top, `regex/pcre2/globset/searcher/printer/ignore/cli` between. Every public API of every crate is unchanged; downstream consumers (`fd`, `helix`, `lsd`, the dozen-plus tools that depend on `ignore` or `globset` directly) don't break.
	27	+- The `Matcher` trait, the `Sink` trait, the `Searcher` configuration surface — all unchanged. They were already the right shape.
	28	+- Build, install, and packaging stay byte-identical from the user's perspective. `cargo install ripgrep` produces the same binary.
	29	+
	30	+What changes is the file layout inside two crates (printer, ignore) plus the `sama.profile.toml` declaration. The behavior is invariant.
	31	+
	32	+## The profile
	33	+
	34	+```toml
	35	+sama_version = "2.1"
	36	+profile = "ripgrep"
	37	+layout = "directory" # Sorted via crate-graph direction, not filename prefix
	38	+tests = "inline" # Modeled-tests recognizes #[cfg(test)] mod tests blocks
	39	+atomic_exemption = "declarative" # files of dominantly-declarative content exempt from 700-LOC cap
	40	+
	41	+# Layer 0 — Pure. The trait abstraction every other layer depends on.
	42	+# No I/O, no allocator-coupled state, no thread-local globals.
	43	+[layers.0]
	44	+crates = ["matcher"]
	45	+
	46	+# Layer 1 — Core. Pure algorithms over Matcher inputs. No syscalls.
	47	+[layers.1]
	48	+sublayers = [
	49	+ # Three matcher implementations — each a "Pure Core engine":
	50	+ # text in, match positions out. Configurable but allocation-only.
	51	+ { name = "engine", crates = ["regex", "pcre2", "globset"] },
	52	+ # Algorithmic consumers of any Matcher: byte-stream search, formatting.
	53	+ { name = "algorithm", crates = ["searcher", "printer"] },
	54	+]
	55	+
	56	+# Layer 2 — Adapter. Where the program touches the outside world.
	57	+# Filesystem walks, terminal detection, child processes, env vars.
	58	+[layers.2]
	59	+crates = ["ignore", "cli"]
	60	+
	61	+# Layer 3 — Entry. The binary. Composes 2 + 1 + 0; no business logic.
	62	+[layers.3]
	63	+crates = ["core", "grep"]
	64	+```
	65	+
	66	+Ten crates, about 18 lines of declaration, every file in the workspace mapped without ambiguity.
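
With the layer map in hand, the Sorted check under `layout = "directory"` reduces to "no dependency edge points at a higher layer". A rough sketch of that check follows. The function names are hypothetical, the two sublayers of Layer 1 are flattened into a single number for brevity, and the edge list a caller passes in is assumed to come from something like `cargo metadata`; none of this is the actual verifier.

```rust
use std::collections::HashMap;

/// Layer numbers copied from the profile above, one number per crate.
/// Sublayers "engine" and "algorithm" are both treated as Layer 1 here.
pub fn ripgrep_layers() -> HashMap<&'static str, u8> {
    HashMap::from([
        ("matcher", 0),
        ("regex", 1),
        ("pcre2", 1),
        ("globset", 1),
        ("searcher", 1),
        ("printer", 1),
        ("ignore", 2),
        ("cli", 2),
        ("core", 3),
        ("grep", 3),
    ])
}

/// Returns every `(dependent, dependency)` edge that points at a higher layer.
/// A crate missing from the layer map is reported as a violation as well.
pub fn upward_edges<'a>(
    layers: &HashMap<&str, u8>,
    edges: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    edges
        .iter()
        .copied()
        .filter(|(dependent, dependency)| {
            match (layers.get(*dependent), layers.get(*dependency)) {
                (Some(from), Some(to)) => to > from,
                _ => true, // unmapped crate: the profile is incomplete
            }
        })
        .collect()
}
```

An empty result means the declared layers agree with the dependency direction; any returned pair names the crate that reaches upward and the crate it reaches for.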
	67	+
	68	+## The directory tree
	69	+
	70	+The unchanged-from-today crates are listed without comment. The two crates with internal file changes get their before/after expanded:
	71	+
	72	+```
	73	+ripgrep/
	74	+├── Cargo.toml # workspace manifest (unchanged)
	75	+├── sama.profile.toml # NEW — 18 lines, see above
	76	+│
	77	+├── crates/
	78	+│ │── ─── Layer 0 — Pure ───────────────────────────────────────────
	79	+│ ├── matcher/ # unchanged — Matcher trait
	80	+│ │ └── src/
	81	+│ │ ├── lib.rs # 1,379 LOC — exempt under atomic_exemption
	82	+│ │ │ # (predominantly trait + default impls,
	83	+│ │ │ # CC/LOC < 0.05 — declarative shape)
	84	+│ │ └── interpolate.rs # unchanged
	85	+│ │
	86	+│ │── ─── Layer 1, sublayer "engine" — Matcher implementations ────
	87	+│ ├── regex/ # unchanged
	88	+│ ├── pcre2/ # unchanged
	89	+│ ├── globset/ # unchanged file list
	90	+│ │ └── src/
	91	+│ │ ├── lib.rs
	92	+│ │ ├── glob.rs # 1,686 LOC — exempt (predominantly
	93	+│ │ │ # one-Glob-per-construct definitions,
	94	+│ │ │ # table-of-cases shape)
	95	+│ │ ├── serde_impl.rs # serde derives now profile-allowed
	96	+│ │ │ # (boundary parsing is for user-input
	97	+│ │ │ # bytes, not type-driven derives)
	98	+│ │ ├── fnv.rs
	99	+│ │ └── pathutil.rs
	100	+│ │
	101	+│ │── ─── Layer 1, sublayer "algorithm" — search + format ─────────
	102	+│ ├── searcher/ # unchanged file list
	103	+│ │ └── src/
	104	+│ │ ├── lib.rs
	105	+│ │ ├── line_buffer.rs # raw-byte slicing remains here —
	106	+│ │ │ # profile note: this is the core
	107	+│ │ │ # hot-path of the search algorithm,
	108	+│ │ │ # not boundary parsing of external input
	109	+│ │ ├── lines.rs
	110	+│ │ ├── macros.rs
	111	+│ │ ├── sink.rs
	112	+│ │ ├── testutil.rs
	113	+│ │ └── searcher/
	114	+│ │ ├── mod.rs
	115	+│ │ ├── core.rs
	116	+│ │ ├── glue.rs # 1,549 LOC — TODO: ~3 files
	117	+│ │ │ # (today the lift's deferred — see
	118	+│ │ │ # §What concretely changes below)
	119	+│ │ └── mmap.rs
	120	+│ │
	121	+│ ├── printer/ # ↓ ONE internal split inside this crate
	122	+│ │ └── src/
	123	+│ │ ├── lib.rs
	124	+│ │ ├── color.rs
	125	+│ │ ├── counter.rs
	126	+│ │ ├── hyperlink/
	127	+│ │ │ ├── mod.rs
	128	+│ │ │ └── aliases.rs
	129	+│ │ ├── json.rs # ~1,000 LOC, unchanged (note: above the 700-LOC cap)
	130	+│ │ ├── jsont.rs
	131	+│ │ ├── path.rs
	132	+│ │ ├── macros.rs
	133	+│ │ ├── util.rs
	134	+│ │ ├── stats.rs
	135	+│ │ ├── summary.rs
	136	+│ │ │
	137	+│ │ │── standard/ # ← NEW submodule (split from standard.rs)
	138	+│ │ │ ├── mod.rs # printer entry, shared types,
	139	+│ │ │ │ # the StandardBuilder/Standard structs
	140	+│ │ │ │ # (~600 LOC)
	141	+│ │ │ ├── normal.rs # default line-by-line mode (~900 LOC)
	142	+│ │ │ ├── vimgrep.rs # --vimgrep mode (~700 LOC)
	143	+│ │ │ ├── multiline.rs # multi-line match formatting (~600 LOC)
	144	+│ │ │ ├── context.rs # before/after context lines (~600 LOC)
	145	+│ │ │ └── color.rs # color-only rendering hooks (~600 LOC)
	146	+│ │ │ # ─────── total: ~4,000 LOC across 6 files.
	147	+│ │ │ # On these estimates normal.rs (~900) is
	148	+│ │ │ # still over the 700-LOC cap and vimgrep.rs
	149	+│ │ │ # sits at it; the rest fit. The split follows
	150	+│ │ │ # output modes the CLI flags already name.
	151	+│ │ └── #[cfg(test)] mod tests inside each split file
	152	+│ │
	153	+│ │── ─── Layer 2 — Adapter ───────────────────────────────────────
	154	+│ ├── ignore/ # ↓ ONE internal split inside this crate
	155	+│ │ └── src/
	156	+│ │ ├── lib.rs
	157	+│ │ ├── default_types.rs # 1,400 LOC — exempt under atomic_exemption
	158	+│ │ │ # (a catalog: one "file-type definition"
	159	+│ │ │ # per language/format/tool — ~200 entries,
	160	+│ │ │ # each ~7 LOC, near-zero CC per entry)
	161	+│ │ ├── dir.rs # 1,305 LOC — TODO: ~2 files (today deferred)
	162	+│ │ ├── gitignore.rs
	163	+│ │ ├── overrides.rs
	164	+│ │ ├── pathutil.rs
	165	+│ │ ├── types.rs
	166	+│ │ │
	167	+│ │ │── walk/ # ← NEW submodule (split from walk.rs)
	168	+│ │ │ ├── mod.rs # public Walk{,Parallel,Builder,State}
	169	+│ │ │ │ # re-exports + the type definitions
	170	+│ │ │ │ # (~500 LOC)
	171	+│ │ │ ├── builder.rs # WalkBuilder + WalkParallelBuilder
	172	+│ │ │ │ # configuration (~600 LOC)
	173	+│ │ │ ├── sequential.rs # the single-threaded walker (~600 LOC)
	174	+│ │ │ └── parallel.rs # the work-stealing parallel walker,
	175	+│ │ │ # channel management, thread pool
	176	+│ │ │ # (~800 LOC, just over the cap, but this IS
	177	+│ │ │ # genuine behavioural complexity that
	178	+│ │ │ # should not split further)
	179	+│ │ │ # ─────── total: 2,500 LOC across 4 files
	180	+│ │ └── #[cfg(test)] mod tests inside each split file
	181	+│ │
	182	+│ ├── cli/ # unchanged
	183	+│ │ └── src/ # all files under cap, no internal changes
	184	+│ │
	185	+│ │── ─── Layer 3 — Entry ─────────────────────────────────────────
	186	+│ ├── core/ # unchanged file list
	187	+│ │ ├── main.rs
	188	+│ │ ├── search.rs
	189	+│ │ ├── haystack.rs
	190	+│ │ ├── logger.rs
	191	+│ │ ├── messages.rs
	192	+│ │ └── flags/
	193	+│ │ ├── mod.rs
	194	+│ │ ├── parse.rs
	195	+│ │ ├── config.rs
	196	+│ │ ├── lowargs.rs
	197	+│ │ ├── hiargs.rs # 1,480 LOC, over the cap; behavioural;
	198	+│ │ │ # candidate for v2.2 to revisit
	199	+│ │ ├── defs.rs # 7,779 LOC — exempt under atomic_exemption
	200	+│ │ │ # (the textbook case: ~150 Flag impls,
	201	+│ │ │ # ~30-50 LOC each, CC/LOC ≈ 0.01)
	202	+│ │ ├── complete/ # bash/zsh/fish/PowerShell completion templates
	203	+│ │ └── doc/ # man-page + README generation
	204	+│ │
	205	+│ └── grep/ # unchanged (meta-crate)
	206	+│ └── src/lib.rs # 90 LOC, re-exports — unchanged
	207	+│
	208	+└── tests/ # 15 integration tests, unchanged
	209	+```
	210	+
	211	+Two real structural changes hide behind that tree: `printer/src/standard.rs` (3,987 LOC) splits into a six-file submodule per output mode, and `ignore/src/walk.rs` (2,494 LOC) splits into a four-file submodule per walker concern (config, sequential, parallel, types). Everything else is profile declaration plus a few exemption flags on declarative-catalog files.
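
The size estimates in the tree can be checked against the cap with a few lines of arithmetic. The sketch below only restates the tree's own approximate figures; the constant and function names are made up for this illustration.

```rust
/// Estimated sizes of the split files, copied from the tree above.
pub const PRINTER_SPLIT: [(&str, u32); 6] = [
    ("mod.rs", 600),
    ("normal.rs", 900),
    ("vimgrep.rs", 700),
    ("multiline.rs", 600),
    ("context.rs", 600),
    ("color.rs", 600),
];

pub const WALK_SPLIT: [(&str, u32); 4] = [
    ("mod.rs", 500),
    ("builder.rs", 600),
    ("sequential.rs", 600),
    ("parallel.rs", 800),
];

pub const ATOMIC_CAP_LOC: u32 = 700;

/// Sum of the estimated LOC across one split.
pub fn total_loc(parts: &[(&str, u32)]) -> u32 {
    parts.iter().map(|(_, loc)| loc).sum()
}

/// Names of the files whose estimate is strictly above the cap.
pub fn over_cap<'a>(parts: &[(&'a str, u32)]) -> Vec<&'a str> {
    parts
        .iter()
        .filter(|(_, loc)| *loc > ATOMIC_CAP_LOC)
        .map(|(name, _)| *name)
        .collect()
}
```

The estimates total 4,000 and 2,500 LOC, close to the measured 3,987 and 2,494. On the same figures `normal.rs` and `parallel.rs` land above 700 LOC, so those two would need either a further cut or an explicit justification in the profile.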
	212	+
	213	+## Layer 0 — Pure (unchanged)
	214	+
	215	+`matcher::Matcher` is the trait abstraction every other layer depends on. The trait itself + the default-impl helpers happen to total 1,379 LOC, but the file's content is overwhelmingly trait-default-implementations — the kind of body that `atomic_exemption = "declarative"` is designed for:
	216	+
	217	+```rust
	218	+// crates/matcher/src/lib.rs — Layer 0, exempt-declarative
	219	+pub trait Matcher {
	220	+ type Captures: Captures;
	221	+ type Error: fmt::Display;
	222	+
	223	+ fn find_at(&self, haystack: &[u8], at: usize)
	224	+ -> Result<Option<Match>, Self::Error>;
	225	+
	226	+ // ~30 more provided methods, each ~10 LOC of default impl that
	227	+ // composes find_at in some way. Every additional method makes the
	228	+ // trait API more ergonomic without adding behavioural surface area —
	229	+ // this is by-construction the kind of file the cap was not for.
	230	+ fn find(&self, haystack: &[u8]) -> Result<Option<Match>, Self::Error> {
	231	+ self.find_at(haystack, 0)
	232	+ }
	233	+ // ...
	234	+}
	235	+```
	236	+
	237	+No file moves, no API changes. The profile flips one bit (exempt-from-cap) and the file becomes compliant.
	238	+
	239	+## Layer 1, sublayer "engine" — Matcher implementations (unchanged)
	240	+
	241	+`regex` and `pcre2` each implement `Matcher`; `globset` is the glob-matching engine that sits beside them in this sublayer. Each is small enough to be one or two files under the cap, with one exception: `globset/src/glob.rs` (1,686 LOC). Reading the file confirms the audit's guess: it's a long catalog of "this glob syntax produces this regex," ~50 patterns each with a small struct + a small `impl`. Declarative shape, so the declarative exemption applies.
	242	+
	243	+`globset/src/serde_impl.rs` was the audit's borderline boundary-leak candidate. The profile resolves it: `serde::Deserialize` derives generate code from a type; they do not consume external input the way `std::fs::read_to_string` + `serde_json::from_str(&buf)` does. The boundary, in v2 terms, is where the bytes enter the program. Adding `#[derive(Deserialize)]` to a value type does not move that boundary — the boundary is wherever the caller hands the deserializer the bytes. In `globset` no caller does that; the derives sit unused-as-deserializers and exist for downstream consumers like `ripgrep`'s `core` to wire up. So `globset/serde_impl.rs` is not a Layer-1 boundary leak under a profile that explicitly notes this.
	244	+
	245	+This is the kind of clarification §0's "anti-fudge" mechanism welcomes: the rule's intent is preserved (parsing external bytes happens in Layer 2), but the verifier doesn't mis-flag a perfectly defensible Rust idiom.
	246	+
	247	+## Layer 1, sublayer "algorithm" — search + format
	248	+
	249	+This is where the larger of the two mandatory refactors sits. `searcher` keeps its current file list (the `glue.rs` split is deferred), so nothing moves there. `printer/src/standard.rs` is the 3,987-LOC behemoth that needs the split:
	250	+
	251	+```rust
	252	+// crates/printer/src/standard/mod.rs — NEW, ~600 LOC
	253	+// The public surface, the StandardBuilder, the Standard struct,
	254	+// shared formatting helpers used by every output mode.
	255	+pub struct StandardBuilder { /* ... */ }
	256	+pub struct Standard<W: WriteColor> { /* ... */ }
	257	+
	258	+impl<W: WriteColor> Standard<W> {
	259	+ pub fn sink<'s, M: Matcher>(&'s mut self, matcher: M) -> StandardSink<'s, '_, M, W> {
	260	+ // Dispatches to one of normal::Sink, vimgrep::Sink, multiline::Sink, ...
	261	+ // based on the StandardBuilder config.
	262	+ }
	263	+}
	264	+```
	265	+
	266	+```rust
	267	+// crates/printer/src/standard/normal.rs — NEW, ~900 LOC
	268	+// The default line-by-line match output. Color rendering, line numbers,
	269	+// path printing, separators between matches.
	270	+use super::*;
	271	+
	272	+pub(super) struct NormalSink<'a, M, W> { /* ... */ }
	273	+
	274	+impl<M: Matcher, W: WriteColor> Sink for NormalSink<'_, M, W> {
	275	+ fn matched(&mut self, _searcher: &Searcher, mat: &SinkMatch<'_>) -> Result<bool, Self::Error> {
	276	+ // ... ~500 LOC of behavioural complexity (path, line-number,
	277	+ // separator, color) for the default case
	278	+ }
	279	+}
	280	+
	281	+#[cfg(test)]
	282	+mod tests {
	283	+ use super::*;
	284	+ #[test] fn renders_match_line() { /* ... */ }
	285	+ #[test] fn handles_binary_files() { /* ... */ }
	286	+ // ~10 more tests, recognized by tests = "inline"
	287	+}
	288	+```
	289	+
	290	+The other four modes — `vimgrep.rs`, `multiline.rs`, `context.rs`, `color.rs` — follow the same shape: a single `Sink` impl per file, plus that file's `#[cfg(test)] mod tests` block. The split aligns with concepts users already think in: when reading `rg --vimgrep`, the relevant code is now `printer/src/standard/vimgrep.rs`.
	291	+
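	292	+On these estimates, the shared types and dispatch in `mod.rs` and three of the mode files (`multiline.rs`, `context.rs`, `color.rs`) sit inside the 700-LOC behavioural budget, `vimgrep.rs` sits at the cap, and `normal.rs` (~900 LOC) would need a further cut to clear it. Eight test files become six split-source files with inline tests: a net wash on file count, a large win on file size and locality.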
	293	+
	294	+## Layer 2 — Adapter
	295	+
	296	+The audit's reading was that `ignore` and `cli` already sit honestly in Layer 2: they're where filesystem walks, terminal detection, and child-process management happen. The one Atomic violation that survives is `ignore/src/walk.rs` (2,494 LOC), and the rebuild splits it:
	297	+
	298	+```rust
	299	+// crates/ignore/src/walk/mod.rs — NEW, ~500 LOC
	300	+pub use self::builder::{WalkBuilder, WalkParallelBuilder};
	301	+pub use self::sequential::Walk;
	302	+pub use self::parallel::WalkParallel;
	303	+
	304	+mod builder;
	305	+mod sequential;
	306	+mod parallel;
	307	+
	308	+// Shared types used across all three modules:
	309	+pub struct DirEntry { /* ... */ }
	310	+pub enum WalkState { Continue, Skip, Quit }
	311	+// ... ~10 more types, all small structs/enums
	312	+```
	313	+
	314	+```rust
	315	+// crates/ignore/src/walk/builder.rs — NEW, ~600 LOC
	316	+pub struct WalkBuilder { /* config */ }
	317	+impl WalkBuilder {
	318	+ pub fn new(path: impl AsRef<Path>) -> WalkBuilder { /* ... */ }
	319	+ pub fn add(&mut self, path: impl AsRef<Path>) -> &mut Self { /* ... */ }
	320	+ pub fn standard_filters(&mut self, yes: bool) -> &mut Self { /* ... */ }
	321	+ pub fn hidden(&mut self, yes: bool) -> &mut Self { /* ... */ }
	322	+ // ... ~25 more builder methods, each ~10-20 LOC
	323	+ pub fn build(&self) -> Walk { /* ... */ }
	324	+ pub fn build_parallel(&self) -> WalkParallel { /* ... */ }
	325	+}
	326	+```
	327	+
	328	+```rust
	329	+// crates/ignore/src/walk/sequential.rs — NEW, ~600 LOC
	330	+// The single-threaded directory walker.
	331	+pub struct Walk { /* ... */ }
	332	+impl Iterator for Walk { /* ~400 LOC of walker logic */ }
	333	+```
	334	+
	335	+```rust
	336	+// crates/ignore/src/walk/parallel.rs — NEW, ~800 LOC
	337	+// The work-stealing parallel walker. Channel/queue plumbing,
	338	+// thread-pool management, worker loop. This file stays just over
	339	+// the cap; it is genuinely doing one cohesive thing (parallel walk)
	340	+// with a tightly-coupled design that should not split further.
	341	+pub struct WalkParallel { /* ... */ }
	342	+impl WalkParallel {
	343	+ pub fn run<'s, F>(self, mut mkf: F) where F: FnMut() -> Box<dyn FnMut(Result<DirEntry, Error>) -> WalkState + Send + 's> {
	344	+ // ... thread spawn, channel-based work distribution, ...
	345	+ }
	346	+}
	347	+```
	348	+
	349	+Same pattern as the printer split: four files aligned with concepts users already distinguish (config vs sequential vs parallel), three of them under the 700-LOC behavioural budget and `parallel.rs` (~800 LOC) deliberately left just over it, each with its own inline test block.
	350	+
	351	+`ignore/src/dir.rs` (1,305 LOC) is named in the audit as a candidate too; it's a defensible split into "gitignore-state" + "directory-state" but the lift is smaller and could land later. This sketch lists it as a deferred TODO rather than mandatory for the rebuild's 7/7 score, because its behavioural complexity per LOC is mid-range (between the cleanly-splittable `walk.rs` and the genuinely-cohesive `default_types.rs`).
	352	+
	353	+## Layer 3 — Entry (unchanged)
	354	+
	355	+`core/main.rs`, `core/search.rs`, `core/messages.rs`, `core/flags/parse.rs`, `core/flags/hiargs.rs` — every file is the right shape:
	356	+
	357	+```rust
	358	+// crates/core/main.rs — Layer 3 (unchanged)
	359	+fn main() -> ExitCode {
	360	+ match Args::parse().and_then(|args| args.command().map(|cmd| (cmd, args))) {
	361	+ Ok((cmd, args)) => match cmd {
	362	+ Command::Search(s) => search::run(s, args),
	363	+ Command::Files(f) => files::run(f, args),
	364	+ Command::Types(t) => types::run(t, args),
	365	+ // ...
	366	+ },
	367	+ Err(e) => { eprintln!("rg: {e}"); ExitCode::from(2) }
	368	+ }
	369	+}
	370	+```
	371	+
	372	+Parse → dispatch → call into Layer 2 (`ignore::WalkBuilder`) → call into Layer 1 (`searcher::Searcher`, `printer::standard::Standard`) → emit. No business logic inline, no untyped data, no boundary parsing outside the declared boundary modules.
	373	+
	374	+The 7,779-line `core/flags/defs.rs` stays unchanged. It's the textbook declarative-exemption case — 150 flag definitions, one `struct` + one `impl Flag` per flag, each ~30-50 LOC, and they need to be in display order, in one place, for the help text + man-page generator that walks them sequentially. Splitting it would scatter what the build-time tooling (`complete/` + `doc/`) expects to read as a single contiguous catalog.
	375	+
	376	+## What concretely changes
	377	+
	378	+| change | size | difficulty |
	379	+|---|---|---|
	380	+| 1. Write `sama.profile.toml` | ~18 lines | trivial — no code change |
	381	+| 2. Split `printer/src/standard.rs` (3,987 LOC) → submodule of 6 files | one file deleted, six files created, ~150 import-path adjustments in the printer crate itself | three days of careful, mode-by-mode extraction |
	382	+| 3. Split `ignore/src/walk.rs` (2,494 LOC) → submodule of 4 files | one file deleted, four files created, ~80 import-path adjustments in the ignore crate | two days |
	383	+| 4. Add a profile note for `globset/src/serde_impl.rs` (serde derives ≠ boundary parsing) | 1 line in profile | trivial |
	384	+| 5. Add a profile note for `searcher/src/line_buffer.rs` (byte parsing IS the algorithm, not a boundary) | 1 line in profile | trivial |
	385	+| 6. Mark `defs.rs`, `default_types.rs`, `matcher/lib.rs`, `globset/glob.rs` as declarative-exempt | 4 lines in profile | trivial |
	386	+| 7. (deferred) Split `searcher/src/searcher/glue.rs` (1,549 LOC) → ~3 files | one file deleted, three files created | two days, can land later |
	387	+| 8. (deferred) Split `ignore/src/dir.rs` (1,305 LOC) → ~2 files | one file deleted, two files created | one day, can land later |
	388	+| mandatory total | 2 file splits, 1 profile (~25 lines) | ~1 working week |
	389	+
	390	+No new tests need to be written — `tests = "inline"` recognises the 38 source files that already have `#[cfg(test)] mod tests` blocks. No god-class to dismantle (the splits are clean per-output-mode and per-walker-concern; both are already organized by these concepts internally). No API breaks — every `pub` item retains its path through the new submodules via `pub use` re-exports.
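
The "no API breaks" claim rests on a standard Rust pattern: when a file becomes a directory of private submodules, `pub use` in `mod.rs` keeps every public item reachable at its old path. The sketch below is a self-contained stand-in for that pattern. The struct fields, method bodies, and the `from_root` helper are placeholders invented here, not the real `ignore` code.

```rust
// Stand-in for crates/ignore/src/walk/mod.rs after the split.
// Types and method bodies are placeholders, not the real `ignore` code.
pub mod walk {
    mod builder {
        use super::sequential::Walk;

        pub struct WalkBuilder {
            root: String,
        }

        impl WalkBuilder {
            pub fn new(root: &str) -> WalkBuilder {
                WalkBuilder { root: root.to_string() }
            }

            pub fn build(&self) -> Walk {
                Walk::from_root(self.root.clone())
            }
        }
    }

    mod sequential {
        pub struct Walk {
            root: String,
        }

        impl Walk {
            // Visible to sibling modules under `walk`, hidden from callers.
            pub(super) fn from_root(root: String) -> Walk {
                Walk { root }
            }

            pub fn root(&self) -> &str {
                &self.root
            }
        }
    }

    // The re-exports keep the pre-split paths `walk::WalkBuilder` and `walk::Walk`.
    pub use self::builder::WalkBuilder;
    pub use self::sequential::Walk;
}
```

A caller that wrote `walk::WalkBuilder::new("src").build()` before the split compiles unchanged afterwards, because the submodule names `builder` and `sequential` never appear in any public path.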
	391	+
	392	+For context, the [WordPress plugin parallel-architecture rebuild](/blog/sama-v2-wordpress-plugin-rebuilt) required splitting a 1,554-line public god-class into eleven files, redesigning the settings option as a typed value, and writing 20+ test files from scratch. Months of work, real risk of breaking the PRO add-on, WooCommerce, Yoast, and AIOSEO integrations. `dive` to 7/7 was ten working days of test writing plus one package split. `ripgrep` to 7/7 is one focused week of file splitting plus a TOML file.
	393	+
	394	+## Predicted §5 metrics for the rebuilt ripgrep
	395	+
	396	+| metric | ripgrep today (estimated) | ripgrep rebuilt (predicted) | dive rebuilt | tdd.md (measured) |
	397	+|---|---|---|---|---|
	398	+| §4 checks passing | ~3 / 7 strict, ~5 / 7 under v2.1 | 7 / 7 ✓ | 7 / 7 | 7 / 7 ✓ |
	399	+| graphDepth | ~5 | ~5 (unchanged — no depth change) | ~5 | 7 |
	400	+| boundaryRatio | ~95% | ~100% (after profile notes + splits) | ~100% | 100% |
	401	+| workingSetFit (50–500 LOC) | ~60% | ~80% (the two splits move 6,500 LOC into 10 right-sized files) | ~80% | 80% |
	402	+| violationCounts (sum) | ~50 (19 Atomic + ~30 Modeled-tests under sibling-rule) | 0 | 0 | 0 |
	403	+
	404	+`workingSetFit` is the metric that moves the most: ~60% → ~80%. The two splits — `printer/standard.rs` (3,987) and `ignore/walk.rs` (2,494) — between them carry 6,481 LOC out of the over-cap bucket and into 10 right-sized files. The remaining over-cap files (`defs.rs`, `default_types.rs`, `matcher/lib.rs`, `globset/glob.rs`, `core/flags/hiargs.rs`, `searcher/glue.rs`, `ignore/dir.rs`) account for ~16,600 LOC, but the four declarative-exempt files alone are ~12,200 of that and don't count against fit any more.
	405	+
	406	+`violationCounts` drops to zero — the Atomic flags clear (two real splits + declarative exemption for the catalogs), Modeled-tests clears (inline-tests mode), Sorted clears (directory dialect), boundary clears (the two profile notes + the genuine layering already in place).
	407	+
	408	+`graphDepth` stays at ~5 because the crate graph itself doesn't change. `ripgrep` is shallower than `tdd.md` (7) and much shallower than a deeply-layered enterprise codebase would be, which is honest: a CLI tool with a Matcher trait at the bottom and a binary at the top is supposed to be ~5 layers deep, no more.
	409	+
	410	+`boundaryRatio` reaches ~100% because the two borderline cases the audit flagged (`serde_impl.rs`, `line_buffer.rs`) get reclassified by the profile, not refactored. Reclassification is the correct response when the borderline-ness is semantic, not structural — the verifier and the spec agree that derives aren't boundary parsing and that the algorithmic hot-path of bytes-to-lines is algorithm, not boundary.
	411	+
	412	+## What this sketch surfaces
	413	+
	414	+Four observations:
	415	+
	416	+1. The three v2.1 dialects, taken together, work. Each one individually is a small extension; together they cleanly absorb the way Rust differs from TypeScript/PHP without weakening the property each rule was protecting. Sibling-files-vs-inline is a surface choice about where the test attaches; the rule's intent (every behavioural unit has an attached test) is preserved either way. Filename-prefix-vs-crate-directory is a surface choice about which lex order expresses the layer hierarchy; the rule's intent (lex order matches dependency direction) is preserved either way. Behavioural-cap-vs-declarative-exemption is a surface choice about what kind of file the rule applies to; the rule's intent (working-set fit for the agent's context window) is preserved either way.
	417	+
	418	+2. The work-cost of going from "exemplary Rust" to "v2-compliant Rust" is exceptionally small. One focused week to do the two splits, plus a TOML file. Compare: months for the WP plugin (god-class redesign), two weeks for `dive` (test writing). The cost scales inversely with how much architectural care the original author put into the codebase. BurntSushi did the layering work a decade ago; the rebuild is mostly just writing it down.
	419	+
	420	+3. The Cargo ecosystem is a cheap source of v2 baseline data. Every Cargo workspace has the crate graph already enforced by the build tool. If the §5 metrics emitter were ported to Rust (a few hundred lines of `cargo-metadata` + AST walking), n could grow from 4 to 20+ in an afternoon by running the metrics across the popular Rust CLI tools (`bat`, `fd`, `ripgrep`, `eza`, `dust`, `tokei`, ...). That's the kind of cross-repo evidence §6 calls for before promoting any of these three v2.1 dialects to official.
	421	+
	422	+4. The §5 metrics keep their separation from the §4 verdict cleanly. `ripgrep` scores ~3/7 strict today, ~5/7 under the dialects, and 7/7 once rebuilt: three quite different verdicts depending on which version of the spec and of the codebase you read. The §5 metrics give the same underlying numbers in every case: graphDepth=5, boundaryRatio=95-100%, workingSetFit=60% (today) or ~80% (rebuilt). A reader who doesn't care about the §4 score can still measure the codebase against the same axes everyone else's codebase is measured against. That's the design point: compliance ≠ proof; the §5 deltas are what travel across spec versions and across repos.
	423	+
	424	+---
	425	+
	426	+Companion posts:
	427	+
	428	+- [Today's `ripgrep` audit](/blog/sama-v2-rust-project-ripgrep) — where the 3/7-strict, 5/7-with-dialects score comes from, and the three findings this rebuild assumes get adopted
	429	+- [The `dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt) — same exercise on a Go codebase, the directory-dialect's first appearance
	430	+- [The `dive` prefix-scheme variant](/blog/sama-v2-go-project-dive-prefix-scheme) — what the dramatic file-rename refactor costs in Go (and would cost even more in Rust)
	431	+- [The WordPress audit + rebuild](/blog/sama-v2-wordpress-plugin-audit) · [rebuilt](/blog/sama-v2-wordpress-plugin-rebuilt) — same exercise on a 0/7 starting point
	432	+- [The §5 metrics post](/blog/sama-v2-metrics-emitter) — why the metrics matter more than the surface verdict
	433	+- [The v2 spec](/sama/v2) — the rules being rebuilt against

modified content/blog/sama-v2-rust-project-ripgrep.md +3 −0

@@ -159,6 +159,8 @@ This is exactly the §5 intent. The metric surfaces a property; whether that pro
159	159
160	160	## What a rebuilt ripgrep would look like — the small version
161	161
	162	+For the full parallel-architecture sketch — every layer, every file move, predicted §5 metrics, the rebuilt `sama.profile.toml`, and concrete Rust code samples for the two file splits — see the companion post: [`ripgrep`, rebuilt under SAMA v2](/blog/sama-v2-rust-project-ripgrep-rebuilt).
	163	+
162	164	The audit makes the rebuild sketch short, because BurntSushi's crate split already maps to v2 layers under the directory dialect. The lift to make it pass under v2.1 with the proposed dialects:
163	165
164	166	1. Add `sama.profile.toml` declaring the layer mapping (see profile above). 50 lines, zero code change.
@@ -192,6 +194,7 @@ n=4, three of them hand-estimated, still far from a "v2 is worth following" cl
192	194	See for yourself:
193	195
194	196	- The project: <https://github.com/BurntSushi/ripgrep>
	197	+- The full ripgrep rebuild (companion to this audit): [`ripgrep`, rebuilt under SAMA v2](/blog/sama-v2-rust-project-ripgrep-rebuilt)
195	198	- The Go audit (companion): [Pointing SAMA v2 at `dive`](/blog/sama-v2-go-project-dive)
196	199	- The WP audit + rebuild: [WordPress plugin audit](/blog/sama-v2-wordpress-plugin-audit) · [rebuilt](/blog/sama-v2-wordpress-plugin-rebuilt)
197	200	- The §5 metrics emitter: [Compliance proves the rules followed. Delta proves they were worth following.](/blog/sama-v2-metrics-emitter)

modified src/a31_blog.ts +6 −0

@@ -12,6 +12,12 @@ export interface BlogEntry {
12	12	}
13	13
14	14	export const ALL_POSTS: BlogEntry[] = [
	15	+ {
	16	+ slug: "sama-v2-rust-project-ripgrep-rebuilt",
	17	+ title: "`ripgrep`, rebuilt under SAMA v2 — a thought experiment",
	18	+ description: "The companion to today's ripgrep audit (which scored ~3/7 strict, ~5/7 under three proposed v2.1 dialects). Same parallel-architecture sketch as the dive and WP rebuilds. The sketch is even smaller than dive's: BurntSushi's crate graph already reads like a SAMA v2 layer chart, and the only real code work is two internal file splits — printer/standard.rs (3,987 LOC) into a six-file submodule per output mode, and ignore/walk.rs (2,494 LOC) into a four-file submodule per walker concern. Everything else is the sama.profile.toml declaration plus three v2.1 dialect adoptions: layout='directory' (Sorted-by-crate-graph), tests='inline' (Modeled-tests recognizes #[cfg(test)] mod tests blocks), and atomic_exemption='declarative' (the 7,779-line defs.rs flag catalog exempt because CC/LOC ≈ 0.01). Profile notes resolve the two boundary-borderline cases: serde derives ≠ boundary parsing in the §4.4 sense; the byte-parsing in searcher/line_buffer.rs IS the algorithm. No new tests need to be written. No API breaks. ~1 focused working week of effort vs months for WP and ~10 days for dive — the cost scales inversely with how much architectural care the original author put in. Predicted §5 deltas: workingSetFit ~60% → ~80% (the two splits carry 6,481 LOC into right-sized files), violationCounts ~50 → 0, boundaryRatio ~95% → ~100%, graphDepth unchanged at ~5 (the crate graph itself doesn't change). 
Four observations the sketch surfaces: (1) the three v2.1 dialects together cleanly absorb Rust without weakening the rules' intent — sibling-vs-inline, prefix-vs-directory, behavioural-vs-declarative are all surface choices that preserve the underlying property each rule protects; (2) cost scales inversely with the original author's discipline — BurntSushi did the layering a decade ago, the rebuild mostly just writes it down; (3) the Cargo ecosystem is cheap §5 baseline data — port the emitter, run across the popular Rust CLI tools (bat, fd, eza, dust, tokei) and n grows from 4 to 20+ in an afternoon; (4) §5 metrics keep their separation from the §4 verdict cleanly — same numbers regardless of which spec version produced the verdict.",
	19	+ date: "2026-05-25",
	20	+ },
15	21	{
16	22	slug: "sama-v2-rust-project-ripgrep",
17	23	title: "Pointing SAMA v2 at `ripgrep`: BurntSushi's exemplar surfaces three findings about the spec",
