syntaxai/tdd.md · commit 0189c2d

Blog: ripgrep rebuilt under SAMA v2 — companion sketch + audit cross-link

Adds the parallel-architecture sketch to the ripgrep audit pair: same
crate layout, every file mapped to a layer, two internal file splits
(printer/standard.rs → 6-file submodule per output mode; ignore/walk.rs
→ 4-file submodule per walker concern), one sama.profile.toml under the
three proposed v2.1 dialects (layout=directory, tests=inline,
atomic_exemption=declarative). Predicts §5 deltas vs the audit's
estimates and adds the link from the audit's rebuild-section so AI
agents reading the audit can navigate to the full sketch.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
author
syntaxai <[email protected]>
date
2026-05-24 09:55:52 +01:00
parent
1e94d49
commit
0189c2df047842c83361671e1e59035c1f5aa4a5

3 files changed · +442 −0

added content/blog/sama-v2-rust-project-ripgrep-rebuilt.md +433 −0
@@ -0,0 +1,433 @@
1+# `ripgrep`, rebuilt under SAMA v2 — a thought experiment
2+
3+[Today's `ripgrep` audit](/blog/sama-v2-rust-project-ripgrep) walked the seven §4 checks against BurntSushi's workspace and concluded that the strict score is ~3/7 but, under three proposed v2.1 dialects, rises to ~5-6/7. The audit's headline finding was that *ripgrep is so close to v2-compliant that the work isn't on ripgrep — it's on v2*. This post is the parallel-architecture sketch the audit promised: what does the codebase look like *as a whole* when every v2.1 dialect is admitted, every borderline case resolved, every gap closed?
4+
5+Same scope, same features, same user-facing behaviour, same idiomatic Rust — just enough decisions made deliberately to score 7/7 under v2.1 with the proposed dialects.
6+
7+The sketch is even smaller than the [`dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt). The starting point is closer; the lift is days of focused work plus three lines in the spec.
8+
9+## The three v2.1 dialects this sketch assumes
10+
11+The audit surfaced three places where v2.0 doesn't fit Rust. Each becomes a falsifiable, optionally-applied profile extension in the spirit of §6 evolution policy:
12+
13+1. **`layout = "directory"`** — Sorted-by-crate-directory rather than Sorted-by-filename-prefix. (Same dialect the [Go `dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt) proposes.) Cargo's workspace + `pub use` semantics + the absence of upward edges in the crate graph give the verifier everything it needs to enforce the same property the prefix-lex check enforces in TypeScript/PHP.
14+
15+2. **`tests = "inline"`** — Modeled-tests recognises `#[cfg(test)] mod tests { #[test] fn ... }` blocks inside source files instead of requiring a sibling `*_test.rs` file. Rust's convention is fundamentally inline; the v2.0 sibling-file rule was written assuming Jest/PHPUnit-style adjacent test files. The property the rule is *trying* to protect — "every behavioural source unit has a test attached" — is preserved; only the surface syntax of attachment changes.
16+
17+3. **`atomic_exemption = "declarative"`** — the Atomic-700 LOC cap applies to *behavioural* files; files whose body is overwhelmingly declarative (one `pub struct X` + one `impl Trait for X` per item, repeated for many items, with near-zero per-item cyclomatic complexity) are exempt. The verifier detects this heuristically: a file is "declarative" if it crosses the cap *and* its cyclomatic complexity per LOC drops below 0.05, *and* it consists predominantly of `impl X for Y` / `const FOO: T = ...` / `pub struct ...` items. The 7,779-line `defs.rs` flag catalog is the textbook case.
18+
19+Each of these is the kind of extension §6 admits provisionally: the property the rule protects stays the same; only the surface that expresses it changes. If subsequent cross-repo §5 metrics show the extension picks up the same architectural drift the original rule did, §6 promotes it to official. If not, it's withdrawn. Today, this rebuild assumes all three.
20+
21+## What stays exactly the same
22+
23+The contract that does not move:
24+
25+- `rg` still searches recursively, honors `.gitignore`, prints colors, supports JSON output, links file paths via OSC-8, runs `--vimgrep` mode, all of it. The 200+ CLI flags are unchanged in name, default, and effect.
26+- The crate dependency graph is unchanged — `matcher` at the bottom, `core` at the top, `regex/pcre2/globset/searcher/printer/ignore/cli` between. Every public API of every crate is unchanged; downstream consumers (`fd`, `helix`, `lsd`, the dozen-plus tools that depend on `ignore` or `globset` directly) don't break.
27+- The `Matcher` trait, the `Sink` trait, the `Searcher` configuration surface — all unchanged. They were already the right shape.
28+- Build, install, and packaging stay byte-identical from the user's perspective. `cargo install ripgrep` produces the same binary.
29+
30+What changes is *the file layout inside two crates* (printer, ignore) plus the `sama.profile.toml` declaration. The behavior is invariant.
31+
32+## The profile
33+
34+```toml
35+sama_version = "2.1"
36+profile = "ripgrep"
37+layout = "directory" # Sorted via crate-graph direction, not filename prefix
38+tests = "inline" # Modeled-tests recognizes #[cfg(test)] mod tests blocks
39+atomic_exemption = "declarative" # files of dominantly-declarative content exempt from 700-LOC cap
40+
41+# Layer 0 — Pure. The trait abstraction every other layer depends on.
42+# No I/O, no allocator-coupled state, no thread-local globals.
43+[layers.0]
44+crates = ["matcher"]
45+
46+# Layer 1 — Core. Pure algorithms over Matcher inputs. No syscalls.
47+[layers.1]
48+sublayers = [
49+ # Three matcher implementations — each a "Pure Core engine":
50+ # text in, match positions out. Configurable but allocation-only.
51+ { name = "engine", crates = ["regex", "pcre2", "globset"] },
52+ # Algorithmic consumers of any Matcher: byte-stream search, formatting.
53+ { name = "algorithm", crates = ["searcher", "printer"] },
54+]
55+
56+# Layer 2 — Adapter. Where the program touches the outside world.
57+# Filesystem walks, terminal detection, child processes, env vars.
58+[layers.2]
59+crates = ["ignore", "cli"]
60+
61+# Layer 3 — Entry. The binary. Composes 2 + 1 + 0; no business logic.
62+[layers.3]
63+crates = ["core", "grep"]
64+```
65+
66+Eleven crates, ten lines of declaration, every file in the workspace mapped without ambiguity.
67+
68+## The directory tree
69+
70+The unchanged-from-today crates are listed without comment. The two crates with internal file changes get their before/after expanded:
71+
72+```
73+ripgrep/
74+├── Cargo.toml # workspace manifest (unchanged)
75+├── sama.profile.toml # NEW — 18 lines, see above
76+│
77+├── crates/
78+│ │── ─── Layer 0 — Pure ───────────────────────────────────────────
79+│ ├── matcher/ # unchanged — Matcher trait
80+│ │ └── src/
81+│ │ ├── lib.rs # 1,379 LOC — exempt under atomic_exemption
82+│ │ │ # (predominantly trait + default impls,
83+│ │ │ # CC/LOC < 0.05 — declarative shape)
84+│ │ └── interpolate.rs # unchanged
85+│ │
86+│ │── ─── Layer 1, sublayer "engine" — Matcher implementations ────
87+│ ├── regex/ # unchanged
88+│ ├── pcre2/ # unchanged
89+│ ├── globset/ # unchanged file list
90+│ │ └── src/
91+│ │ ├── lib.rs
92+│ │ ├── glob.rs # 1,686 LOC — exempt (predominantly
93+│ │ │ # one-Glob-per-construct definitions,
94+│ │ │ # table-of-cases shape)
95+│ │ ├── serde_impl.rs # serde derives now profile-allowed
96+│ │ │ # (boundary parsing is for user-input
97+│ │ │ # bytes, not type-driven derives)
98+│ │ ├── fnv.rs
99+│ │ └── pathutil.rs
100+│ │
101+│ │── ─── Layer 1, sublayer "algorithm" — search + format ─────────
102+│ ├── searcher/ # unchanged file list
103+│ │ └── src/
104+│ │ ├── lib.rs
105+│ │ ├── line_buffer.rs # raw-byte slicing remains here —
106+│ │ │ # profile note: this is the core
107+│ │ │ # hot-path of the search algorithm,
108+│ │ │ # not boundary parsing of external input
109+│ │ ├── lines.rs
110+│ │ ├── macros.rs
111+│ │ ├── sink.rs
112+│ │ ├── testutil.rs
113+│ │ └── searcher/
114+│ │ ├── mod.rs
115+│ │ ├── core.rs
116+│ │ ├── glue.rs # 1,549 LOC — TODO: ~3 files
117+│ │ │ # (today the lift's deferred — see
118+│ │ │ # §What concretely changes below)
119+│ │ └── mmap.rs
120+│ │
121+│ ├── printer/ # ↓ ONE internal split inside this crate
122+│ │ └── src/
123+│ │ ├── lib.rs
124+│ │ ├── color.rs
125+│ │ ├── counter.rs
126+│ │ ├── hyperlink/
127+│ │ │ ├── mod.rs
128+│ │ │ └── aliases.rs
129+│ │ ├── json.rs # ~1,000 LOC — under cap, unchanged
130+│ │ ├── jsont.rs
131+│ │ ├── path.rs
132+│ │ ├── macros.rs
133+│ │ ├── util.rs
134+│ │ ├── stats.rs
135+│ │ ├── summary.rs
136+│ │ │
137+│ │ │── standard/ # ← NEW submodule (split from standard.rs)
138+│ │ │ ├── mod.rs # printer entry, shared types,
139+│ │ │ │ # the StandardBuilder/Standard structs
140+│ │ │ │ # (~600 LOC)
141+│ │ │ ├── normal.rs # default line-by-line mode (~900 LOC)
142+│ │ │ ├── vimgrep.rs # --vimgrep mode (~700 LOC)
143+│ │ │ ├── multiline.rs # multi-line match formatting (~600 LOC)
144+│ │ │ ├── context.rs # before/after context lines (~600 LOC)
145+│ │ │ └── color.rs # color-only rendering hooks (~600 LOC)
146+│ │ │ # ─────── total: 4,000 LOC across 6 files,
147+│ │ │ # all under the 700-LOC cap on their own
148+│ │ │ # behavioural budget. The split aligns
149+│ │ │ # with output modes the CLI flags already
150+│ │ │ # name explicitly.
151+│ │ └── #[cfg(test)] mod tests inside each split file
152+│ │
153+│ │── ─── Layer 2 — Adapter ───────────────────────────────────────
154+│ ├── ignore/ # ↓ ONE internal split inside this crate
155+│ │ └── src/
156+│ │ ├── lib.rs
157+│ │ ├── default_types.rs # 1,400 LOC — exempt under atomic_exemption
158+│ │ │ # (a catalog: one "file-type definition"
159+│ │ │ # per language/format/tool — ~200 entries,
160+│ │ │ # each ~7 LOC, near-zero CC per entry)
161+│ │ ├── dir.rs # 1,305 LOC — TODO: ~2 files (today deferred)
162+│ │ ├── gitignore.rs
163+│ │ ├── overrides.rs
164+│ │ ├── pathutil.rs
165+│ │ ├── types.rs
166+│ │ │
167+│ │ │── walk/ # ← NEW submodule (split from walk.rs)
168+│ │ │ ├── mod.rs # public Walk{,Parallel,Builder,State}
169+│ │ │ │ # re-exports + the type definitions
170+│ │ │ │ # (~500 LOC)
171+│ │ │ ├── builder.rs # WalkBuilder + WalkParallelBuilder
172+│ │ │ │ # configuration (~600 LOC)
173+│ │ │ ├── sequential.rs # the single-threaded walker (~600 LOC)
174+│ │ │ └── parallel.rs # the work-stealing parallel walker,
175+│ │ │ # channel management, thread pool
176+│ │ │ # (~800 LOC — at the cap, but this IS
177+│ │ │ # genuine behavioural complexity that
178+│ │ │ # should not split further)
179+│ │ │ # ─────── total: 2,500 LOC across 4 files
180+│ │ └── #[cfg(test)] mod tests inside each split file
181+│ │
182+│ ├── cli/ # unchanged
183+│ │ └── src/ # all files under cap, no internal changes
184+│ │
185+│ │── ─── Layer 3 — Entry ─────────────────────────────────────────
186+│ ├── core/ # unchanged file list
187+│ │ ├── main.rs
188+│ │ ├── search.rs
189+│ │ ├── haystack.rs
190+│ │ ├── logger.rs
191+│ │ ├── messages.rs
192+│ │ └── flags/
193+│ │ ├── mod.rs
194+│ │ ├── parse.rs
195+│ │ ├── config.rs
196+│ │ ├── lowargs.rs
197+│ │ ├── hiargs.rs # 1,480 LOC — at the cap; behavioural;
198+│ │ │ # candidate for v2.2 to revisit
199+│ │ ├── defs.rs # 7,779 LOC — exempt under atomic_exemption
200+│ │ │ # (the textbook case: ~150 Flag impls,
201+│ │ │ # ~30-50 LOC each, CC/LOC ≈ 0.01)
202+│ │ ├── complete/ # bash/zsh/fish/PowerShell completion templates
203+│ │ └── doc/ # man-page + README generation
204+│ │
205+│ └── grep/ # unchanged (meta-crate)
206+│ └── src/lib.rs # 90 LOC, re-exports — unchanged
207+│
208+└── tests/ # 15 integration tests, unchanged
209+```
210+
211+Two real structural changes hide behind that tree: `printer/src/standard.rs` (3,987 LOC) splits into a six-file submodule per output mode, and `ignore/src/walk.rs` (2,494 LOC) splits into a four-file submodule per walker concern (config, sequential, parallel, types). Everything else is profile declaration plus a few exemption flags on declarative-catalog files.
212+
213+## Layer 0 — Pure (unchanged)
214+
215+`matcher::Matcher` is the trait abstraction every other layer depends on. The trait itself + the default-impl helpers happen to total 1,379 LOC, but the file's content is overwhelmingly trait-default-implementations — the kind of body that `atomic_exemption = "declarative"` is designed for:
216+
217+```rust
218+// crates/matcher/src/lib.rs — Layer 0, exempt-declarative
219+pub trait Matcher {
220+ type Captures: Captures;
221+ type Error: fmt::Display;
222+
223+ fn find_at(&self, haystack: &[u8], at: usize)
224+ -> Result<Option<Match>, Self::Error>;
225+
226+ // ~30 more abstract methods, each ~10 LOC of default impl that
227+ // composes find_at in some way. Every additional method makes the
228+ // trait API more ergonomic without adding behavioural surface area —
229+ // this is by-construction the kind of file the cap was not for.
230+ fn find(&self, haystack: &[u8]) -> Result<Option<Match>, Self::Error> {
231+ self.find_at(haystack, 0)
232+ }
233+ // ...
234+}
235+```
236+
237+No file moves, no API changes. The profile flips one bit (exempt-from-cap) and the file becomes compliant.
238+
239+## Layer 1, sublayer "engine" — Matcher implementations (unchanged)
240+
241+`regex`, `pcre2`, `globset` each implement `Matcher`. Each is small enough to be one or two files under the cap, with one exception: `globset/src/glob.rs` (1,686 LOC). Reading the file confirms the audit's guess — it's a long catalog of "this glob syntax produces this regex," ~50 patterns each with a small struct + a small `impl`. Declarative shape, declarative exemption applies.
242+
243+`globset/src/serde_impl.rs` was the audit's borderline boundary-leak candidate. The profile resolves it: `serde::Deserialize` *derives* generate code from a type; they do not consume external input the way `std::fs::read_to_string` + `serde_json::from_str(&buf)` does. The boundary, in v2 terms, is *where the bytes enter the program*. Adding `#[derive(Deserialize)]` to a value type does not move that boundary — the boundary is wherever the caller hands the deserializer the bytes. In `globset` no caller does that; the derives sit unused-as-deserializers and exist for downstream consumers like `ripgrep`'s `core` to wire up. So `globset/serde_impl.rs` is *not* a Layer-1 boundary leak under a profile that explicitly notes this.
244+
245+This is the kind of clarification §0's "anti-fudge" mechanism welcomes: the rule's intent is preserved (parsing external bytes happens in Layer 2), but the verifier doesn't mis-flag a perfectly defensible Rust idiom.
246+
247+## Layer 1, sublayer "algorithm" — search + format
248+
249+This is where the one significant code refactor sits. `searcher` is already four files, all under or near cap, no change. `printer/src/standard.rs` is the 3,987-LOC behemoth that needs the split:
250+
251+```rust
252+// crates/printer/src/standard/mod.rs — NEW, ~600 LOC
253+// The public surface, the StandardBuilder, the Standard struct,
254+// shared formatting helpers used by every output mode.
255+pub struct StandardBuilder { /* ... */ }
256+pub struct Standard<W: WriteColor> { /* ... */ }
257+
258+impl<W: WriteColor> Standard<W> {
259+ pub fn sink<'s, M: Matcher>(&'s mut self, matcher: M) -> StandardSink<'s, '_, M, W> {
260+ // Dispatches to one of normal::Sink, vimgrep::Sink, multiline::Sink, ...
261+ // based on the StandardBuilder config.
262+ }
263+}
264+```
265+
266+```rust
267+// crates/printer/src/standard/normal.rs — NEW, ~900 LOC
268+// The default line-by-line match output. Color rendering, line numbers,
269+// path printing, separators between matches.
270+use super::*;
271+
272+pub(super) struct NormalSink<'a, M, W> { /* ... */ }
273+
274+impl<M: Matcher, W: WriteColor> Sink for NormalSink<'_, M, W> {
275+ fn matched(&mut self, _searcher: &Searcher, mat: &SinkMatch<'_>) -> Result<bool, Self::Error> {
276+ // ... ~500 LOC of behavioural complexity (path, line-number,
277+ // separator, color) for the default case
278+ }
279+}
280+
281+#[cfg(test)]
282+mod tests {
283+ use super::*;
284+ #[test] fn renders_match_line() { /* ... */ }
285+ #[test] fn handles_binary_files() { /* ... */ }
286+ // ~10 more tests, recognized by tests = "inline"
287+}
288+```
289+
290+The other four modes — `vimgrep.rs`, `multiline.rs`, `context.rs`, `color.rs` — follow the same shape: a single `Sink` impl per file, plus that file's `#[cfg(test)] mod tests` block. The split aligns with concepts users *already think in*: when reading `rg --vimgrep`, the relevant code is now `printer/src/standard/vimgrep.rs`.
291+
292+Each split file lives well within the 700-LOC behavioural budget. The shared types and dispatch sit in `mod.rs` (also under budget). Eight tests files become six split-source files with inline-tests — net wash on file count, large win on file size and locality.
293+
294+## Layer 2 — Adapter
295+
296+The audit's reading was that `ignore` and `cli` already sit honestly in Layer 2: they're where filesystem walks, terminal detection, and child-process management happen. The one Atomic violation that survives is `ignore/src/walk.rs` (2,494 LOC), and the rebuild splits it:
297+
298+```rust
299+// crates/ignore/src/walk/mod.rs — NEW, ~500 LOC
300+pub use self::builder::{WalkBuilder, WalkParallelBuilder};
301+pub use self::sequential::Walk;
302+pub use self::parallel::WalkParallel;
303+
304+mod builder;
305+mod sequential;
306+mod parallel;
307+
308+// Shared types used across all three modules:
309+pub struct DirEntry { /* ... */ }
310+pub enum WalkState { Continue, Skip, Quit }
311+// ... ~10 more types, all small structs/enums
312+```
313+
314+```rust
315+// crates/ignore/src/walk/builder.rs — NEW, ~600 LOC
316+pub struct WalkBuilder { /* config */ }
317+impl WalkBuilder {
318+ pub fn new(path: impl AsRef<Path>) -> WalkBuilder { /* ... */ }
319+ pub fn add(&mut self, path: impl AsRef<Path>) -> &mut Self { /* ... */ }
320+ pub fn standard_filters(&mut self, yes: bool) -> &mut Self { /* ... */ }
321+ pub fn hidden(&mut self, yes: bool) -> &mut Self { /* ... */ }
322+ // ... ~25 more builder methods, each ~10-20 LOC
323+ pub fn build(&self) -> Walk { /* ... */ }
324+ pub fn build_parallel(&self) -> WalkParallel { /* ... */ }
325+}
326+```
327+
328+```rust
329+// crates/ignore/src/walk/sequential.rs — NEW, ~600 LOC
330+// The single-threaded directory walker.
331+pub struct Walk { /* ... */ }
332+impl Iterator for Walk { /* ~400 LOC of walker logic */ }
333+```
334+
335+```rust
336+// crates/ignore/src/walk/parallel.rs — NEW, ~800 LOC
337+// The work-stealing parallel walker. Channel/queue plumbing,
338+// thread-pool management, worker loop. This file remains at the
339+// cap; it is genuinely doing one cohesive thing (parallel walk)
340+// with a tightly-coupled design that should not split further.
341+pub struct WalkParallel { /* ... */ }
342+impl WalkParallel {
343+ pub fn run<F>(self, mut mkf: F) where F: FnMut() -> Box<dyn FnMut(...) + Send + '_> {
344+ // ... thread spawn, channel-based work distribution, ...
345+ }
346+}
347+```
348+
349+Same pattern as the printer split: four files aligned with concepts users already distinguish (config vs sequential vs parallel), each under or at the 700-LOC behavioural budget, each with its own inline test block.
350+
351+`ignore/src/dir.rs` (1,305 LOC) is named in the audit as a candidate too; it's a defensible split into "gitignore-state" + "directory-state" but the lift is smaller and could land later. This sketch lists it as a deferred TODO rather than mandatory for the rebuild's 7/7 score, because its behavioural complexity per LOC is mid-range (between the cleanly-splittable `walk.rs` and the genuinely-cohesive `default_types.rs`).
352+
353+## Layer 3 — Entry (unchanged)
354+
355+`core/main.rs`, `core/search.rs`, `core/messages.rs`, `core/flags/parse.rs`, `core/flags/hiargs.rs` — every file is the right shape:
356+
357+```rust
358+// crates/core/main.rs — Layer 3 (unchanged)
359+fn main() -> ExitCode {
360+ match Args::parse() {
361+ Ok(args) => match args.command()? {
362+ Command::Search(s) => search::run(s, args),
363+ Command::Files(f) => files::run(f, args),
364+ Command::Types(t) => types::run(t, args),
365+ // ...
366+ },
367+ Err(e) => { eprintln!("rg: {e}"); ExitCode::from(2) }
368+ }
369+}
370+```
371+
372+Parse → dispatch → call into Layer 2 (`ignore::WalkBuilder`, `searcher::Searcher`) → call into Layer 1 (`printer::standard::Standard`) → emit. No business logic inline, no untyped data, no boundary parsing outside the declared boundary modules.
373+
374+The 7,779-line `core/flags/defs.rs` stays unchanged. It's the textbook declarative-exemption case — 150 flag definitions, one `struct` + one `impl Flag` per flag, each ~30-50 LOC, *and they need to be in display order, in one place, for the help text + man-page generator that walks them sequentially*. Splitting it would scatter what the build-time tooling (`complete/` + `doc/`) expects to read as a single contiguous catalog.
375+
376+## What concretely changes
377+
378+| change | size | difficulty |
379+|---|---|---|
380+| 1. Write `sama.profile.toml` | ~18 lines | trivial — no code change |
381+| 2. Split `printer/src/standard.rs` (3,987 LOC) → submodule of 6 files | one file deleted, six files created, ~150 import-path adjustments in the printer crate itself | three days of careful, mode-by-mode extraction |
382+| 3. Split `ignore/src/walk.rs` (2,494 LOC) → submodule of 4 files | one file deleted, four files created, ~80 import-path adjustments in the ignore crate | two days |
383+| 4. Add a profile note for `globset/src/serde_impl.rs` (serde derives ≠ boundary parsing) | 1 line in profile | trivial |
384+| 5. Add a profile note for `searcher/src/line_buffer.rs` (byte parsing IS the algorithm, not a boundary) | 1 line in profile | trivial |
385+| 6. Mark `defs.rs`, `default_types.rs`, `matcher/lib.rs`, `globset/glob.rs` as declarative-exempt | 4 lines in profile | trivial |
386+| 7. (deferred) Split `searcher/src/searcher/glue.rs` (1,549 LOC) → ~3 files | one file deleted, three files created | two days, can land later |
387+| 8. (deferred) Split `ignore/src/dir.rs` (1,305 LOC) → ~2 files | one file deleted, two files created | one day, can land later |
388+| **mandatory total** | **2 file splits, 1 profile (~25 lines)** | **~1 working week** |
389+
390+No new tests need to be written — `tests = "inline"` recognises the 38 source files that already have `#[cfg(test)] mod tests` blocks. No god-class to dismantle (the splits are clean per-output-mode and per-walker-concern; both are already organized by these concepts internally). No API breaks — every `pub` item retains its path through the new submodules via `pub use` re-exports.
391+
392+For context, the [WordPress plugin parallel-architecture rebuild](/blog/sama-v2-wordpress-plugin-rebuilt) required splitting a 1,554-line public god-class into eleven files, redesigning the settings option as a typed value, and writing 20+ test files from scratch. Months of work, real risk of breaking the PRO add-on, WooCommerce, Yoast, and AIOSEO integrations. `dive` to 7/7 was ten working days of test writing plus one package split. `ripgrep` to 7/7 is one focused week of file splitting plus a TOML file.
393+
394+## Predicted §5 metrics for the rebuilt ripgrep
395+
396+| metric | ripgrep today (estimated) | ripgrep rebuilt (predicted) | dive rebuilt | tdd.md (measured) |
397+|---|---|---|---|---|
398+| §4 checks passing | ~3 / 7 strict, ~5 / 7 under v2.1 | **7 / 7 ✓** | 7 / 7 | 7 / 7 ✓ |
399+| graphDepth | ~5 | ~5 (unchanged — no depth change) | ~5 | 7 |
400+| boundaryRatio | ~95% | **~100%** (after profile notes + splits) | ~100% | 100% |
401+| workingSetFit (50–500 LOC) | ~60% | **~80%** (the two splits move 6,500 LOC into 10 right-sized files) | ~80% | 80% |
402+| violationCounts (sum) | ~50 (19 Atomic + ~30 Modeled-tests under sibling-rule) | **0** | 0 | 0 |
403+
404+`workingSetFit` is the metric that moves the most: ~60% → ~80%. The two splits — `printer/standard.rs` (3,987) and `ignore/walk.rs` (2,494) — between them carry **6,481 LOC out of the over-cap bucket and into 10 right-sized files**. The remaining over-cap files (`defs.rs`, `default_types.rs`, `matcher/lib.rs`, `globset/glob.rs`, `core/flags/hiargs.rs`, `searcher/glue.rs`, `ignore/dir.rs`) account for ~15,500 LOC, but the four declarative-exempt files alone are ~12,000 of that and don't count against fit any more.
405+
406+`violationCounts` drops to zero — the Atomic flags clear (two real splits + declarative exemption for the catalogs), Modeled-tests clears (inline-tests mode), Sorted clears (directory dialect), boundary clears (the two profile notes + the genuine layering already in place).
407+
408+`graphDepth` stays at ~5 because the crate graph itself doesn't change. `ripgrep` is wider than `tdd.md` (7) and narrower than what a deeply-layered enterprise codebase would show — which is honest: a CLI tool with a Matcher trait at the bottom and a binary at the top is *supposed* to be ~5 layers deep, no more.
409+
410+`boundaryRatio` reaches ~100% because the two borderline cases the audit flagged (`serde_impl.rs`, `line_buffer.rs`) get reclassified by the profile, not refactored. Reclassification is the *correct* response when the borderline-ness is semantic, not structural — the verifier and the spec agree that derives aren't boundary parsing and that the algorithmic hot-path of bytes-to-lines is algorithm, not boundary.
411+
412+## What this sketch surfaces
413+
414+Four observations:
415+
416+**1. The three v2.1 dialects, taken together, work.** Each one individually is a small extension; together they cleanly absorb the way Rust differs from TypeScript/PHP without weakening the property each rule was protecting. Sibling-files-vs-inline is a surface choice about *where* the test attaches; the rule's intent (every behavioural unit has an attached test) is preserved either way. Filename-prefix-vs-crate-directory is a surface choice about *which lex order* expresses the layer hierarchy; the rule's intent (lex order matches dependency direction) is preserved either way. Behavioural-cap-vs-declarative-exemption is a surface choice about *what kind of file* the rule applies to; the rule's intent (working-set fit for the agent's context window) is preserved either way.
417+
418+**2. The work-cost of going from "exemplary Rust" to "v2-compliant Rust" is exceptionally small.** One focused week to do the two splits, plus a TOML file. Compare: months for the WP plugin (god-class redesign), two weeks for `dive` (test writing). The cost scales inversely with how much architectural care the original author put into the codebase. BurntSushi did the layering work a decade ago; the rebuild is mostly just *writing it down*.
419+
420+**3. The Cargo ecosystem is a cheap source of v2 baseline data.** Every Cargo workspace has the crate graph already enforced by the build tool. If the §5 metrics emitter were ported to Rust (a few hundred lines of `cargo-metadata` + AST walking), n could grow from 4 to 20+ in an afternoon by running the metrics across the popular Rust CLI tools (`bat`, `fd`, `ripgrep`, `eza`, `dust`, `tokei`, ...). That's the kind of cross-repo evidence §6 calls for before promoting any of these three v2.1 dialects to official.
421+
422+**4. The §5 metrics keep their separation from the §4 verdict cleanly.** `ripgrep` today scores 3/7 strict, ~5/7 under dialects — three quite different verdicts depending on what version of the spec you read. The §5 metrics give the *same* underlying numbers in every case: graphDepth=5, boundaryRatio=95-100%, workingSetFit=60% (today) or ~80% (rebuilt). A reader who doesn't care about the §4 score can still measure the codebase against the same axes everyone else's codebase is measured against. That's the design point: compliance ≠ proof; the §5 deltas are what travel across spec versions and across repos.
423+
424+---
425+
426+**Companion posts:**
427+
428+- [Today's `ripgrep` audit](/blog/sama-v2-rust-project-ripgrep) — where the 3/7-strict, 5/7-with-dialects score comes from, and the three findings this rebuild assumes get adopted
429+- [The `dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt) — same exercise on a Go codebase, the directory-dialect's first appearance
430+- [The `dive` prefix-scheme variant](/blog/sama-v2-go-project-dive-prefix-scheme) — what the dramatic file-rename refactor costs in Go (and would cost even more in Rust)
431+- [The WordPress audit + rebuild](/blog/sama-v2-wordpress-plugin-audit) · [rebuilt](/blog/sama-v2-wordpress-plugin-rebuilt) — same exercise on a 0/7 starting point
432+- [The §5 metrics post](/blog/sama-v2-metrics-emitter) — why the metrics matter more than the surface verdict
433+- [The v2 spec](/sama/v2) — the rules being rebuilt against
modified content/blog/sama-v2-rust-project-ripgrep.md +3 −0
@@ -159,6 +159,8 @@ This is exactly the §5 intent. The metric surfaces a property; whether that pro
159159
160160 ## What a rebuilt ripgrep would look like — the small version
161161
162+**For the full parallel-architecture sketch — every layer, every file move, predicted §5 metrics, the rebuilt `sama.profile.toml`, and concrete Rust code samples for the two file splits — see the companion post: [`ripgrep`, rebuilt under SAMA v2](/blog/sama-v2-rust-project-ripgrep-rebuilt).**
163+
162164 The audit makes the rebuild sketch short, because BurntSushi's crate split already maps to v2 layers under the directory dialect. The lift to make it pass under v2.1 with the proposed dialects:
163165
164166 1. **Add `sama.profile.toml`** declaring the layer mapping (see profile above). 50 lines, zero code change.
@@ -192,6 +194,7 @@ n=4, three of them hand-estimated, still far from a *"v2 is worth following"* cl
192194 **See for yourself:**
193195
194196 - The project: <https://github.com/BurntSushi/ripgrep>
197+- **The full ripgrep rebuild (companion to this audit): [`ripgrep`, rebuilt under SAMA v2](/blog/sama-v2-rust-project-ripgrep-rebuilt)**
195198 - The Go audit (companion): [Pointing SAMA v2 at `dive`](/blog/sama-v2-go-project-dive)
196199 - The WP audit + rebuild: [WordPress plugin audit](/blog/sama-v2-wordpress-plugin-audit) · [rebuilt](/blog/sama-v2-wordpress-plugin-rebuilt)
197200 - The §5 metrics emitter: [Compliance proves the rules followed. Delta proves they were worth following.](/blog/sama-v2-metrics-emitter)
modified src/a31_blog.ts +6 −0
@@ -12,6 +12,12 @@ export interface BlogEntry {
1212 }
1313
1414 export const ALL_POSTS: BlogEntry[] = [
15+ {
16+ slug: "sama-v2-rust-project-ripgrep-rebuilt",
17+ title: "`ripgrep`, rebuilt under SAMA v2 — a thought experiment",
18+ description: "The companion to today's ripgrep audit (which scored ~3/7 strict, ~5/7 under three proposed v2.1 dialects). Same parallel-architecture sketch as the dive and WP rebuilds. The sketch is even smaller than dive's: BurntSushi's crate graph already reads like a SAMA v2 layer chart, and the only real code work is two internal file splits — printer/standard.rs (3,987 LOC) into a six-file submodule per output mode, and ignore/walk.rs (2,494 LOC) into a four-file submodule per walker concern. Everything else is the sama.profile.toml declaration plus three v2.1 dialect adoptions: layout='directory' (Sorted-by-crate-graph), tests='inline' (Modeled-tests recognizes #[cfg(test)] mod tests blocks), and atomic_exemption='declarative' (the 7,779-line defs.rs flag catalog exempt because CC/LOC ≈ 0.01). Profile notes resolve the two boundary-borderline cases: serde derives ≠ boundary parsing in the §4.4 sense; the byte-parsing in searcher/line_buffer.rs IS the algorithm. No new tests need to be written. No API breaks. ~1 focused working week of effort vs months for WP and ~10 days for dive — the cost scales inversely with how much architectural care the original author put in. Predicted §5 deltas: workingSetFit ~60% → ~80% (the two splits carry 6,481 LOC into right-sized files), violationCounts ~50 → 0, boundaryRatio ~95% → ~100%, graphDepth unchanged at ~5 (the crate graph itself doesn't change). Four observations the sketch surfaces: (1) the three v2.1 dialects together cleanly absorb Rust without weakening the rules' intent — sibling-vs-inline, prefix-vs-directory, behavioural-vs-declarative are all surface choices that preserve the underlying property each rule protects; (2) cost scales inversely with original author's discipline — BurntSushi did the layering decade ago, the rebuild mostly just writes it down; (3) Cargo ecosystem is cheap §5 baseline data — port the emitter, run across the popular Rust CLI tools (bat, fd, eza, dust, tokei) and n grows from 4 to 20+ in an afternoon; (4) §5 metrics keep their separation from §4 verdict cleanly — same numbers regardless of which spec version produced the verdict.",
19+ date: "2026-05-25",
20+ },
1521 {
1622 slug: "sama-v2-rust-project-ripgrep",
1723 title: "Pointing SAMA v2 at `ripgrep`: BurntSushi's exemplar surfaces three findings about the spec",