ripgrep, rebuilt under SAMA v2 — a thought experiment
Today's ripgrep audit walked the seven §4 checks against BurntSushi's workspace and concluded that the strict score is ~3/7 but, under three proposed v2.1 dialects, rises to ~5-6/7. The audit's headline finding was that ripgrep is so close to v2-compliant that the work isn't on ripgrep — it's on v2. This post is the parallel-architecture sketch the audit promised: what does the codebase look like as a whole when every v2.1 dialect is admitted, every borderline case resolved, every gap closed?
Same scope, same features, same user-facing behaviour, same idiomatic Rust — just enough decisions made deliberately to score 7/7 under v2.1 with the proposed dialects.
The sketch is even smaller than the dive rebuild. The starting point is closer; the lift is days of focused work plus three lines in the spec.
#The three v2.1 dialects this sketch assumes
(Update: all three dialects have since been drafted formally into /sama/v2 §6.A v2.1 dialects as opt-in profile extensions, each with operational definitions, the property each preserves, and the falsifiable cross-repo experiment that would invalidate it. The bullets below are the original informal proposal.)
The audit surfaced three places where v2.0 doesn't fit Rust. Each becomes a falsifiable, optionally-applied profile extension in the spirit of §6 evolution policy:
- layout = "directory": Sorted-by-crate-directory rather than Sorted-by-filename-prefix. (Same dialect the Go dive rebuild proposes.) Cargo's workspace + pub use semantics + the absence of upward edges in the crate graph give the verifier everything it needs to enforce the same property the prefix-lex check enforces in TypeScript/PHP.
- tests = "inline": Modeled-tests recognises #[cfg(test)] mod tests { #[test] fn ... } blocks inside source files instead of requiring a sibling *_test.rs file. Rust's convention is fundamentally inline; the v2.0 sibling-file rule was written assuming Jest/PHPUnit-style adjacent test files. The property the rule is trying to protect ("every behavioural source unit has a test attached") is preserved; only the surface syntax of attachment changes.
- atomic_exemption = "declarative": the Atomic-700 LOC cap applies to behavioural files; files whose body is overwhelmingly declarative (one pub struct X + one impl Trait for X per item, repeated for many items, with near-zero per-item cyclomatic complexity) are exempt. The verifier detects this heuristically: a file is "declarative" if it crosses the cap and its cyclomatic complexity per LOC drops below 0.05, and it consists predominantly of impl X for Y / const FOO: T = ... / pub struct ... items. The 7,779-line defs.rs flag catalog is the textbook case.
Each of these is the kind of extension §6 admits provisionally: the property the rule protects stays the same; only the surface that expresses it changes. If subsequent cross-repo §5 metrics show the extension picks up the same architectural drift the original rule did, §6 promotes it to official. If not, it's withdrawn. Today, this rebuild assumes all three.
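The declarative-exemption heuristic described above can be sketched as a small decision function. This is a minimal sketch, not the verifier's actual code: the FileStats struct and its field names are hypothetical, and the 0.8 threshold for "predominantly" is an assumption, since the proposal gives no number for it. Only the 700-LOC cap and the 0.05 CC-per-LOC threshold come from the text.

```rust
/// Per-file measurements a verifier might collect (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct FileStats {
    /// Total lines of code in the file.
    pub loc: usize,
    /// Cyclomatic complexity summed over every function in the file.
    pub cyclomatic: usize,
    /// Number of top-level items (structs, impls, consts, fns, ...).
    pub items: usize,
    /// How many of those items are `impl X for Y`, `const`, or `pub struct`.
    pub declarative_items: usize,
}

/// The Atomic cap and the CC-per-LOC threshold quoted in the proposal.
pub const ATOMIC_CAP: usize = 700;
pub const MAX_CC_PER_LOC: f64 = 0.05;
/// ASSUMPTION: the proposal says "predominantly" without a number.
pub const MIN_DECLARATIVE_SHARE: f64 = 0.8;

#[derive(Debug, PartialEq, Eq)]
pub enum AtomicVerdict {
    UnderCap,
    ExemptDeclarative,
    Violation,
}

/// Applies the cap first, then the two-part declarative test.
pub fn atomic_verdict(stats: FileStats) -> AtomicVerdict {
    if stats.loc <= ATOMIC_CAP {
        return AtomicVerdict::UnderCap;
    }
    let cc_per_loc = stats.cyclomatic as f64 / stats.loc as f64;
    let declarative_share = if stats.items == 0 {
        0.0
    } else {
        stats.declarative_items as f64 / stats.items as f64
    };
    if cc_per_loc < MAX_CC_PER_LOC && declarative_share >= MIN_DECLARATIVE_SHARE {
        AtomicVerdict::ExemptDeclarative
    } else {
        AtomicVerdict::Violation
    }
}
```

Fed a defs.rs-shaped file (7,779 LOC, CC/LOC around 0.01, nearly all items declarative) this returns ExemptDeclarative; an over-cap file with ordinary branching density stays a Violation.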
#What stays exactly the same
The contract that does not move:
- rg still searches recursively, honors .gitignore, prints colors, supports JSON output, links file paths via OSC-8, runs --vimgrep mode, all of it. The 200+ CLI flags are unchanged in name, default, and effect.
- The crate dependency graph is unchanged: matcher at the bottom, core at the top, regex / pcre2 / globset / searcher / printer / ignore / cli between. Every public API of every crate is unchanged; downstream consumers (fd, helix, lsd, the dozen-plus tools that depend on ignore or globset directly) don't break.
- The Matcher trait, the Sink trait, the Searcher configuration surface: all unchanged. They were already the right shape.
- Build, install, and packaging stay byte-identical from the user's perspective. cargo install ripgrep produces the same binary.
What changes is the file layout inside two crates (printer, ignore) plus the sama.profile.toml declaration. The behavior is invariant.
#The profile
sama_version = "2.1"
profile = "ripgrep"
layout = "directory" # Sorted via crate-graph direction, not filename prefix
tests = "inline" # Modeled-tests recognizes #[cfg(test)] mod tests blocks
atomic_exemption = "declarative" # files of dominantly-declarative content exempt from 700-LOC cap
# Layer 0 — Pure. The trait abstraction every other layer depends on.
# No I/O, no allocator-coupled state, no thread-local globals.
[layers.0]
crates = ["matcher"]
# Layer 1 — Core. Pure algorithms over Matcher inputs. No syscalls.
[layers.1]
sublayers = [
# Three matcher implementations — each a "Pure Core engine":
# text in, match positions out. Configurable but allocation-only.
{ name = "engine", crates = ["regex", "pcre2", "globset"] },
# Algorithmic consumers of any Matcher: byte-stream search, formatting.
{ name = "algorithm", crates = ["searcher", "printer"] },
]
# Layer 2 — Adapter. Where the program touches the outside world.
# Filesystem walks, terminal detection, child processes, env vars.
[layers.2]
crates = ["ignore", "cli"]
# Layer 3 — Entry. The binary. Composes 2 + 1 + 0; no business logic.
[layers.3]
crates = ["core", "grep"]
Ten crates, ten lines of declaration, every file in the workspace mapped without ambiguity.
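What layout = "directory" asks the verifier to enforce can be sketched as a rank check over crate dependency edges. This is a sketch under assumptions: the (layer, sublayer) rank encoding and the function names are hypothetical, and a real verifier would read the edges from Cargo metadata rather than take them as a slice. The layer assignments mirror the profile above.

```rust
use std::collections::HashMap;

/// (layer, sublayer) rank. A crate may depend only on crates of equal or lower rank.
pub type Rank = (u8, u8);

/// Layer assignments copied from the profile above.
pub fn profile_ranks() -> HashMap<&'static str, Rank> {
    HashMap::from([
        ("matcher", (0, 0)),
        ("regex", (1, 0)),
        ("pcre2", (1, 0)),
        ("globset", (1, 0)),
        ("searcher", (1, 1)),
        ("printer", (1, 1)),
        ("ignore", (2, 0)),
        ("cli", (2, 0)),
        ("core", (3, 0)),
        ("grep", (3, 0)),
    ])
}

/// Returns every `(from, to)` edge where `from` depends on a crate ranked
/// above it. Edges that mention a crate missing from the profile are
/// reported too, since an unmapped crate cannot be checked.
pub fn upward_edges<'a>(
    ranks: &HashMap<&str, Rank>,
    deps: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    deps.iter()
        .copied()
        .filter(|&(from, to)| match (ranks.get(from), ranks.get(to)) {
            (Some(from_rank), Some(to_rank)) => to_rank > from_rank,
            _ => true,
        })
        .collect()
}
```

An empty result means the Sorted property holds for the supplied edges; an edge such as matcher depending on ignore would be returned as a violation.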
#The directory tree
The unchanged-from-today crates are listed without comment. The two crates with internal file changes get their before/after expanded:
ripgrep/
├── Cargo.toml # workspace manifest (unchanged)
├── sama.profile.toml # NEW — 18 lines, see above
│
├── crates/
│ │── ─── Layer 0 — Pure ───────────────────────────────────────────
│ ├── matcher/ # unchanged — Matcher trait
│ │ └── src/
│ │ ├── lib.rs # 1,379 LOC — exempt under atomic_exemption
│ │ │ # (predominantly trait + default impls,
│ │ │ # CC/LOC < 0.05 — declarative shape)
│ │ └── interpolate.rs # unchanged
│ │
│ │── ─── Layer 1, sublayer "engine" — Matcher implementations ────
│ ├── regex/ # unchanged
│ ├── pcre2/ # unchanged
│ ├── globset/ # unchanged file list
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── glob.rs # 1,686 LOC — exempt (predominantly
│ │ │ # one-Glob-per-construct definitions,
│ │ │ # table-of-cases shape)
│ │ ├── serde_impl.rs # serde derives now profile-allowed
│ │ │ # (boundary parsing is for user-input
│ │ │ # bytes, not type-driven derives)
│ │ ├── fnv.rs
│ │ └── pathutil.rs
│ │
│ │── ─── Layer 1, sublayer "algorithm" — search + format ─────────
│ ├── searcher/ # unchanged file list
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── line_buffer.rs # raw-byte slicing remains here —
│ │ │ # profile note: this is the core
│ │ │ # hot-path of the search algorithm,
│ │ │ # not boundary parsing of external input
│ │ ├── lines.rs
│ │ ├── macros.rs
│ │ ├── sink.rs
│ │ ├── testutil.rs
│ │ └── searcher/
│ │ ├── mod.rs
│ │ ├── core.rs
│ │ ├── glue.rs # 1,549 LOC — TODO: ~3 files
│ │ │ # (today the lift's deferred — see
│ │ │ # §What concretely changes below)
│ │ └── mmap.rs
│ │
│ ├── printer/ # ↓ ONE internal split inside this crate
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── color.rs
│ │ ├── counter.rs
│ │ ├── hyperlink/
│ │ │ ├── mod.rs
│ │ │ └── aliases.rs
│ │ ├── json.rs # ~1,000 LOC — under cap, unchanged
│ │ ├── jsont.rs
│ │ ├── path.rs
│ │ ├── macros.rs
│ │ ├── util.rs
│ │ ├── stats.rs
│ │ ├── summary.rs
│ │ │
│ │ │── standard/ # ← NEW submodule (split from standard.rs)
│ │ │ ├── mod.rs # printer entry, shared types,
│ │ │ │ # the StandardBuilder/Standard structs
│ │ │ │ # (~600 LOC)
│ │ │ ├── normal.rs # default line-by-line mode (~900 LOC)
│ │ │ ├── vimgrep.rs # --vimgrep mode (~700 LOC)
│ │ │ ├── multiline.rs # multi-line match formatting (~600 LOC)
│ │ │ ├── context.rs # before/after context lines (~600 LOC)
│ │ │ └── color.rs # color-only rendering hooks (~600 LOC)
│ │ │ # ─────── total: 4,000 LOC across 6 files,
│ │ │ # all under the 700-LOC cap on their own
│ │ │ # behavioural budget. The split aligns
│ │ │ # with output modes the CLI flags already
│ │ │ # name explicitly.
│ │ └── #[cfg(test)] mod tests inside each split file
│ │
│ │── ─── Layer 2 — Adapter ───────────────────────────────────────
│ ├── ignore/ # ↓ ONE internal split inside this crate
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── default_types.rs # 1,400 LOC — exempt under atomic_exemption
│ │ │ # (a catalog: one "file-type definition"
│ │ │ # per language/format/tool — ~200 entries,
│ │ │ # each ~7 LOC, near-zero CC per entry)
│ │ ├── dir.rs # 1,305 LOC — TODO: ~2 files (today deferred)
│ │ ├── gitignore.rs
│ │ ├── overrides.rs
│ │ ├── pathutil.rs
│ │ ├── types.rs
│ │ │
│ │ │── walk/ # ← NEW submodule (split from walk.rs)
│ │ │ ├── mod.rs # public Walk{,Parallel,Builder,State}
│ │ │ │ # re-exports + the type definitions
│ │ │ │ # (~500 LOC)
│ │ │ ├── builder.rs # WalkBuilder + WalkParallelBuilder
│ │ │ │ # configuration (~600 LOC)
│ │ │ ├── sequential.rs # the single-threaded walker (~600 LOC)
│ │ │ └── parallel.rs # the work-stealing parallel walker,
│ │ │ # channel management, thread pool
│ │ │ # (~800 LOC — at the cap, but this IS
│ │ │ # genuine behavioural complexity that
│ │ │ # should not split further)
│ │ │ # ─────── total: 2,500 LOC across 4 files
│ │ └── #[cfg(test)] mod tests inside each split file
│ │
│ ├── cli/ # unchanged
│ │ └── src/ # all files under cap, no internal changes
│ │
│ │── ─── Layer 3 — Entry ─────────────────────────────────────────
│ ├── core/ # unchanged file list
│ │ ├── main.rs
│ │ ├── search.rs
│ │ ├── haystack.rs
│ │ ├── logger.rs
│ │ ├── messages.rs
│ │ └── flags/
│ │ ├── mod.rs
│ │ ├── parse.rs
│ │ ├── config.rs
│ │ ├── lowargs.rs
│ │ ├── hiargs.rs # 1,480 LOC, over the cap; behavioural;
│ │ │ # candidate for v2.2 to revisit
│ │ ├── defs.rs # 7,779 LOC — exempt under atomic_exemption
│ │ │ # (the textbook case: ~150 Flag impls,
│ │ │ # ~30-50 LOC each, CC/LOC ≈ 0.01)
│ │ ├── complete/ # bash/zsh/fish/PowerShell completion templates
│ │ └── doc/ # man-page + README generation
│ │
│ └── grep/ # unchanged (meta-crate)
│ └── src/lib.rs # 90 LOC, re-exports — unchanged
│
└── tests/ # 15 integration tests, unchanged
Two real structural changes hide behind that tree: printer/src/standard.rs (3,987 LOC) splits into a six-file submodule per output mode, and ignore/src/walk.rs (2,494 LOC) splits into a four-file submodule per walker concern (config, sequential, parallel, types). Everything else is profile declaration plus a few exemption flags on declarative-catalog files.
#Layer 0 — Pure (unchanged)
matcher::Matcher is the trait abstraction every other layer depends on. The trait itself + the default-impl helpers happen to total 1,379 LOC, but the file's content is overwhelmingly trait-default-implementations — the kind of body that atomic_exemption = "declarative" is designed for:
// crates/matcher/src/lib.rs — Layer 0, exempt-declarative
pub trait Matcher {
type Captures: Captures;
type Error: fmt::Display;
fn find_at(&self, haystack: &[u8], at: usize)
-> Result<Option<Match>, Self::Error>;
// ~30 more abstract methods, each ~10 LOC of default impl that
// composes find_at in some way. Every additional method makes the
// trait API more ergonomic without adding behavioural surface area —
// this is by-construction the kind of file the cap was not for.
fn find(&self, haystack: &[u8]) -> Result<Option<Match>, Self::Error> {
self.find_at(haystack, 0)
}
// ...
}
No file moves, no API changes. The profile flips one bit (exempt-from-cap) and the file becomes compliant.
#Layer 1, sublayer "engine" — Matcher implementations (unchanged)
regex, pcre2, globset each implement Matcher. Each is small enough to be one or two files under the cap, with one exception: globset/src/glob.rs (1,686 LOC). Reading the file confirms the audit's guess — it's a long catalog of "this glob syntax produces this regex," ~50 patterns each with a small struct + a small impl. Declarative shape, declarative exemption applies.
globset/src/serde_impl.rs was the audit's borderline boundary-leak candidate. The profile resolves it: serde::Deserialize derives generate code from a type; they do not consume external input the way std::fs::read_to_string + serde_json::from_str(&buf) does. The boundary, in v2 terms, is where the bytes enter the program. Adding #[derive(Deserialize)] to a value type does not move that boundary — the boundary is wherever the caller hands the deserializer the bytes. In globset no caller does that; the derives sit unused-as-deserializers and exist for downstream consumers like ripgrep's core to wire up. So globset/serde_impl.rs is not a Layer-1 boundary leak under a profile that explicitly notes this.
This is the kind of clarification §0's "anti-fudge" mechanism welcomes: the rule's intent is preserved (parsing external bytes happens in Layer 2), but the verifier doesn't mis-flag a perfectly defensible Rust idiom.
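The distinction can be illustrated with a std-only analogy. This is not globset's code and does not use serde: GlobSpec, its toy "!" negation syntax, and load_glob are all hypothetical. The FromStr impl plays the role of the derive (a type that knows how to be built from text, with no I/O), while load_glob is the boundary, the one place where external bytes actually enter.

```rust
use std::io::{self, Read};
use std::str::FromStr;

/// Layer-1 style value type: describes how to build itself from text,
/// but performs no I/O. Analogous to a type with a Deserialize derive.
#[derive(Debug, PartialEq, Eq)]
pub struct GlobSpec {
    pub pattern: String,
    pub negated: bool,
}

impl FromStr for GlobSpec {
    type Err = String;

    /// Toy syntax (hypothetical): a leading '!' marks the glob as negated.
    fn from_str(text: &str) -> Result<Self, Self::Err> {
        let text = text.trim();
        if text.is_empty() {
            return Err("empty glob".to_string());
        }
        match text.strip_prefix('!') {
            Some(rest) => Ok(GlobSpec { pattern: rest.to_string(), negated: true }),
            None => Ok(GlobSpec { pattern: text.to_string(), negated: false }),
        }
    }
}

/// Layer-2 style boundary: the only function that pulls bytes from the
/// outside world and hands them to the type-driven parser.
pub fn load_glob(mut source: impl Read) -> io::Result<GlobSpec> {
    let mut buf = String::new();
    source.read_to_string(&mut buf)?;
    buf.parse::<GlobSpec>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Under this reading, a verifier would flag load_glob if it appeared in a Layer-1 crate, but not the FromStr impl on its own.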
#Layer 1, sublayer "algorithm" — search + format
This is where the one significant code refactor sits. searcher keeps its current file list (the glue.rs split is deferred), so nothing changes there. printer/src/standard.rs is the 3,987-LOC behemoth that needs the split:
// crates/printer/src/standard/mod.rs — NEW, ~600 LOC
// The public surface, the StandardBuilder, the Standard struct,
// shared formatting helpers used by every output mode.
pub struct StandardBuilder { /* ... */ }
pub struct Standard<W: WriteColor> { /* ... */ }
impl<W: WriteColor> Standard<W> {
pub fn sink<'s, M: Matcher>(&'s mut self, matcher: M) -> StandardSink<'s, '_, M, W> {
// Dispatches to one of normal::Sink, vimgrep::Sink, multiline::Sink, ...
// based on the StandardBuilder config.
}
}
// crates/printer/src/standard/normal.rs — NEW, ~900 LOC
// The default line-by-line match output. Color rendering, line numbers,
// path printing, separators between matches.
use super::*;
pub(super) struct NormalSink<'a, M, W> { /* ... */ }
impl<M: Matcher, W: WriteColor> Sink for NormalSink<'_, M, W> {
fn matched(&mut self, _searcher: &Searcher, mat: &SinkMatch<'_>) -> Result<bool, Self::Error> {
// ... ~500 LOC of behavioural complexity (path, line-number,
// separator, color) for the default case
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test] fn renders_match_line() { /* ... */ }
#[test] fn handles_binary_files() { /* ... */ }
// ~10 more tests, recognized by tests = "inline"
}
The other four modes — vimgrep.rs, multiline.rs, context.rs, color.rs — follow the same shape: a single Sink impl per file, plus that file's #[cfg(test)] mod tests block. The split aligns with concepts users already think in: when reading rg --vimgrep, the relevant code is now printer/src/standard/vimgrep.rs.
Each split file lands at or near the 700-LOC behavioural budget; normal.rs, at ~900 LOC, is the largest. The shared types and dispatch sit in mod.rs (~600 LOC). Eight test files become six split-source files with inline tests: a net wash on file count, a large win on file size and locality.
#Layer 2 — Adapter
The audit's reading was that ignore and cli already sit honestly in Layer 2: they're where filesystem walks, terminal detection, and child-process management happen. The one Atomic violation that survives is ignore/src/walk.rs (2,494 LOC), and the rebuild splits it:
// crates/ignore/src/walk/mod.rs — NEW, ~500 LOC
pub use self::builder::{WalkBuilder, WalkParallelBuilder};
pub use self::sequential::Walk;
pub use self::parallel::WalkParallel;
mod builder;
mod sequential;
mod parallel;
// Shared types used across all three modules:
pub struct DirEntry { /* ... */ }
pub enum WalkState { Continue, Skip, Quit }
// ... ~10 more types, all small structs/enums
// crates/ignore/src/walk/builder.rs — NEW, ~600 LOC
pub struct WalkBuilder { /* config */ }
impl WalkBuilder {
pub fn new(path: impl AsRef<Path>) -> WalkBuilder { /* ... */ }
pub fn add(&mut self, path: impl AsRef<Path>) -> &mut Self { /* ... */ }
pub fn standard_filters(&mut self, yes: bool) -> &mut Self { /* ... */ }
pub fn hidden(&mut self, yes: bool) -> &mut Self { /* ... */ }
// ... ~25 more builder methods, each ~10-20 LOC
pub fn build(&self) -> Walk { /* ... */ }
pub fn build_parallel(&self) -> WalkParallel { /* ... */ }
}
// crates/ignore/src/walk/sequential.rs — NEW, ~600 LOC
// The single-threaded directory walker.
pub struct Walk { /* ... */ }
impl Iterator for Walk { /* ~400 LOC of walker logic */ }
// crates/ignore/src/walk/parallel.rs — NEW, ~800 LOC
// The work-stealing parallel walker. Channel/queue plumbing,
// thread-pool management, worker loop. This file remains at the
// cap; it is genuinely doing one cohesive thing (parallel walk)
// with a tightly-coupled design that should not split further.
pub struct WalkParallel { /* ... */ }
impl WalkParallel {
pub fn run<'s, F>(self, mut mkf: F) where F: FnMut() -> Box<dyn FnMut(Result<DirEntry, Error>) -> WalkState + Send + 's> {
// ... thread spawn, channel-based work distribution, ...
}
}
Same pattern as the printer split: four files aligned with concepts users already distinguish (config vs sequential vs parallel), each under or at the 700-LOC behavioural budget, each with its own inline test block.
ignore/src/dir.rs (1,305 LOC) is named in the audit as a candidate too; it's a defensible split into "gitignore-state" + "directory-state" but the lift is smaller and could land later. This sketch lists it as a deferred TODO rather than mandatory for the rebuild's 7/7 score, because its behavioural complexity per LOC is mid-range (between the cleanly-splittable walk.rs and the genuinely-cohesive default_types.rs).
#Layer 3 — Entry (unchanged)
core/main.rs, core/search.rs, core/messages.rs, core/flags/parse.rs, core/flags/hiargs.rs — every file is the right shape:
// crates/core/main.rs — Layer 3 (unchanged)
fn main() -> ExitCode {
match Args::parse() {
Ok(args) => match args.command() {
Command::Search(s) => search::run(s, args),
Command::Files(f) => files::run(f, args),
Command::Types(t) => types::run(t, args),
// ...
},
Err(e) => { eprintln!("rg: {e}"); ExitCode::from(2) }
}
}
Parse → dispatch → call into Layer 2 (ignore::WalkBuilder) → call into Layer 1 (searcher::Searcher, printer::standard::Standard) → emit. No business logic inline, no untyped data, no boundary parsing outside the declared boundary modules.
The 7,779-line core/flags/defs.rs stays unchanged. It's the textbook declarative-exemption case — 150 flag definitions, one struct + one impl Flag per flag, each ~30-50 LOC, and they need to be in display order, in one place, for the help text + man-page generator that walks them sequentially. Splitting it would scatter what the build-time tooling (complete/ + doc/) expects to read as a single contiguous catalog.
#What concretely changes
| change | size | difficulty |
|---|---|---|
| 1. Write sama.profile.toml | ~18 lines | trivial, no code change |
| 2. Split printer/src/standard.rs (3,987 LOC) → submodule of 6 files | one file deleted, six files created, ~150 import-path adjustments in the printer crate itself | three days of careful, mode-by-mode extraction |
| 3. Split ignore/src/walk.rs (2,494 LOC) → submodule of 4 files | one file deleted, four files created, ~80 import-path adjustments in the ignore crate | two days |
| 4. Add a profile note for globset/src/serde_impl.rs (serde derives ≠ boundary parsing) | 1 line in profile | trivial |
| 5. Add a profile note for searcher/src/line_buffer.rs (byte parsing IS the algorithm, not a boundary) | 1 line in profile | trivial |
| 6. Mark defs.rs, default_types.rs, matcher/lib.rs, globset/glob.rs as declarative-exempt | 4 lines in profile | trivial |
| 7. (deferred) Split searcher/src/searcher/glue.rs (1,549 LOC) → ~3 files | one file deleted, three files created | two days, can land later |
| 8. (deferred) Split ignore/src/dir.rs (1,305 LOC) → ~2 files | one file deleted, two files created | one day, can land later |
| mandatory total | 2 file splits, 1 profile (~25 lines) | ~1 working week |
No new tests need to be written — tests = "inline" recognises the 38 source files that already have #[cfg(test)] mod tests blocks. No god-class to dismantle (the splits are clean per-output-mode and per-walker-concern; both are already organized by these concepts internally). No API breaks — every pub item retains its path through the new submodules via pub use re-exports.
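The re-export mechanism that keeps public paths stable can be shown in miniature. This is a toy sketch, not the ignore crate's code: the types are stripped down to two fields, and the submodules are written inline so the example fits in one file, where the real split would put each in its own file under walk/.

```rust
pub mod walk {
    // Private submodules: the split is an internal detail of `walk`.
    mod builder {
        use super::sequential::Walk;

        pub struct WalkBuilder {
            root: String,
            skip_hidden: bool,
        }

        impl WalkBuilder {
            pub fn new(root: &str) -> WalkBuilder {
                WalkBuilder { root: root.to_string(), skip_hidden: true }
            }

            /// Mirrors the builder style: `hidden(false)` stops skipping hidden entries.
            pub fn hidden(&mut self, yes: bool) -> &mut Self {
                self.skip_hidden = yes;
                self
            }

            pub fn build(&self) -> Walk {
                Walk::new(self.root.clone(), self.skip_hidden)
            }
        }
    }

    mod sequential {
        pub struct Walk {
            root: String,
            skip_hidden: bool,
        }

        impl Walk {
            // Visible inside `walk` (and so to `builder`), not outside it.
            pub(super) fn new(root: String, skip_hidden: bool) -> Walk {
                Walk { root, skip_hidden }
            }

            pub fn root(&self) -> &str {
                &self.root
            }

            pub fn skips_hidden(&self) -> bool {
                self.skip_hidden
            }
        }

        #[cfg(test)]
        mod tests {
            use super::*;

            #[test]
            fn remembers_root() {
                assert_eq!(Walk::new("src".to_string(), true).root(), "src");
            }
        }
    }

    // Re-exports: callers keep writing `walk::WalkBuilder` and `walk::Walk`.
    pub use self::builder::WalkBuilder;
    pub use self::sequential::Walk;
}
```

A caller that wrote walk::WalkBuilder::new("src").hidden(false).build() before the split compiles unchanged after it, because the submodule names never appear in any public path.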
For context, the WordPress plugin parallel-architecture rebuild required splitting a 1,554-line public god-class into eleven files, redesigning the settings option as a typed value, and writing 20+ test files from scratch. Months of work, real risk of breaking the PRO add-on, WooCommerce, Yoast, and AIOSEO integrations. dive to 7/7 was ten working days of test writing plus one package split. ripgrep to 7/7 is one focused week of file splitting plus a TOML file.
(One focused week is the serial-human number. For the parallel-fleet projection that divides this estimate by SAMA-mechanical work-package boundaries and lands on ~8 wall-clock hours, see the companion post.)
#Predicted §5 metrics for the rebuilt ripgrep
| metric | ripgrep today (estimated) | ripgrep rebuilt (predicted) | dive rebuilt | tdd.md (measured) |
|---|---|---|---|---|
| §4 checks passing | ~3 / 7 strict, ~5 / 7 under v2.1 | 7 / 7 ✓ | 7 / 7 | 7 / 7 ✓ |
| graphDepth | ~5 | ~5 (unchanged — no depth change) | ~5 | 7 |
| boundaryRatio | ~95% | ~100% (after profile notes + splits) | ~100% | 100% |
| workingSetFit (50–500 LOC) | ~60% | ~80% (the two splits move 6,500 LOC into 10 right-sized files) | ~80% | 80% |
| violationCounts (sum) | ~50 (19 Atomic + ~30 Modeled-tests under sibling-rule) | 0 | 0 | 0 |
workingSetFit is the metric that moves the most: ~60% → ~80%. The two splits, printer/standard.rs (3,987) and ignore/walk.rs (2,494), between them carry 6,481 LOC out of the over-cap bucket and into 10 right-sized files. The remaining over-cap files (defs.rs, default_types.rs, matcher/lib.rs, globset/glob.rs, core/flags/hiargs.rs, searcher/glue.rs, ignore/dir.rs) account for ~16,600 LOC, but the four declarative-exempt files alone are ~12,200 of that and don't count against fit any more.
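The per-file figures quoted in this post can be tallied mechanically. The sketch below sums them and shows one plausible way to compute workingSetFit; the assumption (not confirmed by the spec text quoted here) is that the metric is the share of files, rather than the share of lines, falling in the 50 to 500 LOC window. The constant and function names are hypothetical.

```rust
/// Files the rebuild splits, with the LOC figures quoted in the post.
pub const SPLIT_FILES: [(&str, usize); 2] = [
    ("printer/src/standard.rs", 3_987),
    ("ignore/src/walk.rs", 2_494),
];

/// Files marked declarative-exempt in the profile.
pub const EXEMPT_FILES: [(&str, usize); 4] = [
    ("core/flags/defs.rs", 7_779),
    ("ignore/src/default_types.rs", 1_400),
    ("matcher/src/lib.rs", 1_379),
    ("globset/src/glob.rs", 1_686),
];

/// Over-cap behavioural files the sketch leaves for later.
pub const OTHER_OVER_CAP: [(&str, usize); 3] = [
    ("core/flags/hiargs.rs", 1_480),
    ("searcher/src/searcher/glue.rs", 1_549),
    ("ignore/src/dir.rs", 1_305),
];

/// Sums the LOC column of a file list.
pub fn total_loc(files: &[(&str, usize)]) -> usize {
    files.iter().map(|(_, loc)| loc).sum()
}

/// ASSUMPTION: fit is the fraction of files whose LOC lies in 50..=500.
pub fn working_set_fit(file_locs: &[usize]) -> f64 {
    if file_locs.is_empty() {
        return 0.0;
    }
    let fitting = file_locs
        .iter()
        .filter(|&&loc| (50..=500).contains(&loc))
        .count();
    fitting as f64 / file_locs.len() as f64
}
```

With these inputs the split files total 6,481 LOC, the exempt catalogs 12,244, and the remaining behavioural over-cap files 4,334.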
violationCounts drops to zero — the Atomic flags clear (two real splits + declarative exemption for the catalogs), Modeled-tests clears (inline-tests mode), Sorted clears (directory dialect), boundary clears (the two profile notes + the genuine layering already in place).
graphDepth stays at ~5 because the crate graph itself doesn't change. ripgrep is shallower than tdd.md (7) and much shallower than a deeply-layered enterprise codebase would be, which is honest: a CLI tool with a Matcher trait at the bottom and a binary at the top is supposed to be ~5 layers deep, no more.
boundaryRatio reaches ~100% because the two borderline cases the audit flagged (serde_impl.rs, line_buffer.rs) get reclassified by the profile, not refactored. Reclassification is the correct response when the borderline-ness is semantic, not structural — the verifier and the spec agree that derives aren't boundary parsing and that the algorithmic hot-path of bytes-to-lines is algorithm, not boundary.
#What this sketch surfaces
Four observations:
1. The three v2.1 dialects, taken together, work. Each one individually is a small extension; together they cleanly absorb the way Rust differs from TypeScript/PHP without weakening the property each rule was protecting. Sibling-files-vs-inline is a surface choice about where the test attaches; the rule's intent (every behavioural unit has an attached test) is preserved either way. Filename-prefix-vs-crate-directory is a surface choice about which lex order expresses the layer hierarchy; the rule's intent (lex order matches dependency direction) is preserved either way. Behavioural-cap-vs-declarative-exemption is a surface choice about what kind of file the rule applies to; the rule's intent (working-set fit for the agent's context window) is preserved either way.
2. The work-cost of going from "exemplary Rust" to "v2-compliant Rust" is exceptionally small. One focused week to do the two splits, plus a TOML file. Compare: months for the WP plugin (god-class redesign), two weeks for dive (test writing). The cost scales inversely with how much architectural care the original author put into the codebase. BurntSushi did the layering work a decade ago; the rebuild is mostly just writing it down.
3. The Cargo ecosystem is a cheap source of v2 baseline data. Every Cargo workspace has the crate graph already enforced by the build tool. If the §5 metrics emitter were ported to Rust (a few hundred lines of cargo-metadata + AST walking), n could grow from 4 to 20+ in an afternoon by running the metrics across the popular Rust CLI tools (bat, fd, ripgrep, eza, dust, tokei, ...). That's the kind of cross-repo evidence §6 calls for before promoting any of these three v2.1 dialects to official.
4. The §5 metrics keep their separation from the §4 verdict cleanly. ripgrep today scores 3/7 strict, ~5/7 under dialects: two quite different verdicts depending on what version of the spec you read. The §5 metrics give the same underlying numbers in every case: graphDepth=5, boundaryRatio=95-100%, workingSetFit=60% (today) or ~80% (rebuilt). A reader who doesn't care about the §4 score can still measure the codebase against the same axes everyone else's codebase is measured against. That's the design point: compliance ≠ proof; the §5 deltas are what travel across spec versions and across repos.
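As a taste of what the Rust metrics emitter from observation 3 would involve, here is a minimal sketch of one metric. It assumes graphDepth means the number of crates on the longest dependency chain, which is a guess at the §5 definition, and it takes the edges as a plain slice where a real emitter would read them from Cargo metadata. The function names are hypothetical.

```rust
use std::collections::HashMap;

/// Longest dependency chain, counted in crates. Assumes the graph is
/// acyclic; a cycle would make the recursion below loop forever.
pub fn graph_depth<'a>(deps: &[(&'a str, &'a str)]) -> usize {
    // Adjacency list: crate -> the crates it depends on.
    let mut adj: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(from, to) in deps {
        adj.entry(from).or_default().push(to);
        adj.entry(to).or_default();
    }
    let mut memo: HashMap<&'a str, usize> = HashMap::new();
    adj.keys()
        .map(|&name| depth_of(name, &adj, &mut memo))
        .max()
        .unwrap_or(0)
}

/// Depth of one crate: 1 for a leaf, else 1 + its deepest dependency.
fn depth_of<'a>(
    node: &'a str,
    adj: &HashMap<&'a str, Vec<&'a str>>,
    memo: &mut HashMap<&'a str, usize>,
) -> usize {
    if let Some(&known) = memo.get(node) {
        return known;
    }
    let mut deepest_child = 0;
    if let Some(children) = adj.get(node) {
        for &child in children {
            deepest_child = deepest_child.max(depth_of(child, adj, memo));
        }
    }
    let depth = 1 + deepest_child;
    memo.insert(node, depth);
    depth
}
```

Given an illustrative edge list in which core depends on grep, grep on printer, printer on searcher, and searcher on matcher, the function reports a depth of 5.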
Companion posts:
- The same rebuild, run by a fleet of AI agents in parallel: projecting this post's ~8-working-day serial estimate into ~8 wall-clock hours under SAMA-mechanical merge gates
- Today's ripgrep audit: where the 3/7-strict, 5/7-with-dialects score comes from, and the three findings this rebuild assumes get adopted
- The dive rebuild: same exercise on a Go codebase, the directory-dialect's first appearance
- The dive prefix-scheme variant: what the dramatic file-rename refactor costs in Go (and would cost even more in Rust)
- The WordPress audit + rebuild · rebuilt: same exercise on a 0/7 starting point
- The §5 metrics post: why the metrics matter more than the surface verdict
- The v2 spec: the rules being rebuilt against