# `ripgrep`, rebuilt under SAMA v2 — a thought experiment
[Today's `ripgrep` audit](/blog/2026-05/sama-v2-rust-project-ripgrep) walked the seven §4 checks against BurntSushi's workspace and concluded that the strict score is ~3/7 but, under three proposed v2.1 dialects, rises to ~5-6/7. The audit's headline finding was that *ripgrep is so close to v2-compliant that the work isn't on ripgrep — it's on v2*. This post is the parallel-architecture sketch the audit promised: what does the codebase look like *as a whole* when every v2.1 dialect is admitted, every borderline case resolved, every gap closed?
Same scope, same features, same user-facing behaviour, same idiomatic Rust — just enough decisions made deliberately to score 7/7 under v2.1 with the proposed dialects.
The sketch is even smaller than the [`dive` rebuild](/blog/2026-05/sama-v2-go-project-dive-rebuilt). The starting point is closer to compliance, so the lift is days of focused work plus three dialect entries in the spec.
## The three v2.1 dialects this sketch assumes
*(Update: all three dialects have since been drafted formally into [/sama/v2 §6.A v2.1 dialects](/sama/v2#6a-v21-dialects-provisional) as opt-in profile extensions, each with operational definitions, the property each preserves, and the falsifiable cross-repo experiment that would invalidate it. The bullets below are the original informal proposal.)*
The audit surfaced three places where v2.0 doesn't fit Rust. Each becomes a falsifiable, optionally-applied profile extension in the spirit of the §6 evolution policy:
1. **[`layout = "directory"`](/sama/v2#61-directory-layout-dialect)** — Sorted-by-crate-directory rather than Sorted-by-filename-prefix. (Same dialect the [Go `dive` rebuild](/blog/2026-05/sama-v2-go-project-dive-rebuilt) proposes.) Cargo's workspace + `pub use` semantics + the absence of upward edges in the crate graph give the verifier everything it needs to enforce the same property the prefix-lex check enforces in TypeScript/PHP.
2. **[`tests = "inline"`](/sama/v2#62-inline-tests-dialect)** — Modeled-tests recognises `#[cfg(test)] mod tests { #[test] fn ... }` blocks inside source files instead of requiring a sibling `*_test.rs` file. Rust's convention is fundamentally inline; the v2.0 sibling-file rule was written assuming Jest/PHPUnit-style adjacent test files. The property the rule is *trying* to protect — "every behavioural source unit has a test attached" — is preserved; only the surface syntax of attachment changes.
3. **[`atomic_exemption = "declarative"`](/sama/v2#63-declarative-exemption-dialect)** — the Atomic-700 LOC cap applies to *behavioural* files; files whose body is overwhelmingly declarative (one `pub struct X` + one `impl Trait for X` per item, repeated for many items, with near-zero per-item cyclomatic complexity) are exempt. The verifier detects this heuristically: a file is "declarative" if it crosses the cap *and* its cyclomatic complexity per LOC drops below 0.05, *and* it consists predominantly of `impl X for Y` / `const FOO: T = ...` / `pub struct ...` items. The 7,779-line `defs.rs` flag catalog is the textbook case.
Each of these is the kind of extension §6 admits provisionally: the property the rule protects stays the same; only the surface that expresses it changes. If subsequent cross-repo §5 metrics show the extension picks up the same architectural drift the original rule did, §6 promotes it to official. If not, it's withdrawn. Today, this rebuild assumes all three.
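The detection rule in item 3 can be sketched as a predicate over per-file statistics. This is a minimal sketch, not the verifier's actual implementation: the `FileStats` struct, its field names, the `AtomicVerdict` enum, and the 0.8 reading of "predominantly" are all assumptions made for illustration. Only the 700-LOC cap and the 0.05 complexity-per-LOC bound come from the text above.

```rust
/// Per-file measurements a verifier would need (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct FileStats {
    /// Total lines of code in the file.
    pub loc: u32,
    /// Summed cyclomatic complexity of every function in the file.
    pub cyclomatic_complexity: u32,
    /// Lines inside `impl X for Y`, `const FOO: T = ...`, or `pub struct` items.
    pub declarative_loc: u32,
}

/// The Atomic cap named in the dialect description.
pub const ATOMIC_LOC_CAP: u32 = 700;
/// Complexity-per-LOC bound named in the dialect description.
pub const MAX_CC_PER_LOC: f64 = 0.05;
/// ASSUMPTION: "predominantly" is read here as at least 80% of lines.
pub const MIN_DECLARATIVE_SHARE: f64 = 0.8;

#[derive(Debug, PartialEq, Eq)]
pub enum AtomicVerdict {
    /// The file does not cross the cap, so no exemption is needed.
    UnderCap,
    /// The file crosses the cap but has the declarative shape.
    ExemptDeclarative,
    /// The file crosses the cap and is behavioural: a real violation.
    OverCap,
}

/// Applies the three conditions in order: crosses the cap, low complexity
/// density, and a predominantly declarative body.
pub fn atomic_verdict(stats: FileStats) -> AtomicVerdict {
    if stats.loc <= ATOMIC_LOC_CAP {
        return AtomicVerdict::UnderCap;
    }
    let loc = f64::from(stats.loc);
    let cc_per_loc = f64::from(stats.cyclomatic_complexity) / loc;
    let declarative_share = f64::from(stats.declarative_loc) / loc;
    if cc_per_loc < MAX_CC_PER_LOC && declarative_share >= MIN_DECLARATIVE_SHARE {
        AtomicVerdict::ExemptDeclarative
    } else {
        AtomicVerdict::OverCap
    }
}
```

Fed the figures quoted for `defs.rs` (7,779 LOC, complexity per LOC around 0.01) together with an assumed declarative share above the threshold, the predicate returns `ExemptDeclarative`. A file of similar size with dense branching falls through to `OverCap`.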
## What stays exactly the same
The contract that does not move:
- `rg` still searches recursively, honors `.gitignore`, prints colors, supports JSON output, links file paths via OSC-8, runs `--vimgrep` mode, all of it. The 200+ CLI flags are unchanged in name, default, and effect.
- The crate dependency graph is unchanged — `matcher` at the bottom, `core` at the top, `regex/pcre2/globset/searcher/printer/ignore/cli` between. Every public API of every crate is unchanged; downstream consumers (`fd`, `helix`, `lsd`, the dozen-plus tools that depend on `ignore` or `globset` directly) don't break.
- The `Matcher` trait, the `Sink` trait, the `Searcher` configuration surface — all unchanged. They were already the right shape.
- Build, install, and packaging stay byte-identical from the user's perspective. `cargo install ripgrep` produces the same binary.
What changes is *the file layout inside two crates* (printer, ignore) plus the `sama.profile.toml` declaration. The behavior is invariant.
## The profile
```toml
sama_version = "2.1"
profile = "ripgrep"
layout = "directory" # Sorted via crate-graph direction, not filename prefix
tests = "inline" # Modeled-tests recognizes #[cfg(test)] mod tests blocks
atomic_exemption = "declarative" # files of dominantly-declarative content exempt from 700-LOC cap
# Layer 0 — Pure. The trait abstraction every other layer depends on.
# No I/O, no allocator-coupled state, no thread-local globals.
[layers.0]
crates = ["matcher"]
# Layer 1 — Core. Pure algorithms over Matcher inputs. No syscalls.
[layers.1]
sublayers = [
    # Three matcher implementations, each a "Pure Core engine":
    # text in, match positions out. Configurable but allocation-only.
    { name = "engine", crates = ["regex", "pcre2", "globset"] },
    # Algorithmic consumers of any Matcher: byte-stream search, formatting.
    { name = "algorithm", crates = ["searcher", "printer"] },
]
# Layer 2 — Adapter. Where the program touches the outside world.
# Filesystem walks, terminal detection, child processes, env vars.
[layers.2]
crates = ["ignore", "cli"]
# Layer 3 — Entry. The binary. Composes 2 + 1 + 0; no business logic.
[layers.3]
crates = ["core", "grep"]
```
Ten crates, under twenty lines of declaration, every file in the workspace mapped without ambiguity.
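To make the `layout = "directory"` check concrete, here is a minimal sketch of how a verifier might test the crate graph against this profile. It is an illustration under stated assumptions, not the SAMA verifier: the `Rank` type and both function names are hypothetical, sublayers are assumed to be ordered as listed (so `engine` sits below `algorithm`), and edges between crates of equal rank are assumed to be allowed. A real check would read its edges from Cargo metadata rather than a hand-written list.

```rust
use std::collections::HashMap;

/// Position of a crate in the profile: (layer, sublayer index within it).
/// Tuples compare lexicographically, so (1, 0) sorts below (1, 1).
pub type Rank = (u8, u8);

/// Ranks transcribed from the profile above.
pub fn profile_ranks() -> HashMap<&'static str, Rank> {
    HashMap::from([
        ("matcher", (0, 0)),
        ("regex", (1, 0)),
        ("pcre2", (1, 0)),
        ("globset", (1, 0)),
        ("searcher", (1, 1)),
        ("printer", (1, 1)),
        ("ignore", (2, 0)),
        ("cli", (2, 0)),
        ("core", (3, 0)),
        ("grep", (3, 0)),
    ])
}

/// Returns every edge (dependent, dependency) that points upward, meaning
/// the dependency ranks strictly above the dependent. An edge touching a
/// crate that is missing from the profile is reported as well, because the
/// profile is supposed to map every crate.
pub fn upward_edges<'a>(
    ranks: &HashMap<&str, Rank>,
    edges: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    edges
        .iter()
        .copied()
        .filter(|&(from, to)| match (ranks.get(from), ranks.get(to)) {
            (Some(from_rank), Some(to_rank)) => to_rank > from_rank,
            _ => true,
        })
        .collect()
}
```

Calling `upward_edges(&profile_ranks(), &[("regex", "matcher"), ("core", "ignore")])` returns an empty vector, while a deliberately wrong edge such as `("matcher", "ignore")` comes back as a violation. Whether same-rank edges should pass is a policy choice the sketch makes for simplicity.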
## The directory tree
The unchanged-from-today crates are listed without comment. The two crates with internal file changes get their before/after expanded:
```
ripgrep/
├── Cargo.toml # workspace manifest (unchanged)
├── sama.profile.toml # NEW — 18 lines, see above
│
├── crates/
│ │── ─── Layer 0 — Pure ───────────────────────────────────────────
│ ├── matcher/ # unchanged — Matcher trait
│ │ └── src/
│ │ ├── lib.rs # 1,379 LOC — exempt under atomic_exemption
│ │ │ # (predominantly trait + default impls,
│ │ │ # CC/LOC < 0.05 — declarative shape)
│ │ └── interpolate.rs # unchanged
│ │
│ │── ─── Layer 1, sublayer "engine" — Matcher implementations ────
│ ├── regex/ # unchanged
│ ├── pcre2/ # unchanged
│ ├── globset/ # unchanged file list
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── glob.rs # 1,686 LOC — exempt (predominantly
│ │ │ # one-Glob-per-construct definitions,
│ │ │ # table-of-cases shape)
│ │ ├── serde_impl.rs # serde derives now profile-allowed
│ │ │ # (boundary parsing is for user-input
│ │ │ # bytes, not type-driven derives)
│ │ ├── fnv.rs
│ │ └── pathutil.rs
│ │
│ │── ─── Layer 1, sublayer "algorithm" — search + format ─────────
│ ├── searcher/ # unchanged file list
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── line_buffer.rs # raw-byte slicing remains here —
│ │ │ # profile note: this is the core
│ │ │ # hot-path of the search algorithm,
│ │ │ # not boundary parsing of external input
│ │ ├── lines.rs
│ │ ├── macros.rs
│ │ ├── sink.rs
│ │ ├── testutil.rs
│ │ └── searcher/
│ │ ├── mod.rs
│ │ ├── core.rs
│ │ ├── glue.rs # 1,549 LOC — TODO: ~3 files
│ │ │ # (today the lift's deferred — see
│ │ │ # §What concretely changes below)
│ │ └── mmap.rs
│ │
│ ├── printer/ # ↓ ONE internal split inside this crate
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── color.rs
│ │ ├── counter.rs
│ │ ├── hyperlink/
│ │ │ ├── mod.rs
│ │ │ └── aliases.rs
│ │ ├── json.rs # ~1,000 LOC — under cap, unchanged
│ │ ├── jsont.rs
│ │ ├── path.rs
│ │ ├── macros.rs
│ │ ├── util.rs
│ │ ├── stats.rs
│ │ ├── summary.rs
│ │ │
│ │ │── standard/ # ← NEW submodule (split from standard.rs)
│ │ │ ├── mod.rs # printer entry, shared types,
│ │ │ │ # the StandardBuilder/Standard structs
│ │ │ │ # (~600 LOC)
│ │ │ ├── normal.rs # default line-by-line mode (~900 LOC)
│ │ │ ├── vimgrep.rs # --vimgrep mode (~700 LOC)
│ │ │ ├── multiline.rs # multi-line match formatting (~600 LOC)
│ │ │ ├── context.rs # before/after context lines (~600 LOC)
│ │ │ └── color.rs # color-only rendering hooks (~600 LOC)
│ │ │ # ─────── total: 4,000 LOC across 6 files,
│ │ │ # all under the 700-LOC cap on their own
│ │ │ # behavioural budget. The split aligns
│ │ │ # with output modes the CLI flags already
│ │ │ # name explicitly.
│ │ └── #[cfg(test)] mod tests inside each split file
│ │
│ │── ─── Layer 2 — Adapter ───────────────────────────────────────
│ ├── ignore/ # ↓ ONE internal split inside this crate
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── default_types.rs # 1,400 LOC — exempt under atomic_exemption
│ │ │ # (a catalog: one "file-type definition"
│ │ │ # per language/format/tool — ~200 entries,
│ │ │ # each ~7 LOC, near-zero CC per entry)
│ │ ├── dir.rs # 1,305 LOC — TODO: ~2 files (today deferred)
│ │ ├── gitignore.rs
│ │ ├── overrides.rs
│ │ ├── pathutil.rs
│ │ ├── types.rs
│ │ │
│ │ │── walk/ # ← NEW submodule (split from walk.rs)
│ │ │ ├── mod.rs # public Walk{,Parallel,Builder,State}
│ │ │ │ # re-exports + the type definitions
│ │ │ │ # (~500 LOC)
│ │ │ ├── builder.rs # WalkBuilder + WalkParallelBuilder
│ │ │ │ # configuration (~600 LOC)
│ │ │ ├── sequential.rs # the single-threaded walker (~600 LOC)
│ │ │ └── parallel.rs # the work-stealing parallel walker,
│ │ │ # channel management, thread pool
│ │ │ # (~800 LOC — at the cap, but this IS
│ │ │ # genuine behavioural complexity that
│ │ │ # should not split further)
│ │ │ # ─────── total: 2,500 LOC across 4 files
│ │ └── #[cfg(test)] mod tests inside each split file
│ │
│ ├── cli/ # unchanged
│ │ └── src/ # all files under cap, no internal changes
│ │
│ │── ─── Layer 3 — Entry ─────────────────────────────────────────
│ ├── core/ # unchanged file list
│ │ ├── main.rs
│ │ ├── search.rs
│ │ ├── haystack.rs
│ │ ├── logger.rs
│ │ ├── messages.rs
│ │ └── flags/
│ │ ├── mod.rs
│ │ ├── parse.rs
│ │ ├── config.rs
│ │ ├── lowargs.rs
│ │ ├── hiargs.rs # 1,480 LOC — at the cap; behavioural;
│ │ │ # candidate for v2.2 to revisit
│ │ ├── defs.rs # 7,779 LOC — exempt under atomic_exemption
│ │ │ # (the textbook case: ~150 Flag impls,
│ │ │ # ~30-50 LOC each, CC/LOC ≈ 0.01)
│ │ ├── complete/ # bash/zsh/fish/PowerShell completion templates
│ │ └── doc/ # man-page + README generation
│ │
│ └── grep/ # unchanged (meta-crate)
│ └── src/lib.rs # 90 LOC, re-exports — unchanged
│
└── tests/ # 15 integration tests, unchanged
```
Two real structural changes hide behind that tree: `printer/src/standard.rs` (3,987 LOC) splits into a six-file submodule per output mode, and `ignore/src/walk.rs` (2,494 LOC) splits into a four-file submodule per walker concern (config, sequential, parallel, types). Everything else is profile declaration plus a few exemption flags on declarative-catalog files.
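The reason a split like this can leave the public API untouched is that a submodule's `mod.rs` can re-export whatever the old single file exposed. The sketch below shows the mechanism in miniature, with inline `mod` blocks standing in for the separate files and an inline `#[cfg(test)] mod tests` block in each, as the `tests = "inline"` dialect expects. The function names and their output formats are hypothetical stand-ins chosen for the example, not the printer crate's real items.

```rust
// Stand-in for printer/src/standard/ after the split. Each inline module
// plays the role of one file; all item names here are hypothetical.
pub mod standard {
    // Would live in standard/normal.rs.
    mod normal {
        /// Formats one matching line as `path:line:text`.
        pub fn format_line(path: &str, line: u64, text: &str) -> String {
            format!("{path}:{line}:{text}")
        }

        #[cfg(test)]
        mod tests {
            use super::*;

            #[test]
            fn formats_path_line_text() {
                assert_eq!(format_line("a.rs", 3, "fn main()"), "a.rs:3:fn main()");
            }
        }
    }

    // Would live in standard/vimgrep.rs.
    mod vimgrep {
        /// Formats one match as `path:line:column:text`.
        pub fn format_line(path: &str, line: u64, column: u64, text: &str) -> String {
            format!("{path}:{line}:{column}:{text}")
        }

        #[cfg(test)]
        mod tests {
            use super::*;

            #[test]
            fn formats_path_line_column_text() {
                assert_eq!(format_line("a.rs", 3, 4, "main"), "a.rs:3:4:main");
            }
        }
    }

    // Would live in standard/mod.rs: the private modules stay hidden and
    // callers keep addressing everything through `standard::`.
    pub use normal::format_line as format_normal;
    pub use vimgrep::format_line as format_vimgrep;
}
```

A caller writes `standard::format_normal("src/lib.rs", 12, "let x = 1;")` and never learns which file the function lives in, which is the property that lets the file layout change while downstream paths stay stable.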
## Layer 0 — Pure (unchanged)
`matcher::Matcher` is the trait abstraction every other layer depends on. The trait plus its default-impl helpers total 1,379 LOC, but the file's content is overwhelmingly trait default implementations, the kind of body that `atomic_exemption = "declarative"` is designed for:
```rust
// crates/matcher/src/lib.rs — Layer 0, exempt-declarative
pub trait Matcher {
type Captures: Captures;
type Error: fmt::Display;
fn find_at(&self, haystack: &[u8], at: usize)
-> Result