
Pointing SAMA v2 at `ripgrep`: BurntSushi's exemplar surfaces three findings about the spec

The WordPress plugin audit scored 0/7 because the plugin was written under no architectural discipline at all. The Go dive audit scored ~5/7 because Go's standard layout enforces a lot of v2's rules for free. Both audits taught us something about v2 by what they failed against.

Today's question: what does v2 see when pointed at code that is, by reputation, exemplary?

For the test, BurntSushi/ripgrep — 64k stars, ten years of refinement, ten crates, 45,927 lines of Rust. Andrew Gallant's code is widely studied and the workspace's architectural choices have been imitated across the Rust ecosystem. If v2 has problems, ripgrep is where they'll surface — because the obvious failures (god-classes, scattered I/O, untyped data) just aren't there.

Three findings did surface. All three are about v2, not about ripgrep.

What's in the box

A clean Cargo workspace:

ripgrep/
├── Cargo.toml             # workspace manifest
├── crates/
│   ├── core/        13,128 LOC, 20 files   # the binary itself
│   ├── printer/      9,320 LOC, 13 files   # output formatting (color, JSON, hyperlinks)
│   ├── ignore/       6,639 LOC,  9 files   # gitignore + directory walking
│   ├── searcher/     6,511 LOC, 11 files   # the search loop, line buffering
│   ├── globset/      3,229 LOC,  6 files   # high-performance glob matching
│   ├── regex/        2,782 LOC,  9 files   # Rust regex adapter
│   ├── cli/          1,866 LOC,  8 files   # terminal detection, escapes, stdout helpers
│   ├── matcher/      1,710 LOC,  2 files   # the central `Matcher` trait
│   ├── pcre2/          569 LOC,  3 files   # alternative PCRE2 backend
│   └── grep/            90 LOC,  2 files   # meta-crate (re-exports)
└── tests/                                  # integration tests

Tests: 38 source files contain #[test] blocks inline (the Rust convention — tests live in the same module they test, behind a #[cfg(test)] gate). There are also 15 separate integration test files under tests/. By Rust standards this is well-tested.

The crate dependency graph is strictly acyclic and roughly:

matcher  (the trait)
   ↑
   ├── regex,  pcre2              (Matcher implementations; globset is a leaf with no internal deps, used by ignore and cli)
   ↑
   ├── searcher                    (uses any Matcher)
   ↑
   ├── printer                     (formats searcher results)
   ↑
   └── core                        (the binary; uses ignore directly, and cli + printer + searcher via the grep meta-crate)

That graph reads exactly like a SAMA v2 layer chart: matcher is the Pure abstraction at the bottom, core is the Entry at the top, everything between is layered by dependency direction. BurntSushi did the layering work; he just didn't call it that.

§4 conformance — what the verifier would report

Walking the seven checks against this workspace:

#1 Sorted — would fail under v2.0

Same finding as the Go dive audit: Rust organizes by crate + module directory, not by filename prefix. Files under crates/searcher/src/ have names like line_buffer.rs, searcher/glue.rs, searcher/mod.rs. The names are descriptive, not layer-marking. The v2.0 lex-sort-the-filename-prefixes rule does not translate.

Under the hypothetical v2.1 directory-based dialect (the dive rebuild post proposes it formally), Sorted becomes "crate directories declared in layer order in the profile; no import edge violates that order." ripgrep would pass cleanly because the crate dependency graph already runs that direction.
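To make the directory-dialect Sorted rule concrete, here is a minimal Rust sketch. It is not the verifier's implementation. It assumes a profile that assigns each crate directory a layer number, as in the profile under #2 below, plus a list of crate-to-crate import edges. `sorted_violations` is a hypothetical name. Allowing same-layer edges and flagging unmapped crates are assumptions of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical Sorted check for the directory dialect.
/// `layer_of` maps a crate directory to its declared layer (0 = bottom).
/// `edges` lists (importer, imported) pairs between workspace crates.
/// Returns every edge that points from a lower layer up to a higher one.
/// Same-layer edges are allowed (an assumption of this sketch), and an
/// edge touching an unmapped crate is reported as a violation.
pub fn sorted_violations(
    layer_of: &HashMap<&str, u32>,
    edges: &[(&str, &str)],
) -> Vec<(String, String)> {
    edges
        .iter()
        .filter(|&&(from, to)| match (layer_of.get(from), layer_of.get(to)) {
            (Some(lf), Some(lt)) => lf < lt,
            _ => true, // unmapped crate: cannot be ordered, so flag it
        })
        .map(|&(from, to)| (from.to_string(), to.to_string()))
        .collect()
}
```

Fed the ripgrep layer mapping and the 15 internal edges from the graphDepth hand-trace (translated to directory names), it returns an empty list. Adding a hypothetical `searcher → core` edge produces exactly one violation.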

#2 Architecture — would pass under the directory dialect

A natural profile maps every file unambiguously:

sama_version = "2.0"
profile = "ripgrep"
layout = "directory"

[layers.0]   # the trait abstraction every other crate depends on
crates = ["matcher"]

[layers.1]   # pure algorithms, no I/O
sublayers = [
  { name = "engine",    crates = ["regex", "pcre2", "globset"] },
  { name = "algorithm", crates = ["searcher", "printer"] },
]

[layers.2]   # the filesystem-touching adapter
crates = ["ignore", "cli"]

[layers.3]   # the binary entry
crates = ["core", "grep"]

This passes Architecture. Every file is mapped, no ambiguity.
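Under the directory dialect, "every file is mapped" can be checked mechanically. The sketch below assumes that each file under `crates/<name>/` inherits the layer declared for `<name>`. `unmapped_files` is a hypothetical helper. Any path outside `crates/` is reported as unmapped, so a real profile would need its own entry for the root `tests/` directory.

```rust
use std::collections::HashMap;

/// Hypothetical Architecture check for the directory dialect: a file under
/// `crates/<name>/...` inherits the layer declared for `<name>`.
/// Returns the files that map to no layer; an empty Vec means every file
/// is mapped. Paths outside `crates/` are reported as unmapped.
pub fn unmapped_files<'a>(layer_of: &HashMap<&str, u32>, files: &[&'a str]) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|path| {
            let mut parts = path.split('/');
            // First component must be "crates"; the second names the crate.
            match (parts.next(), parts.next()) {
                (Some("crates"), Some(name)) => !layer_of.contains_key(name),
                _ => true,
            }
        })
        .collect()
}
```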

#3 Modeled (tests) — the second spec-evolution finding

The verifier looks for sibling test files (foo.ts + foo.test.ts, or for PHP foo.php + foo.test.php). Rust's convention is fundamentally different: tests live in the same file as the code they test, gated by #[cfg(test)] and #[test]. ripgrep has 38 source files that contain inline #[test] blocks. By Rust standards they're tested. By v2.0 sibling-file standards they're not.

This is a real spec gap, parallel to Sorted. v2.1 needs an inline-tests mode where the verifier checks for #[test] annotations inside the source file rather than for a sibling test file. Under that mode, ripgrep's 38 inline-tested files would count as tested; the ones without #[test] blocks would still be flagged.

Without that mode, ripgrep "fails" Modeled-tests not because it's untested but because v2.0 doesn't recognize how Rust tests.

#4 Modeled (boundary) — would mostly pass

Boundary patterns (std::fs::read*, std::env::var, serde_json::from_str, std::process::*, raw byte parsing from stdin):

crates/ignore/src/walk.rs + crates/ignore/src/dir.rs + crates/ignore/src/gitignore.rs — filesystem reads, gitignore parsing. Layer 2 (ignore is mapped to L2). ✓
crates/core/main.rs + crates/core/messages.rs + crates/core/flags/config.rs — env::var, stderr writes, config file reads. Layer 3 (core is L3). ✓ (Layer 3 may use Layer 2's facilities, but should it parse external input directly?)
crates/globset/src/serde_impl.rs — serde_json deserialization for glob patterns. Layer 1 (globset is L1 engine). ✗ — this is a boundary call in Layer 1.
crates/searcher/src/line_buffer.rs — raw byte slicing and decoding. Layer 1 algorithm. Borderline: it's parsing bytes-to-lines, which is a kind of boundary work, but it's also the core search algorithm's hot loop.

Two borderline cases (globset/serde_impl.rs and searcher/line_buffer.rs), neither egregious. Under a strict reading, both fail. Under a profile that explicitly declares "serde derives are not boundary parsing in the §4.4 sense", the globset case passes and line_buffer.rs remains the judgment call. Either way, boundaryRatio lands closer to 95% than to 100%.
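The per-file boundary check can be sketched as a substring scan over the pattern list above. The sketch assumes the layer numbers from the profile and a hypothetical rule that boundary calls are allowed only at layer 2 and above, which matches the ✓/✗ marks in the list. It flags per-file leaks. It is not a definition of boundaryRatio, and `boundary_leaks` is a hypothetical name.

```rust
/// Substrings taken from the boundary-pattern list above. Plain substring
/// matching misses calls made through a `use` alias (e.g. `fs::read` after
/// `use std::fs;`) and cannot detect "raw byte parsing" at all.
const BOUNDARY_PATTERNS: [&str; 4] = [
    "std::fs::read",
    "std::env::var",
    "serde_json::from_str",
    "std::process::",
];

/// Hypothetical rule: boundary calls are allowed only at or above
/// `min_layer` (2 in the profile above). Returns the patterns found in a
/// file that sits below that layer; an empty Vec means the file passes.
pub fn boundary_leaks(layer: u32, min_layer: u32, source: &str) -> Vec<&'static str> {
    if layer >= min_layer {
        return Vec::new();
    }
    BOUNDARY_PATTERNS
        .iter()
        .copied()
        .filter(|p| source.contains(*p))
        .collect()
}
```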

#5 Atomic — fails hard, and surfaces the third finding

The 700-LOC cap is violated by 19 files. The top:

| file | LOC | what's in it |
|---|---|---|
| `crates/core/flags/defs.rs` | 7,779 | the catalog of every CLI flag, one `impl Flag` per flag struct |
| `crates/printer/src/standard.rs` | 3,987 | the default output formatter (color, line-by-line, multi-line, --vimgrep mode, etc.) |
| `crates/ignore/src/walk.rs` | 2,494 | the parallel filesystem walker |
| `crates/globset/src/glob.rs` | 1,686 | glob → regex translation |
| `crates/searcher/src/searcher/glue.rs` | 1,549 | the search loop assembly |
| `crates/core/flags/hiargs.rs` | 1,480 | high-level argument struct |
| `crates/matcher/src/lib.rs` | 1,379 | the `Matcher` trait definition + helpers |
| `crates/ignore/src/dir.rs` | 1,305 | gitignore directory state |
| ... | ... | ten more between 700 and 1,200 |

That's a lot. And ripgrep is by reputation careful code. Are these god-classes that should be split?

Looking at the largest:

crates/core/flags/defs.rs (7,779 LOC) is — quoting its own docstring — "Defines all of the flags available in ripgrep. Each flag corresponds to a unit struct with a corresponding implementation of Flag." It's a long-form catalog: ~150 flag definitions, each ~30-50 lines, each a small struct + a small impl. The file has near-zero cyclomatic complexity per line. It's a data table written in Rust syntax.

Splitting it into 19 files of 400 lines each would scatter the flag definitions across many files when the natural reading order is "all flags in one place, in display order, with the deprecated ones at the end." The current single-file layout is the right shape for the content. Atomic-700 was designed to catch behavioral god-classes, not declaration catalogs.

This is a real v2 spec-evolution finding. Atomic's 700-LOC cap should have an exemption — or a separate, higher cap — for files whose content is overwhelmingly declarative (data structures, const tables, enum variants, trait implementations with trivial bodies). The verifier could detect this heuristically: a file is "declarative" if its cyclomatic complexity per LOC drops below some threshold, or if its body is mostly impl X for Y / const FOO: T = ... / pub struct ....
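Here is one way the "declarative" heuristic could look, as a sketch. It counts decision-point keywords (`if`, `match`, `for`, `while`, `loop`) per non-blank, non-comment line, as a crude stand-in for cyclomatic complexity per LOC. The function names and any threshold are assumptions, not calibrated values. The sketch skips `impl ... for ...` headers because that `for` is not a loop. That matters for exactly the `impl Flag for ...` pattern defs.rs is built from.

```rust
/// Crude stand-in for cyclomatic complexity per LOC: decision-point keywords
/// per non-blank, non-comment line. Short-circuit operators and `?` are
/// ignored, and `impl ... for ...` headers add no decisions because that
/// `for` is not a loop.
pub fn decision_density(source: &str) -> f64 {
    let mut lines = 0usize;
    let mut decisions = 0usize;
    for line in source.lines().map(str::trim) {
        if line.is_empty() || line.starts_with("//") {
            continue;
        }
        lines += 1;
        if line.starts_with("impl") {
            continue;
        }
        // Split on anything that cannot be part of an identifier.
        decisions += line
            .split(|c: char| !(c.is_alphanumeric() || c == '_'))
            .filter(|word| matches!(*word, "if" | "match" | "for" | "while" | "loop"))
            .count();
    }
    if lines == 0 {
        0.0
    } else {
        decisions as f64 / lines as f64
    }
}

/// Hypothetical declarative test: density below `threshold`, a value a real
/// profile would have to calibrate across repos.
pub fn is_declarative(source: &str, threshold: f64) -> bool {
    decision_density(source) < threshold
}
```

Consider a hypothetical threshold of 0.05. A flag-catalog fragment (a unit struct plus its `impl Flag for ...`) scores 0.0 and is classified declarative. A small loop containing one `for`, one `if` and one `match` over eleven lines scores about 0.27 and is not.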

crates/printer/src/standard.rs (3,987 LOC) is the opposite case: real behavioral complexity. That one would benefit from splitting per output mode. Same for crates/ignore/src/walk.rs (2,494 LOC) — the parallel filesystem walker is genuinely doing a lot. Those two are honest Atomic failures.

So of the 19 over-cap files, roughly: two or three are catalog files that the spec should learn to recognize, and the rest are real-but-defensible behavioral complexity. The current binary verdict ("19 violations, fail") doesn't capture that nuance. v2.1 needs Atomic-with-categories.

#6 The Law (§1.2) — would pass

Cargo enforces the absence of cyclic crate dependencies. The workspace literally won't build if searcher is made to depend on core, because core already depends on searcher through the grep meta-crate. The proposed layer mapping above respects every direction the build already enforces. PASS.
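Cargo performs this acyclicity check itself when it resolves the workspace. To replay it on the hand-traced edge list in the graphDepth section below, here is Kahn's algorithm as a small Rust sketch. `is_acyclic` is a hypothetical name, and this is not Cargo's code.

```rust
use std::collections::HashMap;

/// Kahn's algorithm over (dependent, dependency) edges: repeatedly remove
/// nodes that nothing still points at. If every node gets removed, the
/// graph has no cycle.
pub fn is_acyclic(edges: &[(&str, &str)]) -> bool {
    let mut indegree: HashMap<&str, usize> = HashMap::new();
    let mut out: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        indegree.entry(from).or_insert(0);
        *indegree.entry(to).or_insert(0) += 1;
        out.entry(from).or_default().push(to);
    }
    // Start from nodes with no incoming edges (here: the root crate).
    let mut ready: Vec<&str> = indegree
        .iter()
        .filter(|&(_, &d)| d == 0)
        .map(|(&n, _)| n)
        .collect();
    let mut removed = 0;
    while let Some(node) = ready.pop() {
        removed += 1;
        for &next in out.get(node).map(Vec::as_slice).unwrap_or(&[]) {
            let d = indegree.get_mut(next).expect("every node has an entry");
            *d -= 1;
            if *d == 0 {
                ready.push(next);
            }
        }
    }
    removed == indegree.len()
}
```

On the 15 hand-traced edges it returns true. Adding a hypothetical `grep-searcher → ripgrep` edge makes it return false, which is the situation in which Cargo refuses to build.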

#7 Consistency — would pass

Consistency derives from the Law check on the same edge set, so it passes for the same reason.

Tally: 3 of 7 strict-pass (Architecture, Law, Consistency). With the proposed v2.1 dialects (directory mode, inline-tests mode, declarative-Atomic exemption), the score rises to 5-6 of 7. Without them, ripgrep "fails" v2 mostly because v2 doesn't yet understand Rust.

(Update: all three dialects have since been drafted into /sama/v2 §6.A as v2.1-draft extensions, with the same five-part operational structure — what they relax, what property they preserve, and the falsifiable cross-repo experiment that would invalidate each.)

§5 metrics — measured workingSetFit, estimated the rest

| metric | ripgrep | dive (Go) | tdd.md (TS, measured) | WP plugin (PHP) |
|---|---|---|---|---|
| §4 checks passing | ~3/7 strict, ~5/7 under v2.1 dialects (estimated) | ~5/7 (estimated) | 7/7 ✓ | 0/7 |
| graphDepth | 5 (measured, ripgrep@4519153e; originally estimated ~5, confirmed exactly) | 12 (measured, dive@d6c69194; originally estimated ~5) | 7 | ~3 |
| boundaryRatio | ~95% (estimated) | ~85% (estimated) | 100% | <10% |
| workingSetFit (50–500 LOC) | 54.00% (measured, ripgrep@4519153e; originally estimated ~60%) | 52.17% (measured, dive@d6c69194; originally estimated ~80%) | 80% | ~47% |
| violationCounts (sum) | ~50 (estimated; Atomic + Modeled-tests under sibling-rule) | ~30 (estimated) | 0 | 17+ |

ripgrep's workingSetFit measures 54.00% (from the polyglot §5 emitter at scripts/measure-working-set.ts, inclusive bounds [50, 500] LOC). The distribution: 100 .rs files total, 16 under 50 LOC, 54 in band, 30 over 500 LOC. That is appreciably more than the "19 big files" I eyeballed in the original audit. The two counts use different thresholds, though: 500 LOC is workingSetFit's upper bound, while the 19 were counted against Atomic's 700-LOC cap. The over-band list ranges from the textbook declarative-exempt catalog (crates/core/flags/defs.rs at 7,780 LOC) down to genuinely borderline files at 500–800 LOC, such as crates/pcre2/src/matcher.rs (506) and crates/cli/src/decompress.rs (533).
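The workingSetFit arithmetic is sketched here in Rust rather than in the TypeScript emitter: given per-file LOC counts, it is the percentage of files inside the inclusive [50, 500] band. This sketch leaves line counting (blank lines, comments) to the caller, which is an assumption. The emitter's own counting rule is what produces the 54.00% figure, and `working_set_fit` is a hypothetical name.

```rust
/// workingSetFit as described above: the percentage of files whose LOC lies
/// in the inclusive band [lo, hi]. How LOC is counted is the caller's job.
pub fn working_set_fit(locs: &[usize], lo: usize, hi: usize) -> f64 {
    if locs.is_empty() {
        return 0.0;
    }
    let in_band = locs.iter().filter(|&&n| (lo..=hi).contains(&n)).count();
    100.0 * in_band as f64 / locs.len() as f64
}
```

Given the measured distribution (16 files under 50 LOC, 54 in band, 30 over 500), it returns 54.0. Inclusive bounds mean a file of exactly 50 or 500 LOC counts as in band.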

And yet most of those files are appropriate to their content. workingSetFit by itself doesn't say which side of the line each file falls on — that's what the declarative-exemption dialect is for. The metric surfaces the property; the policy decides what to do with it.

The cross-repo comparison the measurement makes possible is more interesting than the single number. ripgrep (54%) and dive (52%) measure within two percentage points of each other — two unrelated codebases in two different languages, written by different teams under different conventions, landing in the same working-set band when measured against the same bounds. That's the kind of cross-repo signal §6 says it wants. The eyeballed estimates (~60% and ~80%) said the two projects were 20 points apart; the measurement says they're 2 points apart. The metric, not the eye, was right.

This is exactly the §5 intent. The metric surfaces a property; whether that property is good or bad depends on what the file content should be. Compliance scores conflate the two; metrics keep them separate.

graphDepth, measured: 5 (originally estimated ~5 — confirmed exactly)

The polyglot graphDepth emitter at scripts/measure-graph-depth.ts reads ripgrep's root Cargo.toml, identifies workspace members + the root crate, parses each member's [dependencies] section (production deps only — [dev-dependencies] excluded from the runtime DAG), filters to workspace-internal deps (path = "../foo" or workspace = true cross-referenced against [workspace.dependencies]), and computes the longest crate-level chain. The result for ripgrep@4519153e: 10 workspace crates, 15 internal edges, longest dependency chain of depth 5.

Hand-trace (auditable per /sama/v2 §0). The 10 workspace crates and their internal edges, extracted from crates/*/Cargo.toml:

| crate | internal deps |
|---|---|
| `ripgrep` (root, binary `rg`) | `grep`, `ignore` |
| `grep` (meta-crate) | `grep-cli`, `grep-matcher`, `grep-pcre2`, `grep-printer`, `grep-regex`, `grep-searcher` |
| `grep-cli` | `globset` |
| `grep-matcher` | (none; pure trait crate, the abstraction at the bottom) |
| `grep-pcre2` | `grep-matcher` |
| `grep-regex` | `grep-matcher` |
| `grep-searcher` | `grep-matcher` |
| `grep-printer` | `grep-matcher`, `grep-searcher` |
| `ignore` | `globset` |
| `globset` | (none; leaf crate) |

15 edges total (count: 2 + 6 + 1 + 0 + 1 + 1 + 1 + 2 + 1 + 0 = 15 ✓).

The longest path: ripgrep → grep → grep-printer → grep-searcher → grep-matcher, five crates, depth 5. Only this one chain reaches depth 5. Every other chain stops at depth 4 or less: ripgrep → grep → grep-pcre2 → grep-matcher and ripgrep → grep → grep-searcher → grep-matcher are both depth 4, so the printer-via-searcher chain is what wins. The audit's original estimate "(matcher → engine → searcher → printer → core)" got the depth right but not the membership. Reading the measured chain bottom-up gives matcher ← searcher ← printer ← grep ← ripgrep. The engine crates drop out, because grep-searcher depends only on grep-matcher, and the grep meta-crate takes their place. Same depth, confirmed by measurement.
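The hand-trace can be replayed mechanically. Below is a minimal sketch that takes the dependency table above as input. It counts depth in crates, so a crate with no internal deps has depth 1, matching the "five crates, depth 5" convention. `graph_depth` is a hypothetical name, not the emitter's API. The memoized DFS assumes the graph is acyclic, which Cargo already guarantees.

```rust
use std::collections::HashMap;

/// Longest internal dependency chain, counted in crates: a crate with no
/// internal deps has depth 1. Memoized DFS; assumes the graph is a DAG.
pub fn graph_depth<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> usize {
    fn depth<'a>(
        krate: &'a str,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        memo: &mut HashMap<&'a str, usize>,
    ) -> usize {
        if let Some(&d) = memo.get(krate) {
            return d;
        }
        // Depth of the deepest internal dependency (0 if there are none).
        let deepest_dep = deps
            .get(krate)
            .map_or(0, |ds| ds.iter().map(|&c| depth(c, deps, memo)).max().unwrap_or(0));
        let d = 1 + deepest_dep;
        memo.insert(krate, d);
        d
    }
    let mut memo = HashMap::new();
    deps.keys().map(|&k| depth(k, deps, &mut memo)).max().unwrap_or(0)
}
```

On the ten-crate table above it returns 5, reached via ripgrep → grep → grep-printer → grep-searcher → grep-matcher. The adjacency lists sum to the 15 edges counted above.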

Module-granularity note: the polyglot graphDepth metric counts at the Rust crate level, so each Cargo workspace member is one node. This is the natural Rust analog to the TS file-level metric. In TypeScript the unit of import is the file, while in Rust the unit that Cargo declares dependencies on is the crate. The semantics are documented in src/b32_graph_depth_polyglot.ts.

The contrast with dive's measured depth 12 is itself interesting: ripgrep's crate-level graph is flatter than dive's package-directory graph, even though both are mature CLI codebases. Some of that is genuine — ripgrep's workspace is 10 crates organized as a clean DAG; dive's 27 package directories include many subdirectory hops that drive the chain longer. Some is granularity: a Rust crate often contains what a Go developer would split into multiple package directories. The two depths aren't directly comparable for "which codebase is deeper"; they ARE directly comparable as "graphDepth at each language's natural module unit," which is the spec's intent.

What a rebuilt ripgrep would look like — the small version

For the full parallel-architecture sketch — every layer, every file move, predicted §5 metrics, the rebuilt sama.profile.toml, and concrete Rust code samples for the two file splits — see the companion post: ripgrep, rebuilt under SAMA v2.

The audit makes the rebuild sketch short, because BurntSushi's crate split already maps to v2 layers under the directory dialect. The lift to make it pass under v2.1 with the proposed dialects:

Add sama.profile.toml declaring the layer mapping (see profile above). 50 lines, zero code change.
Resolve the two boundary leaks. globset/src/serde_impl.rs is fine if serde derives are exempted, and the byte parsing in searcher/line_buffer.rs is fundamentally part of the search algorithm, not a boundary. Likely a profile note, not a code change.
Split the two genuine god-files — crates/printer/src/standard.rs (3,987 LOC) splits per output mode (standard, vimgrep, multi-line, color-only) into 4 files of ~1,000 LOC each. crates/ignore/src/walk.rs (2,494 LOC) splits walker-config / walker-loop / walker-results into 3 files of ~800 LOC each. ~2 weeks of focused work.
The catalog files stay. defs.rs (7,779 LOC) is the correct shape; the spec needs the exemption, and the file needs no change.

That's it. ripgrep is so close to v2-compliant that the work isn't on ripgrep — it's on v2.

Three findings, restated

Sorted needs a directory-based dialect. Already surfaced by the Go audit; ripgrep confirms.
Modeled-tests needs an inline-tests mode that recognizes #[test] annotations inside the source file rather than requiring a sibling file. Rust's convention is fundamentally not sibling-based.
Atomic-700 needs a declarative-file exemption. The 7,779-line defs.rs is the textbook case: a flag-definition catalog that's structurally correct as one file. The spec was written with behavioral complexity in mind; it doesn't yet distinguish "long file because complex" from "long file because catalog."

All three are §6 evolution-policy moves: falsifiable extensions admitted provisionally, measured against §5 metrics across multiple repos. ripgrep is one of those repos.

Four datapoints on the same axes

| project | language | §4 score | workingSetFit | boundaryRatio |
|---|---|---|---|---|
| tdd.md | TypeScript | 7/7 ✓ (measured) | 80% | 100% |
| wagoodman/dive | Go | ~5/7 (estimated) | 52.17% (measured) | ~85% |
| BurntSushi/ripgrep | Rust | ~3-5/7 (estimated) | 54.00% (measured) | ~95% |
| Open Graph plugin | PHP/WordPress | 0/7 (estimated) | ~47% | <10% |

n=4, three of them hand-estimated, still far from a "v2 is worth following" claim. But the pattern is clearer now: the strongly-typed compiled-language projects (Go, Rust) cluster near the dogfood; the WordPress codebase is the outlier on every axis. Whether that's "the language enforces architecture for free" or "people who choose Go/Rust care more about architecture" is the experiment §6 hasn't run yet.

See for yourself:

The project: https://github.com/BurntSushi/ripgrep
The full ripgrep rebuild (companion to this audit): ripgrep, rebuilt under SAMA v2
The Go audit (companion): Pointing SAMA v2 at dive
The WP audit + rebuild: WordPress plugin audit · rebuilt
The §5 metrics emitter: Compliance proves the rules followed. Delta proves they were worth following.
The spec being audited against: /sama/v2
