syntaxai/tdd.md · commit b182ea5

Blog: ripgrep refactor projected as parallel agent-fleet under SAMA mechanical merge gates

Companion to the ripgrep audit + rebuild pair. The rebuild estimated
~8 working days for a careful human; this post decomposes the same
refactor into 8 SAMA-aligned work packages (4 profile-only + 4
code-split), projects parallel execution by a multi-model fleet across
24-hour wall-clock coverage, and lands on ~8 wall-clock hours — ~10x
compression. Load-bearing argument: it's not 'agents are fast' (they
have been for years); it's that SAMA v2 turns every architectural rule
into a mechanical merge gate, which is the property that has
historically broken multi-agent parallel refactor attempts. Sketches
the §6 experiment that would convert the projection into a measurement
and cross-links from the rebuilt-sketch post.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

author: syntaxai <[email protected]>
date: 2026-05-24 10:09:13 +01:00
parent: b1c644b
commit: b182ea556565857e23be3b2c9beab0326e78882b

3 files changed · +209 −0

added content/blog/sama-v2-rust-project-ripgrep-parallel-fleet.md +200 −0

@@ -0,0 +1,200 @@
	1	+# The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection
	2	+
	3	+[Yesterday's `ripgrep` rebuild sketch](/blog/sama-v2-rust-project-ripgrep-rebuilt) estimated ~1 focused working week for one careful human to land the changes: write the `sama.profile.toml`, split `printer/standard.rs` into a six-file submodule, split `ignore/walk.rs` into a four-file submodule. Plus a few more days for the deferred splits. Eight working days total, end to end.
	4	+
	5	+That's the serial-human estimate. This post is the companion question Bas asked: what does the same refactor look like if it's executed by a fleet of AI agents running in parallel, spread across the planet, under strict SAMA v2 management?
	6	+
	7	+The honest answer: the wall-clock projection collapses from ~8 working days to ~8 wall-clock hours, and the load-bearing reason isn't "AI is fast" — it's that SAMA v2 turns each work-package boundary into a mergeable boundary, which is the part that has historically broken every "let multiple agents refactor in parallel" experiment.
	8	+
	9	+This post is a projection, not a measurement. The number it lands on is conservative; the reasoning is the part to read carefully.
	10	+
	11	+## Why parallel agent refactor has been a non-starter — until now
	12	+
	13	+Pre-SAMA, the cap on multi-agent parallelism isn't agent speed. It's integration. Two agents asked to refactor the same module produce two divergent designs; even when each is internally consistent, merging them is a hand-rolled meta-task that exceeds the cost of having one agent do the work serially. So the field has mostly run one agent at a time, or sharded work along brittle file boundaries that agents end up violating anyway.
	14	+
	15	+The three things that actually break parallel agent work:
	16	+
	17	+1. Scope creep. Agent A asked to "split `walk.rs`" decides the parallel walker also needs a refactor of `dir.rs` and the gitignore parser, because they're "obviously coupled." Now agent C (assigned to `dir.rs`) has a merge conflict and zero context on agent A's design.
	18	+2. Style drift. Two agents producing 600-line files in the same crate format types differently, name variables differently, structure imports differently, and either the maintainer hand-polishes both back to one style (defeats the parallelism) or accepts visible inconsistency in the diff (defeats the maintainability that motivated the refactor).
	19	+3. No mechanical merge gate. When agent B's PR lands, something has to check that B's design still composes with A's design — and absent a verifier, that check is taste, which is exactly the resource you ran out of when you decided to spawn a fleet.
	20	+
	21	+SAMA v2 dissolves all three, in the same way and for the same reason: every architectural rule is also a merge gate. Agents don't have to agree on style as long as they each pass the verifier. The verifier is the same TypeScript code for every agent, mechanically applied, returning a binary verdict.
	22	+
	23	+## The work-package decomposition of the ripgrep rebuild
	24	+
	25	+The previous post listed the mandatory + deferred changes as a single human-serial table. The same table, re-cut as eight parallel work packages, each scoped to a single branch and a single mergeable verifier-passing PR:
	26	+
	27	+\| WP# \| scope \| files touched \| LOC delta \| parallel-safe? \|
	28	+\|---\|---\|---\|---\|---\|
	29	+\| WP-1 \| Write `sama.profile.toml` declaring layout=directory + tests=inline + atomic_exemption=declarative \| `sama.profile.toml` (new) \| +18 \| yes — no other WP reads this until merge \|
	30	+\| WP-2 \| Add profile note: serde derives ≠ boundary parsing \| `sama.profile.toml` (1 line) \| +1 \| yes, but WP-1 must merge first \|
	31	+\| WP-3 \| Add profile note: `searcher/line_buffer.rs` byte parsing is algorithm \| `sama.profile.toml` (1 line) \| +1 \| yes, but WP-1 must merge first \|
	32	+\| WP-4 \| Mark four declarative-exempt files \| `sama.profile.toml` (4 lines) \| +4 \| yes, but WP-1 must merge first \|
	33	+\| WP-5 \| Split `crates/printer/src/standard.rs` (3,987 LOC) → 6-file submodule \| only `crates/printer/src/standard*` \| ±3,987 \| yes — no other WP touches printer/ \|
	34	+\| WP-6 \| Split `crates/ignore/src/walk.rs` (2,494 LOC) → 4-file submodule \| only `crates/ignore/src/walk*` \| ±2,494 \| yes — no other WP touches ignore/walk \|
	35	+\| WP-7 \| Split `crates/searcher/src/searcher/glue.rs` (1,549 LOC) → 3-file submodule \| only `crates/searcher/src/searcher/glue*` \| ±1,549 \| yes — no other WP touches that path \|
	36	+\| WP-8 \| Split `crates/ignore/src/dir.rs` (1,305 LOC) → 2-file submodule \| only `crates/ignore/src/dir` \| ±1,305 \| yes — does not overlap WP-6's `walk` files \|
	37	+
	38	+Every work package is scoped to a path prefix that no other work package touches. That is not coincidence — it is the property the `Atomic` rule and the directory-layered crate graph give you for free. Once a refactor decomposes along verifier-recognized boundaries, the decomposition is also a non-overlapping merge plan.
	39	+
	40	+The four profile-only WPs (1–4) serialize behind WP-1 because they all edit the same file. But WP-2 through WP-4 are patches of one to four lines each; once WP-1 has landed, together they merge in under a minute. The four code-split WPs (5–8) parallelize fully: they touch four disjoint path prefixes.
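The non-overlap claim can be checked mechanically before any agent starts. Below is a minimal sketch, assuming each work package is reduced to a single path prefix taken from the table; the `WorkPackage` type and both function names are hypothetical and not part of SAMA or `sama-fleet`.

```typescript
// Hypothetical sketch: none of these names come from SAMA or sama-fleet.
interface WorkPackage {
  id: string;
  // Path prefix owned by the package; a trailing "*" is dropped before comparison.
  scopePrefix: string;
}

// Two scopes overlap when one prefix is a prefix of the other.
function scopesOverlap(a: string, b: string): boolean {
  const pa = a.replace(/\*$/, "");
  const pb = b.replace(/\*$/, "");
  return pa.startsWith(pb) || pb.startsWith(pa);
}

// Returns every pair of work packages whose scopes collide.
function findOverlaps(wps: WorkPackage[]): [string, string][] {
  const clashes: [string, string][] = [];
  wps.forEach((a, i) => {
    for (const b of wps.slice(i + 1)) {
      if (scopesOverlap(a.scopePrefix, b.scopePrefix)) {
        clashes.push([a.id, b.id]);
      }
    }
  });
  return clashes;
}

// Scope prefixes copied from the work-package table.
const codeSplits: WorkPackage[] = [
  { id: "WP-5", scopePrefix: "crates/printer/src/standard*" },
  { id: "WP-6", scopePrefix: "crates/ignore/src/walk*" },
  { id: "WP-7", scopePrefix: "crates/searcher/src/searcher/glue*" },
  { id: "WP-8", scopePrefix: "crates/ignore/src/dir" },
];

console.log(findOverlaps(codeSplits)); // no colliding pairs among the four code splits
```

Run against the four profile-only packages, the same check would report every pair as colliding, since they all own `sama.profile.toml`; that is the mechanical reason they form a serial chain.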
	41	+
	42	+## The fleet manifest
	43	+
	44	+A hypothetical SAMA-aware orchestrator (let's call it `sama-fleet`) reads the profile, the audit's findings, and the rebuild sketch, then emits a manifest:
	45	+
	46	+```yaml
	47	+# sama-fleet.manifest.yaml — generated from the audit + rebuild sketch
	48	+target: BurntSushi/ripgrep@main
	49	+spec_version: sama_v2.1
	50	+deadline_hours: 8
	51	+
	52	+work_packages:
	53	+  - id: WP-1
	54	+    title: "Write sama.profile.toml"
	55	+    base_branch: main
	56	+    output_branch: sama-WP-1-profile
	57	+    scope_paths: ["sama.profile.toml"]
	58	+    estimated_minutes: 30
	59	+    merge_gate: [cargo-check, sama-v2-verify]
	60	+    blocks: [WP-2, WP-3, WP-4]
	61	+
	62	+  - id: WP-5
	63	+    title: "Split crates/printer/src/standard.rs into a six-file submodule"
	64	+    base_branch: main
	65	+    output_branch: sama-WP-5-printer-split
	66	+    scope_paths: ["crates/printer/src/standard*"]
	67	+    forbidden_paths: ["crates/printer/src/!(standard*)", "crates/!(printer)/*"]
	68	+    estimated_minutes: 360
	69	+    merge_gate: [cargo-check, cargo-test --package grep-printer, sama-v2-verify]
	70	+    invariants:
	71	+      - "every pub item in standard.rs before == pub-reachable from same path after"
	72	+      - "printer crate test suite still passes"
	73	+      - "atomic cap: every produced .rs file <= 700 LOC OR declarative-exempt"
	74	+
	75	+  - id: WP-6
	76	+    title: "Split crates/ignore/src/walk.rs into a four-file submodule"
	77	+    base_branch: main
	78	+    output_branch: sama-WP-6-walk-split
	79	+    scope_paths: ["crates/ignore/src/walk*"]
	80	+    forbidden_paths: ["crates/ignore/src/!(walk*)", "crates/!(ignore)/*"]
	81	+    estimated_minutes: 240
	82	+    merge_gate: [cargo-check, cargo-test --package ignore, sama-v2-verify]
	83	+
	84	+  # ... WP-2 .. WP-4, WP-7, WP-8 follow the same shape ...
	85	+
	86	+merge_policy:
	87	+  serial_chain: [WP-1, WP-2, WP-3, WP-4]   # profile patches, trivial
	88	+  parallel: [WP-5, WP-6, WP-7, WP-8]       # code-split branches
	89	+  final_gate: full-sama-v2-verify          # 7 / 7 on the unified main
	90	+```
	91	+
	92	+`scope_paths` and `forbidden_paths` are the load-bearing fields. The orchestrator hands each agent a workspace shadow with the `forbidden_paths` made read-only at the filesystem level. An agent assigned WP-5 physically cannot edit `crates/ignore/` even if its model decides midway through that "the printer split is best done by also refactoring the ignore walker." This is the kinetic version of the §1.2 Law — "agents cannot reach across boundaries" — enforced before any verifier ever runs.
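The filesystem fence can be paired with a post-hoc check on the list of files a branch actually changed. The sketch below is an assumption about how such a check might look, not how `sama-fleet` works: `inScope` and `outOfScope` are hypothetical names, only a trailing `*` wildcard is interpreted, and the manifest's `!(...)` patterns are deliberately left out.

```typescript
// Hypothetical sketch: an allow-list check over a branch's changed files.
// Only a trailing "*" is treated as a wildcard; "!(...)" patterns are not interpreted.
function inScope(path: string, scopePaths: string[]): boolean {
  return scopePaths.some((pattern) =>
    pattern.endsWith("*")
      ? path.startsWith(pattern.slice(0, -1))
      : path === pattern
  );
}

// Returns the changed files that fall outside the work package's scope.
function outOfScope(changed: string[], scopePaths: string[]): string[] {
  return changed.filter((path) => !inScope(path, scopePaths));
}

const wp5Scope = ["crates/printer/src/standard*"];
const changedFiles = [
  "crates/printer/src/standard.rs",
  "crates/printer/src/standard/sink.rs", // hypothetical file name inside the new submodule
  "crates/ignore/src/walk.rs",           // scope creep into WP-6's territory
];

console.log(outOfScope(changedFiles, wp5Scope)); // flags only the crates/ignore/src/walk.rs entry
```

An empty result is a necessary condition for the merge gate, not a sufficient one; the verifier and the `cargo` checks still have to pass.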
	93	+
	94	+## The fleet (a plausible composition)
	95	+
	96	+The point of describing the fleet across the planet isn't location; it's model diversity. SAMA's verifier doesn't care what model produced the diff, only whether the diff conforms. So the fleet can be deliberately heterogeneous:
	97	+
	98	+\| agent \| model \| region (24h coverage) \| assignment \|
	99	+\|---\|---\|---\|---\|
	100	+\| α \| Claude Opus 4.7 \| US-West (San Francisco) \| WP-5 (printer split — largest, most complex) \|
	101	+\| β \| Claude Sonnet 4.6 \| Asia (Tokyo) \| WP-6 (ignore/walk split — second largest) \|
	102	+\| γ \| Claude Haiku 4.5 \| Asia (Bangalore) \| WP-7 (searcher/glue split — medium) \|
	103	+\| δ \| Claude Sonnet 4.6 \| Europe (Berlin) \| WP-8 (ignore/dir split — smallest code WP) \|
	104	+\| ε \| Claude Haiku 4.5 \| South America (São Paulo) \| WP-1 → WP-2 → WP-3 → WP-4 (profile + notes, serial chain) \|
	105	+\| ω \| Claude Opus 4.7 \| (orchestrator) \| dispatches, monitors verifier output, merges in order, escalates ambiguity \|
	106	+
	107	+Two reasons this composition matters under SAMA v2:
	108	+
	109	+Capability-matched task assignment. Splitting a 3,987-LOC printer file with five output modes is genuinely the hardest WP, so it goes to the strongest model. Inserting four exemption flags into a TOML file is the easiest, so it goes to the smallest model that can reliably edit TOML. Mixed-capability fleets only work when the merge gate is purely mechanical, because otherwise a maintainer would have to hand-review the small-model patches anyway. SAMA's verifier is purely mechanical, so the mix works.
	110	+
	111	+24-hour wall-clock coverage. Even if some agents take longer than projected, work proceeds around the clock. The orchestrator never sleeps; agents in different regions step in as one finishes. Compared to a single careful human's working-week budget — even a fast human only works ~8 hours/day — the fleet runs ~24 productive hours/day from minute zero.
	112	+
	113	+## The wall-clock timeline
	114	+
	115	+Same eight work packages, plotted against an 8-hour wall clock. Times in (orchestrator-local) hours from start:
	116	+
	117	+```
	118	+time    α (printer)    β (walk)       γ (glue)       δ (dir)        ε (profile chain)
	119	+────    ───────────    ───────────    ──────────     ─────────      ──────────────────
	120	+T+0h    read sketch    read sketch    read sketch    read sketch    WP-1 begin
	121	+T+0h    scope-fenced workspaces handed out by orchestrator ω
	122	+T+0.5h  plan + tree    plan + tree    plan + tree    plan + tree    WP-1 done → CI
	123	+T+1h    code           code           code           code           WP-2 begin → CI
	124	+T+1.5h  "              "              "              test           WP-3 begin → CI
	125	+T+2h    "              "              test           verify ✓       WP-4 begin → CI
	126	+T+2.5h  "              test           verify ✓       merge ω        all profile WPs merged
	127	+T+3h    "              verify ✓       merge ω
	128	+T+3.5h  "              merge ω        ─
	129	+T+4h    test           ─
	130	+T+5h    verify ✓
	131	+T+5.5h  merge ω
	132	+T+6h    ─── final-gate verify on unified main ── 7/7 ✓ ──
	133	+T+6–8h  integration tests (cargo test --workspace), perf benchmarks,
	134	+        smoke tests against real corpora, ω signs off → PR ready for human
	135	+```
	136	+
	137	+Total wall clock: ~8 hours from "orchestrator reads the sketch" to "single PR ready for human-final-approval."
	138	+
	139	+Compared to the rebuild sketch's serial-human estimate of ~8 working days, that's roughly a 10× wall-clock compression. It is not a record, because no benchmark exists yet to break. What the audit, the rebuild, and this post together stake out is that the benchmark can now be defined at all. v2, its verifier, and the mechanical merge gate are the missing primitive that lets "speed of parallel agent refactor" be measured as a property of the codebase rather than as folklore about the agents.
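The arithmetic behind the 8-hour figure can be written down directly. This is a sketch only: the merge times are read off the timeline above, the variable names are made up for illustration, and the 8-hour working day is the post's own "~8 hours/day" figure rather than a number from the rebuild sketch.

```typescript
// Merge times in hours from start, read off the timeline.
const mergeAtHours: Record<string, number> = {
  "profile chain (WP-1..WP-4)": 2.5,
  "WP-8 dir": 2.5,
  "WP-7 glue": 3.0,
  "WP-6 walk": 3.5,
  "WP-5 printer": 5.5,
};

// The last branch to merge is the critical path for the parallel phase.
const lastMerge = Math.max(...Object.values(mergeAtHours));

const finalGateAt = 6;      // final-gate verify on unified main, from the timeline
const integrationHours = 2; // the T+6 to T+8h block
const fleetWallClock = finalGateAt + integrationHours;

// Assumption: one working day is about 8 working hours.
const humanWorkingHours = 8 * 8;
const compression = humanWorkingHours / fleetWallClock;

console.log({ lastMerge, fleetWallClock, compression });
```

Counted in working hours the ratio is 8×. Counting the calendar time a human's 8 working days actually span (nights and any weekend) pushes it higher, which is why the post's "roughly 10×" is best read as an order-of-magnitude figure.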
	140	+
	141	+## Why this is a SAMA-enabled property specifically
	142	+
	143	+A claim worth being careful about: this projection isn't about AI agents being fast. Agents have been "fast" for years. The cap was always integration. The four properties SAMA v2 enforces that make parallel decomposition mergeable:
	144	+
	145	+1. Atomic (700-LOC cap) → work-package scope is bounded. WP-5's printer-split fits in a single agent's working context. So does WP-6's walk-split. So does each output file produced. The agent does not need to see "the whole printer crate" to do its job; it needs `standard.rs` plus the public surface of the rest of the crate.
	146	+
	147	+2. Architecture (layer mapping) → work-package boundaries align with merge boundaries. WP-5 lives entirely in Layer 1 sublayer "algorithm." WP-6 lives entirely in Layer 2. They literally cannot touch each other's layer files because the profile says so and the verifier rejects PRs that do.
	148	+
	149	+3. Sorted (under v2.1's directory dialect) → the dependency direction is publicly readable. The orchestrator can compute "WP-6 depends on Layer 0 + Layer 1 results" by reading the profile and the crate graph; it does not have to ask an agent or guess.
	150	+
	151	+4. Modeled (sibling tests; under v2.1's inline-tests mode) → each agent ships its own tests with each split file. WP-5's six produced files each contain `#[cfg(test)] mod tests` blocks; the verifier checks they exist; `cargo test --package grep-printer` (the printer crate's package name) checks they pass. No central "integration test team" bottleneck: the test is the agent's responsibility, located in the same file as the code, gated mechanically.
	152	+
	153	+Without those four properties, every multi-agent refactor attempt I've seen has run aground in the same way: agents start with disjoint scopes but converge in the merge phase because nothing structural was keeping them disjoint. The merge becomes a fifth task at least as expensive as any of the four refactor tasks. SAMA v2 is the architectural standard that says: the scope each agent saw is the scope each agent's merge gate enforces.
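As one concrete instance of a rule doubling as a merge gate, the Atomic cap from property 1 reduces to a few lines. This is a hedged sketch, not the verifier's code: `ProducedFile`, `countLines`, and `atomicViolations` are hypothetical names, and how the real verifier counts lines or recognises declarative exemptions is not shown in this post.

```typescript
// Hypothetical sketch of the "atomic cap" invariant from the manifest.
interface ProducedFile {
  path: string;
  source: string;
}

const ATOMIC_CAP = 700; // LOC cap quoted in the post

// Counts newline-separated lines; a trailing newline does not add a line.
function countLines(source: string): number {
  if (source.length === 0) return 0;
  const lines = source.split("\n");
  return source.endsWith("\n") ? lines.length - 1 : lines.length;
}

// Returns paths that exceed the cap and are not declared exempt.
function atomicViolations(files: ProducedFile[], exempt: Set<string>): string[] {
  return files
    .filter((f) => !exempt.has(f.path) && countLines(f.source) > ATOMIC_CAP)
    .map((f) => f.path);
}

// Made-up inputs purely to exercise the check.
const produced: ProducedFile[] = [
  { path: "small.rs", source: "fn a() {}\n".repeat(120) },
  { path: "big.rs", source: "fn b() {}\n".repeat(701) },
  { path: "table.rs", source: "x\n".repeat(900) },
];

console.log(atomicViolations(produced, new Set(["table.rs"]))); // flags big.rs only
```

The verdict is binary and depends only on the diff, which is the property that lets output from different models pass through the same gate.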
	154	+
	155	+## What this post is and is not
	156	+
	157	+Is: a careful projection from the rebuild sketch's serial-human estimates to a parallel-agent decomposition, with the work-package boundaries derived from the actual file-and-layer structure of ripgrep + the v2.1 dialects.
	158	+
	159	+Is not: a measured benchmark. No fleet has actually executed this manifest against ripgrep. The 8-hour number is the rebuild post's 8-working-day number divided by sane parallelism + some buffer for verifier roundtrips and Rust compile times. The numbers in the timeline are projection-grade, not measurement-grade.
	160	+
	161	+The §6 hook that makes the projection eventually testable: §5 of the v2 spec already says "compliance proves the rules were followed; the delta is what proves the rules were worth following." This post identifies a new delta v2 can take credit for that no other architectural standard can: parallel-refactor wall-clock. The cost of a refactor under v2-management is a separate, falsifiable empirical property — one that doesn't even exist as a measurable quantity in arbitrary codebases, because in arbitrary codebases parallel refactors don't merge cleanly.
	162	+
	163	+If §6 promotes the three v2.1 dialects, a follow-up experiment writes itself:
	164	+
	165	+1. Fork a v2-conforming open-source repo (this site, eventually ripgrep, eventually dive).
	166	+2. Generate a manifest like the one above.
	167	+3. Run a real fleet under a real orchestrator.
	168	+4. Measure: wall-clock to verifier-green merged main, number of agent-attempts per WP, number of orchestrator escalations, post-merge defect rate.
	169	+5. Compare against the same refactor done by a single agent serially, against the same refactor done by a single human serially, and (the cross-spec comparison the whole §5 + §6 program is for) against the same refactor attempted on a non-v2 codebase.
	170	+
	171	+The interesting comparison is not "how fast was the agent fleet" — it's the fourth row, the non-v2 attempt, which is the one we expect to never finish because the work packages won't stay disjoint.
	172	+
	173	+That's the experiment SAMA v2's empirical program is laying cable for, three blog posts at a time.
	174	+
	175	+## Three projections, three datapoints, one bracket
	176	+
	177	+The series so far on this Rust example:
	178	+
	179	+\| post \| scope \| wall-clock estimate \| confidence \|
	180	+\|---\|---\|---\|---\|
	181	+\| [the audit](/blog/sama-v2-rust-project-ripgrep) \| score ripgrep as-is against §4 \| n/a (read-only) \| empirical (the source was read) \|
	182	+\| [the rebuild](/blog/sama-v2-rust-project-ripgrep-rebuilt) \| serial-human refactor to 7/7 ✓ \| ~8 working days \| informed estimate from concrete file deltas \|
	183	+\| this post \| parallel-fleet refactor to 7/7 ✓ \| ~8 wall-clock hours \| projection, ~10× compression from above \|
	184	+
	185	+Each post tightens a different lever:
	186	+- The audit tells you where the codebase sits today.
	187	+- The rebuild tells you what it costs to get to compliant.
	188	+- The fleet projection tells you how that cost decomposes when the merge gate is mechanical.
	189	+
	190	+None of the three is a measured "v2 is worth following" claim by itself. Together they are the empirical chain §5 + §6 are pointing at: define the metrics, show what changes when the rules are followed, project what becomes mechanically possible when the verifier is the merge gate, and then — eventually — run the experiment that converts each projection into a measurement.
	191	+
	192	+---
	193	+
	194	+Companion posts:
	195	+
	196	+- [The ripgrep audit](/blog/sama-v2-rust-project-ripgrep) — the source of the work-package list
	197	+- [The ripgrep rebuild sketch](/blog/sama-v2-rust-project-ripgrep-rebuilt) — the serial-human cost estimate this post divides by parallelism
	198	+- [The dive rebuild](/blog/sama-v2-go-project-dive-rebuilt) — equivalent decomposition on a Go codebase, ~10 days serial; parallel-fleet projection would land similar 10× compression
	199	+- [The §5 metrics emitter](/blog/sama-v2-metrics-emitter) — the empirical apparatus the §6 experiment plugs into
	200	+- [The v2 spec](/sama/v2) — particularly §4 (Atomic, Architecture, Sorted, Modeled) and §6 (evolution policy)

modified content/blog/sama-v2-rust-project-ripgrep-rebuilt.md +3 −0

@@ -391,6 +391,8 @@ No new tests need to be written — `tests = "inline"` recognises the 38 source
391	391
392	392	For context, the [WordPress plugin parallel-architecture rebuild](/blog/sama-v2-wordpress-plugin-rebuilt) required splitting a 1,554-line public god-class into eleven files, redesigning the settings option as a typed value, and writing 20+ test files from scratch. Months of work, real risk of breaking the PRO add-on, WooCommerce, Yoast, and AIOSEO integrations. `dive` to 7/7 was ten working days of test writing plus one package split. `ripgrep` to 7/7 is one focused week of file splitting plus a TOML file.
393	393
	394	+(One focused week is the serial-human number. For the parallel-fleet projection that divides this estimate by SAMA-mechanical work-package boundaries and lands on ~8 wall-clock hours, see the [companion post](/blog/sama-v2-rust-project-ripgrep-parallel-fleet).)
	395	+
394	396	## Predicted §5 metrics for the rebuilt ripgrep
395	397
396	398	\| metric \| ripgrep today (estimated) \| ripgrep rebuilt (predicted) \| dive rebuilt \| tdd.md (measured) \|
@@ -425,6 +427,7 @@ Four observations:
425	427
426	428	Companion posts:
427	429
	430	+- [The same rebuild, run by a fleet of AI agents in parallel](/blog/sama-v2-rust-project-ripgrep-parallel-fleet) — projecting this post's ~8-working-day serial estimate into ~8 wall-clock hours under SAMA-mechanical merge gates
428	431	- [Today's `ripgrep` audit](/blog/sama-v2-rust-project-ripgrep) — where the 3/7-strict, 5/7-with-dialects score comes from, and the three findings this rebuild assumes get adopted
429	432	- [The `dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt) — same exercise on a Go codebase, the directory-dialect's first appearance
430	433	- [The `dive` prefix-scheme variant](/blog/sama-v2-go-project-dive-prefix-scheme) — what the dramatic file-rename refactor costs in Go (and would cost even more in Rust)

modified src/a31_blog.ts +6 −0

@@ -12,6 +12,12 @@ export interface BlogEntry {
12	12	}
13	13
14	14	export const ALL_POSTS: BlogEntry[] = [
	15	+ {
	16	+ slug: "sama-v2-rust-project-ripgrep-parallel-fleet",
	17	+ title: "The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection",
	18	+ description: "Yesterday's ripgrep rebuild sketch estimated ~1 focused working week (~8 working days) for one careful human. This post is the companion projection Bas asked: what does the same refactor look like if executed by a fleet of AI agents running in parallel across the planet, under strict SAMA v2 management? Honest answer: wall-clock projection collapses from ~8 working days to ~8 wall-clock hours, ~10× compression. The load-bearing reason isn't 'AI is fast' — agents have been fast for years. The cap was always integration: pre-SAMA, every multi-agent refactor experiment grounded on the same three failure modes (scope creep, style drift, no mechanical merge gate). SAMA v2 dissolves all three because every architectural rule is also a merge gate. Decomposes the rebuild into 8 work packages — 4 profile-only WPs (serial chain, ~2h) + 4 code-split WPs (parallel, ~6h) — and demonstrates that the file-prefix + directory-layered crate graph give you a property no other architecture standard does: work-package boundaries are physically non-overlapping, so the orchestrator can scope-fence each agent's workspace with forbidden_paths and the agents literally cannot reach across boundaries. Plausible fleet composition with capability-matched task assignment (Opus on the hardest split, Haiku on TOML patches), 24-hour wall-clock coverage by spreading across time zones, mechanical verifier as merge gate so mixed-model output is fine. Timeline diagram showing T+0 through T+8h, then the section that matters most: why this is a SAMA-enabled property specifically — Atomic bounds working-context, Architecture aligns work-package boundaries with merge boundaries, Sorted makes dependency direction publicly readable to the orchestrator, Modeled makes tests the agent's local responsibility rather than a central bottleneck. Careful about framing: this is a projection from concrete file deltas + sane parallelism, not a measurement. 
The §6 experiment that would convert this into measurement is sketched at the end — fork a v2-conforming repo, generate the manifest, run the fleet, measure wall-clock vs serial vs serial-human vs the same refactor attempted on a non-v2 codebase. The interesting comparison is the fourth row, the non-v2 attempt — the one we expect to never finish because the work packages won't stay disjoint. That's the SAMA-specific empirical claim this post lays cable for.",
	19	+ date: "2026-05-26",
	20	+ },
15	21	{
16	22	slug: "sama-v2-rust-project-ripgrep-rebuilt",
17	23	title: "`ripgrep`, rebuilt under SAMA v2 — a thought experiment",
