syntaxai/tdd.md · commit b182ea5

Blog: ripgrep refactor projected as parallel agent-fleet under SAMA mechanical merge gates

Companion to the ripgrep audit + rebuild pair. The rebuild estimated
~8 working days for a careful human; this post decomposes the same
refactor into 8 SAMA-aligned work packages (4 profile-only + 4
code-split), projects parallel execution by a multi-model fleet across
24-hour wall-clock coverage, and lands on ~8 wall-clock hours — ~10x
compression. Load-bearing argument: it's not 'agents are fast' (they
have been for years); it's that SAMA v2 turns every architectural rule
into a mechanical merge gate, which is the property that has
historically broken multi-agent parallel refactor attempts. Sketches
the §6 experiment that would convert the projection into a measurement
and cross-links from the rebuilt-sketch post.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
author
syntaxai <[email protected]>
date
2026-05-24 10:09:13 +01:00
parent
b1c644b
commit
b182ea556565857e23be3b2c9beab0326e78882b

3 files changed · +209 −0

added content/blog/sama-v2-rust-project-ripgrep-parallel-fleet.md +200 −0
@@ -0,0 +1,200 @@
1+# The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection
2+
3+[Yesterday's `ripgrep` rebuild sketch](/blog/sama-v2-rust-project-ripgrep-rebuilt) estimated **~1 focused working week** for one careful human to land the changes: write the `sama.profile.toml`, split `printer/standard.rs` into a six-file submodule, split `ignore/walk.rs` into a four-file submodule. Plus a few more days for the deferred splits. Eight working days total, end to end.
4+
5+That's the *serial-human* estimate. This post is the companion question Bas asked: *what does the same refactor look like if it's executed by a fleet of AI agents running in parallel, spread across the planet, under strict SAMA v2 management?*
6+
7+The honest answer: **the wall-clock projection collapses from ~8 working days to ~8 wall-clock hours, and the load-bearing reason isn't "AI is fast" — it's that SAMA v2 turns each work-package boundary into a mergeable boundary**, which is the part that has historically broken every "let multiple agents refactor in parallel" experiment.
8+
9+This post is a *projection*, not a measurement. The number it lands on is conservative; the reasoning is the part to read carefully.
10+
11+## Why parallel agent refactor has been a non-starter — until now
12+
13+Pre-SAMA, the cap on multi-agent parallelism isn't agent speed. It's *integration*. Two agents asked to refactor the same module produce two divergent designs; even when each is internally consistent, merging them is a hand-rolled meta-task that exceeds the cost of having one agent do the work serially. So the field has mostly run one agent at a time, or sharded work along brittle file boundaries that agents end up violating anyway.
14+
15+The three things that actually break parallel agent work:
16+
17+1. **Scope creep.** Agent A asked to "split `walk.rs`" decides the parallel walker also needs a refactor of `dir.rs` and the gitignore parser, because they're "obviously coupled." Now agent C (assigned to `dir.rs`) has a merge conflict and zero context on agent A's design.
18+2. **Style drift.** Two agents producing 600-line files in the same crate format types differently, name variables differently, structure imports differently, and either the maintainer hand-polishes both back to one style (defeats the parallelism) or accepts visible inconsistency in the diff (defeats the maintainability that motivated the refactor).
19+3. **No mechanical merge gate.** When agent B's PR lands, *something* has to check that B's design still composes with A's design — and absent a verifier, that check is *taste*, which is exactly the resource you ran out of when you decided to spawn a fleet.
20+
21+SAMA v2 dissolves all three, in the same way and for the same reason: **every architectural rule is also a merge gate**. Agents don't have to agree on style as long as they each pass the verifier. The verifier is the same TypeScript code for every agent, mechanically applied, returning a binary verdict.
22+
23+## The work-package decomposition of the ripgrep rebuild
24+
25+The previous post listed the mandatory + deferred changes as a single human-serial table. The same table, re-cut as **eight parallel work packages**, each scoped to a single branch and a single mergeable verifier-passing PR:
26+
27+| WP# | scope | files touched | LOC delta | parallel-safe? |
28+|---|---|---|---|---|
29+| WP-1 | Write `sama.profile.toml` declaring layout=directory + tests=inline + atomic_exemption=declarative | `sama.profile.toml` (new) | +18 | yes — no other WP reads this until merge |
30+| WP-2 | Add profile note: serde derives ≠ boundary parsing | `sama.profile.toml` (1 line) | +1 | yes, but WP-1 must merge first |
31+| WP-3 | Add profile note: `searcher/line_buffer.rs` byte parsing is algorithm | `sama.profile.toml` (1 line) | +1 | yes, but WP-1 must merge first |
32+| WP-4 | Mark four declarative-exempt files | `sama.profile.toml` (4 lines) | +4 | yes, but WP-1 must merge first |
33+| WP-5 | Split `crates/printer/src/standard.rs` (3,987 LOC) → 6-file submodule | only `crates/printer/src/standard*` | ±3,987 | yes — no other WP touches printer/ |
34+| WP-6 | Split `crates/ignore/src/walk.rs` (2,494 LOC) → 4-file submodule | only `crates/ignore/src/walk*` | ±2,494 | yes — no other WP touches ignore/walk |
35+| WP-7 | Split `crates/searcher/src/searcher/glue.rs` (1,549 LOC) → 3-file submodule | only `crates/searcher/src/searcher/glue*` | ±1,549 | yes — no other WP touches that path |
36+| WP-8 | Split `crates/ignore/src/dir.rs` (1,305 LOC) → 2-file submodule | only `crates/ignore/src/dir*` | ±1,305 | yes — does not overlap WP-6's `walk*` files |
37+
38+**Every work package is scoped to a path prefix that no other work package touches.** That is not coincidence — it is the property the `Atomic` rule and the directory-layered crate graph give you for free. Once a refactor decomposes along verifier-recognized boundaries, the decomposition is *also* a non-overlapping merge plan.
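
The non-overlap claim is cheap to check mechanically. The sketch below is only an illustration under stated assumptions: the prefixes are the code-split rows of the table with the trailing `*` dropped, and the names `scopePrefixes`, `prefixesOverlap`, and `overlappingPairs` are hypothetical, not part of any SAMA tooling. The profile-only WPs are left out because they intentionally share one file.

```typescript
// Hypothetical input: one path prefix per code-split work package,
// copied from the table above with the trailing `*` removed.
const scopePrefixes: Record<string, string> = {
  "WP-5": "crates/printer/src/standard",
  "WP-6": "crates/ignore/src/walk",
  "WP-7": "crates/searcher/src/searcher/glue",
  "WP-8": "crates/ignore/src/dir",
};

// Two prefix scopes can match a common path only if one prefix
// starts with the other.
function prefixesOverlap(a: string, b: string): boolean {
  return a.startsWith(b) || b.startsWith(a);
}

// Returns every pair of work packages whose scopes could collide.
function overlappingPairs(scopes: Record<string, string>): [string, string][] {
  const ids = Object.keys(scopes);
  const pairs: [string, string][] = [];
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      if (prefixesOverlap(scopes[ids[i]], scopes[ids[j]])) {
        pairs.push([ids[i], ids[j]]);
      }
    }
  }
  return pairs;
}

const conflicts = overlappingPairs(scopePrefixes);
console.log(conflicts.length === 0 ? "scopes are disjoint" : conflicts);
```

An orchestrator could run a check of this kind before dispatching any agent and refuse a manifest whose scopes collide.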
39+
40+The four profile-only WPs (1–4) form a serial chain because they all edit the same file, `sama.profile.toml`: WP-2 through WP-4 wait for WP-1 to merge. Each follow-up is a patch of one to four lines, so the merges themselves are trivial once CI passes. The four code-split WPs (5–8) parallelize fully, because they touch four disjoint path prefixes.
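
One way to turn "WP-1 first, everything else free" into a merge order is a topological sort over the blocking relation. This is a minimal sketch, assuming every work package declares the packages it blocks (as WP-1 does for WP-2 through WP-4) and that every blocked id is itself declared; the `WorkPackage` shape and `mergeOrder` are hypothetical names.

```typescript
interface WorkPackage {
  id: string;
  blocks: string[]; // packages that may not merge until this one has
}

// Kahn's algorithm: repeatedly emit packages with no unmerged blockers.
function mergeOrder(packages: WorkPackage[]): string[] {
  const pending = new Map<string, number>();
  for (const wp of packages) pending.set(wp.id, 0);
  for (const wp of packages) {
    for (const blocked of wp.blocks) {
      pending.set(blocked, (pending.get(blocked) ?? 0) + 1);
    }
  }
  const byId = new Map(packages.map((wp) => [wp.id, wp] as const));
  const ready = packages.filter((wp) => pending.get(wp.id) === 0).map((wp) => wp.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const blocked of byId.get(id)?.blocks ?? []) {
      const left = (pending.get(blocked) ?? 0) - 1;
      pending.set(blocked, left);
      if (left === 0) ready.push(blocked);
    }
  }
  if (order.length !== packages.length) throw new Error("cycle in blocks graph");
  return order;
}

// Only WP-1 blocks anything; the code-split WPs are unconstrained.
const packages: WorkPackage[] = [
  { id: "WP-1", blocks: ["WP-2", "WP-3", "WP-4"] },
  { id: "WP-2", blocks: [] },
  { id: "WP-3", blocks: [] },
  { id: "WP-4", blocks: [] },
  { id: "WP-5", blocks: [] },
  { id: "WP-6", blocks: [] },
  { id: "WP-7", blocks: [] },
  { id: "WP-8", blocks: [] },
];
console.log(mergeOrder(packages));
```

The result is one valid dependency-respecting order, not a schedule: packages with no blockers can still run and merge concurrently.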
41+
42+## The fleet manifest
43+
44+A hypothetical SAMA-aware orchestrator (let's call it `sama-fleet`) reads the profile, the audit's findings, and the rebuild sketch, then emits a manifest:
45+
46+```yaml
47+# sama-fleet.manifest.yaml — generated from the audit + rebuild sketch
48+target: BurntSushi/ripgrep@main
49+spec_version: sama_v2.1
50+deadline_hours: 8
51+
52+work_packages:
53+  - id: WP-1
54+    title: "Write sama.profile.toml"
55+    base_branch: main
56+    output_branch: sama-WP-1-profile
57+    scope_paths: ["sama.profile.toml"]
58+    estimated_minutes: 30
59+    merge_gate: [cargo-check, sama-v2-verify]
60+    blocks: [WP-2, WP-3, WP-4]
61+
62+  - id: WP-5
63+    title: "Split crates/printer/src/standard.rs into a six-file submodule"
64+    base_branch: main
65+    output_branch: sama-WP-5-printer-split
66+    scope_paths: ["crates/printer/src/standard*"]
67+    forbidden_paths: ["crates/printer/src/!(standard*)", "crates/!(printer)/**"]
68+    estimated_minutes: 360
69+    merge_gate: [cargo-check, cargo-test --package printer, sama-v2-verify]
70+    invariants:
71+      - "every pub item in standard.rs before == pub-reachable from same path after"
72+      - "printer crate test suite still passes"
73+      - "atomic cap: every produced .rs file <= 700 LOC OR declarative-exempt"
74+
75+  - id: WP-6
76+    title: "Split crates/ignore/src/walk.rs into a four-file submodule"
77+    base_branch: main
78+    output_branch: sama-WP-6-walk-split
79+    scope_paths: ["crates/ignore/src/walk*"]
80+    forbidden_paths: ["crates/ignore/src/!(walk*)", "crates/!(ignore)/**"]
81+    estimated_minutes: 240
82+    merge_gate: [cargo-check, cargo-test --package ignore, sama-v2-verify]
83+
84+  # ... WP-2 .. WP-4, WP-7, WP-8 follow the same shape ...
85+
86+merge_policy:
87+  serial_chain: [WP-1, WP-2, WP-3, WP-4] # profile patches, trivial
88+  parallel: [WP-5, WP-6, WP-7, WP-8] # code-split branches
89+  final_gate: full-sama-v2-verify # 7 / 7 on the unified main
90+```
91+
92+`scope_paths` and `forbidden_paths` are the load-bearing fields. The orchestrator hands each agent a *workspace shadow* with the `forbidden_paths` made read-only at the filesystem level. An agent assigned WP-5 *physically cannot* edit `crates/ignore/` even if its model decides midway through that "the printer split is best done by also refactoring the ignore walker." This is the kinetic version of the §1.2 Law — "agents cannot reach across boundaries" — enforced before any verifier ever runs.
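
The same fence can also be checked after the fact, for example against the list of files a branch changed. The following is a minimal sketch, not the orchestrator's implementation: `matchesScope` and `outOfScope` are hypothetical, the matcher only understands an exact path or a prefix with one trailing `*`, and the `!(...)` negations in `forbidden_paths` are deliberately not handled.

```typescript
// Hypothetical, simplified matcher: a pattern is either an exact path or a
// prefix followed by a single trailing `*`. Unlike most glob dialects, the
// `*` here is allowed to cross `/`.
function matchesScope(path: string, pattern: string): boolean {
  return pattern.endsWith("*")
    ? path.startsWith(pattern.slice(0, -1))
    : path === pattern;
}

// Returns the changed paths that fall outside every scope pattern.
function outOfScope(changed: string[], scopePaths: string[]): string[] {
  return changed.filter((p) => !scopePaths.some((s) => matchesScope(p, s)));
}

const wp5Scope = ["crates/printer/src/standard*"];
const changed = [
  "crates/printer/src/standard/mod.rs", // hypothetical file produced by the split
  "crates/ignore/src/walk.rs", // belongs to WP-6, not WP-5
];
console.log(outOfScope(changed, wp5Scope)); // [ 'crates/ignore/src/walk.rs' ]
```

A non-empty result would mean the branch reached outside its work package and should be rejected before the verifier spends any time on it.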
93+
94+## The fleet (a plausible composition)
95+
96+The point of describing the fleet across the planet isn't location; it's *model diversity*. SAMA's verifier doesn't care what model produced the diff, only whether the diff conforms. So the fleet can be deliberately heterogeneous:
97+
98+| agent | model | region (24h coverage) | assignment |
99+|---|---|---|---|
100+| α | Claude Opus 4.7 | US-West (San Francisco) | WP-5 (printer split — largest, most complex) |
101+| β | Claude Sonnet 4.6 | Asia (Tokyo) | WP-6 (ignore/walk split — second largest) |
102+| γ | Claude Haiku 4.5 | Asia (Bangalore) | WP-7 (searcher/glue split — medium) |
103+| δ | Claude Sonnet 4.6 | Europe (Berlin) | WP-8 (ignore/dir split — smallest code WP) |
104+| ε | Claude Haiku 4.5 | South America (São Paulo) | WP-1 → WP-2 → WP-3 → WP-4 (profile + notes, serial chain) |
105+| ω | Claude Opus 4.7 | (orchestrator) | dispatches, monitors verifier output, merges in order, escalates ambiguity |
106+
107+Two reasons this composition matters under SAMA v2:
108+
109+**Capability-matched task assignment.** Splitting a 3,987-LOC printer file with five output modes is genuinely the hardest WP, so it goes to the strongest model. Inserting four exemption flags into a TOML file is the easiest, so it goes to the smallest model that can reliably edit TOML. Mixed-capability fleets only work when the merge gate is purely mechanical, because otherwise a maintainer would have to hand-review the small-model patches anyway. SAMA's verifier is purely mechanical, so the mix works.
110+
111+**24-hour wall-clock coverage.** Even if some agents take longer than projected, work proceeds around the clock. The orchestrator never sleeps; agents in different regions step in as one finishes. Compared to a single careful human's working-week budget — even a fast human only works ~8 hours/day — the fleet runs ~24 productive hours/day from minute zero.
112+
113+## The wall-clock timeline
114+
115+Same eight work packages, plotted against an 8-hour wall clock. Times in (orchestrator-local) hours from start:
116+
117+```
118+time     α (printer)    β (walk)     γ (glue)     δ (dir)      ε (profile chain)
119+────     ─────────────  ───────────  ──────────   ─────────    ──────────────────
120+T+0h     read sketch    read sketch  read sketch  read sketch  WP-1 begin
121+T+0h     scope-fenced workspaces handed out by orchestrator ω
122+T+0.5h   plan + tree    plan + tree  plan + tree  plan + tree  WP-1 done → CI
123+T+1h     code           code         code         code         WP-2 begin → CI
124+T+1.5h   "              "            "            test         WP-3 begin → CI
125+T+2h     "              "            test         verify ✓     WP-4 begin → CI
126+T+2.5h   "              test         verify ✓     merge ω      all profile WPs merged
127+T+3h     "              verify ✓     merge ω
128+T+3.5h   "              merge ω      ─
129+T+4h     test           ─
130+T+5h     verify ✓
131+T+5.5h   merge ω
132+T+6h     ─── final-gate verify on unified main ── 7/7 ✓ ──
133+T+6–8h   integration tests (cargo test --workspace), perf benchmarks,
134+         smoke tests against real corpora, ω signs off → PR ready for human
135+```
136+
137+Total wall clock: **~8 hours** from "orchestrator reads the sketch" to "single PR ready for human-final-approval."
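
The 8-hour figure can be read straight off the timeline: the slowest lane sets the last merge, the final gate runs at T+6h, and integration fills the remaining two hours. A minimal sketch of that arithmetic follows. Every number is copied from the timeline; the function name `projectWallClock` and the result shape are hypothetical.

```typescript
// Merge time per lane, in hours from start, as read off the timeline.
const laneMergedAt: Record<string, number> = {
  "α (printer)": 5.5,
  "β (walk)": 3.5,
  "γ (glue)": 3.0,
  "δ (dir)": 2.5,
  "ε (profile chain)": 2.5,
};

const FINAL_GATE_AT_HOURS = 6; // T+6h final-gate verify on unified main
const INTEGRATION_HOURS = 2; // T+6–8h integration tests, benchmarks, smoke tests

interface Projection {
  criticalLane: string;
  lastMergeAt: number;
  totalHours: number;
}

function projectWallClock(
  merged: Record<string, number>,
  finalGateAt: number,
  integrationHours: number,
): Projection {
  // The lane that merges last is the critical path for the parallel phase.
  const [criticalLane, lastMergeAt] = Object.entries(merged).reduce((a, b) =>
    b[1] > a[1] ? b : a,
  );
  // The final gate cannot start before the last merge has landed.
  const gateStart = Math.max(finalGateAt, lastMergeAt);
  return { criticalLane, lastMergeAt, totalHours: gateStart + integrationHours };
}

console.log(projectWallClock(laneMergedAt, FINAL_GATE_AT_HOURS, INTEGRATION_HOURS));
```

The point of writing it this way is that the total depends on the longest lane only. Shortening the β, γ, δ, or ε lanes changes nothing until the printer split itself gets faster.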
138+
139+Compared to the rebuild sketch's serial-human estimate of ~8 working days, that's **roughly a 10× wall-clock compression**. It is not a record, because no benchmark exists yet to break. What the audit, the rebuild, and this post together stake out is *that the benchmark can now be defined at all*. v2, the verifier, and the mechanical merge gate together are the missing primitive that lets "speed of parallel agent refactor" be measured as a property of the codebase rather than as folklore about the agents.
140+
141+## Why this is a SAMA-enabled property specifically
142+
143+A claim worth being careful about: this projection isn't about AI agents being fast. Agents have been "fast" for years. The cap was always **integration**. The four properties SAMA v2 enforces that make parallel decomposition mergeable:
144+
145+1. **Atomic** (700-LOC cap) → work-package scope is bounded. WP-5's printer-split fits in a single agent's working context. So does WP-6's walk-split. So does each output file produced. The agent does not need to see "the whole printer crate" to do its job; it needs `standard.rs` plus the public surface of the rest of the crate.
146+
147+2. **Architecture** (layer mapping) → work-package boundaries align with merge boundaries. WP-5 lives entirely in Layer 1 sublayer "algorithm." WP-6 lives entirely in Layer 2. They literally cannot touch each other's layer files because the profile says so and the verifier rejects PRs that do.
148+
149+3. **Sorted** (under v2.1's directory dialect) → the dependency direction is publicly readable. The orchestrator can compute "WP-6 depends on Layer 0 + Layer 1 results" by reading the profile and the crate graph; it does not have to ask an agent or guess.
150+
151+4. **Modeled** (sibling tests; under v2.1's inline-tests mode) → each agent ships its own tests with each split file. WP-5's six produced files each contain `#[cfg(test)] mod tests` blocks; the verifier checks they exist; `cargo test --package printer` checks they pass. No central "integration test team" bottleneck — the test is the agent's responsibility, located in the same file as the code, gated mechanically.
152+
153+Without those four properties, every multi-agent refactor attempt I've seen has run aground in the same way: agents start with disjoint scopes but converge in the merge phase because nothing structural was keeping them disjoint. The merge becomes a fifth task at least as expensive as any of the four refactor tasks. SAMA v2 is the architectural standard that says: **the scope each agent saw is the scope each agent's merge gate enforces**.
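
To make "gated mechanically" concrete, here is what the Atomic check from the list above could look like as a merge gate. It is a sketch under assumptions, not the SAMA verifier: the 700-LOC cap and the declarative exemption come from the manifest invariant, while `ProducedFile`, `atomicViolations`, the file names, and the line counts are all hypothetical.

```typescript
interface ProducedFile {
  path: string;
  loc: number; // line count, however the verifier chooses to count it
}

const ATOMIC_CAP_LOC = 700;

// A file violates the Atomic rule if it exceeds the cap and is not
// listed as declarative-exempt in the profile.
function atomicViolations(files: ProducedFile[], exempt: Set<string>): ProducedFile[] {
  return files.filter((f) => f.loc > ATOMIC_CAP_LOC && !exempt.has(f.path));
}

// Hypothetical output of a split; names and sizes are made up for illustration.
const produced: ProducedFile[] = [
  { path: "crates/printer/src/standard/mod.rs", loc: 640 },
  { path: "crates/printer/src/standard/sink.rs", loc: 910 },
  { path: "crates/printer/src/standard/tables.rs", loc: 1200 },
];
const exemptPaths = new Set(["crates/printer/src/standard/tables.rs"]);

const violations = atomicViolations(produced, exemptPaths);
console.log(violations.map((f) => `${f.path}: ${f.loc} LOC`));
```

The verdict is binary in the sense the post relies on: an empty list passes the gate, and anything else sends the branch back to the agent that produced it.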
154+
155+## What this post is and is not
156+
157+**Is**: a careful projection from the rebuild sketch's serial-human estimates to a parallel-agent decomposition, with the work-package boundaries derived from the actual file-and-layer structure of ripgrep + the v2.1 dialects.
158+
159+**Is not**: a measured benchmark. No fleet has actually executed this manifest against ripgrep. The 8-hour number is the rebuild post's 8-working-day number divided by sane parallelism + some buffer for verifier roundtrips and Rust compile times. The numbers in the timeline are projection-grade, not measurement-grade.
160+
161+**The §6 hook** that makes the projection eventually testable: §5 of the v2 spec already says *"compliance proves the rules were followed; the delta is what proves the rules were worth following."* This post identifies *a new delta v2 can take credit for that no other architectural standard can*: parallel-refactor wall-clock. The cost of a refactor under v2-management is a separate, falsifiable empirical property — one that doesn't even exist as a measurable quantity in arbitrary codebases, because in arbitrary codebases parallel refactors don't merge cleanly.
162+
163+If §6 promotes the three v2.1 dialects, a follow-up experiment writes itself:
164+
165+1. Fork a v2-conforming open-source repo (this site, eventually ripgrep, eventually dive).
166+2. Generate a manifest like the one above.
167+3. Run a real fleet under a real orchestrator.
168+4. Measure: wall-clock to verifier-green merged main, number of agent-attempts per WP, number of orchestrator escalations, post-merge defect rate.
169+5. Compare against the same refactor done by a single agent serially, against the same refactor done by a single human serially, and (the cross-spec comparison the whole §5 + §6 program is for) against the same refactor attempted on a non-v2 codebase.
170+
171+The interesting comparison is not "how fast was the agent fleet". It's the *fourth* arm of the experiment (counting the fleet run itself as the first), the non-v2 attempt, which is the one we expect to never finish because the work packages won't stay disjoint.
172+
173+That's the experiment SAMA v2's empirical program is laying cable for, three blog posts at a time.
174+
175+## Three projections, three datapoints, one bracket
176+
177+The series so far on this Rust example:
178+
179+| post | scope | wall-clock estimate | confidence |
180+|---|---|---|---|
181+| [the audit](/blog/sama-v2-rust-project-ripgrep) | score ripgrep as-is against §4 | n/a (read-only) | empirical (the source was read) |
182+| [the rebuild](/blog/sama-v2-rust-project-ripgrep-rebuilt) | serial-human refactor to 7/7 ✓ | ~8 working days | informed estimate from concrete file deltas |
183+| this post | parallel-fleet refactor to 7/7 ✓ | ~8 wall-clock hours | projection, ~10× compression from above |
184+
185+Each post tightens a different lever:
186+- The audit tells you *where* the codebase sits today.
187+- The rebuild tells you *what* it costs to get to compliant.
188+- The fleet projection tells you *how that cost decomposes when the merge gate is mechanical*.
189+
190+None of the three is a measured "v2 is worth following" claim by itself. Together they are the empirical chain §5 + §6 are pointing at: define the metrics, show what changes when the rules are followed, project what becomes mechanically possible when the verifier is the merge gate, and then — eventually — run the experiment that converts each projection into a measurement.
191+
192+---
193+
194+**Companion posts:**
195+
196+- [The ripgrep audit](/blog/sama-v2-rust-project-ripgrep) — the source of the work-package list
197+- [The ripgrep rebuild sketch](/blog/sama-v2-rust-project-ripgrep-rebuilt) — the serial-human cost estimate this post divides by parallelism
198+- [The dive rebuild](/blog/sama-v2-go-project-dive-rebuilt) — equivalent decomposition on a Go codebase, ~10 days serial; a parallel-fleet projection would land a similar 10× compression
199+- [The §5 metrics emitter](/blog/sama-v2-metrics-emitter) — the empirical apparatus the §6 experiment plugs into
200+- [The v2 spec](/sama/v2) — particularly §4 (Atomic, Architecture, Sorted, Modeled) and §6 (evolution policy)
modified content/blog/sama-v2-rust-project-ripgrep-rebuilt.md +3 −0
@@ -391,6 +391,8 @@ No new tests need to be written — `tests = "inline"` recognises the 38 source
391391
392392 For context, the [WordPress plugin parallel-architecture rebuild](/blog/sama-v2-wordpress-plugin-rebuilt) required splitting a 1,554-line public god-class into eleven files, redesigning the settings option as a typed value, and writing 20+ test files from scratch. Months of work, real risk of breaking the PRO add-on, WooCommerce, Yoast, and AIOSEO integrations. `dive` to 7/7 was ten working days of test writing plus one package split. `ripgrep` to 7/7 is one focused week of file splitting plus a TOML file.
393393
394+(One focused week is the *serial-human* number. For the parallel-fleet projection that divides this estimate by SAMA-mechanical work-package boundaries and lands on ~8 wall-clock hours, see the [companion post](/blog/sama-v2-rust-project-ripgrep-parallel-fleet).)
395+
394396 ## Predicted §5 metrics for the rebuilt ripgrep
395397
396398 | metric | ripgrep today (estimated) | ripgrep rebuilt (predicted) | dive rebuilt | tdd.md (measured) |
@@ -425,6 +427,7 @@ Four observations:
425427
426428 **Companion posts:**
427429
430+- **[The same rebuild, run by a fleet of AI agents in parallel](/blog/sama-v2-rust-project-ripgrep-parallel-fleet)** — projecting this post's ~8-working-day serial estimate into ~8 wall-clock hours under SAMA-mechanical merge gates
428431 - [Today's `ripgrep` audit](/blog/sama-v2-rust-project-ripgrep) — where the 3/7-strict, 5/7-with-dialects score comes from, and the three findings this rebuild assumes get adopted
429432 - [The `dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt) — same exercise on a Go codebase, the directory-dialect's first appearance
430433 - [The `dive` prefix-scheme variant](/blog/sama-v2-go-project-dive-prefix-scheme) — what the dramatic file-rename refactor costs in Go (and would cost even more in Rust)
modified src/a31_blog.ts +6 −0
@@ -12,6 +12,12 @@ export interface BlogEntry {
1212 }
1313
1414 export const ALL_POSTS: BlogEntry[] = [
15+ {
16+ slug: "sama-v2-rust-project-ripgrep-parallel-fleet",
17+ title: "The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection",
18+ description: "Yesterday's ripgrep rebuild sketch estimated ~1 focused working week (~8 working days) for one careful human. This post is the companion projection Bas asked: what does the same refactor look like if executed by a fleet of AI agents running in parallel across the planet, under strict SAMA v2 management? Honest answer: wall-clock projection collapses from ~8 working days to ~8 wall-clock hours, ~10× compression. The load-bearing reason isn't 'AI is fast' — agents have been fast for years. The cap was always integration: pre-SAMA, every multi-agent refactor experiment grounded on the same three failure modes (scope creep, style drift, no mechanical merge gate). SAMA v2 dissolves all three because every architectural rule is also a merge gate. Decomposes the rebuild into 8 work packages — 4 profile-only WPs (serial chain, ~2h) + 4 code-split WPs (parallel, ~6h) — and demonstrates that the file-prefix + directory-layered crate graph give you a property no other architecture standard does: work-package boundaries are physically non-overlapping, so the orchestrator can scope-fence each agent's workspace with forbidden_paths and the agents literally cannot reach across boundaries. Plausible fleet composition with capability-matched task assignment (Opus on the hardest split, Haiku on TOML patches), 24-hour wall-clock coverage by spreading across time zones, mechanical verifier as merge gate so mixed-model output is fine. Timeline diagram showing T+0 through T+8h, then the section that matters most: why this is a SAMA-enabled property specifically — Atomic bounds working-context, Architecture aligns work-package boundaries with merge boundaries, Sorted makes dependency direction publicly readable to the orchestrator, Modeled makes tests the agent's local responsibility rather than a central bottleneck. Careful about framing: this is a projection from concrete file deltas + sane parallelism, not a measurement. The §6 experiment that would convert this into measurement is sketched at the end — fork a v2-conforming repo, generate the manifest, run the fleet, measure wall-clock vs serial vs serial-human vs the same refactor attempted on a non-v2 codebase. The interesting comparison is the fourth row, the non-v2 attempt — the one we expect to never finish because the work packages won't stay disjoint. That's the SAMA-specific empirical claim this post lays cable for.",
19+ date: "2026-05-26",
20+ },
1521 {
1622 slug: "sama-v2-rust-project-ripgrep-rebuilt",
1723 title: "`ripgrep`, rebuilt under SAMA v2 — a thought experiment",