| 1 | +# The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection |
| 2 | + |
| 3 | +[Yesterday's `ripgrep` rebuild sketch](/blog/sama-v2-rust-project-ripgrep-rebuilt) estimated **~1 focused working week** for one careful human to land the changes: write the `sama.profile.toml`, split `printer/standard.rs` into a six-file submodule, split `ignore/walk.rs` into a four-file submodule. Plus a few more days for the deferred splits. Eight working days total, end to end. |
| 4 | + |
| 5 | +That's the *serial-human* estimate. This post is the companion question Bas asked: *what does the same refactor look like if it's executed by a fleet of AI agents running in parallel, spread across the planet, under strict SAMA v2 management?* |
| 6 | + |
| 7 | +The honest answer: **the wall-clock projection collapses from ~8 working days to ~8 wall-clock hours, and the load-bearing reason isn't "AI is fast" — it's that SAMA v2 turns each work-package boundary into a mergeable boundary**, which is the part that has historically broken every "let multiple agents refactor in parallel" experiment. |
| 8 | + |
| 9 | +This post is a *projection*, not a measurement. The number it lands on is conservative; the reasoning is the part to read carefully. |
| 10 | + |
| 11 | +## Why parallel agent refactor has been a non-starter — until now |
| 12 | + |
| 13 | +Pre-SAMA, the cap on multi-agent parallelism was never agent speed. It was *integration*. Two agents asked to refactor the same module produce two divergent designs; even when each is internally consistent, merging them is a hand-rolled meta-task that costs more than having one agent do the work serially. So the field has mostly run one agent at a time, or sharded work along brittle file boundaries that agents end up violating anyway. |
| 14 | + |
| 15 | +The three things that actually break parallel agent work: |
| 16 | + |
| 17 | +1. **Scope creep.** Agent A asked to "split `walk.rs`" decides the parallel walker also needs a refactor of `dir.rs` and the gitignore parser, because they're "obviously coupled." Now agent C (assigned to `dir.rs`) has a merge conflict and zero context on agent A's design. |
| 18 | +2. **Style drift.** Two agents producing 600-line files in the same crate format types differently, name variables differently, and structure imports differently. Either the maintainer hand-polishes both back to one style (which defeats the parallelism) or accepts visible inconsistency in the diff (which defeats the maintainability that motivated the refactor). |
| 19 | +3. **No mechanical merge gate.** When agent B's PR lands, *something* has to check that B's design still composes with A's design — and absent a verifier, that check is *taste*, which is exactly the resource you ran out of when you decided to spawn a fleet. |
| 20 | + |
| 21 | +SAMA v2 dissolves all three, in the same way and for the same reason: **every architectural rule is also a merge gate**. Agents don't have to agree on style as long as they each pass the verifier. The verifier is the same TypeScript code for every agent, mechanically applied, returning a binary verdict. |
| 22 | + |
| 23 | +## The work-package decomposition of the ripgrep rebuild |
| 24 | + |
| 25 | +The previous post listed the mandatory + deferred changes as a single human-serial table. Here is the same table, re-cut as **eight work packages**, each scoped to a single branch and a single mergeable verifier-passing PR: |
| 26 | + |
| 27 | +| WP# | scope | files touched | LOC delta | parallel-safe? | |
| 28 | +|---|---|---|---|---| |
| 29 | +| WP-1 | Write `sama.profile.toml` declaring layout=directory + tests=inline + atomic_exemption=declarative | `sama.profile.toml` (new) | +18 | yes — no other WP reads this until merge | |
| 30 | +| WP-2 | Add profile note: serde derives ≠ boundary parsing | `sama.profile.toml` (1 line) | +1 | yes, but WP-1 must merge first | |
| 31 | +| WP-3 | Add profile note: `searcher/line_buffer.rs` byte parsing is algorithm | `sama.profile.toml` (1 line) | +1 | yes, but WP-1 must merge first | |
| 32 | +| WP-4 | Mark four declarative-exempt files | `sama.profile.toml` (4 lines) | +4 | yes, but WP-1 must merge first | |
| 33 | +| WP-5 | Split `crates/printer/src/standard.rs` (3,987 LOC) → 6-file submodule | only `crates/printer/src/standard*` | ±3,987 | yes — no other WP touches printer/ | |
| 34 | +| WP-6 | Split `crates/ignore/src/walk.rs` (2,494 LOC) → 4-file submodule | only `crates/ignore/src/walk*` | ±2,494 | yes — no other WP touches ignore/walk | |
| 35 | +| WP-7 | Split `crates/searcher/src/searcher/glue.rs` (1,549 LOC) → 3-file submodule | only `crates/searcher/src/searcher/glue*` | ±1,549 | yes — no other WP touches that path | |
| 36 | +| WP-8 | Split `crates/ignore/src/dir.rs` (1,305 LOC) → 2-file submodule | only `crates/ignore/src/dir*` | ±1,305 | yes — does not overlap WP-6's `walk*` files | |
| 37 | + |
| 38 | +**Every code-split work package (WP-5 through WP-8) is scoped to a path prefix that no other work package touches.** That is not coincidence; it is the property the `Atomic` rule and the directory-layered crate graph give you for free. Once a refactor decomposes along verifier-recognized boundaries, the decomposition is *also* a non-overlapping merge plan. |
| 39 | + |
| 40 | +The four profile-only WPs (1–4) serialize behind WP-1 because they all edit the same single file. But each follow-up is a patch of at most four lines; together they merge in under a minute. The four code-split WPs (5–8) parallelize fully: they touch four disjoint path prefixes. |
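
The ordering described above can be sketched with Python's standard `graphlib`. This is a minimal illustration, assuming the profile patches form a strict chain and the code splits are independent nodes; the function name `merge_waves` and the idea of grouping packages into "waves" are assumptions of this sketch, not part of any SAMA tooling.

```python
from graphlib import TopologicalSorter

# Work-package ids from the table above.
SERIAL_CHAIN = ["WP-1", "WP-2", "WP-3", "WP-4"]   # all edit sama.profile.toml
PARALLEL = ["WP-5", "WP-6", "WP-7", "WP-8"]       # disjoint path prefixes


def merge_waves(serial_chain, parallel):
    """Group work packages into dependency waves.

    Every package in a wave has all of its prerequisites in earlier waves.
    Waves express ordering only; they say nothing about wall-clock duration.
    """
    sorter = TopologicalSorter()
    for wp in parallel:
        sorter.add(wp)                      # no predecessors
    if serial_chain:
        sorter.add(serial_chain[0])         # head of the chain is also free
    for prev, nxt in zip(serial_chain, serial_chain[1:]):
        sorter.add(nxt, prev)               # nxt must wait for prev
    sorter.prepare()

    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


WAVES = merge_waves(SERIAL_CHAIN, PARALLEL)
```

Under these assumptions `WAVES[0]` holds WP-1 together with all four code splits, which is the set of packages that can start at T+0; WP-2, WP-3, and WP-4 each follow in a wave of their own.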
| 41 | + |
| 42 | +## The fleet manifest |
| 43 | + |
| 44 | +A hypothetical SAMA-aware orchestrator (let's call it `sama-fleet`) reads the profile, the audit's findings, and the rebuild sketch, then emits a manifest: |
| 45 | + |
| 46 | +```yaml |
| 47 | +# sama-fleet.manifest.yaml — generated from the audit + rebuild sketch |
| 48 | +target: BurntSushi/ripgrep@main |
| 49 | +spec_version: sama_v2.1 |
| 50 | +deadline_hours: 8 |
| 51 | + |
| 52 | +work_packages: |
| 53 | + - id: WP-1 |
| 54 | + title: "Write sama.profile.toml" |
| 55 | + base_branch: main |
| 56 | + output_branch: sama-WP-1-profile |
| 57 | + scope_paths: ["sama.profile.toml"] |
| 58 | + estimated_minutes: 30 |
| 59 | + merge_gate: [cargo-check, sama-v2-verify] |
| 60 | + blocks: [WP-2, WP-3, WP-4] |
| 61 | + |
| 62 | + - id: WP-5 |
| 63 | + title: "Split crates/printer/src/standard.rs into a six-file submodule" |
| 64 | + base_branch: main |
| 65 | + output_branch: sama-WP-5-printer-split |
| 66 | + scope_paths: ["crates/printer/src/standard*"] |
| 67 | + forbidden_paths: ["crates/printer/src/!(standard*)", "crates/!(printer)/**"] |
| 68 | + estimated_minutes: 360 |
| 69 | +    merge_gate: [cargo-check, cargo-test --package grep-printer, sama-v2-verify] |
| 70 | + invariants: |
| 71 | + - "every pub item in standard.rs before == pub-reachable from same path after" |
| 72 | + - "printer crate test suite still passes" |
| 73 | + - "atomic cap: every produced .rs file <= 700 LOC OR declarative-exempt" |
| 74 | + |
| 75 | + - id: WP-6 |
| 76 | + title: "Split crates/ignore/src/walk.rs into a four-file submodule" |
| 77 | + base_branch: main |
| 78 | + output_branch: sama-WP-6-walk-split |
| 79 | + scope_paths: ["crates/ignore/src/walk*"] |
| 80 | + forbidden_paths: ["crates/ignore/src/!(walk*)", "crates/!(ignore)/**"] |
| 81 | + estimated_minutes: 240 |
| 82 | + merge_gate: [cargo-check, cargo-test --package ignore, sama-v2-verify] |
| 83 | + |
| 84 | + # ... WP-2 .. WP-4, WP-7, WP-8 follow the same shape ... |
| 85 | + |
| 86 | +merge_policy: |
| 87 | + serial_chain: [WP-1, WP-2, WP-3, WP-4] # profile patches, trivial |
| 88 | + parallel: [WP-5, WP-6, WP-7, WP-8] # code-split branches |
| 89 | + final_gate: full-sama-v2-verify # 7 / 7 on the unified main |
| 90 | +``` |
| 91 | + |
| 92 | +`scope_paths` and `forbidden_paths` are the load-bearing fields. The orchestrator hands each agent a *workspace shadow* with the `forbidden_paths` made read-only at the filesystem level. An agent assigned WP-5 *physically cannot* edit `crates/ignore/` even if its model decides midway through that "the printer split is best done by also refactoring the ignore walker." This is the kinetic version of the §1.2 Law — "agents cannot reach across boundaries" — enforced before any verifier ever runs. |
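
The filesystem fence could be complemented by a diff-level check. The sketch below is one hedged way to do that with Python's `fnmatch`, using only the `scope_paths` allowlist; the extglob-style `!(...)` patterns in `forbidden_paths` are not something `fnmatch` understands, so they are deliberately left out. The function name `out_of_scope` and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_paths, scope_globs):
    """Return the changed paths that match none of the allowed scope globs.

    Note: fnmatch's `*` also matches `/`, so `standard*` covers both
    `standard.rs` and anything under a `standard/` submodule directory.
    """
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, glob) for glob in scope_globs)
    ]


# scope_paths for WP-5, copied from the manifest.
WP5_SCOPE = ["crates/printer/src/standard*"]

# A hypothetical diff produced by the WP-5 agent.
WP5_DIFF = [
    "crates/printer/src/standard/mod.rs",   # hypothetical split output
    "crates/printer/src/standard.rs",
    "crates/ignore/src/walk.rs",            # belongs to WP-6, not WP-5
]

VIOLATIONS = out_of_scope(WP5_DIFF, WP5_SCOPE)
```

Here `VIOLATIONS` contains only `crates/ignore/src/walk.rs`. An empty list would mean the diff stays inside the package's fence; a non-empty one is a reason to reject the branch before the verifier runs.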
| 93 | + |
| 94 | +## The fleet (a plausible composition) |
| 95 | + |
| 96 | +The point of describing the fleet across the planet isn't location; it's *model diversity*. SAMA's verifier doesn't care what model produced the diff, only whether the diff conforms. So the fleet can be deliberately heterogeneous: |
| 97 | + |
| 98 | +| agent | model | region (24h coverage) | assignment | |
| 99 | +|---|---|---|---| |
| 100 | +| α | Claude Opus 4.7 | US-West (San Francisco) | WP-5 (printer split — largest, most complex) | |
| 101 | +| β | Claude Sonnet 4.6 | Asia (Tokyo) | WP-6 (ignore/walk split — second largest) | |
| 102 | +| γ | Claude Haiku 4.5 | Asia (Bangalore) | WP-7 (searcher/glue split — medium) | |
| 103 | +| δ | Claude Sonnet 4.6 | Europe (Berlin) | WP-8 (ignore/dir split — smallest code WP) | |
| 104 | +| ε | Claude Haiku 4.5 | South America (São Paulo) | WP-1 → WP-2 → WP-3 → WP-4 (profile + notes, serial chain) | |
| 105 | +| ω | Claude Opus 4.7 | (orchestrator) | dispatches, monitors verifier output, merges in order, escalates ambiguity | |
| 106 | + |
| 107 | +Two reasons this composition matters under SAMA v2: |
| 108 | + |
| 109 | +**Capability-matched task assignment.** Splitting a 3,987-LOC printer file with five output modes is genuinely the hardest WP, so give it to the strongest model. Inserting four exemption flags into a TOML file is the easiest, so give it to the smallest model that can reliably edit TOML. Mixed-capability fleets only work when the merge gate is purely mechanical, because otherwise a maintainer would have to hand-review the small-model patches anyway. SAMA's verifier is purely mechanical, so the mix works. |
| 110 | + |
| 111 | +**24-hour wall-clock coverage.** Even if some agents take longer than projected, work proceeds around the clock. The orchestrator never sleeps; agents in different regions step in as one finishes. Compared to a single careful human's working-week budget — even a fast human only works ~8 hours/day — the fleet runs ~24 productive hours/day from minute zero. |
| 112 | + |
| 113 | +## The wall-clock timeline |
| 114 | + |
| 115 | +Same eight work packages, plotted against an 8-hour wall clock. Times in (orchestrator-local) hours from start: |
| 116 | + |
| 117 | +``` |
| 118 | +time α (printer) β (walk) γ (glue) δ (dir) ε (profile chain) |
| 119 | +──── ───────────── ─────────── ────────── ───────── ────────────────── |
| 120 | +T+0h read sketch read sketch read sketch read sketch WP-1 begin |
| 121 | +T+0h scope-fenced workspaces handed out by orchestrator ω |
| 122 | +T+0.5h plan + tree plan + tree plan + tree plan + tree WP-1 done → CI |
| 123 | +T+1h code code code code WP-2 begin → CI |
| 124 | +T+1.5h " " " test WP-3 begin → CI |
| 125 | +T+2h " " test verify ✓ WP-4 begin → CI |
| 126 | +T+2.5h " test verify ✓ merge ω all profile WPs merged |
| 127 | +T+3h " verify ✓ merge ω |
| 128 | +T+3.5h " merge ω ─ |
| 129 | +T+4h test ─ |
| 130 | +T+5h verify ✓ |
| 131 | +T+5.5h merge ω |
| 132 | +T+6h ─── final-gate verify on unified main ── 7/7 ✓ ── |
| 133 | +T+6–8h integration tests (cargo test --workspace), perf benchmarks, |
| 134 | + smoke tests against real corpora, ω signs off → PR ready for human |
| 135 | +``` |
| 136 | + |
| 137 | +Total wall clock: **~8 hours** from "orchestrator reads the sketch" to "single PR ready for human-final-approval." |
| 138 | + |
| 139 | +Compared to the rebuild sketch's serial-human estimate of ~8 working days, that's **roughly a 10× wall-clock compression**. It is not a record, because no benchmark exists yet to break. What the audit, the rebuild, and this post together stake out is *that the benchmark can now be defined at all*. v2 plus the verifier plus the mechanical merge gate is the missing primitive that lets "speed of parallel agent refactor" be measured as a property of the codebase rather than as folklore about the agents. |
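
The arithmetic behind the timeline can be made explicit. The sketch below reads the per-lane merge times off the chart and assumes an 8-hour working day (the figure used earlier for a human's day). The single-agent number is a naive sum of the lanes, which is an assumption of this sketch rather than an estimate from the rebuild post.

```python
# Merge times in hours from start, read off the timeline chart.
MERGE_AT = {
    "alpha/WP-5": 5.5,
    "beta/WP-6": 3.5,
    "gamma/WP-7": 3.0,
    "delta/WP-8": 2.5,
    "epsilon/WP-1..4": 2.5,
}

SIGN_OFF = 8.0               # end of the T+6-8h integration window
HOURS_PER_WORKING_DAY = 8    # assumption: one human working day
SERIAL_HUMAN_DAYS = 8        # from the rebuild sketch

# The slowest lane determines when the final gate can start.
critical_path = max(MERGE_AT.values())

# Everything after the last merge: final-gate verify plus integration.
tail = SIGN_OFF - critical_path

# Naive assumption: one agent runs every lane back to back, then the tail.
naive_single_agent = sum(MERGE_AT.values()) + tail

# Compression measured in working hours rather than calendar time.
compression = SERIAL_HUMAN_DAYS * HOURS_PER_WORKING_DAY / SIGN_OFF
```

On these inputs the slowest lane (5.5 h) sets the pace, the naive single-agent total is 19.5 h, and the working-hours ratio is 64 / 8 = 8, the same order of magnitude as the rounded 10× figure.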
| 140 | + |
| 141 | +## Why this is a SAMA-enabled property specifically |
| 142 | + |
| 143 | +A claim worth being careful about: this projection isn't about AI agents being fast. Agents have been "fast" for years. The cap was always **integration**. The four properties SAMA v2 enforces that make parallel decomposition mergeable: |
| 144 | + |
| 145 | +1. **Atomic** (700-LOC cap) → work-package scope is bounded. WP-5's printer-split fits in a single agent's working context. So does WP-6's walk-split. So does each output file produced. The agent does not need to see "the whole printer crate" to do its job; it needs `standard.rs` plus the public surface of the rest of the crate. |
| 146 | + |
| 147 | +2. **Architecture** (layer mapping) → work-package boundaries align with merge boundaries. WP-5 lives entirely in Layer 1 sublayer "algorithm." WP-6 lives entirely in Layer 2. They literally cannot touch each other's layer files because the profile says so and the verifier rejects PRs that do. |
| 148 | + |
| 149 | +3. **Sorted** (under v2.1's directory dialect) → the dependency direction is publicly readable. The orchestrator can compute "WP-6 depends on Layer 0 + Layer 1 results" by reading the profile and the crate graph; it does not have to ask an agent or guess. |
| 150 | + |
| 151 | +4. **Modeled** (sibling tests; under v2.1's inline-tests mode) → each agent ships its own tests with each split file. WP-5's six produced files each contain `#[cfg(test)] mod tests` blocks; the verifier checks they exist; `cargo test --package grep-printer` checks they pass. There is no central "integration test team" bottleneck: the test is the agent's responsibility, located in the same file as the code, gated mechanically. |
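
The file counts in the work-package table can be sanity-checked against the 700-LOC cap from item 1. The sketch below is plain ceiling division over the LOC figures quoted in the table; the helper names `min_files` and `headroom` are illustrative.

```python
ATOMIC_CAP = 700  # LOC cap per file, from the Atomic rule

# source file -> (current LOC, planned number of output files), from the WP table
SPLITS = {
    "standard.rs": (3987, 6),
    "walk.rs": (2494, 4),
    "glue.rs": (1549, 3),
    "dir.rs": (1305, 2),
}


def min_files(loc, cap=ATOMIC_CAP):
    """Smallest file count that keeps every file at or under the cap."""
    return -(-loc // cap)  # integer ceiling division


def headroom(loc, files, cap=ATOMIC_CAP):
    """Total spare lines across the planned files before the cap is hit."""
    return files * cap - loc


MIN_FILES = {name: min_files(loc) for name, (loc, _) in SPLITS.items()}
HEADROOM = {name: headroom(loc, files) for name, (loc, files) in SPLITS.items()}
```

Each planned count equals the lower bound, so the slack is thin: 213 lines across six files for `standard.rs` and 95 across two for `dir.rs`. Any lines the split itself adds (module declarations, re-exports, extra `use` lines) would have to fit inside that slack unless a file is declared exempt.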
| 152 | + |
| 153 | +Without those four properties, every multi-agent refactor attempt I've seen has run aground in the same way: agents start with disjoint scopes but converge in the merge phase because nothing structural was keeping them disjoint. The merge becomes a fifth task at least as expensive as any of the four refactor tasks. SAMA v2 is the architectural standard that says: **the scope each agent saw is the scope each agent's merge gate enforces**. |
| 154 | + |
| 155 | +## What this post is and is not |
| 156 | + |
| 157 | +**Is**: a careful projection from the rebuild sketch's serial-human estimates to a parallel-agent decomposition, with the work-package boundaries derived from the actual file-and-layer structure of ripgrep + the v2.1 dialects. |
| 158 | + |
| 159 | +**Is not**: a measured benchmark. No fleet has actually executed this manifest against ripgrep. The 8-hour number is the rebuild post's 8-working-day number divided by sane parallelism + some buffer for verifier roundtrips and Rust compile times. The numbers in the timeline are projection-grade, not measurement-grade. |
| 160 | + |
| 161 | +**The §6 hook** that makes the projection eventually testable: §5 of the v2 spec already says *"compliance proves the rules were followed; the delta is what proves the rules were worth following."* This post identifies *a new delta v2 can take credit for that no other architectural standard can*: parallel-refactor wall-clock. The cost of a refactor under v2-management is a separate, falsifiable empirical property — one that doesn't even exist as a measurable quantity in arbitrary codebases, because in arbitrary codebases parallel refactors don't merge cleanly. |
| 162 | + |
| 163 | +If §6 promotes the three v2.1 dialects, a follow-up experiment writes itself: |
| 164 | + |
| 165 | +1. Fork a v2-conforming open-source repo (this site, eventually ripgrep, eventually dive). |
| 166 | +2. Generate a manifest like the one above. |
| 167 | +3. Run a real fleet under a real orchestrator. |
| 168 | +4. Measure: wall-clock to verifier-green merged main, number of agent-attempts per WP, number of orchestrator escalations, post-merge defect rate. |
| 169 | +5. Compare against the same refactor done by a single agent serially, against the same refactor done by a single human serially, and (the cross-spec comparison the whole §5 + §6 program is for) against the same refactor attempted on a non-v2 codebase. |
| 170 | + |
| 171 | +The interesting comparison is not "how fast was the agent fleet"; it's the *fourth* arm, the non-v2 attempt, which is the one we expect to never finish because the work packages won't stay disjoint. |
| 172 | + |
| 173 | +That's the experiment SAMA v2's empirical program is laying cable for, three blog posts at a time. |
| 174 | + |
| 175 | +## Three projections, three datapoints, one bracket |
| 176 | + |
| 177 | +The series so far on this Rust example: |
| 178 | + |
| 179 | +| post | scope | wall-clock estimate | confidence | |
| 180 | +|---|---|---|---| |
| 181 | +| [the audit](/blog/sama-v2-rust-project-ripgrep) | score ripgrep as-is against §4 | n/a (read-only) | empirical (the source was read) | |
| 182 | +| [the rebuild](/blog/sama-v2-rust-project-ripgrep-rebuilt) | serial-human refactor to 7/7 ✓ | ~8 working days | informed estimate from concrete file deltas | |
| 183 | +| this post | parallel-fleet refactor to 7/7 ✓ | ~8 wall-clock hours | projection, ~10× compression from above | |
| 184 | + |
| 185 | +Each post pulls a different lever: |
| 186 | +- The audit tells you *where* the codebase sits today. |
| 187 | +- The rebuild tells you *what* it costs to get to compliant. |
| 188 | +- The fleet projection tells you *how that cost decomposes when the merge gate is mechanical*. |
| 189 | + |
| 190 | +None of the three is a measured "v2 is worth following" claim by itself. Together they are the empirical chain §5 + §6 are pointing at: define the metrics, show what changes when the rules are followed, project what becomes mechanically possible when the verifier is the merge gate, and then — eventually — run the experiment that converts each projection into a measurement. |
| 191 | + |
| 192 | +--- |
| 193 | + |
| 194 | +**Companion posts:** |
| 195 | + |
| 196 | +- [The ripgrep audit](/blog/sama-v2-rust-project-ripgrep) — the source of the work-package list |
| 197 | +- [The ripgrep rebuild sketch](/blog/sama-v2-rust-project-ripgrep-rebuilt) — the serial-human cost estimate this post divides by parallelism |
| 198 | +- [The dive rebuild](/blog/sama-v2-go-project-dive-rebuilt) — equivalent decomposition on a Go codebase, ~10 days serial; parallel-fleet projection would land similar 10× compression |
| 199 | +- [The §5 metrics emitter](/blog/sama-v2-metrics-emitter) — the empirical apparatus the §6 experiment plugs into |
| 200 | +- [The v2 spec](/sama/v2) — particularly §4 (Atomic, Architecture, Sorted, Modeled) and §6 (evolution policy) |