b182ea556565857e23be3b2c9beab0326e78882b
diff --git a/content/blog/sama-v2-rust-project-ripgrep-parallel-fleet.md b/content/blog/sama-v2-rust-project-ripgrep-parallel-fleet.md
new file mode 100644
index 0000000000000000000000000000000000000000..c7e60f70f7fd99cf48a9d29f7830a022b066b94f
--- /dev/null
+++ b/content/blog/sama-v2-rust-project-ripgrep-parallel-fleet.md
@@ -0,0 +1,200 @@
+# The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection
+
+[Yesterday's `ripgrep` rebuild sketch](/blog/sama-v2-rust-project-ripgrep-rebuilt) estimated **~1 focused working week** for one careful human to land the changes: write the `sama.profile.toml`, split `printer/standard.rs` into a six-file submodule, split `ignore/walk.rs` into a four-file submodule. Plus a few more days for the deferred splits. Eight working days total, end to end.
+
+That's the *serial-human* estimate. This post takes up the companion question Bas asked: *what does the same refactor look like if it's executed by a fleet of AI agents running in parallel, spread across the planet, under strict SAMA v2 management?*
+
+The honest answer: **the wall-clock projection collapses from ~8 working days to ~8 wall-clock hours, and the load-bearing reason isn't "AI is fast" — it's that SAMA v2 turns each work-package boundary into a mergeable boundary**, which is the part that has historically broken every "let multiple agents refactor in parallel" experiment.
+
+This post is a *projection*, not a measurement. The number it lands on is conservative; the reasoning is the part to read carefully.
+
+## Why parallel agent refactor has been a non-starter — until now
+
+Pre-SAMA, the cap on multi-agent parallelism isn't agent speed. It's *integration*. Two agents asked to refactor the same module produce two divergent designs; even when each is internally consistent, merging them is a hand-rolled meta-task that exceeds the cost of having one agent do the work serially. So the field has mostly run one agent at a time, or sharded work along brittle file boundaries that agents end up violating anyway.
+
+Three things actually break parallel agent work:
+
+1. **Scope creep.** Agent A asked to "split `walk.rs`" decides the parallel walker also needs a refactor of `dir.rs` and the gitignore parser, because they're "obviously coupled." Now agent C (assigned to `dir.rs`) has a merge conflict and zero context on agent A's design.
+2. **Style drift.** Two agents producing 600-line files in the same crate format types differently, name variables differently, and structure imports differently. Either the maintainer hand-polishes both back to one style (which defeats the parallelism) or accepts visible inconsistency in the diff (which defeats the maintainability that motivated the refactor).
+3. **No mechanical merge gate.** When agent B's PR lands, *something* has to check that B's design still composes with A's design — and absent a verifier, that check is *taste*, which is exactly the resource you ran out of when you decided to spawn a fleet.
+
+SAMA v2 dissolves all three, in the same way and for the same reason: **every architectural rule is also a merge gate**. Agents don't have to agree on style as long as they each pass the verifier. The verifier is the same TypeScript code for every agent, mechanically applied, returning a binary verdict.
+
+## The work-package decomposition of the ripgrep rebuild
+
+The previous post listed the mandatory + deferred changes as a single human-serial table. The same table, re-cut as **eight work packages**, each scoped to a single branch and a single mergeable, verifier-passing PR:
+
+| WP# | scope | files touched | LOC delta | parallel-safe? |
+|---|---|---|---|---|
+| WP-1 | Write `sama.profile.toml` declaring layout=directory + tests=inline + atomic_exemption=declarative | `sama.profile.toml` (new) | +18 | yes — no other WP reads this until merge |
+| WP-2 | Add profile note: serde derives ≠ boundary parsing | `sama.profile.toml` (1 line) | +1 | yes, but WP-1 must merge first |
+| WP-3 | Add profile note: `searcher/line_buffer.rs` byte parsing is algorithm | `sama.profile.toml` (1 line) | +1 | yes, but WP-1 must merge first |
+| WP-4 | Mark four declarative-exempt files | `sama.profile.toml` (4 lines) | +4 | yes, but WP-1 must merge first |
+| WP-5 | Split `crates/printer/src/standard.rs` (3,987 LOC) → 6-file submodule | only `crates/printer/src/standard*` | ±3,987 | yes — no other WP touches printer/ |
+| WP-6 | Split `crates/ignore/src/walk.rs` (2,494 LOC) → 4-file submodule | only `crates/ignore/src/walk*` | ±2,494 | yes — no other WP touches ignore/walk |
+| WP-7 | Split `crates/searcher/src/searcher/glue.rs` (1,549 LOC) → 3-file submodule | only `crates/searcher/src/searcher/glue*` | ±1,549 | yes — no other WP touches that path |
+| WP-8 | Split `crates/ignore/src/dir.rs` (1,305 LOC) → 2-file submodule | only `crates/ignore/src/dir*` | ±1,305 | yes — does not overlap WP-6's `walk*` files |
+
+**Every code-split work package is scoped to a path prefix that no other work package touches.** That is not coincidence — it is the property the `Atomic` rule and the directory-layered crate graph give you for free. Once a refactor decomposes along verifier-recognized boundaries, the decomposition is *also* a non-overlapping merge plan.
+
+The four profile-only WPs (1–4) form a serial chain, with WP-2 through WP-4 queued behind WP-1, because they all edit the same single file. But WP-2 through WP-4 are patches of one to four lines each; together they merge in under a minute. The four code-split WPs (5–8) parallelize fully — they touch four disjoint directory subtrees.
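The disjointness claim in the table can be checked mechanically. The following is a minimal sketch, assuming each work package is described by literal path prefixes (the trailing `*` of the table's globs dropped); `WorkPackage`, `prefixesOverlap`, and `findOverlaps` are hypothetical names for illustration, not part of any published SAMA tooling.

```typescript
// Hypothetical shape: one work package and the path prefixes it may edit.
interface WorkPackage {
  id: string;
  scopePrefixes: string[];
}

// Two prefixes overlap when either one is a prefix of the other.
function prefixesOverlap(a: string, b: string): boolean {
  return a.startsWith(b) || b.startsWith(a);
}

// Returns every pair of work packages whose scopes overlap.
function findOverlaps(packages: WorkPackage[]): Array<[string, string]> {
  const overlaps: Array<[string, string]> = [];
  packages.forEach((first, index) => {
    for (const second of packages.slice(index + 1)) {
      const clash = first.scopePrefixes.some((a) =>
        second.scopePrefixes.some((b) => prefixesOverlap(a, b)),
      );
      if (clash) overlaps.push([first.id, second.id]);
    }
  });
  return overlaps;
}

// Scopes copied from the table above.
const profileChain: WorkPackage[] = ["WP-1", "WP-2", "WP-3", "WP-4"].map((id) => ({
  id,
  scopePrefixes: ["sama.profile.toml"],
}));
const codeSplits: WorkPackage[] = [
  { id: "WP-5", scopePrefixes: ["crates/printer/src/standard"] },
  { id: "WP-6", scopePrefixes: ["crates/ignore/src/walk"] },
  { id: "WP-7", scopePrefixes: ["crates/searcher/src/searcher/glue"] },
  { id: "WP-8", scopePrefixes: ["crates/ignore/src/dir"] },
];

// The four code splits share no prefix, so they can run in parallel.
const codeSplitOverlaps = findOverlaps(codeSplits); // []
// All six overlapping pairs sit inside the profile chain, which is why it serializes.
const allOverlaps = findOverlaps([...profileChain, ...codeSplits]); // 6 pairs
```

A plain prefix test is deliberately coarse: `crates/ignore/src/walk` would also claim a sibling file such as `walker.rs`, which matches the `walk*` glob in the table but may be broader than a real orchestrator would want.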
+
+## The fleet manifest
+
+A hypothetical SAMA-aware orchestrator (let's call it `sama-fleet`) reads the profile, the audit's findings, and the rebuild sketch, then emits a manifest:
+
+```yaml
+# sama-fleet.manifest.yaml — generated from the audit + rebuild sketch
+target: BurntSushi/ripgrep@main
+spec_version: sama_v2.1
+deadline_hours: 8
+
+work_packages:
+  - id: WP-1
+    title: "Write sama.profile.toml"
+    base_branch: main
+    output_branch: sama-WP-1-profile
+    scope_paths: ["sama.profile.toml"]
+    estimated_minutes: 30
+    merge_gate: [cargo-check, sama-v2-verify]
+    blocks: [WP-2, WP-3, WP-4]
+
+  - id: WP-5
+    title: "Split crates/printer/src/standard.rs into a six-file submodule"
+    base_branch: main
+    output_branch: sama-WP-5-printer-split
+    scope_paths: ["crates/printer/src/standard*"]
+    forbidden_paths: ["crates/printer/src/!(standard*)", "crates/!(printer)/**"]
+    estimated_minutes: 360
+    merge_gate: [cargo-check, cargo-test --package grep-printer, sama-v2-verify]
+    invariants:
+      - "every pub item in standard.rs before == pub-reachable from same path after"
+      - "printer crate test suite still passes"
+      - "atomic cap: every produced .rs file <= 700 LOC OR declarative-exempt"
+
+  - id: WP-6
+    title: "Split crates/ignore/src/walk.rs into a four-file submodule"
+    base_branch: main
+    output_branch: sama-WP-6-walk-split
+    scope_paths: ["crates/ignore/src/walk*"]
+    forbidden_paths: ["crates/ignore/src/!(walk*)", "crates/!(ignore)/**"]
+    estimated_minutes: 240
+    merge_gate: [cargo-check, cargo-test --package ignore, sama-v2-verify]
+
+  # ... WP-2 .. WP-4, WP-7, WP-8 follow the same shape ...
+
+merge_policy:
+  serial_chain: [WP-1, WP-2, WP-3, WP-4]   # profile patches, trivial
+  parallel: [WP-5, WP-6, WP-7, WP-8]       # code-split branches
+  final_gate: full-sama-v2-verify          # 7 / 7 on the unified main
+```
+
+`scope_paths` and `forbidden_paths` are the load-bearing fields. The orchestrator hands each agent a *workspace shadow* with the `forbidden_paths` made read-only at the filesystem level. An agent assigned WP-5 *physically cannot* edit `crates/ignore/` even if its model decides midway through that "the printer split is best done by also refactoring the ignore walker." This is the kinetic version of the §1.2 Law — "agents cannot reach across boundaries" — enforced before any verifier ever runs.
+
+## The fleet (a plausible composition)
+
+The point of describing the fleet across the planet isn't location; it's *model diversity*. SAMA's verifier doesn't care what model produced the diff, only whether the diff conforms. So the fleet can be deliberately heterogeneous:
+
+| agent | model | region (24h coverage) | assignment |
+|---|---|---|---|
+| α | Claude Opus 4.7 | US-West (San Francisco) | WP-5 (printer split — largest, most complex) |
+| β | Claude Sonnet 4.6 | Asia (Tokyo) | WP-6 (ignore/walk split — second largest) |
+| γ | Claude Haiku 4.5 | Asia (Bangalore) | WP-7 (searcher/glue split — medium) |
+| δ | Claude Sonnet 4.6 | Europe (Berlin) | WP-8 (ignore/dir split — smallest code WP) |
+| ε | Claude Haiku 4.5 | South America (São Paulo) | WP-1 → WP-2 → WP-3 → WP-4 (profile + notes, serial chain) |
+| ω | Claude Opus 4.7 | (orchestrator) | dispatches, monitors verifier output, merges in order, escalates ambiguity |
+
+Two reasons this composition matters under SAMA v2:
+
+**Capability-matched task assignment.** Splitting a 3,987-LOC printer file with five output modes is genuinely the hardest WP — give it to the strongest model. Inserting four exemption flags into a TOML file is the easiest — give it to the smallest model that can reliably edit TOML. Mixed-capability fleets only work when the merge gate is purely mechanical, because otherwise a maintainer would have to hand-review the small-model patches anyway. SAMA's verifier is purely mechanical, so the mix works.
+
+**24-hour wall-clock coverage.** Even if some agents take longer than projected, work proceeds around the clock. The orchestrator never sleeps; agents in different regions step in as one finishes. Compared to a single careful human's working-week budget — even a fast human only works ~8 hours/day — the fleet runs ~24 productive hours/day from minute zero.
+
+## The wall-clock timeline
+
+Same eight work packages, plotted against an 8-hour wall clock. Times are in hours from start (orchestrator-local):
+
+```
+time     α (printer)    β (walk)     γ (glue)     δ (dir)      ε (profile chain)
+────     ─────────────  ───────────  ──────────   ─────────    ──────────────────
+T+0h     read sketch    read sketch  read sketch  read sketch  WP-1 begin
+T+0h     scope-fenced workspaces handed out by orchestrator ω
+T+0.5h   plan + tree    plan + tree  plan + tree  plan + tree  WP-1 done → CI
+T+1h     code           code         code         code         WP-2 begin → CI
+T+1.5h   "              "            "            test         WP-3 begin → CI
+T+2h     "              "            test         verify ✓     WP-4 begin → CI
+T+2.5h   "              test         verify ✓     merge ω      all profile WPs merged
+T+3h     "              verify ✓     merge ω
+T+3.5h   "              merge ω      ─
+T+4h     test           ─
+T+5h     verify ✓
+T+5.5h   merge ω
+T+6h     ─── final-gate verify on unified main ── 7/7 ✓ ──
+T+6–8h   integration tests (cargo test --workspace), perf benchmarks,
+         smoke tests against real corpora, ω signs off → PR ready for human
+```
+
+Total wall clock: **~8 hours** from "orchestrator reads the sketch" to "single PR ready for human-final-approval."
+
+Compared to the rebuild sketch's serial-human estimate of ~8 working days, that's **roughly a 10× wall-clock compression**. It is not a record, because no benchmark exists yet to break; what the audit, the rebuild, and this post together stake out is *that the benchmark can now be defined at all*. v2, the verifier, and the mechanical merge gate together are the missing primitive that lets "speed of parallel agent refactor" be measured as a property of the codebase rather than as folklore about the agents.
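The arithmetic behind the timeline and the compression figure can be written out. This is a sketch under stated assumptions: the merge times are read off the timeline above (projections, not measurements), a working day is taken as 8 hours as the post assumes for a human, and all identifiers are made up for illustration.

```typescript
// Merge times in hours from start, read off the timeline (projected values).
const mergeTimes: Array<{ lane: string; mergedAtHours: number }> = [
  { lane: "α printer (WP-5)", mergedAtHours: 5.5 },
  { lane: "β walk (WP-6)", mergedAtHours: 3.5 },
  { lane: "γ glue (WP-7)", mergedAtHours: 3.0 },
  { lane: "δ dir (WP-8)", mergedAtHours: 2.5 },
  { lane: "ε profile chain (WP-1..4)", mergedAtHours: 2.5 },
];
const finalGateAtHours = 6; // T+6h: final-gate verify on unified main
const integrationHours = 2; // T+6h to T+8h: integration tests and sign-off

// The slowest lane sets the critical path; every other lane finishes inside it.
const lastMergeHours = Math.max(...mergeTimes.map((m) => m.mergedAtHours)); // 5.5
const fleetWallClockHours =
  Math.max(lastMergeHours, finalGateAtHours) + integrationHours; // 8

// Serial-human baseline from the rebuild sketch.
const serialWorkingDays = 8;
const hoursPerWorkingDay = 8; // assumption: ~8 productive hours per day

// Ratio counted in hours of effort.
const ratioByEffortHours =
  (serialWorkingDays * hoursPerWorkingDay) / fleetWallClockHours; // 8
// Ratio counted in elapsed calendar hours, ignoring weekends.
const ratioByCalendarHours = (serialWorkingDays * 24) / fleetWallClockHours; // 24
```

On these assumptions the rough 10× figure sits between the two ways of counting: about 8× when the human baseline is counted in working hours, and about 24× or more when it is counted in elapsed calendar time.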
+
+## Why this is a SAMA-enabled property specifically
+
+A claim worth being careful about: this projection isn't about AI agents being fast. Agents have been "fast" for years. The cap was always **integration**. Four properties that SAMA v2 enforces make parallel decomposition mergeable:
+
+1. **Atomic** (700-LOC cap) → work-package scope is bounded. WP-5's printer-split fits in a single agent's working context. So does WP-6's walk-split. So does each output file produced. The agent does not need to see "the whole printer crate" to do its job; it needs `standard.rs` plus the public surface of the rest of the crate.
+
+2. **Architecture** (layer mapping) → work-package boundaries align with merge boundaries. WP-5 lives entirely in Layer 1 sublayer "algorithm." WP-6 lives entirely in Layer 2. They literally cannot touch each other's layer files because the profile says so and the verifier rejects PRs that do.
+
+3. **Sorted** (under v2.1's directory dialect) → the dependency direction is publicly readable. The orchestrator can compute "WP-6 depends on Layer 0 + Layer 1 results" by reading the profile and the crate graph; it does not have to ask an agent or guess.
+
+4. **Modeled** (sibling tests; under v2.1's inline-tests mode) → each agent ships its own tests with each split file. WP-5's six produced files each contain `#[cfg(test)] mod tests` blocks; the verifier checks they exist; `cargo test --package grep-printer` checks they pass. No central "integration test team" bottleneck — the test is the agent's responsibility, located in the same file as the code, gated mechanically.
+
+Without those four properties, every multi-agent refactor attempt I've seen has run aground in the same way: agents start with disjoint scopes but converge in the merge phase because nothing structural was keeping them disjoint. The merge becomes a fifth task at least as expensive as any of the four refactor tasks. SAMA v2 is the architectural standard that says: **the scope each agent saw is the scope each agent's merge gate enforces**.
+
+## What this post is and is not
+
+**Is**: a careful projection from the rebuild sketch's serial-human estimates to a parallel-agent decomposition, with the work-package boundaries derived from the actual file-and-layer structure of ripgrep + the v2.1 dialects.
+
+**Is not**: a measured benchmark. No fleet has actually executed this manifest against ripgrep. The 8-hour number is the rebuild post's 8-working-day number divided by sane parallelism, plus some buffer for verifier roundtrips and Rust compile times. The numbers in the timeline are projection-grade, not measurement-grade.
+
+**The §6 hook** that makes the projection eventually testable: §5 of the v2 spec already says *"compliance proves the rules were followed; the delta is what proves the rules were worth following."* This post identifies *a new delta v2 can take credit for that no other architectural standard can*: parallel-refactor wall-clock. The cost of a refactor under v2-management is a separate, falsifiable empirical property — one that doesn't even exist as a measurable quantity in arbitrary codebases, because in arbitrary codebases parallel refactors don't merge cleanly.
+
+If §6 promotes the three v2.1 dialects, a follow-up experiment writes itself:
+
+1. Fork a v2-conforming open-source repo (this site, eventually ripgrep, eventually dive).
+2. Generate a manifest like the one above.
+3. Run a real fleet under a real orchestrator.
+4. Measure: wall-clock to verifier-green merged main, number of agent-attempts per WP, number of orchestrator escalations, post-merge defect rate.
+5. Compare against the same refactor done by a single agent serially, against the same refactor done by a single human serially, and (the cross-spec comparison the whole §5 + §6 program is for) against the same refactor attempted on a non-v2 codebase.
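The measurements listed in step 4 and the comparisons in step 5 could be captured in a small record type. What follows is only a sketch: `RefactorRun`, `compressionVersus`, and `meanAttempts` are hypothetical names, and a run that never reaches a verifier-green main is modeled as `null` wall-clock time rather than as a large number.

```typescript
// Hypothetical record for one row of the comparison (fleet, single agent,
// single human, or the non-v2 attempt).
interface RefactorRun {
  label: string;
  // Hours until verifier-green merged main; null if the run never got there.
  wallClockHours: number | null;
  // One entry per work package: how many agent attempts it took.
  attemptsPerWorkPackage: number[];
  orchestratorEscalations: number;
  postMergeDefects: number;
}

// Wall-clock compression of `candidate` relative to `baseline`.
// Returns null when either run did not finish, so an unfinished run
// is never silently reported as a ratio.
function compressionVersus(
  baseline: RefactorRun,
  candidate: RefactorRun,
): number | null {
  if (baseline.wallClockHours === null || candidate.wallClockHours === null) {
    return null;
  }
  if (candidate.wallClockHours <= 0) return null;
  return baseline.wallClockHours / candidate.wallClockHours;
}

// Mean number of attempts per work package; 0 when nothing was attempted.
function meanAttempts(run: RefactorRun): number {
  if (run.attemptsPerWorkPackage.length === 0) return 0;
  const total = run.attemptsPerWorkPackage.reduce((sum, n) => sum + n, 0);
  return total / run.attemptsPerWorkPackage.length;
}
```

Keeping "did not finish" as an explicit `null` matters for the comparison against a non-v2 codebase, where the expected outcome is a run with no wall-clock figure at all.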
+
+The interesting comparison is not "how fast was the agent fleet" — it's the *fourth* row, the non-v2 attempt, which is the one we expect to never finish because the work packages won't stay disjoint.
+
+That's the experiment SAMA v2's empirical program is laying cable for, three blog posts at a time.
+
+## Three projections, three datapoints, one bracket
+
+The series so far on this Rust example:
+
+| post | scope | wall-clock estimate | confidence |
+|---|---|---|---|
+| [the audit](/blog/sama-v2-rust-project-ripgrep) | score ripgrep as-is against §4 | n/a (read-only) | empirical (the source was read) |
+| [the rebuild](/blog/sama-v2-rust-project-ripgrep-rebuilt) | serial-human refactor to 7/7 ✓ | ~8 working days | informed estimate from concrete file deltas |
+| this post | parallel-fleet refactor to 7/7 ✓ | ~8 wall-clock hours | projection, ~10× compression from above |
+
+Each post tightens a different lever:
+- The audit tells you *where* the codebase sits today.
+- The rebuild tells you *what* it costs to get to compliant.
+- The fleet projection tells you *how that cost decomposes when the merge gate is mechanical*.
+
+None of the three is a measured "v2 is worth following" claim by itself. Together they are the empirical chain §5 + §6 are pointing at: define the metrics, show what changes when the rules are followed, project what becomes mechanically possible when the verifier is the merge gate, and then — eventually — run the experiment that converts each projection into a measurement.
+
+---
+
+**Companion posts:**
+
+- [The ripgrep audit](/blog/sama-v2-rust-project-ripgrep) — the source of the work-package list
+- [The ripgrep rebuild sketch](/blog/sama-v2-rust-project-ripgrep-rebuilt) — the serial-human cost estimate this post divides by parallelism
+- [The dive rebuild](/blog/sama-v2-go-project-dive-rebuilt) — equivalent decomposition on a Go codebase, ~10 days serial; a parallel-fleet projection would land a similar 10× compression
+- [The §5 metrics emitter](/blog/sama-v2-metrics-emitter) — the empirical apparatus the §6 experiment plugs into
+- [The v2 spec](/sama/v2) — particularly §4 (Atomic, Architecture, Sorted, Modeled) and §6 (evolution policy)
diff --git a/content/blog/sama-v2-rust-project-ripgrep-rebuilt.md b/content/blog/sama-v2-rust-project-ripgrep-rebuilt.md
index 33809f4dc52d2a9c60f7b23c09e338c8fbac79e2..f416d9c5fa0f75241acf99164cfc933102cf0026 100644
--- a/content/blog/sama-v2-rust-project-ripgrep-rebuilt.md
+++ b/content/blog/sama-v2-rust-project-ripgrep-rebuilt.md
@@ -391,6 +391,8 @@ No new tests need to be written — `tests = "inline"` recognises the 38 source
 
 For context, the [WordPress plugin parallel-architecture rebuild](/blog/sama-v2-wordpress-plugin-rebuilt) required splitting a 1,554-line public god-class into eleven files, redesigning the settings option as a typed value, and writing 20+ test files from scratch. Months of work, real risk of breaking the PRO add-on, WooCommerce, Yoast, and AIOSEO integrations. `dive` to 7/7 was ten working days of test writing plus one package split. `ripgrep` to 7/7 is one focused week of file splitting plus a TOML file.
 
+(One focused week is the *serial-human* number. For the parallel-fleet projection that divides this estimate along SAMA-mechanical work-package boundaries and lands on ~8 wall-clock hours, see the [companion post](/blog/sama-v2-rust-project-ripgrep-parallel-fleet).)
+
 ## Predicted §5 metrics for the rebuilt ripgrep
 
 | metric | ripgrep today (estimated) | ripgrep rebuilt (predicted) | dive rebuilt | tdd.md (measured) |
@@ -425,6 +427,7 @@ Four observations:
 
 **Companion posts:**
 
+- **[The same rebuild, run by a fleet of AI agents in parallel](/blog/sama-v2-rust-project-ripgrep-parallel-fleet)** — projecting this post's ~8-working-day serial estimate into ~8 wall-clock hours under SAMA-mechanical merge gates
 - [Today's `ripgrep` audit](/blog/sama-v2-rust-project-ripgrep) — where the 3/7-strict, 5/7-with-dialects score comes from, and the three findings this rebuild assumes get adopted
 - [The `dive` rebuild](/blog/sama-v2-go-project-dive-rebuilt) — same exercise on a Go codebase, the directory-dialect's first appearance
 - [The `dive` prefix-scheme variant](/blog/sama-v2-go-project-dive-prefix-scheme) — what the dramatic file-rename refactor costs in Go (and would cost even more in Rust)
diff --git a/src/a31_blog.ts b/src/a31_blog.ts
index 3e99aa871c3e656bf786c33733428e1f127b7b30..3614fc2d52734edb50616e44f24a2ca0b601075f 100644
--- a/src/a31_blog.ts
+++ b/src/a31_blog.ts
@@ -12,6 +12,12 @@ export interface BlogEntry {
 }
 
 export const ALL_POSTS: BlogEntry[] = [
+  {
+    slug: "sama-v2-rust-project-ripgrep-parallel-fleet",
+    title: "The same `ripgrep` rebuild, run by a fleet of AI agents in parallel across the planet — a projection",
+    description: "Yesterday's ripgrep rebuild sketch estimated ~1 focused working week (~8 working days) for one careful human. This post takes up the companion question Bas asked: what does the same refactor look like if executed by a fleet of AI agents running in parallel across the planet, under strict SAMA v2 management? Honest answer: wall-clock projection collapses from ~8 working days to ~8 wall-clock hours, ~10× compression. The load-bearing reason isn't 'AI is fast' — agents have been fast for years. The cap was always integration: pre-SAMA, every multi-agent refactor experiment ran aground on the same three failure modes (scope creep, style drift, no mechanical merge gate). SAMA v2 dissolves all three because every architectural rule is also a merge gate. Decomposes the rebuild into 8 work packages — 4 profile-only WPs (serial chain, ~2h) + 4 code-split WPs (parallel, ~6h) — and demonstrates that the file-prefix + directory-layered crate graph give you a property no other architecture standard does: work-package boundaries are physically non-overlapping, so the orchestrator can scope-fence each agent's workspace with forbidden_paths and the agents literally cannot reach across boundaries. Plausible fleet composition with capability-matched task assignment (Opus on the hardest split, Haiku on TOML patches), 24-hour wall-clock coverage by spreading across time zones, mechanical verifier as merge gate so mixed-model output is fine. Timeline diagram showing T+0 through T+8h, then the section that matters most: why this is a SAMA-enabled property specifically — Atomic bounds working-context, Architecture aligns work-package boundaries with merge boundaries, Sorted makes dependency direction publicly readable to the orchestrator, Modeled makes tests the agent's local responsibility rather than a central bottleneck. Careful about framing: this is a projection from concrete file deltas + sane parallelism, not a measurement. The §6 experiment that would convert this into measurement is sketched at the end — fork a v2-conforming repo, generate the manifest, run the fleet, measure wall-clock vs single-agent serial vs serial-human vs the same refactor attempted on a non-v2 codebase. The interesting comparison is the fourth row, the non-v2 attempt — the one we expect to never finish because the work packages won't stay disjoint. That's the SAMA-specific empirical claim this post lays cable for.",
+    date: "2026-05-26",
+  },
   {
     slug: "sama-v2-rust-project-ripgrep-rebuilt",
     title: "`ripgrep`, rebuilt under SAMA v2 — a thought experiment",