The same ripgrep rebuild, run by a fleet of AI agents in parallel across the planet — a projection

Yesterday's ripgrep rebuild sketch estimated ~1 focused working week for one careful human to land the changes: write the sama.profile.toml, split printer/standard.rs into a six-file submodule, split ignore/walk.rs into a four-file submodule. Plus a few more days for the deferred splits. Eight working days total, end to end.

That's the serial-human estimate. This post is the companion question Bas asked: what does the same refactor look like if it's executed by a fleet of AI agents running in parallel, spread across the planet, under strict SAMA v2 management?

The honest answer: the wall-clock projection collapses from ~8 working days to ~8 wall-clock hours, and the load-bearing reason isn't "AI is fast" — it's that SAMA v2 turns each work-package boundary into a mergeable boundary, which is the part that has historically broken every "let multiple agents refactor in parallel" experiment.

This post is a projection, not a measurement. The number it lands on is conservative; the reasoning is the part to read carefully.

Why parallel agent refactor has been a non-starter — until now

Pre-SAMA, the cap on multi-agent parallelism isn't agent speed. It's integration. Two agents asked to refactor the same module produce two divergent designs; even when each is internally consistent, merging them is a hand-rolled meta-task that exceeds the cost of having one agent do the work serially. So the field has mostly run one agent at a time, or sharded work along brittle file boundaries that agents end up violating anyway.

The three things that actually break parallel agent work:

  1. Scope creep. Agent A asked to "split walk.rs" decides the parallel walker also needs a refactor of dir.rs and the gitignore parser, because they're "obviously coupled." Now agent C (assigned to dir.rs) has a merge conflict and zero context on agent A's design.
  2. Style drift. Two agents producing 600-line files in the same crate format types differently, name variables differently, and structure imports differently. Either the maintainer hand-polishes both back to one style (which defeats the parallelism) or accepts visible inconsistency in the diff (which defeats the maintainability that motivated the refactor).
  3. No mechanical merge gate. When agent B's PR lands, something has to check that B's design still composes with A's design — and absent a verifier, that check is taste, which is exactly the resource you ran out of when you decided to spawn a fleet.

SAMA v2 dissolves all three, in the same way and for the same reason: every architectural rule is also a merge gate. Agents don't have to agree on style as long as they each pass the verifier. The verifier is the same TypeScript code for every agent, mechanically applied, returning a binary verdict.

The work-package decomposition of the ripgrep rebuild

The previous post listed the mandatory + deferred changes as a single human-serial table. The same table, re-cut as eight parallel work packages, each scoped to a single branch and a single mergeable verifier-passing PR:

| WP# | scope | files touched | LOC delta | parallel-safe? |
|---|---|---|---|---|
| WP-1 | Write sama.profile.toml declaring layout=directory + tests=inline + atomic_exemption=declarative | sama.profile.toml (new) | +18 | yes; no other WP reads this until merge |
| WP-2 | Add profile note: serde derives ≠ boundary parsing | sama.profile.toml (1 line) | +1 | yes, but WP-1 must merge first |
| WP-3 | Add profile note: searcher/line_buffer.rs byte parsing is algorithm | sama.profile.toml (1 line) | +1 | yes, but WP-1 must merge first |
| WP-4 | Mark four declarative-exempt files | sama.profile.toml (4 lines) | +4 | yes, but WP-1 must merge first |
| WP-5 | Split crates/printer/src/standard.rs (3,987 LOC) → 6-file submodule | only crates/printer/src/standard* | ±3,987 | yes; no other WP touches printer/ |
| WP-6 | Split crates/ignore/src/walk.rs (2,494 LOC) → 4-file submodule | only crates/ignore/src/walk* | ±2,494 | yes; no other WP touches ignore/walk |
| WP-7 | Split crates/searcher/src/searcher/glue.rs (1,549 LOC) → 3-file submodule | only crates/searcher/src/searcher/glue* | ±1,549 | yes; no other WP touches that path |
| WP-8 | Split crates/ignore/src/dir.rs (1,305 LOC) → 2-file submodule | only crates/ignore/src/dir* | ±1,305 | yes; does not overlap WP-6's walk* files |

Every work package is scoped to a path prefix that no other work package touches. That is not coincidence — it is the property the Atomic rule and the directory-layered crate graph give you for free. Once a refactor decomposes along verifier-recognized boundaries, the decomposition is also a non-overlapping merge plan.
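The non-overlap claim is mechanical enough to check in a few lines. The following is a minimal sketch, not part of any SAMA tooling: it takes the four code-split prefixes from the table (with the trailing `*` dropped) and reports any pair where one prefix contains the other. The function name `overlapping` is an assumption made for illustration.

```python
from itertools import combinations

# Scope prefixes copied from the work-package table, trailing '*' dropped.
SCOPES = {
    "WP-5": "crates/printer/src/standard",
    "WP-6": "crates/ignore/src/walk",
    "WP-7": "crates/searcher/src/searcher/glue",
    "WP-8": "crates/ignore/src/dir",
}


def overlapping(scopes):
    """Return pairs of work packages where one path prefix contains the other."""
    clashes = []
    for (a, prefix_a), (b, prefix_b) in combinations(sorted(scopes.items()), 2):
        if prefix_a.startswith(prefix_b) or prefix_b.startswith(prefix_a):
            clashes.append((a, b))
    return clashes


if __name__ == "__main__":
    # An empty list means no two code-split packages can touch the same file.
    print(overlapping(SCOPES))
```

WP-6 and WP-8 share the crates/ignore/src/ directory, but their prefixes (walk and dir) diverge, so the check treats them as disjoint. A broader scope such as crates/ignore/src would clash with both.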

The four profile-only WPs (1–4) all edit the same file, so WP-2 through WP-4 serialize behind WP-1. Each of those three is a patch of one to four lines; together they merge in under a minute. The four code-split WPs (5–8) parallelize fully because they touch four disjoint path prefixes.
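One way to turn that ordering into something an orchestrator could act on is a topological sort that yields merge "waves". This is a sketch under two assumptions: the profile patches form a strict chain (WP-1, then WP-2, WP-3, WP-4), and the code splits carry no dependency on that chain, as the parallel-safe column states. It uses Python's standard `graphlib`; the `merge_waves` helper is hypothetical.

```python
from graphlib import TopologicalSorter

# Each work package maps to the set of packages that must merge before it.
DEPS = {
    "WP-1": set(),
    "WP-2": {"WP-1"},
    "WP-3": {"WP-2"},
    "WP-4": {"WP-3"},
    "WP-5": set(),
    "WP-6": set(),
    "WP-7": set(),
    "WP-8": set(),
}


def merge_waves(deps):
    """Group work packages into waves; everything in one wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all packages whose blockers are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(merge_waves(DEPS), start=1):
        print(number, wave)
```

The first wave contains WP-1 and all four code splits; the remaining three waves are the one-package profile patches. The serial chain is therefore never on the critical path unless the code splits finish faster than three trivial merges.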

The fleet manifest

A hypothetical SAMA-aware orchestrator (let's call it sama-fleet) reads the profile, the audit's findings, and the rebuild sketch, then emits a manifest:

# sama-fleet.manifest.yaml — generated from the audit + rebuild sketch
target: BurntSushi/ripgrep@main
spec_version: sama_v2.1
deadline_hours: 8

work_packages:
  - id: WP-1
    title: "Write sama.profile.toml"
    base_branch: main
    output_branch: sama-WP-1-profile
    scope_paths: ["sama.profile.toml"]
    estimated_minutes: 30
    merge_gate: [cargo-check, sama-v2-verify]
    blocks: [WP-2, WP-3, WP-4]

  - id: WP-5
    title: "Split crates/printer/src/standard.rs into a six-file submodule"
    base_branch: main
    output_branch: sama-WP-5-printer-split
    scope_paths: ["crates/printer/src/standard*"]
    forbidden_paths: ["crates/printer/src/!(standard*)", "crates/!(printer)/**"]
    estimated_minutes: 360
    merge_gate: [cargo-check, cargo-test --package printer, sama-v2-verify]
    invariants:
      - "every pub item in standard.rs before == pub-reachable from same path after"
      - "printer crate test suite still passes"
      - "atomic cap: every produced .rs file <= 700 LOC OR declarative-exempt"

  - id: WP-6
    title: "Split crates/ignore/src/walk.rs into a four-file submodule"
    base_branch: main
    output_branch: sama-WP-6-walk-split
    scope_paths: ["crates/ignore/src/walk*"]
    forbidden_paths: ["crates/ignore/src/!(walk*)", "crates/!(ignore)/**"]
    estimated_minutes: 240
    merge_gate: [cargo-check, cargo-test --package ignore, sama-v2-verify]

  # ... WP-2 .. WP-4, WP-7, WP-8 follow the same shape ...

merge_policy:
  serial_chain: [WP-1, WP-2, WP-3, WP-4]   # profile patches, trivial
  parallel: [WP-5, WP-6, WP-7, WP-8]       # code-split branches
  final_gate: full-sama-v2-verify          # 7 / 7 on the unified main

scope_paths and forbidden_paths are the load-bearing fields. The orchestrator hands each agent a workspace shadow with the forbidden_paths made read-only at the filesystem level. An agent assigned WP-5 physically cannot edit crates/ignore/ even if its model decides midway through that "the printer split is best done by also refactoring the ignore walker." This is the kinetic version of the §1.2 Law — "agents cannot reach across boundaries" — enforced before any verifier ever runs.
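The manifest describes fencing at the filesystem level. A cheaper complement, which is my own addition and not something the post specifies, is a post-hoc check that every file in an agent's diff matches one of its scope_paths. The sketch below uses Python's `fnmatch`, whose `*` also matches `/`, so `standard*` covers both `standard.rs` and files under a `standard/` directory. It deliberately ignores forbidden_paths, because `fnmatch` does not understand the `!(...)` negation syntax used there. The `out_of_scope` helper and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, scope_paths):
    """Return the changed files that match none of the allowed scope globs."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in scope_paths)
    ]


if __name__ == "__main__":
    # Hypothetical diff for WP-5: one file inside the fence, one outside it.
    diff = [
        "crates/printer/src/standard/mod.rs",
        "crates/ignore/src/walk.rs",
    ]
    print(out_of_scope(diff, ["crates/printer/src/standard*"]))
```

An allowlist is the safer default here: a path the manifest author forgot to forbid is still rejected, whereas a denylist would let it through.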

The fleet (a plausible composition)

The point of describing the fleet across the planet isn't location; it's model diversity. SAMA's verifier doesn't care what model produced the diff, only whether the diff conforms. So the fleet can be deliberately heterogeneous:

| agent | model | region (24h coverage) | assignment |
|---|---|---|---|
| α | Claude Opus 4.7 | US-West (San Francisco) | WP-5 (printer split; largest, most complex) |
| β | Claude Sonnet 4.6 | Asia (Tokyo) | WP-6 (ignore/walk split; second largest) |
| γ | Claude Haiku 4.5 | Asia (Bangalore) | WP-7 (searcher/glue split; medium) |
| δ | Claude Sonnet 4.6 | Europe (Berlin) | WP-8 (ignore/dir split; smallest code WP) |
| ε | Claude Haiku 4.5 | South America (São Paulo) | WP-1 → WP-2 → WP-3 → WP-4 (profile + notes, serial chain) |
| ω | Claude Opus 4.7 | (orchestrator) | dispatches, monitors verifier output, merges in order, escalates ambiguity |

Two reasons this composition matters under SAMA v2:

Capability-matched task assignment. Splitting a 3,987-LOC printer file with five output modes is genuinely the hardest WP, so give it to the strongest model. Inserting four exemption flags into a TOML file is the easiest, so give it to the smallest model that can reliably edit TOML. Mixed-capability fleets only work when the merge gate is purely mechanical, because otherwise a maintainer would have to hand-review the small-model patches anyway. SAMA's verifier is purely mechanical, so the mix works.

24-hour wall-clock coverage. Even if some agents take longer than projected, work proceeds around the clock. The orchestrator never sleeps; agents in different regions step in as one finishes. Compared to a single careful human's working-week budget — even a fast human only works ~8 hours/day — the fleet runs ~24 productive hours/day from minute zero.

The wall-clock timeline

Same eight work packages, plotted against an 8-hour wall clock. Times in (orchestrator-local) hours from start:

time    α (printer)   β (walk)      γ (glue)      δ (dir)       ε (profile chain)
──────  ────────────  ────────────  ────────────  ────────────  ──────────────────────
T+0h    read sketch   read sketch   read sketch   read sketch   WP-1 begin
T+0h    scope-fenced workspaces handed out by orchestrator ω
T+0.5h  plan + tree   plan + tree   plan + tree   plan + tree   WP-1 done → CI
T+1h    code          code          code          code          WP-2 begin → CI
T+1.5h  "             "             "             test          WP-3 begin → CI
T+2h    "             "             test          verify ✓      WP-4 begin → CI
T+2.5h  "             test          verify ✓      merge ω       all profile WPs merged
T+3h    "             verify ✓      merge ω
T+3.5h  "             merge ω       ─
T+4h    test          ─
T+5h    verify ✓
T+5.5h  merge ω
T+6h    ─── final-gate verify on unified main ── 7/7 ✓ ───
T+6–8h  integration tests (cargo test --workspace), perf benchmarks,
        smoke tests against real corpora, ω signs off → PR ready for human

Total wall clock: ~8 hours from "orchestrator reads the sketch" to "single PR ready for human-final-approval."
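The 8-hour figure is a critical-path sum, and it helps to see which package sets it. The sketch below reads the merge times off the timeline and assumes the final gate starts half an hour after the last merge (the gap between T+5.5h and T+6h) and that the integration window lasts two hours (T+6–8h). The function names are illustrative, and the numbers are the projection's, not measurements.

```python
# Merge times in hours from start, read off the timeline.
MERGE_AT = {
    "WP-5": 5.5,     # α, printer split
    "WP-6": 3.5,     # β, walk split
    "WP-7": 3.0,     # γ, glue split
    "WP-8": 2.5,     # δ, dir split
    "WP-1..4": 2.5,  # ε, profile chain
}
MERGE_TO_GATE_HOURS = 0.5  # assumed: gap between the last merge and T+6h
INTEGRATION_HOURS = 2.0    # assumed: the T+6–8h integration window


def critical_package(merge_at):
    """Return the work package whose merge lands last."""
    return max(merge_at, key=merge_at.get)


def wall_clock(merge_at, gate_gap=MERGE_TO_GATE_HOURS, integration=INTEGRATION_HOURS):
    """Projected hours from start until the PR is ready for human approval."""
    return max(merge_at.values()) + gate_gap + integration


if __name__ == "__main__":
    print(critical_package(MERGE_AT), wall_clock(MERGE_AT))
```

Under these numbers WP-5 alone determines the total. Finishing the other three code splits sooner would not move the projection; only a faster printer split or a shorter integration window would.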

Compared to the rebuild sketch's serial-human estimate of ~8 working days, that's roughly a 10× wall-clock compression (8 working days at ~8 hours/day is ~64 working hours against ~8 wall-clock hours, so 8× before rounding). This is not a record, because no benchmark exists yet to break. What the audit, the rebuild, and this post together stake out is that the benchmark can now be defined at all. v2, the verifier, and the mechanical merge gate together are the missing primitive that lets "speed of parallel agent refactor" be measured as a property of the codebase rather than as folklore about the agents.

Why this is a SAMA-enabled property specifically

A claim worth being careful about: this projection isn't about AI agents being fast. Agents have been "fast" for years. The cap was always integration. The four properties SAMA v2 enforces that make parallel decomposition mergeable:

  1. Atomic (700-LOC cap) → work-package scope is bounded. WP-5's printer-split fits in a single agent's working context. So does WP-6's walk-split. So does each output file produced. The agent does not need to see "the whole printer crate" to do its job; it needs standard.rs plus the public surface of the rest of the crate.

  2. Architecture (layer mapping) → work-package boundaries align with merge boundaries. WP-5 lives entirely in Layer 1 sublayer "algorithm." WP-6 lives entirely in Layer 2. They literally cannot touch each other's layer files because the profile says so and the verifier rejects PRs that do.

  3. Sorted (under v2.1's directory dialect) → the dependency direction is publicly readable. The orchestrator can compute "WP-6 depends on Layer 0 + Layer 1 results" by reading the profile and the crate graph; it does not have to ask an agent or guess.

  4. Modeled (sibling tests; under v2.1's inline-tests mode) → each agent ships its own tests with each split file. WP-5's six produced files each contain #[cfg(test)] mod tests blocks; the verifier checks they exist; cargo test --package printer checks they pass. No central "integration test team" bottleneck — the test is the agent's responsibility, located in the same file as the code, gated mechanically.

Without those four properties, every multi-agent refactor attempt I've seen has run aground in the same way: agents start with disjoint scopes but converge in the merge phase because nothing structural was keeping them disjoint. The merge becomes a fifth task at least as expensive as any of the four refactor tasks. SAMA v2 is the architectural standard that says: the scope each agent saw is the scope each agent's merge gate enforces.
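Of the four properties, the Atomic cap is the simplest to picture as a gate. The sketch below is not the SAMA verifier (which the post describes as TypeScript); it is a minimal Python illustration of the manifest invariant "every produced .rs file <= 700 LOC OR declarative-exempt". How lines are counted is not specified in the post, so the function takes precomputed counts. The name `atomic_violations` is hypothetical.

```python
ATOMIC_CAP = 700  # LOC cap taken from the manifest invariant

# Pre-split line counts as listed in the work-package table.
PRE_SPLIT = {
    "crates/printer/src/standard.rs": 3987,
    "crates/ignore/src/walk.rs": 2494,
    "crates/searcher/src/searcher/glue.rs": 1549,
    "crates/ignore/src/dir.rs": 1305,
}


def atomic_violations(line_counts, exempt=frozenset(), cap=ATOMIC_CAP):
    """Return files over the cap that are not declared declarative-exempt."""
    return sorted(
        path
        for path, loc in line_counts.items()
        if loc > cap and path not in exempt
    )


if __name__ == "__main__":
    # All four files exceed the cap before the refactor, which is why
    # each one gets its own work package.
    for path in atomic_violations(PRE_SPLIT):
        print(path)
```

The verdict is binary per file, which is what lets the same check serve as a merge gate for every agent regardless of which model produced the diff.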

What this post is and is not

Is: a careful projection from the rebuild sketch's serial-human estimates to a parallel-agent decomposition, with the work-package boundaries derived from the actual file-and-layer structure of ripgrep + the v2.1 dialects.

Is not: a measured benchmark. No fleet has actually executed this manifest against ripgrep. The 8-hour number is the rebuild post's 8-working-day number divided by sane parallelism, plus some buffer for verifier roundtrips and Rust compile times. The numbers in the timeline are projection-grade, not measurement-grade.

The §6 hook that makes the projection eventually testable: §5 of the v2 spec already says "compliance proves the rules were followed; the delta is what proves the rules were worth following." This post identifies a new delta v2 can take credit for that no other architectural standard can: parallel-refactor wall-clock. The cost of a refactor under v2-management is a separate, falsifiable empirical property — one that doesn't even exist as a measurable quantity in arbitrary codebases, because in arbitrary codebases parallel refactors don't merge cleanly.

If §6 promotes the three v2.1 dialects (now drafted formally in /sama/v2 §6.A), a follow-up experiment writes itself:

  1. Fork a v2-conforming open-source repo (this site, eventually ripgrep, eventually dive).
  2. Generate a manifest like the one above.
  3. Run a real fleet under a real orchestrator.
  4. Measure: wall-clock to verifier-green merged main, number of agent-attempts per WP, number of orchestrator escalations, post-merge defect rate.
  5. Compare against the same refactor done by a single agent serially, against the same refactor done by a single human serially, and (the cross-spec comparison the whole §5 + §6 program is for) against the same refactor attempted on a non-v2 codebase.

The interesting comparison is not "how fast was the agent fleet". It's the fourth arm, the non-v2 attempt, which is the one we expect never to finish because the work packages won't stay disjoint.

That's the experiment SAMA v2's empirical program is laying cable for, three blog posts at a time.

Three projections, three datapoints, one bracket

The series so far on this Rust example:

| post | scope | wall-clock estimate | confidence |
|---|---|---|---|
| the audit | score ripgrep as-is against §4 | n/a (read-only) | empirical (the source was read) |
| the rebuild | serial-human refactor to 7/7 ✓ | ~8 working days | informed estimate from concrete file deltas |
| this post | parallel-fleet refactor to 7/7 ✓ | ~8 wall-clock hours | projection, ~10× compression from above |

Each post tightens a different lever:

  • The audit tells you where the codebase sits today.
  • The rebuild tells you what it costs to get to compliant.
  • The fleet projection tells you how that cost decomposes when the merge gate is mechanical.

None of the three is a measured "v2 is worth following" claim by itself. Together they are the empirical chain §5 + §6 are pointing at: define the metrics, show what changes when the rules are followed, project what becomes mechanically possible when the verifier is the merge gate, and then — eventually — run the experiment that converts each projection into a measurement.


Companion posts:

  • The ripgrep audit — the source of the work-package list
  • The ripgrep rebuild sketch — the serial-human cost estimate this post divides by parallelism
  • The dive rebuild — equivalent decomposition on a Go codebase, ~10 days serial; parallel-fleet projection would land similar 10× compression
  • The §5 metrics emitter — the empirical apparatus the §6 experiment plugs into
  • The v2 spec — particularly §4 (Atomic, Architecture, Sorted, Modeled) and §6 (evolution policy)