SAMA v2 — Core Specification

Status: Draft for v2.0. This document defines the frozen core and the profile mechanism. The core is normative and stable; profiles are the only extension point. Anything a CI job cannot check deterministically does not belong in this spec — it belongs in AGENTS.md.

#0. Design contract

SAMA separates the law (how layers may depend on each other — frozen, language-neutral, identical in every repo) from the vocabulary (which named sublayers a given domain uses — supplied by a profile).

Two SAMA repositories in different languages and different profiles remain comparable at the core level. That comparability is what makes cross-repo empirical measurement possible: every repo, regardless of language or profile, emits the same core metrics.

A conformant verifier is a deterministic program. No LLM judgment sits in the enforcement loop. An agent may use SAMA to decide where a file goes; it may never be the referee that decides whether a file conforms.

#1. The frozen core

#1.1 The four canonical layers

Every file in a SAMA repository belongs to exactly one of four layers. This set is frozen: no profile, repo, or version may add, remove, renumber, or rename a canonical layer.

Layer	Name	Contains	May import
0	Pure	Types, constants, pure functions, domain models. No I/O, no side effects.	nothing above 0 (i.e. only other Layer 0)
1	Core	Domain logic and decisions. No network, disk, clock, or framework.	0
2	Adapter	The boundary. External input is parsed here and only here (never cast). DB, network, filesystem, framework bindings.	0, 1
3	Entry	Outermost shell: `main`, CLI handler, HTTP route, UI mount, job entry.	0, 1, 2

Layer 0 depends on nothing. Layer 3 is depended on by nothing.

#1.2 The Law (frozen, one sentence)

Imports point to a lower layer number, or stay within the same layer where the active profile's sublayer order permits (§2.2); never upward, never sideways across a higher number, never cyclic.

Formally, for any import edge A → B: layer(B) < layer(A), OR A and B are in the same layer and the active profile declares a sublayer ordering that permits A → B (see §2.2). The whole-program import graph must be acyclic.

This single law is what makes the Sorted pillar enforceable: the layer number is the lexicographic sort key, so file order is dependency direction.
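The cross-layer half of the Law and the acyclicity requirement can be sketched as below. This is a minimal illustration, assuming imports have already been resolved to file names and each file's canonical layer is known; `LayerNo`, `ImportEdge`, `upwardEdges`, `hasCycle`, and the sample file names are hypothetical, not part of the spec.

```typescript
// Hypothetical shapes; the spec does not prescribe these names.
type LayerNo = 0 | 1 | 2 | 3;
interface ImportEdge { from: string; to: string } // "from" imports "to"

// Edges whose target sits in a higher layer than the importer (upward imports).
// Same-layer edges are left to the profile's sublayer check (§2.2).
function upwardEdges(layerOf: Map<string, LayerNo>, edges: ImportEdge[]): ImportEdge[] {
  return edges.filter((e) => {
    const from = layerOf.get(e.from);
    const to = layerOf.get(e.to);
    return from !== undefined && to !== undefined && to > from;
  });
}

// Depth-first search; reaching a file that is still "visiting" closes a cycle.
function hasCycle(edges: ImportEdge[]): boolean {
  const out = new Map<string, string[]>();
  for (const e of edges) {
    out.set(e.from, [...(out.get(e.from) ?? []), e.to]);
  }
  const state = new Map<string, "visiting" | "done">();
  const visit = (file: string): boolean => {
    if (state.get(file) === "visiting") return true;
    if (state.get(file) === "done") return false;
    state.set(file, "visiting");
    for (const next of out.get(file) ?? []) {
      if (visit(next)) return true;
    }
    state.set(file, "done");
    return false;
  };
  return Array.from(out.keys()).some((file) => visit(file));
}

const layerOf = new Map<string, LayerNo>([
  ["p0_types.ts", 0],
  ["c1a_policy.ts", 1],
  ["a2a_repository.ts", 2],
  ["e3_main.ts", 3],
]);
const legal: ImportEdge[] = [
  { from: "e3_main.ts", to: "a2a_repository.ts" },
  { from: "a2a_repository.ts", to: "c1a_policy.ts" },
  { from: "c1a_policy.ts", to: "p0_types.ts" },
];
// Adding a Layer 1 -> Layer 2 import is both upward and, here, cyclic.
const illegal: ImportEdge[] = [...legal, { from: "c1a_policy.ts", to: "a2a_repository.ts" }];

console.log(upwardEdges(layerOf, legal).length, hasCycle(legal));     // 0 false
console.log(upwardEdges(layerOf, illegal).length, hasCycle(illegal)); // 1 true
```

Keeping the two checks separate mirrors the formal statement: every edge is tested on its own, and acyclicity is a property of the whole graph.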

#1.3 Why exactly four

Four layers are the minimal set that captures the only relation that matters for context rot: what is allowed to depend on what. Fewer cannot express the parse-at-the-boundary rule, which needs a distinct Adapter layer. More would reintroduce ambiguity about where a file belongs, which is the drift SAMA exists to kill. This is the "as simple as possible, but not simpler" line.

#2. Profiles

A profile is the only extension mechanism. It does exactly one thing:

A profile MAY subdivide a canonical layer into named, ordered sublayers. A profile MUST NOT introduce, remove, reverse, or otherwise alter any dependency relation between the four canonical layers.

If a proposed profile rule cannot be expressed as "subdivide layer N into ordered sublayers," it is not a profile rule. It is either core (and frozen) or out of scope (and belongs in AGENTS.md).

#2.1 What a profile may and may not do

Allowed	Forbidden
Split Layer 2 into `repository → gateway → controller`	Let Layer 1 import Layer 2
Leave a canonical layer empty (e.g. CLI has no DB)	Add a fifth canonical layer
Define intra-layer sublayer ordering	Make Layer 0 perform I/O
Map sublayer names to filename prefixes	Reverse the import direction between core layers

#2.2 Sublayer ordering

Within a layer, sublayers are totally ordered. An import between two files in the same canonical layer is legal only if it points to an equal-or-lower sublayer in the profile's declared order. Cross-layer imports are governed solely by §1.2 and ignore sublayer order.
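The same-layer rule reduces to a rank comparison. The sketch below assumes the profile's declared order for one layer is available as an array of sublayer names; `adapterOrder` and `sameLayerImportAllowed` are hypothetical names, and rejecting unknown sublayers is an assumption rather than something the spec states.

```typescript
// Declared order for Layer 2 in the http-service example profile.
const adapterOrder = ["repository", "gateway", "controller"];

// A same-layer import is legal only when the target's sublayer sits at the
// same position as, or earlier than, the importer's sublayer.
function sameLayerImportAllowed(order: string[], fromSub: string, toSub: string): boolean {
  const fromRank = order.indexOf(fromSub);
  const toRank = order.indexOf(toSub);
  if (fromRank < 0 || toRank < 0) return false; // assumption: unknown sublayer is rejected
  return toRank <= fromRank;
}

console.log(sameLayerImportAllowed(adapterOrder, "controller", "gateway"));    // true
console.log(sameLayerImportAllowed(adapterOrder, "gateway", "gateway"));       // true
console.log(sameLayerImportAllowed(adapterOrder, "repository", "controller")); // false
```

Equal-rank imports pass this check, so the whole-program acyclicity requirement of §1.2 is still needed to rule out cycles inside a single sublayer.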

#2.3 Profile declaration format

A profile is a single machine-readable file the verifier ingests (sama.profile.toml). Example — an HTTP service:

sama_version = "2.0"
profile = "http-service"

# Map each canonical layer to ordered sublayers and their filename prefixes.
# Order in the array = dependency order (later may import earlier, never reverse).

[layers.0] # Pure — not subdivided
prefixes = ["p0_"]

[layers.1] # Core
sublayers = [
  { name = "policy",  prefix = "c1a_" },
  { name = "service", prefix = "c1b_" },  # service may import policy
]

[layers.2] # Adapter
sublayers = [
  { name = "repository", prefix = "a2a_" },
  { name = "gateway",    prefix = "a2b_" },
  { name = "controller", prefix = "a2c_" },  # controller → gateway → repository
]

[layers.3] # Entry
prefixes = ["e3_"]

A cli profile would leave [layers.2] minimal and subdivide [layers.3] into arg-parser → dispatch. A frontend profile would subdivide Layer 1 into store vs view-logic. Same law, different dialect.
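For illustration, here is one way a verifier might hold the http-service declaration in memory and resolve a file name to its layer. It assumes the TOML has already been parsed; the `SamaProfile` shape, the `classify` function, and the sample file names are hypothetical.

```typescript
interface SublayerDecl { name: string; prefix: string }
interface LayerDecl { prefixes?: string[]; sublayers?: SublayerDecl[] }
interface SamaProfile {
  samaVersion: string;
  profile: string;
  layers: [LayerDecl, LayerDecl, LayerDecl, LayerDecl]; // index = canonical layer
}

// The http-service example, transcribed from the TOML declaration.
const httpService: SamaProfile = {
  samaVersion: "2.0",
  profile: "http-service",
  layers: [
    { prefixes: ["p0_"] },
    { sublayers: [{ name: "policy", prefix: "c1a_" }, { name: "service", prefix: "c1b_" }] },
    {
      sublayers: [
        { name: "repository", prefix: "a2a_" },
        { name: "gateway", prefix: "a2b_" },
        { name: "controller", prefix: "a2c_" },
      ],
    },
    { prefixes: ["e3_"] },
  ],
};

interface Classification { layer: number; sublayer: string | null; rank: number }

// Resolve a file name to its declared layer. null means "no recognized
// prefix", which the Sorted and Architecture checks would report.
function classify(profile: SamaProfile, fileName: string): Classification | null {
  for (let layer = 0; layer < profile.layers.length; layer++) {
    const decl = profile.layers[layer];
    for (const prefix of decl.prefixes ?? []) {
      if (fileName.startsWith(prefix)) return { layer, sublayer: null, rank: 0 };
    }
    const subs = decl.sublayers ?? [];
    const rank = subs.findIndex((s) => fileName.startsWith(s.prefix));
    if (rank >= 0) return { layer, sublayer: subs[rank].name, rank };
  }
  return null;
}

console.log(classify(httpService, "a2b_stripe.ts")); // layer 2, sublayer "gateway", rank 1
console.log(classify(httpService, "helpers.ts"));    // null
```

The `rank` field is the sublayer's position in the declared array, which is all the §2.2 comparison needs.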

→ Worked examples: a CRUD HTTP service under v2 (TypeScript) · a WordPress plugin under v2 (PHP) — both ship full profiles, directory trees, per-layer code signatures, and the common mistakes each §4 check catches.

#3. Layer assignment & the consistency check

The hard step is not checking the law; it is knowing each file's layer. SAMA treats the filename prefix as the source of truth and cross-checks it against the file's actual imports.

Declared layer = the canonical layer implied by the file's prefix (per the active profile's prefix map).
Observed layer ceiling = the highest layer any of the file's imports resolves to.
Consistency rule: the verifier FAILS if a file imports from a layer that its declared layer is not permitted to import — i.e. if the prefix claims something the imports contradict.

This gives a deterministic gate and protection against a misdeclared (or dishonest) prefix. A file cannot launder a forbidden dependency by lying about its layer: the import graph exposes it.
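A sketch of the consistency rule, assuming each file's declared layer and the layers of its resolved imports have already been computed. `FileImports`, `ConsistencyViolation`, `checkConsistency`, and the sample files are hypothetical names.

```typescript
interface FileImports {
  path: string;
  declaredLayer: number;    // implied by the filename prefix
  importedLayers: number[]; // canonical layer of each resolved import
}

interface ConsistencyViolation {
  path: string;
  declaredLayer: number;
  observedCeiling: number;
}

// A file fails when its observed layer ceiling is above its declared layer.
// Same-layer imports are not judged here; §2.2 governs them.
function checkConsistency(files: FileImports[]): ConsistencyViolation[] {
  const violations: ConsistencyViolation[] = [];
  for (const f of files) {
    if (f.importedLayers.length === 0) continue; // nothing to contradict
    const observedCeiling = Math.max(...f.importedLayers);
    if (observedCeiling > f.declaredLayer) {
      violations.push({ path: f.path, declaredLayer: f.declaredLayer, observedCeiling });
    }
  }
  return violations;
}

const sampleFiles: FileImports[] = [
  { path: "p0_money.ts", declaredLayer: 0, importedLayers: [] },
  { path: "a2a_orders_repo.ts", declaredLayer: 2, importedLayers: [0, 1] },
  // Claims Layer 1 by prefix, yet imports an adapter: the prefix is contradicted.
  { path: "c1b_pricing.ts", declaredLayer: 1, importedLayers: [0, 2] },
];

console.log(checkConsistency(sampleFiles).map((v) => v.path)); // [ 'c1b_pricing.ts' ]
```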

#4. Conformance — the binary gate

A repository conforms to SAMA v2 if and only if all of the following pass. Each is a deterministic check; the result is binary.

1. Sorted: every file carries a profile-recognized prefix; lexicographic prefix order equals layer order.
2. Architecture: every file maps to exactly one canonical layer via §2.3; no file is unprefixed or maps to two layers.
3. Modeled (tests): every Layer 1 and Layer 2 behavior file has a sibling test file.
4. Modeled (boundary): external input is parsed only in Layer 2. (Verifier support is profile-dependent; see §6.)
5. Atomic: no file exceeds the line cap (default ~700; profile may lower, never raise). No barrel re-export files.
6. The Law: the import graph is acyclic and every edge satisfies §1.2.
7. Consistency: no file's imports contradict its declared layer (§3).

If any check fails, the repository does not conform. There is no partial pass, no score-to-taste. (Profiles and measurement are graded; conformance is binary.)
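The binary gate can be expressed as a conjunction over per-check violation counts. The sketch below borrows the seven check keys listed for violationCounts in the operational §5; `CHECKS`, `ViolationCounts`, and `conforms` are hypothetical names.

```typescript
const CHECKS = [
  "sorted",
  "architecture",
  "modeledTests",
  "modeledBoundary",
  "atomic",
  "law",
  "consistency",
] as const;

type CheckName = (typeof CHECKS)[number];
type ViolationCounts = Record<CheckName, number>;

// Conformance is binary: every check must report zero violations.
function conforms(counts: ViolationCounts): boolean {
  return CHECKS.every((check) => counts[check] === 0);
}

const cleanRun: ViolationCounts = {
  sorted: 0, architecture: 0, modeledTests: 0, modeledBoundary: 0,
  atomic: 0, law: 0, consistency: 0,
};
const oneOversizedFile: ViolationCounts = { ...cleanRun, atomic: 1 };

console.log(conforms(cleanRun));         // true
console.log(conforms(oneOversizedFile)); // false
```

The counts stay available for the §5 trailing signal; only the gate collapses them to a single boolean.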

#5. Core metrics (the SAMA-independent outcome)

Every conformant repo emits these, identically, regardless of language or profile. These are the variables for A/B measurement (SAMA on vs off) — and crucially, none of them is a compliance score. They measure properties an agent's task performance should correlate with:

Graph depth — longest path in the import DAG.
Fan-in / fan-out distribution per layer.
Boundary ratio — share of external-input parsing that occurs in Layer 2.
Working-set fit — share of files within the editor LOC sweet spot.
Violation count over time — emitted even on conforming repos as a trailing signal (which rules agents almost break).

Report the delta between SAMA-on and SAMA-off runs on these metrics — not the compliance rate. Compliance proves the rules were followed; the delta is what proves the rules were worth following.

#5 (operational) — Core metrics definitions

This subsection pins how the §5 metrics are computed by the verifier at /sama/v2/verify. The values are functions of (sama.profile.toml, src/**.ts) alone: same source tree + same profile → identical numbers across runs.

graphDepth = length of the longest path in the import DAG. Nodes are SAMA source files (src/*.ts non-test, matching a profile prefix); edges are static relative-path imports (from "./...ts") between them. A file with no imports has depth 1. Empty graph = 0. Cycles (which the Law check would flag separately) are bounded so the metric still terminates.
fanByLayer = for each canonical layer L ∈ {0,1,2,3}, two distribution summaries: fanIn (count of edges arriving at files in L) and fanOut (count of edges leaving files in L). Each summary reports {mean, p50, p95, max} (nearest-rank percentile) over the files in L. Empty layers report all-zero summaries.
boundaryRatio = (parse-boundary call sites in Layer 2 files) ÷ (parse-boundary call sites anywhere in the source tree). The set of "parse-boundary call sites" is defined by the shared detector that also powers the §4.4 Modeled-boundary check — currently JSON.parse(...) and new URL(...) outside string literals and comments. Both consumers share the helper in src/a31_sama_v2.ts, so they cannot diverge. When no parse boundaries exist anywhere, boundaryRatio = 1.0 (vacuously satisfied).
workingSetFit = (count of source files with WORKING_SET_MIN_LOC ≤ LOC ≤ WORKING_SET_MAX_LOC) ÷ (total source files). The bounds are intentional defaults documented before the numbers, not retrofitted to flatter this repo:
- Upper 500 — comfortably below the §4.5 Atomic 700-LOC cap, leaving headroom before a file approaches "split soon" territory.
- Lower 50 — below this, a file is too small to be a substantive module; it is usually a type-only file, a stub, or a single helper that would read better inlined into a sibling. Type-only files (Layer 0 model shards) and minimal test fixtures fall here by design. They are acceptable but counted as "not in the working-set sweet spot" because they are not load-bearing modules.
Bounds are hard-coded constants WORKING_SET_MIN_LOC = 50 and WORKING_SET_MAX_LOC = 500 in src/a31_sama_v2.ts for v1 of the metrics emitter. Making them profile-configurable is a deliberate later step (requires extending the TOML subset parser to handle integer values).
violationCounts = a record keyed by the seven §4 checks (sorted, architecture, modeledTests, modeledBoundary, atomic, law, consistency), each holding the integer count of violations that check produced on this run. Reported even when a check passes (value = 0) — this is §5's "trailing signal: which rules agents almost break." The verifier enumerates all violations per check (no short-circuit on first failure within a check), so the count is meaningful — not "1 if failed, 0 if passed".
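Three of the definitions above translate into short pure functions. The sketch below is an illustration, not the code in src/a31_sama_v2.ts. The function names are hypothetical, the cycle guard in `graphDepth` is one possible way to bound cycles, and returning 0 for an empty file list in `workingSetFit` is an assumption the spec does not state.

```typescript
// Longest import path, counted in files: a file with no imports has depth 1.
function graphDepth(files: string[], imports: Map<string, string[]>): number {
  const memo = new Map<string, number>();
  const inProgress = new Set<string>();
  const depthOf = (file: string): number => {
    const known = memo.get(file);
    if (known !== undefined) return known;
    if (inProgress.has(file)) return 0; // assumption: a back edge adds nothing
    inProgress.add(file);
    let deepest = 0;
    for (const dep of imports.get(file) ?? []) {
      deepest = Math.max(deepest, depthOf(dep));
    }
    inProgress.delete(file);
    memo.set(file, deepest + 1);
    return deepest + 1;
  };
  return files.reduce((max, file) => Math.max(max, depthOf(file)), 0);
}

// Nearest-rank percentile: the value at rank ceil(p/100 * N) in sorted order.
function nearestRank(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0;
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// {mean, p50, p95, max} over per-file edge counts; all zero for an empty layer.
function summarize(values: number[]) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  return {
    mean: n === 0 ? 0 : sorted.reduce((sum, v) => sum + v, 0) / n,
    p50: nearestRank(sorted, 50),
    p95: nearestRank(sorted, 95),
    max: n === 0 ? 0 : sorted[n - 1],
  };
}

const WORKING_SET_MIN_LOC = 50;
const WORKING_SET_MAX_LOC = 500;

function workingSetFit(locPerFile: number[]): number {
  if (locPerFile.length === 0) return 0; // assumption: empty case is not defined by the spec
  const inRange = locPerFile.filter(
    (loc) => loc >= WORKING_SET_MIN_LOC && loc <= WORKING_SET_MAX_LOC,
  ).length;
  return inRange / locPerFile.length;
}

const chain = new Map<string, string[]>([
  ["e3_main.ts", ["a2a_repo.ts"]],
  ["a2a_repo.ts", ["c1a_policy.ts"]],
  ["c1a_policy.ts", ["p0_types.ts"]],
]);
const chainFiles = ["p0_types.ts", "c1a_policy.ts", "a2a_repo.ts", "e3_main.ts"];

console.log(graphDepth(chainFiles, chain));          // 4
console.log(summarize([0, 1, 2, 5]));                // mean 2, p50 1, p95 5, max 5
console.log(workingSetFit([10, 50, 200, 500, 501])); // 0.6
```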

#Worked example — boundaryRatio for this repo (hand-traced)

The §0 contract ("deterministic program; no LLM judgment") is auditable only if the metric output matches a hand trace. Walking boundaryRatio for this repo's src/ against the live verifier:

A raw grep across non-test src/*.ts finds seven hits matching JSON.parse( and four hits matching new URL(. The shared detector strips comments and string literals first, which removes the explanatory mentions inside // ... lines and inside docstring literals. After stripping, the surviving real call sites are:

call site	layer (prefix → L)
`src/c13_database.ts:133` `JSON.parse(row.verdict_json)`	`c13_` → L2
`src/c13_database.ts:159` `JSON.parse(r.tracked_branches)`	`c13_` → L2
`src/c13_database.ts:273` `JSON.parse(r.doc_json)`	`c13_` → L2
`src/c13_database.ts:373` `JSON.parse(r.verdict_json)`	`c13_` → L2
`src/c14_request_parse.ts:28` `JSON.parse(text)`	`c14_` → L2
`src/c14_request_parse.ts:20` `new URL(text)`	`c14_` → L2
`src/c14_client_bundle.ts:72` `new URL(import.meta.url)`	`c14_` → L2

Total: 7 parse-boundary call sites; all 7 fall under prefixes the profile maps to Layer 2.

boundaryRatio = 7 / 7 = 1.0 (100.0%), which is exactly what /sama/v2/verify reports under §5 Core metrics. The hand count matches the verifier's count. The metric and the Modeled-boundary check (§4.4) also agree by construction: both consume findParseBoundaryCallSites in src/a31_sama_v2.ts, so they cannot diverge.
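The strip-then-count procedure used in the hand trace can be approximated as follows. This is a simplified sketch and not the repository's findParseBoundaryCallSites: `stripCommentsAndStrings`, `countParseBoundaries`, and `computeBoundaryRatio` are hypothetical names, regex literals and `${...}` expressions inside template literals are not handled, and tolerating whitespace before the opening parenthesis is an assumption.

```typescript
// Drop comments and string-literal contents so mentions inside them are ignored.
function stripCommentsAndStrings(src: string): string {
  let out = "";
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === "/" && next === "/") {
      while (i < src.length && src[i] !== "\n") i++; // line comment
    } else if (ch === "/" && next === "*") {
      const end = src.indexOf("*/", i + 2);          // block comment
      i = end < 0 ? src.length : end + 2;
      out += " ";
    } else if (ch === '"' || ch === "'" || ch === "`") {
      i++;
      while (i < src.length && src[i] !== ch) {
        if (src[i] === "\\") i++;                    // skip the escaped character
        i++;
      }
      i++;                                           // closing quote
      out += ch + ch;                                // keep an empty literal
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

function countParseBoundaries(src: string): number {
  const code = stripCommentsAndStrings(src);
  const hits = code.match(/\bJSON\.parse\s*\(|\bnew\s+URL\s*\(/g);
  return hits ? hits.length : 0;
}

function computeBoundaryRatio(files: { layer: number; source: string }[]): number {
  let inLayer2 = 0;
  let total = 0;
  for (const f of files) {
    const n = countParseBoundaries(f.source);
    total += n;
    if (f.layer === 2) inLayer2 += n;
  }
  return total === 0 ? 1.0 : inLayer2 / total; // vacuously satisfied when empty
}

const adapterSource = [
  "// JSON.parse(x) mentioned in a comment",
  "const a = JSON.parse(text);",
  "const s = 'new URL(x) inside a string';",
  "const u = new URL(text);",
].join("\n");

console.log(countParseBoundaries(adapterSource));                        // 2
console.log(computeBoundaryRatio([{ layer: 2, source: adapterSource }])); // 1
```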

#6. Evolution policy (how the standard stays alive without rotting)

The core (§1) is frozen. Changing the four layers or the Law requires a major version and an extraordinarily high evidentiary bar: cross-repo data showing the current core measurably harms agent performance.
Profiles are the moving edge. A new profile is a falsifiable hypothesis: "this sublayer split lowers context cost for this domain." It is admitted provisionally, measured against §5, and promoted to "official" only if the delta holds across multiple repos.
A rule agents structurally violate is a signal — to be triaged, not auto-relaxed. Either the rule is right and the agent must improve (signal to agent-builders), or the rule is impractical and the profile adapts (never the core). The feedback loop tunes profiles; it does not erode the law.

#6.A v2.1 dialects (provisional)

Three falsifiable extensions are admitted under §6 as v2.1-draft dialects. Each was surfaced by a real-world audit that found the v2.0 surface syntax mismatched a target language's idiom. Each is opt-in per profile, defaults to v2.0 behaviour when its flag is absent, and preserves, through different surface syntax, the architectural property the original rule was protecting. Per the profile-promotion rule in §6, promotion to "official" requires cross-repo §5 metric data showing the dialect catches the same class of drift the unrelaxed rule did.

A conformant verifier MUST parse the three dialect flags (layout, tests, atomic_exemption) as optional top-level profile fields and MUST reject unknown values with a clear error. A verifier MAY refuse to activate a dialect's relaxed semantics, i.e. continue applying the v2.0 rule even when a dialect is declared: dialect activation is a separate, later promotion event that requires §5 cross-repo evidence. The flags themselves are tolerated today so that opt-in profiles for languages other than TypeScript and PHP are not rejected as malformed.
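The parse-and-reject requirement might look like the sketch below, which takes an already-parsed profile as a plain object. The allowed values and defaults come from §6.1 to §6.3; `DIALECT_VALUES`, `DialectSettings`, `parseDialectFlags`, and the error wording are hypothetical.

```typescript
const DIALECT_VALUES = {
  layout: ["prefix", "directory"],
  tests: ["sibling", "inline"],
  atomic_exemption: ["none", "declarative"],
} as const;

type DialectFlag = keyof typeof DIALECT_VALUES;

interface DialectSettings {
  layout: string;
  tests: string;
  atomic_exemption: string;
}

// Absent flags fall back to v2.0 behaviour; unknown values are rejected.
// Recognizing a value here does not by itself activate relaxed semantics.
function parseDialectFlags(raw: Record<string, unknown>): DialectSettings {
  const result: DialectSettings = { layout: "prefix", tests: "sibling", atomic_exemption: "none" };
  for (const flag of Object.keys(DIALECT_VALUES) as DialectFlag[]) {
    const value = raw[flag];
    if (value === undefined) continue;
    const allowed: readonly string[] = DIALECT_VALUES[flag];
    if (typeof value !== "string" || !allowed.includes(value)) {
      throw new Error(
        `sama.profile.toml: invalid value ${JSON.stringify(value)} for "${flag}"; ` +
          `expected one of: ${allowed.join(", ")}`,
      );
    }
    result[flag] = value;
  }
  return result;
}

console.log(parseDialectFlags({ sama_version: "2.1", layout: "directory" }));
// layout "directory", tests "sibling", atomic_exemption "none"
```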

#6.1 Directory-layout dialect

Profile syntax. Top-level optional flag:

sama_version = "2.1"
profile = "..."
layout = "directory"   # default when absent: "prefix" (v2.0 behaviour)

What v2.0 rule it relaxes. §4.1 Sorted — "every file carries a profile-recognized prefix; lexicographic prefix order equals layer order."

The architectural property the original rule was protecting. The dependency direction of the codebase is publicly readable from the file system without running any analysis: a reviewer's ls src/ | sort reads top-to-bottom in dependency order, and a layer change is visible in a git diff without any tool support.

How the dialect preserves that property. Under layout = "directory", the Sorted check verifies two things: that the profile declares packages or crate directories in layer order, and that the lexicographic order of the declared package paths matches actual import direction. The second is confirmed by the language's compile-time dependency check (Go's internal/ semantics, Rust's Cargo crate graph, etc.) together with the absence of upward edges in the import graph. The reviewer's analogue of ls src/ | sort becomes grep 'packages =' sama.profile.toml, which is still mechanical and still ahead of the build. The property is the same; the surface syntax shifts from per-file prefix to per-directory declaration.

Falsifiable cross-repo experiment. Run the §5 metrics emitter on a corpus of agent-authored Go and Rust commits, half against layout = "prefix" (with a synthetic prefix renaming) and half against layout = "directory" (against the natural package layout). The dialect is invalidated if the directory mode systematically reports a different violation set on Sorted than the prefix mode does on the same logical defects. Originally surfaced in the dive audit and dive rebuild sketch; confirmed independently by the ripgrep audit.

#6.2 Inline-tests dialect

Profile syntax. Top-level optional flag:

sama_version = "2.1"
profile = "..."
tests = "inline"       # default when absent: "sibling" (v2.0 behaviour)

What v2.0 rule it relaxes. §4.3 Modeled (tests) — "every Layer 1 and Layer 2 behavior file has a sibling test file." The v2.0 rule was written assuming Jest/PHPUnit-style sibling test files (foo.ts + foo.test.ts).

The architectural property the original rule was protecting. Every behavioural source unit has an attached test, mechanically discoverable by the verifier — the test is not centralised in a separate tests/ tree that may drift out of sync with the source.

How the dialect preserves that property. Under tests = "inline", the Modeled-tests check scans each Layer 1 / Layer 2 source file for in-file test attachments (#[cfg(test)] mod tests { #[test] fn ... } in Rust; equivalent annotations in other languages whose convention is inline tests). A behavioural file with no inline #[test] block fails the check exactly as a file with no sibling *.test.ts would under v2.0. Where the test attaches changes (same file vs sibling file); that every behavioural unit has an attached test does not.
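A deliberately naive sketch of the inline-test scan for Rust sources follows. It is a plain text match and assumes the attributes do not appear inside comments or strings; a real verifier would likely need to be stricter. `hasInlineRustTests` and the sample sources are hypothetical.

```typescript
// True when the source carries a test module and at least one #[test] function.
function hasInlineRustTests(source: string): boolean {
  const hasTestModule = /#\[cfg\(test\)\]/.test(source);
  const hasTestFn = /#\[test\]/.test(source);
  return hasTestModule && hasTestFn;
}

const testedSource = [
  "pub fn add(a: i32, b: i32) -> i32 { a + b }",
  "",
  "#[cfg(test)]",
  "mod tests {",
  "    use super::*;",
  "",
  "    #[test]",
  "    fn adds() { assert_eq!(add(1, 2), 3); }",
  "}",
].join("\n");

const untestedSource = "pub fn add(a: i32, b: i32) -> i32 { a + b }";

console.log(hasInlineRustTests(testedSource));   // true
console.log(hasInlineRustTests(untestedSource)); // false
```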

Falsifiable cross-repo experiment. Audit a Rust corpus (e.g. the popular CLI tools bat, fd, ripgrep, eza) under both tests = "sibling" (which the convention does not produce) and tests = "inline" (which it does). The dialect is invalidated if inline-mode systematically classifies files as tested that sibling-mode-on-a-renamed-corpus would not — i.e. if the surface-syntax change quietly admits genuinely untested files. Originally surfaced in the ripgrep audit and ripgrep rebuild sketch.

#6.3 Declarative-exemption dialect

Profile syntax. Top-level optional flag:

sama_version = "2.1"
profile = "..."
atomic_exemption = "declarative"   # default when absent: "none" (v2.0 behaviour)

What v2.0 rule it relaxes. §4.5 Atomic — "no file exceeds the line cap (default ~700; profile may lower, never raise)."

The architectural property the original rule was protecting. Working-set fit — every load-bearing source file fits inside the agent's editor context with headroom. A file at 700+ LOC forces the agent to load it incrementally or summarise; either response is a drift surface.

How the dialect preserves that property. Under atomic_exemption = "declarative", the Atomic check exempts from the LOC cap files whose content is overwhelmingly declarative. A file over the cap qualifies if its cyclomatic complexity per LOC is below 0.05 and its body is predominantly impl X for Y / const FOO: T = ... / pub struct ... items (or the language's equivalent). The intuition: a flag-definition catalog or a static type-table is structurally large but does not require holistic loading by an agent; the agent indexes into it by name rather than reading it linearly. The working-set property is preserved for the files that would harm an agent's context (behavioural complexity) and selectively waived for files where the cap was a false positive (declarative shape). The 7,779-line crates/core/flags/defs.rs in ripgrep is the textbook case: 150 flag definitions, each a small struct + small impl, CC/LOC ≈ 0.01.
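The exemption test can be sketched as a three-way verdict. The 700-line cap and the 0.05 threshold come from the text; reading "predominantly" as "more than half of the items" is an assumption, as are the `FileShape` fields, the `atomicVerdict` name, and the sample numbers.

```typescript
const LINE_CAP = 700;              // §4.5 default
const MAX_CC_PER_LOC = 0.05;       // threshold named in §6.3
const MIN_DECLARATIVE_SHARE = 0.5; // assumption: "predominantly" read as more than half

interface FileShape {
  loc: number;
  cyclomaticComplexity: number;
  declarativeShare: number; // fraction of items that are declarative, 0..1
}

type AtomicVerdict = "within-cap" | "exempt" | "violation";

function atomicVerdict(f: FileShape): AtomicVerdict {
  if (f.loc <= LINE_CAP) return "within-cap";
  const ccPerLoc = f.cyclomaticComplexity / f.loc;
  const declarative =
    ccPerLoc < MAX_CC_PER_LOC && f.declarativeShare > MIN_DECLARATIVE_SHARE;
  return declarative ? "exempt" : "violation";
}

// Illustrative inputs only, not measurements of any real file.
console.log(atomicVerdict({ loc: 400, cyclomaticComplexity: 60, declarativeShare: 0.1 }));   // within-cap
console.log(atomicVerdict({ loc: 5000, cyclomaticComplexity: 50, declarativeShare: 0.95 })); // exempt
console.log(atomicVerdict({ loc: 1200, cyclomaticComplexity: 180, declarativeShare: 0.2 })); // violation
```

Both conditions must hold for the exemption, so a long file with low branching but mostly behavioural items still counts as a violation.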

Falsifiable cross-repo experiment. Across a multi-language corpus of agent-edit failures (cases where an LLM produced a regression while editing a single file), compute the share that fall in declarative-exempt files vs in over-cap behavioural files. The dialect is invalidated if declarative-exempt files correlate with edit failures at the same or higher rate than over-cap behavioural files do — i.e. if the heuristic exempts files the agent actually struggles with. Originally surfaced in the ripgrep audit and ripgrep rebuild sketch.

#Appendix A — Mapping to the four pillars

Pillar	Where it lives in v2
S — Sorted	§1.2 Law + §4.1; prefix = layer = sort key
A — Architecture	§1.1 four layers + §2 profiles (the fix for the weak A)
M — Modeled	§4.3 sibling tests + §4.4 boundary parsing (Layer 2)
A — Atomic	§4.5 line cap + no barrels