0eef256f378b0802027782dcd6b3557e878a5609
diff --git a/content/home.md b/content/home.md
index d27751cc308f3e69bb401729e3ed095e746fc91c..a9fc088b21985eee7f43227fd76f55162fc372c2 100644
--- a/content/home.md
+++ b/content/home.md
@@ -6,6 +6,18 @@ SAMA is to agent-written code what Conventional Commits is to git history: a sma
 
 **Four pillars. One verifier. Zero ambiguity for your agent.**
 
+## This site is the live dogfood
+
+The formal specification — frozen core + profile mechanism, written so a deterministic verifier in any language can ingest it — lives at **[/sama/v2](/sama/v2)** (v2.0 draft). The legacy practitioner-facing v1 pages live at [/sama](/sama).
+
+The verifier at **[/sama/v2/verify](/sama/v2/verify)** runs the seven §4 conformance checks against this very repository's source on every deploy. Right now it reports **7 of 7 ✓ conforming · 91 files examined**. The TypeScript code that implements the verifier is checked by the verifier. The website is the spec is the verifier is the test suite.
+
+The empirical claim the spec actually makes is not the compliance score. Quoting §5 verbatim:
+
+> *Compliance proves the rules were followed; the delta is what proves the rules were worth following.*
+
+The five §5 core metrics — **graphDepth · fanByLayer · boundaryRatio · workingSetFit · violationCounts** — are emitted alongside the verdict ([live, scroll to "Core metrics"](/sama/v2/verify)) so any later claim about SAMA's value can be measured as a delta against today's baseline rather than against itself.
+
 ## The four pillars
 
 - **[S — Sorted.](/sama/sorted)** Lexicographic file order equals import direction. The dependency graph is the file tree.
@@ -13,8 +25,6 @@ SAMA is to agent-written code what Conventional Commits is to git history: a sma
 - **[M — Modeled.](/sama/modeled)** Every behavior file has a sibling test. Every external input is parsed at the boundary, never cast.
 - **[A — Atomic.](/sama/atomic)** Files cap at ~700 lines. Split per domain, never via barrel re-exports.
 
-Read the full discussion at [/sama](/sama) — that page is also where the v1.0 specification lives today. It will move to a standalone, language-neutral repo once the multi-language adapter work is far enough along to make the split useful.
-
 ## SAMA in your agent-coding stack
 
 SAMA composes with the tools you already use. Use AGENTS.md to instruct the agent and SAMA to shape the code; use Factory's scorecard for breadth and SAMA for depth on the architectural pillar; run SWE-bench to grade the agent and SAMA to grade what the agent left behind.
@@ -44,10 +54,24 @@ LLMs degrade as input context grows. Chroma's [Context Rot research](https://res
 
 SAMA bundles those findings into four constraints a CI job can enforce. *Sorted* makes structural retrieval cheap. *Atomic* keeps every file inside the agent's working set. *Modeled* makes every change reviewable by its sibling test. *Architecture* lets the agent answer "where does this go?" without re-deriving the tree each session.
 
+**The load-bearing property isn't that LLMs have small context windows — modern models have 200k+ tokens.** The load-bearing property is **mechanical enforceability**: the verifier fails the build when a file crosses the line cap or an import points the wrong way. Discipline that lives only in code review quietly slips under agent pressure; discipline that lives in a CI gate keeps its shape across an arbitrary number of agent commits. The context-window research above explains the *why*; the verifier explains the *how*.
+
+## Three datapoints on the same axes
+
+Empirical baseline so far (the §5 metrics, [computed live](/sama/v2/verify) for this site and hand-traced for the two audits):
+
+| project | language | §4 score | workingSetFit | boundaryRatio | graphDepth |
+|---|---|---|---|---|---|
+| **tdd.md** (this site) | TypeScript | **7 / 7 ✓** (measured) | 80% | 100% | 7 |
+| [**wagoodman/dive**](/blog/sama-v2-go-project-dive) | Go | ~5 / 7 (estimated) | ~80% | ~85% | ~5 |
+| [**Open Graph plugin**](/blog/sama-v2-wordpress-plugin-audit) | PHP / WordPress | 0 / 7 (estimated) | ~47% | <10% | ~3 |
+
+Three points is not yet a "v2 is worth following" claim. §6 of the spec is explicit that promotion to official requires cross-repo deltas, not a single dogfood. But the same five numbers are now defined, computable, and published — which is the prerequisite the spec sets before any later claim becomes testable.
+
 ## See it in practice
 
 - **[Pick a kata →](/games)** — small codebases that get scored against SAMA, with public verdicts per agent run.
 - **[Leaderboard →](/leaderboard)** — current standings across registered agents.
-- **[Blog →](/blog)** — what the runs revealed about Claude Code, Cursor, and Aider.
+- **[Blog →](/blog)** — what the runs revealed about Claude Code, Cursor, and Aider, plus the audit-and-rebuild series on a WordPress plugin and a Go project.
 
 Agent-specific walkthroughs: [Claude Code](/guides/claude-code) · [Cursor](/guides/cursor) · [Aider](/guides/aider).
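
Review note: the added "load-bearing property" paragraph says the verifier fails the build when a file crosses the line cap. A minimal sketch of what such a gate could look like is below. It is not the SAMA verifier's code: `SourceFile`, `CapViolation`, `countLines`, `findCapViolations`, and `gatePasses` are hypothetical names, and treating the page's "~700 lines" as a hard limit of exactly 700 is an assumption.

```typescript
// Hypothetical sketch of a line-cap gate; not the actual SAMA verifier API.
interface SourceFile {
  path: string;
  content: string;
}

interface CapViolation {
  path: string;
  lines: number;
}

// Assumption: the page says "~700 lines"; 700 is used here as a hard limit.
const ASSUMED_LINE_CAP = 700;

// Count lines, ignoring a single trailing newline so "a\nb\n" counts as 2.
function countLines(content: string): number {
  if (content.length === 0) return 0;
  const body = content.endsWith("\n") ? content.slice(0, -1) : content;
  return body.split("\n").length;
}

// Return every file whose line count exceeds the cap.
function findCapViolations(
  files: SourceFile[],
  cap: number = ASSUMED_LINE_CAP,
): CapViolation[] {
  return files
    .map((f) => ({ path: f.path, lines: countLines(f.content) }))
    .filter((f) => f.lines > cap);
}

// The gate passes only when no file is over the cap.
function gatePasses(
  files: SourceFile[],
  cap: number = ASSUMED_LINE_CAP,
): boolean {
  return findCapViolations(files, cap).length === 0;
}
```

A CI wrapper around this would read the repository's files, call `gatePasses`, and exit non-zero on `false`, which is what makes the rule enforceable without relying on code review.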