e23face46a09fce389ec31c69689d3289b41a69d diff --git a/content/blog/sama-v2-sitemap-implementation-plan.md b/content/blog/sama-v2-sitemap-implementation-plan.md new file mode 100644 index 0000000000000000000000000000000000000000..0ff340a8d263b79c5df02c616220430958b0d75b --- /dev/null +++ b/content/blog/sama-v2-sitemap-implementation-plan.md @@ -0,0 +1,116 @@ +# Building `/sitemap.xml` under SAMA v2 — a Claude Code `/goal` walkthrough + +This site has 23 blog posts, 4 SAMA discipline pages, a `/sama/v2` spec, a verifier, an example library, and a guides section. None of them are listed in a sitemap. Search engines and AI crawlers have to discover everything by random link-walking from the home page — slow, lossy, and especially bad for the long-tail spec pages that few internal links point at. + +This post is the **implementation plan** for `/sitemap.xml`, written *before* the code lands. The plan itself uses two artifacts worth pointing at: + +1. **The `/goal` slash command in Claude Code.** The 38-line spec I'm working from is checked in at [`goal.md`](/GIT/syntaxai/tdd.md/blob/main/goal.md) — that's the exact `/goal` text fed to the agent. It declares *Done when*, *Constraints (anti-fudge)*, and *Load-bearing files to read FIRST*. The agent has to read those files before writing any code, and it has to satisfy every *Done when* clause before declaring done. +2. **SAMA v2 itself.** The feature has to land conformant — `/sama/v2/verify` reports 7/7 ✓ on the live site, and breaking it on the way in is what the verifier catches. Every architectural decision below traces to a §4 check. + +The combination is what this post is really about. The `/goal` is the *what*; SAMA v2 is the *how*; the verifier is the *anti-fudge gate*. Each constrains the other in a way that turns "add a feature" into a mechanical exercise. 
+ +## Why a sitemap, and why now + +Three concrete signals: + +- **`src/a31_blog.ts` already promises it.** The top-of-file comment reads: *"this file is just the registry that drives `/blog`, `/blog/:slug`, **and the sitemap**."* The promise has been there since the registry was written. The sitemap was always meant to exist; it just didn't. +- **Empirical chain visibility.** The cross-repo measurement posts ([dive](/blog/sama-v2-go-project-dive), [ripgrep](/blog/sama-v2-rust-project-ripgrep), [n=7 baseline](/blog/sama-v2-workingset-cross-repo-baseline)) are exactly the kind of long-tail content that needs explicit sitemap entries — they're not in the navigation, they don't get linked from the home page indiscriminately, and they're the load-bearing artifacts for the "is SAMA worth following" argument. +- **AI crawlers index sitemaps preferentially.** If the empirical chain is the thing I want to be found, the sitemap is the first place to put it. + +## The decomposition — three files, one per layer + +The `/goal` text fixes the structure before the agent picks up a keyboard. Mapped onto the [four canonical layers](/sama/v2#11-layers): + +![/sitemap.xml feature decomposition by SAMA layer](/sitemap-layers.png?v=1) + +Read top to bottom — imports flow downward, only: + +- **Layer 3 · Entry** — `d21_app.ts` gains one route, `/sitemap.xml`. Its only job: import `ALL_POSTS` + `ALL_SAMA` + the static list, call the Core helper, return a `Response` with `Content-Type: application/xml; charset=utf-8` and `Cache-Control: public, max-age=3600`. No XML construction in the handler — that's the helper's job, and the layer boundary is the reason the helper exists. +- **Layer 2 · Adapter — empty, by design.** This is the structurally interesting part. Sitemaps for most CMSs are an Adapter problem: hit the database, query the post table, emit XML. Here the "database" is already an in-process TypeScript array (`ALL_POSTS`). No DB, no network, no filesystem read at request time. 
So Layer 2 stays empty, and the verifier won't complain because the §4.4 modeled-boundary check only fires when a boundary actually exists. +- **Layer 1 · Core** — `b32_sitemap.ts`. Pure logic: takes `Array<{ loc: string; lastmod?: string }>`, returns the well-formed XML string. No I/O. No `Date.now()`. No `process.env`. Deterministic: the same input array always produces the same output bytes. This is what makes the sibling test cheap to write — six cases, all data-in / string-out. +- **Layer 0 · Pure** — already exists. `ALL_POSTS`, `ALL_SAMA`, `ALL_GUIDES`, and the canonical base URL from `a31_site_config.ts`. No new code at this layer; we're consuming what's already there. + +The visual property of the diagram — *Layer 2 stays empty* — is the SAMA-specific reason this feature is so small. In an idiomatic-WordPress shape, a sitemap is a database adapter plus a query plus a renderer plus a cache, four files minimum. In a layer-disciplined shape with the registries already in memory, the adapter dissolves and the feature collapses to one new helper and one new route. + +## The data flow + +Same picture, rotated 90°: + +![sitemap data flow — registries to helper to XML response](/sitemap-flow.png?v=1) + +Three registries fan in to one helper; one helper fans out to one response. The shape of that picture is the answer to the question *"how does a new blog post get into the sitemap?"* — it doesn't get added to anything. It's already in `ALL_POSTS` (where the new post entry was added when the markdown file was written), so the next `/sitemap.xml` fetch after the next deploy already lists it. Zero human edit. That's the **load-bearing automatic property** the `/goal` calls out: + +> The sitemap is NOT committed as a static file — it's generated per-request […]. New blog post → next sitemap fetch already includes it without any human edit. 
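The fan-in described above can be sketched as one pure function. This is a minimal sketch, not the shipped code: `buildEntries`, `PostLike`, `SamaLike`, and the parameter order are hypothetical names chosen for illustration, and the registry shapes are assumed from the fields the plan mentions (`slug` and `date`). The real registries live in `src/a31_blog.ts` and `src/a31_sama.ts` and may differ.

```typescript
// Assumed minimal shapes; the real BlogEntry has more fields (title, description, ...).
interface PostLike { slug: string; date: string }
interface SamaLike { slug: string }
interface SitemapEntry { loc: string; lastmod?: string }

// Hypothetical helper: projects the registries into the helper's input shape.
// Pure: no I/O, no clock, no environment reads, so the same inputs always
// yield the same list in the same order.
function buildEntries(
  baseUrl: string,
  staticPaths: readonly string[],
  posts: readonly PostLike[],
  sama: readonly SamaLike[],
): SitemapEntry[] {
  return [
    // Static routes have no registry, so they carry no lastmod.
    ...staticPaths.map((path) => ({ loc: baseUrl + path })),
    // Each post contributes /blog/<slug> with lastmod taken from its date field.
    ...posts.map((p) => ({ loc: `${baseUrl}/blog/${p.slug}`, lastmod: p.date })),
    // Each SAMA discipline page contributes /sama/<slug>.
    ...sama.map((s) => ({ loc: `${baseUrl}/sama/${s.slug}` })),
  ];
}
```

Because the post list is an argument rather than a second constant, adding an entry to `ALL_POSTS` is the only edit needed for a new URL to appear in the output.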
+ +This is also the anti-fudge constraint: if I were to commit a static `sitemap.xml`, the verifier wouldn't catch it (sitemaps aren't part of the §4 checks), but the *property the feature claims* would silently rot the first time a post was added without manually editing the file. + +## What the `/goal` slash command actually does + +For readers unfamiliar with Claude Code's `/goal`: it's a slash command that takes a structured task description and hands it to the agent as a goal statement plus three mandatory sections: + +``` +Goal: … ← one-paragraph statement of intent +Done when: … ← bullet list of testable post-conditions +Constraints (anti-fudge): … +Load-bearing files to read FIRST: … +``` + +The shape is what makes it useful. *Goal* is the human-readable intent. *Done when* is a checklist the agent will be evaluated against — and "all tests pass" is one bullet among many, not the only one. *Constraints* are the things the agent might be tempted to short-circuit (in this case: "URLs MUST come from existing registries — no second source of truth that can drift"). *Load-bearing files to read FIRST* prevents the agent from inventing structure that already exists — every existing helper, registry, and pattern is one Read tool call away, but only if the agent is told to look. + +The full text of this feature's `/goal` is in [`goal.md`](/GIT/syntaxai/tdd.md/blob/main/goal.md). Cherry-picking the most load-bearing bullets: + +- *"URLs are derived from the registries (no hand-maintained slug list)"* — fixes the source-of-truth violation that would otherwise emerge over time. +- *"Sibling test covers: empty list → valid urlset with no `<url>` children; single URL with lastmod; single URL without lastmod; multiple URLs preserve order; XML-escape any `&` or `<` in URLs"* — fixes the Modeled-tests check (§4.3) before it gets a chance to fail. +- *"`/sama/v2/verify` still reports 7/7 ✓ (anti-fudge)"* — the structural gate. 
The verifier is the only thing standing between "I shipped a feature" and "I shipped a feature *while preserving the property the site is making claims about*." +- *"Site language English-only."* — captured from a prior correction, surfaced here so the agent doesn't slip back into Dutch in the response body or error text. + +This pattern — `/goal` as the contract, SAMA v2 as the discipline, verifier as the gate — is the agentic-coding workflow this site is shaped around. The `/goal` is verbose by design. Verbose-`/goal` + automatic-`/verify` is much cheaper than terse-prompt + manual-review. + +## Mapping the feature onto §4 + +Each of the seven [§4 conformance checks](/sama/v2#4-conformance) applies. The decomposition above is the answer to each: + +| §4 check | How this feature satisfies it | +|---|---| +| §4.1 Sorted | `b32_sitemap.ts` prefix matches Layer 1, `d21_app.ts` prefix matches Layer 3. `ls src/` reads top-down: a31 (Pure) → b32 (Core) → d21 (Entry). No prefix gap, no out-of-order import. | +| §4.2 Architecture | The number is the layer. Helper at b32 must be pure; handler at d21 may import from anywhere below. The verifier rejects any cross-cluster violation mechanically. | +| §4.3 Modeled-tests | Sibling `b32_sitemap.test.ts` covers all five `Done when` test cases. Without the sibling, §4.3 turns red. | +| §4.4 Modeled-boundary | Boundary parsing lives in the registries (already done at module load) and at the helper's input shape (typed `Array<{loc, lastmod?}>`). The helper doesn't see strings from the wire. | +| §4.5 Atomic | Helper estimated 50–100 LOC; handler closure 15–25 LOC. Both well under the 700-LOC cap. | +| §4.6 The Law (§1.2) | d21 → b32 → a31. Strictly downward. No upward edge — the helper doesn't know what the handler does with its output. | +| §4.7 Consistency | The `b32_` prefix matches what the file actually contains: Layer 1 pure logic. No misnamed file, no logic at the wrong layer. | + +That table isn't decorative. 
It's what the agent's `Done when` bullet *"`/sama/v2/verify` still reports 7/7 ✓"* expands into when run against the live site. Each row corresponds to one line in [`b32_sama_v2_verify.ts`](/GIT/syntaxai/tdd.md/blob/main/src/b32_sama_v2_verify.ts) that the agent doesn't get to touch. + +## Anti-fudge — the things this plan deliberately does NOT do + +A naive sitemap implementation has several seductive shortcuts. The `/goal` rules them out explicitly: + +- **No second list of URLs anywhere.** The temptation: keep a `SITEMAP_URLS` const next to `b32_sitemap.ts` listing everything. Cost: it drifts the first time someone forgets. Rule: derive from `ALL_POSTS` and `ALL_SAMA`; the only hand-curated list is the small set of static routes that have no registry (`/`, `/blog`, `/games`, etc.) and that's intentional. +- **No string concatenation for XML.** The temptation: `` `<loc>${url}</loc>` `` is one line. Cost: an `&` or `<` in a URL produces invalid XML. Rule: write a tiny XML-escape helper inside `b32_sitemap.ts` and run every interpolation through it. The existing renderer's HTML-escape is a superset and would work, but a dedicated XML helper is cleaner Layer-1 code — and it's two lines, not a library. +- **No static commit of `sitemap.xml`.** Already covered above. The automatic property is the entire point of the feature. +- **No dynamic/user-specific URLs.** No `/p/:slug` session pages, no `/sama/verify?repo=…` query-string variants, no `/api/*`. Only stable indexable content. +- **No verifier change.** The §4 check logic is frozen. If a structural choice this feature makes would fail the verifier, the choice changes — the verifier doesn't. + +This is what "anti-fudge" means in practice. The verifier is what catches structural drift; the `/goal` constraints are what catch the smaller fudges the verifier doesn't know to look for. Together they leave very little room for shortcuts that would only hurt later. 
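The "no string concatenation" rule can be illustrated with a small sketch of the Core helper. The name `renderSitemap` and the `Array<{ loc, lastmod? }>` input shape come from the plan; `escapeXml`, the whitespace layout, and the choice to escape all five predefined XML entities (the plan only requires `&` and `<`) are assumptions made for this illustration, not the code that will land.

```typescript
interface SitemapEntry { loc: string; lastmod?: string }

// Hypothetical escape helper. The ampersand is replaced first so that the
// ampersands introduced by the later replacements are not escaped twice.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Pure and deterministic: data in, string out. Input order is preserved.
function renderSitemap(urls: readonly SitemapEntry[]): string {
  const body = urls
    .map((u) => {
      // lastmod is optional; emit the element only when a value is present.
      const lastmod =
        u.lastmod === undefined ? "" : `<lastmod>${escapeXml(u.lastmod)}</lastmod>`;
      return `  <url><loc>${escapeXml(u.loc)}</loc>${lastmod}</url>\n`;
    })
    .join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    body +
    `</urlset>\n`
  );
}
```

Every interpolated value passes through `escapeXml`, so a URL such as `https://tdd.md/a?x=1&y=2` is emitted with `&amp;` and the document stays well-formed. An empty input still yields a valid `urlset` element with no `<url>` children.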
+ +## What lands when this ships + +After the next deploy: + +- `GET https://tdd.md/sitemap.xml` returns a sitemaps.org 0.9 document with ~40 URLs (23 posts + 4 SAMA pages + spec/example/skill/verify pages + static routes). +- `GET https://tdd.md/robots.txt` references the sitemap on its last line. +- The next time I write a blog post, it appears in the sitemap automatically — the same way it appears on `/blog` automatically — because the `ALL_POSTS` array is the single source of truth for both. +- `/sama/v2/verify` continues to report **7 ✓ / 7** on the live site. + +The empirical chain — the load-bearing argument that SAMA is worth following — gets one more thing pointing at it: a sitemap entry per measurement post, indexed and rankable. That's a small thing on its own. But the *way* this small thing lands — `/goal` as contract, layer discipline as structure, verifier as gate — is the same way every other feature on this site lands. The point of the post is the workflow, not the URL list. + +## Next, this post turns into a postmortem + +This is a plan. The companion postmortem will follow once the PR is merged and deployed, with three things the plan can't predict: + +- The actual file diff (likely close to what's sketched here, but not always — agent + verifier sometimes surface a cleaner factoring mid-build). +- The verifier output before and after the merge (must read 7/7 ✓ at both timestamps). +- Whatever the `/goal`'s anti-fudge clauses caught that the plan missed. + +If `/sitemap.xml` lands without surfacing anything, that's its own data point: the workflow is mature enough that the boring features stay boring. If it surfaces something — a spec gap, a verifier blind spot, a registry that wasn't where I thought — that's the *next* post. 
 diff --git a/goal.md b/goal.md new file mode 100644 index 0000000000000000000000000000000000000000..440a9a8d54afea17e651f6b522a2712a1e7f6de5 --- /dev/null +++ b/goal.md @@ -0,0 +1,42 @@ +Here's the /goal for the auto-generated sitemap. ~3,700 chars. + + +Goal: Add an automatically-generated /sitemap.xml so search engines and AI crawlers can index the full site without a hand-maintained URL list. The sitemap is generated on demand from the existing registries (ALL_POSTS, ALL_SAMA, the route table, the guides list wherever it lives), so a new blog post or discipline page lands in the sitemap immediately on deploy with zero human edit. Note: src/a31_blog.ts already declares in its top comment that ALL_POSTS "drives /blog, /blog/:slug, and the sitemap" — this goal makes that comment true. + +Done when: +- A new route /sitemap.xml returns 200 with Content-Type "application/xml; charset=utf-8" and a valid sitemaps.org 0.9 document: + <?xml version="1.0" encoding="UTF-8"?> + <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> + <url><loc>https://tdd.md/...</loc>[<lastmod>YYYY-MM-DD</lastmod>]</url> + ... + </urlset> +- URLs are derived from the registries (no hand-maintained slug list): + * Every entry in ALL_POSTS → /blog/<slug> with <lastmod> = the post's date field. + * Every entry in ALL_SAMA → /sama/<slug>. + * Every guide entry from whichever registry exists (search for ALL_GUIDES, GUIDES, or grep src/d21_app.ts for /guides routes). + * Static load-bearing URLs: /, /blog, /games, /leaderboard, /sama, /sama/v2, /sama/v2/verify, /sama/v2/example-crud, /sama/v2/example-wordpress, /sama/skill, /guides. These can stay as a small const list in the new helper (each one corresponds to a literal route in d21_app.ts). +- All URLs use the absolute base https://tdd.md (use the constant from src/a31_site_config.ts if one exists). +- A new pure Layer 1 helper at src/b32_sitemap.ts takes Array<{ loc: string; lastmod?: string }> → returns the well-formed XML string. No I/O; deterministic output. 
Sibling test covers: empty list → valid urlset with no <url> children; single URL with lastmod; single URL without lastmod; multiple URLs preserve order; XML-escape any & or < in URLs (rare here but the helper must be safe). +- The handler is a single closure registered in src/d21_app.ts (or split into d21_handlers_sitemap.ts if it grows). Imports ALL_POSTS + ALL_SAMA + the static list, calls the helper, returns the Response with Cache-Control "public, max-age=3600". +- /robots.txt updated to include "Sitemap: https://tdd.md/sitemap.xml" at the end. If it doesn't exist yet, create the minimal: "User-agent: *\nAllow: /\nSitemap: https://tdd.md/sitemap.xml". +- The sitemap is NOT committed as a static file — it's generated per-request (or once at process startup). New blog post → next sitemap fetch already includes it without any human edit. This is the load-bearing "automatic" property. +- All 367+ tests still pass; new helper test adds ~6-8 cases. +- /sama/v2/verify still reports 7/7 ✓ (anti-fudge). +- Deployed; live-verify: curl https://tdd.md/sitemap.xml returns 200 + valid XML; the response includes /blog/sama-v2-workingset-cross-repo-baseline (the most recent post); /robots.txt references the sitemap. + +Constraints (anti-fudge): +- URLs MUST come from existing registries — no second source of truth that can drift. +- XML must be well-formed (no string-concat shortcuts that break on special chars). Use a tiny XML-escape helper inside b32_sitemap.ts (the existing renderer's HTML-escape is technically a superset and would work too, but a dedicated XML helper is cleaner Layer-1). +- Don't list dynamic/user-specific URLs (/p/:slug, /sama/verify?repo=..., /api/*) — only stable indexable content. +- Cache-Control: public, max-age=3600. Search engines should re-fetch but not hammer. +- Site language English-only. +- GitHub flow via flatpak-spawn (branch → PR → merge → push p620 → deploy via flatpak-spawn --host scripts/p620/deploy-tdd-md.sh). 
+- Do NOT change any §4 verifier logic. + +Load-bearing files to read FIRST: +- src/a31_blog.ts (the comment at the top confirms ALL_POSTS is meant to drive the sitemap) +- src/a31_sama.ts (ALL_SAMA structure) +- src/d21_app.ts (live route table — confirm which static URLs exist + find a place to register /sitemap.xml + grep for /guides routes) +- src/a31_site_config.ts (canonical base URL constant — use that, don't hard-code "https://tdd.md" in 20 places) +- src/b51_render_layout.ts (the existing escape helper, as reference for the XML-escape function shape) +- public/robots.txt if it exists (check before clobbering) \ No newline at end of file diff --git a/public/sitemap-flow.png b/public/sitemap-flow.png new file mode 100644 index 0000000000000000000000000000000000000000..b27f5f6c3f29af4f403ea9113a089d2e4c10ec29 Binary files /dev/null and b/public/sitemap-flow.png differ diff --git a/public/sitemap-flow.svg b/public/sitemap-flow.svg new file mode 100644 index 0000000000000000000000000000000000000000..642f314be43d0305a987509a37f309b33b2c1337 --- /dev/null +++ b/public/sitemap-flow.svg @@ -0,0 +1,67 @@ + + + + + + data flow — registries → helper → XML response + A new post = a new sitemap entry. Zero human edit. + Per-request generation. The registries are the source of truth; the sitemap is a projection. 
+ + + + + + + a31_blog.ts + ALL_POSTS + { slug, title, description, date }[] + → /blog/<slug> with lastmod = date + + + + a31_sama.ts + ALL_SAMA + { slug: "sorted" | "architecture" | … }[] + → /sama/<slug> + + + + b32_sitemap.ts (const) + STATIC_URLS + ["/", "/blog", "/sama/v2", "/guides", …] + → load-bearing fixed routes + + + + + + + + + + + + + + b32_sitemap.ts · renderSitemap(urls) + pure · deterministic · no I/O · XML-escapes & and < + sibling test — empty list · single url · with/without lastmod · order preserved · special-char escape + + + + + + + + + + GET /sitemap.xml → 200 application/xml; charset=utf-8 · Cache-Control: public, max-age=3600 + <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> + <url><loc>https://tdd.md/blog/sama-v2-workingset-cross-repo-baseline</loc><lastmod>2026-05-27</lastmod></url> … + + + + + No static file. No human edit. The next /sitemap.xml fetch after a deploy already lists every new post. + + diff --git a/public/sitemap-layers.png b/public/sitemap-layers.png new file mode 100644 index 0000000000000000000000000000000000000000..a9179b975f0ed70a1f129295de7d32e8012916bb Binary files /dev/null and b/public/sitemap-layers.png differ diff --git a/public/sitemap-layers.svg b/public/sitemap-layers.svg new file mode 100644 index 0000000000000000000000000000000000000000..a1be800096fba4f2119a65862f89a8f1f19851c3 --- /dev/null +++ b/public/sitemap-layers.svg @@ -0,0 +1,57 @@ + + + + + + /sitemap.xml — feature decomposition + One file per SAMA layer. Layer 2 stays empty. + Three files total — Pure data already exists; Core helper + Entry handler are the only new code. 
+ + + + + + + Layer 3 · Entry + HTTP route — reads registries, calls Core helper, returns Response with XML body and Cache-Control + d21_app.ts → "/sitemap.xml" + + + ↓ uses + + + + + + Layer 2 · Adapter + — not required — registries are already in-process data; no DB, no network, no filesystem read at request time + (empty by design) + + + ↓ uses + + + + + + Layer 1 · Core + pure logic — takes Array<{loc, lastmod?}>, XML-escapes, returns sitemaps.org 0.9 urlset string + b32_sitemap.ts → renderSitemap() + + + ↓ uses + + + + + + Layer 0 · Pure + existing registries — single source of truth for every indexable URL on the site + ALL_POSTS · ALL_SAMA · ALL_GUIDES · BASE_URL + + + + + The Law (§1.2): imports flow downward only. d21 → b32 → a31. No upward edge, no sideways edge — the verifier rejects either. + + diff --git a/src/a31_blog.ts b/src/a31_blog.ts index 5f086fac3ac56a7eee272fb88a437cd7fa1d2773..1fe4f45b16b702562d4d34e2420f83fd8eb6f30f 100644 --- a/src/a31_blog.ts +++ b/src/a31_blog.ts @@ -12,6 +12,12 @@ export interface BlogEntry { } export const ALL_POSTS: BlogEntry[] = [ + { + slug: "sama-v2-sitemap-implementation-plan", + title: "Building /sitemap.xml under SAMA v2 — a Claude Code /goal walkthrough", + description: "This site has 23 blog posts, 4 SAMA pages, a v2 spec, a verifier, and an example library — and zero sitemap. Search engines + AI crawlers walk it by random link-discovery, which is especially bad for the long-tail empirical-chain posts. This is the implementation PLAN, written before the code lands. It works from a structured /goal slash command (the 38-line text checked in at goal.md) that fixes Done-when post-conditions, anti-fudge constraints, and load-bearing files-to-read-first before the agent touches anything. 
Maps the feature onto the four SAMA v2 layers: d21_app.ts gains a /sitemap.xml route (Entry); b32_sitemap.ts becomes a new pure helper that takes Array<{loc, lastmod?}> and returns XML (Core); Layer 2 stays empty by design because the registries are already in-memory data — no DB, no network, no filesystem at request time; ALL_POSTS + ALL_SAMA + ALL_GUIDES + BASE_URL at Layer 0 (already exists). The structural property the picture surfaces: in idiomatic-WordPress this is 4 files (DB adapter, query, renderer, cache). In layer-disciplined shape with in-memory registries, the adapter dissolves and the feature is one new helper + one new route. Two images: SAMA-layer decomposition (Layer 2 visibly empty) and data flow (three registries fan into helper, helper fans out to Response). Walks the seven §4 checks and shows how each is satisfied by the chosen file layout. Anti-fudge clauses called out explicitly: no second URL list that can drift, no string-concat XML (escape & and <), no static-file commit (the automatic property is load-bearing), no dynamic/user-specific URLs, no verifier change. Companion postmortem will land after the PR merges with actual diff + before/after verifier output + whatever the anti-fudge clauses caught that the plan missed. The point of the post is the workflow — /goal as contract, SAMA v2 as discipline, verifier as gate — not the URL list itself.", + date: "2026-05-25", + }, { slug: "sama-v2-workingset-cross-repo-baseline", title: "Was the dive/ripgrep convergence real? 
Seven measured workingSetFit datapoints", diff --git a/src/d21_app.ts b/src/d21_app.ts index b8d83927cefe453543418cbf7d22629de64a9d62..2299723c12c0b73a9edc680c5b47afdbe46daa61 100644 --- a/src/d21_app.ts +++ b/src/d21_app.ts @@ -249,6 +249,18 @@ ${url("https://tdd.md/leaderboard", "0.7")} "/sama-metrics.png": new Response(Bun.file("./public/sama-metrics.png"), { headers: { "Content-Type": "image/png", "Cache-Control": "public, max-age=3600" }, }), + "/sitemap-layers.svg": new Response(Bun.file("./public/sitemap-layers.svg"), { + headers: { "Content-Type": "image/svg+xml", "Cache-Control": "public, max-age=3600" }, + }), + "/sitemap-layers.png": new Response(Bun.file("./public/sitemap-layers.png"), { + headers: { "Content-Type": "image/png", "Cache-Control": "public, max-age=3600" }, + }), + "/sitemap-flow.svg": new Response(Bun.file("./public/sitemap-flow.svg"), { + headers: { "Content-Type": "image/svg+xml", "Cache-Control": "public, max-age=3600" }, + }), + "/sitemap-flow.png": new Response(Bun.file("./public/sitemap-flow.png"), { + headers: { "Content-Type": "image/png", "Cache-Control": "public, max-age=3600" }, + }), "/games": htmlResponse(GAMES_INDEX_HTML),