Add automatically-generated /sitemap.xml from existing registries

Status: ✓ shipped · Date: 2026-05-25 · PR: #40 · Commit: 3280af8

Related posts: sama-v2-sitemap-implementation-plan


Goal: Add an automatically-generated /sitemap.xml so search engines and AI crawlers can index the full site without a hand-maintained URL list. The sitemap is generated on demand from the existing registries (ALL_POSTS, ALL_SAMA, the route table, the guides list wherever it lives), so a new blog post or discipline page lands in the sitemap immediately on deploy with zero human edits. Note: src/a31_blog.ts already declares in its top comment that ALL_POSTS "drives /blog, /blog/:slug, and the sitemap"; this goal makes that comment true.
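The core idea of the goal, deriving every URL from the registries instead of a hand-kept list, fits in one function. A minimal sketch, assuming each ALL_POSTS entry has a `slug` and a `YYYY-MM-DD` `date`, each ALL_SAMA and guide entry has a `slug`, and guides live under /guides/<slug>. These shapes, the /guides/<slug> route, and the names `collectSitemapEntries`, `STATIC_PATHS`, and `absolute` are hypothetical; check them against src/a31_blog.ts, src/a31_sama.ts, and d21_app.ts.

```typescript
// Hypothetical registry shapes; the real ALL_POSTS / ALL_SAMA may use other field names.
interface PostLike { slug: string; date: string } // date assumed "YYYY-MM-DD"
interface SamaLike { slug: string }
interface GuideLike { slug: string }               // guide registry shape is an assumption

export interface SitemapEntry { loc: string; lastmod?: string }

// The static, load-bearing routes named in the plan (hypothetical constant name).
const STATIC_PATHS: readonly string[] = [
  "/", "/blog", "/games", "/leaderboard", "/sama", "/sama/v2",
  "/sama/v2/verify", "/sama/v2/example-crud", "/sama/v2/example-wordpress",
  "/sama/skill", "/guides",
];

// Joins a base URL (e.g. the constant from a31_site_config.ts) with a path,
// avoiding a doubled or missing slash.
function absolute(base: string, path: string): string {
  return base.replace(/\/+$/, "") + (path.startsWith("/") ? path : "/" + path);
}

// Derives every sitemap entry from the registries passed in. There is no
// hand-maintained slug list, so a new post shows up on the next call.
export function collectSitemapEntries(
  base: string,
  posts: readonly PostLike[],
  sama: readonly SamaLike[],
  guides: readonly GuideLike[],
): SitemapEntry[] {
  return [
    ...STATIC_PATHS.map((p) => ({ loc: absolute(base, p) })),
    ...posts.map((p) => ({ loc: absolute(base, `/blog/${p.slug}`), lastmod: p.date })),
    ...sama.map((s) => ({ loc: absolute(base, `/sama/${s.slug}`) })),
    ...guides.map((g) => ({ loc: absolute(base, `/guides/${g.slug}`) })), // assumed route
  ];
}
```

Passing the registries in as arguments keeps the function pure, so it can be unit-tested with small fixtures while the handler passes the real imports.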

Done when:

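  • A new route /sitemap.xml returns 200 with Content-Type "application/xml; charset=utf-8" and a valid sitemaps.org 0.9 document: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url><loc>https://tdd.md/...</loc><lastmod>[YYYY-MM-DD]</lastmod></url> ...</urlset>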
  • URLs are derived from the registries (no hand-maintained slug list):
    • Every entry in ALL_POSTS → /blog/<slug> with <lastmod> = the post's date field.
    • Every entry in ALL_SAMA → /sama/<slug>.
    • Every guide entry from whichever registry exists (search for ALL_GUIDES, GUIDES, or grep src/d21_app.ts for /guides routes).
    • Static load-bearing URLs: /, /blog, /games, /leaderboard, /sama, /sama/v2, /sama/v2/verify, /sama/v2/example-crud, /sama/v2/example-wordpress, /sama/skill, /guides. These can stay as a small const list in the new helper (each one corresponds to a literal route in d21_app.ts).
  • All URLs use the absolute base https://tdd.md (use the constant from src/a31_site_config.ts if one exists).
  • A new pure Layer 1 helper at src/b32_sitemap.ts takes an Array<{ loc: string; lastmod?: string }> and returns the well-formed XML string. No I/O; deterministic output. The sibling test covers: empty list → valid urlset with no children; single URL with lastmod; single URL without lastmod; multiple URLs preserve order; XML-escaping of any & or < in URLs (rare here, but the helper must be safe).
  • The handler is a single closure registered in src/d21_app.ts (or split into d21_handlers_sitemap.ts if it grows). It imports ALL_POSTS, ALL_SAMA, the guides registry, and the static list, calls the helper, and returns the Response with Cache-Control "public, max-age=3600".
  • /robots.txt is updated to end with "Sitemap: https://tdd.md/sitemap.xml". If it doesn't exist yet, create a minimal one: "User-agent: *\nAllow: /\nSitemap: https://tdd.md/sitemap.xml".
  • The sitemap is NOT committed as a static file — it's generated per-request (or once at process startup). New blog post → next sitemap fetch already includes it without any human edit. This is the load-bearing "automatic" property.
  • All 367+ tests still pass; new helper test adds ~6-8 cases.
  • /sama/v2/verify still reports 7/7 ✓ (anti-fudge).
  • Deployed; live-verify: curl https://tdd.md/sitemap.xml returns 200 + valid XML; the response includes /blog/2026-05/sama-v2-workingset-cross-repo-baseline (the most recent post); /robots.txt references the sitemap.
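The b32_sitemap.ts criterion (pure, deterministic, order-preserving, escape-safe) needs only a few lines. A minimal sketch, assuming the entry type given in the plan; `xmlEscape` and `buildSitemapXml` are hypothetical names, not existing exports.

```typescript
// Sketch of src/b32_sitemap.ts: pure Layer 1, no I/O, deterministic output.
export interface SitemapEntry { loc: string; lastmod?: string }

// Escapes all five XML predefined entities. In element content only & and <
// are strictly required, but escaping all five is cheap and always safe.
export function xmlEscape(s: string): string {
  return s
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Renders a sitemaps.org 0.9 <urlset>, keeping entries in input order.
// An empty input still yields a valid, childless <urlset>.
export function buildSitemapXml(entries: readonly SitemapEntry[]): string {
  const lines: string[] = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  ];
  for (const e of entries) {
    lines.push("  <url>");
    lines.push(`    <loc>${xmlEscape(e.loc)}</loc>`);
    if (e.lastmod !== undefined) {
      lines.push(`    <lastmod>${xmlEscape(e.lastmod)}</lastmod>`);
    }
    lines.push("  </url>");
  }
  lines.push("</urlset>");
  return lines.join("\n") + "\n";
}
```

The 0.9 schema also allows optional <changefreq> and <priority> per URL; they are left out because the plan only asks for <loc> and <lastmod>.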

Constraints (anti-fudge):

  • URLs MUST come from existing registries — no second source of truth that can drift.
  • XML must be well-formed (no string-concat shortcuts that break on special chars). Use a tiny XML-escape helper inside b32_sitemap.ts (the existing renderer's HTML-escape is technically a superset and would work too, but a dedicated XML helper is a cleaner fit for Layer 1).
  • Don't list dynamic/user-specific URLs (/p/:slug, /sama/verify?repo=..., /api/*) — only stable indexable content.
  • Cache-Control: public, max-age=3600. Search engines should re-fetch but not hammer.
  • Site language: English only.
  • GitHub flow via flatpak-spawn (branch → PR → merge → push p620 → deploy via flatpak-spawn --host scripts/p620/deploy-tdd-md.sh).
  • Do NOT change any §4 verifier logic.
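Several of these constraints (registry-only URLs, no dynamic or user-specific paths, the Cache-Control value) meet in the handler closure. Because d21_app.ts's router API isn't shown here, this sketch returns a plain { status, headers, body } object that the real closure would wrap in its Response; `makeSitemapHandler`, `isIndexable`, `EXCLUDED_PREFIXES`, and `PlainResponse` are hypothetical names. The filter is belt-and-braces: the registries shouldn't contain dynamic URLs in the first place.

```typescript
interface SitemapEntry { loc: string; lastmod?: string }
// Hypothetical stand-in for whatever Response type d21_app.ts uses.
interface PlainResponse { status: number; headers: Record<string, string>; body: string }

// Path prefixes that must never be listed: user-specific pages and the API.
const EXCLUDED_PREFIXES: readonly string[] = ["/p/", "/api/"];

// True only for stable, indexable URLs under `base` (given without a trailing slash):
// no query string and no excluded prefix.
function isIndexable(loc: string, base: string): boolean {
  if (loc !== base && !loc.startsWith(base + "/")) return false;
  const path = loc.slice(base.length) || "/";
  if (path.includes("?")) return false; // e.g. /sama/verify?repo=...
  return !EXCLUDED_PREFIXES.some((p) => path.startsWith(p));
}

// Builds the closure. `collect` reads the registries on every call, so nothing
// is cached in-process; `render` would be the b32_sitemap.ts helper.
function makeSitemapHandler(
  base: string,
  collect: () => SitemapEntry[],
  render: (entries: readonly SitemapEntry[]) => string,
): () => PlainResponse {
  return () => {
    const entries = collect().filter((e) => isIndexable(e.loc, base));
    return {
      status: 200,
      headers: {
        "Content-Type": "application/xml; charset=utf-8",
        "Cache-Control": "public, max-age=3600", // re-fetch hourly, don't hammer
      },
      body: render(entries),
    };
  };
}
```

Taking `collect` and `render` as parameters keeps the closure trivially testable and leaves the real registry imports in d21_app.ts.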

Load-bearing files to read FIRST:

  • src/a31_blog.ts (the comment at the top confirms ALL_POSTS is meant to drive the sitemap)
  • src/a31_sama.ts (ALL_SAMA structure)
  • src/d21_app.ts (live route table — confirm which static URLs exist + find a place to register /sitemap.xml + grep for /guides routes)
  • src/a31_site_config.ts (canonical base URL constant — use that, don't hard-code "https://tdd.md" in 20 places)
  • src/b51_render_layout.ts (the existing escape helper, as reference for the XML-escape function shape)
  • public/robots.txt if it exists (check before clobbering)
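The "check before clobbering" note for public/robots.txt comes down to three cases: no file (write the minimal contents from the plan), a file that already has the Sitemap line (leave it untouched), or a file without it (append the line at the end). A minimal pure sketch; `withSitemapLine` is a hypothetical name, and reading and writing the file around it is left out.

```typescript
// Returns the updated robots.txt contents given the current contents,
// or null if the file doesn't exist yet.
function withSitemapLine(existing: string | null, sitemapUrl: string): string {
  const line = `Sitemap: ${sitemapUrl}`;
  if (existing === null) {
    // The minimal file from the plan, with a trailing newline added.
    return `User-agent: *\nAllow: /\n${line}\n`;
  }
  // Idempotent: don't append a second copy if the exact line is already there.
  if (existing.split(/\r?\n/).some((l) => l.trim() === line)) return existing;
  // Append at the end, terminating the previous last line if needed.
  const sep = existing === "" || existing.endsWith("\n") ? "" : "\n";
  return `${existing}${sep}${line}\n`;
}
```

Running it twice gives the same result as running it once, so it is safe to call from a deploy step as well as by hand.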