research-migration — porting podman/syntax CMS into SAMA-native tdd.md

Companion to /var/home/scri/Documents/tdd.md/plan.md. Read that first for the high-level mapping; this goes deep on the points plan.md handwaved. All line references are to files in /var/home/scri/Documents/podman/ and /var/home/scri/Documents/tdd.md/.

What I found that plan.md misses

  1. c32_sama_verify.ts enforces stricter rules than plan.md assumed. Layer-prefix whitelist is {11, 13, 14, 21, 31, 32, 51} (line 188). Plan.md proposes c31_image_resize.ts, but sharp(...) is I/O — per content/sama/architecture.md:13-16 resize belongs in c14, OR c32 with sharp passed via DI. Same for plan.md's c31_ai_edit_block.ts (calls OpenRouter — must split into c14+c32).
  2. The verifier's import scanner only inspects relative ./xxx.ts paths (line 119-120). A bare import sharp from "sharp" in a c31 file is invisible to the gate. The "no I/O in c31" rule is discipline, not enforcement.
  3. Atomic threshold is 700 lines (line 309). Two podman files are over or near it on day one: sx-editor/src/client/render.ts (775 lines, already a violation) and sx-filter/src/shortcodes.ts (650 lines, so roughly one new shortcode handler pushes it past 700). Plan.md doesn't budget these splits.
  4. Placeholder-test detection is part of Atomic (lines 254-298). Every test()/it() body needs ≥1 expect(). Snapshot tests (toMatchSnapshot) pass the check, but I would rule them out as the default.
  5. Modeled is asymmetric (lines 219-248). c32 without sibling test = hard violation; c31 missing sibling = informational only. So c31_sxdoc.ts (types) is fine without a test; c32_sxdoc_parse.ts (logic) is not. Plan.md's c31_sxdoc_parse.ts is the wrong layer — the parser is a deterministic transform, not pure types/registry.
  6. Podman uses subdirectories (sxdoc/, core/, db/, client/). tdd.md's src/ is flat (verified: no subdirs). SAMA's verifier doesn't walk subdirs, so it would not flag files placed there, but the convention bans them: server-side files must flatten into top-level cXX_*.ts. plan.md mentions this only for client/ and only obliquely.
  7. Live-preview cannot be commit-driven. Plan.md picks git-canon (commit on every save), but /admin/preview runs on a ~200ms debounce. The preview path must skip c14_git entirely and render from in-memory sxdoc. Call this out so the handler is shaped correctly from the start.
  8. Ghost-style /blog/{primary_tag}/{slug}/ permalink breaks 9 existing post URLs. Plan.md asks the question but doesn't count. Keep /blog/{slug}/ unless there's a content reason to migrate.
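
The "c32 with sharp passed via DI" option from point 1 could look roughly like the sketch below. The names ResizeFn, planResize and resizeImage are hypothetical (they appear in neither plan.md nor the podman source). The only point is that the logic file never imports sharp itself, so the I/O stays behind a c14 wrapper that the caller injects.

```typescript
// Hypothetical c32-style module: there is no `import sharp` in this file.
// The caller (a c14 or c21 file) injects the function that performs real I/O.
export type ResizeFn = (input: Uint8Array, width: number) => Promise<Uint8Array>;

// Pure decision logic: clamp the requested width into an allowed range.
// The 2000px default is an assumption chosen for illustration.
export function planResize(requested: number, maxWidth = 2000): number {
  if (!Number.isFinite(requested) || requested < 1) return 1;
  return Math.min(Math.floor(requested), maxWidth);
}

export async function resizeImage(
  input: Uint8Array,
  requestedWidth: number,
  resize: ResizeFn, // injected; in production this would wrap sharp inside c14
): Promise<Uint8Array> {
  return resize(input, planResize(requestedWidth));
}
```

A sibling test can pass a fake ResizeFn that just echoes its input, which keeps the Modeled rule satisfiable without touching the filesystem.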

1 — SAMA-verifier compliance

Exact rules (src/c32_sama_verify.ts)

| letter | rule | lines |
|---|---|---|
| S | c1*/c3* must NOT relative-import c5*/c9* (c21 exempt) | 149-185 |
| A | prefix ∈ {11,13,14,21,31,32,51} | 188 |
| M | c32_* needs sibling .test.ts (hard); c31_* missing = info only | 219-248 |
| A | cXX_*.ts ≤ 700 lines; every test() body needs ≥1 expect() | 254-326 |

Verifier walks only cXX_*.ts files; everything else under src/ is ignored. Client-bundle source under src/client/**.ts is therefore out of scope — fine.
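
To make two of these rules concrete, here is a simplified sketch of a prefix/size check and of a relative-only import scan. It is not the code in src/c32_sama_verify.ts; checkFile and relativeImports are hypothetical names, and the regexes are assumptions written to match the behaviour described above.

```typescript
// Simplified illustration only; NOT the real verifier implementation.
const ALLOWED_PREFIXES = new Set(["11", "13", "14", "21", "31", "32", "51"]);
const MAX_LINES = 700;

// Applies the prefix whitelist and the line limit to one file.
export function checkFile(fileName: string, source: string): string[] {
  const m = /^c(\d{2})_.+\.ts$/.exec(fileName);
  if (!m) return []; // not a cXX_*.ts file: out of scope, like src/client/**
  const violations: string[] = [];
  const prefix = m[1] ?? "";
  if (!ALLOWED_PREFIXES.has(prefix)) {
    violations.push(`${fileName}: prefix c${prefix} is not in the whitelist`);
  }
  const lines = source.split("\n").length;
  if (lines > MAX_LINES) {
    violations.push(`${fileName}: ${lines} lines exceeds ${MAX_LINES}`);
  }
  return violations;
}

// Collects only relative ./xxx.ts imports. A bare specifier such as "sharp"
// never matches, which is why bare I/O imports are invisible to such a scan.
export function relativeImports(source: string): string[] {
  const re = /from\s+["'](\.\/[^"']+\.ts)["']/g;
  return [...source.matchAll(re)].map((m) => m[1] ?? "");
}
```

Running relativeImports over a file that imports both "sharp" and "./c14_git.ts" returns only the second entry, which illustrates why "no I/O in c31" is discipline rather than enforcement.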

Subdirectories

Server code in podman is split across sx-editor/src/{sxdoc,core,db}/ and sx-content/src/{sxdoc,core,db}/. tdd.md is flat: ls src/ returns only cXX_*.ts + .test.ts siblings. SAMA prefix replaces folder semantics. All server-side podman files flatten:

  • sxdoc/types.ts → c31_sxdoc.ts
  • sxdoc/html-to-sx.ts → c32_sxdoc_parse.ts (+ .test.ts)
  • sxdoc/sx-to-html.ts → c32_sxdoc_render.ts (+ .test.ts)
  • sxdoc/db.ts → c14_sxdoc_sidecar.ts (Option A) or c14_sxdoc_store.ts (Option B)
  • core/schema.ts + db/sqlite.ts → merge into existing c13_database.ts
  • core/posts.ts (editor & content) → one c13_posts.ts
  • core/settings.ts → extend c31_site_config.ts
  • sxdoc/index.ts (barrel) → DELETE (SAMA bans barrel re-exports per content/sama/atomic.md)

Client-side placement

tdd.md has no precedent for client TS today: public/ holds og.svg, style.css, sama-cli (binary). e2e/ holds Playwright specs. Options for the block-editor client:

  • A. src/client/**.ts — outside verifier glob, relative imports to ../c31_sxdoc.ts work, Bun.build bundles from here. Recommended.
  • B. client/ at repo root — separates browser more clearly; new top-level dir.
  • C. public/src/**.ts — confusing; public/ is "served verbatim".

client/render.ts (775 lines) must split before landing. Natural axis: one file per block-kind (matches the existing blocks/*.ts breakdown) + a small client/render-dispatch.ts switch on block.t.
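
A minimal sketch of what the render-dispatch switch might look like after the split. The three block shapes are limited to ones this document mentions ({t:"hr"}, {t:"code", lang, src}, {t:"html", src}); the function names are hypothetical, and returning HTML strings is an assumption made for brevity that may not match how client/render.ts actually produces output.

```typescript
// Subset of block shapes named in this document; the real union is larger.
type Block =
  | { t: "hr" }
  | { t: "code"; lang: string; src: string }
  | { t: "html"; src: string };

const escapeHtml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

// After the split, each of these would live in its own per-block-kind file.
const renderHr = (): string => "<hr>";
const renderCode = (b: { lang: string; src: string }): string =>
  `<pre><code class="language-${escapeHtml(b.lang)}">${escapeHtml(b.src)}</code></pre>`;
const renderHtml = (b: { src: string }): string => b.src; // escape hatch: emitted untouched

// The dispatch file stays small: one switch on block.t and nothing else.
export function renderBlock(block: Block): string {
  switch (block.t) {
    case "hr":
      return renderHr();
    case "code":
      return renderCode(block);
    case "html":
      return renderHtml(block);
  }
}
```

Because the switch is exhaustive over the union, adding a block kind without a matching case becomes a compile error rather than a silent gap.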

Test convention

tdd.md tests live as siblings under src/: c31_commits.test.ts, c31_diff_parse.test.ts, c31_edit_validation.test.ts, c31_git_parse.test.ts, c31_commit_meta.test.ts, c31_games.test.ts, c32_anchor_extract.test.ts, c32_edit_resolve.test.ts, c32_sama_verify.test.ts.

Podman's sx-editor/tests/unit.test.ts and sx-content/tests/setup.ts are incompatible — verifier looks for <file>.test.ts next to <file>.ts. Every kept test becomes a sibling file.

E2E remains in e2e/*.spec.ts (Playwright, ignored by verifier).


2 — Storage-model conflict

SxDocument shape (sx-editor/src/sxdoc/types.ts)

{ v: 1, blocks: SxBlock[] }. Single-letter keys (t, c) for compactness (lines 1-12). 19 block kinds: 12 base kinds (p, h, ul, ol, li, quote, code, img, hr, html, shortcode, embed) plus 7 typed marketing blocks (hero, feature-card, feature-grid, stats-row, steps-grid, use-case-card, cta-band). Inline marks b/i/u/s/c; links are inline.

No footnotes, no tables — tables fall through to {t:"html"} escape hatch.

SQLite tables (sx-editor/src/core/schema.ts)

Seven Ghost-shaped tables: posts, tags, users, posts_tags, posts_authors, api_keys, settings. Plus sx_documents (one row per post, holds the typed-block JSON): (post_id PK, doc TEXT, doc_version INT, hash TEXT, updated_at TEXT).

Option A (git-canon, default) write flow

POST /admin/edit/blog/foo:

  1. validate + parse form → (markdown_body, sxdoc_json)
  2. c14_git.commitFile({ paths: [ {path:"content/blog/foo.md", content:markdown_body}, {path:"content/blog/foo.sxdoc.json", content:sxdoc_json} ]}). Needs a new commitFiles (multi-path) variant.
  3. mirror to live FS so the next render reflects it.
  4. show "applied · sha XXXXXXX".

Commit message: piggy-back the existing helper buildCommitMessage from c31_commit_meta.ts (already used by c21_handlers_edit.ts:96). Message format stays as today: Edit: <title> by <author> via /admin\n\n<filePath>.

c14_git.commitFile (lines 192-250) is single-path. Extending to multi-path is ~30 added lines — same 5-step flow, with step 3 (read-tree + update-index) looping over paths.

Sidecar regen. Because markdown is canonical and sxdoc is derivable, treat sidecars as cache. If sidecar missing or older than .md, regenerate via marked.parse(md) → htmlToSx(html). Makes the "drop SQLite index, replay git log" rebuild story plan.md mentions actually trivial.
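
The cache rule above can be sketched as a small pure check plus a loader with the regeneration step injected. All names here (sidecarIsStale, loadSxdocJson, FileInfo) are hypothetical, and comparing modification times is only one reading of "older than .md".

```typescript
// Hypothetical helpers; the mtime comparison is an assumption about how
// "older than .md" would be decided.
export type FileInfo = { mtimeMs: number; text: string };

// A sidecar is stale when it is missing or was written before the markdown.
export function sidecarIsStale(mdMtimeMs: number, sidecarMtimeMs: number | null): boolean {
  if (sidecarMtimeMs === null) return true;
  return sidecarMtimeMs < mdMtimeMs;
}

// Returns the sidecar JSON, regenerating it from markdown when stale.
// `regenerate` stands in for marked.parse(md) → htmlToSx(html) → JSON.stringify.
export async function loadSxdocJson(
  md: FileInfo,
  sidecar: FileInfo | null,
  regenerate: (markdown: string) => Promise<string>,
): Promise<string> {
  if (sidecar !== null && !sidecarIsStale(md.mtimeMs, sidecar.mtimeMs)) {
    return sidecar.text;
  }
  return regenerate(md.text);
}
```

One caveat on the design: git does not preserve modification times on checkout, so mtimes are a weak freshness signal after a fresh clone. Storing a content hash of the .md alongside the sidecar would be a more robust variant of the same idea.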

Real-content survey (assessed by full-read of 3 files + grep)

| file | code fences | tables | embedded HTML | frontmatter |
|---|---|---|---|---|
| content/home.md (3.2 KB) | 0 | 1 (5 rows) | 0 | no |
| content/blog/sama-meets-git-cms.md | 4 | 0 | 0 | no |
| content/blog/three-constraints-agentic-coding.md | 7 | 0 | 0 | no |
| content/sama/architecture.md | 1 | 1 (4×4) | 0 | no |
| content/sama/skill.md | many | many | 0 | YES |
| (other 13 .md) | many | mixed | 0 | no |

Confirmed by grep: only content/sama/skill.md has YAML frontmatter (---\nname: …\n---). Other matches for ^--- are markdown horizontal rules (<hr>) inside the body of sama/*.md and a few blog posts — not frontmatter. The migration script must distinguish: frontmatter = ^---\n[a-zA-Z_]+: at byte 0.
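
A minimal sketch of that distinction, using the pattern stated above. hasFrontmatter is a hypothetical name; the regex itself is taken from the text.

```typescript
// Frontmatter only counts when the file starts with --- followed by a key.
// No `m` flag is used, so ^ anchors to the very start of the string.
const FRONTMATTER_START = /^---\n[a-zA-Z_]+:/;

export function hasFrontmatter(raw: string): boolean {
  return FRONTMATTER_START.test(raw);
}

// skill.md-style file: frontmatter at byte 0.
const withFm = "---\nname: sama\n---\n# Title\n";
// Horizontal rule in the middle of a body: not frontmatter.
const midRule = "# Title\n\nintro\n\n---\n\nmore text\n";
// Rule on the first line with no key after it: still not frontmatter.
const leadingRule = "---\n\nbody starts here\n";

console.log(hasFrontmatter(withFm), hasFrontmatter(midRule), hasFrontmatter(leadingRule));
```

The pattern as written assumes LF line endings and no byte-order mark; a file saved with CRLF would need the \n relaxed to \r?\n.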

What htmlToSx handles vs doesn't (sx-editor/src/sxdoc/html-to-sx.ts)

Block-level handled: p, h1..h6, ul/ol/li, blockquote, pre/code (with language-X detection), img, figure, hr. Container divs (div, section, article) recurse into children. Everything else → {t:"html", src: el.outerHTML} escape hatch (line 183).

Inline handled: <a>, <br>, <strong>/<b>, <em>/<i>, <u>, <s>/<strike>/<del>, <code>. <span>/<font> strip wrapper, keep content.

Implication for our content:

  • Tables → single html block per table. Renders identically but un-editable as discrete blocks. Acceptable.
  • HR (---) → {t:"hr"}. Good.
  • Code fences → {t:"code", lang:"sh", src:"..."}. Good.
  • Quote-blocks (> … markdown) → <blockquote> HTML → {t:"quote", c:[…]}. Good.
  • Frontmatter (skill.md only) — marked doesn't strip it by default in tdd.md's current c51_render_layout.ts:8 call. Pre-check what the live site does today before migrating.
  • Round-trip drift exists: mark order is normalised (sx-to-html.ts:227), <b> collapses to <strong>, whitespace shifts. Acceptable for the migration since markdown stays authoritative.

Option B (SQLite-canon) trade

git as audit-trail disappears. Compensation table: content_history (id, slug, type, doc, html, edited_at, edited_by, msg) — append-only. Has rollback but no cryptographic immutability, no git blame, no PR diffs, no mirror story.

The content/blog/sama-meets-git-cms.md post (149 lines) is the product pitch for "every save = a real commit". B contradicts published copy. Recommend A. Mechanical concerns (multi-path commit, sidecar regen) are small; the "stop saying SAMA meets git" cost is large.


3 — Handlebars-theme port

Helpers used (exhaustive, source: sx-content/src/render.ts)

Handlebars.registerHelper calls at lines 78, 86, 93, 109, 129, 135, 158, 166, 184, 202, 205, 210, 390, 393:

| helper | line | use | TS-port effort |
|---|---|---|---|
| asset | 78 | {{asset "css/syntax.css"}} → /assets/... | trivial |
| img_url | 86 | pass-through today (no transforms) | trivial |
| post_class | 93 | join class strings from featured/tags | trivial |
| ghost_head | 109 | 5-10 meta/og tags + codeinjection | medium; existing c51_render_layout.ts already emits a similar block |
| ghost_foot | 129 | code injection footer | trivial |
| date | 135 | dual-shape formatter (YYYY/MMM/DD) | small |
| content | 158 | emit body html raw | trivial |
| excerpt | 166 | strip HTML + truncate N words | small |
| foreach | 184 | iteration with @index/@first/@last/@even/@odd | medium; TS map gets index; rest unused in current .hbs files (confirmed by grep) |
| tag, author, page, post (block) | 202/205/390/393 | scope-dive | structural; replaced by TS functions that take the scoped object as arg |
| reading_time | 210 | "N min read" | trivial |

Built-in ({{#if}}, {{else}}, {{!-- comment --}}, {{!< layout}}) are template-language features that go away once we render via TS functions; no port needed.

Mismatches: {{#foreach}}'s @first/@last/@even/@odd is the only data plumbing TS map doesn't give for free. Grep of .hbs files confirms none of those data-frame fields are referenced in current templates. Safe to drop in the TS port.
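
If a future template did need that plumbing, it is cheap to rebuild on top of map. The sketch below uses a hypothetical mapWithFrame helper and covers index/first/last only; even/odd are left out on purpose because the source does not state which parity convention the podman helper uses.

```typescript
// Hypothetical helper: Array.prototype.map plus the first/last flags that
// {{#foreach}} exposes as @first/@last.
export type Frame = { index: number; first: boolean; last: boolean };

export function mapWithFrame<T, R>(
  items: readonly T[],
  fn: (item: T, frame: Frame) => R,
): R[] {
  return items.map((item, index) =>
    fn(item, { index, first: index === 0, last: index === items.length - 1 }),
  );
}

// Example: comma-separate tags, closing the list with a period.
const tagLine = mapWithFrame(["a", "b", "c"], (tag, f) => `${f.index}:${tag}${f.last ? "." : ","}`).join("");
console.log(tagLine);
```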

Templates inventory (sx-themes/syntax/)

default.hbs (14 lines — wrapper), index.hbs (757 lines — marketing homepage HTML inline; partial syntax-home.hbs no longer used per sx-editor/src/index.ts:284-289), post.hbs (37), page.hbs (40), tag.hbs (24), author.hbs (24).

assets/css/syntax.css is 812 lines. tdd.md's public/style.css is ~25 KB. Combining is a real CSS pass; classes like .hero-content, .feature-card, .use-case-card, .gradient-text don't exist in tdd.md today.

TS-native equivalents land in

  • c51_render_theme.ts: renderPost(post), renderPage(page), renderTagArchive(tag, posts), renderAuthorArchive(author, posts), renderHomepage(). Each replaces one .hbs file.
  • c51_render_meta.ts (or extend existing c51_render_layout.ts) — ghost_head-equivalent. tdd.md already emits OG/meta in c51_render_layout.ts:49+; combine, don't reimplement.
  • The five small string helpers (asset, date, excerpt, reading_time, post_class) live inline in c51_render_theme.ts as private functions. No external file warranted.
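
As an illustration of how small these helpers are, here is a sketch of two of them. The behaviour follows the helper table ("strip HTML + truncate N words", "N min read"); the 200 words-per-minute rate and the one-minute floor are assumptions, not values read from sx-content/src/render.ts. They are exported here only so they can be exercised in isolation.

```typescript
// Strip tags with a simple regex and collapse whitespace. Good enough for
// already-rendered post HTML; not a general-purpose HTML sanitiser.
function plainText(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// excerpt: strip HTML, then keep the first `words` words.
export function excerpt(html: string, words: number): string {
  const text = plainText(html);
  if (text === "") return "";
  return text.split(" ").slice(0, words).join(" ");
}

// reading_time: "N min read". ASSUMPTION: 200 words per minute, minimum 1.
export function readingTime(html: string, wordsPerMinute = 200): string {
  const text = plainText(html);
  const count = text === "" ? 0 : text.split(" ").length;
  const minutes = Math.max(1, Math.ceil(count / wordsPerMinute));
  return `${minutes} min read`;
}
```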

4 — Shortcode-engine port

What sx-filter/src/shortcodes.ts (650 lines) does

BUILT_IN registry at line 546-563. Three categories:

  • Pure (no I/O): ping, now, spec-version, event-validate, catalog-sample, query-demo, catalog-lookup (reads in-process DEMO_CATALOG), emit+demo-flow (writes in-process events.ts ring buffer).
  • HTTP-fetching (external API): github-repo, npm, crate, gist.
  • Ghost-API-fetching: event-count, posts-list (Ghost content API), login-page (Ghost _login-skin page).

Module-level SHARED_EVENT_LOG (line 14) + DEMO_CATALOG (lines 26-107) push the file to 650 lines, leaving 50 lines of headroom under the 700-line Atomic limit. One more sizeable handler tips it over.

SAMA placement

The handlers split by layer:

  • c32: pure regex match, format, validate. query-demo, event-validate, catalog-sample, catalog-lookup, event-count parser (just an int).
  • c14: HTTP wrappers for external APIs. c14_github.ts already exists. New: c14_npm.ts, c14_crates.ts, c14_gist.ts — or one combined c14_package_registries.ts (recommended for fewer files).
  • c13: queries against posts/sx_documents for posts-list etc., extending c13_database.ts.
  • c32_event_log.ts: pure in-memory ring buffer; required only if emit/demo-flow ship.

Where the substitute loop lives

sx-filter/src/index.ts:81-120 does the rewrite:

  1. parse upstream HTML (already-rendered page),
  2. build skip-regions (<meta>, <link>, <script>),
  3. for each SHORTCODE_RE match, call handler, splice output.

This is render-time HTML rewriting, runs after sxdoc → HTML. It's a c51 concern wrapping c14/c32 handlers. Cannot live in c11_server.ts — c11 forbids route logic / HTML rewriting per content/sama/architecture.md:12.

Recommended shape:

  • c32_shortcode_parse.ts (+test) — extract {name, args, range} tokens from text. Pure regex; same pattern as today.
  • Handler functions at their natural layer.
  • c51_render_post.ts calls the parser, dispatches handlers inline (~10 lines for a switch). No central registry; each handler is just a function imported where needed.
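
The recommended shape could be sketched as follows. Loud caveat: the [[name args]] syntax and both regexes are invented for this example, since the real SHORTCODE_RE lives in sx-filter and is not reproduced in this document. parseShortcodes, substitute and Token are hypothetical names, and the handlers are synchronous here although the HTTP-backed ones would need an async variant.

```typescript
export type Token = { name: string; args: string; start: number; end: number };

// ASSUMPTION: made-up shortcode syntax [[name args]], for illustration only.
const SHORTCODE_RE = /\[\[([a-z][a-z0-9-]*)(?:\s+([^\]]*))?\]\]/g;
// Skip regions from the text: <meta>, <link>, <script>.
const SKIP_RE = /<script\b[\s\S]*?<\/script>|<meta\b[^>]*>|<link\b[^>]*>/gi;

// c32-style: pure extraction of {name, args, range} tokens outside skip regions.
export function parseShortcodes(html: string): Token[] {
  const skips = [...html.matchAll(SKIP_RE)].map((m) => {
    const s = m.index ?? 0;
    return [s, s + m[0].length] as const;
  });
  const tokens: Token[] = [];
  for (const m of html.matchAll(SHORTCODE_RE)) {
    const start = m.index ?? 0;
    const end = start + m[0].length;
    if (skips.some(([s, e]) => start >= s && end <= e)) continue;
    tokens.push({ name: m[1] ?? "", args: (m[2] ?? "").trim(), start, end });
  }
  return tokens;
}

// c51-style: splice handler output over each token. `dispatch` returns null
// for unknown names, in which case the source text is left untouched.
export function substitute(html: string, dispatch: (t: Token) => string | null): string {
  let out = "";
  let cursor = 0;
  for (const t of parseShortcodes(html)) {
    const replacement = dispatch(t);
    if (replacement === null) continue;
    out += html.slice(cursor, t.start) + replacement;
    cursor = t.end;
  }
  return out + html.slice(cursor);
}

// Inline dispatch as a plain switch, with no central registry.
const page = '<script>var a = "[[ping]]";</script><p>[[ping]] and [[now utc]] and [[nope]]</p>';
const rendered = substitute(page, (t) => {
  switch (t.name) {
    case "ping": return "pong";
    case "now": return `now(${t.args})`;
    default: return null;
  }
});
console.log(rendered);
```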

Single-process advantage

Podman's filter is a separate Bun service proxying Ghost. tdd.md is one process — substitute is a function call, not an HTTP hop. The ~100 lines of sx-filter/src/index.ts doing upstream-proxy wiring are deleted; the ~30 lines of skip-region + substitute logic move into c51.


5 — File inventory (server-side, podman → tdd.md)

sx-editor/src/

  • ai.ts (317) → c14_openrouter.ts + c32_ai_edit_block.ts — HTTP client (c14), prompt assembly + JSON validation (c32). plan.md's c31_ai_edit_block.ts is wrong layer.
  • build.ts (61) → c14_client_bundle.ts — calls Bun.build, I/O.
  • db.ts (124) → split: SQL into c13_posts.ts; htmlToSx fallback into the handler. plan.md's c14_sxdoc_store.ts is a different file (sx-doc only); db.ts is core posts.
  • index.ts (437) → dispatcher entries in c21_app.ts; per-route handler bodies in c21_handlers_admin_{list,edit,new,upload,ai,preview}.ts. 4-6 files of 80-150 lines.
  • routes.ts (44) → merge into existing c31_site_config.ts.
  • templates.ts (482) → c51_render_admin.ts. Under the 700-line Atomic limit with about 218 lines of headroom; watch for growth.
  • upload.ts (87) → c14_media.ts.
  • sxdoc/types.ts (240) → c31_sxdoc.ts. Types only; no sibling test (informational only).
  • sxdoc/html-to-sx.ts (315) → c32_sxdoc_parse.ts (+ .test.ts).
  • sxdoc/sx-to-html.ts (266) → c32_sxdoc_render.ts (+ .test.ts).
  • sxdoc/db.ts (64) → c14_sxdoc_sidecar.ts (Option A) or c14_sxdoc_store.ts (Option B). Same shape, different backend.
  • sxdoc/index.ts (14) → DELETE (barrel, SAMA-forbidden).
  • core/posts.ts (148) → merge into c13_posts.ts with content's.
  • core/schema.ts (103) → merge into c13_database.ts.
  • db/sqlite.ts (41) → merge into c13_database.ts.
  • scripts/backfill-sxdoc.ts → scripts/migrate_content_to_sxdoc.ts.
  • scripts/import-homepage.ts → discard.

sx-content/src/

  • db.ts (11) → merge into c13_database.ts. Trivial.
  • images.ts (125) → c14_media.ts (combined with upload.ts). sharp is I/O — c14, not c31 as plan.md proposed.
  • index.ts (536) → c21_handlers_content.ts (+ optional c21_handlers_ghost_api.ts — see open question 7).
  • posts.ts (140) → merge into single c13_posts.ts.
  • render.ts (398) → c51_render_theme.ts. Drops the Handlebars dep.
  • routes.ts (199) → split: URL patterns into c31_site_config.ts, classifyUrl logic into c32_url_classify.ts (+ test).
  • sitemap.ts (134) → c51_render_sitemap.ts + c21_handlers_sitemap.ts.
  • sxdoc/* (913 total) → duplicates of editor's; single source of truth in tdd.md, both reads and writes use the same c31/c32/c14 triplet.
  • core/posts.ts (254), core/schema.ts (101), core/settings.ts (118), db/sqlite.ts (43) → merge as listed for editor's equivalents; core/settings.ts extends c31_site_config.ts.

sx-filter/src/

  • admin.ts (114) → DELETE. tdd.md has real auth; no injection.
  • events.ts (211) → c32_event_log.ts if event-demo shortcodes ship. Otherwise DELETE.
  • index.ts (379) → discard proxy logic; substitute loop moves to c51 (described in §4).
  • login-page-skin.html (174), login-page-template.ts (205) → DELETE (syntax.ai demo asset).
  • shortcodes.ts (650) → c32_shortcode_parse.ts + handler files at natural layers + dispatch inline in c51. Demo shortcodes (event-* / catalog-* / login-page) are open question 4.

Non-clean mappings flagged

  • Two big dispatcher files (editor index.ts 437, content index.ts 536) must split: dispatcher entries go into c21_app.ts, handler bodies into per-domain c21_handlers_*.ts.
  • sxdoc/ duplicated between editor and content services — keep one copy in tdd.md.
  • core/schema.ts, db/sqlite.ts duplicated — one copy.
  • marked already in tdd.md deps (c51_render_layout.ts:8). The migration uses it; after cutover see open question 12.

Client (sx-editor/src/client/**)

Lands at src/client/** (outside verifier glob). Sizes preserved. Key file: render.ts (775 — must split before landing). Natural split per-block-kind matches the existing blocks/* and blocks/typed/* breakdown.

Open: slashmenu.ts (590) vs slashmenu-v2.ts (216) — figure out which is canonical before porting.


6 — Content migration mechanics

Algorithm

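// scripts/migrate_content_to_sxdoc.ts
import { Glob } from "bun";
import { marked } from "marked";
// Import paths assume the flattened layout proposed in section 1.
import { htmlToSx } from "../src/c32_sxdoc_parse.ts";
import type { SxDocument } from "../src/c31_sxdoc.ts";

// Frontmatter only counts at byte 0 (skill.md only); a mid-document --- is an <hr>.
function splitFrontmatter(raw: string): { fm: string | null; body: string } {
  const m = /^---\n([a-zA-Z_]+:[\s\S]*?)\n---\n?/.exec(raw);
  return m ? { fm: m[1], body: raw.slice(m[0].length) } : { fm: null, body: raw };
}

for await (const file of new Glob("content/**/*.md").scan(".")) {
  if (file.startsWith("content/games/"))       continue;
  if (file.startsWith("content/git-history/")) continue;
  const raw = await Bun.file(file).text();
  const { body } = splitFrontmatter(raw); // fm is dropped here; see Edge cases
  const html = await marked.parse(body, { gfm: true, breaks: false });
  let doc: SxDocument;
  try { doc = htmlToSx(html); }
  catch {
    // Fallback: single html-block holding the markdown-rendered HTML.
    doc = { v: 1, blocks: [{ t: "html", src: html }] };
  }
  const sxdocPath = file.replace(/\.md$/, ".sxdoc.json");
  await Bun.write(sxdocPath, JSON.stringify(doc, null, 2));
}
// One batched commit, run from the shell afterwards. The :(glob) pathspec
// lets git expand ** itself, so top-level files like content/home.sxdoc.json match:
//   git add ':(glob)content/**/*.sxdoc.json'
//   git commit -m "Migrate content to sxdoc sidecars (one-time)"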

Edge cases

  • Tables → single {t:"html"} block per table. Renders identically; un-editable as discrete blocks in the block editor. Acceptable.
  • Frontmatter (skill.md) → strip first, parse body. Decide separately what happens to the name:/description: fields: today they probably render as visible text via marked. Pre-check live site behaviour before migrating.
  • HR (--- mid-document) is NOT frontmatter. Frontmatter pattern: /^---\n[a-zA-Z_]+:/ at byte 0.
  • Parse fail → escape hatch as shown. Page still renders (sx-to-html.ts:60-62 emits raw HTML untouched). Editor surfaces "open /edit-raw/... for this section".
  • Code fences all currently sh / ts / text — parsed by parseLangFromClass (line 296-298) into {t:"code", lang, src}. No issue.
  • Round-trip drift: <b> collapses to <strong>, mark order normalised. Acceptable since .md stays authoritative.

Commit strategy: single batch

18 files → one "Migrate content to sxdoc sidecars (one-time)" commit. Per-file commits add noise without informational value. Future re-migration after parser improvements stays a single revertable commit.

Games confirmed out of scope

content/games/{fizzbuzz,string-calc}/ are multi-file units: spec.md + spec.ts + hidden/. Read by c31_games.ts directly, not by the CMS; the companion .ts and hidden/ directory make the post/page abstraction wrong. Keep games entirely outside the CMS. No /edit/games/... route should exist — edit via vim+git like source code.

git-history out of scope

content/git-history/syntaxai__tdd.md{,.tests}.json (160 KB total) are generated artifacts read by c32_real_reports.ts / c32_real_tests.ts. Not content.


Open decision points for the human

  1. Storage canon — A (git-canon) or B (SQLite-canon)? plan.md defaults A; my read supports A (the existing content/blog/sama-meets-git-cms.md is the product pitch and contradicts B). Confirm A, or pick B and accept rewriting that post + memory update.

  2. sxdoc parser layer — c31 or c32? plan.md says c31; I argue c32 (deterministic transform with logic, not pure types/registry). Affects file name and whether sibling tests are mandatory (c32 yes, c31 informational).

  3. Single-commit vs two-commit per editor save (Option A). Either extend c14_git.commitFile to multi-path (recommended, ~30 LOC) OR write .md and .sxdoc.json as two commits (simpler, doubled log noise, atomicity hole if step 2 fails).

  4. Ship the syntax.ai event-demo shortcodes? emit, catalog-lookup, demo-flow, login-page, event-validate, catalog-sample, query-demo, event-count, posts-list. These exist for syntax.ai's product story; tdd.md is a different product. Default: off. Saves ~500 LOC (skip events.ts port + 5 handler files + the DEMO_CATALOG constant).

  5. Ghost-style permalink /blog/{primary_tag}/{slug}/ vs current /blog/{slug}/? Switching costs 9 redirects in c21_app.ts and breaks external links. Recommend keep current.

  6. Typed marketing blocks (hero, feature-card, feature-grid, stats-row, steps-grid, use-case-card, cta-band) — port? tdd.md's home.md is text + 1 table + 1 list — none would apply unless we redesign the homepage. Default: skip. Saves 600 LOC across c31_sxdoc.ts (80 lines smaller) + c32_sxdoc_render.ts (typed renderers) + client/blocks/typed/*.ts (7 files).

  7. Ghost Content API compatibility surface (/ghost/api/content/{posts,pages}/...) — keep? sx-content/src/index.ts:78-115. No consumers today. Default: drop. Saves ~150 LOC.

  8. Client-side TS placement — src/client/, client/, or public/src/? Recommend src/client/. Affects bundler paths and Playwright fixture wiring.

  9. client/render.ts (775) split shape. Per-block-kind (render-p.ts, render-h.ts, …, 12 small files) or by sub-system (render-blocks.ts, render-marks.ts, render-typed.ts, 3 medium files). Affects readability vs file count.

  10. c32 parser tests — snapshot vs explicit-assertion? Snapshot (toMatchSnapshot) qualifies under the placeholder-test check, but explicit asserts are more readable. Decide before writing.

  11. OPENROUTER_API_KEY in prod (plan.md open Q3). Still open. AI ✨ returns 503 with hint when unset (sx-editor/src/index.ts:367-369). Acceptable to ship without the key in prod.

  12. Keep marked post-migration? marked is used during migration (md → html before sxdoc parse) and currently at runtime by c51_render_layout.ts:8. After cutover, sxdoc → HTML is the new render path. Decide: keep marked as a runtime dep for legacy paths, or vendor a tiny md-to-blocks shim inside the migration script and drop marked entirely.

  13. /admin/preview rendering path. Plan.md doesn't address that preview cannot go through c14_git.commitFile (debounce too tight). Handler must take in-memory sxdoc and call c32_sxdoc_render → c51_render_theme directly. Shape the handler accordingly from the start; don't refactor later.
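
For item 13, the core of a commit-free preview path might look like the sketch below. Everything here is hypothetical: renderPreview, isSxDocument and the two injected function types stand in for c32_sxdoc_render and c51_render_theme, whose real signatures are not given in this document. The structural point is that nothing in this path touches c14_git.

```typescript
// Minimal document shape; the real SxBlock union is richer.
type SxDocument = { v: 1; blocks: { t: string }[] };
type RenderDoc = (doc: SxDocument) => string;   // stands in for c32_sxdoc_render
type WrapTheme = (bodyHtml: string) => string;  // stands in for c51_render_theme

// Cheap structural validation of the posted JSON before rendering.
export function isSxDocument(x: unknown): x is SxDocument {
  if (typeof x !== "object" || x === null) return false;
  const d = x as { v?: unknown; blocks?: unknown };
  return (
    d.v === 1 &&
    Array.isArray(d.blocks) &&
    d.blocks.every(
      (b) => typeof b === "object" && b !== null && typeof (b as { t?: unknown }).t === "string",
    )
  );
}

// Pure core of the preview handler: in-memory sxdoc in, themed HTML out.
// No commit and no filesystem mirror, so it is safe on a ~200ms debounce.
export function renderPreview(
  input: unknown,
  renderDoc: RenderDoc,
  wrapTheme: WrapTheme,
): { status: number; body: string } {
  if (!isSxDocument(input)) return { status: 400, body: "invalid sxdoc" };
  return { status: 200, body: wrapTheme(renderDoc(input)) };
}
```

The c21 handler would then only parse the request body, call renderPreview with the real renderer and theme wrapper, and turn the result into a response.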