Engineering process quality arc — AC gates, smoke rules, UX audit automation

Status: Delivered
CAS: CAS-2731
Delivered: 2026-05-14
PRs: #724, #725, #726, #727, #728, #729, #730, #731

What’s new

A full arc of process-quality improvements shipped in a single coordinated batch. Tasks now require acceptance criteria before implementation begins. UI PRs require a UX walk and visual evidence before merge. Every stateful surface requires a two-action smoke test. Four test layers — Storybook Loki, axe a11y, Playwright visual sanity, and Eivind’s device walk — run automatically on every relevant PR. The result: the quality checks that used to depend on someone remembering to do them are now structural requirements.

Components delivered

CAS       What shipped                                           Doc
CAS-2732  Acceptance criteria required on every task             doc
CAS-2733  Two-action smoke gate for stateful surfaces            doc
CAS-2734  UX walk auto-fires on every UI PR                      doc
CAS-2737  Eivind’s UX walk skill + autonomous Paperclip routine  doc
CAS-2740  Storybook Loki + test-runner + axe CI infrastructure   doc
CAS-2741  Storybook page-level stories backfilled                doc
CAS-2742  Playwright visual sanity + axe per route               doc
CAS-2743  Astrid UX advisor on every UI PR                       doc

Why we built it

A post-mortem on the iOS TestFlight arcs (CAS-2460–2587) showed a consistent pattern: bugs that should have been caught at review time were instead caught by the regent on a physical device after a build shipped. The root causes were:

  1. Tasks without acceptance criteria → no agreed success condition → “it works” was self-assessed.
  2. UI PRs without UX review → layout bugs invisible in code review.
  3. No visual regression baseline → pixel-level drift undetected.
  4. No a11y gate → contrast and semantic markup regressions undetected.
  5. No automated stateful-surface smoke → race conditions and broken flows shipped.

CAS-2731 addressed all five root causes in a single coordinated effort. Each child issue is independently useful, but their combined effect is a quality pipeline that catches at review time what used to be caught in production.

What changed under the hood

See each child epic’s feature doc for the technical details. At the process level:

  • Skill layer: casaconomy-task-worktree, casaconomy-review-protocol, casaconomy-planning all updated to enforce the new gates.
  • CI layer: Two new GitHub Actions jobs (playwright-visual-sanity, storybook-loki); axe-playwright and @storybook/test-runner added to devDependencies.
  • Agent layer: Eivind’s Paperclip routine fires every 30 minutes; Astrid’s UX advisory fires on every UI PR via the astrid-ux-advisor skill.
  • Docs layer: docs/ux/ux-ui-guidelines-v0.md, docs/ux-baseline/, docs/testing-strategy.md updated.
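
To make the gate logic concrete, here is a minimal sketch of how the requirements described in this doc could be modelled: every task needs acceptance criteria, UI PRs additionally need a UX walk and visual evidence, and stateful surfaces need a two-action smoke test. The type and function names (Gate, ChangeSummary, requiredGates, missingGates) are hypothetical and are not taken from the casaconomy skills or the CI jobs; the real enforcement lives in the skills and workflows listed above.

```typescript
// Hypothetical model of the process gates; names are illustrative only.
type Gate =
  | "acceptance-criteria"
  | "ux-walk"
  | "visual-evidence"
  | "two-action-smoke";

interface ChangeSummary {
  touchesUi: boolean;              // the PR changes a UI surface
  touchesStatefulSurface: boolean; // the PR changes a stateful surface
  evidence: Gate[];                // gates the task/PR already satisfies
}

// Which gates apply to this change.
function requiredGates(change: ChangeSummary): Gate[] {
  const gates: Gate[] = ["acceptance-criteria"]; // required on every task
  if (change.touchesUi) {
    gates.push("ux-walk", "visual-evidence");
  }
  if (change.touchesStatefulSurface) {
    gates.push("two-action-smoke");
  }
  return gates;
}

// Which required gates still lack evidence.
function missingGates(change: ChangeSummary): Gate[] {
  return requiredGates(change).filter(
    (gate) => change.evidence.indexOf(gate) === -1
  );
}

// Example: a UI-only PR that has acceptance criteria and a UX walk,
// but no visual evidence yet.
const uiPr: ChangeSummary = {
  touchesUi: true,
  touchesStatefulSurface: false,
  evidence: ["acceptance-criteria", "ux-walk"],
};
console.log(missingGates(uiPr)); // [ 'visual-evidence' ]
```

Modelling the gates as data rather than as scattered conditionals keeps the rule "which checks apply to which change" in one place, which is the structural property this arc aims for.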

Known limitations / follow-on work

  • Eivind’s walk is skill-guided but not yet fully scripted. Full Simulator automation is deferred.
  • The playwright-visual-sanity screenshot baseline requires a first clean run with --update-snapshots before drift detection kicks in.
  • Storybook page-level stories cover 7 surfaces; remaining pages are follow-on work (CAS-2741 tracks the backlog).
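
The baseline limitation above can be illustrated with a simplified model of snapshot comparison. This is a sketch under stated assumptions, not Playwright's implementation: screenshots are reduced to hash strings, and compareToBaseline, SnapshotResult, and the updateSnapshots parameter are hypothetical names that only mirror the role of the --update-snapshots flag.

```typescript
// Simplified, hypothetical model of baseline-driven drift detection.
type SnapshotResult = "baseline-written" | "no-baseline" | "match" | "drift";

function compareToBaseline(
  baselines: Record<string, string>, // route -> stored screenshot hash
  route: string,
  currentHash: string,
  updateSnapshots: boolean
): SnapshotResult {
  if (updateSnapshots) {
    // Update mode: accept the current screenshot as the new baseline.
    baselines[route] = currentHash;
    return "baseline-written";
  }
  if (!(route in baselines)) {
    // Nothing to compare against, so drift cannot be detected yet.
    return "no-baseline";
  }
  return baselines[route] === currentHash ? "match" : "drift";
}

const baselines: Record<string, string> = {};
console.log(compareToBaseline(baselines, "/settings", "abc", false)); // no-baseline
console.log(compareToBaseline(baselines, "/settings", "abc", true));  // baseline-written
console.log(compareToBaseline(baselines, "/settings", "abd", false)); // drift
```

Until the update run has stored a baseline for a route, the comparison has no reference image, which is why drift detection only starts after that first clean run.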