Engineering process quality arc — AC gates, smoke rules, UX audit automation
Status: Delivered
CAS: CAS-2731
Delivered: 2026-05-14
PRs: #724, #725, #726, #727, #728, #729, #730, #731
What’s new
A full arc of process-quality improvements shipped in a single coordinated batch. Tasks now require acceptance criteria before implementation begins. UI PRs require a UX walk and visual evidence before merge. Every stateful surface requires a two-action smoke test. Four test layers — Storybook Loki, axe a11y, Playwright visual sanity, and Eivind’s device walk — run automatically on every relevant PR. The result: the quality checks that used to depend on someone remembering to do them are now structural requirements.
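The gates above can be read as a simple predicate over a piece of work. The following is a minimal sketch under stated assumptions: the `WorkItem` shape, its field names, and the gate labels are hypothetical illustrations and are not taken from the casaconomy skills or CI configuration.

```typescript
// Hypothetical model of the gates described above. All names here are
// illustrative assumptions, not the actual skill or CI implementation.
interface WorkItem {
  hasAcceptanceCriteria: boolean;  // the task lists acceptance criteria
  touchesUi: boolean;              // the PR changes a UI surface
  hasUxWalk: boolean;              // a UX walk was performed
  hasVisualEvidence: boolean;      // visual evidence is attached
  touchesStatefulSurface: boolean; // the PR changes a stateful surface
  hasTwoActionSmoke: boolean;      // a two-action smoke test was recorded
}

// Returns the labels of every unmet gate; an empty array means all gates pass.
function unmetGates(item: WorkItem): string[] {
  const unmet: string[] = [];
  if (!item.hasAcceptanceCriteria) unmet.push("acceptance-criteria");
  if (item.touchesUi && !item.hasUxWalk) unmet.push("ux-walk");
  if (item.touchesUi && !item.hasVisualEvidence) unmet.push("visual-evidence");
  if (item.touchesStatefulSurface && !item.hasTwoActionSmoke) {
    unmet.push("two-action-smoke");
  }
  return unmet;
}

// Example: a UI PR with acceptance criteria and a UX walk, but no screenshots.
const example: WorkItem = {
  hasAcceptanceCriteria: true,
  touchesUi: true,
  hasUxWalk: true,
  hasVisualEvidence: false,
  touchesStatefulSurface: false,
  hasTwoActionSmoke: false,
};
console.log(unmetGates(example)); // [ 'visual-evidence' ]
```

Returning the full list of unmet gates, rather than a single boolean, lets a reviewer see everything that is missing in one pass.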
Components delivered
| CAS | What shipped | Doc |
|---|---|---|
| CAS-2732 | Acceptance criteria required on every task | doc |
| CAS-2733 | Two-action smoke gate for stateful surfaces | doc |
| CAS-2734 | UX walk auto-fires on every UI PR | doc |
| CAS-2737 | Eivind’s UX walk skill + autonomous Paperclip routine | doc |
| CAS-2740 | Storybook Loki + test-runner + axe CI infrastructure | doc |
| CAS-2741 | Storybook page-level stories backfilled | doc |
| CAS-2742 | Playwright visual sanity + axe per route | doc |
| CAS-2743 | Astrid UX advisor on every UI PR | doc |
Why we built it
A post-mortem on the iOS TestFlight arcs (CAS-2460–2587) showed a consistent pattern: bugs that should have been caught at review time were instead caught by the regent on a physical device after a build shipped. The root causes were:
- Tasks without acceptance criteria → no agreed success condition → “it works” was self-assessed.
- UI PRs without UX review → layout bugs invisible in code review.
- No visual regression baseline → pixel-level drift undetected.
- No a11y gate → contrast and semantic markup regressions undetected.
- No automated stateful-surface smoke → race conditions and broken flows shipped.
CAS-2731 addressed all five root causes in a single coordinated effort. Each child issue is independently useful, but their combined effect is a quality pipeline that catches at review time what used to be caught in production.
What changed under the hood
See each child epic’s feature doc for the technical details. At the process level:
- Skill layer: `casaconomy-task-worktree`, `casaconomy-review-protocol`, and `casaconomy-planning` all updated to enforce the new gates.
- CI layer: Two new GitHub Actions jobs (`playwright-visual-sanity`, `storybook-loki`); `axe-playwright` and `@storybook/test-runner` added to devDependencies.
- Agent layer: Eivind’s Paperclip routine fires every 30 minutes; Astrid’s UX advisory fires on every UI PR via the `astrid-ux-advisor` skill.
- Docs layer: `docs/ux/ux-ui-guidelines-v0.md`, `docs/ux-baseline/`, and `docs/testing-strategy.md` updated.
Known limitations / follow-on work
- Eivind’s walk is skill-guided but not yet fully scripted. Full Simulator automation is deferred.
- The `playwright-visual-sanity` screenshot baseline requires a first clean run with `--update-snapshots` before drift detection kicks in.
- Storybook page-level stories cover 7 surfaces; remaining pages are follow-on work (CAS-2741 tracks the backlog).
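The screenshot baseline limitation above comes down to a small piece of logic: with no stored baseline there is nothing to compare against, so drift cannot be reported until an update run has written one. The sketch below is a simplified model of that behaviour, not Playwright's implementation. The `checkSnapshot` function, the `SnapshotResult` labels, the `/dashboard` route, and the use of string hashes in place of pixel comparison are all assumptions made for illustration.

```typescript
// Simplified, hypothetical model of baseline handling. Snapshots are compared
// as exact strings (for example, a hash of the screenshot); real tools compare
// pixels, usually with a tolerance.
type SnapshotResult = "baseline-written" | "missing-baseline" | "match" | "drift";

function checkSnapshot(
  baselines: Map<string, string>,
  route: string,
  actual: string,
  updateSnapshots: boolean,
): SnapshotResult {
  if (updateSnapshots) {
    // Update run: accept the current rendering as the new baseline.
    baselines.set(route, actual);
    return "baseline-written";
  }
  const expected = baselines.get(route);
  if (expected === undefined) {
    // No baseline yet, so drift cannot be detected.
    return "missing-baseline";
  }
  return expected === actual ? "match" : "drift";
}

const baselines = new Map<string, string>();
console.log(checkSnapshot(baselines, "/dashboard", "hash-a", false)); // missing-baseline
console.log(checkSnapshot(baselines, "/dashboard", "hash-a", true));  // baseline-written
console.log(checkSnapshot(baselines, "/dashboard", "hash-b", false)); // drift
```

In this model the first clean run corresponds to the call with `updateSnapshots` set to `true`; only the runs after it can return `drift`.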