Jordan Goulart

AI workflow · Jun 1, 2026

Qualyx: an AI UX scorecard, dressed by Heurix

A UX scorecard that runs a heuristic review of any screen — Nielsen's 10 heuristics plus Krug, scored by Claude as a baseline for a human reviewer. It's also the one product the Heurix design system was built to dress.

#ux #usability-heuristics #claude #ai-evaluation #express #design-systems

Brief

Build a UX-evaluation tool that actually works: paste a URL, get a structured heuristic scorecard you can trust enough to start from — fast, repeatable, and organised by project.

Initial idea

A heuristic review is mechanical at the start (run the ten heuristics, note the obvious) and judgment at the end. Hand the mechanical first pass to Claude as a baseline and keep the verdict human. The tool's job is to make that first pass fast and consistent — never to pretend it's the final word.

Decisions

  1. The rubric is fixed and explicit: Nielsen's 10 usability heuristics, scored one by one, with Krug's 'Don't Make Me Think' as a lens across all of them. No vague 'UX score': every point ties to a named heuristic.
  2. Three evaluation scopes (page, flow, component) change how the model reasons. A single button isn't judged on 'Help and documentation'; the scope tells the model which heuristics are structurally N/A and to say so out loud.
  3. The AI suggests a starting baseline, not a final verdict. The system prompt says it, the UI frames it, and confidence is reported per criterion so a reviewer knows where to look first.
  4. A screen becomes evaluable in two steps: scrape the HTML structure (cheerio) and capture a screenshot (Puppeteer), then hand the structure to claude-sonnet-4-6.
  5. Evaluations live inside projects, one per product, squad, or client, so a scorecard is never an orphan; it sits in context with everything else run against the same thing.
  6. One backend, two homes. Locally it writes JSON files and screenshots to disk; on Vercel it auto-detects and switches to Upstash Redis + Vercel Blob + @sparticuz/chromium, with no code change.
  7. Every surface is built from the Heurix design system. EvalRow, SuggestionCard, AIBlock, StatsTile: the organisms ship from the system; Qualyx only composes them, never restyles them.
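
To make decision 2 concrete, here is a minimal sketch of how a scope could narrow the rubric before anything reaches the model. It is not Qualyx's actual code: `rubricFor`, `scopeInstruction`, and the `NOT_APPLICABLE` table are hypothetical names, and the only N/A pairing taken from this post is component scope with 'Help and documentation'. The other scopes are left empty rather than guessed.

```typescript
type Scope = "page" | "flow" | "component";

// Nielsen's 10 usability heuristics, in their usual order.
const NIELSEN_HEURISTICS = [
  "Visibility of system status",
  "Match between system and the real world",
  "User control and freedom",
  "Consistency and standards",
  "Error prevention",
  "Recognition rather than recall",
  "Flexibility and efficiency of use",
  "Aesthetic and minimalist design",
  "Help users recognize, diagnose, and recover from errors",
  "Help and documentation",
] as const;

type Heuristic = (typeof NIELSEN_HEURISTICS)[number];

// Assumption: a per-scope list of structurally N/A heuristics.
// Only the component entry is grounded in the post; the rest is illustrative.
const NOT_APPLICABLE: Record<Scope, readonly Heuristic[]> = {
  page: [],
  flow: [],
  component: ["Help and documentation"],
};

interface RubricLine {
  heuristic: Heuristic;
  applicable: boolean;
}

// Returns all ten heuristics, each flagged as applicable or not for the scope.
function rubricFor(scope: Scope): RubricLine[] {
  const skipped = new Set<Heuristic>(NOT_APPLICABLE[scope]);
  return NIELSEN_HEURISTICS.map((heuristic) => ({
    heuristic,
    applicable: !skipped.has(heuristic),
  }));
}

// Builds the sentence that would be added to the prompt for this scope.
function scopeInstruction(scope: Scope): string {
  const skipped = rubricFor(scope)
    .filter((line) => !line.applicable)
    .map((line) => line.heuristic);
  if (skipped.length === 0) {
    return `Scope: ${scope}. Score all ${NIELSEN_HEURISTICS.length} heuristics.`;
  }
  return `Scope: ${scope}. Mark as N/A and say so explicitly: ${skipped.join(", ")}.`;
}

console.log(scopeInstruction("component"));
```

Keeping the N/A table as data rather than prompt prose means the same list can drive both the instruction sent to the model and whatever the UI greys out.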

Conclusion

Qualyx is the product Heurix kept hinting at — the EvalRow and AIBlock in that system exist because this app needed them. Building the two as a pair paid off: the hard UI calls were made once, in the system, and the app stayed thin enough to focus on the real problem — turning a messy screen into a heuristic baseline a human can run with.

Qualyx is the answer to a question the Heurix design system kept raising. That system ships organisms with oddly specific names — `EvalRow`, `SuggestionCard`, `AIBlock` — and this is the app they were named for. The product and the system were built as a pair, which is why the app stays so thin: the hard interface decisions already live in the system, and Qualyx just composes them.

The interesting constraint is the one the tool puts on itself. It runs Nielsen's ten heuristics through Claude and returns a score per criterion — but the system prompt, the confidence flags, and the UI all insist on the same thing: this is a baseline, not a verdict. A heuristic review is mechanical at the start and judgment at the end; Qualyx automates the start so a human can spend their attention on the end.
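
One way to picture "confidence flags" doing that job is a review queue ordered by how unsure the model was. The sketch below is an assumption about shape, not the real Qualyx schema: the `CriterionResult` fields, the three-level confidence scale, the numeric scores, and `reviewQueue` are all hypothetical, and the sample data is invented for illustration.

```typescript
type Confidence = "low" | "medium" | "high";

// Hypothetical shape of one scored criterion in the AI baseline.
interface CriterionResult {
  heuristic: string;
  score: number | null; // null stands for a heuristic marked N/A
  confidence: Confidence;
}

const CONFIDENCE_RANK: Record<Confidence, number> = { low: 0, medium: 1, high: 2 };

// Drops N/A criteria and puts the least confident scores first,
// so the reviewer starts where the baseline is weakest.
// filter() returns a new array, so the input is left untouched.
function reviewQueue(results: readonly CriterionResult[]): CriterionResult[] {
  return results
    .filter((r) => r.score !== null)
    .sort((a, b) => CONFIDENCE_RANK[a.confidence] - CONFIDENCE_RANK[b.confidence]);
}

// Invented sample baseline.
const baseline: CriterionResult[] = [
  { heuristic: "Visibility of system status", score: 3, confidence: "high" },
  { heuristic: "Error prevention", score: 2, confidence: "low" },
  { heuristic: "Help and documentation", score: null, confidence: "high" },
  { heuristic: "Consistency and standards", score: 4, confidence: "medium" },
];

console.log(reviewQueue(baseline).map((r) => r.heuristic));
// → ["Error prevention", "Consistency and standards", "Visibility of system status"]
```

The scores themselves are never changed here; the ordering only decides where human attention goes first, which matches the baseline-not-verdict framing.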

Live components

The Qualyx scorecard, composed from the real Heurix design-system components. Pick a scope, run the evaluation, and read the AI baseline — a suggestion, not a verdict.