All Case Studies

The Work

Every project documented with full methodology, data tables, what worked, what failed, and all deliverables linked. Updated as new work ships.

Landing Page Simulation  ·  March 7, 2026
Talk Stories: Landing Page to 6.65/10
Five rounds of synthetic persona simulation lifted conversion intent by 55% in a single day. No real users. No waiting. All models ran locally.
+55% intent lift  ·  5 sim rounds  ·  100 persona evals  ·  4 variants tested  ·  2x share rate

Key Findings

  • Subtraction beat addition in round one — removing 3 things (ghostwriter label, "beta," scary Slack line) lifted intent by 1.75 points
  • "Voice Engine" won the framing test at 7.35/10 — "Story Engineer" failed for the same reason abstract labels always fail
  • Security section dropped privacy objections from 35% to 25% in one iteration — mechanisms, not reassurances
  • Testimonials doubled the share rate — engineered to the exact objection, not generic praise
  • The page hit a copy ceiling at 6.65/10 — the remaining objection requires product experience, not more words
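The headline lift is just the percent change in mean persona intent between the first and last round. A minimal sketch of that arithmetic — the function name and the per-persona scores below are illustrative, not the study's raw data, though the means (4.29 → 6.65 on a 0–10 scale) are consistent with the reported +55% and 6.65/10 figures:

```python
from statistics import mean

def intent_lift(baseline_scores, variant_scores):
    """Percent change in mean intent between two rounds of persona evals."""
    base, final = mean(baseline_scores), mean(variant_scores)
    return (final - base) / base * 100

# Illustrative 0-10 intent scores from four hypothetical personas
round_1 = [4.3, 4.2, 4.4, 4.26]   # mean 4.29
round_5 = [6.4, 6.7, 6.6, 6.9]    # mean 6.65
print(round(intent_lift(round_1, round_5), 1))  # → 55.0
```

In the real run each round averaged 100 persona evals per variant, but the aggregation is the same.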

AI Infrastructure  ·  March 6–7, 2026
Local LLM Eval Farm: 24 Models, Zero Cloud
Complete evaluation infrastructure on a single Mac Studio: six eval dimensions, a 5.8x throughput advantage uncovered by switching backends (MLX over Ollama), and a routing system built on real data.
24 models  ·  6 eval dims  ·  5.8x MLX vs Ollama  ·  88k peak tok/s  ·  $0 cloud cost

Key Findings

  • Size is not quality for conversation — qwen2.5:7b (5GB) scored 100% multi-turn; llama3.3:70b (42GB) scored 47.8%
  • MLX delivers 5.8x aggregate throughput vs Ollama at 32 concurrent users — invisible at low concurrency, massive at scale
  • --decode-concurrency 8 made things worse — MLX's dynamic batcher outperforms any fixed value
  • qwen2.5:7b wins on value — 80.6% quality, 100% multi-turn, 93% domain, 10k tok/s, 5GB
  • When every model scores 0%, the task is broken — found and fixed a wrong answer key mid-run
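The last finding generalizes to a cheap harness-level guard: when every model fails the same task, suspect the answer key before the models. A minimal sketch — the function name, score shape, and threshold are illustrative, not the farm's actual code:

```python
def flag_broken_tasks(results, floor=0.0):
    """results: {task_id: {model_name: score}}.

    A task where every model scores at or below `floor` is more likely
    a bad answer key than a universal model failure -- flag it for
    human review instead of logging 24 zeros.
    """
    return [task for task, scores in results.items()
            if scores and all(s <= floor for s in scores.values())]

# Illustrative scores: t1 has a wrong answer key, t2 is a normal task
results = {
    "t1": {"qwen2.5:7b": 0.0, "llama3.3:70b": 0.0},
    "t2": {"qwen2.5:7b": 0.81, "llama3.3:70b": 0.48},
}
print(flag_broken_tasks(results))  # → ['t1']
```

Running this check mid-run is what surfaced the wrong answer key before it poisoned the cross-model comparison.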