Complete evaluation infrastructure on a single Mac Studio: six eval dimensions, a 5.8x throughput advantage uncovered by switching backends, and a routing system built on real data.
- 24 models
- 6 eval dimensions
- 5.8x MLX vs Ollama throughput
- 88k peak tok/s
- $0 cloud cost
Key Findings
- Size is not quality for conversation: qwen2.5:7b (5 GB) scored 100% on multi-turn; llama3.3:70b (42 GB) scored 47.8%
- MLX delivers 5.8x aggregate throughput vs Ollama at 32 concurrent users: invisible at low concurrency, massive at scale
- --decode-concurrency 8 made things worse: MLX's dynamic batcher outperforms any fixed value
- qwen2.5:7b wins on value: 80.6% quality, 100% multi-turn, 93% domain, 10k tok/s, 5 GB
- When every model scores 0%, the task is broken: we found and fixed a wrong answer key mid-run
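The last finding suggests a cheap guardrail worth baking into any eval harness: if every model scores exactly zero on a task, suspect the task (usually the answer key) before suspecting the models. A minimal sketch; the data shape and names here are illustrative, not the project's actual harness:

```python
# Flag eval tasks where every model scored 0% -- a strong signal that
# the task or its answer key is broken, not that all models failed.
from collections import defaultdict

def suspect_tasks(results):
    """results: iterable of (model, task, score) tuples, score in [0, 1].

    Returns the tasks on which every model scored exactly 0.
    """
    by_task = defaultdict(list)
    for model, task, score in results:
        by_task[task].append(score)
    return [t for t, scores in by_task.items() if all(s == 0 for s in scores)]

# Illustrative run (scores echo the findings above):
results = [
    ("qwen2.5:7b",   "multi_turn", 1.00),
    ("llama3.3:70b", "multi_turn", 0.478),
    ("qwen2.5:7b",   "tool_use",   0.0),  # everyone at 0% -> check the key
    ("llama3.3:70b", "tool_use",   0.0),
]
print(suspect_tasks(results))  # ['tool_use']
```

Running this check after each eval batch, rather than at the end, is what makes a mid-run fix like the one above possible.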