Model Evaluation

Qwen3.5-9B-DeepSeek-V4-Flash · Q5_K_M

A 9B Qwen3.5 finetune distilled from DeepSeek V4, benchmarked side by side with the official Qwen3.5-9B base at the same Q5_K_M quant. The distill reasons more tightly on agentic tasks and produces visibly better one-shot creative front-end design; open the A/B buttons on the cards below and judge for yourself. Tool-calling and throughput tie. Full breakdown in the report.

0 / 5 distill cap hits
3 / 5 base cap hits
2.2× faster agentic
5 / 6 tools (both)
142 tok/s (both)

Web design · open to preview

SaaS landing page · Prism: AI observability
distill: 44.2 KB · same prompt, both runs
Analytics dashboard · Light theme, emerald accent
distill: 41.1 KB · same prompt, both runs
Designer portfolio · Maya Chen: kinetic typography
distill: 18.0 KB · same prompt, both runs
Pricing page · 3 tiers + animated toggle + FAQ
distill: 25.6 KB · same prompt, both runs
Mobile app marketing · Stillwater: CSS-only iPhone mock
distill: 30.5 KB · same prompt, both runs

Agentic reasoning · text output

Code debug (4 bugs) · k-th smallest element
3170 tok · 22 s
Multi-step planning · URL shortener deploy plan
2899 tok · 20 s
Self-critique loop · Palindrome · O(n³) → O(n²)
1969 tok · 14 s
Structured JSON extraction · Calendar + roster from prose
4353 tok · 30 s
Tool-use planning · Weather + flights + hotel
1415 tok · 10 s
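
The per-task figures above can be cross-checked against the headline throughput. The following is a minimal Python sketch, assuming each card's token count and time describe a single run and that the times are rounded to whole seconds. The page does not state which model these figures belong to, and the variable names are illustrative only.

```python
# (task, output tokens, wall-clock seconds) copied from the agentic cards.
# Assumption: one run per card, times rounded to whole seconds.
runs = [
    ("Code debug (4 bugs)", 3170, 22),
    ("Multi-step planning", 2899, 20),
    ("Self-critique loop", 1969, 14),
    ("Structured JSON extraction", 4353, 30),
    ("Tool-use planning", 1415, 10),
]

# Per-task throughput in tokens per second.
per_task = {name: tokens / seconds for name, tokens, seconds in runs}

# Aggregate throughput: total tokens over total time, which weights
# longer runs more heavily than a plain mean of the per-task rates.
total_tokens = sum(tokens for _, tokens, _ in runs)
total_seconds = sum(seconds for _, _, seconds in runs)
aggregate = total_tokens / total_seconds

for name, rate in per_task.items():
    print(f"{name:<28} {rate:6.1f} tok/s")
print(f"{'Aggregate':<28} {aggregate:6.1f} tok/s")
```

With these rounded inputs every task lands between about 140 and 146 tok/s and the aggregate comes to roughly 144 tok/s, which is consistent with the 142 tok/s headline once whole-second rounding of the timings is taken into account.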