Model Evaluation

Qwen3.5-9B-DeepSeek-V4-Flash · Q5_K_M

eval by Kyle Hessling · model by Jackrong

A 9B Qwen 3.5 finetune distilled from DeepSeek V4, benchmarked side by side with the official Qwen3.5-9B base at the same Q5_K_M quant. The distill reasons more tightly on agentic tasks and produces visibly better one-shot creative front-end design; open the A/B buttons on the cards below and judge for yourself. Tool-calling and throughput tie. Full breakdown in the report.


0 / 5 distill cap hits

3 / 5 base cap hits

2.2× faster agentic

5/6 tools (both)

142 tok/s (both)

Web design · open to preview

SaaS landing page · Prism, AI observability

Distill: 15347 tok · 109 s; Base: 9849 tok · 68 s

distill: 44.2 KB · same prompt, both runs

Analytics dashboard · Light theme, emerald accent

Distill: 13032 tok · 93 s; Base: 13187 tok · 91 s

distill: 41.1 KB · same prompt, both runs

Designer portfolio · Maya Chen, kinetic typography

Distill: 6213 tok · 44 s; Base: 5930 tok · 41 s

distill: 18.0 KB · same prompt, both runs

Pricing page · 3 tiers + animated toggle + FAQ

Distill: 8367 tok · 59 s; Base: 9503 tok · 65 s

distill: 25.6 KB · same prompt, both runs

Mobile app marketing · Stillwater, CSS-only iPhone mock

Distill: 10161 tok · 72 s; Base: 32000 tok · 228 s ⚠

distill: 30.5 KB · same prompt, both runs
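
As a rough cross-check of the 142 tok/s headline figure, the sketch below recomputes throughput from the token counts and timings on the web design cards above. The seconds shown on the cards are rounded, so the results are approximate. Treating 32000 tokens as the generation cap is an assumption inferred from the warning marker on the base run, and all function and variable names here are illustrative only.

```python
# Token counts and wall-clock seconds copied from the web design cards.
RUNS = {
    "SaaS landing page": {"distill": (15347, 109), "base": (9849, 68)},
    "Analytics dashboard": {"distill": (13032, 93), "base": (13187, 91)},
    "Designer portfolio": {"distill": (6213, 44), "base": (5930, 41)},
    "Pricing page": {"distill": (8367, 59), "base": (9503, 65)},
    "Mobile app marketing": {"distill": (10161, 72), "base": (32000, 228)},
}

# Assumption: 32000 is the generation cap (inferred from the warning marker).
ASSUMED_TOKEN_CAP = 32000


def tokens_per_second(tokens, seconds):
    """Throughput of a single run."""
    return tokens / seconds


def aggregate_throughput(model):
    """Total tokens divided by total seconds across all five cards."""
    total_tokens = sum(RUNS[task][model][0] for task in RUNS)
    total_seconds = sum(RUNS[task][model][1] for task in RUNS)
    return total_tokens / total_seconds


def capped_tasks(model):
    """Tasks where the run reached the assumed token cap."""
    return [task for task in RUNS if RUNS[task][model][0] >= ASSUMED_TOKEN_CAP]


# Per-run throughput, one line per card and model.
for task, by_model in RUNS.items():
    for model, (tokens, seconds) in by_model.items():
        rate = tokens_per_second(tokens, seconds)
        print(f"{task:22s} {model:8s} {rate:6.1f} tok/s")

# Aggregate throughput per model.
for model in ("distill", "base"):
    print(f"{model:8s} aggregate {aggregate_throughput(model):6.1f} tok/s")
```

Both aggregates land within a couple of tok/s of the headline number, which is consistent with the claim that throughput ties; the per-run spread mostly reflects rounding of the timings.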

Agentic reasoning · text output

Code debug (4 bugs) · k-th smallest element

3170 tok · 22 s

Multi-step planning · URL shortener deploy plan

2899 tok · 20 s

Self-critique loop · Palindrome, O(n³) → O(n²)

1969 tok · 14 s

Structured JSON extraction · Calendar + roster from prose

4353 tok · 30 s

Tool-use planning · Weather + flights + hotel

1415 tok · 10 s
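
To put the five agentic cards in one view, the sketch below totals their token counts and timings. The cards do not state which model these figures belong to, so the totals are not attributed to either one here, and the 2.2× speedup cannot be recomputed without the other model's numbers. The names used are illustrative only.

```python
# (task, tokens, seconds) copied from the agentic reasoning cards.
AGENTIC_RUNS = [
    ("Code debug (4 bugs)", 3170, 22),
    ("Multi-step planning", 2899, 20),
    ("Self-critique loop", 1969, 14),
    ("Structured JSON extraction", 4353, 30),
    ("Tool-use planning", 1415, 10),
]


def agentic_totals(runs):
    """Return (total tokens, total seconds) over the listed runs."""
    total_tokens = sum(tokens for _, tokens, _ in runs)
    total_seconds = sum(seconds for _, _, seconds in runs)
    return total_tokens, total_seconds


total_tokens, total_seconds = agentic_totals(AGENTIC_RUNS)
mean_tokens = total_tokens / len(AGENTIC_RUNS)
print(f"total: {total_tokens} tok in {total_seconds} s")
print(f"mean per task: {mean_tokens:.0f} tok")
```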