#qwen — Hans Christian Thjømøe

May 22, 2026

One llama.cpp Flag Turns MTP From Dead Weight to 68% Faster

The --spec-draft-p-min filter in llama.cpp PR #22397 rescues MTP for Qwen3.6-27B: 48.9 tok/s vs 29 tok/s at 2000 tokens on a 24GB card.

May 16, 2026

Your Local Qwen3.6 Throughput Probably Just Halved (and How to Fix It)

llama.cpp renamed the MTP flag on May 13, and the old --spec-type mtp is now silently ignored. If your throughput dropped from 140 tok/s to 70 tok/s, you are likely running without speculative decoding.

May 15, 2026

The Local AI Inflection Point: May 2026

Three model releases in three weeks moved local AI from 'good enough for hobbies' to 'good enough for production'. Here's what changed and why it matters.

May 11, 2026

Running Qwen3.6-27B Locally: Hardware, Quantization, and What Actually Works

A practical guide to running Qwen3.6-27B on consumer hardware in 2026 — memory requirements per quant level, recommended runners, and the MTP trick that doubles your tokens per second.

May 8, 2026

A 27B Model on a Single GPU Is 10 Points Off Claude Opus 4.7

Qwen3.6-27B running locally now scores within 10 points of frontier closed models on SWE-bench Verified. Here is the benchmark table, lined up side by side.