Your Local Qwen3.6 Throughput Probably Just Halved (and How to Fix It)
llama.cpp renamed the MTP flag on May 13. The old --spec-type mtp is silently ignored. If your tok/s dropped from 140 to 70 you are likely running without speculative decoding.
4 posts
llama.cpp renamed the MTP flag on May 13. The old --spec-type mtp is silently ignored. If your tok/s dropped from 140 to 70 you are likely running without speculative decoding.
Three model releases in three weeks moved local AI from 'good enough for hobbies' to 'good enough for production'. Here's what changed and why it matters.
A practical guide to running Qwen3.6-27B on consumer hardware in 2026 — memory requirements per quant level, recommended runners, and the MTP trick that doubles your tokens per second.
Qwen3.6-27B running locally now scores within 10 points of frontier closed models on SWE-bench Verified. The benchmark table, lined up side by side.