Your Local Qwen3.6 Throughput Probably Just Halved (and How to Fix It)
llama.cpp renamed the MTP flag on May 13, and the old --spec-type mtp is now silently ignored. If your tok/s dropped from 140 to 70, you are likely running without speculative decoding.
A practical guide to running Qwen3.6-27B on consumer hardware in 2026: memory requirements per quant level, recommended runners, and the MTP trick that doubles your tokens per second.