#llama.cpp — Hans Christian Thjømøe

May 16, 2026

Your Local Qwen3.6 Throughput Probably Just Halved (and How to Fix It)

llama.cpp renamed the MTP flag on May 13, and the old --spec-type mtp is now silently ignored. If your tok/s dropped from 140 to 70, you are likely running without speculative decoding.

May 11, 2026

Running Qwen3.6-27B Locally: Hardware, Quantization, and What Actually Works

A practical guide to running Qwen3.6-27B on consumer hardware in 2026: memory requirements per quant level, recommended runners, and the MTP trick that doubles your tokens per second.
