LM Studio

Forget everything I told you about Ollama or llama.cpp. On a Mac, you should be using LM Studio for LLM inference: running MLX models instead of GGUF roughly doubled my inference speed while reducing memory consumption.

A concrete example from my M3 Pro: with llama.cpp, the Qwen3.5-35-a3b model (GGUF, 4-bit) ran at about 25 tokens per second; with LM Studio, the same model as an MLX 4-bit build ran at about 50 tokens per second.

In addition, the MLX model uses less memory and less energy during inference. Twice the speed at a lower cost. That's amazing. Don't make the same mistake I did: switch to LM Studio right away.
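If you want to check the numbers yourself, LM Studio serves loaded models through an OpenAI-compatible local API (by default at http://localhost:1234/v1). Here is a minimal sketch that measures approximate tokens per second over a streamed response; it assumes you have started the server and loaded a model in LM Studio, and the model name below is a placeholder for whatever name LM Studio shows you.

```python
# Minimal sketch: rough tokens/second against LM Studio's local,
# OpenAI-compatible server (default base URL: http://localhost:1234/v1).
# Assumes the server is running and a model is loaded in LM Studio.
import time
from openai import OpenAI

# LM Studio does not check the API key; any string works.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.time()
tokens = 0
stream = client.chat.completions.create(
    model="qwen3.5-35-a3b-mlx-4bit",  # placeholder: use the name LM Studio displays
    messages=[{"role": "user", "content": "Explain MLX in two sentences."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        tokens += 1  # rough count: one streamed chunk is approximately one token

print(f"~{tokens / (time.time() - start):.1f} tokens/s")
```

Run the same prompt against a GGUF build and an MLX build of the same model to reproduce the comparison above.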
